Project 2 – Map and Set
Word Counting Project (JAVA)
________________________________________
In this project, students implement a Map class, AVLMap, and a Set class, AVLSet, using the AVL tree class written by Mark Allen Weiss. Your AVLMap and AVLSet do not need every method of the TreeMap and TreeSet classes in the Java API. You only need the methods necessary for building the software tool described next.
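As a rough illustration of the shape these two classes could take, here is a minimal sketch. It is not Weiss's AVL tree code: the tree logic below is a small insert-only stand-in written for this example, and the class names AvlMapSketch and AvlSetSketch and all method names are hypothetical. It assumes the map only needs put, get, size and an in-order key listing, and that the set can simply be backed by the map.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for AVLMap: a map backed by an insert-only AVL tree.
class AvlMapSketch<K extends Comparable<K>, V> {
    private static final class Node<K, V> {
        K key; V value; int height = 1; Node<K, V> left, right;
        Node(K key, V value) { this.key = key; this.value = value; }
    }

    private Node<K, V> root;
    private int size;

    // Returns the value stored for key, or null if the key is absent.
    public V get(K key) {
        Node<K, V> n = root;
        while (n != null) {
            int c = key.compareTo(n.key);
            if (c == 0) return n.value;
            n = c < 0 ? n.left : n.right;
        }
        return null;
    }

    // Inserts the key, or overwrites the value if the key already exists.
    public void put(K key, V value) { root = insert(root, key, value); }

    public int size() { return size; }

    // Keys in ascending order (in-order traversal).
    public List<K> keys() { List<K> out = new ArrayList<>(); inOrder(root, out); return out; }

    // Height of the whole tree, exposed only to check balance.
    int height() { return height(root); }

    private Node<K, V> insert(Node<K, V> n, K key, V value) {
        if (n == null) { size++; return new Node<>(key, value); }
        int c = key.compareTo(n.key);
        if (c < 0) n.left = insert(n.left, key, value);
        else if (c > 0) n.right = insert(n.right, key, value);
        else n.value = value;
        return balance(n);
    }

    // Restores the AVL property at n with a single or double rotation.
    private Node<K, V> balance(Node<K, V> n) {
        update(n);
        int diff = height(n.left) - height(n.right);
        if (diff > 1) {
            if (height(n.left.left) < height(n.left.right)) n.left = rotateLeft(n.left);
            return rotateRight(n);
        }
        if (diff < -1) {
            if (height(n.right.right) < height(n.right.left)) n.right = rotateRight(n.right);
            return rotateLeft(n);
        }
        return n;
    }

    private Node<K, V> rotateRight(Node<K, V> n) {
        Node<K, V> l = n.left;
        n.left = l.right;
        l.right = n;
        update(n);
        update(l);
        return l;
    }

    private Node<K, V> rotateLeft(Node<K, V> n) {
        Node<K, V> r = n.right;
        n.right = r.left;
        r.left = n;
        update(n);
        update(r);
        return r;
    }

    private int height(Node<K, V> n) { return n == null ? 0 : n.height; }

    private void update(Node<K, V> n) { n.height = 1 + Math.max(height(n.left), height(n.right)); }

    private void inOrder(Node<K, V> n, List<K> out) {
        if (n == null) return;
        inOrder(n.left, out);
        out.add(n.key);
        inOrder(n.right, out);
    }

    public static void main(String[] args) {
        AvlMapSketch<String, Integer> map = new AvlMapSketch<>();
        for (String w : new String[] { "delta", "alpha", "charlie", "bravo" }) map.put(w, w.length());
        System.out.println(map.keys()); // prints "[alpha, bravo, charlie, delta]"
    }
}

// Hypothetical stand-in for AVLSet: a set backed by the map above.
class AvlSetSketch<E extends Comparable<E>> {
    private final AvlMapSketch<E, Boolean> map = new AvlMapSketch<>();
    public void add(E e) { map.put(e, Boolean.TRUE); }
    public boolean contains(E e) { return map.get(e) != null; }
    public int size() { return map.size(); }
    public List<E> elements() { return map.keys(); }
}
```

Backing the set with the map mirrors how TreeSet is built on TreeMap in the Java API. In the actual project the tree operations would come from the Weiss AVL tree class rather than from this stand-in.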
Use your AVLMap and AVLSet classes to build a software tool, Concordance_1, which is the first version of this tool. Concordance_1 reads a text file (any text file larger than 100 KB; make sure the parser discards spaces and any other characters that are not part of an identifier) and extracts all of the identifiers in the file, along with the line numbers on which each identifier appears. An identifier is defined as a string that begins with a letter (A-Z, a-z) and is followed by zero or more characters that are letters or digits (0-9). Note that if an identifier appears multiple times on a line, that line number is recorded only once.
The input to the concordance is a text file, for example a source code file or a text document.
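The identifier rule can be expressed as the regular expression [A-Za-z][A-Za-z0-9]*. The sketch below applies it line by line and stores each line number in a set, so repeats on the same line are recorded once. It uses TreeMap and TreeSet purely as stand-ins for AVLMap and AVLSet, and the class name IdentifierScanSketch and method name scan are hypothetical. One assumption to check against the requirements: in a token such as "2x", this approach extracts "x".

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class IdentifierScanSketch {
    // A letter followed by zero or more letters or digits.
    private static final Pattern IDENTIFIER = Pattern.compile("[A-Za-z][A-Za-z0-9]*");

    // Maps each identifier to the set of 1-based line numbers it appears on.
    static Map<String, Set<Integer>> scan(String[] lines) {
        Map<String, Set<Integer>> index = new TreeMap<>();
        for (int i = 0; i < lines.length; i++) {
            Matcher m = IDENTIFIER.matcher(lines[i]);
            while (m.find()) {
                // The set ignores a line number that is already present.
                index.computeIfAbsent(m.group(), k -> new TreeSet<>()).add(i + 1);
            }
        }
        return index;
    }

    public static void main(String[] args) {
        String[] lines = { "int x1 = x1 + y;", "y = 2 * x1;" };
        System.out.println(scan(lines)); // prints "{int=[1], x1=[1, 2], y=[1, 2]}"
    }
}
```

For a real input file, the lines could be read with a BufferedReader or Files.readAllLines and passed to the same loop.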
The concordance outputs the information for each identifier in the following format:
identifier_1 n_1: L_1, L_2, L_3, …
identifier_2 n_2: L_1, L_2, L_3, …
…
Here n_1 is the number of lines containing identifier_1, and each L_i (i >= 1) is a line number on which it appears.
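One way to produce a line in this format is sketched below. The class name ConcordanceFormatSketch and method name formatEntry are hypothetical, and the sketch assumes the line numbers arrive already sorted (as they would from a tree-based set).

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.stream.Collectors;

class ConcordanceFormatSketch {
    // Builds "identifier n: L_1, L_2, ..." where n is the number of lines.
    static String formatEntry(String identifier, Collection<Integer> lineNumbers) {
        String joined = lineNumbers.stream()
                .map(String::valueOf)
                .collect(Collectors.joining(", "));
        return identifier + " " + lineNumbers.size() + ": " + joined;
    }

    public static void main(String[] args) {
        System.out.println(formatEntry("count", Arrays.asList(3, 7, 12))); // prints "count 3: 3, 7, 12"
    }
}
```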
In the last part of this project, students build two more versions of the concordance. Concordance_2 uses TreeMap and TreeSet from the Java API, and Concordance_3 uses HashMap and HashSet from the Java API. Run all three versions of the concordance on the same input text file, larger than 100 KB, and compare their running times.
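The running-time comparison could be set up roughly as follows, using System.nanoTime around each build. This is a sketch under assumptions: the class and method names are hypothetical, the input is synthetic in-memory text rather than a real file of more than 100 KB, and only the two Java API versions are shown (an AVLMap/AVLSet version would be timed the same way). A single timed run is also affected by JIT warm-up, so repeating each run a few times gives steadier numbers.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.function.Supplier;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ConcordanceTimingSketch {
    private static final Pattern IDENTIFIER = Pattern.compile("[A-Za-z][A-Za-z0-9]*");

    // Runs the task once and returns the elapsed time in nanoseconds.
    static long timeNanos(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return System.nanoTime() - start;
    }

    // Builds the concordance with whichever map and set implementations are supplied.
    static Map<String, Set<Integer>> build(List<String> lines,
            Supplier<Map<String, Set<Integer>>> newMap, Supplier<Set<Integer>> newSet) {
        Map<String, Set<Integer>> index = newMap.get();
        for (int i = 0; i < lines.size(); i++) {
            Matcher m = IDENTIFIER.matcher(lines.get(i));
            while (m.find()) {
                index.computeIfAbsent(m.group(), k -> newSet.get()).add(i + 1);
            }
        }
        return index;
    }

    public static void main(String[] args) {
        // Synthetic input; the project would read the lines from the input file instead.
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < 20000; i++) {
            lines.add("word" + (i % 500) + " total = total + item" + (i % 37) + ";");
        }
        long treeNs = timeNanos(() -> build(lines, TreeMap::new, TreeSet::new));
        long hashNs = timeNanos(() -> build(lines, HashMap::new, HashSet::new));
        System.out.printf("TreeMap/TreeSet: %.2f ms%n", treeNs / 1e6);
        System.out.printf("HashMap/HashSet: %.2f ms%n", hashNs / 1e6);
    }
}
```

Note that HashMap and HashSet do not keep identifiers or line numbers in sorted order, so Concordance_3 would need a sort step before printing if sorted output is expected.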
See [login to view URL]~weiss/dsaajava3/code/ for the source code (it is a long list; please look down the list and find the files that apply). You can also use any source code from the internet. You do not have to write the report, but make sure there is code to calculate the running time.
Note: PLEASE COMMENT ON EVERY METHOD AND IMPORTANT LINE OF CODE WRITTEN