I need a program written in Perl, delivered as both the .pl source and an .exe file. It needs a simple GUI.
There are two input files. The first is a fairly messy text file. The second is an Excel file with about 75 lists of words. These can be in one workbook with a separate sheet for each list, or in a separate CSV file for each list. It is up to you.
The output of the program should be the text file with about 75+ new fields added to each row of text. Each new field should hold a count of how many words from one of the word lists in the Excel file appear in that record. There are also a few extra fields and calculations that need to be done.
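To make the counting requirement concrete, here is a minimal sketch. It is not the attached code. The list names and words are made up, and it assumes every list entry is a single word and that matching ignores case. It builds one hash from word to list names, so each word in a row costs a single lookup no matter how many lists there are.

```perl
use strict;
use warnings;

# Hypothetical word lists; the real program would load about 75 of
# these from the workbook sheets or CSV files.
my %lists = (
    Negative => [qw(bad poor loss)],
    Example  => [qw(good gain)],
);

# Map each lowercased word to the names of the lists that contain it.
my %lists_for;
for my $name (sort keys %lists) {
    push @{ $lists_for{lc $_} }, $name for @{ $lists{$name} };
}

# Count, for one row of text, how many of its words fall in each list.
sub count_row {
    my ($row) = @_;
    my %count = map { $_ => 0 } keys %lists;
    for my $word ($row =~ /[A-Za-z']+/g) {
        my $names = $lists_for{lc $word} or next;   # word is in no list
        $count{$_}++ for @$names;
    }
    return \%count;
}

my $c = count_row("Bad news: a poor quarter, but a good gain overall.");
print join(", ", map { "$_=$c->{$_}" } sort keys %$c), "\n";
# prints: Example=2, Negative=2
```

Lists that contain multi-word phrases would need a different matching step than this word-by-word lookup.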
I have Perl code, which I will attach, that already does this for a few lists: [login to view URL] and Negative.csv, plus two other lists. It also does some calculations on those results to produce a few more fields.
There are also some dates in the text. The output should include a date field, filled in as follows. If a record contains a recognizable date, that date goes in the date field. If it does not, the date field should be the same as in the previous record.
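The date rule could be sketched roughly as below. The MM/DD/YYYY pattern is only an assumption for illustration, since the dates in the sample file may look different. The sketch also leaves the field empty for rows that come before the first date, which is a choice of the sketch and not something specified above.

```perl
use strict;
use warnings;

# Assumed date format, for illustration only: MM/DD/YYYY.
my $date_re = qr{\b(\d{1,2}/\d{1,2}/\d{4})\b};

# Return one date per row: the row's own date if it has one,
# otherwise the date carried over from the previous row.
sub fill_dates {
    my @rows = @_;
    my $current = '';   # stays empty until the first date is seen
    my @dates;
    for my $row (@rows) {
        $current = $1 if $row =~ $date_re;
        push @dates, $current;
    }
    return @dates;
}

my @dates = fill_dates(
    "Meeting held 03/14/2011 in the office",
    "no date on this row",
    "follow up on 4/2/2011",
);
print join("|", @dates), "\n";   # prints 03/14/2011|03/14/2011|4/2/2011
```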
The output must be readable by a program called Weka. The attached code produces output that can be loaded into Weka, so you can follow it as an example. However, the input file for that code was somewhat different from the input for this project.
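For reference, one format Weka reads is ARFF. The sketch below assumes ARFF output and made-up field names (text, date, and one numeric field per list). The attached code remains the real example to follow, and it may use a different layout. The sketch writes the date as a plain string and uses `?`, the ARFF marker for a missing value, when there is no date.

```perl
use strict;
use warnings;

# Build ARFF text. Each record is a hash reference of the form
# { text => ..., date => ..., counts => { list_name => n } }.
# List names are assumed to contain no spaces or quotes.
sub to_arff {
    my ($list_names, $records) = @_;
    my @out = ('@relation records', '');
    push @out, '@attribute text string';
    push @out, '@attribute date string';
    push @out, "\@attribute $_ numeric" for @$list_names;
    push @out, '', '@data';
    for my $r (@$records) {
        # Escape backslashes and single quotes inside the quoted text.
        (my $text = $r->{text}) =~ s/(['\\])/\\$1/g;
        my $date = length($r->{date}) ? "'$r->{date}'" : '?';
        my @counts = map { $r->{counts}{$_} // 0 } @$list_names;
        push @out, join(',', "'$text'", $date, @counts);
    }
    return join("\n", @out) . "\n";
}

print to_arff(
    ['Negative', 'Example'],
    [ { text => "it's a bad day", date => '03/14/2011', counts => { Negative => 1 } } ],
);
```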
The text is not actually structured so that each row is a new record. A single record usually spans between 5 and 50 rows of text. But I cannot figure out any way to determine algorithmically where one record ends and the next starts, so I will treat each row as a new record. If you can figure out how to do that I will pay an extra $50, but I don't think it is possible.
I will attach a sample text file and the old code.
The search algorithm should not be too slow, since there may be a lot of words and records to process. It should also not crash on big files.