I require a program with a GUI that will perform probabilistic record linkage upon ANY two databases the user loads into the program. The resulting records must then be placed into a new database. The program should do the following:
Have a simple GUI
Allow the user to load ANY two databases
Program must first remove duplicates within each database, then compare the records in each database against the records in the other to find further duplicates, and remove those as well.
Program must also consider that the databases may use different naming standards, for example 'Salary' vs 'Wage'.
Program must consider that attributes will not necessarily be in the same order.
Program must consider that data may be formatted differently. For example, one database could store address details under a single attribute 'Address', while another may use 'Street', 'City', 'State', etc. One attribute in Database A could therefore correspond to multiple attributes in Database B.
If Database A has an attribute that Database B does not, that attribute must still be added to the resulting database, even though records that come from Database B will contain null for it.
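To illustrate the attribute-handling points above, here is a minimal Python sketch. It is only an illustration under my own assumptions, not a required design: the synonym list, the 0.8 threshold, the function names and the sample columns are all made up, and it matches attributes by name only. A real solution would also need to look at the data held in each attribute.

```python
from difflib import SequenceMatcher

# Hypothetical synonym groups; a real tool would need a maintained list
# or would infer matches from the data itself.
SYNONYMS = [{"salary", "wage", "pay"}]


def name_similarity(a, b):
    """Score two attribute names between 0.0 and 1.0."""
    a, b = a.strip().lower(), b.strip().lower()
    if a == b or any(a in group and b in group for group in SYNONYMS):
        return 1.0
    return SequenceMatcher(None, a, b).ratio()


def align_attributes(cols_a, cols_b, threshold=0.8):
    """Map each attribute of A to its best-scoring attribute of B, if any."""
    mapping = {}
    for col_a in cols_a:
        best = max(cols_b, key=lambda col_b: name_similarity(col_a, col_b))
        if name_similarity(col_a, best) >= threshold:
            mapping[col_a] = best
    return mapping


def combine_fields(record, parts, sep=", "):
    """Join several attributes (e.g. Street, City, State) into one value."""
    return sep.join(str(record[p]) for p in parts if record.get(p))


def to_common_schema(record, mapping_b_to_a, all_columns):
    """Rename a B record's attributes to A's names; missing ones become None."""
    renamed = {mapping_b_to_a.get(k, k): v for k, v in record.items()}
    return {col: renamed.get(col) for col in all_columns}


cols_a = ["Name", "Salary", "Department"]
cols_b = ["Wage", "name"]  # different order, different naming

a_to_b = align_attributes(cols_a, cols_b)
b_to_a = {b: a for a, b in a_to_b.items()}

# 'Department' exists only in A, so the B record gets None for it.
row_b = to_common_schema({"Wage": 52000, "name": "Ann Lee"}, b_to_a, cols_a)

# One 'Address' attribute built from three separate attributes.
address = combine_fields(
    {"Street": "12 Elm St", "City": "Boston", "State": "MA"},
    ["Street", "City", "State"],
)
```

Here `a_to_b` pairs 'Name' with 'name' and 'Salary' with 'Wage' regardless of column order, and `row_b` carries a null (`None`) 'Department'.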
Probabilistic techniques must be used to achieve this. Further requirements will likely be identified during development, but I will be in constant communication, so do not worry. Thank you.
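As a rough idea of the kind of probabilistic matching I have in mind, here is a short Python sketch in the style of the Fellegi-Sunter model, where each compared field adds an agreement or disagreement weight to a total score. Everything specific in it is an assumption for illustration: the m/u probabilities, the field names, the 0.85 similarity cut-off and the match thresholds are made up and would need to be estimated from real data. The same scoring could be applied within one database or across the two.

```python
import math
from difflib import SequenceMatcher

# Made-up m/u probabilities for illustration only:
# m = P(field agrees | records are the same entity)
# u = P(field agrees | records are different entities)
FIELD_PROBS = {
    "name": (0.95, 0.01),
    "city": (0.90, 0.10),
    "salary": (0.85, 0.05),
}


def agrees(x, y, min_ratio=0.85):
    """Treat two values as agreeing if their text is nearly identical."""
    if x is None or y is None:
        return False
    x, y = str(x).strip().lower(), str(y).strip().lower()
    return SequenceMatcher(None, x, y).ratio() >= min_ratio


def match_weight(rec_a, rec_b):
    """Sum of log2 agreement/disagreement weights over the compared fields."""
    total = 0.0
    for field, (m, u) in FIELD_PROBS.items():
        if agrees(rec_a.get(field), rec_b.get(field)):
            total += math.log2(m / u)
        else:
            total += math.log2((1 - m) / (1 - u))
    return total


def classify(weight, upper=6.0, lower=0.0):
    """Hypothetical cut-offs: match, non-match, or leave for human review."""
    if weight >= upper:
        return "match"
    if weight <= lower:
        return "non-match"
    return "review"


a = {"name": "Jonathan Smith", "city": "Boston", "salary": 52000}
b = {"name": "Jonathon Smith", "city": "boston", "salary": 52000}
c = {"name": "Maria Garcia", "city": "Boston", "salary": 61000}

result_ab = classify(match_weight(a, b))  # misspelled name, otherwise same
result_ac = classify(match_weight(a, c))  # only the city agrees
```

Pairs that fall between the two thresholds are labelled "review", which is one possible way to let staff confirm borderline cases rather than have the program decide silently.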
People with knowledge of record linkage are preferred.
This program will be used by staff and must be simple and easy to understand. It must also be easily maintained by our software engineers, so good commenting is a must, as modifications may be made in future if further needs are identified. This program will be used at our firm, so accuracy and reliability are a must. Thank you for understanding.