Summary of Project: Java is preferred, but any other open-source language is acceptable.
The project requires converting data on a webpage to CSV format. You can use any language as long as the source code is well documented; I would prefer Java. You could also use iMacros, since it works with Firefox and is a free add-on; this might reduce cost depending on your familiarity with iMacros.
The project also requires using the Zillow API to get the value of each property.
If you know web scraping, this project should be very small.
The web-scraping application to be built is called the "Program" hereafter.
Steps of data extraction:
1. The user will click the search button after entering some search criteria manually.
2. The result set will be displayed on the webpage and will span more than one page. The complete source of a sample page is attached ([login to view URL]).
3. The user will start the web-scraping Program built in this project.
The Program will:
A. Take each row of data and copy it into the CSV.
B. Click the third hyperlink in the row, which opens a pop-up HTML page (complete format attached: [login to view URL]).
C. Parse the pop-up HTML page and copy each data element into the same CSV row written in step A. See [login to view URL] for a description; it is basically an HTML table whose values are copied into the CSV.
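As a rough sketch of steps A-C, the snippet below pulls the value column out of a simple two-column HTML table (like the detail pop-up) and appends the values to a CSV line with proper quoting. It uses only the JDK; a production scraper would more likely use a real HTML parser such as jsoup, since regex only copes with well-formed markup. The sample markup, field values, and class name are illustrative, not taken from the actual site.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DetailPageToCsv {
    // Matches <td>label</td><td>value</td> cell pairs in well-formed markup.
    private static final Pattern CELL_PAIR = Pattern.compile(
            "<td[^>]*>\\s*([^<]+?)\\s*</td>\\s*<td[^>]*>\\s*([^<]+?)\\s*</td>",
            Pattern.CASE_INSENSITIVE);

    // Extract the value column of each row of the detail table.
    static List<String> extractValues(String html) {
        List<String> values = new ArrayList<>();
        Matcher m = CELL_PAIR.matcher(html);
        while (m.find()) {
            values.add(m.group(2));
        }
        return values;
    }

    // Quote a field when needed (RFC 4180 style) so embedded commas survive.
    static String escape(String field) {
        if (field.contains(",") || field.contains("\"") || field.contains("\n")) {
            return "\"" + field.replace("\"", "\"\"") + "\"";
        }
        return field;
    }

    // Join list-page cells plus detail-page values into one CSV line.
    static String toCsvLine(List<String> cells) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < cells.size(); i++) {
            if (i > 0) sb.append(',');
            sb.append(escape(cells.get(i)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String html = "<table><tr><td>Address</td><td>123 Main St, Apt 4</td></tr>"
                    + "<tr><td>City</td><td>Springfield</td></tr></table>";
        List<String> row = new ArrayList<>(List.of("000036"));
        row.addAll(extractValues(html));
        // prints: 000036,"123 Main St, Apt 4",Springfield
        System.out.println(toCsvLine(row));
    }
}
```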
4. Take the address fields from the current CSV row (the column name of each address field will be provided by the user in a properties file, or any other way convenient for the application) and look up the Zillow value using the web service. Details at
[login to view URL]. Append the following Zestimate data returned by the service:
* Zestimate (in $)
* Last updated date
* 30-day change (in $)
* Valuation range (high) (in $)
* Valuation range (low) (in $)
* Percentile Value
* Zillow Home Value Index
* Zillow Home Value Index 1-Yr change
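The Zillow GetZestimate web service returns these fields as XML, so the Program will need a small response parser. The sketch below uses the JDK's DOM parser on a trimmed sample response; the element names follow Zillow's published GetZestimate format, but they should be verified against the live API documentation before relying on them.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class ZestimateParser {
    // Return the text of the first element with the given tag, or "" if absent.
    static String text(Document doc, String tag) {
        NodeList nodes = doc.getElementsByTagName(tag);
        return nodes.getLength() > 0 ? nodes.item(0).getTextContent().trim() : "";
    }

    // Parse an XML string into a DOM document.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalStateException("bad XML response", e);
        }
    }

    public static void main(String[] args) {
        // Trimmed sample shaped like a GetZestimate response (values made up).
        String xml = "<zestimate><amount currency=\"USD\">250000</amount>"
                   + "<last-updated>01/15/2014</last-updated>"
                   + "<valueChange duration=\"30\">-1200</valueChange>"
                   + "<valuationRange><low>230000</low><high>270000</high></valuationRange>"
                   + "<percentile>55</percentile></zestimate>";
        Document doc = parse(xml);
        System.out.println("Zestimate: $" + text(doc, "amount"));
        System.out.println("Last updated: " + text(doc, "last-updated"));
        System.out.println("30-day change: $" + text(doc, "valueChange"));
        System.out.println("Range: $" + text(doc, "low") + " - $" + text(doc, "high"));
    }
}
```

Each extracted value would then be appended to the current CSV row, exactly as the detail-page fields are in step C.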
The Program should restrict the number of requests made in one session, and the interval between requests, based on numbers stored in a properties file provided by the user (or another convenient mechanism), e.g. 1000 requests per session and a 2-second interval between requests.
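A minimal sketch of that throttling requirement, assuming a user-supplied properties file (the key names `max.requests` and `request.interval.seconds` are illustrative, not specified by the project):

```java
import java.util.Properties;

public class RequestThrottle {
    private final int maxRequests;
    private final long intervalMillis;
    private int sent = 0;

    RequestThrottle(Properties props) {
        // Defaults match the example in the brief: 1000 requests, 2 seconds.
        this.maxRequests = Integer.parseInt(props.getProperty("max.requests", "1000"));
        this.intervalMillis = 1000L
                * Integer.parseInt(props.getProperty("request.interval.seconds", "2"));
    }

    // Returns false once the per-session cap is reached; otherwise waits the
    // configured interval (after the first request) and counts the request.
    boolean acquire() {
        if (sent >= maxRequests) return false;
        if (sent > 0) {
            try {
                Thread.sleep(intervalMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        sent++;
        return true;
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("max.requests", "2");
        props.setProperty("request.interval.seconds", "0");
        RequestThrottle throttle = new RequestThrottle(props);
        System.out.println(throttle.acquire()); // true
        System.out.println(throttle.acquire()); // true
        System.out.println(throttle.acquire()); // false: session cap reached
    }
}
```

The scraper would call `acquire()` before each Zillow request and stop (or pause) the session when it returns false.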
D. Go to the next row and repeat from step A. Loop to the end of the page, then click "next page" and continue until there are no more pages or rows.
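The outer row/page loop from step D can be sketched as below, with the actual scraping calls stubbed out behind an interface (all names here are hypothetical; the real implementation would drive the browser or HTTP client inside `rows()` and `next()`):

```java
import java.util.List;

public class ScrapeLoop {
    // Abstraction over one page of search results.
    interface ResultPage {
        List<String> rows();   // raw rows on the current list page
        boolean hasNext();     // is there a "next page" link?
        ResultPage next();     // follow the "next page" link
    }

    // Visit every row on every page; returns how many rows were processed.
    static int scrapeAll(ResultPage first) {
        int processed = 0;
        ResultPage page = first;
        while (true) {
            for (String row : page.rows()) {
                processed++;   // steps A-C (and the Zillow lookup) run here
            }
            if (!page.hasNext()) break;
            page = page.next();
        }
        return processed;
    }

    public static void main(String[] args) {
        // Two fake pages of two rows each, to exercise the loop.
        ResultPage page2 = new ResultPage() {
            public List<String> rows() { return List.of("r3", "r4"); }
            public boolean hasNext() { return false; }
            public ResultPage next() { return null; }
        };
        ResultPage page1 = new ResultPage() {
            public List<String> rows() { return List.of("r1", "r2"); }
            public boolean hasNext() { return true; }
            public ResultPage next() { return page2; }
        };
        System.out.println(scrapeAll(page1)); // 4
    }
}
```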
See below for the live webpage that will need to be scraped.
You can check out the following links:
[login to view URL] — this is the list page to be scraped.
Click on Adv Number 000036; an HTML pop-up will open. This is the detail page for each item on the list page.
Information from the detail page will be added to the CSV row along with the information for that row from the list page.
The winning bidder will be given a live demo before the project starts, to make sure the requirements are understood and all questions are answered.
Hello. I think this can be easily done in Python. I have been writing web-automation software for some time and have worked with various output file types. I am confident that you'll be quite satisfied with my work.
Cheers,
Arthur.
Hi
Thank you for inviting me. I have already worked with foreclosure sites like zillow, fidelityasap, foreclosureradar, etc. Please read my private message.
Thanks
Sean