Create re-usable spider to scrape information from website

In progress. Posted Aug 26, 2014. Paid on delivery.

We need a reusable script that iterates through many web pages and extracts a table of information from each page.
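As a rough illustration of the table-extraction step, the sketch below pulls the rows of the first table out of a page using only the Python standard library. The names TableParser and extract_table_rows are hypothetical, and the sketch assumes the target pages use ordinary, well-formed table/tr/td markup; the real pages may need a different selector or a third-party parser.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collects the cell text of the first <table> found in a page."""

    def __init__(self):
        super().__init__()
        self.rows = []        # finished rows, each a list of cell strings
        self._row = None      # row currently being built
        self._cell = None     # text fragments of the current cell
        self._depth = 0       # > 0 while inside the first table
        self._done = False    # True once the first table has closed

    def handle_starttag(self, tag, attrs):
        if self._done:
            return
        if tag == "table":
            self._depth += 1
        elif self._depth == 1 and tag == "tr":
            self._row = []
        elif self._depth == 1 and tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if self._done or self._depth == 0:
            return
        if tag == "table":
            self._depth -= 1
            if self._depth == 0:
                self._done = True
        elif self._depth == 1 and tag in ("td", "th"):
            if self._cell is not None and self._row is not None:
                # Collapse runs of whitespace inside the cell.
                self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif self._depth == 1 and tag == "tr":
            if self._row:
                self.rows.append(self._row)
            self._row = None
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None and not self._done:
            self._cell.append(data)


def extract_table_rows(html):
    """Return the first table as a list of rows; empty list if no table."""
    parser = TableParser()
    parser.feed(html)
    parser.close()
    return parser.rows
```

A page without a table yields an empty list, which gives the caller a simple way to tell valid results from empty ones.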

The script will need to iterate through a list of 600,000 URLs. Not every URL will return a table of data, so the script should record only those that return valid data.

It is very important not to crash the website being scraped, so the script must wait 2-3 seconds between consecutive requests to the server.
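One possible shape for the throttled loop is sketched below, using only the standard library. The function names fetch_html and crawl, the 30-second timeout, and the choice of a random delay between 2 and 3 seconds are assumptions made for illustration; parse stands for whatever function turns a page into table rows and is supplied by the caller.

```python
import random
import time
import urllib.request


def fetch_html(url, timeout=30):
    """Return the page body as text, or None if the request fails."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            charset = resp.headers.get_content_charset() or "utf-8"
            return resp.read().decode(charset, errors="replace")
    except (OSError, ValueError, LookupError):
        # Network/HTTP errors, malformed URLs, unknown charsets.
        return None


def crawl(urls, parse, fetch=fetch_html,
          min_delay=2.0, max_delay=3.0, sleep=time.sleep):
    """Yield (url, data) only for URLs whose page yields non-empty data.

    Waits a random min_delay..max_delay seconds between requests.
    """
    first = True
    for url in urls:
        if not first:
            sleep(random.uniform(min_delay, max_delay))
        first = False
        html = fetch(url)
        if html is None:
            continue          # request failed: nothing to record
        data = parse(html)
        if data:              # skip pages that return no table
            yield url, data
```

Because crawl is a generator, results can be written out as they arrive instead of being held in memory. At 2-3 seconds per request, the delays alone for 600,000 URLs add up to roughly 14 to 21 days, so a real run would probably also want a way to resume after an interruption.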

The scraping results should be stored in a CSV file.
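A minimal sketch of the CSV output step follows. The function name write_results and the column layout (source URL first, then the table cells) are assumptions; the posting does not specify the output format.

```python
import csv


def write_results(results, out_file):
    """Write one CSV row per scraped table row, prefixed with its source URL.

    results is an iterable of (url, rows) pairs, where rows is a list of
    lists of cell strings. Returns the number of CSV rows written.
    """
    writer = csv.writer(out_file)
    count = 0
    for url, rows in results:
        for row in rows:
            writer.writerow([url, *row])
            count += 1
    return count


if __name__ == "__main__":
    # Illustrative data only; a real run would pass scraped results.
    sample = [("http://example.com/1", [["Widget", "3"]])]
    with open("results.csv", "w", newline="", encoding="utf-8") as f:
        write_results(sample, f)
```

Opening the file with newline="" lets the csv module control line endings itself, which avoids blank lines between rows on Windows.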

Python

Project ID: #6372625

About the project

Remote project. Active Aug 26, 2014.