Distributed web page scraper (preferably on EC2)

Closed Posted Aug 26, 2010 Paid on delivery

As input to your script, I have a list of about 1M URLs. I want each of these URLs fetched and the result inserted into a database. You do NOT need to recursively crawl the URLs. You just need to retrieve them.
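A minimal sketch of the fetch-and-store step described above, assuming Python 3 and a local SQLite database. The posting does not specify the database or schema, so the `pages` table, its columns, and the function names `fetch` and `scrape_into_db` are hypothetical and only for illustration. No links are followed; each listed URL is retrieved once.

```python
import sqlite3
import urllib.request


def fetch(url, timeout=30):
    """Retrieve one URL and return (HTTP status, raw body). No link following."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.status, resp.read()


def scrape_into_db(urls, conn, fetch_fn=fetch):
    """Fetch each URL once and store the result (or the error) in SQLite."""
    # Hypothetical schema: the posting does not say what should be stored.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages ("
        "url TEXT PRIMARY KEY, status INTEGER, body BLOB, error TEXT)"
    )
    for url in urls:
        try:
            status, body = fetch_fn(url)
            row = (url, status, body, None)
        except Exception as exc:  # record the failure instead of aborting the batch
            row = (url, None, None, str(exc))
        conn.execute("INSERT OR REPLACE INTO pages VALUES (?, ?, ?, ?)", row)
    conn.commit()
```

Recording failures in the same table makes it possible to re-run only the URLs that failed. A real deployment would likely write to a shared database rather than a per-machine SQLite file.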

I want a distributed scraper. In particular, I want to give a parameter N and have the script automatically provision N scrapers, for example N separate Amazon EC2 instances or instances on some other cloud service. The N instances should avoid doing the same work.
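One way the N instances could avoid duplicating work is to assign each URL to exactly one worker with a stable hash. This is only a sketch of that idea, not a requirement of the posting; the names `shard_for` and `urls_for_worker` are hypothetical, and it assumes every instance knows its own index (0 to N-1) and sees the same URL list.

```python
import hashlib


def shard_for(url, n_workers):
    """Map a URL to a worker index in [0, n_workers) using a stable hash."""
    # hashlib is used instead of the built-in hash(), which is randomized
    # per process and would give different answers on different machines.
    digest = hashlib.sha1(url.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_workers


def urls_for_worker(urls, worker_index, n_workers):
    """Return the subset of URLs that this worker is responsible for."""
    return [u for u in urls if shard_for(u, n_workers) == worker_index]
```

Because the assignment depends only on the URL and N, no coordination between instances is needed. A shared work queue would be an alternative if the instances run at very different speeds.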

I don't care if you write a wrapper script around Scrapy or another existing web scraper implementation. You can do this if you already know Scrapy or Bixo and want to use it.

The script should really require very little configuration. It should be convenient and one-click if possible. That way, the next time I have a batch of 1M URLs, I can easily run your script.
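As a rough illustration of what "very little configuration" could look like, the entry point might take just the URL file and N on the command line. This is a sketch under assumptions: the argument names, the `build_parser` helper, and the default of 4 workers are all hypothetical and not part of the posting.

```python
import argparse


def build_parser():
    """Command-line interface with only two inputs: the URL file and N."""
    parser = argparse.ArgumentParser(
        description="Launch a distributed scrape of a list of URLs."
    )
    parser.add_argument("url_file", help="text file with one URL per line")
    parser.add_argument(
        "-n", "--workers", type=int, default=4,  # default value is an assumption
        help="number of scraper instances to provision (N)",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print("Would scrape %s using %d workers" % (args.url_file, args.workers))
```

With an interface like this, the next batch would only need a new URL file and, optionally, a different value of N.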

Amazon Web Services, Engineering, Java, Linux, Project Management, Python, Script Install, Shell Script, Software Architecture, Software Testing

Project ID: #3680209

About the project

13 proposals Remote project Active Dec 16, 2010

13 freelancers are bidding an average of $217 for this job

ddemidenko

See private message.

$255 USD in 14 days
(72 reviews)
6.1
johnweavervw

See private message.

$170 USD in 14 days
(55 reviews)
5.3
mlys

See private message.

$254.15 USD in 14 days
(31 reviews)
5.4
happytron

See private message.

$212.5 USD in 14 days
(9 reviews)
4.8
happydotnet

See private message.

$235.45 USD in 14 days
(17 reviews)
4.3
app2technologies

See private message.

$255 USD in 14 days
(16 reviews)
3.9
readyfacts

See private message.

$212.5 USD in 14 days
(32 reviews)
4.2
kwovw

See private message.

$254.15 USD in 14 days
(2 reviews)
3.9
quintonwebz

See private message.

$204 USD in 14 days
(6 reviews)
3.6
napoleonmr

See private message.

$255 USD in 14 days
(2 reviews)
2.8
richmondcd

See private message.

$127.5 USD in 14 days
(2 reviews)
0.7
woolee

See private message.

$170 USD in 14 days
(0 reviews)
0.0
bryano

See private message.

$212.5 USD in 14 days
(0 reviews)
0.0