
EzineArticles Scraper

$30-250 USD

In progress
Posted more than 11 years ago

Paid on delivery
I am looking for a PHP script that scrapes all of EzineArticles and saves each article as a MySQL entry that includes the URL, Title, Category, and Article Text. Your script should find, scrape, and store every single article on EzineArticles (there will be millions of them), so there should be some sort of threading to help speed things up.

I have thousands of private proxies, so I also need to be able to provide a text file with proxies. Some of the proxies will have usernames and passwords and some won't, so you will need to account for both. I would recommend several hundred threads with some sort of proxy switcher in place. A good way to do this without getting IPs banned is to keep a universal list that tracks which proxies are currently in use by a thread and which aren't. Then, every couple of articles, a thread pulls a new IP that isn't currently in use.

If a page fails to load properly (either because EzineArticles rate limited you or because the proxy itself was having issues), the script should try again using a different proxy. If a page fails 10 consecutive times, it should record the failure in the database (everything blank except the URL) and then continue.

Lastly, the script needs to save its progress, so that if it is closed for some reason it can continue where it left off. This can be controlled by data in a MySQL database as well.

MySQL structure:
| URL | Title | Category | Article |

Proxy document structure:
IP:Port:Username:Password
IP:Port

So ":" separates IP and Port (and Username and Password if they exist). Proxies are separated by newlines.

When testing, you do not need to run it all the way to completion (you won't have the proxies, apart from a couple for testing), but when it is done I will need to run it myself and make sure it is working before paying (and make sure that it will find every article).
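The proxy file format and the "universal list" of in-use proxies described above could be sketched roughly as follows. This is only an illustration in Python (the posting prefers PHP); the names `Proxy`, `parse_proxy_line`, `parse_proxy_text`, and `ProxyPool` are hypothetical, and the sketch assumes that usernames never contain ":".

```python
import threading
from collections import deque
from typing import List, NamedTuple, Optional


class Proxy(NamedTuple):
    host: str
    port: int
    username: Optional[str] = None
    password: Optional[str] = None


def parse_proxy_line(line: str) -> Proxy:
    """Parse 'IP:Port' or 'IP:Port:Username:Password'."""
    # Split at most three times so a ':' inside the password is preserved.
    parts = line.strip().split(":", 3)
    if len(parts) == 2:
        return Proxy(parts[0], int(parts[1]))
    if len(parts) == 4:
        return Proxy(parts[0], int(parts[1]), parts[2], parts[3])
    raise ValueError(f"unrecognised proxy line: {line!r}")


def parse_proxy_text(text: str) -> List[Proxy]:
    """Parse a newline-separated proxy list, skipping blank lines."""
    return [parse_proxy_line(ln) for ln in text.splitlines() if ln.strip()]


class ProxyPool:
    """Shared list that tracks which proxies are idle and which are in use."""

    def __init__(self, proxies: List[Proxy]) -> None:
        self._idle = deque(proxies)
        self._in_use = set()
        self._lock = threading.Lock()

    def acquire(self) -> Optional[Proxy]:
        """Hand out a proxy no other thread holds, or None if all are busy."""
        with self._lock:
            if not self._idle:
                return None
            proxy = self._idle.popleft()
            self._in_use.add(proxy)
            return proxy

    def release(self, proxy: Proxy) -> None:
        """Return a proxy to the back of the idle queue."""
        with self._lock:
            if proxy in self._in_use:
                self._in_use.remove(proxy)
                self._idle.append(proxy)
```

A worker thread would call `acquire()`, fetch a couple of articles, then `release()` the proxy and acquire another. Because released proxies go to the back of the queue, each IP rests for as long as possible before being reused.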
The actual scraping/parsing should be relatively easy, as the articles are always in a very well-defined tag. A good way to find every article is to go through each category page and then through every single listing. I will accept applications from people who don't use PHP. Let me know your language and I will decide. However, PHP is preferred. When contacting me, please let me know how you will be scraping the site (what framework).
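The retry rule and the resume-on-restart requirement from the description could look something like the sketch below. It is written in Python for illustration (the posting prefers PHP) and makes several assumptions: SQLite from the standard library stands in for MySQL, `fetch` is a hypothetical callable that returns a `(title, category, article)` tuple or raises on failure, `proxies` is non-empty, and threading is left out to keep the example short.

```python
import sqlite3
from itertools import cycle

MAX_ATTEMPTS = 10  # consecutive failures allowed before giving up on a URL


def open_store(path=":memory:"):
    """Create the | URL | Title | Category | Article | table if needed."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS articles ("
        "url TEXT PRIMARY KEY, title TEXT, category TEXT, article TEXT)"
    )
    return db


def fetch_with_retries(url, proxies, fetch, max_attempts=MAX_ATTEMPTS):
    """Try the URL through successive proxies; None after max_attempts failures."""
    rotation = cycle(proxies)
    for _ in range(max_attempts):
        proxy = next(rotation)  # move on to the next proxy for every attempt
        try:
            return fetch(url, proxy)
        except Exception:
            continue  # rate limited or proxy trouble: retry with the next one
    return None


def scrape(urls, proxies, fetch, db):
    """Store every URL once; rows already present are skipped on restart."""
    for url in urls:
        seen = db.execute("SELECT 1 FROM articles WHERE url = ?", (url,))
        if seen.fetchone():
            continue  # handled in a previous run
        result = fetch_with_retries(url, proxies, fetch)
        # A failed page is saved with everything blank except the URL.
        title, category, article = result if result else ("", "", "")
        db.execute(
            "INSERT INTO articles VALUES (?, ?, ?, ?)",
            (url, title, category, article),
        )
        db.commit()
```

Using the article table itself as the progress record means a restarted run simply skips any URL that already has a row, including the blank rows written for failed pages.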
Project ID: 4014744

About the project

6 proposals
Remote project
Active 11 years ago

Awarded to:
Dear Sir, I've completed an Ezine scraper before. Please check your messages.
$100 USD in 3 days
5.0 (8 reviews)
4.5
6 freelancers are bidding an average of $177 USD for this job
I can help in your project, please check PMB and our ratings/reviews to get idea of our experience. Please let me know if you have any queries.
$250 USD in 7 days
4.8 (220 reviews)
7.7
Hi sir, please check PM, thx Kimi.
$60 USD in 2 days
4.9 (118 reviews)
6.4
Seasoned scraper writer with hundreds of scrapers scraping millions of pages each day. Please check my reviews to know more about my work: http://www.freelancer.com/u/sayno2bugs.html More in PM. Cheers, SayNo2Bugs
$500 USD in 10 days
5.0 (18 reviews)
6.0
Please check in PM.
$55 USD in 2 days
4.5 (9 reviews)
4.8
Hello, I am in your inbox. We must discuss.
$97 USD in 1 day
5.0 (6 reviews)
2.8

About this client

Baltimore, United States
4.6
8
Payment method verified
Member since Mar 16, 2011

Freelancer ® is a registered Trademark of Freelancer Technology Pty Limited (ACN 142 189 759)
Copyright © 2024 Freelancer Technology Pty Limited (ACN 142 189 759)