
Crawling

$100-300 USD

In progress
Posted about 20 years ago

Paid on delivery
Web crawling utility that crawls a specific site, parses the data according to a template, and then inserts the extracted data into a database.

Inputs (with examples given):

URL Root: [login to view URL]
Starting Integer: **10000**
Ending Integer: **10100**
Wait Interval: **2**
Template name: **[login to view URL]**
Destination database: **testdb**
Destination stored procedure: **testrun**

The user is responsible for setting up a stored procedure that receives the data.

In the above example, the program retrieves 101 web pages, as specified by the range, starting with [login to view URL]. It reads the [login to view URL] file, which contains pattern information defining:

(1) field name
(2) type of data (integer, date, or character)
(3) max field length
(4) whether null values are allowed
(5) starting pattern to match
(6) ending pattern

The template might look something like this, although you can define your own "template language":

PROFILETOID type:**integer** nulls:**no** start:**[login to view URL]** end:**">**
ALIASTO type:**char** nulls:**no** start: end:**</a>**
PROFILEFROMID type:**integer** nulls:**no** start:**[login to view URL]** end:**">**

and so on for the fields **ALIASFROM, DATEPOSTED, MSGNUMBER, MAXMESSAGE, MSGBODY**.

The program starts at the beginning of the file and searches for the string **[login to view URL]**, then takes what immediately follows, up to but not including **">**, and extracts it as the **PROFILETOID** field. For the **ALIASTO** field, since there is no start information, the program knows to begin reading that field immediately and end with **</a>**. And so on to the end of the template file.

## Deliverables

(More info that wouldn't fit above)

As the page is being read, the results are validated to make sure that every field is filled with the correct type of data. If nothing is found between the start and end patterns, the field is valid only if NULLs are allowed. If one of the fields is found not valid, an error dialog box pops up and says which URL failed and why.
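The extraction and validation rules described above could be sketched roughly as follows in Java, one of the languages the posting accepts. This is only an illustration under stated assumptions: the class and method names (`TemplateSketch`, `Field`, `ParseFailure`, `extract`, `validate`) are hypothetical, the `profile?id=` start pattern and the sample HTML are made up because the real patterns are hidden in the posting, and date validation is skipped because the posting does not specify a date format.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative sketch; every class, field, and method name here is hypothetical. */
final class TemplateSketch {

    /** One template entry, mirroring the six items the posting lists. */
    static final class Field {
        final String name, type, start, end;
        final int maxLength;
        final boolean nullsAllowed;

        Field(String name, String type, int maxLength, boolean nullsAllowed,
              String start, String end) {
            this.name = name; this.type = type; this.maxLength = maxLength;
            this.nullsAllowed = nullsAllowed; this.start = start; this.end = end;
        }
    }

    /** Unchecked, so the caller decides whether to show a dialog or ignore the error. */
    static final class ParseFailure extends RuntimeException {
        ParseFailure(int pageId, Field f, String reason) {
            super("Failed to parse #" + pageId + " because [" + f.name + "] " + reason);
        }
    }

    /** Walks the page once; each field resumes where the previous field ended. */
    static Map<String, String> extract(String html, List<Field> fields, int pageId) {
        Map<String, String> values = new LinkedHashMap<>();
        int cursor = 0;
        for (Field f : fields) {
            int from = cursor; // an empty start pattern means "read from here"
            if (!f.start.isEmpty()) {
                int at = html.indexOf(f.start, cursor);
                if (at < 0) throw new ParseFailure(pageId, f, "start pattern not found.");
                from = at + f.start.length();
            }
            int to = html.indexOf(f.end, from);
            if (to < 0) throw new ParseFailure(pageId, f, "end pattern not found.");
            String value = html.substring(from, to);
            validate(value, f, pageId);
            values.put(f.name, value.isEmpty() ? null : value);
            cursor = to + f.end.length();
        }
        return values;
    }

    /** Checks nullability, maximum length, and (for integers only) the data type. */
    static void validate(String value, Field f, int pageId) {
        if (value.isEmpty()) {
            if (!f.nullsAllowed) {
                throw new ParseFailure(pageId, f, "is empty but NULLs are not allowed.");
            }
            return;
        }
        if (value.length() > f.maxLength) {
            throw new ParseFailure(pageId, f, "exceeded the maximum length of " + f.maxLength);
        }
        if (f.type.equals("integer") && !value.matches("-?\\d+")) {
            throw new ParseFailure(pageId, f, "is not a valid integer.");
        }
        // "date" is left unchecked: the posting does not say which format to expect.
    }

    public static void main(String[] args) {
        // Hypothetical patterns and page content, for demonstration only.
        List<Field> fields = Arrays.asList(
            new Field("PROFILETOID", "integer", 10, false, "profile?id=", "\">"),
            new Field("ALIASTO", "char", 20, false, "", "</a>"));
        String html = "<a href=\"profile?id=42\">alice</a>";
        System.out.println(extract(html, fields, 10002));
    }
}
```

Throwing on the first invalid field keeps the sketch short; a program that must still pass RAWDATA and an ERROR flag to the stored procedure would catch `ParseFailure` per page and carry on.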
Example error messages:

"Failed to parse #10002 because [ALIAS] exceeded the maximum length of 20"
"Failed to parse #10002 because [ALIAS] start pattern not found."

But even if there is an error, the program continues once the dialog box is OK'd. There should be a checkbox on the dialog offering the user the option to "Ignore errors" so that the program may continue without interruption.

After each page is read and validated, the program calls a stored procedure with all of the template fields, along with RAWDATA, a text field containing the HTML of the retrieved page, and an ERROR variable, which indicates whether the page contained a validation error.

The wait interval of "2" is the number of seconds to wait between page requests (so as not to overload the target site). If set to "0", there is no wait between requests.

Deliverables:

1) Complete and fully functional working program(s) in executable form, as well as complete source code of all work done. Must run on Visual Studio .NET in VB, C#, or Java. **Please configure your code to work with SQL Server using (localhost) as the server, "sa" as the userid, and "crawler" as the password.**

2) Sample database named testdb, containing the testrun stored procedure and 1 table with the following 10 fields: **ERROR, RAWDATA, PROFILETO, ALIASTO, PROFILEFROM, ALIASFROM, DATEPOSTED, MSGNUMBER, MAXMESSAGE, MSGBODY**. When I run the program without alteration, the above table should be filled in.

3) Installation instructions

## Platform

Windows XP, SQL Server 2000, [login to view URL]
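The request range and wait interval from the description (101 pages for integers 10000 through 10100, with a pause between requests) might be sketched in Java as below. Several things here are assumptions rather than part of the posting: the names `CrawlLoopSketch`, `buildUrls`, and `crawl` are hypothetical, the URL is assumed to be formed by appending the integer to the URL root, `http://example.com/msg?id=` is a placeholder for the hidden URL, and the fetch and store steps are passed in as functions so the sketch runs without a network or database.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;

/** Illustrative sketch; class and method names are hypothetical. */
final class CrawlLoopSketch {

    /** Builds one URL per integer in [start, end], inclusive at both ends. */
    static List<String> buildUrls(String urlRoot, int start, int end) {
        List<String> urls = new ArrayList<>();
        for (int id = start; id <= end; id++) {
            urls.add(urlRoot + id); // assumption: the integer is appended to the root
        }
        return urls;
    }

    /**
     * Fetches each URL in order and hands the page to the store callback,
     * sleeping waitSeconds between requests. A value of 0 disables the wait.
     * Returns the number of pages processed.
     */
    static int crawl(List<String> urls, int waitSeconds,
                     Function<String, String> fetch, Consumer<String> store) {
        int processed = 0;
        for (int i = 0; i < urls.size(); i++) {
            store.accept(fetch.apply(urls.get(i)));
            processed++;
            boolean morePages = i < urls.size() - 1;
            if (waitSeconds > 0 && morePages) {
                try {
                    Thread.sleep(waitSeconds * 1000L);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // keep the interrupt flag and stop early
                    break;
                }
            }
        }
        return processed;
    }

    public static void main(String[] args) {
        List<String> urls = buildUrls("http://example.com/msg?id=", 10000, 10100);
        System.out.println(urls.size() + " pages planned");
        // Stub fetch and store callbacks keep the demonstration offline.
        int done = crawl(urls.subList(0, 3), 0,
                url -> "<html>" + url + "</html>", page -> { });
        System.out.println(done + " pages processed");
    }
}
```

In a full program the `store` callback is where the call to the destination stored procedure would go, for example through a JDBC `CallableStatement`.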
Project ID: 3101217

About the project

2 proposals
Remote project
Active 20 years ago

Awarded to:
See private message.
$170 USD in 14 days
5.0 (3 reviews)
3.5
2 freelancers are bidding an average of $213 USD for this job
See private message.
$255 USD in 14 days
5.0 (5 reviews)
5.9

About this client

United States
5.0
45
Member since Nov 4, 2003
