Closed

News Aggregator -- 2

The aim of this project is to create a news aggregator in Python.

Specifications:

1. The engine should be built on top of Scrapy and must be well structured, scalable, and well optimized.

2. The engine should crawl websites supplied from a JSON file or a database (the source should be configurable, because we haven’t decided yet).

3. Spiders should be built so that they can scale easily, for example by having a baseline spider that other spiders inherit from.

4. When a spider hits a website for the first time, it should try to find out whether the website has RSS feeds or a sitemap, so that it can track and scrape the latest news/content from them. If there is no RSS feed or sitemap, the website should be handed to the fallback spider, which will find all the latest news.

5. Machine learning models/AI techniques should be used to identify the page structure and extract the content automatically.

6. The content should be cleaned using the best techniques available, and all the important information should be retrieved from each article (title, image/images, videos, date, author/authors, content, etc.).

7. The output should be flexible, with options to save the articles to local files, to remote storage such as Amazon or Microsoft Azure Blob Storage, or to a database.

8. Every spider should be monitored, with information gathered from it (e.g. statistics on how many articles it has scraped from each website). If a spider fails to crawl a specific website, it should raise a flag and report the reason it failed.

9. Please do not hesitate to suggest ideas or better approaches, especially ones using AI and machine learning.
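For specification 2 (sites from a JSON file or a database, source not yet decided), one way to keep the choice open is to have spiders depend on a small "site source" interface and pick the backend from configuration. The sketch below assumes hypothetical names (`SiteConfig`, `JsonSiteSource`, `SqliteSiteSource`, `make_source`) and a hypothetical `sites(name, start_url)` table; SQLite stands in for whatever database is eventually chosen.

```python
import json
import sqlite3
from dataclasses import dataclass
from typing import Iterator, Protocol, Union


@dataclass(frozen=True)
class SiteConfig:
    """One website to crawl (field names are illustrative, not from the brief)."""
    name: str
    start_url: str


class SiteSource(Protocol):
    """Anything that yields SiteConfig objects; spiders depend only on this."""

    def load(self) -> Iterator[SiteConfig]: ...


class JsonSiteSource:
    """Reads a JSON array like [{"name": "...", "start_url": "..."}]."""

    def __init__(self, path: str) -> None:
        self.path = path

    def load(self) -> Iterator[SiteConfig]:
        with open(self.path, encoding="utf-8") as fh:
            rows = json.load(fh)
        for row in rows:
            yield SiteConfig(name=row["name"], start_url=row["start_url"])


class SqliteSiteSource:
    """Reads a hypothetical `sites(name, start_url)` table."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn

    def load(self) -> Iterator[SiteConfig]:
        for name, start_url in self.conn.execute("SELECT name, start_url FROM sites"):
            yield SiteConfig(name=name, start_url=start_url)


def make_source(kind: str, target: Union[str, sqlite3.Connection]) -> SiteSource:
    """Choose the backend from configuration instead of hard-coding it."""
    if kind == "json":
        return JsonSiteSource(target)
    if kind == "sqlite":
        return SqliteSiteSource(target)
    raise ValueError(f"unknown site source: {kind!r}")
```

Switching from a JSON file to a database then means changing one configuration value, not the spiders.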
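Specification 3 asks for a baseline spider that others inherit from. A common way to do that is the template-method pattern: the base class owns the shared flow (extract, clean, validate, count) and site spiders override only the extraction hook. In the real engine the base class would subclass `scrapy.Spider`; this sketch is deliberately framework-free so it runs on its own, and `BaseNewsSpider`, `ExampleSiteSpider`, and the `required_fields` rule are hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterator


class BaseNewsSpider(ABC):
    """Shared crawl flow; site spiders override only the hooks they need.

    In the real engine this would subclass scrapy.Spider; it is framework-free
    here so the inheritance pattern can be run on its own.
    """

    name = "base"
    required_fields = ("title", "url")

    def __init__(self) -> None:
        self.scraped = 0
        self.dropped = 0

    @abstractmethod
    def extract(self, url: str, html: str) -> Dict[str, str]:
        """Site-specific hook: turn one page into a raw article dict."""

    def clean(self, article: Dict[str, str]) -> Dict[str, str]:
        """Default normalisation shared by all spiders: strip whitespace."""
        return {key: value.strip() for key, value in article.items()}

    def process(self, url: str, html: str) -> Iterator[Dict[str, str]]:
        """Template method: extract -> clean -> validate -> emit."""
        article = self.clean(self.extract(url, html))
        article.setdefault("url", url)
        if all(article.get(field) for field in self.required_fields):
            self.scraped += 1
            yield article
        else:
            self.dropped += 1


class ExampleSiteSpider(BaseNewsSpider):
    """Hypothetical site spider: only the extraction step is site-specific."""

    name = "example"

    def extract(self, url: str, html: str) -> Dict[str, str]:
        # Crude <h1> lookup, for illustration only; a real spider would use selectors.
        start, end = html.find("<h1>"), html.find("</h1>")
        title = html[start + len("<h1>"):end] if start != -1 and end > start else ""
        return {"title": title}
```

Adding a new site then means writing one small subclass; counting, validation, and cleaning stay in one place.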
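Specification 4 (look for RSS feeds or a sitemap first, otherwise use the fallback spider) can be reduced to a strategy decision made on the first visit. This sketch, with hypothetical names `choose_strategy` and `sitemaps_from_robots`, looks for `<link rel="alternate">` tags of RSS/Atom type in the homepage and for `Sitemap:` lines in robots.txt. Fetching the pages is left out. Scrapy's own `SitemapSpider` can also take a robots.txt URL in `sitemap_urls` and follow the sitemaps listed there.

```python
from html.parser import HTMLParser
from typing import List, Tuple
from urllib.parse import urljoin

FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkParser(HTMLParser):
    """Collects <link rel="alternate" type="application/rss+xml" href="..."> tags."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.feeds: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        a = {k: (v or "") for k, v in attrs}
        rels = a.get("rel", "").lower().split()
        if "alternate" in rels and a.get("type", "").lower() in FEED_TYPES and a.get("href"):
            # Resolve relative hrefs such as "/feed.xml" against the page URL.
            self.feeds.append(urljoin(self.base_url, a["href"]))


def sitemaps_from_robots(robots_txt: str) -> List[str]:
    """robots.txt may list sitemaps as 'Sitemap: <url>' lines."""
    urls = []
    for line in robots_txt.splitlines():
        key, _, value = line.partition(":")
        if key.strip().lower() == "sitemap" and value.strip():
            urls.append(value.strip())
    return urls


def choose_strategy(page_url: str, html: str, robots_txt: str = "") -> Tuple[str, List[str]]:
    """Prefer RSS/Atom feeds, then sitemaps, else hand off to the fallback spider."""
    parser = FeedLinkParser(page_url)
    parser.feed(html)
    if parser.feeds:
        return "feed", parser.feeds
    sitemaps = sitemaps_from_robots(robots_txt)
    if sitemaps:
        return "sitemap", sitemaps
    return "fallback", [page_url]
```

Storing the chosen strategy per site means later crawls can go straight to the feed or sitemap without repeating discovery.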
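For specification 5, note that the following is a rule-based heuristic, not a machine-learning model. It scores each container element by the paragraph text it holds, discounted by link density, so that link-heavy navigation blocks rank below article bodies. Per-block features like these (text length, link density) are the kind of input a trained classifier could use instead of a hand-set formula. `ContentBlockScorer`, the container tag set, and the `min_chars` threshold are illustrative assumptions.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple

CONTAINERS = {"div", "article", "section", "main", "td"}


class ContentBlockScorer(HTMLParser):
    """Heuristic main-content finder: score = paragraph text * (1 - link density).

    Each <p> credits its score to the nearest enclosing container element.
    """

    def __init__(self, min_chars: int = 25) -> None:
        super().__init__()
        self.min_chars = min_chars
        self.containers: List[Tuple[str, int]] = []  # open (tag, index) pairs
        self.labels: List[str] = []                  # "tag#id" per container seen
        self.scores: List[float] = []
        self.in_p = False
        self.p_text = 0
        self.p_link_text = 0
        self.link_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in CONTAINERS:
            self.labels.append(f"{tag}#{dict(attrs).get('id') or ''}")
            self.scores.append(0.0)
            self.containers.append((tag, len(self.scores) - 1))
        elif tag == "p":
            self.in_p, self.p_text, self.p_link_text = True, 0, 0
        elif tag == "a":
            self.link_depth += 1

    def handle_endtag(self, tag):
        if tag in CONTAINERS:
            # Pop back to the most recent open container with this tag name.
            for i in range(len(self.containers) - 1, -1, -1):
                if self.containers[i][0] == tag:
                    del self.containers[i:]
                    break
        elif tag == "p" and self.in_p:
            self.in_p = False
            if self.p_text >= self.min_chars and self.containers:
                link_density = self.p_link_text / self.p_text
                self.scores[self.containers[-1][1]] += self.p_text * (1 - link_density)
        elif tag == "a" and self.link_depth:
            self.link_depth -= 1

    def handle_data(self, data):
        if self.in_p:
            n = len(data.strip())
            self.p_text += n
            if self.link_depth:
                self.p_link_text += n

    def best(self) -> Optional[str]:
        """Label of the highest-scoring container, or None if nothing scored."""
        if not self.scores or max(self.scores) <= 0:
            return None
        return self.labels[self.scores.index(max(self.scores))]
```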
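For specification 6, much of the article metadata (title, images, videos, date, authors) is often already published in standard `<meta>` tags, notably Open Graph properties such as `og:title`, `og:image`, `og:video`, `article:published_time`, and `article:author`, plus the plain HTML `name="author"` tag. The sketch below reads only those tags and the `<title>` element. The output field names are assumptions, and the article body itself would come from a content extractor such as the block scorer.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional


class ArticleMetaParser(HTMLParser):
    """Collects <meta property/name=... content=...> values and the <title> text."""

    def __init__(self) -> None:
        super().__init__()
        self.meta: Dict[str, List[str]] = {}
        self.html_title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        a = {k: (v or "") for k, v in attrs}
        if tag == "meta":
            key = (a.get("property") or a.get("name") or "").lower()
            if key and a.get("content", "").strip():
                self.meta.setdefault(key, []).append(a["content"].strip())
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.html_title += data


def extract_metadata(html: str) -> Dict[str, object]:
    """Map common Open Graph / HTML meta tags onto the fields the brief lists."""
    parser = ArticleMetaParser()
    parser.feed(html)
    meta = parser.meta

    def first(*keys: str) -> Optional[str]:
        return next((meta[k][0] for k in keys if k in meta), None)

    return {
        # Prefer social-card titles, fall back to the <title> element.
        "title": first("og:title", "twitter:title") or parser.html_title.strip() or None,
        "images": meta.get("og:image", []),
        "videos": meta.get("og:video", []),
        "date": first("article:published_time"),
        "authors": meta.get("article:author", []) + meta.get("author", []),
    }
```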
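Specification 7 (files, remote blob storage, or a database) suggests a pluggable output interface. In Scrapy this would typically sit behind an item pipeline's `process_item` method. The sketch below shows a hypothetical `ArticleSink` interface with a local JSON Lines sink, an in-memory sink, and a fan-out sink for writing to several targets at once. Azure or Amazon storage and database sinks would implement the same two methods using the provider's SDK, which is not shown here.

```python
import json
from abc import ABC, abstractmethod
from typing import Dict, List


class ArticleSink(ABC):
    """Where finished articles go; swap implementations via configuration."""

    @abstractmethod
    def write(self, article: Dict) -> None: ...

    def close(self) -> None:
        """Release resources; a no-op by default."""


class JsonLinesFileSink(ArticleSink):
    """Appends one JSON object per line to a local file."""

    def __init__(self, path: str) -> None:
        self._fh = open(path, "a", encoding="utf-8")

    def write(self, article: Dict) -> None:
        self._fh.write(json.dumps(article, ensure_ascii=False) + "\n")

    def close(self) -> None:
        self._fh.close()


class MemorySink(ArticleSink):
    """Keeps articles in a list; useful for tests and debugging."""

    def __init__(self) -> None:
        self.articles: List[Dict] = []

    def write(self, article: Dict) -> None:
        self.articles.append(article)


class FanOutSink(ArticleSink):
    """Sends every article to several sinks, e.g. a local file and a database."""

    def __init__(self, sinks: List[ArticleSink]) -> None:
        self.sinks = sinks

    def write(self, article: Dict) -> None:
        for sink in self.sinks:
            sink.write(article)

    def close(self) -> None:
        for sink in self.sinks:
            sink.close()


# Hypothetical registry: blob-storage or database sinks would be added here.
SINKS = {"jsonl": JsonLinesFileSink, "memory": MemorySink}


def build_sink(kind: str, **options) -> ArticleSink:
    return SINKS[kind](**options)
```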
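For specification 8, per-site monitoring can be a small collector that records article counts and failure reasons and then reports which sites need attention. In a Scrapy project these methods could be called from signal handlers (for example `item_scraped`, `spider_error`, `spider_closed`) or fed from Scrapy's stats collector. The `SpiderMonitor` name and the "too few articles" threshold are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SiteReport:
    articles: int = 0
    errors: List[str] = field(default_factory=list)


class SpiderMonitor:
    """Per-site counters and failure reasons, plus a list of flagged sites."""

    def __init__(self, min_articles: int = 1) -> None:
        self.min_articles = min_articles
        self.sites: Dict[str, SiteReport] = {}

    def crawl_started(self, site: str) -> None:
        # Registering up front means silent sites (zero articles) still get flagged.
        self.sites.setdefault(site, SiteReport())

    def article_scraped(self, site: str) -> None:
        self.sites.setdefault(site, SiteReport()).articles += 1

    def crawl_failed(self, site: str, reason: str) -> None:
        self.sites.setdefault(site, SiteReport()).errors.append(reason)

    def flags(self) -> Dict[str, str]:
        """Return {site: reason} for sites that failed or produced too few articles."""
        flagged = {}
        for site, report in self.sites.items():
            if report.errors:
                flagged[site] = "; ".join(report.errors)
            elif report.articles < self.min_articles:
                flagged[site] = f"only {report.articles} article(s) scraped"
        return flagged
```

The flagged dictionary could then be sent to whatever alerting channel is used (log, email, dashboard).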

Requirements:

- The programming language that should be used in this project is Python 3

- The project should be well structured, with clean code, and built with scaling in mind

- Modular/component-based

- Speed and accuracy are of utmost importance

- The service should be optimized to run on low-performance servers and still achieve good results

- External dependencies should be kept to a minimum. If external dependencies can't be avoided, please list them in a text file named [login to view URL]

Skills: Python, Software Architecture, Web Scraping, Scrapy, Machine Learning (ML)

Employer information:
(1 review) Glostrup, Denmark

Project ID: #29367861

10 freelancers are bidding an average of $370 for this job

Venkat2011sri

Hello, this is Sree, I'm a programmer. I have worked for many clients on this type of platform. I write programs that can read data from websites or files and produce exactly the output you want. I will write pro…

$500 USD in 4 days
(138 reviews)
7.2
umg536

Hi there, I'm bidding on your project "News Aggregator -- 2". I am a data scientist and, being an expert in machine learning and artificial intelligence, I can do this project for you. Please leave a message in my chat so…

$450 USD in 5 days
(30 reviews)
6.9
(74 reviews)
6.2
zivkovicdevelop1

Hello. Which site do you want to scrape? I am a web scraping expert and have finished many scraping jobs in Python. I have many ideas for scraping the website. I am sure I can scrape the data perfectly as you wish. Hope y…

$200 USD in 3 days
(23 reviews)
5.4
kimhuo99

Hello, I am interested in your job. I understand all the requirements and would like to discuss the machine learning part. Please drop me a message. Thanks

$500 USD in 7 days
(17 reviews)
4.8
Demenntor

Dear Employer, I have seen your job description: News Aggregator. I have good experience in Python, machine learning, and Scrapy, and have done similar projects. Kindly message me so that we can discuss more about the wo…

$300 USD in 5 days
(26 reviews)
5.0
MohammedJARROU

Hello, I'm a data scientist and Python developer with extensive experience and many web scraping, machine learning, and deep learning projects in Python. I have several certificates in the same field (you can cons…

$275 USD in 5 days
(8 reviews)
3.8
gobyweb2

Hello, I have read your description and I can: 1. The engine should be built on top of Scrapy and needs to be well structured, scalable and well optimized 2. The engine should crawl websites that will be provided fr…

$400 USD in 7 days
(3 reviews)
3.9
yagodinsashuta

Hello, I have read the job description carefully and understood your requirements. I have worked on several web scraping projects similar to yours in the past few months. So far, I have completed scraping projects using Scrap…

$450 USD in 7 days
(1 review)
2.5
saubhagyamweb

***News Aggregator -- 2*** Hello, I hope you are doing great! I have gone through your project description and can say I clearly understand the whole statement of work. I am the best match for this requirement as…

$270 USD in 7 days
(1 review)
2.0