News Aggregator -- 2
$100-450 USD
Paid on delivery
The aim of this project is to create a news aggregator in Python.
Specifications:
1. The engine should be built on top of Scrapy and needs to be well structured, scalable and well optimized
2. The engine should crawl websites that will be provided from a JSON file or database (should be flexible because we haven’t decided yet)
3. Spiders should be built so that they scale up easily, for example as a baseline spider that other spiders can inherit from
4. When a spider hits a website for the first time, it should check whether the website has RSS feeds or a sitemap from which to track and scrape the latest news/content. If there is no RSS feed or sitemap, it should hand over to the fallback spider, which will find all the latest news
5. A machine learning model or AI techniques should be used to identify the page structure and extract the content automatically
6. The content should be cleaned with the best techniques possible, and all the important information should be retrieved from each article (title, image/images, videos, date, author/authors, content etc.)
7. The output should be flexible, with options to save the articles in files (locally or in remote storage like Amazon or Microsoft Azure blob storage) or in a database.
8. Every spider should be monitored by gathering information from it, such as statistics on how many articles it has scraped from each website. If a spider fails to crawl a specific website, it should raise a flag and report the reason why it failed.
9. Please do not hesitate to suggest ideas or better ways of doing it, especially using AI and Machine learning.
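As an illustration of specification 4, here is a minimal sketch of the first-visit decision, using only the Python standard library. It assumes the homepage HTML and the robots.txt body have already been downloaded (for example by a Scrapy request). The names find_feeds, find_sitemaps and choose_strategy are hypothetical, and reading sitemap locations from "Sitemap:" lines in robots.txt is one possible discovery method, not a requirement stated above.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# MIME types that commonly advertise a feed in <link rel="alternate"> tags.
FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkParser(HTMLParser):
    """Collects hrefs of <link rel="alternate"> tags that point to RSS/Atom feeds."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        mime = (attr.get("type") or "").lower()
        if "alternate" in rel and mime in FEED_TYPES and attr.get("href"):
            # Resolve relative hrefs against the page URL.
            self.feeds.append(urljoin(self.base_url, attr["href"]))


def find_feeds(html, base_url):
    """Return absolute feed URLs advertised in the page's HTML."""
    parser = FeedLinkParser(base_url)
    parser.feed(html)
    return parser.feeds


def find_sitemaps(robots_txt):
    """Return the URLs listed on 'Sitemap:' lines of a robots.txt body."""
    sitemaps = []
    for line in robots_txt.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() == "sitemap" and value.strip():
            sitemaps.append(value.strip())
    return sitemaps


def choose_strategy(html, robots_txt, base_url):
    """Pick 'rss', 'sitemap' or 'fallback', in that order of preference."""
    feeds = find_feeds(html, base_url)
    if feeds:
        return "rss", feeds
    sitemaps = find_sitemaps(robots_txt)
    if sitemaps:
        return "sitemap", sitemaps
    # Nothing advertised: the fallback spider starts from the site itself.
    return "fallback", [base_url]
```

The returned strategy name could then be used by a baseline spider to decide which subclass or parsing routine handles the site. Preferring feeds over sitemaps is an assumption of this sketch.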
Requirements:
- The programming language that should be used in this project is Python 3
- The project should be well structured, with clean code, and should be built with the scaling strategy in mind
- Modular/component based
- Speed and accuracy are of utmost importance
- The service should be optimized to run on low-performance servers and still achieve good results
- External dependencies should be kept to a minimum. If you can't avoid external dependencies, please list them in a text file named [login to view URL]
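One way the flexible output of specification 7 could fit the modular, low-dependency requirements is a small storage interface with interchangeable backends, selected from configuration. The sketch below is an assumption-laden illustration using only the standard library: ArticleStore, JsonLinesStore, MemoryStore, STORES and make_store are hypothetical names, and remote backends (blob storage, database) are left out because they would need external packages.

```python
import json
from abc import ABC, abstractmethod
from pathlib import Path


class ArticleStore(ABC):
    """Interface every output backend implements."""

    @abstractmethod
    def save(self, article: dict) -> None:
        ...


class JsonLinesStore(ArticleStore):
    """Appends each article as one JSON object per line to a local file."""

    def __init__(self, path):
        self.path = Path(path)
        self.path.parent.mkdir(parents=True, exist_ok=True)

    def save(self, article):
        with self.path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(article, ensure_ascii=False) + "\n")


class MemoryStore(ArticleStore):
    """Keeps articles in a list; handy for tests and dry runs."""

    def __init__(self):
        self.articles = []

    def save(self, article):
        self.articles.append(article)


# Registry of available backends; a remote or database backend would be
# added here as another ArticleStore subclass.
STORES = {"jsonl": JsonLinesStore, "memory": MemoryStore}


def make_store(config):
    """Build a store from a config mapping such as {"type": "jsonl", "path": "out.jsonl"}."""
    options = dict(config)
    kind = options.pop("type")
    try:
        factory = STORES[kind]
    except KeyError:
        raise ValueError(f"unknown store type: {kind!r}") from None
    return factory(**options)
```

A Scrapy item pipeline could call make_store once when the spider opens and then call save for each article, so switching the output target would only mean changing the configuration.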
Project ID: #29367861
About the project
8 freelancers are bidding an average of $343 for this job
Hello, this is sree, I'm a programmer. I have worked for many clients on this type of platform. I write programs that can read the data from websites or files and produce the exact output you want. I will write pro...
Hello. Which site do you want to scrape? I am a Web Scraping Expert and have finished many scraping jobs in Python. I have many ideas to scrape the website. I am sure to scrape the data perfectly as you wish. Hope y...
Hello, I'm a Data Scientist and Python Developer with extensive experience and a lot of Web scraping, Machine learning, Deep learning projects with Python. I have several certificates in the same field (you can cons...