Scrape sections of text from webpages and count words (ideally, in R)
€30-250 EUR
Paid on delivery
For a research project, I need to scrape text from a list of 359 webpages (the links are in the attached Excel file). The text on each webpage is divided into sections. For each section, I need the total number of words and the number of occurrences of each word in my dictionary. Ideally, you would also send me each section as a separate text file.
Each webpage has a table of contents. For example, the first webpage should be divided into the text before the table of contents, a section with the header "Prospectus Summary", a section with the header "Summary Financial Data", a section with the header "Risk Factors", and so forth. There are usually around 20 sections, and the text on each webpage is between 50 and 180 pages long.
The problem I ran into when trying to scrape the text by section is that the webpages have slightly different structures. Most webpages have section headers in capital letters with the same names, but some have different headers and a slightly different structure.
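A minimal sketch of the section split and per-section word count described above, written in Python purely for illustration (the posting prefers R, and the same logic should port over). It assumes the page has already been reduced to plain text, that body headers sit on their own line in capital letters, and that table-of-contents entries are not in capitals, so they stay in the leading block. The names `split_sections`, `count_words`, `PREAMBLE` and the sample text are hypothetical.

```python
import re

def split_sections(text, headers):
    """Split plain text at standalone lines that equal a known header in capitals."""
    wanted = {h.upper() for h in headers}
    sections = {"PREAMBLE": []}  # everything before the first body header
    current = "PREAMBLE"
    for line in text.splitlines():
        stripped = line.strip()
        if stripped in wanted:
            # A capitalised header line starts a new section.
            current = stripped
            sections.setdefault(current, [])
            continue
        sections[current].append(line)
    return {name: "\n".join(lines).strip() for name, lines in sections.items()}

def count_words(section_text):
    """Count runs of letters (and apostrophes) as words."""
    return len(re.findall(r"[A-Za-z']+", section_text))

# Invented sample standing in for one scraped page.
sample = (
    "Acme Corp Prospectus\n"
    "Table of Contents\n"
    "Prospectus Summary\n"
    "Risk Factors\n"
    "PROSPECTUS SUMMARY\n"
    "We make widgets.\n"
    "RISK FACTORS\n"
    "Demand may decline. Results could vary.\n"
)
sections = split_sections(sample, ["Prospectus Summary", "Risk Factors"])
counts = {name: count_words(body) for name, body in sections.items()}
print(counts)
```

Pages whose headers differ from the expected list would fall through into the preceding section under this scheme, so the header list would likely need to be built per page (for instance from that page's own table of contents) rather than fixed once.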
I am also attaching a list of uncertain words. For each word and each section in each prospectus, I need a word count (number of occurrences). The count should be case-insensitive, so that occurrences in upper-case and lower-case letters are both included.
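The case-insensitive dictionary count could look roughly like the sketch below, again in Python for illustration only. It assumes whole-word matching (so "may" does not match inside "maybe") and single-word dictionary entries. The function name `dictionary_counts` and the three sample words are hypothetical and do not come from the attached list.

```python
import re
from collections import Counter

def dictionary_counts(section_text, dictionary):
    """Count whole-word, case-insensitive occurrences of each dictionary word."""
    # Lower-case the text once, then tokenise into runs of letters/apostrophes.
    tokens = re.findall(r"[a-z']+", section_text.lower())
    freq = Counter(tokens)
    # Counter returns 0 for words that never occur.
    return {word: freq[word.lower()] for word in dictionary}

# Invented examples; the real list is the attached file of uncertain words.
uncertain_words = ["may", "could", "uncertain"]
section = "Demand MAY decline. Results could vary, and costs may rise."
result = dictionary_counts(section, uncertain_words)
print(result)
```

Running this once per section and per webpage would give the word-by-section table the posting asks for.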
I am attaching some R code that I used to scrape the entire webpage (not divided into sections) and count the words, in case it is helpful. It would be great to have your code in R so that I could continue to use it. I need the output as soon as possible (by Monday).
I would very much appreciate your help and look forward to hearing from you!
Project ID: #18742555
About the project
12 freelancers are bidding an average of €124 for this job
Hello! How are you? I have seen the project "Scrape sections of text from webpages and count words (ideally, in R)". As you can see from my profile in these fields (R Programming Language, Web Scraping), I have been ...
Hello, I have gone through your job posting and am very interested in working with you. I am an expert in this field and have already completed several projects like this. For evidence, you can see my profile. ...
Hello, the Full Stack Expert development team is ready to serve you. We work at an hourly rate of 35 USD/h. Please check my profile and message me for more details. Thanks.
A data scientist with experience in Python, R programming, R Shiny, RStudio and anything related to data science and Python. Master in Engineering, Electrical and Electronic Engineer, who is dynamic, reliable, resou...
Hi, I have a lot of experience in scraping data from websites using Python, Node.js, JavaScript and PHP. I can save the scraped data in xlsx, csv or sql files, or in other databases, according to the user's requirements. Also I can ...
Hello Sir, we have gone through the details you have provided and would be pleased to work on this with you to deliver the results you expect. We are sure you will not be disappointed if you giv...
Hello. I am interested in your project. I have extensive experience in web scraping. Please look at my portfolio and reviews. I am new to freelancing, so I have few reviews. But I think my skills, honesty and integr...
Hello, I have already gone through the details you provided. I think the project is well suited to me, and I expect to earn excellent feedback from you. I am available from now on. I assure you I will provide you a better quality w...
Hi there. I have more than 2.5 years of experience in web scraping and work as a web/app scraper at a company, so I will work on this with ASP.NET C# and provide you with quality data. Thanks.