Hello,
I need a PHP script that does the following:
step 0: you get a file with a list of URLs (hundreds or thousands); they come in all sorts of formats (subdomains, https, many different SLDs/TLDs).
step 1: you extract the domain names from the URLs and generate a sorted list of unique domains; this is not as simple as it sounds, as the function doing it must be able to tokenize any URL format as well as any form of TLD (.[login to view URL], .fr, .[login to view URL], ... for example).
step 2: clean the list by removing certain domains, such as free blog hosts or .gov sites.
step 3: scrape [login to view URL] to get one piece of data about some of the domains.
step 4: scrape [login to view URL] to get some data for a short list of domains (without getting banned for excessive usage).
step 5: scrape two pieces of data from the [login to view URL] page for each domain in the list.
step 6: sort the list and output it as a flat file.
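To make steps 1, 2 and 6 concrete, here is a minimal sketch of the extraction, cleaning and sorting. It assumes a small hard-coded list of multi-part public suffixes and a few illustrative blocklist patterns; a real implementation would load the full Public Suffix List instead, and the blocklist would come from your own criteria.

```php
<?php
// Illustrative multi-part suffixes only; a real version would use the
// complete Public Suffix List so that any form of TLD is handled.
const MULTI_PART_SUFFIXES = ['co.uk', 'com.au', 'gouv.fr'];

// Step 2: illustrative patterns for domains to drop (free blogs, .gov).
const BLOCKED_PATTERNS = ['/\.gov$/', '/blogspot\.com$/', '/wordpress\.com$/'];

function extractRegistrableDomain(string $url): ?string {
    // parse_url() only finds the host when a scheme is present, so add one.
    if (!preg_match('#^[a-z][a-z0-9+.-]*://#i', $url)) {
        $url = 'http://' . $url;
    }
    $host = parse_url($url, PHP_URL_HOST);
    if ($host === null || $host === false) {
        return null;
    }
    $host = strtolower(rtrim($host, '.'));
    $labels = explode('.', $host);
    $n = count($labels);
    if ($n < 2) {
        return null;
    }
    // If the last two labels form a known multi-part suffix (e.g. co.uk),
    // the registrable domain is three labels long; otherwise two.
    $lastTwo = implode('.', array_slice($labels, -2));
    $keep = in_array($lastTwo, MULTI_PART_SUFFIXES, true) ? 3 : 2;
    if ($n < $keep) {
        return null;
    }
    return implode('.', array_slice($labels, -$keep));
}

function buildDomainList(array $urls): array {
    $domains = [];
    foreach ($urls as $url) {
        $d = extractRegistrableDomain(trim($url));
        if ($d === null) {
            continue;
        }
        foreach (BLOCKED_PATTERNS as $pattern) {
            if (preg_match($pattern, $d)) {
                continue 2; // step 2: skip cleaned-out domains
            }
        }
        $domains[$d] = true; // array keys give uniqueness for free
    }
    $list = array_keys($domains);
    sort($list); // step 6: sorted, ready to write out as a flat file
    return $list;
}
```

Writing the result is then just `file_put_contents('domains.txt', implode("\n", $list) . "\n")`.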
Or you can propose your own method.
Potential for long-term work with the right programmer(s).
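For steps 3 to 5, the core problem is fetching pages without getting banned for excessive usage. Below is a hedged sketch of a throttled fetcher; the URL template and the regex are placeholders (the real sites are redacted above), and it assumes the target pages can be fetched with a plain HTTP GET.

```php
<?php
// Fetch a URL, sleeping first so that consecutive requests are at least
// $delaySeconds apart (a crude but effective rate limit).
function fetchWithThrottle(string $url, float $delaySeconds = 2.0): ?string {
    static $lastRequest = 0.0;
    $elapsed = microtime(true) - $lastRequest;
    if ($elapsed < $delaySeconds) {
        usleep((int) round(($delaySeconds - $elapsed) * 1e6));
    }
    $lastRequest = microtime(true);
    $context = stream_context_create([
        'http' => [
            'timeout'    => 10,
            'user_agent' => 'Mozilla/5.0 (compatible; DomainChecker/1.0)',
        ],
    ]);
    $html = @file_get_contents($url, false, $context);
    return $html === false ? null : $html;
}

// Scrape one value per domain. $pageUrlTemplate is a sprintf() template,
// e.g. 'https://lookup.example.invalid/report?domain=%s' (placeholder -
// substitute the real site), and $pattern is a regex with one capture group.
function scrapeDomains(array $domains, string $pageUrlTemplate, string $pattern): array {
    $results = [];
    foreach ($domains as $domain) {
        $html = fetchWithThrottle(sprintf($pageUrlTemplate, urlencode($domain)));
        if ($html !== null && preg_match($pattern, $html, $m)) {
            $results[$domain] = $m[1];
        }
    }
    return $results;
}
```

For step 5, which needs two values per page, you would fetch once per domain and run two regexes (or a DOM parse) on the same HTML rather than requesting the page twice.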
Hello. I'm the right programmer! I write custom programs that can do anything you need. PHP scripts run from fixed IPs and can therefore be blocked; a standalone program can't be blocked that way. So I will create a program that checks each domain's validity and then visits the websites to get the info we need. Please check my portfolio and website. Thanks