COMPRANET
There are 33 data sets, and the number of records varies from set to set.
The links are:
For the Whole Supplier Database:
---------------------------------------------
[login to view URL]
For the Supplier Database split into sets, where each set corresponds to a specific State (an alternative approach). Record counts may vary:
-----------------------------------------------------------------
For Data Set 01: 1,721 records
[login to view URL]
...
For Data Set 02: 2,386 records
[login to view URL]
...
For Data Set 03: 974 records
[login to view URL]
...
[Please note that each link identifies its data set in the "entidad=##" parameter]
and data sets 04, 05, 06 ... all the way to 33 ...
...
For Data Set 33: 3,623 records
[login to view URL]
...
Please note that data set 09 has 35,650 records, so the number varies per data set. The total number of records is 134,518.
....
So, the idea is to extract the data into the following columns:
+ Data Set (from 01 to 33)
+ Company Name
+ Tax Number
+ Address
+ Address split into 5 columns [Address1, Address2, City, State, ZipCode]
+ Phone
+ Fax
+ Email
+ Business Activity
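As a rough sketch of the output layout, the column list above could be written out as a CSV header like this. The function name `to_csv` and the sample row are hypothetical, used only for illustration; the column names are taken from the list above.

```python
import csv
import io

# Column names follow the list above; the single "Address" column holds
# the original, unsplit address, followed by its five split parts.
COLUMNS = [
    "Data Set", "Company Name", "Tax Number", "Address",
    "Address1", "Address2", "City", "State", "ZipCode",
    "Phone", "Fax", "Email", "Business Activity",
]

def to_csv(rows):
    """Render a list of dict rows as CSV text; missing fields stay blank."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

# Hypothetical sample row, not taken from the real database.
sample = [{"Data Set": "01", "Company Name": "EJEMPLO SA DE CV"}]
print(to_csv(sample))
```

Leaving missing fields blank (rather than failing) seems the safer choice here, since not every supplier record is likely to have a fax or email.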
I hope this explains the scope and kind of data-extraction work.
If you have any questions, let me know. Also, if you have a better idea of how to approach this, your input is welcome.
Please note:
* It is now a DB of more than 134,000 records
---------------------------------------------
So, I wonder whether MS Excel would work well for it, that is, all records on one spreadsheet. If you have to divide it, please make each file as long as possible (50k+ records), but keep all records for a given STATE in a single file.
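For reference, one .xlsx worksheet holds up to 1,048,576 rows, so about 134,000 records should fit on a single sheet; the legacy .xls format stops at 65,536 rows. If the data does have to be divided, a minimal sketch of one way to do it, keeping every State's data set whole, could look like the following. The function name `plan_files` is hypothetical, and the counts in the usage example are only the five quoted in this post.

```python
EXCEL_MAX_ROWS = 1_048_576  # row limit of one .xlsx worksheet

def plan_files(counts, max_rows=EXCEL_MAX_ROWS - 1):
    """Greedily pack whole data sets, in order, into files.

    counts maps a data set id to its record count. A data set is never
    split across files. The default cap leaves one row for the header.
    """
    files, current, used = [], [], 0
    for data_set, n in counts.items():
        if n > max_rows:
            raise ValueError(f"data set {data_set} exceeds {max_rows} rows")
        # Start a new file when this data set would overflow the current one.
        if current and used + n > max_rows:
            files.append(current)
            current, used = [], 0
        current.append(data_set)
        used += n
    if current:
        files.append(current)
    return files

# Only the record counts quoted in this post; the other sets are not listed.
known = {"01": 1721, "02": 2386, "03": 974, "09": 35650, "33": 3623}
print(plan_files(known))                   # everything fits in one file
print(plan_files(known, max_rows=40_000))  # smaller cap forces a split
```

The greedy pass keeps the data sets in their original order, which makes the files easy to navigate, at the cost of not always packing them as tightly as possible.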
* The ADDRESS field contains 6, 7, or even more comma-separated parts.
-------------------------------------------------------
So, I wonder if you could separate it into 5 (five) columns, going from right to left within the Address:
A - Codigo Postal / ZipCode
B - Estado / State
C - Municipio / City
D - Colonia / Address 2
E - Street / Address 1
For example ... in the first record of your last project (FILE: Sirve+Data+Scraping), for
"10 Quattro Sa De Cv"
the ADDRESS field reads:
3 ERA AVENIDA 700, A, CUMBRES, SEGUNDO SECTOR,MONTERREY,NUEVO LEON 64610
So ... going from right to left ... we would have:
A - 64610
B - NUEVO LEON
C - MONTERREY
D - CUMBRES SEGUNDO SECTOR
E - 3 ERA AVENIDA 700 A
I know that the number of parts may vary, which is why it makes sense to go from right to left ...
On the other hand, if you recommend keeping all records in a single ADDRESS field, I understand, but if you could divide it, please take into account that ZIP, STATE, CITY, ADDRESS1 and ADDRESS2 are standard.
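The right-to-left split described above could be sketched roughly as follows. ZipCode, State and City are read from the right end of the address. Where Address 1 ends and Address 2 begins is ambiguous in general, so this sketch assumes the street takes the first `street_parts` comma-separated parts (2 by default, which matches the example). The function name, that parameter, and the 5-digit postal-code pattern are assumptions, not a definitive rule.

```python
import re

# Assumes the last part looks like "STATE 12345" (5-digit postal code).
STATE_ZIP = re.compile(r"^(.*?)\s*(\d{5})$")

def split_address(address, street_parts=2):
    """Split an address into 5 fields, reading its parts from the right."""
    parts = [p.strip() for p in address.split(",") if p.strip()]
    fields = {"ZipCode": "", "State": "", "City": "",
              "Address2": "", "Address1": ""}
    if not parts:
        return fields
    # A/B: the last part carries the state followed by the zip code.
    match = STATE_ZIP.match(parts[-1])
    if match:
        fields["State"], fields["ZipCode"] = match.group(1), match.group(2)
    else:
        fields["State"] = parts[-1]
    # C: the part before it is the city (municipio).
    if len(parts) >= 2:
        fields["City"] = parts[-2]
    # D/E: whatever is left is street plus colonia. Assumption: the street
    # takes the first `street_parts` parts, but at least one part is left
    # for the colonia when two or more parts remain.
    rest = parts[:-2]
    n_street = min(street_parts, max(len(rest) - 1, 1))
    fields["Address1"] = " ".join(rest[:n_street])
    fields["Address2"] = " ".join(rest[n_street:])
    return fields

example = ("3 ERA AVENIDA 700, A, CUMBRES, SEGUNDO SECTOR,"
           "MONTERREY,NUEVO LEON 64610")
print(split_address(example))
```

Records where the postal code is missing, or where the street has more or fewer parts than assumed, would still need a manual check.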