Basic extraction from Wikipedia (from a few specific lists to DB)

Closed. Posted Mar 22, 2011. Paid on delivery.

===================

BACKGROUND

===================

I will provide you with a few lists from the Wikipedia website (list of ballet companies, list of operas, list of musicals, etc.), and your job will be to write a script that extracts their details into two basic MySQL tables (the structure of both tables is given below).

The deliverables for this project are (a) the tables populated with data and (b) the scripts used to extract the data.

**This is a first trial project for this kind of extraction work. More extraction work will follow.**

===================

DATA STRUCTURE

===================

There will be two tables, "entities" and "entity_names":

**entities** table:

- ID

- Wikipedia_Page

- Type

- Primary name ID (points to "ID" in the "entity_names" table)

**entity_names** table:

- ID

- entity_ID (points to "ID" in the "entities" table)

- Name

- Type (primary or secondary)

The reason for using two tables is that a given entity may later have more than one name or alias (for example, "San Francisco Symphony" could also be called "SF Symphony"). For everything you extract, set the "Type" field of the "entity_names" table to "primary".
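Because each table points at the other, rows have to be written in a particular order. The sketch below shows one way to do it, not a prescribed implementation: it uses Python's built-in `sqlite3` as a stand-in for MySQL, and the lowercase column names (`wikipedia_page`, `primary_name_id`, and so on) and the helper `insert_entity` are hypothetical. The entity row is created with no primary name, the name row is added, and the entity is then updated to point at it.

```python
import sqlite3

# Hypothetical schema mirroring the two tables in the spec. SQLite stands in
# for MySQL here; the lowercase column names are assumptions.
SCHEMA = """
CREATE TABLE entities (
    id              INTEGER PRIMARY KEY,
    wikipedia_page  TEXT NOT NULL,
    type            TEXT NOT NULL,               -- list type, e.g. 'ballet_company'
    primary_name_id INTEGER REFERENCES entity_names(id)
);
CREATE TABLE entity_names (
    id        INTEGER PRIMARY KEY,
    entity_id INTEGER NOT NULL REFERENCES entities(id),
    name      TEXT NOT NULL,
    type      TEXT NOT NULL CHECK (type IN ('primary', 'secondary'))
);
"""


def insert_entity(conn, name, entity_type, wikipedia_page):
    """Insert one entity plus its primary name and return the entity ID.

    The tables reference each other, so the entity row is created first with
    primary_name_id left NULL, then patched once the name row exists.
    """
    cur = conn.execute(
        "INSERT INTO entities (wikipedia_page, type) VALUES (?, ?)",
        (wikipedia_page, entity_type),
    )
    entity_id = cur.lastrowid
    cur = conn.execute(
        "INSERT INTO entity_names (entity_id, name, type) VALUES (?, ?, 'primary')",
        (entity_id, name),
    )
    conn.execute(
        "UPDATE entities SET primary_name_id = ? WHERE id = ?",
        (cur.lastrowid, entity_id),
    )
    return entity_id


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
entity_id = insert_entity(
    conn,
    "San Francisco Symphony",
    "orchestra",
    "https://en.wikipedia.org/wiki/San_Francisco_Symphony",
)
conn.commit()
```

In MySQL the same insert-then-update sequence works with `AUTO_INCREMENT` IDs (the driver's `lastrowid` or `LAST_INSERT_ID()`), although the foreign key from `entities` to `entity_names` would need to be added with `ALTER TABLE` once both tables exist. A later alias such as "SF Symphony" would simply be one more `entity_names` row with type `secondary`.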

## Deliverables

===================

WHAT TO EXTRACT

===================

1) List of all ballet companies

Source: <[login to view URL]>

Fields to grab:

Name = "Company Name" from the table

Type = ballet_company

Wikipedia page = page for each ballet company (example: [login to view URL])
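One way to pull the "Company Name" column and each company's article link out of the list's table is sketched below with Python's standard `html.parser`, assuming the page HTML has already been downloaded and that the list is the first `<table>` in the HTML passed in. The class `TableColumnParser`, the function `extract_column`, and the sample table are made up for illustration; the parser ignores `rowspan`/`colspan` and nested tables, which real Wikipedia tables sometimes use.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

WIKI_BASE = "https://en.wikipedia.org"


class TableColumnParser(HTMLParser):
    """Collect [text, first_href] for every cell of the first <table>.

    Minimal sketch: rowspan/colspan and nested tables are not handled.
    """

    def __init__(self):
        super().__init__()
        self.rows = []          # one list of cells per <tr>
        self._in_table = False
        self._done = False      # set once the first table has closed
        self._cell = None       # cell currently being filled

    def handle_starttag(self, tag, attrs):
        if self._done:
            return
        if tag == "table":
            self._in_table = True
        elif self._in_table and tag == "tr":
            self.rows.append([])
        elif self._in_table and tag in ("td", "th") and self.rows:
            self._cell = ["", None]
            self.rows[-1].append(self._cell)
        elif tag == "a" and self._cell is not None and self._cell[1] is None:
            self._cell[1] = dict(attrs).get("href")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._cell = None
        elif tag == "table" and self._in_table:
            self._in_table, self._done = False, True

    def handle_data(self, data):
        if self._cell is not None:
            self._cell[0] += data


def extract_column(html, header="Company Name"):
    """Return (name, article URL or None) for each row under `header`."""
    parser = TableColumnParser()
    parser.feed(html)
    headers = [text.strip() for text, _ in parser.rows[0]]
    col = headers.index(header)
    results = []
    for row in parser.rows[1:]:
        if col >= len(row):
            continue
        text, href = row[col]
        name = re.sub(r"\[\d+\]", "", text).strip()  # drop footnote markers like [1]
        results.append((name, urljoin(WIKI_BASE, href) if href else None))
    return results


sample = """
<table class="wikitable">
<tr><th>Company Name</th><th>City</th></tr>
<tr><td><a href="/wiki/The_Royal_Ballet">The Royal Ballet</a></td><td>London</td></tr>
<tr><td><a href="/wiki/Paris_Opera_Ballet">Paris Opera Ballet</a></td><td>Paris</td></tr>
</table>
"""
companies = extract_column(sample)
```

The relative `href` of the first link in the name cell becomes the "Wikipedia page" field. Red links (hrefs containing `redlink=1`) point to articles that do not exist yet, so they may be worth filtering out before insertion.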

2) List of Operas

Source: <[login to view URL]>

Name = opera name from the list

Type: opera

Wikipedia page = page for each opera (example: [login to view URL])
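If the operas are laid out as bulleted list items rather than a table (an assumption, since the layout of the source page is not specified here), the name and page can be taken from the first article link in each `<li>`. This is again a stdlib-only sketch; `ListItemLinkParser` and the sample markup are hypothetical. Wikipedia pages also use `<li>` for navigation boxes and the table of contents, so only the HTML of the list's content section should be fed in.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

WIKI_BASE = "https://en.wikipedia.org"


class ListItemLinkParser(HTMLParser):
    """Record the first article link inside each <li> as (name, page URL)."""

    def __init__(self):
        super().__init__()
        self.items = []
        self._li_depth = 0
        self._taken = False   # has the current <li> already yielded a link?
        self._href = None     # href of the <a> being read, if it is kept
        self._text = ""

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self._li_depth += 1
            self._taken = False
        elif tag == "a" and self._li_depth and not self._taken:
            href = dict(attrs).get("href") or ""
            # Keep article links only: skip red links (/w/index.php?...) and
            # namespaced pages such as /wiki/File:... or /wiki/Help:...
            if href.startswith("/wiki/") and ":" not in href[len("/wiki/"):]:
                self._href, self._text = href, ""

    def handle_endtag(self, tag):
        if tag == "li" and self._li_depth:
            self._li_depth -= 1
        elif tag == "a" and self._href is not None:
            self.items.append((self._text.strip(), urljoin(WIKI_BASE, self._href)))
            self._href, self._taken = None, True

    def handle_data(self, data):
        if self._href is not None:
            self._text += data


sample = """
<ul>
<li><a href="/wiki/Carmen">Carmen</a> (1875), by <a href="/wiki/Georges_Bizet">Georges Bizet</a></li>
<li><a href="/wiki/Tosca">Tosca</a> (1900)</li>
<li>Unlinked title</li>
</ul>
"""
parser = ListItemLinkParser()
parser.feed(sample)
operas = parser.items
```

Taking only the first link per item keeps the opera and skips the composer link that follows it; items with no article link are dropped rather than stored without a page.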

(Below, I give only the type; the other fields are self-explanatory from the two examples above.)

3) List of Opera Companies

Source: <[login to view URL]>

Type: opera_company

4) List of Musicals

Sources: <[login to view URL]:_A_to_L>

<[login to view URL]:_M_to_Z>

Type: musical
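Musicals come from two pages (A to L and M to Z), so their results have to be combined before insertion. A small sketch, assuming each page has already been reduced to a list of `(name, page_url)` pairs; `merge_sources` and the sample entries are hypothetical. Deduplicating by page is a precaution in case a title is listed on both pages, not something the posting states.

```python
def merge_sources(*sources):
    """Merge (name, page_url) lists from several source pages in order.

    An entry is kept only the first time its Wikipedia page is seen (or its
    name, when it has no page link), so nothing is inserted twice.
    """
    seen = set()
    merged = []
    for entries in sources:
        for name, page in entries:
            key = page or name
            if key not in seen:
                seen.add(key)
                merged.append((name, page))
    return merged


# Hypothetical entries; real ones would come from the two list pages.
a_to_l = [
    ("Example Musical A", "https://en.wikipedia.org/wiki/Example_Musical_A"),
    ("Example Musical B", "https://en.wikipedia.org/wiki/Example_Musical_B"),
]
m_to_z = [
    ("Example Musical M", "https://en.wikipedia.org/wiki/Example_Musical_M"),
    ("Example Musical B", "https://en.wikipedia.org/wiki/Example_Musical_B"),
]
musicals = merge_sources(a_to_l, m_to_z)
```

Keying on the page rather than the display name also means that two differently worded entries for the same article collapse into one entity.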

5) List of Orchestras:

Source: <[login to view URL]>

Type: orchestra

6) List of Improv Theater Companies

Source: <[login to view URL]>

Type: improv_theater_company

7) List of Comedians

Source: <[login to view URL]>

Type: comedian

Note: Please extract only those who are still alive (e.g. skip someone like "Bud Abbott (1895-1974)").

8) List of Stand-up Comedians

Source: <[login to view URL]>

Type: stand_up_comedian

Note: Please extract only those who are still alive.
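Both comedian lists ask for living people only. The list text alone supports just a heuristic: an entry with a closed year range such as "Bud Abbott (1895-1974)" is treated as deceased, and everything else is kept. The sketch below works under that assumption; the regular expressions, helper names, and every sample entry other than Bud Abbott are hypothetical. Entries with no dates at all pass the filter, so borderline cases may still need a check against the linked article.

```python
import re

# Heuristic: a closed year range right before ")" as in "(1895-1974)" is read
# as a birth/death span. Wikipedia usually writes the range with an en dash
# (U+2013), so both a hyphen and an en dash are accepted.
CLOSED_LIFESPAN = re.compile(r"\d{4}\s*[-\u2013]\s*\d{4}\s*\)")
TRAILING_PARENS = re.compile(r"\s*\([^()]*\)\s*$")


def is_probably_alive(entry: str) -> bool:
    """False when the entry shows a death year; True otherwise (unverified)."""
    return CLOSED_LIFESPAN.search(entry) is None


def clean_name(entry: str) -> str:
    """Strip a trailing parenthetical such as "(born 1970)" from the entry."""
    return TRAILING_PARENS.sub("", entry).strip()


entries = [
    "Bud Abbott (1895-1974)",
    "Example Comic (1920\u20131990)",  # hypothetical, en dash
    "Jane Doe (born 1970)",           # hypothetical
    "John Roe",                       # hypothetical, no dates
]
living = [clean_name(e) for e in entries if is_probably_alive(e)]
```

Filtering happens on the raw entry, and only the kept entries are cleaned, so the dates never reach the `Name` field.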

9) List of dance companies

Source: <[login to view URL]>

Type: dance_company

10) List of pop punk bands

Source: <[login to view URL]>

Type: pop_punk_band

Skills: Java, JavaScript, MySQL, PHP, Script Install, Shell Script, Software Architecture, Software Testing, Web Hosting, Website Management, Website Testing, XML, XSLT

Project ID: #3191040

About the project

28 proposals. Remote project. Active Apr 13, 2011

28 freelancers are bidding an average of $177 for this job

Every bid's proposal text reads "See private message."

| Freelancer | Bid (USD) | Delivery | Reviews | Rating |
|---|---|---|---|---|
| repmovsd | $382.5 | 5 days | 144 | 7.0 |
| samirkumardas | $297.5 | 5 days | 241 | 7.0 |
| sktn | $143.65 | 5 days | 262 | 7.1 |
| pbradaric | $85 | 5 days | 28 | 6.1 |
| mastirlaa | $85 | 5 days | 76 | 6.1 |
| novepi | $212.5 | 5 days | 42 | 5.9 |
| Bitquark | $170 | 5 days | 44 | 5.9 |
| tomkusvw | $85 | 5 days | 62 | 5.7 |
| webspiderinc | $85 | 5 days | 53 | 5.5 |
| topleaseu | $212.5 | 5 days | 24 | 5.3 |
| oasis21 | $127.5 | 5 days | 35 | 4.9 |
| szaszalexmcpd | $85 | 5 days | 55 | 4.4 |
| lenzai | $340 | 5 days | 16 | 4.2 |
| ragastens | $110.5 | 5 days | 37 | 4.4 |
| cwaldbieser | $297.5 | 5 days | 10 | 4.3 |
| powzak | $85 | 5 days | 25 | 4.1 |
| MrRain | $85 | 5 days | 13 | 3.8 |
| rased108 | $85 | 5 days | 29 | 4.6 |
| Archit88 | $136 | 5 days | 14 | 3.3 |
| ifailed | $85 | 5 days | 8 | 2.4 |