1- Scope:
Provide a basic Hadoop MapR V.3 environment on Amazon Web Services. This phase covers a basic trial environment only; there is no need to provide 24x7 tools or extra code.
The main aim is to analyse data from several text input sources stored on S3 and to start a trial period.
2- Tools:
We provide an AWS account for the project and MapR V.3 Hadoop clusters, with free administration for implementing this project.
3- Deliverables:
- Script code for automatic, AWS API based set-up of MapR V.3 for a given number of master nodes and compute nodes.
- Set-up scripts capable of using EC2 "on-demand" nodes:
o For real-time 24x7 live queries.
o For nightly batch processes.
- Basic Java code providing basic routines such as:
o Joins of tables from several text sources.
o Gaussian statistics: mean, standard deviation, etc.
o Basic counting and basic mathematical routines.
o Output of computed tables as text or to MySQL.
- Skype sessions totalling 4 hours to train skilled developers coming from the PHP and JavaScript world.
- Documented source code.
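As an illustration only of the kind of "Gaussian statistics" routine listed above, the following is a minimal sketch in plain Java with no Hadoop dependency. The class name `GaussStats` and its methods are hypothetical and not part of any MapR or Hadoop API. It keeps a running count, mean and sum of squared deviations, and partial results can be merged, which is the property a map/reduce style job would need.

```java
// Hypothetical helper for illustration; not part of any MapR or Hadoop API.
class GaussStats {
    private long count;
    private double mean;
    private double m2; // sum of squared deviations from the running mean

    // Add one observation using Welford's online update.
    void add(double x) {
        count++;
        double delta = x - mean;
        mean += delta / count;
        m2 += delta * (x - mean);
    }

    // Merge a partial result computed elsewhere, e.g. on another input split.
    void merge(GaussStats other) {
        if (other.count == 0) {
            return;
        }
        long total = count + other.count;
        double delta = other.mean - mean;
        m2 += other.m2 + delta * delta * ((double) count * other.count / total);
        mean += delta * other.count / total;
        count = total;
    }

    long count() {
        return count;
    }

    // Mean of all observations; NaN when nothing has been added.
    double mean() {
        return count == 0 ? Double.NaN : mean;
    }

    // Population standard deviation; NaN when nothing has been added.
    double stdDev() {
        return count == 0 ? Double.NaN : Math.sqrt(m2 / count);
    }

    public static void main(String[] args) {
        GaussStats first = new GaussStats();
        GaussStats second = new GaussStats();
        for (double minutes : new double[] {2, 4, 4, 4}) {
            first.add(minutes);
        }
        for (double minutes : new double[] {5, 5, 7, 9}) {
            second.add(minutes);
        }
        first.merge(second); // combine the two partial results
        System.out.println("mean=" + first.mean() + " stdDev=" + first.stdDev());
    }
}
```

The merge step is what would let each node summarise its own share of the logs before a final combination; whether the delivered code uses this particular approach is left to the implementer.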
4- Input sources:
The project is intended for analysing logs from remotely connected devices and creating joins between those logs and central text tables.
- Several text files for remote devices, stored on S3.
o Characteristics of remote devices (>400,000 TV sets):
• Brand
• Programmed parameters
• Available channels
• Geo location
o Log text of remote devices:
• Real-time logging of visits
• Number of visits
• Duration
• TV station tuned in at each moment
• Demographic type of the home where the device is installed.
o TV station programme schedules:
• Show type: movie, talk show, debate
• Start time, end time.
• Celebrities involved in the show.
6- Expected outputs:
- Several combinations of the above.
- Time spent per TV set type on each type of show:
o Mean time
o Standard deviation
o Top celebrities watched
- Samples of joins from several sources.
- Real-time query set-up for cases where a real-time response is needed.
- Batch set-up for long, time-consuming queries, including runs of the whole set of queries.
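To make the expected join and per-group mean more concrete, here is a minimal in-memory sketch in plain Java. Everything about the data layout is an assumption made for illustration: tab-separated lines, a device table of the form `deviceId<TAB>brand`, log lines of the form `deviceId<TAB>showType<TAB>minutes` (that is, with the show type already resolved from the programme schedule), and the brand standing in for "TV set type". The class name `ViewingJoin` is hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; the file layouts below are assumptions, not the real sources.
class ViewingJoin {

    // Build a deviceId -> brand lookup from lines "deviceId<TAB>brand".
    static Map<String, String> loadDevices(List<String> deviceLines) {
        Map<String, String> brandById = new HashMap<>();
        for (String line : deviceLines) {
            String[] fields = line.split("\t");
            if (fields.length >= 2) {
                brandById.put(fields[0], fields[1]);
            }
        }
        return brandById;
    }

    // Join log lines "deviceId<TAB>showType<TAB>minutes" against the device
    // lookup and return the mean minutes keyed by "brand|showType".
    static Map<String, Double> meanMinutes(Map<String, String> brandById,
                                           List<String> logLines) {
        Map<String, double[]> sumAndCount = new TreeMap<>();
        for (String line : logLines) {
            String[] fields = line.split("\t");
            if (fields.length < 3) {
                continue; // skip malformed lines
            }
            String brand = brandById.get(fields[0]);
            if (brand == null) {
                continue; // inner join: drop log lines for unknown devices
            }
            double minutes;
            try {
                minutes = Double.parseDouble(fields[2]);
            } catch (NumberFormatException e) {
                continue; // skip lines with a non-numeric duration
            }
            double[] acc = sumAndCount.computeIfAbsent(
                    brand + "|" + fields[1], key -> new double[2]);
            acc[0] += minutes; // running sum
            acc[1] += 1;       // running count
        }
        Map<String, Double> means = new TreeMap<>();
        for (Map.Entry<String, double[]> e : sumAndCount.entrySet()) {
            means.put(e.getKey(), e.getValue()[0] / e.getValue()[1]);
        }
        return means;
    }
}
```

Holding the smaller table in memory and streaming the larger log past it is only one possible join strategy; with more than 400,000 devices and real-time logs, the delivered solution may well need a different one.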
7- Timetable:
- Needed in four weeks / end of January – first week of September.
- We provide an AWS zone containing all the text sources, ready for use.
- E-mail / Skype contact on weekdays, 9-18h CET, for immediate support with any doubts or clarifications.
8- References:
- No project will be awarded without clear and outstanding references for Hadoop implementations on AWS.
- MapR experience is a plus.