Python power programmer needed to create a function to ingest and process data as a stream

$30-250 NZD

Closed
Posted more than 5 years ago

Paid on delivery
We are looking for a Python developer / data engineer with experience ingesting and processing data as a stream, including demonstrable experience handling 2-3 GB of source data.

Knowledge of object-oriented programming concepts, professional documentation methods and Python lambda functions is a must.

Environment: Oracle VM VirtualBox, Linux Ubuntu 14.04.5 LTS, PyCharm, Anaconda environment.

Data is available as TSV extracts from multiple sources in CDL. The data engineer should be able to merge the TSV extracts by applying the correct join techniques. Because the data is delivered in compressed format, the data engineer should read it as a stream rather than decompressing it in full, since the uncompressed data might not fit into memory. Memory-efficient code is therefore expected. The merged data will be transformed and stored in a PostgreSQL database ([login to view URL]). The function should follow the object-oriented paradigm, with continuous integration and deployment in focus. Version control is also expected.

Some remarks:
- Each data snapshot can contain multiple headerless main data files in TSV format, each up to 2 GB in size. The engineer should be able to read files as a stream while unpacking them, because they usually do not fit into RAM.
- In addition to the main data files, each snapshot has a file with the header names and multiple lookup files that map the numeric IDs from the main data to strings, comparable to a foreign key in an SQL database.
- Data should be read and transformed on a record-by-record basis (stream or mini-batch processing).
- Each combined and transformed record should be prepared for multiple data sinks, e.g. as SQL query strings that write a record into PostgreSQL or MS SQL. The engineer will create a write adapter for each data sink, all sharing a common interface, so that the same function call can be used to write into any of the specified data sinks.

The code provided should be modular, reusable and well documented.
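As an illustration of the streaming requirement above, here is a minimal sketch of reading a compressed, headerless TSV file one record at a time and resolving lookup IDs along the way. It rests on several assumptions that the posting does not state: the files are gzip-compressed, the header file holds one tab-separated line of column names, each lookup file has two columns (ID, then string), and the lookup files are small enough to keep in memory. The function names `read_header`, `load_lookup` and `stream_records` are hypothetical.

```python
import csv
import gzip
from typing import Dict, Iterator, List


def read_header(path: str) -> List[str]:
    """Read column names from the header file (assumed: one tab-separated line)."""
    with open(path, newline="", encoding="utf-8") as fh:
        return next(csv.reader(fh, delimiter="\t", quoting=csv.QUOTE_NONE))


def load_lookup(path: str) -> Dict[str, str]:
    """Load a lookup file (assumed: two columns, ID then string) into a dict."""
    with open(path, newline="", encoding="utf-8") as fh:
        reader = csv.reader(fh, delimiter="\t", quoting=csv.QUOTE_NONE)
        return {row[0]: row[1] for row in reader if len(row) >= 2}


def stream_records(
    gz_path: str,
    header: List[str],
    lookups: Dict[str, Dict[str, str]],
) -> Iterator[Dict[str, str]]:
    """Yield one combined record at a time from a gzip-compressed TSV file.

    gzip.open decompresses incrementally, so memory use depends on the row
    size and the lookup tables, not on the size of the uncompressed file.
    `lookups` maps a column name to the ID-to-string dict for that column.
    """
    with gzip.open(gz_path, mode="rt", newline="", encoding="utf-8") as fh:
        for row in csv.reader(fh, delimiter="\t", quoting=csv.QUOTE_NONE):
            record = dict(zip(header, row))
            for column, mapping in lookups.items():
                if column in record:
                    # Keep the raw ID when the lookup has no entry for it.
                    record[column] = mapping.get(record[column], record[column])
            yield record
```

Because `stream_records` is a generator, a caller can transform and write each record before the next one is read. The in-memory lookup behaves like a hash join against the small tables; if the lookup files turned out to be large, a different join strategy would be needed.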
The engineer needs to know how to build Python modules with classes, using OOP decomposition practices and inheritance (e.g. abstract classes).
- The code should have unit tests, where appropriate.
- The code will be implemented as a Python AWS Lambda function. The engineer should be familiar with building Lambda functions and should ideally have a local development environment set up for building and uploading Lambda functions.
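The write adapter with a common interface, together with the abstract-class requirement, might look roughly like the sketch below. It is one possible design, not a specification from the client. The class names, the `execute` callable (for example, a database cursor's `execute` method) and the choice to emit parameterized statements instead of fully interpolated SQL strings are all assumptions. The placeholder styles follow common drivers (`%s` as used by psycopg2, `?` as used by pyodbc), and Python 3.6+ is assumed.

```python
from abc import ABC, abstractmethod
from typing import Any, Callable, Dict, List, Sequence, Tuple

Statement = Tuple[str, List[Any]]


class WriteAdapter(ABC):
    """Common interface: every data sink exposes the same write(record) call."""

    placeholder: str  # parameter marker of the target driver, set by subclasses

    def __init__(self, table: str, execute: Callable[[str, Sequence[Any]], None]):
        self.table = table
        self._execute = execute  # hypothetical hook, e.g. cursor.execute

    @abstractmethod
    def quote(self, identifier: str) -> str:
        """Quote a table or column name for the target SQL dialect."""

    def build_insert(self, record: Dict[str, Any]) -> Statement:
        """Build a parameterized INSERT statement for a single record."""
        columns = ", ".join(self.quote(name) for name in record)
        marks = ", ".join(self.placeholder for _ in record)
        sql = f"INSERT INTO {self.quote(self.table)} ({columns}) VALUES ({marks})"
        return sql, list(record.values())

    def write(self, record: Dict[str, Any]) -> None:
        sql, params = self.build_insert(record)
        self._execute(sql, params)


class PostgresAdapter(WriteAdapter):
    placeholder = "%s"

    def quote(self, identifier: str) -> str:
        return '"' + identifier.replace('"', '""') + '"'


class MsSqlAdapter(WriteAdapter):
    placeholder = "?"

    def quote(self, identifier: str) -> str:
        return "[" + identifier.replace("]", "]]") + "]"
```

Passing `execute` in from outside keeps the adapters free of connection handling, which also makes them easy to unit test: a test can pass a function that appends to a list and then assert on the recorded statements. Inside a Lambda function, the handler would create the adapters once and call `write` for each streamed record.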
Project ID: 18026938

About the project

2 proposals
Remote project
Active 5 years ago

About the client

Faridabad, India
5.0
35
Member since Mar 9, 2017

Freelancer ® is a registered Trademark of Freelancer Technology Pty Limited (ACN 142 189 759)
Copyright © 2024 Freelancer Technology Pty Limited (ACN 142 189 759)