
C or Fortran function to transform .csv file to sparse matrix

$30-5000 USD

Closed
Posted about 13 years ago

Paid on delivery
There are two datasets, A and B. Set A contains a list of 3 million unique ids. Set B contains, for a subset of these ids, five variables: date1, date2, product, cost and quantity. There are about 10,000 distinct products. Each row in set B represents a sale of a product to a customer (identified by the id), sold on date1 and paid on date2 (with date2 >= date1). There can be up to 100 million rows in set B.

The function I am looking for should transform sets A and B into two matrices, C and D. Both matrices should have:
- a total number of rows equal to the total number of unique (customer) ids, i.e. equal to the number of rows in set A
- a total number of columns equal to the total number of unique products (i.e. about 10,000)

Each cell in matrix C should contain the total quantity of a product sold to a customer, and each cell in matrix D should contain the total cost of a product sold to that customer. Matrices C and D should be in CCS (compressed column storage) sparse matrix format (see e.g. [login to view URL]). You can use any free publicly available library or code as part of your program.

## Deliverables

The function will require four additional inputs:
- mindate1
- maxdate1
- mindate2
- maxdate2

To build matrices C and D, only rows from set B that satisfy both of the following conditions should be processed:
- mindate1 <= date1 <= maxdate1
- mindate2 <= date2 <= maxdate2

It is expected that only about 0.1% of the cells in matrices C and D are non-zero, which is why both should be stored in CCS format. The function should not require more than 15GB RAM when executed on the data as specified above (the cost and quantity variables are both stored as doubles, i.e. 8 bytes).

Example (in this example we omit the mindate1...maxdate2 restrictions):

Inputs:

Set A:
id
15
1
2
100

Set B:
id, date1, date2, prod, cost, quantity
100, '17/02/2008', '19/02/2008', C, 79, 30
15, '11/01/2008', '11/01/2009', A, 100.51, 2
100, '17/02/2008', '19/02/2008', A, 79, 7
1, '15/03/2008', '11/01/2009', B, 3.71, 13
15, '11/10/2008', '17/01/2009', A, 58, 1

Matrix C (column names would be: id, prod_A, prod_B, prod_C):
1, 0, 13, 0
2, 0, 0, 0
15, 3, 0, 0
100, 7, 0, 30

Output in CCS sparse format:
row_ind = {1, 2, 3, 4, 3, 4, 1, 4}
col_ptr = {1, 5, 7, 8}
val = {1, 2, 15, 100, 3, 7, 13, 30}

Matrix D (column names would be: id, prod_A, prod_B, prod_C):
1, 0, 3.71, 0
2, 0, 0, 0
15, 158.51, 0, 0
100, 79, 0, 79

Output in CCS sparse format:
row_ind = {1, 2, 3, 4, 3, 4, 1, 4}
col_ptr = {1, 5, 7, 8}
val = {1, 2, 15, 100, 158.51, 79, 3.71, 79}

I will provide you with sample datasets A and B and the corresponding matrices C and D for testing purposes.
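As an illustration only, the core of such a function could be sketched in C roughly as follows. Everything named here (`Sale`, `Ccs2`, `date_key`, `in_window`, `build_ccs`) is hypothetical and not part of the posting. The sketch assumes that dates arrive as unquoted `DD/MM/YYYY` strings, that customer ids and products have already been mapped to 0-based row and column numbers, and that only the product columns are stored. It uses 0-based indices and a `col_ptr` with one extra trailing entry, which is the conventional CCS layout; the example in the posting is 1-based, includes the id column, and lists only the column start positions.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* One sale that passed the date filter, with its customer id and product
   already mapped to 0-based row and column numbers (mapping not shown). */
typedef struct { int row; int col; double qty; double cost; } Sale;

/* C and D share one sparsity pattern, so a single col_ptr/row_ind pair
   serves both value arrays. All indices are 0-based. */
typedef struct {
    int n_cols, nnz;
    int *col_ptr;   /* n_cols + 1 entries; column j spans [col_ptr[j], col_ptr[j+1]) */
    int *row_ind;   /* nnz entries */
    double *qty;    /* values of matrix C (total quantity) */
    double *cost;   /* values of matrix D (total cost) */
} Ccs2;

/* "DD/MM/YYYY" -> YYYYMMDD so dates compare as plain integers.
   Returns -1 if the three fields cannot be read. */
int date_key(const char *s) {
    int d, m, y;
    if (sscanf(s, "%d/%d/%d", &d, &m, &y) != 3) return -1;
    return y * 10000 + m * 100 + d;
}

/* Both date conditions from the posting, applied to date keys. */
int in_window(int k1, int k2, int min1, int max1, int min2, int max2) {
    return min1 <= k1 && k1 <= max1 && min2 <= k2 && k2 <= max2;
}

/* Order by column first, then by row, which is the CCS storage order. */
static int by_col_then_row(const void *a, const void *b) {
    const Sale *x = a, *y = b;
    if (x->col != y->col) return (x->col > y->col) - (x->col < y->col);
    return (x->row > y->row) - (x->row < y->row);
}

/* Sorts the sales in place, sums entries that share (row, col), and fills
   `out`. Returns 0 on success, -1 if memory allocation fails. */
int build_ccs(Sale *s, int n, int n_cols, Ccs2 *out) {
    assert(n >= 0 && n_cols > 0);
    size_t cap = (size_t)(n > 0 ? n : 1);
    qsort(s, (size_t)n, sizeof *s, by_col_then_row);
    out->n_cols = n_cols;
    out->col_ptr = calloc((size_t)n_cols + 1, sizeof(int));
    out->row_ind = malloc(cap * sizeof(int));
    out->qty = malloc(cap * sizeof(double));
    out->cost = malloc(cap * sizeof(double));
    if (!out->col_ptr || !out->row_ind || !out->qty || !out->cost) {
        free(out->col_ptr); free(out->row_ind); free(out->qty); free(out->cost);
        return -1;
    }
    int k = -1;  /* index of the last stored non-zero */
    for (int i = 0; i < n; i++) {
        if (i > 0 && s[i].col == s[i - 1].col && s[i].row == s[i - 1].row) {
            out->qty[k] += s[i].qty;        /* same cell: accumulate totals */
            out->cost[k] += s[i].cost;
        } else {
            k++;
            out->row_ind[k] = s[i].row;
            out->qty[k] = s[i].qty;
            out->cost[k] = s[i].cost;
            out->col_ptr[s[i].col + 1]++;   /* count non-zeros per column */
        }
    }
    out->nnz = k + 1;
    for (int j = 0; j < n_cols; j++)        /* prefix sum turns counts into offsets */
        out->col_ptr[j + 1] += out->col_ptr[j];
    return 0;
}
```

Applied to the five example sales (rows numbered in ascending id order, products A, B, C as columns 0 to 2), this yields `col_ptr = {0, 2, 3, 4}`, `row_ind = {2, 3, 0, 3}`, quantities `{3, 7, 13, 30}` and costs `{158.51, 79, 3.71, 79}`, which matches the product columns of the example after shifting to 1-based indices. On memory: with 4-byte `int` and 8-byte `double`, a `Sale` typically occupies 24 bytes, so 100 million unfiltered rows would need roughly 2.4 GB before compression, within the stated 15GB limit. The caller is responsible for freeing the four arrays.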
Project ID: 3250970

About the project

12 proposals
Remote project
Active 13 years ago

12 freelancers are bidding an average of $136 USD for this job
See private message.
$255 USD in 14 days
5.0 (52 reviews)
6.0

See private message.
$85 USD in 14 days
4.9 (39 reviews)
6.3

See private message.
$170 USD in 14 days
5.0 (12 reviews)
5.7

See private message.
$143.65 USD in 14 days
5.0 (44 reviews)
5.0

See private message.
$102 USD in 14 days
5.0 (20 reviews)
4.3

See private message.
$170 USD in 14 days
4.2 (32 reviews)
4.5

See private message.
$84.99 USD in 14 days
5.0 (10 reviews)
3.0

See private message.
$178.50 USD in 14 days
5.0 (6 reviews)
3.0

See private message.
$85 USD in 14 days
5.0 (2 reviews)
2.3

See private message.
$212.50 USD in 14 days
5.0 (2 reviews)
1.4

See private message.
$42.50 USD in 14 days
0.0 (0 reviews)
0.0

See private message.
$106.25 USD in 14 days
0.0 (0 reviews)
0.0

About this client

Netherlands
5.0
11
Payment method verified
Member since Nov 3, 2010

Freelancer ® is a registered Trademark of Freelancer Technology Pty Limited (ACN 142 189 759)
Copyright © 2024 Freelancer Technology Pty Limited (ACN 142 189 759)