    1,062 pyspark jobs found

    I work with .txt files weighing between 1 GB and 10 GB and I need to speed up their ...improvements (partitioning, parallelism, cluster tuning, use of caches, compression, etc.). The task includes implementing a Spark job that reads the text files, performs basic data analysis (counts, filters, simple validations) and leaves the result ready for Spring Batch to consume without further changes. On completion I expect: • Production-ready code and scripts (Scala or PySpark, whichever you master). • A brief guide to the configuration and the best practices applied. • A before vs. after performance test showing the optimisation achieved. If you have already tuned massive TXT reads in Spark...

    €129 Average bid
    13 bids

    ...Databricks. The idea is for the user to fill in the form, have the data stored directly in a Databricks table and, with one click, generate an executive-summary-style report focused on key performance indicators (KPIs). I am looking for someone who masters both the front end (HTML, CSS, JavaScript) and the back-end integration in Databricks: notebooks, Delta Lake, Databricks SQL or PySpark. The flow should look like this: • The form is served as a web component embedded in the Databricks interface (or as a Job/Notebook with widgets). • On submission, it persists the information to a Delta table. • A triggered process queries those records and produces the executive report with the most relevant KPIs...

    €8 / hr Average bid
    14 bids

    We are looking for a full-time Data Engineer to join a long-term project (minimum 1 year) in the area of data analysis and data quality linked to financial crime prevention (AFC). ESSENTIAL: tax residency in Spain and fluent English. The role involves developing technical solutions with Python, PySpark and SQL, as well as data remediation processes supported by machine learning. Essential requirements: - Solid experience with Python and PySpark - Good level of SQL - Basic knowledge of machine learning Nice to have: Hadoop, Hive, Airflow, visualisation tools, agile methodologies.

    €26 / hr Average bid
    31 bids

    TERMS OF REFERENCE: DATA PIPELINE IMPLEMENTATION – AWS Cloud. Enterprise Data Warehouse Project – La Ascensión SA 2025. Purpose: To hire an expert freelancer or specialised firm in AWS services, Spanish-speaking or with Spanish-language support, to configure and implement an automated data pipeline in the ...(extraction, transformation and load) with dynamic metadata-based rules; integration with data sources such as SAP HANA, SQL Server, BigQuery, Google Analytics GA4 and CRM APIs; and cost optimisation taking advantage of the AWS free tier. Estimated duration: 4-7 weeks (adjustable according to roadmap and complexity). Required profile: Proven experience in serverless architecture on A...

    €9656 Average bid
    17 bids

    We need to hire developers for various projects in Spanish: Profession: Systems Engineer or related - Knowledge of SQL. - Knowledge of ETL tools. - Knowledge of Synapse (Pipelines, DataFactory) - Handling of Storage Accounts. - Knowledge of data engineering processes (Databricks) - Knowledge of PySpark and Python. Experience building warehouses and lakehouses.

    €16 / hr Average bid
    17 bids

    We need a Data Engineer with knowledge of Python/PySpark and Databricks in an Azure environment. They would take charge of maintaining one of our applications for at least 2 months, extendable. Knowledge of Datafactory and Retool is desirable.

    €33 / hr Average bid
    15 bids
    Senior Data Engineer Required
    6 days left
    Verified

    ...SageMaker) - Implement monitoring, error handling & ensure high system reliability (99.9% uptime) - Build data validation, quality checks & anomaly detection systems - Design systems for backfills, reprocessing & consistency - Maintain data contracts, schema versioning & CI/CD pipelines --- Required Skills - 3–7+ years of software/data engineering experience - Strong Python (5+ yrs), SQL, Pandas/PySpark - Hands-on AWS (Lambda, Glue, S3, Athena, DynamoDB, ECS, Step Functions) - Experience with REST APIs, microservices, event-driven architecture - Knowledge of ML/MLOps (SageMaker, model lifecycle) - Exposure to Node.js (or willingness to learn) - Experience with large-scale data processing & distributed systems - Strong focus on testing, CI/CD, monitor...

    €576 Average bid
    9 bids

    ...end-to-end on AWS, coded in PySpark and designed for production reliability. The pipeline will ingest data from three sources—databases, APIs, and file systems—then standardise and load it into an analytics-ready destination. Source files arrive in a mix of CSV, JSON, and Parquet, so the job must include automatic format detection, schema inference, and efficient column-wise writes. Beyond raw transformation, I want solid engineering practices: parameter-driven jobs, modular Spark code, unit tests, logging, alerting, and retry logic. Leveraging AWS native services such as Glue, EMR, Lambda, and S3 is expected, but I’m open to other AWS components if they shorten development time or lower cost. Candidates must have expertise in data engineering. Deliverables ...

    €22 / hr Average bid
    21 bids
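
The automatic format detection mentioned in the listing above could be approached along these lines. This is only a minimal sketch in plain Python, not the poster's design: the function name `detect_format` is hypothetical, and the rule is a heuristic that assumes Parquet files begin with the `PAR1` magic bytes, JSON documents begin with `{` or `[`, and everything else is treated as CSV.

```python
def detect_format(head: bytes) -> str:
    """Guess a file format from its first bytes (heuristic, illustrative only)."""
    # Parquet files start (and end) with the 4-byte magic marker b"PAR1".
    if head.startswith(b"PAR1"):
        return "parquet"
    # JSON documents and JSON Lines records open with an object or an array.
    stripped = head.lstrip()
    if stripped[:1] in (b"{", b"["):
        return "json"
    # Assumption: anything else is delimited text and is handled as CSV.
    return "csv"


for sample in (b"PAR1\x15\x04", b'  {"id": 1}', b"id,name\n1,Ana\n"):
    print(detect_format(sample))
```

In a real job the first few bytes of each object would be read from storage before choosing the matching reader; the fallback to CSV is the weakest part of the heuristic and would need validation.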

    ...Responsibilities: - Develop ML/statistical models (DID, Synthetic Control, A/B Testing) in Python - Build and integrate FastAPI-based services - Design large-scale data pipelines using PySpark, Delta Lake, and Azure Data Lake - Optimize Spark jobs (memory, partitioning, performance tuning) - Work with Databricks for job orchestration and data workflows - Containerize and deploy applications using Docker & Kubernetes - Ensure code quality with testing and CI/CD pipelines - Collaborate with data science and product teams --- Must Have Skills: - Python (3.9+), Pandas, NumPy, Scikit-learn, SciPy - Strong PySpark & Spark Internals (OOM handling, tuning, optimization) - Databricks (clusters, workflows, Delta Lake) - Causal Inference: A/B Testing, DID, Hypothesis Test...

    €494 Average bid
    30 bids

    ...Unity Catalog enabled, and I need a seasoned modeller who can translate business requirements into robust Star and Snowflake schemas, then bring them to life with PySpark and advanced SQL. You will refine our Medallion architecture (Bronze → Silver → Gold), implement both Type 1 and Type 2 SCD strategies, and tune the pipelines for speed through smart partitioning and other optimisation techniques. The datasets involved are large, structured and semi-structured, so hands-on experience handling such volumes in Databricks is essential. Key deliverables • Logical and physical data models documented and version-controlled • PySpark notebooks / SQL scripts that create the Star and Snowflake tables in Delta Lake under Unity Catalog governance • P...

    €9 / hr Average bid
    13 bids

    ...should be an active contributor with the ability to handle complex data challenges independently within an existing architecture. Key Responsibilities: Design, develop, and maintain data pipelines using PySpark Work on data ingestion, transformation, and optimisation for large-scale datasets Handle real-world data challenges such as API inconsistencies, schema drift, and incremental load failures Ensure data quality, reliability, and performance across pipelines Collaborate with cross-functional teams to deliver data-driven solutions Required Skills: Strong hands-on experience in PySpark and data engineering Proven experience in handling production-level data issues and debugging Solid understanding of data modelling, ETL/ELT processes, and data pipelines Ability to wo...

    €11 / hr Average bid
    8 bids

    ...should be an active contributor with the ability to handle complex data challenges independently within an existing architecture. Key Responsibilities: Design, develop, and maintain data pipelines using PySpark Work on data ingestion, transformation, and optimisation for large-scale datasets Handle real-world data challenges such as API inconsistencies, schema drift, and incremental load failures Ensure data quality, reliability, and performance across pipelines Collaborate with cross-functional teams to deliver data-driven solutions Required Skills: Strong hands-on experience in PySpark and data engineering Proven experience in handling production-level data issues and debugging Solid understanding of data modelling, ETL/ELT processes, and data pipelines Ability to wo...

    €11 / hr Average bid
    6 bids

    I need a Microsoft Fabric notebook written in PySpark that can call a suitable commodities-exchange API and pull the most recent futures data on a couple select futures products. The solution should be fully runnable inside Fabric. Key points I have set: • Refresh cadence: once a month, so include a simple scheduling example (Fabric pipeline or a cron-style note is fine). This data will be pulled from a Chinese exchange, so I want a Chinese speaking freelancer only!! Deliverables 1. The .ipynb (or .notebook) file ready to import into Fabric 2. A quick test run showing one successful fetch and a tidy DataFrame with the fields timestamp, contract, price, and volume Acceptance will be based on the notebook executing end-to-end without manual edits (apart from entering an...

    €26 Average bid
    17 bids

    ...and transform it in real time, feed it to a set of AI services, then serve the insights back to users through an intuitive dashboard. Your day-to-day work will touch three key areas: • Data collection – build reliable connectors, handle auth flows, schedule recurring pulls, and maintain error logging. • Data processing – design ETL pipelines, implement transformation logic in Python (Pandas, PySpark or similar), and ensure everything is containerised for smooth deployment. • Data visualization – wire processed datasets into the React front-end, craft reusable chart components (D3, , or your preferred library), and optimise for performance. Acceptance criteria 1. End-to-end pipeline runs with a one-command deploy (Docker / docker-compose or...

    €86 Average bid
    36 bids

    Data cleaning using SQL/Python (to be decided) and export to Excel. The client has used this in a PySpark environment. We can discuss the details later.

    €8 Average bid
    1 bid

    We are looking for an experienced Palantir Foundry Developer to support data and AI use cases. Scope of Work: * Build and maintain Foundry data pipelines (Pipeline Builder, Transforms) * Work with Ontology (object types, link types, data modeling) * Develop Workshop applications for business users * Implement AIP Logic workflows and basic agent integrations * Write production-quality Python, SQL, and PySpark code Requirements: * Hands-on experience with Palantir Foundry (mandatory) * Strong skills in Python, SQL, and PySpark * Experience with Ontology, Pipelines, and Workshop * Basic understanding of AIP (preferred) Project Details: * Budget: ₹45,000+(Negotiable) * ...

    €469 Average bid
    21 bids

    Results-driven Senior Data Analyst with 8+ years delivering enterprise data solutions across Banking, Financial Services, and Healthcare. Specialized in end-to-end Data Warehouse design, ETL pipeline development, and BI reporting. Core Expertise: Snowflake, Azure Data Factory, Azure Synapse, Delta Lake, PySpark, SSIS, T-SQL, PL/SQL, Oracle, MySQL, Power BI, DAX, Tableau, SSRS, Star/Snowflake Schema, EDW Design, Data Mart Development, Data Lineage, Gap Analysis. Compliance & Governance: HIPAA, GDPR, SOX Audit Controls, Data Quality Frameworks, Data Governance Policies. Business Analysis: BRD/FRD Writing, JAD Facilitation, UML Diagrams, Stakeholder Management, Agile, Waterfall. Certifications: Microsoft Azure Fundamentals, Salesforce Administrator, Salesforce Platform Developer I....

    €129 Average bid
    23 bids

    Responsible for designing and implementing large-scale data migration and ingestion pipelines to move high-volume data from diverse sources into cloud platforms. Sources include HDFS, relational databases such as MySQL and PostgreSQL, and real-time streaming systems like Kafka. Develop and maintain robust data pipelines using PySpark, ensuring efficient processing of batch and streaming data. Implement automated scheduling mechanisms to orchestrate data workflows on daily and monthly intervals, ensuring reliability and timely data availability. Optimize data ingestion and storage through advanced performance tuning, partitioning, and compaction strategies to handle large-scale datasets efficiently. Ensure data quality, consistency, and fault tolerance across all pipelines. Deploy...

    €9 Average bid
    1 bid

    ...Experience Required: 5+ Years (Data Engineering), 3+ Years (Databricks) Note: Budget is fixed. Please do not apply if you are looking to negotiate. Key Responsibilities Develop and optimize data pipelines using Databricks, PySpark, and Spark SQL Design and implement Delta Lake architecture (Bronze / Silver / Gold layers) Work on Lakehouse architecture and manage Unity Catalog Apply DataOps practices for scalable and reliable data workflows Optimize Spark jobs for performance and cost efficiency Required Skills Strong hands-on experience with Databricks Proficiency in PySpark and Spark SQL Experience with Delta Lake and Lakehouse architecture Knowledge of data pipeline design and optimization Understanding of DataOps and data governance Nice to Have Experience with Azur...

    €537 - €626
    Sealed, NDA
    11 bids

    ...database containing nested JSON / key-value blobs. • Goal: parse, normalize, and flatten these blobs into well-defined columns while preserving relationships and lineage. • Scale: millions of rows, so solutions that leverage Spark, Hadoop, BigQuery, Snowflake, or well-tuned SQL/Python pipelines are welcome—as long as they remain maintainable. Deliverables 1. Transformation code (Python, PySpark, SQL, or Scala) with clear comments. 2. A runnable job definition or workflow file (Airflow DAG, Spark submit script, dbt model, etc.) that shows how to execute the pipeline end-to-end. 3. Simple README explaining prerequisites, run steps, and how new fields should be added in future. Acceptance criteria • Pipeline processes at least 10 GB of source data ...

    €127 Average bid
    6 bids
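
The flattening of nested JSON / key-value blobs that the listing above describes might look roughly like the following. This is a hedged, standard-library sketch rather than the requested deliverable: the helper `flatten`, the dotted-path column naming, and the sample blob are all assumptions made for illustration.

```python
import json
from typing import Any, Dict


def flatten(blob: Any, parent_key: str = "", sep: str = ".") -> Dict[str, Any]:
    """Flatten nested dicts and lists into one level, keyed by the path to each leaf."""
    items: Dict[str, Any] = {}
    if isinstance(blob, dict):
        for key, value in blob.items():
            path = f"{parent_key}{sep}{key}" if parent_key else str(key)
            items.update(flatten(value, path, sep))
    elif isinstance(blob, list):
        for index, value in enumerate(blob):
            path = f"{parent_key}{sep}{index}" if parent_key else str(index)
            items.update(flatten(value, path, sep))
    else:
        # Scalar leaf: the accumulated path records where the value came from.
        items[parent_key] = blob
    return items


# Hypothetical sample blob, not taken from the listing.
raw = '{"id": 7, "attrs": {"color": "red", "tags": ["a", "b"]}}'
row = flatten(json.loads(raw))
print(row)
```

Keeping the full path in each column name is one simple way to preserve lineage; note that empty objects and arrays produce no columns in this sketch, which a production pipeline would need to decide how to represent.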

    ...validation rules, automated tests, and observable metrics baked in from day one—Great Expectations, Delta Live Tables expectations, or comparable frameworks are welcome, as long as quality gates are visible in the monitoring layer. Scope to cover: • Architecture design diagram with clear component rationale (Azure Data Lake, Databricks, Delta, Unity Catalog, etc.). • Reproducible code (Python / PySpark, notebooks or repos) with CI/CD instructions. • Ingestion pipelines (batch or streaming), curated layers, and serving tier (SQL endpoints, Power BI, or dashboards of your choice). • Integrated monitoring, alerting, and cost-aware observability using native Azure tools or open-source add-ons. • End-to-end test suite: unit, integration, and data qualit...

    €9 / hr Average bid
    19 bids

    ...validation rules, automated tests, and observable metrics baked in from day one—Great Expectations, Delta Live Tables expectations, or comparable frameworks are welcome, as long as quality gates are visible in the monitoring layer. Scope to cover: • Architecture design diagram with clear component rationale (Azure Data Lake, Databricks, Delta, Unity Catalog, etc.). • Reproducible code (Python / PySpark, notebooks or repos) with CI/CD instructions. • Ingestion pipelines (batch or streaming), curated layers, and serving tier (SQL endpoints, Power BI, or dashboards of your choice). • Integrated monitoring, alerting, and cost-aware observability using native Azure tools or open-source add-ons. • End-to-end test suite: unit, integration, and data qualit...

    €466 Average bid
    59 bids

    ...transformation, and optimisation. • Hands-on experience working within Databricks, including notebooks, workflows, and job execution. • Proven experience using Power BI for report and dashboard development, including data modelling, DAX, Power Query, and visualisation design. • Experience building and maintaining data pipelines, ideally within Azure environments. • Experience using Python (e.g. PySpark) within Databricks environments is advantageous. • Understanding of data modelling concepts, including fact and dimension structures. • Familiarity with Azure Data Factory or similar orchestration tools, with Insight Factory advantageous. • Working knowledge of DevOps practices, including version control, repository management, and...

    €1177 Average bid
    30 bids

    Job Title: Data Engineer (Databricks and AWS) Duration: 2 Hours Budget: ₹22,000 – ₹26,000 (based on screening) Tech Stack: Databricks, Python, PySpark, AWS, SQL, Git Job Description: We are looking for an experienced Data Engineer to provide short-term support. The role involves working on data pipelines, transformations, and analytics using Databricks and AWS. Responsibilities: Develop and optimize data pipelines using Databricks, PySpark, and Python Work with AWS services and SQL-based data processing Manage code and versioning using Git Troubleshoot and optimize data workflows Requirements: Strong hands-on experience with Databricks, PySpark, and Python Good knowledge of AWS data services and SQL Experience with Git and collaborative development Ability to del...

    €252 Average bid
    17 bids

    ...engineering experiences with various aws services Experience building end-to-end data pipelines (schema discovery, ingestion, transformation, orchestration, monitoring) Experience working with relational databases like Oracle, MySQL, and SQL Server etc Experience with data ingestion from on-prem systems to cloud Experience with streaming platforms like Kafka or AWS Kinesis Strong skills in Python, PySpark, SQL, and Terraform...

    €971 Average bid
    158 bids

    ...the next round of hiring I want an accomplished Senior Data Engineer to sit in on our technical interviews for roughly two hours each day. The role is purely evaluative: you will craft probing questions, join live video calls, and quickly score each candidate’s depth of knowledge across Python, Scala and SQL. Our stack centres on Azure and Databricks, so practical insight into large-scale Spark/PySpark jobs, data-model design, ETL orchestration and cloud performance tuning is essential. Candidates frequently discuss streaming, optimisation strategies and modern AI/ML add-ons, so any hands-on exposure to libraries such as PyTorch, NumPy, SciPy or TensorFlow will help you challenge them at the right level, though it is not mandatory. Availability is limited to two focus...

    €225 Average bid
    16 bids

    ...narrative continuity before passing curated context into a citation aware LLM routing layer that prioritizes Gemini, OpenAI, then Anthropic, then Ollama local models, enforcing context bound generation and preventing hallucination outside retrieved evidence. Indexing is parallelized using ProcessPoolExecutor for efficient multi core utilization and automatically scales to distributed ingestion via PySpark when corpus size exceeds a configured threshold, enabling safe handling of 20k plus documents or 50GB class corpora, while the system is wrapped in a full MLOps backbone that integrates MLflow for experiment tracking of retrieval metrics, PPO reinforcement learning rewards, and parameter tuning, exposes Prometheus metrics for latency and retrieval monitoring compatible with Graf...

    €210 Average bid
    14 bids

    Description: We’re looking for an experienced Data Engineer preferably based from Dubai to help build and manage data pipelines for a global platform. Most work is in Azure, using Azure Data Factory, ADLS, and Databricks. What you’ll do: Build and manage PySpark/Spark pipelines in Databricks Schedule and monitor pipelines in Azure Data Factory Optimize Databricks for better performance Keep code and documentation organized and clear Requirements: Experience with Azure cloud and Databricks Strong PySpark / Spark skills Experience building scalable, reliable data pipelines Details: Project-based, with potential to move to full-time Ideal for engineers who like building cloud-native pipelines

    €10 / hr Average bid
    20 bids

    ... • Read multiple flat-file formats (mainly CSV, with the occasional JSON). • Apply thorough data-cleansing rules—removing duplicates, enforcing data types, flagging out-of-range values, and normalising text fields. • Run validation checks so that only clean, schema-compliant rows proceed to the load step. I’m happy for you to choose the stack you are most efficient with—Python (pandas, PySpark), Talend, or another ETL tool—as long as the final solution is reproducible and can be triggered automatically (CLI, scheduled job, or cloud function). If you think aggregation or more advanced joins would improve the dataset, flag that as a future enhancement; for now, cleansing and validation are the must-haves. Deliverables 1. Well-docum...

    €22 Average bid
    24 bids
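
The cleansing and validation rules listed in the posting above (duplicate removal, type enforcement, out-of-range flagging, text normalisation) can be illustrated with a small sketch. The column names `name` and `amount`, the function `cleanse`, and the range bounds are hypothetical; the listing does not specify a schema.

```python
from typing import Dict, Iterable, List, Tuple

Row = Dict[str, str]


def cleanse(rows: Iterable[Row], low: float, high: float) -> Tuple[List[dict], List[Row]]:
    """Split raw rows into (clean, rejected) using simple illustrative rules."""
    seen = set()
    clean: List[dict] = []
    rejected: List[Row] = []
    for row in rows:
        # Normalise text: collapse runs of whitespace and lower-case the value.
        name = " ".join((row.get("name") or "").split()).lower()
        try:
            amount = float(row.get("amount"))  # enforce a numeric type
        except (TypeError, ValueError):
            rejected.append(row)  # rows that fail the schema never reach the load step
            continue
        key = (name, amount)
        if key in seen:  # drop duplicates after normalisation
            continue
        seen.add(key)
        # Out-of-range values are flagged rather than silently dropped.
        clean.append({"name": name, "amount": amount,
                      "out_of_range": not (low <= amount <= high)})
    return clean, rejected


rows = [
    {"name": "  Alice  Smith ", "amount": "10.5"},
    {"name": "alice smith", "amount": "10.5"},
    {"name": "Bob", "amount": "abc"},
    {"name": "Carol", "amount": "999"},
]
clean, rejected = cleanse(rows, low=0, high=100)
print(clean)
print(rejected)
```

The same rules map naturally onto pandas or PySpark operations; plain Python is used here only to keep the sketch self-contained.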

    ...Azure Data Engineer to support and enhance our existing data platform on an ongoing basis. You should be strong in: Azure Data Factory (ADF) for building and maintaining ETL/ELT pipelines Azure Databricks and PySpark for large‑scale data processing Python for data engineering utilities, automation, and integration Delta Lakes/Lakehouse concepts, performance optimization, and troubleshooting Working with SQL‑based data sources, data warehousing, and BI integrations Responsibilities Design, build, and optimize data pipelines in Azure ADF and Databricks Develop and maintain PySpark and Python jobs for batch and near real‑time workloads Implement best practices for data quality, observability, and monitoring Collaborate with our internal team, follow existing standa...

    €9 / hr Average bid
    33 bids

    I am looking for an experienced data engineer with 4-5 years of hands-on PySpark and Python experience, including experience handling a complex data pipeline.

    €4 / hr Average bid
    6 bids

    ...Databricks Data Analyst and Data Engineer certifications and want a structured, hands-on tutoring program that also deepens my Snowflake skills. The goal is to become confident building end-to-end data pipelines, running analytics, and understanding platform architecture well enough to pass the exams and perform the work in practice. Focus areas Databricks • Data processing & analytics with PySpark/SQL and Delta Lake • Machine learning workflows inside the Databricks environment • Workspace, cluster, job, and Lakehouse architecture Snowflake • Core data-warehousing concepts and best practices • Query tuning and overall performance optimisation • Security features: RBAC, masking, encryption, and access policies How we can wor...

    €160 Average bid
    48 bids

    ...Object Storage, Data Flow (Spark), and Data Catalog. * Solid understanding of Finance / Order-to-Cash (O2C) data entities and processes. * Knowledge of data modeling, lineage, and governance principles. * Familiarity with CI/CD and DevOps for automated deployments. Preferred Skills * OCI Data Integration certification. * Experience integrating Oracle Cloud ERP with OCI DI. * Knowledge of Python or PySpark for custom transformations. * Exposure to Data Science and ML pipelines leveraging OCI services. * Experience with monitoring tools like Grafana...

    €1777 Average bid
    5 bids

    ...guidance with embedding Genie via API into apps, Teams, or dashboards. • Train internal teams on Genie capabilities, administration, and operational readiness. Required Skills & Experience • Strong practical experience with Azure Databricks, Lakehouse architecture, Unity Catalog, SQL Warehouse. • Knowledge of Genie AI, foundational models, or Databricks conversational analytics. • Competency in PySpark, SQL, data modeling, and enterprise data engineering practices. • Familiarity with Azure ecosystem (Data Lake, Data Factory, DevOps). • Ability to translate business questions into NLQ-friendly dataset design. • Excellent communication and ability to work with cross functional data, BI, and business teams. Nice to Have • Experience with A...

    €2 / hr Average bid
    3 bids

    I have a Hadoop cluster holding several large data sets, and I need a seasoned PySpark developer who also writes rock-solid SQL. The immediate aim is to connect to the cluster (YARN/HDFS with Hive metastore), develop or refine PySpark jobs, optimise the accompanying SQL, and make sure everything runs smoothly end-to-end. You’ll receive access to a staging namespace plus a sample of the data. Once the logic checks out we’ll promote the code to the full environment. Deliverables • A clean, well-commented PySpark notebook or .py job that executes successfully on the cluster • The corresponding SQL script or view definitions ready for Hive or spark-sql • A concise README detailing execution steps, parameters, and expected outputs Accep...

    €64 Average bid
    11 bids

    I need a reusable ETL framework built inside Databricks notebooks, version-controlled in Bitbucket and promoted automatically through a Bitbucket Pipeli...attached to any cluster. Acceptance criteria • Parameter-driven notebooks organised by layer. • Reusable GraphQL connector packaged as a .whl. • Bitbucket Pipelines yaml that runs unit tests, uses the Databricks CLI to deploy notebooks, and executes an integration test on commit. • Clear README detailing how to add a new API endpoint and where to place cleaning logic. Leverage native tools—PySpark, SQL, Delta Lake, dbutils—while keeping external libraries to a minimum and fully documented. Please share a brief outline of your approach and any relevant Databricks + Bitbucket CI experience s...

    €293 Average bid
    114 bids

    I’m a beginner looking for a 1-on-1 Databricks instructor for a very hands-on, fast-paced 2-week program. Requirements: - Strong real-world Databricks experience - Hands-on Apache Spark (PySpark), SQL, Delta Lake - Real use case / mini project (end-to-end pipeline) - Live screen sharing, coding together - Beginner-friendly but practical (no theory-only) Goal: By the end of 2 weeks, I want to confidently build and understand a real Databricks data pipeline. Availability: 5–6 sessions per week, 1–1.5 hours per session Please share: - Your Databricks experience - How you would structure these 2 weeks - Your hourly rate Thanks!

    €17 / hr Average bid
    59 bids

    ...across multiple source systems. Build and optimize Foundry pipelines using Code Workbooks (PySpark, SQL, Scala) and Quiver. Support data integration, feature engineering, and pipeline debugging for production AI workloads. Implement security and permissions architecture aligned with enterprise governance. Help develop Foundry applications using Workshop, Contour, and Slate for analytics and decision-making. Guide on best practices for CI/CD, testing, and deployment within Foundry. Provide mentorship and troubleshooting support during live client engagements. Required Skills: Strong hands-on experience with Palantir Foundry (Ontology, Code Workbooks, Quiver, Workshop). Proficiency in Python, PySpark, and SQL. Experience with data modeling, transformation logic, and pipelin...

    €12 / hr Average bid
    20 bids

    Need someone with strong streaming experience to design, develop and deploy a PySpark publishing and upserting job on EMR with Spark. Environment experience needed: MongoDB (DocumentDB) connector, AWS EMR step functions, CloudWatch, Docker, Kafka cluster architecture, Airflow DAGs, GitLab, PyCharm, Cursor AI IDE, etc.

    €9 / hr Average bid
    39 bids

    ...patterns that Databricks loves to test. • Fresh practice questions (or a curated question bank) with detailed explanations so I understand not just the right answer but the thinking process. • At least one full-length mock exam under timed conditions followed by a debrief on weak areas and strategies to avoid common pitfalls. I work mainly in the Databricks notebook environment with Python, PySpark, and SQL, so please weave real-world examples into the prep. I’m flexible on session times and frequency; we can agree milestones and refine the plan as we go. If you’ve already helped others pass this exam—or you hold the certification yourself—tell me how you’d tackle my study roadmap and what materials you’d bring to the table. I...

    €40 Average bid
    2 bids

    ...actual medicines and would map once the inconsistencies are ironed out, so I want the process to be fully automated, driven by a robust auto-correct algorithm rather than manual review. Remaining 0.1% could be non medical entries, and need to be deleted. I am open to proven techniques—fuzzy matching, phonetic hashing, Levenshtein, word embeddings, or a hybrid—as long as they scale. Python, pandas, PySpark, or any other big-data friendly stack is fine, provided the final solution is reproducible and well documented. Deliverables • Clean, executable scripts (Jupyter notebook or .py) that ingest both files, normalise product names, detect duplicates, and output a one-to-one mapping table. • A brief README explaining dependencies, algorithm logic, and how ...

    €521 Average bid
    39 bids
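
Of the techniques named in the listing above, Levenshtein-based fuzzy matching is the easiest to sketch in a few lines. The following is an illustrative, standard-library example only: the catalogue entries, the helper `best_match`, and the `max_distance` threshold of 2 are assumptions, and a real solution would likely combine several techniques and need a faster implementation at scale.

```python
from typing import List, Optional


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def best_match(name: str, catalogue: List[str], max_distance: int = 2) -> Optional[str]:
    """Return the closest catalogue entry, or None when nothing is close enough."""
    if not catalogue:
        return None
    normalised = name.strip().lower()
    distance, candidate = min((levenshtein(normalised, c.lower()), c) for c in catalogue)
    return candidate if distance <= max_distance else None


# Hypothetical catalogue; entries with no close match are candidates for deletion.
catalogue = ["Paracetamol", "Ibuprofen"]
print(best_match("Paracetmol ", catalogue))
print(best_match("Notebook", catalogue))
```

Returning `None` for distant strings gives a simple hook for the non-medical entries that the poster wants removed; the threshold would have to be tuned against real data.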
    long-term partnership
    Ended

    ...Infrastructure Microsoft Azure (Functions, Logic Apps, Service Bus, Blob Storage, Data Factory, Azure DevOps) AWS Cloud Docker, Kubernetes RabbitMQ CRM, ERP & Enterprise Platforms Microsoft Dynamics CRM 365 Dynamics Business Central Sage CRM NopCommerce Sitefinity v12.2 Umbraco v8.0 DotNetNuke v4.0 Python, AI & Advanced Solutions Python, Django, Flask, Pyramid REST APIs, WebSockets PySpark AI Email & Chatbot Solutions Data Science & Analytics CMS, E-Commerce & Web Platforms WordPress, Joomla, Drupal Prestashop PHP-based systems BI, Finance & Business Support Power BI Advanced Excel Accounting, Finance & Bookkeeping Data Entry & Business Reporting MS Office Suite Tools & Delivery Methodology Git (Version Control) N...

    €4 / hr Average bid
    20 bids

    ...short interpretive notes that fold easily into manuscripts. What matters most is hands-on mastery of data extraction, table linking, and general database management within MIMIC. Solid grounding in observational study design, epidemiology, and EHR quirks is essential; a background in medicine or public health will make communication smoother. Working code in SQL plus either tidyverse/R or pandas/pySpark is expected. The immediate deliverable is a fully cleaned analytic dataset with the accompanying scripts and an outline of the statistical approach. After that, I plan to keep the collaboration open for additional projects and sensitivity analyses as new questions arise....

    €22 / hr Average bid
    65 bids

    I’m looking for a Data Engineer with strong AWS native services experience to help build and support an event-driven data platform. This project focuses on automated batch data pipelines, data governance, and making data available in a secure and scalable way. This is not ad-hoc ETL; it’s a platform-style setup. Tech stack involved: • AWS: S3, SQS, Lambda, MWAA (Airflow), EMR Serverless • Data Processing: PySpark, Apache Spark • Data Lake: Apache Iceberg, AWS Glue Catalog • Governance & Security: Lake Formatio...

    €15 / hr Average bid
    40 bids

    ...Remote Working Time: evening Budget: 22-24k monthly Duration: 2 hours per day Demo Required: Today Job Description We are seeking an experienced Senior Data Engineer with strong expertise in the Healthcare Payer domain to design, build, and maintain scalable data pipelines and reporting solutions. The ideal candidate will have hands-on experience across AWS and Microsoft Azure, strong Python/PySpark skills, and the ability to support integrated reporting and analytics using Power BI. Key Responsibilities Design, develop, and maintain end-to-end data pipelines for healthcare payer data Build and optimize ETL/ELT workflows using AWS Glue, Step Functions, and Python Work with Azure and AWS cloud services for data ingestion, processing, and storage Implement and manage Data ...

    €250 Average bid
    9 bids

    ...Remote Working Time: evening Budget: 22-24k monthly Duration: 2 hours per day Demo Required: Today Job Description We are seeking an experienced Senior Data Engineer with strong expertise in the Healthcare Payer domain to design, build, and maintain scalable data pipelines and reporting solutions. The ideal candidate will have hands-on experience across AWS and Microsoft Azure, strong Python/PySpark skills, and the ability to support integrated reporting and analytics using Power BI. Key Responsibilities Design, develop, and maintain end-to-end data pipelines for healthcare payer data Build and optimize ETL/ELT workflows using AWS Glue, Step Functions, and Python Work with Azure and AWS cloud services for data ingestion, processing, and storage Implement and manage Data ...

    €212 Average bid
    4 bids

    My current résumé sells me as a data engineer, yet my next move is a Data Analyst role. I need the Work Experience and Skills sections re-worked so recruiters immediately see me as a strong analytical hire. Here’s what you’ll be working with • Hands-on background in Hadoop administration, PySpark development, Databricks workflows and day-to-day data analysis. • A solid foundation in SQL and reporting tools, though these strengths are not highlighted well in the document. What I’m after • Rewrite both sections to spotlight analytical impact, business-friendly storytelling and in-demand keywords (think SQL, dashboards, data visualization, statistical insight, KPI tracking, etc.). • Re-order bullet points around results, not...

    €18 Average bid
    6 bids

    The core of my remote-sensing crop-yield project is in place, but the code will not run from start to finish. I need a fresh set of eyes to hunt down and eliminate the blockers so that the pipeline executes smoothly on Databricks and locally. Current state • Repository already contains: – Spark-based preprocessing notebooks (PySpark) – Trained ML model scripts and saved artefacts – A handful of Databricks experiment notebooks for exploration What I need most Debugging is the priority. I am not after a full rewrite—I want the existing pieces to work together. You are free to suggest refactors where they remove obvious bottlenecks, but the first milestone is simply getting the code to run cleanly. Focus areas • Spark preprocessi...

    €8 Average bid
    9 bids

    We are seeking a freelancer proxy for a Data Engineer role to support a remote healthcare data platform. The work will be 5 to 6 hours per day. You will be required to sit alongside the engineer during work hours, explain work...operational runbooks for knowledge sharing • Support and guide production-grade pipelines built on Dagster, DBT, Airflow, AWS Glue, and SSIS Required Skills & Tech Stack: • Python (Strong) • SQL (Advanced) • Dagster, DBT, Airflow, AWS Glue • AWS: Athena, Glue, SQS, SNS, IAM, CloudWatch • Databases: PostgreSQL, AWS RDS, Oracle, Microsoft SQL Server • Data Modeling & Query Optimization • Pandas, PySpark, PyCharm • Terraform, Docker, DataGrip, VS Code • Git/GitHub and CI/CD pipelines • Experience wi...

    €563 Average bid
    58 bids

    We are seeking a freelancer proxy for a Data Engineer role to support a remote healthcare data platform. The work will be 5 to 6 hours per day. You will be required to sit alongside the engineer during work hours, explain work...operational runbooks for knowledge sharing • Support and guide production-grade pipelines built on Dagster, DBT, Airflow, AWS Glue, and SSIS Required Skills & Tech Stack: • Python (Strong) • SQL (Advanced) • Dagster, DBT, Airflow, AWS Glue • AWS: Athena, Glue, SQS, SNS, IAM, CloudWatch • Databases: PostgreSQL, AWS RDS, Oracle, Microsoft SQL Server • Data Modeling & Query Optimization • Pandas, PySpark, PyCharm • Terraform, Docker, DataGrip, VS Code • Git/GitHub and CI/CD pipelines • Experience wi...

    €577 Average bid
    28 bids