Hi,
I am looking for help creating a pipeline to read a large dataset (2 TB), apply a transformation (one groupBy and one UDF), and write the resulting small files to S3.
Scope: creation of a PySpark ETL pipeline, the transformation, and the writing of the resulting small files.
Greetings,
I am an AWS, Terraform, and Spark expert. I hope I can be useful for your conversion and write-up project.
Regards
KAZI NASHIDUL HAQUE