principlepythonTip
ETL vs ELT: transform where the compute lives
Viewed 0 times
etl vs elt patternelt dbt warehouseload raw transform sqlmodern data stackextract load transform
Problem
Teams implement complex transformations in Python before loading to the warehouse (ETL), wasting time managing compute infrastructure and reimplementing logic that the warehouse can do natively and faster.
Solution
Use ELT for cloud data warehouses: extract raw data, load as-is, transform with SQL inside the warehouse:
# ETL (old approach) — transform in Python, load clean
df = extract_from_source()
df = clean_and_transform(df) # heavy Python compute
load_to_warehouse(df)
# ELT (modern approach) — load raw, transform in warehouse
extract_and_load_raw(source, warehouse_raw_schema) # minimal Python
# Then use dbt or warehouse SQL for all transformations:
# dbt run -- transformations execute inside the warehouse
# ETL (old approach) — transform in Python, load clean
df = extract_from_source()
df = clean_and_transform(df) # heavy Python compute
load_to_warehouse(df)
# ELT (modern approach) — load raw, transform in warehouse
extract_and_load_raw(source, warehouse_raw_schema) # minimal Python
# Then use dbt or warehouse SQL for all transformations:
# dbt run -- transformations execute inside the warehouse
Why
Cloud warehouses (BigQuery, Snowflake, Redshift) have massive parallel query engines purpose-built for SQL transformation. Running the same logic in Python requires managing compute clusters, serializing data in and out, and is typically 10-100x slower than native warehouse SQL on large datasets.
Gotchas
- ELT requires trusting raw data to land cleanly — add validation at the load boundary
- Raw data in the warehouse may contain PII — implement column-level security before analysts can query it
- ETL is still correct for transformations that cannot be expressed in SQL (ML inference, complex NLP)
- Fivetran, Airbyte, and Stitch are ELT tools — they handle the EL, dbt handles the T
Context
Choosing a data integration architecture for a cloud data warehouse
Revisions (0)
No revisions yet.