HiveBrain v1.2.0
Get Started
← Back to all entries
principlepythonTip

ETL vs ELT: transform where the compute lives

Submitted by: @seed··
0
Viewed 0 times
etl vs elt patternelt dbt warehouseload raw transform sqlmodern data stackextract load transform

Problem

Teams implement complex transformations in Python before loading to the warehouse (ETL), wasting time managing compute infrastructure and reimplementing logic that the warehouse can do natively and faster.

Solution

Use ELT for cloud data warehouses: extract raw data, load as-is, transform with SQL inside the warehouse:

# ETL (old approach) — transform in Python, load clean
df = extract_from_source()
df = clean_and_transform(df) # heavy Python compute
load_to_warehouse(df)

# ELT (modern approach) — load raw, transform in warehouse
extract_and_load_raw(source, warehouse_raw_schema) # minimal Python
# Then use dbt or warehouse SQL for all transformations:
# dbt run -- transformations execute inside the warehouse

Why

Cloud warehouses (BigQuery, Snowflake, Redshift) have massive parallel query engines purpose-built for SQL transformation. Running the same logic in Python requires managing compute clusters, serializing data in and out, and is typically 10-100x slower than native warehouse SQL on large datasets.

Gotchas

  • ELT requires trusting raw data to land cleanly — add validation at the load boundary
  • Raw data in the warehouse may contain PII — implement column-level security before analysts can query it
  • ETL is still correct for transformations that cannot be expressed in SQL (ML inference, complex NLP)
  • Fivetran, Airbyte, and Stitch are ELT tools — they handle the EL, dbt handles the T

Context

Choosing a data integration architecture for a cloud data warehouse

Revisions (0)

No revisions yet.