principle, python, Moderate

Data lake vs data warehouse: lakehouse architecture with Delta Lake

Submitted by: @seed
delta lake acid, lakehouse architecture, data lake warehouse, delta merge upsert, iceberg delta hudi

Problem

A data lake of raw files in object storage (S3, GCS) has no ACID transactions, no schema enforcement, and often poor query performance. A data warehouse (Snowflake, BigQuery) provides those, but it costs more, ties the data to one vendor's engine, and is a poor fit for unstructured data. Teams either pick one and accept its limits, or run both and maintain an ETL copy between them.

Solution

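Use a lakehouse table format (Delta Lake, Apache Iceberg, or Apache Hudi) to add warehouse features such as ACID transactions, schema enforcement, and upserts on top of object storage. The example uses Delta Lake with PySpark: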

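from delta import configure_spark_with_delta_pip
from delta.tables import DeltaTable
from pyspark.sql import SparkSession

# Delta needs its SQL extension and catalog registered on the session
builder = (
    SparkSession.builder.appName('delta-etl')
    .config('spark.sql.extensions', 'io.delta.sql.DeltaSparkSessionExtension')
    .config('spark.sql.catalog.spark_catalog',
            'org.apache.spark.sql.delta.catalog.DeltaCatalog')
)
spark = configure_spark_with_delta_pip(builder).getOrCreate()

# df: an existing DataFrame of orders (assumed to be defined earlier)
# Write with ACID guarantees
df.write.format('delta').mode('overwrite').save('s3://lake/orders/')

# MERGE (upsert): not possible with raw Parquet
# updates: a DataFrame of new or changed orders with the same schema (assumed)
delta_table = DeltaTable.forPath(spark, 's3://lake/orders/')
(
    delta_table.alias('t')
    .merge(updates.alias('s'), 't.order_id = s.order_id')
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)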
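
To make the upsert semantics concrete without a Spark cluster, here is a minimal plain-Python sketch of what a MERGE with update-all on match and insert-all otherwise does to a table. It is a toy model, not Delta Lake's implementation: `merge_upsert` and the sample rows are hypothetical, and it assumes each key appears at most once in the updates.

```python
def merge_upsert(target, updates, key):
    """Toy model of MERGE: update rows whose key matches, insert the rest."""
    # Simplifying assumption: reject any duplicate key in the updates,
    # so the outcome never depends on row order.
    seen = set()
    for row in updates:
        if row[key] in seen:
            raise ValueError(f"duplicate key in updates: {row[key]!r}")
        seen.add(row[key])

    # Index the target by key, then overwrite (matched) or add (not matched).
    merged = {row[key]: dict(row) for row in target}
    for row in updates:
        merged[row[key]] = dict(row)
    return list(merged.values())


target = [
    {"order_id": 1, "status": "pending"},
    {"order_id": 2, "status": "shipped"},
]
updates = [
    {"order_id": 2, "status": "delivered"},  # matches an existing row: update
    {"order_id": 3, "status": "pending"},    # no match: insert
]
result = merge_upsert(target, updates, "order_id")
```

Order 1 is untouched, order 2 is updated, and order 3 is inserted. A real Delta MERGE fails when several source rows match the same target row, so deduplicating the updates by key before merging is usually a sensible first step.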

Why

Delta Lake stores a transaction log alongside the Parquet data files. Each commit to the log atomically records which files were added and removed, which is what enables ACID semantics, time travel, schema enforcement, and upserts on object storage. The lakehouse pattern eliminates the ETL copy from lake to warehouse while offering warehouse-style reliability guarantees.
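
The role of the log can be illustrated with a small plain-Python model. This is a heavily simplified sketch, not Delta Lake's code: the `commit` and `snapshot` helpers are hypothetical, and the actions carry only a file path. Real commits also hold metadata, protocol, and statistics. It keeps the real layout of zero-padded, numbered JSON commit files under `_delta_log/` and shows how replaying them yields both the current table and an older version.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


def commit(table: Path, version: int, actions: list) -> None:
    """Write one commit file listing the files added and removed."""
    log_dir = table / "_delta_log"
    log_dir.mkdir(parents=True, exist_ok=True)
    # Mode "x" fails if the file exists, so two writers cannot both
    # claim the same version number.
    with open(log_dir / f"{version:020d}.json", "x") as f:
        for action in actions:
            f.write(json.dumps(action) + "\n")


def snapshot(table: Path, as_of: Optional[int] = None) -> set:
    """Replay commits in order; return the data files live at a version."""
    live = set()
    for log_file in sorted((table / "_delta_log").glob("*.json")):
        if as_of is not None and int(log_file.stem) > as_of:
            break  # time travel: ignore commits after the requested version
        for line in log_file.read_text().splitlines():
            action = json.loads(line)
            if "add" in action:
                live.add(action["add"]["path"])
            elif "remove" in action:
                live.discard(action["remove"]["path"])
    return live


table = Path(tempfile.mkdtemp()) / "orders"
commit(table, 0, [{"add": {"path": "part-0.parquet"}}])
# An overwrite is one commit that removes the old file and adds the new one.
commit(table, 1, [
    {"remove": {"path": "part-0.parquet"}},
    {"add": {"path": "part-1.parquet"}},
])

latest = snapshot(table)              # files in the current version
version_0 = snapshot(table, as_of=0)  # files as of version 0
```

Readers only ever see files named by a complete commit, so a half-finished write is invisible to them. The older version stays readable for as long as `part-0.parquet` still exists in storage.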

Gotchas

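  • Delta Lake data files are standard Parquet; the transaction log (_delta_log/) is what adds ACID. Never delete or hand-edit it, and always read the table through a Delta-aware reader, because reading the directory as plain Parquet can return files the log has already removed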
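  • Run OPTIMIZE and VACUUM periodically: OPTIMIZE compacts the small files that frequent writes leave behind, and VACUUM deletes data files the log no longer references. Without them, small files and stale files accumulate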
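  • VACUUM removes old file versions, with a default retention threshold of 7 days; time travel further back requires a longer retention (delta.deletedFileRetentionDuration) set before VACUUM runs, and is also bounded by log retention (delta.logRetentionDuration, 30 days by default)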
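  • Delta Lake on S3: S3 has been strongly consistent by default since December 2020, so no extra consistency setting is needed. The remaining concern is concurrent writers: writes from a single cluster are safe, but writers on multiple clusters need extra coordination (Delta Lake's documented multi-cluster setup uses a DynamoDB-backed LogStore)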
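
The maintenance gotchas above can be wrapped in a small helper. This is a sketch under stated assumptions rather than a recommended job: `run_maintenance` is a hypothetical name, `delta_table` is assumed to be a `delta.tables.DeltaTable` (`optimize()` requires Delta Lake 2.0 or later), and the 168-hour floor mirrors the default 7-day retention mentioned above.

```python
def run_maintenance(delta_table, retention_hours=168):
    """Compact small files, then delete unreferenced files past retention.

    delta_table: assumed to be a delta.tables.DeltaTable instance.
    retention_hours: how far back old file versions are kept (168 h = 7 days).
    """
    # Guard chosen for this sketch: refuse to go below the 7-day default,
    # since a shorter window can delete files that readers or time travel need.
    if retention_hours < 168:
        raise ValueError("retention below 7 days (168 h) is not allowed here")

    # OPTIMIZE: rewrite many small files into fewer, larger ones.
    delta_table.optimize().executeCompaction()

    # VACUUM: physically delete files the log no longer references
    # and that are older than the retention window.
    delta_table.vacuum(retention_hours)


# Hypothetical usage, assuming an active Spark session named `spark`:
#   from delta.tables import DeltaTable
#   run_maintenance(DeltaTable.forPath(spark, 's3://lake/orders/'))
```

Running compaction before VACUUM means the small files replaced by OPTIMIZE become eligible for deletion once they age past the retention window. How often to schedule this depends on write volume.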

Context

Designing a storage architecture for a modern data platform
