principle, python, Moderate
Data contracts: schema and SLA agreements between producers and consumers
data contract yaml, producer consumer agreement, schema sla enforcement, soda data contract, breaking change data
Problem
Upstream teams change table schemas or drop columns without warning, breaking downstream pipelines and dashboards. There is no formal agreement about what data producers must deliver.
Solution
Define data contracts using YAML and enforce them programmatically:
# contract.yaml: agreed between producer and consumer
apiVersion: v1
kind: DataContract
id: orders-v1
info:
  title: Orders Dataset
  owner: data-platform-team
  version: 1.2.0
models:
  - name: orders
    fields:
      - name: order_id
        type: integer
        required: true
        unique: true
      - name: amount
        type: number
        minimum: 0
      - name: status
        type: string
        enum: ['pending', 'completed', 'cancelled']
    quality:
      - type: row_count
        mustBeBetween: [1000, null]
      - type: freshness
        mustBeLessThan: 25h
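# Validate at pipeline start with soda-core.
# Note: `soda scan` reads SodaCL check files, so the rules above must be
# expressed in (or translated to) that format before this command will work.
# soda scan -d my_warehouse -c soda.yaml contract.yaml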
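To make the field rules concrete, here is a minimal sketch of how they could be checked in plain Python. It does not reflect how soda-core or any other tool implements validation. `ORDERS_FIELDS`, `TYPE_MAP`, and `validate_rows` are hypothetical names, and the rules are copied into a Python list so the example runs without a YAML parser.

```python
from typing import Any

# Hypothetical in-memory copy of the `orders` field rules from contract.yaml.
ORDERS_FIELDS = [
    {"name": "order_id", "type": "integer", "required": True, "unique": True},
    {"name": "amount", "type": "number", "minimum": 0},
    {"name": "status", "type": "string", "enum": ["pending", "completed", "cancelled"]},
]

# Assumed mapping from contract type names to Python types.
TYPE_MAP = {"integer": int, "number": (int, float), "string": str}


def validate_rows(rows: list[dict[str, Any]], fields: list[dict[str, Any]]) -> list[str]:
    """Return human-readable violations; an empty list means the rows conform."""
    violations: list[str] = []
    for field in fields:
        name = field["name"]
        seen: set[Any] = set()  # values already observed, for the uniqueness rule
        for i, row in enumerate(rows):
            value = row.get(name)
            if value is None:
                if field.get("required"):
                    violations.append(f"row {i}: {name} is required")
                continue
            # bool is a subclass of int in Python, so reject it explicitly.
            if isinstance(value, bool) or not isinstance(value, TYPE_MAP[field["type"]]):
                violations.append(f"row {i}: {name} is not of type {field['type']}")
                continue
            if "minimum" in field and value < field["minimum"]:
                violations.append(f"row {i}: {name}={value} is below minimum {field['minimum']}")
            if "enum" in field and value not in field["enum"]:
                violations.append(f"row {i}: {name}={value!r} is not in {field['enum']}")
            if field.get("unique"):
                if value in seen:
                    violations.append(f"row {i}: {name}={value} is duplicated")
                seen.add(value)
    return violations
```

A pipeline could call `validate_rows(batch, ORDERS_FIELDS)` on a sample of incoming rows and refuse to continue if the returned list is not empty. For warehouse-scale tables the same rules would normally be pushed down as SQL by a dedicated tool rather than evaluated row by row.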
Why
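Data contracts make implicit agreements explicit and machine-readable. They shift data quality enforcement to the producer side, so bad data is blocked at the source instead of being detected downstream after it has already reached pipelines and dashboards. Versioning the contract turns a breaking change into an explicit, announced event (a major version bump) rather than a silent schema edit.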
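One way to picture producer-side enforcement is a gate that evaluates the contract's `quality` rules before a batch is published. The sketch below is illustrative only: `QUALITY_RULES`, `parse_duration`, `check_quality`, and `enforce` are hypothetical names, the duration format (`25h`, `30m`, `2d`) is an assumption, and the row count and load timestamp are passed in rather than queried from a warehouse.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical copy of the `quality` rules from contract.yaml.
QUALITY_RULES = [
    {"type": "row_count", "mustBeBetween": [1000, None]},
    {"type": "freshness", "mustBeLessThan": "25h"},
]

# Assumed duration suffixes; the contract format does not define them here.
_UNITS = {"m": "minutes", "h": "hours", "d": "days"}


def parse_duration(text: str) -> timedelta:
    """Parse strings such as '25h' or '30m' into a timedelta."""
    return timedelta(**{_UNITS[text[-1]]: int(text[:-1])})


def check_quality(row_count: int, last_loaded_at: datetime, now: datetime,
                  rules: list[dict] = QUALITY_RULES) -> list[str]:
    """Return a list of failed quality rules; empty means the batch may be published."""
    failures: list[str] = []
    for rule in rules:
        if rule["type"] == "row_count":
            low, high = rule["mustBeBetween"]  # None means "no bound"
            if low is not None and row_count < low:
                failures.append(f"row_count {row_count} is below {low}")
            if high is not None and row_count > high:
                failures.append(f"row_count {row_count} is above {high}")
        elif rule["type"] == "freshness":
            limit = parse_duration(rule["mustBeLessThan"])
            age = now - last_loaded_at
            if age >= limit:
                failures.append(f"data is {age} old, limit is {limit}")
    return failures


def enforce(row_count: int, last_loaded_at: datetime, now: datetime) -> None:
    """Raise before publishing if any quality rule fails."""
    failures = check_quality(row_count, last_loaded_at, now)
    if failures:
        raise ValueError("contract violated: " + "; ".join(failures))
```

In a producer job, `enforce(row_count, last_loaded_at, datetime.now(timezone.utc))` would run as the last step before the table is exposed to consumers, so a failed rule stops the publish instead of surfacing later in a dashboard.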
Gotchas
- Data contracts require organizational buy-in — technical tooling alone does not make contracts stick
- Start with the most critical datasets (upstream of dashboards/ML) rather than trying to contract everything at once
- Breaking changes to a contract should trigger a major version bump and consumer notification, not a silent update
- soda-core, Schemata, and OpenDataContract are competing tools and specifications with different contract formats; pick one and standardize internally
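The versioning rule in the list above can be checked automatically, for example in CI when a contract file changes. The following is a rough sketch under stated assumptions: it treats a removed field, a changed type, and a relaxed `required` flag as breaking, which is one possible policy rather than a definition taken from any standard. `find_breaking_changes`, `is_major_bump`, and `check_release` are hypothetical helpers.

```python
def find_breaking_changes(old_fields: list[dict], new_fields: list[dict]) -> list[str]:
    """List changes that could break consumers relying on the old fields."""
    new_by_name = {f["name"]: f for f in new_fields}
    problems: list[str] = []
    for old in old_fields:
        name = old["name"]
        new = new_by_name.get(name)
        if new is None:
            problems.append(f"field removed: {name}")
            continue
        if new["type"] != old["type"]:
            problems.append(f"type changed: {name} {old['type']} -> {new['type']}")
        if old.get("required") and not new.get("required"):
            problems.append(f"required relaxed: {name}")
    return problems


def is_major_bump(old_version: str, new_version: str) -> bool:
    """True if the major component of a MAJOR.MINOR.PATCH version increased."""
    return int(new_version.split(".")[0]) > int(old_version.split(".")[0])


def check_release(old: dict, new: dict) -> list[str]:
    """Return the breaking changes that were NOT accompanied by a major bump."""
    problems = find_breaking_changes(old["fields"], new["fields"])
    if problems and not is_major_bump(old["version"], new["version"]):
        return problems
    return []
```

A non-empty result from `check_release` is a signal to either bump the major version and notify consumers or to rework the change so it is additive. Purely additive changes, such as a new optional field, pass without a major bump under this policy.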
Code Snippets
Python data contract validation using the datacontract-cli library
# Validate the data behind a contract at pipeline entry.
# Assumes contract.yaml is in a format datacontract-cli can read and
# declares the server (connection) that the checks should run against.
from datacontract.data_contract import DataContract

contract = DataContract(data_contract_file='contract.yaml')
run = contract.test()
if run.result != 'passed':
    # Report every failed check before aborting the pipeline.
    for check in run.checks:
        if check.result != 'passed':
            print(f'FAILED: {check.name}: {check.reason}')
    raise RuntimeError('Data contract validation failed')
Context
Establishing data quality agreements between teams in a data mesh or platform architecture