patternCritical

Is there a name for this database schema of key values?

Submitted by: @import:stackexchange-dba·Mar 10, 2026·

Viewed 0 times

thisdatabasenameforvaluesthereschemakey

Problem

We process a routine data feed from a client who just refactored their database from a form that seems familiar (one row per entity, one column per attribute) to one that seems unfamiliar to me (one row per entity per attribute):

Before: one column per attribute

ID   Ht_cm   wt_kg   Age_yr  ... 
1      190      82     43    ...
2      170      60     22    ...
3      205      90     51    ...

After: one column for all attributes

ID    Metric   Value
 1     Ht_cm     190
 1     Wt_kg     82
 1     Age_yr    43
 1      ...
 2     Ht_cm     170
 2     Wt_kg     60
 2     Age_yr    22
 2     ...
 3     Ht_cm     205
 3     Wt_kg     90
 3     Age_yr    51
 3     ...

Is there a name for this database structure? What are the relative advantages? The old way seems easier to place validity constraints on specific attributes (non-null, non-negative, etc.) and easier to calculate averages. But I can see how it might be easier to add new attributes without refactoring the database. Is this a standard/preferred way of structuring data?

Solution

It's called Entity-Attribute-Value (also sometimes 'name-value pairs') and it's a classic case of "a round peg in a square hole" when people use the EAV pattern in a relational database.

Here's a list of why you shouldn't use EAV:

You can't use data types. It doesn't matter if the value is a date, a number or money (decimal). It's always going to be coerced to varchar. This can be anything from a minor performance problem to a massive gut-ache (ever had to chase down a one-cent variation in a monthly roll-up report?).

You can't (easily) enforce constraints. It requires a ridiculous amount of code to enforce "Everyone needs to have a height between 0 and 3 metres" or "Age must be not null and >= 0", as opposed to the 1-2 lines that each of those constraints would be in a properly-modelled system.

Related to above, you can't easily guarantee that you get the information you need for each client (age might be missing from one, then the next might be missing their height etc.). You can do it, but it's a hell of a lot more difficult than SELECT height, weight, age FROM Client where height is null or weight is null.

Related again, duplicate data is a lot harder to detect (what happens if they give you two ages for one client? De-EAVing the data, as below, will give you two rows of results if you have one attribute doubled. If one client has two separate entries for two attributes, you'll get four rows from the query below).

You can't even guarantee that the attribute names are consistent. "Age_yr" might become "AGE_IN_YEARS" or "age". (Admittedly this is less of a problem when you're receiving an extract versus when people are inserting data, but still.)

Any sort of nontrivial query is a complete disaster. To relationalise a three-attribute EAV system so you can query it in a rational fashion requires three joins of the EAV table.

Compare:

SELECT cID.ID AS [ID], cH.Value AS [Height], cW.Value AS [Weight], cA.Value AS [Age]
FROM (SELECT DISTINCT ID FROM Client) cID 
      LEFT OUTER JOIN 
    Client cW ON cID.ID = cW.ID AND cW.Metric = "Wt_kg" 
      LEFT OUTER JOIN 
    Client cH ON cID.ID = cH.ID AND cW.Metric = "Ht_cm" 
      LEFT OUTER JOIN 
    Client cA ON cID.ID = cA.ID AND cW.Metric = "Age_yr"

To:

SELECT c.ID, c.Ht_cm, c.Wt_kg, c.Age_yr
FROM Client c

Here's a (very short) list of when you should use EAV:

When there's absolutely no way around it and you have to support schema-less data in your database.

When you just need to store "stuff" and don't expect to have to need it in a more structured form. Beware, though, the monster called "changing requirements".

I know I just spent this entire post detailing why EAV is a terrible idea in most cases - but there are a few cases where it's needed/unavoidable. however, most of the time (including the example above), it's going to be far more hassle than it's worth. If you have a requirement for wide support of EAV-type data input, you should look at storing them in a key-value system, e.g. Hadoop/HBase, CouchDB, MongoDB, Cassandra, BerkeleyDB.

Code Snippets

SELECT cID.ID AS [ID], cH.Value AS [Height], cW.Value AS [Weight], cA.Value AS [Age]
FROM (SELECT DISTINCT ID FROM Client) cID 
      LEFT OUTER JOIN 
    Client cW ON cID.ID = cW.ID AND cW.Metric = "Wt_kg" 
      LEFT OUTER JOIN 
    Client cH ON cID.ID = cH.ID AND cW.Metric = "Ht_cm" 
      LEFT OUTER JOIN 
    Client cA ON cID.ID = cA.ID AND cW.Metric = "Age_yr"

SELECT c.ID, c.Ht_cm, c.Wt_kg, c.Age_yr
FROM Client c

Context

StackExchange Database Administrators Q#20759, answer score: 99

Revisions (0)

No revisions yet.