HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Is it reasonable to mark all columns but one as primary key?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
primarycolumnsallbutreasonableonekeymark

Problem

I have a table representing movies. The fields are:

id (PK), title, genre, runtime, released_in, tags, origin, downloads.

My database cannot be polluted by duplicated rows, so I want to enforce uniqueness. The problem is that different movies could have the same title, or even the same fields except tags and downloads. How to enforce uniqueness?

I thought of two ways:

  • make all the fields except downloads primary key. I'm keeping downloads out since it's JSON and it will probably impact the performance.



  • keep only id as primary key, but add a unique constraint with all the other columns (except, again, downloads).



I read this question which is very similar, but I didn't quite understand what should I do. Currently this table is not related to any other tables, but in the future could be.

At the moment I have slightly less than 20,000 records, but I expect the number to grow. I don't know if this is somewhat relevant to the issue.

EDIT: I modified the schema and here is how I would create the table:

CREATE TABLE movies (
    id          serial PRIMARY KEY,
    title       text NOT NULL,
    runtime     smallint NOT NULL CHECK (runtime >= 0),
    released_in smallint NOT NULL CHECK (released_in > 0),
    genres      text[] NOT NULL default ARRAY[]::text[],
    tags        text[] NOT NULL default ARRAY[]::text[],
    origin      text[] NOT NULL default ARRAY[]::text[],
    downloads   json NOT NULL,
    inserted_at timestamp NOT NULL default current_timestamp,
    CONSTRAINT must_be_unique UNIQUE(title,runtime,released_in,genres,tags,origin)
);


I also added the timestamp column, but that is not a problem as I won't touch it. So it will always be automatic and unique.

Solution

Imagine you are out with a group of friends and the conversation turns to movies. Someone asks, "What did you think of 'The Three Musketeers'?" You respond, "Which one?"

What additional information would you need to be absolutely certain you are both thinking of the same movie? The director's name? The production studio? The year it was released? One of the star's names? Some combination of two or more?

The answer to my question and yours are the same.

However, I would not think that genre would be a good candidate. One reason, genre is much too subjective a criteria. Is 'The Three Musketeers' action? drama? adventure? comedy? action-adventure? romantic-comedy? I often see the same movie listed under different genres. Even when you allow for multiple genres, your user may select an entirely different one not listed with the actual movie they are looking for.

Even runtimes can differ, especially between theater and VCR/DVD/b-ray versions.

So you need hard, objective attributes which will not change from one media release to another. Unfortunately, that can exclude the name of the movie as movies have been known to be renamed, especially after the release of a sequel.

What about release date? The theatrical release of 1993? The VCR release of 1999? The DVD release of 2004? You get the idea.

Come to think about it, what of all those movies directed by Alan Smithee? Has the real director ever finally stepped forward to put his name on the project after the fact? I don't know.

Hmm, I'd better stop while there are still some criteria left.

Some additional points:

  • Yes, keep the surrogate key and create a unique index on the natural key fields (if you can finally nail those down). The surrogate key is best for foreign key references. You don't want to duplicate all the natural key fields in every table that contains a reference to a movie.



  • Drop the array fields (genres, tags, origins). Go ahead and properly normalize those attributes. I have never seen an array field that was not a great deal more trouble than it was worth, especially if you want them to be searchable ("...where genre = 'horror'..."). Note this will not automatically eliminate any issues with case differences and spelling ( "Science Fiction" vs "SciFi" ) -- unless you properly maintain the lookup tables. But it is a whole lot easier to check for such differences in one field of a small table than every array cell of every row of a large table.

Context

StackExchange Database Administrators Q#108210, answer score: 6

Revisions (0)

No revisions yet.