HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Normalizing a table with a field that generally uniquely identifies a row, but is sometimes null

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
sometimesfieldwithnullbutthatuniquelyidentifiesnormalizinggenerally

Problem

forgive me if this has been asked and answered before.

I'm roughing out a schema for an inventory management system, to be implemented in PostgreSQL. All of our products and services have a sku. Most of our products come from the manufacturer or distributor with a separate "item number" (whether it be a distributor's catalog number, manufacturer's model number, whatever). However, not all of them have such a number. We have small assemblies that we make in-house that, generally, don't have item numbers. Our services don't have item numbers. For these reasons, the following CREATE TABLE makes sense to me.

Scenario A:

CREATE TABLE product (
   sku            text PRIMARY KEY,
   name           text UNIQUE NOT NULL, -- alternate key
   price          numeric NOT NULL CHECK (price > 0),
   quantity       numeric NOT NULL CHECK (quantity > 0),
   item_number   text -- hmmm...
);


However, I have two problems with this.

-
Sometimes (maybe 3% to 5% of the time), the item_number is actually equal to the SKU. That is, one of my suppliers in particular affixes to their products what I suspect is not a globally unique SKU, fashioned after their item number.

-
Whether equal to the SKU or not, the item_number (when existent) is in virtually every case sufficient to uniquely identify a product in the domain of my small store.

I'm worried about normalizing this to 3NF. If item_number is sometimes null, it obviously cannot be declared an alternate key. But, semantically, it is a unique identifier, where it exists, in every case I can think of. So does my above table, where every attribute is functionally dependent upon the non-prime attribute item_number whenever item_number exists, normalized? I'm thinking no, but I'm certainly not an expert. I thought of doing the following:

Scenario B

```
CREATE TABLE product (
sku text PRIMARY KEY REFERENCES product_item_number (sku),
name text UNIQUE NOT NULL, -- alternate key
price num

Solution

Provided that Sku and ItemNumber will always imply unique values

I consider that you found the answer already by discovering that, conceptually speaking, ItemNumber is an optional property; i.e., when you determined that it does not apply to each and every one of the occurrences —represented by logical-level rows— of the Product entity type. Therefore, the item_number column should not be declared as an ALTERNATE KEY (AK for brevity) in the product table, as you rightly pointed out.

In this respect, your Scenario B is quite reasonable, as the following conceptual-level formulation demonstrates:

  • A product may or may not have an item number.



In other words, there is a one to zero or one (1:0/1) cardinality ratio between Product and ItemNumber.

Then, yes, you should introduce a new table to deal with the optional column, and I agree that product_item_number is a very descriptive name for it. This table should have sku constrained as its PRIMARY KEY (PK), so as to ensure that no more than one row with the same sku value is inserted into it, just like you did.

It is also important to mention that product_item_number.sku should as well be a constrained as a FOREIGN KEY (FK) making a reference to product.sku.

Here is a sample SQL-DDL logical-level design that illustrates the previous suggestions:

-- You should determine which are the most fitting 
-- data types and sizes for all your table columns 
-- depending on your business context characteristics.

-- Also, you should make accurate tests to define
-- the most convenient INDEXing strategies.

CREATE TABLE product ( 
    sku      TEXT    NOT NULL, 
    name     TEXT    NOT NULL, 
    price    NUMERIC NOT NULL, 
    quantity NUMERIC NOT NULL,
    --
    CONSTRAINT product_PK        PRIMARY KEY (sku), 
    CONSTRAINT product_AK        UNIQUE      (name), -- AK.
    CONSTRAINT valid_price_CK    CHECK       (price > 0),
    CONSTRAINT valid_quantity_CK CHECK       (quantity > 0)
); 

CREATE TABLE product_item_number ( 
    sku         TEXT NOT NULL, -- To be constrained as PK and FK to ensure the 1:0/1 correspondence ratio between the relevant rows.
    item_number TEXT NOT NULL, 
    --
    CONSTRAINT product_item_number_PK            PRIMARY KEY (sku),
    CONSTRAINT product_item_number_AK            UNIQUE      (item_number), -- In this context, ‘item_number’ is an AK. 
    CONSTRAINT product_item_number_TO_product_FK FOREIGN KEY (sku) 
        REFERENCES product (sku)  
);


Tested on PostgreSQL 11 in this db<>fiddle.

Moreover, there is another conceptual formulation that guides in shaping the database design presented above:

  • If it exists, the ItemNumber of a Product must be unique.



So, where the item_number column should actually be declared as an AK is right there, in the product_item_number table, because said column requires uniqueness protection only when the pertinent value is provided, hence the UNIQUE and NOT NULL constraints have to be configured accordingly.

Missing values and the “Closed World Interpretation”

The logical SQL-DDL arrangement previously described is an example of the relational approach to handle missing values, although it is not the most popular —or usual—. This approach is related to the “Closed World Interpretation” —or “Assumption”—. Adopting this position, (a) the information recorded in the database is always deemed true, and (b) the information that is not recorded in it is, at all times, deemed false. In this way, one is exclusively retaining facts that are known.

In the present business scenario, when a user supplies all the data points that are comprised in the product table you have to INSERT the corresponding row and if, and only if, the user made the item_number datum available you also have to INSERT the product_item_number counterpart. In case that the item_number value is unknown or it simply does not apply, you do not INSERT a product_item_number row, and that is it.

With this method you avoid holding NULL marks/markers in your base tables —and the logical-level consequences that I will detail in the next section—, but you should be aware that this is a “controversial” topic in the database administration ambit. On this point, you might find of value the answers for the Stack Overflow question entitled:

  • “How can I avoid NULLs in my database, while also representing missing data?”



The popular course of action

I guess, however, that the popular —or common— proceeding would be to have a single product table that includes the item_number column which, in turn, would be set as NULLable and, at the same time, defined with a UNIQUE constraint. The way I see it, this approach would make your database and the applicable data manipulation operations less elegant (as shown, e.g., in this outstanding Stack Overflow answer), but it is a possibility.

See the successive DDL statements that exemplify this course of action:

```
CREATE TABLE product (
sku TEX

Code Snippets

-- You should determine which are the most fitting 
-- data types and sizes for all your table columns 
-- depending on your business context characteristics.

-- Also, you should make accurate tests to define
-- the most convenient INDEXing strategies.

CREATE TABLE product ( 
    sku      TEXT    NOT NULL, 
    name     TEXT    NOT NULL, 
    price    NUMERIC NOT NULL, 
    quantity NUMERIC NOT NULL,
    --
    CONSTRAINT product_PK        PRIMARY KEY (sku), 
    CONSTRAINT product_AK        UNIQUE      (name), -- AK.
    CONSTRAINT valid_price_CK    CHECK       (price > 0),
    CONSTRAINT valid_quantity_CK CHECK       (quantity > 0)
); 

CREATE TABLE product_item_number ( 
    sku         TEXT NOT NULL, -- To be constrained as PK and FK to ensure the 1:0/1 correspondence ratio between the relevant rows.
    item_number TEXT NOT NULL, 
    --
    CONSTRAINT product_item_number_PK            PRIMARY KEY (sku),
    CONSTRAINT product_item_number_AK            UNIQUE      (item_number), -- In this context, ‘item_number’ is an AK. 
    CONSTRAINT product_item_number_TO_product_FK FOREIGN KEY (sku) 
        REFERENCES product (sku)  
);
CREATE TABLE product ( 
    sku         TEXT    NOT NULL, 
    name        TEXT    NOT NULL, 
    price       NUMERIC NOT NULL, 
    quantity    NUMERIC NOT NULL, 
    item_number TEXT    NULL, -- Accepting NULL marks. 
    --
    CONSTRAINT product_PK        PRIMARY KEY (sku), 
    CONSTRAINT product_AK1       UNIQUE      (name), -- AK.
    CONSTRAINT product_AK2       UNIQUE      (item_number), -- Being ‘NULLable’, this is not an AK. 
    CONSTRAINT valid_price_CK    CHECK       (price > 0),
    CONSTRAINT valid_quantity_CK CHECK       (quantity > 0)
);
CREATE TABLE manufacturer (
    manufacturer_number INTEGER  NOT NULL, -- This could be something more meaningful, e.g., ‘manufacturer_code’.
    name                TEXT NOT NULL,
    --
    CONSTRAINT manufacturer_PK PRIMARY KEY (manufacturer_number), 
    CONSTRAINT manufacturer_AK UNIQUE      (name) -- AK.
);

CREATE TABLE product (
    manufacturer_number INTEGER NOT NULL, 
    sku                 TEXT    NOT NULL,
    name                TEXT    NOT NULL, 
    price               NUMERIC NOT NULL,
    quantity            NUMERIC NOT NULL,
    --
    CONSTRAINT product_PK                 PRIMARY KEY (manufacturer_number, sku), -- Composite PK.
    CONSTRAINT product_AK                 UNIQUE      (name), -- AK.
    CONSTRAINT product_TO_manufacturer_FK FOREIGN KEY (manufacturer_number)
        REFERENCES manufacturer (manufacturer_number),
    CONSTRAINT valid_price_CK             CHECK       (price > 0),
    CONSTRAINT valid_quantity_CK          CHECK       (quantity > 0)
);
CREATE TABLE product_item_number (
    manufacturer_number INTEGER NOT NULL,  
    sku                 TEXT    NOT NULL,
    item_number         TEXT    NOT NULL,
    --
    CONSTRAINT product_item_number_PK            PRIMARY KEY (manufacturer_number, sku), -- Composite PK.
    CONSTRAINT product_item_number_AK            UNIQUE      (item_number), -- AK.  
    CONSTRAINT product_item_number_TO_product_FK FOREIGN KEY (manufacturer_number, sku)
        REFERENCES product (manufacturer_number, sku)  
);
CREATE TABLE product_item_number (
    manufacturer_number INTEGER NOT NULL,  
    sku                 TEXT    NOT NULL,
    item_number         TEXT    NOT NULL, -- In this case, ‘item_number’ does not require a UNIQUE constraint.
    --
    CONSTRAINT product_item_number_PK            PRIMARY KEY (manufacturer_number, sku), -- Composite PK.
    CONSTRAINT product_item_number_TO_product_FK FOREIGN KEY (manufacturer_number, sku)
        REFERENCES product (manufacturer_number, sku)  
);

Context

StackExchange Database Administrators Q#136663, answer score: 9

Revisions (0)

No revisions yet.