patternsqlMinor
Normalizing a table with a field that generally uniquely identifies a row, but is sometimes null
Viewed 0 times
sometimesfieldwithnullbutthatuniquelyidentifiesnormalizinggenerally
Problem
forgive me if this has been asked and answered before.
I'm roughing out a schema for an inventory management system, to be implemented in PostgreSQL. All of our products and services have a sku. Most of our products come from the manufacturer or distributor with a separate "item number" (whether it be a distributor's catalog number, manufacturer's model number, whatever). However, not all of them have such a number. We have small assemblies that we make in-house that, generally, don't have item numbers. Our services don't have item numbers. For these reasons, the following CREATE TABLE makes sense to me.
Scenario A:
However, I have two problems with this.
-
Sometimes (maybe 3% to 5% of the time), the item_number is actually equal to the SKU. That is, one of my suppliers in particular affixes to their products what I suspect is not a globally unique SKU, fashioned after their item number.
-
Whether equal to the SKU or not, the item_number (when existent) is in virtually every case sufficient to uniquely identify a product in the domain of my small store.
I'm worried about normalizing this to 3NF. If item_number is sometimes null, it obviously cannot be declared an alternate key. But, semantically, it is a unique identifier, where it exists, in every case I can think of. So does my above table, where every attribute is functionally dependent upon the non-prime attribute item_number whenever item_number exists, normalized? I'm thinking no, but I'm certainly not an expert. I thought of doing the following:
Scenario B
```
CREATE TABLE product (
sku text PRIMARY KEY REFERENCES product_item_number (sku),
name text UNIQUE NOT NULL, -- alternate key
price num
I'm roughing out a schema for an inventory management system, to be implemented in PostgreSQL. All of our products and services have a sku. Most of our products come from the manufacturer or distributor with a separate "item number" (whether it be a distributor's catalog number, manufacturer's model number, whatever). However, not all of them have such a number. We have small assemblies that we make in-house that, generally, don't have item numbers. Our services don't have item numbers. For these reasons, the following CREATE TABLE makes sense to me.
Scenario A:
CREATE TABLE product (
sku text PRIMARY KEY,
name text UNIQUE NOT NULL, -- alternate key
price numeric NOT NULL CHECK (price > 0),
quantity numeric NOT NULL CHECK (quantity > 0),
item_number text -- hmmm...
);However, I have two problems with this.
-
Sometimes (maybe 3% to 5% of the time), the item_number is actually equal to the SKU. That is, one of my suppliers in particular affixes to their products what I suspect is not a globally unique SKU, fashioned after their item number.
-
Whether equal to the SKU or not, the item_number (when existent) is in virtually every case sufficient to uniquely identify a product in the domain of my small store.
I'm worried about normalizing this to 3NF. If item_number is sometimes null, it obviously cannot be declared an alternate key. But, semantically, it is a unique identifier, where it exists, in every case I can think of. So does my above table, where every attribute is functionally dependent upon the non-prime attribute item_number whenever item_number exists, normalized? I'm thinking no, but I'm certainly not an expert. I thought of doing the following:
Scenario B
```
CREATE TABLE product (
sku text PRIMARY KEY REFERENCES product_item_number (sku),
name text UNIQUE NOT NULL, -- alternate key
price num
Solution
Provided that Sku and ItemNumber will always imply unique values
I consider that you found the answer already by discovering that, conceptually speaking, ItemNumber is an optional property; i.e., when you determined that it does not apply to each and every one of the occurrences —represented by logical-level rows— of the Product entity type. Therefore, the
In this respect, your Scenario B is quite reasonable, as the following conceptual-level formulation demonstrates:
In other words, there is a one to zero or one (1:0/1) cardinality ratio between Product and ItemNumber.
Then, yes, you should introduce a new table to deal with the optional column, and I agree that
It is also important to mention that
Here is a sample SQL-DDL logical-level design that illustrates the previous suggestions:
Tested on PostgreSQL 11 in this db<>fiddle.
Moreover, there is another conceptual formulation that guides in shaping the database design presented above:
So, where the
Missing values and the “Closed World Interpretation”
The logical SQL-DDL arrangement previously described is an example of the relational approach to handle missing values, although it is not the most popular —or usual—. This approach is related to the “Closed World Interpretation” —or “Assumption”—. Adopting this position, (a) the information recorded in the database is always deemed true, and (b) the information that is not recorded in it is, at all times, deemed false. In this way, one is exclusively retaining facts that are known.
In the present business scenario, when a user supplies all the data points that are comprised in the
With this method you avoid holding NULL marks/markers in your base tables —and the logical-level consequences that I will detail in the next section—, but you should be aware that this is a “controversial” topic in the database administration ambit. On this point, you might find of value the answers for the Stack Overflow question entitled:
The popular course of action
I guess, however, that the popular —or common— proceeding would be to have a single
See the successive DDL statements that exemplify this course of action:
```
CREATE TABLE product (
sku TEX
I consider that you found the answer already by discovering that, conceptually speaking, ItemNumber is an optional property; i.e., when you determined that it does not apply to each and every one of the occurrences —represented by logical-level rows— of the Product entity type. Therefore, the
item_number column should not be declared as an ALTERNATE KEY (AK for brevity) in the product table, as you rightly pointed out.In this respect, your Scenario B is quite reasonable, as the following conceptual-level formulation demonstrates:
- A product may or may not have an item number.
In other words, there is a one to zero or one (1:0/1) cardinality ratio between Product and ItemNumber.
Then, yes, you should introduce a new table to deal with the optional column, and I agree that
product_item_number is a very descriptive name for it. This table should have sku constrained as its PRIMARY KEY (PK), so as to ensure that no more than one row with the same sku value is inserted into it, just like you did.It is also important to mention that
product_item_number.sku should as well be a constrained as a FOREIGN KEY (FK) making a reference to product.sku.Here is a sample SQL-DDL logical-level design that illustrates the previous suggestions:
-- You should determine which are the most fitting
-- data types and sizes for all your table columns
-- depending on your business context characteristics.
-- Also, you should make accurate tests to define
-- the most convenient INDEXing strategies.
CREATE TABLE product (
sku TEXT NOT NULL,
name TEXT NOT NULL,
price NUMERIC NOT NULL,
quantity NUMERIC NOT NULL,
--
CONSTRAINT product_PK PRIMARY KEY (sku),
CONSTRAINT product_AK UNIQUE (name), -- AK.
CONSTRAINT valid_price_CK CHECK (price > 0),
CONSTRAINT valid_quantity_CK CHECK (quantity > 0)
);
CREATE TABLE product_item_number (
sku TEXT NOT NULL, -- To be constrained as PK and FK to ensure the 1:0/1 correspondence ratio between the relevant rows.
item_number TEXT NOT NULL,
--
CONSTRAINT product_item_number_PK PRIMARY KEY (sku),
CONSTRAINT product_item_number_AK UNIQUE (item_number), -- In this context, ‘item_number’ is an AK.
CONSTRAINT product_item_number_TO_product_FK FOREIGN KEY (sku)
REFERENCES product (sku)
);Tested on PostgreSQL 11 in this db<>fiddle.
Moreover, there is another conceptual formulation that guides in shaping the database design presented above:
- If it exists, the ItemNumber of a Product must be unique.
So, where the
item_number column should actually be declared as an AK is right there, in the product_item_number table, because said column requires uniqueness protection only when the pertinent value is provided, hence the UNIQUE and NOT NULL constraints have to be configured accordingly.Missing values and the “Closed World Interpretation”
The logical SQL-DDL arrangement previously described is an example of the relational approach to handle missing values, although it is not the most popular —or usual—. This approach is related to the “Closed World Interpretation” —or “Assumption”—. Adopting this position, (a) the information recorded in the database is always deemed true, and (b) the information that is not recorded in it is, at all times, deemed false. In this way, one is exclusively retaining facts that are known.
In the present business scenario, when a user supplies all the data points that are comprised in the
product table you have to INSERT the corresponding row and if, and only if, the user made the item_number datum available you also have to INSERT the product_item_number counterpart. In case that the item_number value is unknown or it simply does not apply, you do not INSERT a product_item_number row, and that is it.With this method you avoid holding NULL marks/markers in your base tables —and the logical-level consequences that I will detail in the next section—, but you should be aware that this is a “controversial” topic in the database administration ambit. On this point, you might find of value the answers for the Stack Overflow question entitled:
- “How can I avoid NULLs in my database, while also representing missing data?”
The popular course of action
I guess, however, that the popular —or common— proceeding would be to have a single
product table that includes the item_number column which, in turn, would be set as NULLable and, at the same time, defined with a UNIQUE constraint. The way I see it, this approach would make your database and the applicable data manipulation operations less elegant (as shown, e.g., in this outstanding Stack Overflow answer), but it is a possibility.See the successive DDL statements that exemplify this course of action:
```
CREATE TABLE product (
sku TEX
Code Snippets
-- You should determine which are the most fitting
-- data types and sizes for all your table columns
-- depending on your business context characteristics.
-- Also, you should make accurate tests to define
-- the most convenient INDEXing strategies.
CREATE TABLE product (
sku TEXT NOT NULL,
name TEXT NOT NULL,
price NUMERIC NOT NULL,
quantity NUMERIC NOT NULL,
--
CONSTRAINT product_PK PRIMARY KEY (sku),
CONSTRAINT product_AK UNIQUE (name), -- AK.
CONSTRAINT valid_price_CK CHECK (price > 0),
CONSTRAINT valid_quantity_CK CHECK (quantity > 0)
);
CREATE TABLE product_item_number (
sku TEXT NOT NULL, -- To be constrained as PK and FK to ensure the 1:0/1 correspondence ratio between the relevant rows.
item_number TEXT NOT NULL,
--
CONSTRAINT product_item_number_PK PRIMARY KEY (sku),
CONSTRAINT product_item_number_AK UNIQUE (item_number), -- In this context, ‘item_number’ is an AK.
CONSTRAINT product_item_number_TO_product_FK FOREIGN KEY (sku)
REFERENCES product (sku)
);CREATE TABLE product (
sku TEXT NOT NULL,
name TEXT NOT NULL,
price NUMERIC NOT NULL,
quantity NUMERIC NOT NULL,
item_number TEXT NULL, -- Accepting NULL marks.
--
CONSTRAINT product_PK PRIMARY KEY (sku),
CONSTRAINT product_AK1 UNIQUE (name), -- AK.
CONSTRAINT product_AK2 UNIQUE (item_number), -- Being ‘NULLable’, this is not an AK.
CONSTRAINT valid_price_CK CHECK (price > 0),
CONSTRAINT valid_quantity_CK CHECK (quantity > 0)
);CREATE TABLE manufacturer (
manufacturer_number INTEGER NOT NULL, -- This could be something more meaningful, e.g., ‘manufacturer_code’.
name TEXT NOT NULL,
--
CONSTRAINT manufacturer_PK PRIMARY KEY (manufacturer_number),
CONSTRAINT manufacturer_AK UNIQUE (name) -- AK.
);
CREATE TABLE product (
manufacturer_number INTEGER NOT NULL,
sku TEXT NOT NULL,
name TEXT NOT NULL,
price NUMERIC NOT NULL,
quantity NUMERIC NOT NULL,
--
CONSTRAINT product_PK PRIMARY KEY (manufacturer_number, sku), -- Composite PK.
CONSTRAINT product_AK UNIQUE (name), -- AK.
CONSTRAINT product_TO_manufacturer_FK FOREIGN KEY (manufacturer_number)
REFERENCES manufacturer (manufacturer_number),
CONSTRAINT valid_price_CK CHECK (price > 0),
CONSTRAINT valid_quantity_CK CHECK (quantity > 0)
);CREATE TABLE product_item_number (
manufacturer_number INTEGER NOT NULL,
sku TEXT NOT NULL,
item_number TEXT NOT NULL,
--
CONSTRAINT product_item_number_PK PRIMARY KEY (manufacturer_number, sku), -- Composite PK.
CONSTRAINT product_item_number_AK UNIQUE (item_number), -- AK.
CONSTRAINT product_item_number_TO_product_FK FOREIGN KEY (manufacturer_number, sku)
REFERENCES product (manufacturer_number, sku)
);CREATE TABLE product_item_number (
manufacturer_number INTEGER NOT NULL,
sku TEXT NOT NULL,
item_number TEXT NOT NULL, -- In this case, ‘item_number’ does not require a UNIQUE constraint.
--
CONSTRAINT product_item_number_PK PRIMARY KEY (manufacturer_number, sku), -- Composite PK.
CONSTRAINT product_item_number_TO_product_FK FOREIGN KEY (manufacturer_number, sku)
REFERENCES product (manufacturer_number, sku)
);Context
StackExchange Database Administrators Q#136663, answer score: 9
Revisions (0)
No revisions yet.