HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

First Normal Form: Definitive Definition

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
definitionnormalfirstformdefinitive

Problem

I am trying to get a definitive version of what is First Normal Form. Everything I read has a slightly different spin.

Many authorities, such as Date, say that by definition a relation is always in First Normal Form, while others give a list of requirements. This means that there are from zero to many requirements for 1NF.

I suppose the difference is that between tables and relations: tables can be a complete mess, while a relationship follows certain restrictions. The fact that a relation is represented as a table in SQL thus creates some confusion.

I am focusing specifically on 1NF as it relates to SQL databases. The question is: what properties are required to ensure that a table is in the first normal form?

Many authorities suggest that if a table represents a relation, then it is already in 1NF. This pushes the definition of 1NF back to the definition of a relation.

Here are some properties of a table in 1NF:

  • Column Order is insignificant [1]



  • Rows Order is insignificant



  • All rows are the same length (ie, row data matches column headers)



  • There are no duplicate rows (this can be guaranteed using a surrogate primary key, but the PK is itself not required)



  • There are no repeating columns



  • Each column contains a single value (atomic)



[1] Technically attributes are unordered, but in a table, the row data must be in the same order as the column headers. However, the actual order is insignificant.

On multiple data:

The concept of atomic data is that an item cannot be further broken down. This concept has been qualified in that although technically everything can be broken down ad nauseum, the data in question cannot be practically broken down any further, depending on how the data will be used.

For example, a full address or a full name should normally be broken down further, but the components such as the given name or town name should probably not be broken down any further, despite the fact that as strings they can be.

As regards repeating c

Solution

Preliminary

The definition of normal form (which from the presentation of “Further Normalization of the Data Base Relational Model” in 1971 is known as first normal form) and the definition of the relational paradigm itself was published in 1970 in the scientific paper that provided a strong foundation for the practice of database administration, i.e., “A Relational Model of Data for Large Shared Data Banks” (RM for brevity) created by Dr. E. F. Codd, who is a Turing Award recipient and the authority with regard to the relational framework.

Yes, there is plenty of explanations, interpretations, expositions, deviations and opinions about Dr. Codd’s text, but I personally prefer to stick to the original source and I highly suggest that you analyze it by yourself so that you can draw your own conclusions.

I certainly do not understand the RM in its entirety, but what I do understand of it permits me appreciating its excellence, vision, intention and scope, and although decades later one can note that it has a few slight imprecisions, they do not reduce, in any way, its genius and elegance. In its field, the RM has stood the test of time in a unique way, and remains unmatched.

The act of emphasizing the aforementioned imprecisions would be —using a charitable term— unfair because, seeing it from from a considerable distance, this seminal material required a few refinements and extensions, yes, but the main body of the work was rock solid from the very conception (and, indeed, Dr. Codd made most —if not all— of such refinements and extensions himself).

I continue rereading the RM constantly in order to strengthen my comprehension of this exceptional source of knowledge (and my esteem of it keeps growing on every reread); the objective is to stand on the shoulders of giants.
Relations and tables

It is important to note that as relations are abstract resources, Dr. Codd envisioned the utility of representing them in tabular form (he initially used the term “array representation” but subsequently utilized “table” or “r-table”), so that the users, designers and administrators of a relational database (RDB) can approach them in a more familiar or concrete manner. Therefore, within the context of a RDB implementation, it is valid to use table as a shorthand for relation, as long as said table stands for an actual relation. This feature is —although obvious— quite significant because before evaluating whether or not a table represents a relation that complies with first normal form (1NF), it has to represent, precisely, a relation.

The RM naturally contains the qualities that a table must have to determine if it in fact portrays a relation, but I will offer an informal and unpretentious interpretation about them here (another one, yes!):

  • It must have a name (each particular relation in a database structure must be distinguished from the rest).



  • Each of its rows must depict exactly one tuple of the pertinent relation.



  • The order of its rows is not important at all.



  • Each of its columns must have a name that stands for the meaning of exactly one domain of the concerning relation, and said name must be different from the names of the rest of the columns of the table (a column must be uniquely differentiated and must carry a distinctive meaning and, yes, the role played by a database modeler and the business experts to define each domain of significance with accuracy is paramount)



  • The order of its columns has no significance.



  • All of its rows must have the same number of columns.



  • It must have at least one column, or one combination of columns, that uniquely identifies every one of the tuples depicted via rows; in this way, all of the rows must be different (yes, this stresses the importance of having at least one KEY declared, and when there are two or more KEYs one should be defined as PRIMARY based on pragmatic reasons, while the rest can be deemed as ALTERNATE; but yes, before making the decision, each of the KEYs was a “candidate” for a definition as PRIMARY).



Having a table that in actual fact represents a relation is critical as, when it undergoes manipulation operations of a relational kind, then the result is, again, a table that represents a relation. In this manner, the behavior of said table is predictable.
Atomic domains (columns)

In the first sections of the RM, Dr. Codd presents several samples of relations in order to introduce some concepts; so, in order to comprehend the meaning of atomic domain, let us start with the following excerpt from the RM that details some pertinent points:

So far, we have discussed examples of relations which are defined on simple domains—domains whose elements are atomic (nondecomposable) values. Nonatomic values can be discussed within the relational framework. Thus, some domains may have relations as elements. These relations may, in turn, be defined on nonsimple domains, and so on.

In this way, one can say that each of the aforementioned expositor

Context

StackExchange Database Administrators Q#161119, answer score: 8

Revisions (0)

No revisions yet.