HiveBrain v1.2.0

Integrity constraints in a relational database - should we overlook them?

Submitted by: @import:stackexchange-dba

Problem

I’m in an ongoing debate with the developers of the company where I work because they say it is better to get rid of relationship enforcement (via FOREIGN KEY constraint definitions) in a relational database in order to speed up large queries and gain better performance.

The platform under consideration is MySQL 5.x, and no FOREIGN KEY constraints have been set up; even some PRIMARY KEY constraints on the relevant tables are missing, which, at least to me, is not reasonable. Maybe they’re right and I’m wrong, but I don’t have enough arguments to make my case in this situation.

This has been the preferred approach for three years now. I’m new to this company (only one month in) but, as the product “works”, there is hesitation to enhance the database; nevertheless, the first thing I noticed is one page taking 1 minute to load (yes, 60 seconds!).

One of the claims behind the current state of affairs is that a “denormalized” database is faster than a normalized one, but I don’t believe that’s true.

Most of the relevant queries include JOIN operations, which makes them run very, very, very slow with large amounts of data (the database contains millions of rows).

Commonly, the handling of “CRUD” operations is implemented at the application program code level; for example, in order to DELETE some data FROM, let’s say, TableA:

  • it is necessary to first check on the fly if there is some relationship between the rows of TableA and TableB,



  • in case that said relationship is “detected”, the app program code won’t allow the DELETE of the pertinent row(s), but



  • if for some reason the app program code fails, then the DELETE operation will “succeed”, regardless of any relationship between the involved rows and tables.
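This check-then-act pattern can be sketched in a few lines. The following uses Python with an in-memory SQLite database purely for illustration; the TableA/TableB schema and the `b_id` column are hypothetical stand-ins for the real tables:

```python
import sqlite3

# A database mirroring the scenario: no FOREIGN KEY declared, so
# referential integrity hinges entirely on application code.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableB (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE TableA (id INTEGER, b_id INTEGER)")  # references TableB by convention only
conn.execute("INSERT INTO TableB VALUES (1, 'parent')")
conn.execute("INSERT INTO TableA VALUES (10, 1)")

def app_level_delete_b(row_id):
    """The 'check on the fly, then DELETE' pattern described above."""
    (refs,) = conn.execute(
        "SELECT COUNT(*) FROM TableA WHERE b_id = ?", (row_id,)
    ).fetchone()
    if refs > 0:
        return False  # app code refuses the DELETE
    conn.execute("DELETE FROM TableB WHERE id = ?", (row_id,))
    return True

assert app_level_delete_b(1) is False  # guarded path: DELETE blocked

# But nothing stops a code path (or another tool) that skips the check:
conn.execute("DELETE FROM TableB WHERE id = 1")  # "succeeds" regardless
orphans = conn.execute(
    "SELECT COUNT(*) FROM TableA a"
    " LEFT JOIN TableB b ON a.b_id = b.id WHERE b.id IS NULL"
).fetchone()[0]
print(orphans)  # 1 orphaned row left behind in TableA
```

Note also that even when the guarded path is always taken, the check and the DELETE are two separate statements, so a concurrent INSERT into TableA between them can still slip through unless both run inside one suitably isolated transaction.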



Question

Could you help me put together a good, accurate and solid answer to enrich the debate?

Note: Maybe something like this has been asked (and answered) before, but I couldn’t find anything via Google.

Solution

If, as stated in your post, the intention is to create a relational database (RDB for brevity) and, therefore, it is expected that it functions as such, the short answer is:

  • No, you should not overlook data integrity constraints.



The primary objective should be to manage the pertinent data as it is: a quite valuable organizational asset, and a reliable manner to achieve said objective is employing technical means that are supported by sound theory.

Thus, as a database professional, you can take advantage of the state-of-the-art and elegant relational model mechanisms supplied by Dr. E. F. Codd to enforce business rules, and avoid the (technical and organizational) problems that would eventually arise if they are not utilized.

In this respect, I will share (a) my overall take on constraints and also (b) several considerations about the state of affairs of the database and the work environment at issue as follows.

FOREIGN KEY constraints, data relationships and referential integrity

An RDB must reflect the characteristics of the business context of interest with high accuracy, which definitely requires an in-depth conceptual-level analysis led by a modeler or designer who follows best practices, with the indispensable assistance of the business experts. That analysis must yield the correct identification and formulation of the applicable business rules.

Consequently, if such a modeler has identified that there exist interrelationships between the data of relevance, he or she must configure the corresponding logical-level restrictions so that the database management system (DBMS) can guarantee that the data remains consistent with the exact characteristics and rules determined in the analysis referred to above at all times.

Regarding the database under discussion, one can infer that the pertinent interrelationships have been identified, since you mention that there is a procedural (and easy to circumvent) attempt to enforce them from outside of the DBMS facilities, by dint of application program code (which is a pre-relational approach) that in any case has to “touch” the database to try to validate the wholeness of said interrelationships.

However, as you know, that is not the optimal technique to protect referential integrity, because relational science has prescribed a very powerful instrument for this purpose, i.e., FOREIGN KEY (FK) constraints. These constraints are very easy to create (via the superior declarative approach): they are single statements that avoid resorting to unnecessary and error-prone ad hoc procedures. It is very useful to note that the execution speed of FK constraints has been highly optimized by specialized programmers (and the major platform vendors have worked on it for decades now).
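As a minimal sketch of how little code the declarative approach requires, here is the same parent/child scenario using Python with SQLite (the schema is hypothetical; SQLite needs `PRAGMA foreign_keys = ON`, and in MySQL/InnoDB the equivalent is an explicit `FOREIGN KEY … REFERENCES` clause in CREATE TABLE or ALTER TABLE, since InnoDB ignores a bare column-level REFERENCES):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled

# One declarative statement per relationship; no ad hoc procedures needed.
conn.execute("CREATE TABLE TableB (id INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE TableA ("
    " id   INTEGER PRIMARY KEY,"
    " b_id INTEGER NOT NULL REFERENCES TableB (id))"
)
conn.execute("INSERT INTO TableB VALUES (1)")
conn.execute("INSERT INTO TableA VALUES (10, 1)")

# Now the DBMS itself rejects any operation that would orphan TableA rows,
# no matter which application, tool, or ad hoc session issues it.
blocked = False
try:
    conn.execute("DELETE FROM TableB WHERE id = 1")
except sqlite3.IntegrityError:
    blocked = True
print(blocked)  # True: the parent row cannot be deleted while referenced
```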

Furthermore, since an RDB must be an independent (self-protective, self-describing, etc.) software component that is capable of being accessed by multiple application programs (desktop, automatic, web, mobile, combinations thereof), it should not be “coupled” with the code of any of these apps.

Likewise, the data —being a significant organizational resource— naturally tends to outlive application programs, application programmers, application development platforms and programming paradigms.

PRIMARY KEY constraints and implications of duplicate rows

When —conceptually speaking— a particular kind of thing has been deemed of significance in a business environment, a database modeler has to (1) determine its relevant characteristics —i.e., its properties—, (2) confirm said kind of thing as a prototype of entity instances —i.e., an entity type— and (3) represent it by way of a table composed of one or more columns in a logical design.

Then, just as it is paramount to distinguish each individual instance of a given entity type in the real-world business, each corresponding row contained in a table must be uniquely distinguishable as well. If a table does not have any KEY declared, it will eventually retain duplicates, and if two or more rows hold exactly the same values, they all carry the same meaning: they all represent the same fact.

On that point, duplicate rows should be discarded for multiple reasons. From a theoretical perspective, the designer has to make sure that each row is always unique for the purpose of having tables that work as relationally as the SQL data sub-language permits (which has important repercussions on data manipulation operations). Besides, from an informational perspective, if multiple rows represent the same fact, recording them is not only superfluous but harmful, as exemplified below:

  • Suppose that someone has inserted two identical rows in a certain table.



  • Later, someone else comes and updates only one occurrence of the duplicates. As a consequence, the other occurrence is not up-to-date anymore.



  • Successively, another person updates the occurrence that had not been modified so far. In this manner, both duplicates have undergone different changes, the table now asserts two contradictory versions of the same fact, and there is no way to tell which occurrence is correct.
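The anomaly described in the bullets above can be reproduced mechanically. The sketch below uses Python with SQLite and a hypothetical Product table; SQLite’s hidden rowid stands in for whatever accidental difference lets each updater reach a different physical copy:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# No KEY declared, so the table happily retains duplicates.
conn.execute("CREATE TABLE Product (code TEXT, price REAL)")
conn.execute("INSERT INTO Product VALUES ('A1', 10.0)")
conn.execute("INSERT INTO Product VALUES ('A1', 10.0)")  # same fact, recorded twice

# Two people "update the A1 row", each touching a different physical copy.
rowids = [r[0] for r in conn.execute("SELECT rowid FROM Product WHERE code = 'A1'")]
conn.execute("UPDATE Product SET price = 12.0 WHERE rowid = ?", (rowids[0],))
conn.execute("UPDATE Product SET price = 15.0 WHERE rowid = ?", (rowids[1],))

prices = sorted(p for (p,) in conn.execute("SELECT price FROM Product WHERE code = 'A1'"))
print(prices)  # [12.0, 15.0]: two contradictory versions of one fact

# With a PRIMARY KEY declared, the duplicate is rejected at INSERT time instead.
conn.execute("CREATE TABLE Product2 (code TEXT PRIMARY KEY, price REAL)")
conn.execute("INSERT INTO Product2 VALUES ('A1', 10.0)")
dup_rejected = False
try:
    conn.execute("INSERT INTO Product2 VALUES ('A1', 10.0)")
except sqlite3.IntegrityError:
    dup_rejected = True
print(dup_rejected)  # True
```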

Context

StackExchange Database Administrators Q#152488, answer score: 14
