patternsqlMinor

Normalizing/combing mutliple tables with similar data

Submitted by: @import:stackexchange-dba·Mar 10, 2026·

Viewed 0 times

tablescombingwithdatamutliplenormalizingsimilar

Problem

I'm working on setting up a database of vehicle records based on data from a third party vendor. There are 3 models: Vehicles, Transmissions, and Adaptors.

Both Vehicles and Transmissions are 1:n Adaptors. Technically Transmissions are just Vehicles with additional transmission columns, while Vehicles contain fuel system columns in addition to the vehicle columns. I am having trouble determining the best way of combining my data into one normalized set.

Here are some samples of my data:

Transmissions table

Vehicles Table

Pivot Tables

As you can see, Transmissions and Vehicles are basically the same up to the trans_type/fuel_system columns (respectively) and beyond. For example, the Vehicle I have highlighted is the same as the 2nd Transmission shown with engine_code LML.

Ideally, I am hoping to end up with just one table of vehicle data. For instance, if I were to merge the highlighted records, I should end up with only 2 records for a 2013 GMC Sierra 3500 HD:
One with a 6.0L V8 engine and the other with a 6.6L V8 engine, along with the respective fuel system and transmission columns for each.

The pivot tables would also be combined as well which means I need to replace the old IDs with new IDs from the combined data, somehow.

Here are my thoughts on how to approach this:

Normalize the similar columns (makes, models, engine, transmission)

Simply split off the fuel_system and trans_type columns into their own tables (but this does nothing to help me combine, it just makes things a bit more manageable until I can figure out how to combine)

Make a new table containing all the columns from both tables, and insert the data from Vehicles and Transmissions, fill in the blanks (i.e. update fuel_system where records match and vice versa for transmission columns), then begin to clean up duplicates.

I apologize for the lengthy post, but I have yet to find anything in my searches that really outlines what this process may look like. Any advice

Solution

Functional dependencies and normalization

In order to carry out a normalization exercise involving second and further normal forms —as per the relational model of data by Dr. E. F Codd—, one first has to know what are the relevant functional dependencies (FDs for brevity) between the attributes (usually depicted as columns) of an adapted mathematical relation (usually portrayed as a table). This kind of exercise belongs to the logical-level of abstraction of a database. That is why, in order to address your stated intention of obtaining a normalized set, I requested information about the applicable FDs via comments.

For instance, a FD involving the hypothetical attributes Foo and Bar can be depicted as Foo → Bar, which may in turn be read as “attribute Foo determines attribute Bar” or “attribute Bar is determined by attribute Foo”. In this way one can distinguish (a) one or more attributes, or one or more combinations of attributes, that are the key, or the keys, of a relation and (b) distinguish the attributes that are not, or are not part of, the key or keys.

Regarding the scenario you describe, let us suppose that the grid entitled Transmissions is a concrete representation of a mathematical relation. With respect to the values of the columns labeled as submodel, body_code, engine_code, engine, desplacement, cylinders, trans_type, trans_code and qualifier, one can say that:

-
Some (engine_code and trans_code) appear to be determined by the values of what would be the key of a relation called, let us say, Vehicle — i.e., the combination of (model, make, year).

-
Some (transmission_type and qualifier) appear to be determined by the values of what would be the key of a relation called, let us say, TransmissionAdaptor — i.e., trans_code.

-
Some (engine_type, displacement and cylinders) appear to be determined by the values of what would be the key of a relation called, let us say, Engine — i.e., engine_code.

But those are simple assumptions, based on my personal interpretation of the information contained in the aforementioned grid, and evidently I am not familiar at all with the business context in question. Therefore, in order to get rid of those kinds of needless and problematic assumptions you have to interview the business experts, and they will help you to identify the FDs, which would in turn guide you in performing proper normalization and laying out the database structure with the precision that the database administration profession requires. If there are no business experts who you can resort to, then you will have to dive into the data sets, observe data usage and meaning, and analyze carefully the interconnections between the pieces of information of interest to determine the significant FDs on your own.

A normalized set

Having a normalized database helps to avoid update/modification anomalies (affecting INSERT, UPDATE and DELETE operations) that arise eventually when there exist undesirable dependencies among the attributes (columns) of the relations (tables) under consideration. In cases in which, e.g., there are attributes of a relation that (i) depend on non-key attributes, or that (ii) depend on parts of a composite —i.e., multi-attribute— key, the designer has to decompose the concerning relation into two or more.

Thus, most of the time, a normalized set is made up of various relations, each of which is meant to contain exactly one particular type of fact in its tuples (rows). Contrarily, unnormalized and non-fully-normalized sets consist of one or more relations that contain more than one type of fact in their tuples.
The design of a database starting from the conceptual level of abstraction

On the other hand, you can design the relevant database starting from a different perspective, analyzing first the structure and associations —or relationshipsa or connections— of the types of things of interest from a purely conceptual point of view, without thinking about relations (tables), attributes (columns), constraints and normalization yet. Of course, this approach also entails close communication between the database designer(s) and the business experts or, in absence of business experts, database designer(s) with in-depth knowledge of the informational needs and characteristics of the business environment.

Sample business rules

So I will put together some hypothetical business rules that will help in the creation of an expository conceptual schema based on mere assumptions about the information contained in your question.

A Vehicle is:

primarily identified by exactly one combination of Model, Make and Year

moved by exactly one Engine

equipped with exactly one TransmissionAdaptor

An Engine:

is primarily identified by exactly one EngineCode

is catalogued by exactly one EngineType

has exactly one Displacement

holds exactly one type of Cylinders

is provided with exactly one FuelSystem

is installed on zero, one or more Vehicles

A Transmiss

Code Snippets

CREATE TABLE TransmissionAdaptor ( 
    TransmissionAdaptorCode CHAR(10) NOT NULL,
    TransmissionType        CHAR(30) NOT NULL,
    --
    CONSTRAINT TransmissionAdaptor_PK PRIMARY KEY (TransmissionAdaptorCode)
);

CREATE TABLE FuelSystem ( 
    Name           CHAR(6)  NOT NULL,
    FuelSystemType CHAR(10) NOT NULL,
    Pressure       CHAR(10) NOT NULL,
    FuelSystemName CHAR(30) NOT NULL,
    --
    CONSTRAINT FuelSystem_PK PRIMARY KEY (Name)
);

CREATE TABLE MyEngine ( 
    EngineCode     CHAR(3) NOT NULL,
    EngineType     CHAR(8) NOT NULL,
    Displacement   CHAR(3) NOT NULL,
    Cylinders      CHAR(3) NOT NULL,
    FuelSystemName CHAR(6) NOT NULL,
    --
    CONSTRAINT Engine_PK               PRIMARY KEY (EngineCode),
    CONSTRAINT Engine_to_FuelSystem_FK FOREIGN KEY (FuelSystemName)
        REFERENCES FuelSystem (Name)
);

CREATE TABLE Vehicle ( 
    Model                   CHAR(20) NOT NULL,
    Make                    CHAR(10) NOT NULL,
    MyYear                  SMALLINT NOT NULL,
    EngineCode              CHAR(3)  NOT NULL,
    TransmissionAdaptorCode CHAR(10) NOT NULL,
    --
    CONSTRAINT Vehicle_PK                 PRIMARY KEY (Model, Make, MyYear),
    CONSTRAINT Vehicle_to_Engine_FK       FOREIGN KEY (EngineCode)
        REFERENCES MyEngine (EngineCode),
    CONSTRAINT Vehicle_to_Transmission_FK FOREIGN KEY (TransmissionAdaptorCode)
        REFERENCES TransmissionAdaptor (TransmissionAdaptorCode)
);

Context

StackExchange Database Administrators Q#220893, answer score: 7

Revisions (0)

No revisions yet.