patternMinor

Best way to design this mileage table?

Submitted by: @import:stackexchange-dba·Mar 10, 2026·

Viewed 0 times

thisdesignwaymileagetablebest

Problem

I will be populating a Miles Per Gallon (MPG) table. It's coming from an odometer source.

It's currently set up as so:

id               (primary_key)
, truck_num
, start_date
, end_date
, start_miles
, end_miles
, start_fuel
, end_fuel
, miles
, gals
, mpg

There seems to be some redundancy. The miles is (end_miles - start_miles), ditto for gals.

Should we have those miles and gals columns precalculated and stored in the database? It would definitely make querying easier, but at the expense of space. Same question for having the mpg calculated. A computed column would slow things down, no?

What indexes would work best? There's about 3,000 trucks (records) inserted in a batch every week.

I'm using SQL Server 2008 R2.

Edit: A sample query that would I'd be using

-- find average mpg for since ytd
select m.truck_num, avg(mpg)
from mpg m
join truck t on t.truck_num = m.truck_num
where start_date >= @begin_of_year and end_date <= @today
group by truck_num

Solution

Computed columns are your friend. Use them to capture simple calculations you're going to do anyway, and to guarantee that the computed values are correct.

Persist the results if you want to index or filter on them; don't persist them if you just need to pull the value once in a while.

Capture all data constraints using CHECK constraints.

Here is a pseudo-schema definition:

id               PRIMARY KEY
, truck_num

, start_date
, end_date         CHECK (end_date > start_date)

, start_miles      CHECK (start_miles >= 0)
, end_miles        CHECK (end_miles > start_miles)

                   -- what if they refill the tank?
, start_gals       CHECK (start_gals >= 0)
, end_gals         CHECK (end_gals = 0)

-- all these should be computed
, miles = end_miles - start_miles
, gals = start_gals - end_gals
, mpg = miles/gals

As for indexing the table, here's what I think:

Cluster on start_date ASC. This will satisfy your WHERE clause. You are always inserting data by increasing start_date, meaning your writes will always be sequential under this clustering scheme. You are also always querying by start_date so you satisfy your biggest query pattern as well. (3,000 inserts per week is nothing. Because you have such a low volume of inserts, you could even cluster on start_date ASC, end_date ASC.)

Create a non-clustered index on truck_num and INCLUDE mpg. This should satisfy your SELECT, JOIN, and GROUP BY clauses. If you want to ORDER BY mpg, then make mpg part of the index key after truck_num instead of just INCLUDE-ing it.

When you're done, test your setup as follows:

Create a test table

Pump it full of test data

Create the indexes

Update statistics

Run your most common queries

Check their plans and run times

Code Snippets

id               PRIMARY KEY
, truck_num

, start_date
, end_date         CHECK (end_date > start_date)

, start_miles      CHECK (start_miles >= 0)
, end_miles        CHECK (end_miles > start_miles)

                   -- what if they refill the tank?
, start_gals       CHECK (start_gals >= 0)
, end_gals         CHECK (end_gals < start_gals AND end_gals >= 0)

-- all these should be computed
, miles = end_miles - start_miles
, gals = start_gals - end_gals
, mpg = miles/gals

Context

StackExchange Database Administrators Q#6836, answer score: 7

Revisions (0)

No revisions yet.