HiveBrain v1.2.0
Get Started
← Back to all entries
principlesqlMinor

Is it a good approach to duplicate data in a database?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
duplicatedatabasegooddataapproach

Problem

I have a program that needs to store traceroutes (among other data).

Here is a diagram that depicts the business scenario:

My use case right now is to store the traceroutes using the following tables (with their corresponding columns):

packet

packet_id
source_ip
destination_ip
packet_length


traceroute

traceroute_id
packet_id
timestamp
path_id


path

path_id
traceroute_id
timestamp


ip

ip_id
ip_address


path_ip

path_id
ip_id
order_index


This creates the need to make several JOINs to get back a traceroute, but it makes the pieces of the traceroutes available for query for other needs.

I am considering having a second traceroute table that stores the path as a json string of ip_addresses. This allows me to recover full traceroutes with minimal JOINs, but still retain the individual hops (ip_addresses) used for other queries.

My questions are:

  • Does having this duplicated data make sense?



  • Is there a better approach to this?



Responses to comments:

I would add the traceroute data from one method, and would populate the tables at the same time.

The duplicated table (perhaps called traceroute2?) would just be used for retrievals, and none of the data would ever be edited or updated.

I am considering this because I am most interested in speed of data retrieval. I did some benchmarking (sort of crude) and I can get a 2-6X speed improvement on retrieval.

I don't want to prevent JOINs for a particular reason, just want to avoid this many. For this operation, I never need the individual hops of a traceroute, so I don't see what I would gain by splitting them out when I add it, and then piecing them back together again.

Solution

De-normalisation and certain kinds of data duplication can be a convenient way to speed-up processes. Examples are caching, datawarehousing, and materialised views.

Meanwhile the duplicated data is relied upon as a read-only snapshot (i.e. point-in-time copy of the actual data), or the system can guarantee consistency (like in the case of caches), it is safe to rely on this practice.

Context

StackExchange Database Administrators Q#136528, answer score: 2

Revisions (0)

No revisions yet.