principlesqlMinor
Is it a good approach to duplicate data in a database?
Viewed 0 times
duplicatedatabasegooddataapproach
Problem
I have a program that needs to store traceroutes (among other data).
Here is a diagram that depicts the business scenario:
My use case right now is to store the traceroutes using the following tables (with their corresponding columns):
packet
traceroute
path
ip
path_ip
This creates the need to make several
I am considering having a second
My questions are:
Responses to comments:
I would add the
The duplicated table (perhaps called
I am considering this because I am most interested in speed of data retrieval. I did some benchmarking (sort of crude) and I can get a 2-6X speed improvement on retrieval.
I don't want to prevent
Here is a diagram that depicts the business scenario:
My use case right now is to store the traceroutes using the following tables (with their corresponding columns):
packet
packet_id
source_ip
destination_ip
packet_lengthtraceroute
traceroute_id
packet_id
timestamp
path_idpath
path_id
traceroute_id
timestampip
ip_id
ip_addresspath_ip
path_id
ip_id
order_indexThis creates the need to make several
JOINs to get back a traceroute, but it makes the pieces of the traceroutes available for query for other needs.I am considering having a second
traceroute table that stores the path as a json string of ip_addresses. This allows me to recover full traceroutes with minimal JOINs, but still retain the individual hops (ip_addresses) used for other queries.My questions are:
- Does having this duplicated data make sense?
- Is there a better approach to this?
Responses to comments:
I would add the
traceroute data from one method, and would populate the tables at the same time. The duplicated table (perhaps called
traceroute2?) would just be used for retrievals, and none of the data would ever be edited or updated. I am considering this because I am most interested in speed of data retrieval. I did some benchmarking (sort of crude) and I can get a 2-6X speed improvement on retrieval.
I don't want to prevent
JOINs for a particular reason, just want to avoid this many. For this operation, I never need the individual hops of a traceroute, so I don't see what I would gain by splitting them out when I add it, and then piecing them back together again.Solution
De-normalisation and certain kinds of data duplication can be a convenient way to speed-up processes. Examples are caching, datawarehousing, and materialised views.
Meanwhile the duplicated data is relied upon as a read-only snapshot (i.e. point-in-time copy of the actual data), or the system can guarantee consistency (like in the case of caches), it is safe to rely on this practice.
Meanwhile the duplicated data is relied upon as a read-only snapshot (i.e. point-in-time copy of the actual data), or the system can guarantee consistency (like in the case of caches), it is safe to rely on this practice.
Context
StackExchange Database Administrators Q#136528, answer score: 2
Revisions (0)
No revisions yet.