How is every change to a row in a database generally stored?
Problem
In a project I'm working on, every change to the rows in some tables of the database must be tracked for further audit or rollback. It must be easy to find who modified the row, from which IP address and when, and be able to restore previous version.
A similar thing is done, for example, by Stack Exchange: when I change someone else's question, it is possible to find out that I changed it, and to roll back the changes.
What is the general technique used to store every change to an object in a database, given that my current schema has mostly the same properties (below) as an average business app?
- The objects have a relatively small size: there might be some nvarchar(1000) columns, for example, but no huge blobs of binary data; those are stored directly on disk and accessed directly, not through Microsoft SQL Server FILESTREAM,
- The database load is pretty low and the whole database is handled by one virtual machine on a server,
- The access to the previous versions doesn't have to be as fast as the access to the latest version, but still has to be up-to-date¹ and not too slow².
I thought about the following approaches, but I have no real experience with those sorts of scenarios, so I would like to hear others' opinions:
-
Store everything in the same table, distinguishing rows by ID and version. IMO, this is seriously misguided and will sooner or later hurt performance. With this approach, it is also impossible to set different security levels for the latest items and for the version history. Finally, every query becomes more complicated to write: to access the up-to-date data, I would have to group everything by ID and, in each group, retrieve the latest version.
-
Store the latest version in one table and, on every change, copy the obsolete version to another table in another schema. The flaw is that we store every value every time, even if it didn't change. Setting unchanged values to null is not a solution, since I must also track when a value is actually changed to null.
Solution
The normal way to do audit logging of this sort is to have a shadow table and to log changes with triggers on the base table you are auditing. The shadow tables can be placed on a different physical disk if you need to do that for performance, and you can put indexes on them if you need to support quick retrieval of the data.
The tables will have roughly the same structure as your original tables, but will have a datetime column for when the change took place and a marker for whether the row was inserted, changed or deleted. Sequencing the versions can be done by the time stamp.
The change date can be captured by making the datetime column NOT NULL with a default of getdate(); an audit user column with a NOT NULL constraint defaulted to SUSER_SNAME() will capture the user. Assuming the actual user is being impersonated in the session, this will capture the identity of the user making the change.
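As a minimal sketch of the shadow table and trigger described above — the table dbo.Customer, its columns, and the audit schema are all hypothetical names for illustration:

```sql
-- Assumes an audit schema exists (CREATE SCHEMA audit) and a base table
-- dbo.Customer (Id int PRIMARY KEY, Name nvarchar(1000)).
CREATE TABLE audit.Customer
(
    Id        int            NOT NULL,
    Name      nvarchar(1000) NULL,
    -- audit columns:
    Operation char(1)        NOT NULL,                        -- 'I', 'U' or 'D'
    ChangedAt datetime       NOT NULL DEFAULT getdate(),      -- when the change took place
    ChangedBy sysname        NOT NULL DEFAULT SUSER_SNAME()   -- who made it
);
GO
CREATE TRIGGER dbo.trg_Customer_Audit
ON dbo.Customer
AFTER INSERT, UPDATE, DELETE
AS
BEGIN
    SET NOCOUNT ON;

    -- "deleted" holds the pre-change rows for UPDATE and DELETE:
    INSERT INTO audit.Customer (Id, Name, Operation)
    SELECT d.Id, d.Name,
           CASE WHEN EXISTS (SELECT * FROM inserted) THEN 'U' ELSE 'D' END
    FROM deleted AS d;

    -- "inserted" holds the new rows for INSERT:
    INSERT INTO audit.Customer (Id, Name, Operation)
    SELECT i.Id, i.Name, 'I'
    FROM inserted AS i
    WHERE NOT EXISTS (SELECT * FROM deleted);
END;
```

This variant logs the old version of the row on updates and deletes, and the new version on inserts; the defaults on ChangedAt and ChangedBy fire automatically, so the trigger never has to set them.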
The database has no way to know the IP address of the client connecting to the web server. The application will have to explicitly capture the IP address and log it with the transaction.
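One hedged way to pass that application-side information down to the triggers, on SQL Server 2016 or later, is session context; the key name client_ip and the address shown are illustrative:

```sql
-- The application sets the client IP once per session/request:
EXEC sp_set_session_context @key = N'client_ip', @value = N'203.0.113.17';

-- An audit trigger (or any later statement) can then read it back:
SELECT CONVERT(nvarchar(45), SESSION_CONTEXT(N'client_ip')) AS ClientIp;
```

On older versions, CONTEXT_INFO or an extra column populated explicitly by the application serve the same purpose.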
If you have a large number of tables you wish to audit, you can use the metadata from the system data dictionary to generate the triggers programmatically.
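A sketch of the metadata-driven approach: pull each table's column list from the data dictionary and splice it into a trigger template. STRING_AGG requires SQL Server 2017+; on earlier versions, FOR XML PATH is the usual workaround.

```sql
-- Build a comma-separated, quoted column list per table in the dbo schema;
-- this string would then be substituted into a CREATE TRIGGER template.
SELECT t.name AS TableName,
       STRING_AGG(QUOTENAME(c.name), ', ') AS ColumnList
FROM sys.tables  AS t
JOIN sys.columns AS c ON c.object_id = t.object_id
WHERE t.schema_id = SCHEMA_ID('dbo')
GROUP BY t.name;
```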
This solution is by far the best for several reasons:
-
It captures any changes to the table, not just those made by the application.
-
The audit tables can be put on a different set of disks to reduce the I/O load on your primary tables.
-
You can use a view based on a union of the base table and the audit log table that will show the entire history, including the current version.
-
You can index the audit log tables as necessary so that the audit users can query them responsively. As usual, index selection is a tradeoff between query performance and update overhead.
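The union view mentioned above can be sketched as follows, assuming a hypothetical base table dbo.Customer and a matching audit.Customer table that adds Operation, ChangedAt and ChangedBy columns:

```sql
CREATE VIEW dbo.CustomerHistory
AS
-- Current version: no audit metadata, so emit typed NULLs to match the union.
SELECT Id, Name,
       CAST(NULL AS char(1))  AS Operation,
       CAST(NULL AS datetime) AS ChangedAt,
       CAST(NULL AS sysname)  AS ChangedBy,
       1 AS IsCurrent
FROM dbo.Customer
UNION ALL
-- Prior versions from the audit log:
SELECT Id, Name, Operation, ChangedAt, ChangedBy,
       0 AS IsCurrent
FROM audit.Customer;
```

Querying the view for one Id, ordered by ChangedAt, then yields the full timeline of that row.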
Context
StackExchange Database Administrators Q#7922, answer score: 8