Tracking DML Operations in SQL tables

Recently I was asked if there was a better way than triggers to perform audits on a table. Essentially the client has a number of tables and each table needs a framework to audit DML operations occurring within them. Now the challenge is not really the audit part but how the data is consumed after it has been logged. While in some cases data needs to be returned back to the application in other cases it just needs to be stored in a history table for legal reasons. To begin with let me list out some of the common ways I have seen this kind of functionality implemented

Triggers

For most people this is the very first thing they think of. Simple put and AFTER trigger on the table that performs a subsequent insert into another table using the internal deleted table. While this is often the simplest approach there are challenges around performance since the trigger fires row by row and therefore can affect the write performance. In addition for highly transaction tables the changes can causes other functionality downstream like replication etc. to break. See the below link on how to go about using triggers. This problem increases with the number of table objects since every table will need a trigger and in addition it complicates deployments whenever there is a change to the underlying table.

https://solutioncenter.apexsql.com/how-to-create-and-use-dml-triggers-to-audit-data-changes/

Storing T-Log backups

While this approach is not as common, I have found it being implemented in a number of smaller companies where they simply retain the copy of transactional backups for a long duration to allow them to audit to any point in time historically with storage space being the main constraint. This approach has the added advantage of not cluttering the database with additional objects and not having to introduce additional features. However the disadvantage being having to retain backups for a long duration and if any backup is corrupted loss of Audit capability. Also the data is not readily accessible and has to be restored on another server which can often be a problem for VLDBs. However my main issues with this approach has been the fact the often the database goes down just when the data you want audited is being changed e.g. when the database is hacked. So as a way to track records it might not be perfect. Mainly backups are not meant to be used an auditing tool.

https://docs.microsoft.com/en-us/sql/relational-databases/backup-restore/transaction-log-backups-sql-server

Temporal tables

This option is the most attractive at the moment since it provides and inbuilt capability to track temporal data. Basically it tracks changes over time which is what auditing in this case is about. However it is a feature available in 2016 and therefore not really an option at the moment for most customers. By creating a shadow table for the user selected tables and versioning rows using a system time stamp it allows users to navigate back and forth to any point in time to see the state of the table. Think of it as a SCD type 2 where all columns are historical and tracked using Start Date and End Date.

http://www.enabledbusinesssolutions.com/blog/sql-2016-temporal-tables-the-way-we-are-most-likely-to-end-up-using-it/

Usage scenarios for temporal tables are listed below

https://docs.microsoft.com/en-us/sql/relational-databases/tables/temporal-table-usage-scenarios

Database Audits

While this does say audit it isn’t really the best fit for DML operations the way we want it. Mostly because it can tell us who did what and when but doesn’t really care what happened to the existing data. A great tool from a security perspective but as a way to log DML this sucks. Mostly it is not very user friendly to extract the details you requires and even when you do extract the required details it isn’t really practical to use the data from an application perspective.

https://docs.microsoft.com/en-us/sql/relational-databases/security/auditing/create-a-server-audit-and-database-audit-specification

Change Data Capture

As its name suggests it captures changes in data and therefore seems like an ideal tools for us to use in this case. Since we can specify the table we want tracked and in addition track specific columns. Provided with the ability to see net changes or all rows it seems to provide all the options we require. While this tool offers us better performance when compared to Triggers (because changes are read directly from the T log) it does have some significant drawbacks too. The first being it needs to be enabled table by table and can cause a significant amount of clutter in the database with additional functions and jobs created to track the changes. The other problem being the duration for which the changes are captured. It doesn’t store the changes for an indefinite amount of time so we need to write additional logic to extract data then load it into history tables later on. Naturally this is an overhead and more importantly it is not real time unlike the triggers based logic.

http://www.enabledbusinesssolutions.com/blog/script-to-enabled-and-use-cdc-in-ms-sql-server/

Change Tracking

Change tracking is lightweight version of CDC in a way. It captures the fact a row has changed but not the nature of the changes. In this way it simply tracks the primary key of the row and lets you know which version of the row is current. And how many versions ago it was created. Not really helpful for an audit but great when using ETL tools to reconcile data across table’s

https://docs.microsoft.com/en-us/sql/relational-databases/track-changes/work-with-change-tracking-sql-server

Within the Query

Probably the most elegant solution of the lot requires massive amount if changes to the application code to accommodate the required functionality. Essentially we piggy back on the internal Inserted and Deleted tables to capture the changes as the DML operation is happening and then decide how to handle them within the transaction. Because of the entire logic being written into the procedure it allows massive amount of customization on which columns and what rows qualify for audit. Along with the ability to add your own type of row version the same history table could be used for auditing and ETL processes as well.

The below code might provide a better explanations on how the whole thing works. Naturally this way provides a lot of flexibility but doesn’t overcome the issues faced by trigger such as row by row firing but if your using additional inbuilt function if provides a level of detail that CDC provides but in real time.

-- CREATE MASTER TABLE
CREATE TABLE insertedtest_hist (
    Id INT
    ,FullName VARCHAR(100)
    ,Age INT
    ,PreferredAddress VARCHAR(10)
    )
GO

-- CREATE TRACKING TABLE
CREATE TABLE insertedtest_hist (
    Id INT
    ,FullName VARCHAR(100)
    ,Age INT
    ,PreferredAddress VARCHAR(10)
    )
GO

-- INSERT FIRST ROW INTO MASTER - PARALLEL INSERT INTO TRACKING TABLE
INSERT INTO insertedtest (
    Id
    ,FullName
    ,Age
    ,PreferredAddress
    )
OUTPUT inserted.Id
    ,inserted.FullName
    ,inserted.Age
    ,inserted.PreferredAddress
INTO insertedtest_hist(Id, FullName, Age, PreferredAddress)
VALUES (
    1
    ,'Insert1'
    ,13
    ,'BLR'
    )
GO

-- INSERT FIRST ROW INTO MASTER - PARALLEL INSERT INTO TRACKING TABLE
INSERT INTO insertedtest (
    Id
    ,FullName
    ,Age
    ,PreferredAddress
    )
OUTPUT inserted.Id
    ,inserted.FullName
    ,inserted.Age
    ,inserted.PreferredAddress
INTO insertedtest_hist(Id, FullName, Age, PreferredAddress)
VALUES (
    2
    ,'Insert2'
    ,13
    ,'HYD'
    )
GO

-- UPDATE THE MASTER TABLE AND THE SUBSEQIENT TABLE IN ALSO UPDATED INTO THE HISTORY TABLE AS WELL.
UPDATE insertedtest
SET Age = 25
OUTPUT inserted.Id
    ,inserted.FullName
    ,inserted.Age
    ,inserted.PreferredAddress
INTO insertedtest_hist(Id, FullName, Age, PreferredAddress)
WHERE id = 2

-- TEST THE RESULT
SELECT *
FROM insertedtest_hist

SELECT *
FROM insertedtest