In this blog post I do a quick comparison of the performance implications of using cell-level encryption vs hashing vs storing the data as plain text with just column-level permissions.
While it is pretty obvious that encryption has significantly more overhead, what I wanted to know was “by how much”. The main question was: if the difference between cell-level encryption performance and hashing performance is small, would I prefer to use encryption? Hashing does have some disadvantages. For example, commonly used passwords would generate the same hash value, which is circumvented by appending a GUID key to the password before hashing (this is the approach followed in this test).
The diagram below shows the numbers, but let me explain the approach first.
The test table contains 5 columns: username, password, hash-password, key and encrypted password. I randomly look up username and password pairs against it; all usernames and passwords are unique. The test was run at least 5 times (1000 iterations per run) for the scenarios below:
- Encryption with and without appropriate indexes
- Hashing with and without appropriate indexes
- Plain text password with and without appropriate indexes
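To make the setup concrete, here is a minimal sketch of what such a table and its index might look like. The table name testlogins and the hashpassword column come from the query later in this post; every other column name, the data types and the index definition are my assumptions, not the exact objects used in the test.

```sql
-- Hypothetical schema: column names other than username and hashpassword,
-- and all data types, are assumptions based on the description above.
CREATE TABLE dbo.testlogins
(
    username          varchar(100)     NOT NULL,
    [password]        varchar(100)     NOT NULL,  -- plain text, kept only for the comparison
    hashpassword      varbinary(20)    NOT NULL,  -- SHA1 output is 20 bytes
    [key]             uniqueidentifier NOT NULL,  -- GUID used along with the hash
    encryptedpassword varbinary(256)   NOT NULL   -- output of EncryptByKey
);

-- One possible "appropriate index" for the indexed runs (an assumption):
-- seek on username, with the password columns included to avoid lookups.
CREATE UNIQUE NONCLUSTERED INDEX IX_testlogins_username
    ON dbo.testlogins (username)
    INCLUDE ([password], hashpassword, encryptedpassword);
```

For the "without indexes" runs the table would simply be left as a heap, so every lookup scans all rows.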
Once indexes are in place you will notice that the difference in performance between hashing and encryption is noticeable but small. Please keep in mind that I used TRIPLE DES encryption; stronger algorithms will likely take more time. You will also notice that plain text and encryption rely heavily on indexes to improve their performance, while hashing doesn’t benefit as much from the index.
The diagram below shows the execution plan for encryption. Notice the “Filter” task being performed: it is there because the encrypted password cannot be used as a seek predicate in the index seek operation and instead has to be filtered afterwards. This means that it’s better to have unique usernames (which is almost always the case) so that the filter task only has to deal with one row.
The screenshot below shows the execution plan for the hashing operation.
So in summary, I would prefer to use hashing for high performance/high concurrency but slightly less security-critical databases. In fact, the only reason I would use encryption would be to be compliant with some industry standard; otherwise the principle of least privilege and sound DB design should be good enough.
In case anybody is wondering what I mean by hashing, here is the query:
select username , [hashpassword] from testlogins
where username = @username and [hashpassword]=hashbytes('SHA1',@password+'878C5B0D-2D20-481D-8220-D1F864069BB3')
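To see why the GUID is appended, the following sketch compares the plain and salted hashes of the same password. The variable names and the sample password are illustrative only; the GUID is the one used in the query above.

```sql
-- Illustrative values; @password and @salt are hypothetical variable names.
DECLARE @password varchar(100) = 'Password1';
DECLARE @salt     char(36)     = '878C5B0D-2D20-481D-8220-D1F864069BB3';

SELECT
    HASHBYTES('SHA1', @password)         AS unsalted_hash,  -- same for everyone using this password
    HASHBYTES('SHA1', @password + @salt) AS salted_hash;    -- differs from the unsalted value
```

Note that HASHBYTES hashes the raw bytes of its input, so the same text passed as varchar and as nvarchar produces different hashes; the data type used when storing the hash has to match the one used in the lookup. Newer SQL Server versions also support SHA2_256 and SHA2_512, which would need a wider hash column than SHA1.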
This is what I mean by encryption
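For readers who want to try the encrypted lookup themselves, a minimal sketch might look like the following. It is not necessarily the exact query used in the test: the key name LoginKey, its passphrase, the encryptedpassword column name and the sample values are all assumptions. Also note that recent SQL Server versions only allow AES algorithms for new symmetric keys, so TRIPLE_DES may require an older version or database compatibility level.

```sql
-- One-time setup. Key name and passphrase are hypothetical.
CREATE SYMMETRIC KEY LoginKey
    WITH ALGORITHM = TRIPLE_DES
    ENCRYPTION BY PASSWORD = 'Use-A-Strong-Passphrase-1!';

-- Lookup: the key must be open in the session before DecryptByKey returns data.
DECLARE @username varchar(100) = 'user1';
DECLARE @password varchar(100) = 'Password1';

OPEN SYMMETRIC KEY LoginKey
    DECRYPTION BY PASSWORD = 'Use-A-Strong-Passphrase-1!';

SELECT username
FROM testlogins
WHERE username = @username
  -- DecryptByKey returns varbinary, so convert back to the original type.
  AND CONVERT(varchar(100), DECRYPTBYKEY(encryptedpassword)) = @password;

CLOSE SYMMETRIC KEY LoginKey;
```

Because EncryptByKey produces a different ciphertext each time it is called, the stored value cannot be compared against a freshly encrypted password. Each candidate row has to be decrypted and then compared, which fits the “Filter” step seen in the encryption execution plan.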
A video explains this blog post in much more detail, along with the tests running in real time.