Indexes – F1 Crash Course

There are a million articles, blogs post etc. about indexes out there, I am guilty too of contributing to it. Here is my attempt at correcting the vast amount of theory with a quick crash course on index F1 style, ideally in a few quick minutes you should go from 0 to 300

What I start with

CREATE TABLE [dbo].[customers](

[first_name] [varchar](50) NULL,

[last_name] [varchar](50) NULL,

[email] [varchar](100) NULL,

[Title] [varchar](20) NULL,

[UserName] [varchar](53) NULL,

[custid] [int] IDENTITY(1,1) NOT NULL

) ON [PRIMARY]

This table contains 8000 rows of customer data.

CREATE TABLE [dbo].[Fact](

[custid] [float] NULL,

[Date] [datetime] NULL,

[OrderNumber] [float] NULL,

[Status] [float] NULL,

[Amount] [float] NULL,

[Region] [float] NULL,

[Discounted] [float] NULL

) ON [PRIMARY]

This table 1048575 rows of order data

The Query I want to optimize,

select title+' '+first_name+' '+ last_name , email , year(Date) , month(date) , Sum(Amount)

from customers c

inner join [dbo].[Fact] f

on c.custid = f.custid

Group by title+' '+first_name+' '+ last_name , email , year(Date) , month(date)

 

NO INDEXES ON THE TABLES AT THE MOMENT (WORST POSSIBLE PERFORMANCE)

Statistics

SQL Server parse and compile time:

CPU time = 0 ms, elapsed time = 0 ms.

SQL Server Execution Times:

CPU time = 0 ms, elapsed time = 0 ms.

SQL Server parse and compile time:

CPU time = 655 ms, elapsed time = 2435 ms.

(717713 row(s) affected)

Table 'customers'. Scan count 5, logical reads 596, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'Fact'. Scan count 5, logical reads 8557, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

SQL Server Execution Times:

CPU time = 7613 ms, elapsed time = 8396 ms.

SQL Server parse and compile time:

CPU time = 0 ms, elapsed time = 0 ms.

SQL Server Execution Times:

CPU time = 0 ms, elapsed time = 0 ms.

Execution Plan

 

In this case there is a table scan happening because I need to read all the rows in order to summarize, parallelism is required since cost threshold exceeded while reading the fact table with over a million rows in it.

 Optimizer suggests missing index

 

CREATE NONCLUSTERED INDEX []

ON [dbo].[Fact] ([custid])

INCLUDE ([Date],[Amount])

 

Why?

Since customer table is so small it is not affecting performance, however the FACT table is huge and needs an index for better fetch. The custid is the key column since that is what needs to be searched however the Date and Amount need to be included in the non-clustered index since it is required as part of the output.

NON CLUSTERED INDEX ON FACT TABLE USING CUSTID AND INCLUDED COLUMNS (BEST PERFORMANCE )

 

SQL Server parse and compile time:

CPU time = 0 ms, elapsed time = 0 ms.

SQL Server Execution Times:

CPU time = 0 ms, elapsed time = 0 ms.

SQL Server parse and compile time:

CPU time = 0 ms, elapsed time = 0 ms.

SQL Server parse and compile time:

CPU time = 16 ms, elapsed time = 85 ms.

(717713 row(s) affected)

Table 'customers'. Scan count 5, logical reads 72, physical reads 0, read-ahead reads 72, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'Fact'. Scan count 5, logical reads 5010, physical reads 1, read-ahead reads 1755, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

SQL Server Execution Times:

CPU time = 7207 ms, elapsed time = 8133 ms.

SQL Server parse and compile time:

CPU time = 0 ms, elapsed time = 0 ms.

SQL Server Execution Times:

CPU time = 0 ms, elapsed time = 0 ms.

CLUSTERED INDEX ON BOTH TABLES ON CUSTID COLUMN AND CLEAN BUFFER POOL AND NO NONCLUSTERED INDEX ( THIRD BEST PERFORMANCE – CLUSTERED INDEX NOT ALWAYS USEFUL )

(717713 row(s) affected)

Table 'customers'. Scan count 5, logical reads 217, physical reads 1, read-ahead reads 72, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'Fact'. Scan count 5, logical reads 9610, physical reads 0, read-ahead reads 1154, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(1 row(s) affected)

SQL Server Execution Times:

CPU time = 7611 ms, elapsed time = 7715 ms.

SQL Server parse and compile time:

CPU time = 0 ms, elapsed time = 0 ms.

SQL Server Execution Times:

CPU time = 0 ms, elapsed time = 0 ms.

 

EXECUTION PLAN

 

In this case the Table scan is converted into a clustered index scan because once a clustered index is created the HEAP table becomes a Clustered index, in other words the clustered index is the table itself , as a result the clustered index is all that is needed to answer any queries on these tables. I created the clustered index on the custid column since that is the only column on which any kind of searching happens.

Notice how having two clustered indexes on the tables and no non-clustered indexes on the table is actually worse for performance. This CPU time ( actual work being done) is slightly higher for the query ,Why? The clustered index is the entire table and as such any scan on the index reads the entire table hence the larger number of logical reads on the fact table 9610 in clustered index as opposed to 5010 for non clustered index. With the NC index the columns (order number , status , region) on fact table are not touched and are not loaded the RAM i.e. less data fetched equals better performance.

A NONCLUSTERED COLUMN STORE INDEX ON FACT TABLE AND NO INDEXES ON ANY OTHER TABLE (SECOND BEST PERFORMANCE)

In this query creating a NON CLUSTERED COUMNSTORE INDEX isn’t much help since the date column on which the year and month function are applied come from the fact table also the total output returns 71K rows even when executed in batch mode.

SQL Server parse and compile time:

CPU time = 0 ms, elapsed time = 0 ms.

(717713 row(s) affected)

Table 'Fact'. Scan count 4, logical reads 31670, physical reads 0, read-ahead reads 1712, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'customers'. Scan count 5, logical reads 72, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:

CPU time = 4664 ms, elapsed time = 8971 ms.

On the other hand the below query

select title+' '+first_name+' '+ last_name , email , Sum(Amount)

from customers c

inner join [dbo].[Fact] f

on c.custid = f.custid

Group by title+' '+first_name+' '+ last_name , email

Completes in less than 300 milliseconds.

SQL Server parse and compile time:

CPU time = 0 ms, elapsed time = 85 ms.

(8000 row(s) affected)

Table 'Fact'. Scan count 4, logical reads 26332, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'customers'. Scan count 5, logical reads 72, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'Workfile'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:

CPU time = 203 ms, elapsed time = 270 ms.

Bottom line , the optimizer knows what its doing , just try and follow its recommendations, Simple rule of thumb, create a clustered index on the PK , always have a clustered index to speed up any lookups that may be required by the Non clustered index. Always have indexes on the columns that are used for filtering or joining as long as you use the columns in the original format they were stored in. Include any columns that are part of the select statement into the index if the Non clustered index is being used by that query. Avoid over indexing.