Thursday, October 18, 2012

Faster Wait Statistics for SQL Server 2012

Since SQL Server 2005, waits and queues statistics have been implemented in SQL Server to provide important information and effective analysis to identify the root cause and for performance tuning. Here is some script to aggregate this information for performance tuning and analysis.

Dynamic management views (DMV) such as sys.dm_os_wait_stats provides insight of which wait type was frequently encountered and how long it lasts. The result from sys.dm_os_wait_stats returns wait_type, waiting_tasks_count, wait_time_ms, max_wait_time_ms, signal_wait_time_ms (the time difference between when the thread is signaled and when it start running). Refer to MSDN for more details of this DMV. The results includes a lot of benign waits. We want to aggregate the results so it would only show us the top wait type that are really affecting the server.

Paul Randal (link) has written great script to aggregate the top wait types and filter out the rest so it gives you a more useful insights. I have modified his codes to take advantage of some new feature of SQL Server 2012 to produce faster execution. If you are using SQL Server 2012, you can utilize the code below,

    wait_time_ms / 1000.0 AS WaitS,
    (wait_time_ms - signal_wait_time_ms) / 1000.0 AS ResourceS,
    signal_wait_time_ms / 1000.0 AS SignalS,
    waiting_tasks_count AS WaitCount,
    100.0 * wait_time_ms / SUM (wait_time_ms) OVER () AS Percentage,
 100.0 * wait_time_ms / SUM (wait_time_ms) OVER
FROM sys.dm_os_wait_stats
WHERE wait_type NOT IN (

 W1.wait_type AS WaitType, 
    CAST (W1.WaitS AS DECIMAL(14, 2)) AS Wait_S,
    CAST (W1.ResourceS AS DECIMAL(14, 2)) AS Resource_S,
    CAST (W1.SignalS AS DECIMAL(14, 2)) AS Signal_S,
    W1.WaitCount AS WaitCount,
    CAST (W1.Percentage AS DECIMAL(4, 2)) AS Percentage,
    CAST ((W1.WaitS / W1.WaitCount) AS DECIMAL (14, 4)) AS AvgWait_S,
    CAST ((W1.ResourceS / W1.WaitCount) AS DECIMAL (14, 4)) AS AvgRes_S,
    CAST ((W1.SignalS / W1.WaitCount) AS DECIMAL (14, 4)) AS AvgSig_S
WHERE W1.RunningPerc > 5;  --Top 95%

I ran both original code and my modified code in my testing environment to make comparison on their speed. The original codes take around 3-4 seconds to return the results,

(2 row(s) affected)
Table 'Worktable'. Scan count 7, logical reads 6887, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

CPU time = 3000 ms,  elapsed time = 3967 ms.
SQL Server Execution Times:

With the above code, it takes around 20 milliseconds as shown in the statistics below,

(2 row(s) affected)
Table 'Worktable'. Scan count 3, logical reads 1286, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
CPU time = 16 ms,  elapsed time = 23 ms.

The above code is faster since it eliminates the self join and looping every records. The improvement of OVER clause with the new ROWS clause feature allow the code above run much faster. More details on OVER clause can be found at MSDN.

Paul Randal blog post as mentioned previously explain what may be the root causes on some of the common wait type. I also encourage everyone to go to this TechNet link, download and read the wait and queues white paper.