Friday, April 19, 2013

SQL Statistics - Auto Create and Auto Update

SQL statistics includes histogram which display the distribution of values and it also contains densities, information of the correlation statistics between multiple columns. These information are used by query optimizer to estimate the cardinality (number of rows) of the query result and determine an optimal query plan for better query performance based on the estimation. As a result, creating and maintaining the statistics efficiently and accurately become a critical task.

SQL Server can automatically create and update statistics. There are 3 statistics settings that affect the entire database,


Setting AUTO_CREATE_STATISTICS setting to ON enables query optimizer to generate single column statistics on query predicate. This statistics is generated (with default sampling) if there is no other existing statistics with the same first column. Turning the setting OFF prevent query optimizer from automatically create statistics and may result sub-optimal query plan.

Setting AUTO_UPDATE_STATISTICS setting to ON specifies query optimizer to automatically update the statistics (with default sampling) if it is deemed out-of-date. The statistics becomes out-of-date when it reaches the 'change threshold' that caused by data modification through update, insert, delete or merge operations. The permanent table 'change threshold' is calculated as,

For table with number of rows < 500, requires 500 modifications since last statistics updates
For table with number of rows >= 500, requires 500 + 20% of the cardinality changes since last statistics updates

Turning this setting ON generally is good idea to automatically update the statistics as the distribution of the values may have changed since the statistics last update. However, in certain cases when the default sampling (auto update uses default sampling) is not desirable, especially when the statistics have previously been generated/updated with full scan. As data distribution varied between application, this is the time where testing and verification are required to justify if either latest statistics with default sampling or out-of-date full scan statistics produce better query performance.

Query optimizer determines if the statistics is out-of-date and updates the statistics if required before compiling and executing the query. This may result slower query response time especially if the table have large number of rows. This is where AUTO_UPDATE_STATISTICS_ASYNC could help. The query compile without waiting for the updates as the statistics update occurs at later times, with the trade off of using an out-of-date statistics for query compiled at the first occurrence. Query compiled after the asynchronous update can benefit from the updated statistics. This setting by default is OFF. This setting should be used with care as it may generated a sub-optimal query plan before the asynchronous statistics update.

The current values of these setting can be retrieved from sys.databases catalog views,

FROM sys.databases

To change these setting, it could be done with ALTER DATABASE T-SQL statement.


Remember these setting affect the entire database. To change AUTO_UPDATE_STATISTICS on specified statistics, index, table or indexed view, use sp_autostats. This option is only useful towards turning a specified statistics AUTO_UPDATE_STATISTICS OFF when database has its AUTO_UPDATE_STATISTICS enabled (ON). Please notes that it does not recognize the specified statistics AUTO_UPDATE_STATISTICS ON when the database setting AUTO_UPDATE_STATISTICS is turned off.

To turn the specified statistics AUTO_UPDATE_STATISTICS OFF

EXEC sp_autostats @tblname = 'YourTable', @flagc = 'OFF', @indname = 'YourStatistics'

Use the same stored procedure without @flagc to returns the result with its current value of AUTO_UPDATE_STATISTICS.

EXEC sp_autostats @tblname = 'YourTable', @indname = 'YourStatistics'

There is another way to disable the automatic update on a table/statistics level. By using NORECOMPUTE option during a manual statistics update. This is particular useful when a statistics is previously updated with FULLSCAN and needs to avoid the default sampling during automatic statistics updates.