We recently upgraded some SQL instance to SQL Server 2012, and we were very excited about it. Most of the stuff looks pretty good except a few hiccup like missing maintenance plan on SQL 2012 SP1 CU3, which Microsoft has released a hotfix for it.
This Monday morning (yes, it is always Monday morning) we noticed a few database log files have grown so large that the disk drive almost ran out disk space. We were able to shrink a large log file that were at that time consuming small percentage of the log file and free up some disk space. With the urgent issue resolved, I dive into SQL Server and trying to investigate further and find the root cause.
First thing came to my mind was to examine if there is anything that holding back the log file truncation. The sys.databases DMV shows that the database as SIMPLE recovery model on the recovery_model_desc, but strangely with LOG_BACKUP on the log_reuse_wait_desc. Under simple recovery model, checkpoint is automatically issued when log file becomes 70% full, unless there are other process delaying the log truncation, in this case strangely log backup. Missing log backup should be delaying log truncation only when the database is set to FULL or BULK-LOGGED recovery model. The LOG_BACKUP log reuse wait for simple recovery model is just not right.
At least the reason of the log file explosion was found due to the older logs aren't marked as reused (to be truncated) as it is waiting for a 'LOG_BACKUP'. After further investigation and tests, I was able to reproduce the behavior.
ALTER DATABASE model SET RECOVERY SIMPLE; GO
Create a database..
CREATE DATABASE test ON PRIMARY ( NAME = test_data, SIZE = 5MB, FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL11.TEST\MSSQL\DATA\test_data.mdf' ) LOG ON ( NAME = test_log, SIZE = 1MB, FILEGROWTH = 1MB, FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL11.TEST\MSSQL\DATA\test_log.ldf' ); GO
Verify the database is created in simple recover model (from model database)
SELECT name, recovery_model_desc, log_reuse_wait_desc FROM sys.databases WHERE name = 'Test'; GO
There are 4 virtual log files (vlf), with the first vlf marked as active (status 2).
DBCC LOGINFO (Test);
Create a table a insert some rows. Run the insert 40 times should have enough transaction to use more than 1 virtual log file.
CREATE TABLE table1 ( ID bigint IDENTITY(1,1), text1 varchar(4000), ); GO INSERT INTO table1 SELECT REPLICATE('H', 4000) GO 40 CHECKPOINT; GO
As the database is in simple recovery model, checkpoint command is issued to mark older log as reusable (log truncation).
The DBCC LOGINFO result shows that the first vlf has been marked as reusable (status 0) and the 2nd vlf as active. The sys.databases dmv we have earlier shows the log_reuse_wait_desc as NOTHING. This is the expected behavior for simple recovery model. Actually, since we have not taken any full backup, this would be the same behavior for full recovery model as well. Let's take a full backup.
BACKUP DATABASE test TO DISK = 'test_backup.bak'; GO
Now, execute the insert 60 times.
USE Test; GO INSERT INTO table1 SELECT REPLICATE('H', 4000) GO 60 CHECKPOINT; GO
Now run the DBCC LOGINFO again. We notice that the 2nd vlf is still marked as active, as well as the 3rd vlf.
The sys.databases shows that the log_reuse_wait_desc is now LOG_BACKUP.
If we continue to run the insert (e.g. GO 100), the log file grows to 2MB without any log truncation. That's the exact problem we had. The log file just grows and grows.
Since it is waiting for LOG_BACKUP for log to be truncated, why not we just tease it by 'trying' to back up the log. Log backup on simple recovery model should throw an error stating log backup is not allowed while the recovery model is SIMPLE.
BACKUP LOG test TO DISK = 'test_log_backup.trn' GO
Guess what, it ran successfully! What!? The DBCC LOGINFO shows the log has been truncated. Is this a good news? I am not so sure anymore.
All these behavior are starting to make me feel like SQL Server has been treating this database as full recovery model the whole time, even we have previously verified the recovery model as simple.
Let's change the recovery model to FULL and back to SIMPLE recovery to see if it does anything. The log_reuse_wait_desc is showing NOTHING. Good. Run a bunch of insert (e.g. GO 100, make sure issue a checkpoint after). The DBCC shows that the truncation is in place. Great!
If the model database is set to SIMPLE recovery model and user database is created with the simple recovery model from model database template, SQL Server does not truncate its log automatically like it suppose to (after a full backup). It appears that somehow SQL Server is treating it as if it is in full recovery model.
It works correctly if database is created with FULL recovery and later changed to SIMPLE recovery model.
Database restored from a SQL 2008 R2 backup to SQL Server 2012 instance in SIMPLE recovery model does not have this problem.
The behavior is tested and reproduced on SQL Server 2012 RTM, SP1, and SP1 CU3. I have only tested on these environment.
Change the recovery model of the affected database to FULL and back to SIMPLE (after at least one backup) appears to solve the problem. If the log has grown too large, you may need to execute a one time DBCC SHRINKFILE command to shrink the log file to an appropriate size. Also for a more permanent solution before Microsoft fix it, set the model database as FULL recovery model and user database only alter to SIMPLE recovery explicitly when required.
It appears that the root cause is when a user database is created (not alter after) with simple recovery model, then it has this problem. I blame it on model database as that is the only way to create a user database with the defined recovery model.
As the model database by default is set as FULL recovery model, this issue may not be as widespread as I initially thought. However, if you have been creating database or have database created by application (e.g. Sharepoint) on SQL Server 2012 instance with the model database set as Simple recovery model, you may want to inspect the log file size and its log_reuse_wait_desc on sys.databases DMV.
Right before I was about to submit a Microsoft Connect ticket, I found a ticket has already been created stating the same exact issue. I wish I came across this ticket earlier.
SQL Server 2012 RTM Cumulative Update 7 (CU7) released on April 15, 2013 has addressed this issue. Now I really wish I came across this update sooner.
SQL Server 2012 SP1 Cumulative Update 4 (CU4) released on May 31, 2013 has address this issue. Now, we all are happy.