Detecting and Troubleshooting Non-Yielding Worker Threads in SQL Server 2005/2008

I had a little bit of excitement with one of my production servers couple of days ago, that I thought I would document (especially since it turned out well in the end).  I have a little VB.NET WinForms application that calls a series of DMV stored procedures on the production database instances.

Below (with some sensitive information obscured) is a screenshot of Operations Monitor (during a normal time).

As I woke up and checked the system with Operations Monitor, I noticed several unusual readings that led me to believe that one of my database servers was not a happy camper. First, my Avg Task Count for the instance was running in the 25-30 range (where it normally is in the 4-7 range). Second, my SQL Server CPU utilization was in the 25-30% range (where it is normally in the 10-15% range). Third, My FullText Pending Changes were all very high (where they are usually close to zero). Finally, the Log Reuse Description for one of the databases on that instance was Active Transaction (which is unusual for my OLTP workload).

I initially thought that I had an issue with Integrated Full Text Search (iFTS), so I ran a command to resume population, which had no effect. Then, I tried to start a manual population, and that had no effect. Puzzled, I decided to look at the SQL Error log, where I saw a number of errors like you see below.

Process 0:0:0 (0x1470) Worker 0x000000000583E1A0 appears to be non-yielding on Scheduler 13. Thread creation time: 12903249354211. Approx Thread CPU Used: kernel 0 ms, user 0 ms. Process Utilization 19%. System Idle 78%. Interval: 10704548 ms.

Oh-oh, that was not good. I remembered a relevant white paper on this subject: How To Diagnose and Correct Errors 17883, 17884, 17887, and 17888 by Bob Dorr and Sameer Tejani at Microsoft, which I took a quick look at. While I was doing this, I noticed that I still had Active Transaction for the Log Reuse Description on that OLTP database.

I decided to run DBCC OPENTRAN, and discovered a transaction that had been active for a couple of hours, which was very unusual. Then, I ran DBCC INPUTBUFFER with the SPID for that transaction, and discovered that it was for a stored procedure that was kicked off by a nightly SQL Agent job.  I tried cancelling the Agent job, but that had no effect. Next, I ran KILL 290 to kill that SPID, which caused the transaction to to be rolled back (which took awhile). After that SPID was finally dead, the non-yielding worker errors stopped, and all of my other reading quickly returned to normal.

The moral of this story is to stay calm, and work the problem as you gather information about what is going on. I wasted some time chasing the iFTS problem before I ran DBCC OPENTRAN. The non-yielding worker was not bad enough to have caused a mini-dump by SQL Server, and the database server was not in acute distress, so I had time to keep working on it before trying something more drastic.

I have some of these basic but useful queries and commands shown below. They are very handy to have around and use to monitor your database servers. It is very important to know what your baseline values are, so that you will notice when something is different.

-- Get Avg Task Count and Avg Runnable Task Count
SELECT AVG(current_tasks_count) AS [Avg Task Count], 
AVG(runnable_tasks_count) AS [Avg Runnable Task Count]
FROM sys.dm_os_schedulers WITH (NOLOCK)
WHERE scheduler_id < 255;

-- Get Log reuse wait description
SELECT [name], log_reuse_wait_desc
FROM sys.databases WITH (NOLOCK)
WHERE database_id > 4;
    

-- Find out how many full-text changes are pending on a table
SELECT OBJECTPROPERTY(OBJECT_ID('CurrentPostFullTextMonday'), 
'TableFulltextPendingChanges') AS 'Monday Full Text Pending Changes';

-- Find oldest open transaction
DBCC OPENTRAN

-- Find command for a given SPID
DBCC INPUTBUFFER(290)

-- Kill a SPID
KILL 290

This entry was posted in SQL Server 2008. Bookmark the permalink.

2 Responses to Detecting and Troubleshooting Non-Yielding Worker Threads in SQL Server 2005/2008

  1. Amit says:

    Scheduler 13 is the one that is currently unusable by worker threads. Since you have a multi proc machine, the effects were not felt on other worker threads. In case this was a 4 proc box, then the effects would be felt as SQL would now have 25% scheduler availability reduced. All worker threads waiting to run on the affected SQL scheduler would continue to wait indefinitely. SQL 2005 Scheduling algorithm is much smarter than it\’s predecessor, so a lot of problems are avoided due to non-yielding conditions in SQL 2005.

  2. Glenn says:

    Well, this box has 16 logical cores, and it is SQL Server 2008 SP1 CU5.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s