Hotfix for SQL Server 2008/2008 R2 Periodically Does Not Accept Connections Bug

Microsoft has released a hotfix (KB 2543687) for a very frustrating issue that plagued me for a couple of years at NewsGator. I recently became reacquainted with this old problem, which shows up as Error: 18056, Severity: 20, State: 29. Basically, a SQL Server 2008 or 2008 R2 database server that is under absolutely no CPU or memory stress suddenly stops accepting connections from your application and web servers. You will see a number of errors in the SQL Server error log, like the ones below:

Date: 11/18/2011 8:42:40 PM
Log: SQL Server (Current – 11/18/2011 9:00:00 PM)
Source: spid81
Message: Error: 18056, Severity: 20, State: 29.

Date: 11/18/2011 8:42:40 PM
Log: SQL Server (Current – 11/18/2011 9:00:00 PM)
Source: spid81
Message: The client was unable to reuse a session with SPID 81, which had been reset for connection pooling. The failure ID is 29. This error may have been caused by an earlier operation failing. Check the error logs for failed operations immediately before this error message.

While this is going on, none of your middle-tier servers can connect to the SQL Server instance in question. Quite often, you as the DBA will not be able to open a new connection to the server with SSMS either. CPU utilization drops to zero, with seemingly no activity happening, and the SQL Server service keeps running just fine, with no cluster failover or database mirroring failover being triggered. Essentially, your database instance seems to be pouting and refusing to talk to anyone, like an unruly two-year-old… Usually, the problem clears itself up with no intervention within one to ten minutes, but occasionally you have to restart the SQL Server service to resolve it.
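If you want to see how often this has been happening on an instance, one approach is to scan the ERRORLOG text for the Error 18056 / State 29 line and group hits that are close together into episodes. Here is a minimal Python sketch. It assumes the usual ERRORLOG line layout (timestamp, a source such as spid81, then the message), and it treats hits no more than 10 minutes apart as one episode, a gap chosen only because the problem usually clears within one to ten minutes. The function names are hypothetical, and reading the file as UTF-16 is an assumption you should verify on your own server.

```python
import re
from datetime import datetime, timedelta
from typing import Iterable, List, Tuple

# ASSUMED ERRORLOG layout:
# "2011-11-18 20:42:40.12 spid81      Error: 18056, Severity: 20, State: 29."
ERROR_18056_STATE_29 = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{2})\s+\S+\s+"
    r"Error: 18056, Severity: 20, State: 29\."
)


def find_hits(lines: Iterable[str]) -> List[datetime]:
    """Return the timestamp of every Error 18056 / State 29 line."""
    hits = []
    for line in lines:
        match = ERROR_18056_STATE_29.match(line)
        if match:
            hits.append(datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S.%f"))
    return hits


def group_episodes(hits: List[datetime],
                   gap: timedelta = timedelta(minutes=10)
                   ) -> List[Tuple[datetime, datetime, int]]:
    """Merge hits no more than `gap` apart into (first, last, count) episodes."""
    episodes: List[Tuple[datetime, datetime, int]] = []
    for ts in sorted(hits):
        if episodes and ts - episodes[-1][1] <= gap:
            first, _, count = episodes[-1]
            episodes[-1] = (first, ts, count + 1)
        else:
            episodes.append((ts, ts, 1))
    return episodes


def scan_file(path: str) -> List[Tuple[datetime, datetime, int]]:
    """Scan one ERRORLOG file (ASSUMED to be UTF-16 encoded)."""
    with open(path, encoding="utf-16", errors="replace") as log:
        return group_episodes(find_hits(log))
```

Grouping by a time gap, rather than just counting lines, makes it easier to line up each episode with what your middle tier reported at the same time.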

We used to see this issue periodically at NewsGator, starting back in 2009 on SQL Server 2008, and nothing changed or improved when we moved to SQL Server 2008 R2. I had previously filed a couple of Connect items about it and opened a CSS case, with no final resolution from Microsoft. Here are some Connect items that describe the issue in more detail:

SQL Server 2008 Periodically Does Not Accept Connections

SQL Server 2008 SP1 CU6 Periodically Does Not Accept Connections  

SQL Server 2008 R2 Does Not Accept Connections

When I first started seeing the issue, increasing the MaxWorkerThreads instance configuration setting above its default seemed to mitigate it somewhat. Another change that seemed to reduce the frequency of the issue was setting MaxServerMemory a few GB lower than you otherwise would. Lots of people I know and respect in the SQL Server community have also run into this over the last couple of years. Bob Dorr talked about it back in August 2010, and a Microsoft Escalation Engineer named Tejas Shah talked about it in May 2010.
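As a rough illustration of those two mitigations, the Python sketch below builds the sp_configure script you might run. The default it starts from (512 worker threads for up to 4 logical CPUs, plus 16 per additional CPU) is, to my understanding, the documented default for 64-bit SQL Server 2008/2008 R2 when max worker threads is left at 0. The 128 extra threads and the 4 GB memory cushion are hypothetical placeholders, since the only guidance here is "a few GB"; choose values for your own server and test them before production.

```python
def default_max_worker_threads(logical_cpus: int) -> int:
    """Default worker-thread cap for 64-bit SQL Server 2008/2008 R2 when the
    'max worker threads' option is 0: 512, plus 16 per logical CPU beyond 4."""
    if logical_cpus <= 0:
        raise ValueError("logical_cpus must be positive")
    if logical_cpus <= 4:
        return 512
    return 512 + (logical_cpus - 4) * 16


def mitigation_script(logical_cpus: int, planned_max_memory_mb: int,
                      extra_threads: int = 128,        # HYPOTHETICAL increase
                      memory_cushion_gb: int = 4) -> str:  # HYPOTHETICAL "few GB"
    """Build T-SQL that raises max worker threads above its default and sets
    max server memory a few GB below the value you had planned."""
    threads = default_max_worker_threads(logical_cpus) + extra_threads
    memory_mb = planned_max_memory_mb - memory_cushion_gb * 1024
    if memory_mb <= 0:
        raise ValueError("memory cushion is larger than the planned setting")
    return "\n".join([
        # Both settings are advanced options, so expose them first.
        "EXEC sys.sp_configure N'show advanced options', 1;",
        "RECONFIGURE;",
        f"EXEC sys.sp_configure N'max worker threads', {threads};",
        f"EXEC sys.sp_configure N'max server memory (MB)', {memory_mb};",
        "RECONFIGURE;",
    ])


if __name__ == "__main__":
    # Example: 16 logical CPUs, max server memory originally planned at 60000 MB.
    print(mitigation_script(logical_cpus=16, planned_max_memory_mb=60000))
```

For 16 logical CPUs this starts from a default of 704 threads and emits 832; a planned 60000 MB becomes 55904 MB.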

With that background and history in mind, it appears that the hotfix mentioned at the start of this post corrects the issue. I have not yet been able to deploy the fix on the production server where I recently saw the problem, but a good friend of mine at Microsoft has told me that one of his largest customers recently deployed it, and they have not seen the issue recur since then. According to the KB article for the fix, it is included in these Cumulative Updates:

SQL Server 2008 R2 RTM CU9 (10.50.1804)

SQL Server 2008 R2 SP1 CU2 (10.50.2772)

SQL Server 2008 SP2 CU5 (10.00.4316)

SQL Server 2008 SP3 CU1 (10.00.5766)
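To check whether a particular instance is on one of the builds listed above, you can compare its version string (for example, the result of SELECT SERVERPROPERTY('ProductVersion');) against the minimum build for its branch. This is a minimal Python sketch, assuming that the thousands digit of the build number identifies the servicing branch (10.50.1xxx for 2008 R2 RTM, 10.50.2xxx for R2 SP1, and so on); that mapping is my assumption, not something the KB states. The R2 SP1 entry is also provisional: it is missing from the official fix lists, and it is worth re-checking the KB article before relying on it. Builds from branches not in the list return None rather than a guess.

```python
from typing import Optional, Tuple

# Minimum builds that include the fix, as listed in this post, keyed by
# (major, minor, branch). ASSUMPTION: the thousands digit of the build number
# identifies the servicing branch.
MIN_FIXED_BUILDS = {
    (10, 50, 1): (10, 50, 1804),  # SQL Server 2008 R2 RTM CU9
    (10, 50, 2): (10, 50, 2772),  # SQL Server 2008 R2 SP1 CU2 (provisional)
    (10, 0, 4): (10, 0, 4316),    # SQL Server 2008 SP2 CU5
    (10, 0, 5): (10, 0, 5766),    # SQL Server 2008 SP3 CU1
}


def parse_build(version: str) -> Tuple[int, int, int]:
    """Turn '10.50.2772.0' or '10.00.4316' into (major, minor, build)."""
    parts = version.strip().split(".")
    if len(parts) < 3:
        raise ValueError(f"expected major.minor.build, got {version!r}")
    return int(parts[0]), int(parts[1]), int(parts[2])


def fix_included(version: str) -> Optional[bool]:
    """True/False when the build's branch is in the table, None when it is not."""
    major, minor, build = parse_build(version)
    minimum = MIN_FIXED_BUILDS.get((major, minor, build // 1000))
    if minimum is None:
        return None  # branch not covered by this list; check the KB article
    return (major, minor, build) >= minimum


if __name__ == "__main__":
    for v in ["10.50.1600.1", "10.50.1804.0", "10.50.2772.0", "10.00.5500.0"]:
        print(v, fix_included(v))
    # Prints False, True, True, False for these four builds.
```

Returning None for an unknown branch keeps the check from silently claiming that, say, a newer service pack RTM build has the fix when a CU for that branch may still be needed.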

The relevant fix does show up in the fix list for SQL Server 2008 R2 RTM CU9, but not in the fix lists for SQL Server 2008 R2 SP1 CU2 or SP1 CU3. I have been assured by someone else at Microsoft (who is in a position to know) that the fix is in the SQL Server 2008 R2 SP1 branch, even though it is not explicitly listed in the fix list. If you have been running into this issue, I would suggest that you plan to deploy a Cumulative Update that is new enough to include the fix as soon as possible. If your organization does not normally deploy Cumulative Updates, you might have to make an exception in this case. I would love to hear from anyone else who has been seeing this problem. Thanks for reading!

This entry was posted in SQL Server 2008, SQL Server 2008 R2.

18 Responses to Hotfix for SQL Server 2008/2008 R2 Periodically Does Not Accept Connections Bug

  1. Jason says:

    So good to hear! I can’t wait to test this out on some of my servers that have had this issue pop up at random times!

  2. Brian R says:

    Gawd what a saga that was. I noticed that SQL Azure in the America North zone was having lots of intermittent connection issues over the last few weeks. Perhaps they hit this issue and finally got enough info to isolate the problem.

  3. DFahey says:

    Glenn, can your Microsoft contact confirm that all fixes in R2 RTM CU9 are rolled into the main branch for R2 SP1, or even that they are all rolled into R2 SP1 CU1, CU2, or CU3?

    I guess this question can be expanded to RTM CU7, CU8, and CU9, as these were at least documented to NOT be included in R2 SP1; only CU1 through CU6 were supposedly included…

    So the question is: are the pre-SP1 CU7, CU8, and CU9 fixes rolled into main, or are they in one of the post-SP1 CUs?

    KB2159286 is the bug fix in particular that we are concerned about being included from CU9. However, as a matter of general principle, one would hope that all the fixes in R2 RTM CU7, CU8, and CU9 have been rolled into one of the later SP1 CUs, if not into SP1 itself via the main branch.

  4. Glenn Berry says:

    There is usually a one CU release delay after a Service Pack is released before the new SP branch catches up to where the latest CU was in the previous SP or RTM branch. If you were at the latest CU in the previous branch, you have to wait a bit for parity in the new branch.

    This is because there is a much longer testing period for a full Service Pack, which means that the most recent fixes from the older branch are not included in the Service Pack until a CU is released for that Service Pack.

    Microsoft does not seem to have done the best job of maintaining the public fix list for the SQL Server 2008 R2 SP1 branch…

  5. Hello Glenn, how are you?

    Are you sure this fix is included in 2008 R2 SP1 CU2? I don't see anything in the documentation stating that the fix for KB 2543687 is included in CU1, CU2, or CU3.
    I have a big customer that is facing this exact same problem; I have 2008 R2 SP1 CU3 installed and am still seeing the problem.

    • Glenn Berry says:

      I have been told by a couple of people at Microsoft that it is included in SP1 CU2. We have not seen the issue since we went to SP1 CU3, but that is not a 100% guarantee that it is really fixed.

  6. Omaha DBA says:

    Thanks for this info. I hope it'll help with the random severity 020 errors that my SQL 2008 R2 server sends out at 3 AM (I watch for them via Agent alerts).

  7. Glenn Berry says:

    It turns out that we are still seeing this error (although less frequently) since we deployed SQL Server 2008 R2 SP1 CU3.

  8. Mark Horton says:

    Have you found that SQL Server 2008 R2 SP1 CU4 has solved the problem?

  9. Glenn Berry says:

    Felipe,
    I would love to know whether that is true. Hopefully so. Who told you that this issue is fixed in SP1 CU5?

    • An internal contact who was helping us (SolidQ) with this issue at a customer.
      I hope it's true too. In the meantime, I had to convince the customer to buy more and more memory, as this bug seems to appear when the server is under memory pressure.
      Since we doubled the available memory, the bug has disappeared.

      • Jesse says:

        Any update on whether or not SP1 CU5 resolved this issue for anyone? We just applied it to one of the servers in our cluster but still hit the issue. We’re uncertain at this point if it’s the same issue we were hitting before or perhaps another issue exhibiting the same symptoms.

  10. Justin Dover says:

    MS has updated http://support.microsoft.com/kb/2543687 stating that this issue is first resolved with CU6. Has anyone had the opportunity to verify?

    • Puneet says:

      We just applied SQL 2008 R2 SP1 CU6 and the error is still there. We are planning to upgrade to R2 SP2 and then apply CU1. We will test that and see whether it corrects the issue.

  11. Kim says:

    One of my clients is experiencing this with Standard Edition, but I don't see SE in the "Applies to" list. Has anyone had this issue on SE and tried this update?

  12. Kim says:

    Hey Glenn! This exact issue is cropping up on SQL Server 2008 R2 SP3. Are there any new CUs out there to correct it?
