Solved: Fire-and-Forget Service Broker Conversations Cause TempDB to Fill Up

I have some code running in production that uses fire-and-forget Service Broker conversations. Recently, queries started failing with errors reporting that TempDB was full. This blog post explores the issue and how it was resolved.

Details of the Problem

The issue occurred on Microsoft SQL Server 2012. I have a process that uses Service Broker to run a stored procedure asynchronously, so the caller does not need to wait for a reply. I was still learning about Service Broker when I built it, but it ran successfully, solved a performance problem I was having, and had been running in production for over a year without any issues. Then, in the last couple of days, my nightly ETL runs started having problems: some of the larger queries began erroring out with "TempDB is full" errors.
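For context, the pattern I had implemented looked broadly like the sketch below. The service, contract, and message type names are placeholders for illustration, not my actual production objects:

```sql
-- A minimal sketch of the fire-and-forget pattern, with placeholder names.
DECLARE @handle UNIQUEIDENTIFIER;

BEGIN DIALOG CONVERSATION @handle
    FROM SERVICE [//Demo/InitiatorService]
    TO SERVICE '//Demo/TargetService'
    ON CONTRACT [//Demo/AsyncContract]
    WITH ENCRYPTION = OFF;

SEND ON CONVERSATION @handle
    MESSAGE TYPE [//Demo/RunProcedure] (N'<payload />');

-- "Fire and forget": end our side immediately and never look back.
END CONVERSATION @handle;
```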

Investigating the Issue

I checked TempDB for free space and noticed that it was using 7.5 GB of the 8 GB available. I also noticed that SQL Server was hogging the CPU, with around 70% CPU usage. So I decided to see which queries were consuming the CPU, and found that the queries with the highest CPU utilisation were related to my Service Broker process.
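The checks I ran were along these lines; this is a sketch using the standard DMVs rather than my exact queries:

```sql
-- How much TempDB space is used vs free (DMV reports 8 KB pages).
SELECT
    SUM(user_object_reserved_page_count
        + internal_object_reserved_page_count
        + version_store_reserved_page_count) * 8 / 1024 AS used_mb,
    SUM(unallocated_extent_page_count) * 8 / 1024       AS free_mb
FROM tempdb.sys.dm_db_file_space_usage;

-- The cached query statements with the highest total CPU time.
SELECT TOP (10)
    qs.total_worker_time / 1000 AS total_cpu_ms,
    qs.execution_count,
    SUBSTRING(st.text, (qs.statement_start_offset / 2) + 1,
        ((CASE qs.statement_end_offset
              WHEN -1 THEN DATALENGTH(st.text)
              ELSE qs.statement_end_offset
          END - qs.statement_start_offset) / 2) + 1) AS statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_worker_time DESC;
```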

At this point, I decided to disable Service Broker, as it is not a critical component of my application. Once the broker was disabled, I checked the TempDB free space again: it was only using 500 MB of the 8 GB available. CPU utilisation also went down.
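Toggling the broker is a single ALTER DATABASE in each direction. [MyAppDb] below is a placeholder for the actual database name:

```sql
-- Turn Service Broker off for the application database. ROLLBACK IMMEDIATE
-- rolls back open transactions so the ALTER can take the lock it needs.
ALTER DATABASE [MyAppDb] SET DISABLE_BROKER WITH ROLLBACK IMMEDIATE;

-- And later, re-enable it to confirm the correlation.
ALTER DATABASE [MyAppDb] SET ENABLE_BROKER WITH ROLLBACK IMMEDIATE;
```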

I wanted to confirm my suspicion that the issue was related to Service Broker, so I re-enabled the broker, and TempDB disk usage shot up again.

Trying to Figure Out Why It's Happening

I searched the Internet for issues involving Service Broker and TempDB but couldn't confirm what was happening, so I opted to look at my queues to get a better understanding of what was going on in my process.

To my astonishment, I found that the queue I use to start the asynchronous process had a significant number of queued messages, even though my asynchronous process had no remaining work to do.
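Inspecting the queue looked something like the following; [dbo].[InitiatorQueue] is a placeholder for my actual queue name:

```sql
-- Peek at the messages sitting on the initiating queue without removing them.
SELECT TOP (100)
    conversation_handle,
    message_type_name
FROM [dbo].[InitiatorQueue] WITH (NOLOCK);

-- How many conversation endpoints exist in the database, and in what state?
SELECT state_desc, COUNT(*) AS endpoint_count
FROM sys.conversation_endpoints
GROUP BY state_desc
ORDER BY endpoint_count DESC;
```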

I tried ending one of the conversations in the initiating queue, and it disappeared. At this point, I was confused, as my code was already calling END CONVERSATION. I looked at the Microsoft documentation for END CONVERSATION and realised that, when I was developing this, I had missed a key sentence ("Ends one side of an existing conversation") and misunderstood what the statement actually does. Each side of a conversation must end it independently: when the target side ends its half, Service Broker delivers an EndDialog message back to the initiating queue, and unless something receives that message and ends the initiator's side, those messages and their conversation endpoints accumulate indefinitely.
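Manually ending one conversation looks roughly like this, again with a placeholder queue name:

```sql
-- Receive a single message from the initiating queue and end the
-- conversation it belongs to.
DECLARE @handle UNIQUEIDENTIFIER;

RECEIVE TOP (1) @handle = conversation_handle
FROM [dbo].[InitiatorQueue];

IF @handle IS NOT NULL
    END CONVERSATION @handle;
```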

Solution

I had an inkling of what I needed to do to solve the problem. However, I still opted to do some additional research so that I could solve it correctly, and I found this blog post by Rusanu Consulting which tackles exactly this issue.

Following the instructions in that post, I implemented a new activation stored procedure for the initiating queue, then ran some tests to verify that the solution no longer exhibits the issue.
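The procedure followed the shape below. This is a sketch based on the approach in the Rusanu post, with placeholder object names rather than my production ones: it drains the initiating queue, ending the initiator's side of each conversation whenever an EndDialog or Error message arrives.

```sql
CREATE PROCEDURE dbo.InitiatorQueueActivator
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE @handle UNIQUEIDENTIFIER,
            @message_type SYSNAME;

    WHILE (1 = 1)
    BEGIN
        BEGIN TRANSACTION;

        -- Wait up to 5 seconds for the next message, then give up and exit.
        WAITFOR (
            RECEIVE TOP (1)
                @handle = conversation_handle,
                @message_type = message_type_name
            FROM [dbo].[InitiatorQueue]
        ), TIMEOUT 5000;

        IF @@ROWCOUNT = 0
        BEGIN
            COMMIT TRANSACTION;
            BREAK;
        END

        -- EndDialog and Error both mean the target side is done, so the
        -- initiator's side of the conversation can now be ended too.
        IF @message_type IN (
            N'http://schemas.microsoft.com/SQL/ServiceBroker/EndDialog',
            N'http://schemas.microsoft.com/SQL/ServiceBroker/Error')
        BEGIN
            END CONVERSATION @handle;
        END

        COMMIT TRANSACTION;
    END
END;
GO

-- Attach the procedure to the queue so it runs automatically on activation.
ALTER QUEUE [dbo].[InitiatorQueue]
    WITH ACTIVATION (
        STATUS = ON,
        PROCEDURE_NAME = dbo.InitiatorQueueActivator,
        MAX_QUEUE_READERS = 1,
        EXECUTE AS OWNER);
```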

Conclusion

The issue has now been fixed and the fix deployed to production. TempDB disk usage has gone down to around 100 MB, and CPU usage has gone down to 6%.
