HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Service Broker - Conversation Lifetime?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
brokerconversationservicelifetime

Problem

We are trying to get Service Broker working in our environment in order to solve a business case. I don't know if the message title is a good one, but my question is below. But it may not be a good question, so after that is what we are doing and why I think it's the right question.

How many messages should be sent on a conversation before ending the conversation?

We want to use Service Broker in order to asynchronously update a result table. The result table is flattened and fast. We have triggers on the base tables that send a message with their table and primary key. We have three queues:

  • Low Latency - objective is 15 second to process. It handles items that change relating to a specific item.



  • Bulk Queue - objective is 5 minutes to process. It handles when something changes that affects many hundred (or thousands) of items. It breaks out the list of items that were affected and feeds them to the Deferred Low Latency Queue.



  • Deferred Low Latency - objective is 30 minutes to process. This processes items, but only from the bulk queue.



Basically, if a client's information updates, that affects many products, so that gets sent to the bulk queue for slower processing. However, if a product gets updated, that gets sent to the low latency queue.

We reuse conversations similar to Remus Rusanu's blog http://rusanu.com/2007/04/25/reusing-conversations/, with the exception that we do it based on the modulus of the primary key. This has the side benefit of aiding in de-duplication of primary key's.

So, we are re-using conversations and are within our guidelines. With two threads, I was able to burn through 125 messages/second (artificial drop of several thousand messages), which is more than capable of keeping up with production (est. 15 messages/sec).

However, the problem we are experiencing is that after a period of time, ~4 hours or 120K messages, we started seeing blocks and high contention on sysdesend and the queue table. The locks are LCK_M_

Solution

I know it's bad form to answer your own question, but I wanted to close this out for anyone who was interested. We finally did manage to resolve the issue, or at least resolve it enough to meet our requirements. I want to thank everyone who contributed comments; Remus Rusanu and Kin as they were very helpful.

Our database is quite busy and is in RCSI mode. We have multiple (thousands) of mobile devices that update their location information every 45 seconds. Through these updates, multiple tables get their information updated (poor design, as I would have limited the volatile information to a single table and then joined it in for the results). These tables are the same ones that we were attempting to asynchronously generate reporting information for rather than having the end users go directly against the base tables.

We initially had the triggers doing a cursor over the modified records in every update/insert statement (should have been one row in most cases) and sending each primary key in a message to the service broker. Inside service broker, especially the bulk queue were further cursors which executed the upsert procedure for the report (one execution per primary key).

What finally got us working:

-
We removed the cursors and settled on sending larger messages. Still one message per user transaction per table, but we now send messages with more than one primary key.

-
The bulk processor also sends multiple keys per message, which reduced the number of SEND CONVERSATIONS that were going on as it shuffled messages to the other queue as appropriate.

-
The most volatile table (our mobile device data table) had it's triggers removed. We updated the upsert procedure to include the appropriate foreign keys and now we just join back on that table when fetching results to the users. This table easily contributed 80% of the messages we had to process in a day.

We process ~1M messages a day (without the Mobile table) and the vast majority (99%+) of our messages are processed inside our objective. We still have the occasional outlier but given the rare nature of the that it is deemed acceptable.

Contributing factors:

-
I found a bug in the conversation cleanup procedure mentioned earlier that was not actually cleaning conversations appropriately, and prematurely ending them. This has now resulted in our sysdesend count to never be more than a few thousand (most of that comes from using the 150 trick).

-
The cursors in the triggers appeared to hold more locking than anticipated (even with static, forward_only). removing those seems to have made the locks we do see on SEND CONVERSATION more transient in nature (or at least the times we see are much lower).

-
We were essentially running two solutions side by side (the Service Broker solution backend (for testing under production load)) and the current solution (terrible query that spans many tables).

As a side benefit, this has uncovered a Ghost Record Cleanup problem and while it was not on the Service Broker tables (system or queue), it is quite rampant in our system and the symptoms line up really well with our "no clear cause" issues we experience at times. Investigation is ongoing on that, we are trying to find the tables that are contributing to it and we will probably just routinely rebuild their indexes.

Thank you once again.

UPDATE 2021

I got another upvote on this so I wanted to come back and revisit. We eventually abandoned Service Broker shortly after I asked this question as our transaction workload was still far too high/fast to work effectively. We eventually moved to this style of queue for our high throughput message system and it works great.

http://kejser.org/implementing-message-queues-in-relational-databases/

Context

StackExchange Database Administrators Q#187571, answer score: 4

Revisions (0)

No revisions yet.