HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Generating sequences partitioned or grouped by field(s)

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
fieldgeneratinggroupedpartitionedsequences

Problem

Given a table like:

Id     Grouping    SequenceId    
1         A            1         
2         B            1         
3         A            2         
4         A            3         
5         C            1         
6         B            2


Where SequenceId increments based on Grouping, like the result of DENSE_RANK()

I would like to efficiently generate the next SequenceId on INSERT. SQL Server has SEQUENCE, but I would need a sequence for each Grouping. I could also create a table like the following (perhaps with a few more columns for flexibility):

Grouping     NextValue
   A             4
   B             3
   C             2


and a procedure to atomically fetch the next value and update the table, but this wouldn't be as effecient as a SEQUENCE and feels an awful lot like reinventing the wheel. I just haven't found the wheel I'm looking for yet.

What is the efficient and scale-able way to generate sequences per grouping, when the number of sequences can be huge and I want to avoid the performance hit at read time of an aggregate function like DENSE_RANK()?

Solution

Ultimately, the best solution will depend on the specific details of what you need, and how various solution perform in your environment.

First: using a SEQUENCE or a table of virtual "sequences" will result in behavior closer to ROW_NUMBER than to DENSE_RANK - with DENSE_RANK, if the value you're ordering on (say, a record creation date) is the same for three records in your partition, they'd all get the same rank, but they'd each get a unique row number.

Second, even using a SEQUENCE, you cannot guarantee that there will be no gaps in your numbering. If you grab the next value from the sequence, but end up rolling back the transaction, the sequence number is still consumed.

An additional concern in your situation is managing the sequences. You would presumably need one sequence for each group you created. Creating and using the sequences would require dynamic SQL. It might not be vulnerable to SQL injection (if all the dynamic values are system-generated rather than user-input-generated), but this could still be a concern for some.

Also, you'd need a way to ensure that the Grouping values were properly tied to the sequences. If those values are user-facing, then someone will want to change one at some point - if the sequence names depend on that user-facing value, then the original sequence would be lost after a change, and a new one would presumably have to be started (or, you'd have to include code to update the sequence name after a change). More practically, you might need a table with a Grouping ID, and the name of the corresponding SEQUENCE.

Using a table of "virtual sequences" lets you avoid some of these problems, but can create another, much larger one - blocking. If you include retrieving the next value from the virtual sequence and updating it in the transaction to create your record, then, if the transaction is rolled back, then so is the sequence number change; however, you can't add another record to the table until that first transaction is completed. I worked with a table like that, and almost every deadlock we encountered came back to that table. And, if you grab the next ID outside the transaction where you insert a new record, then you're back to maybe having gaps in the ID range.

Of course, you should consider that gaps can happen naturally, as well - whenever a record is deleted, there will be a gap. Are those gaps tolerated/desired, or does the business owner of the process really want to see 1, 2, 3, 4, 5, even if the rows are actually 1, 3, 4, 6, 7 (because 2 and 5 were deleted)?

So, work out what the actual reasoning behind the requirement is. If there is an absolute need to have the report show sequences with no gaps whatsoever, then ROW_NUMBER (or, possibly, if you want the same value for ties, DENSE_RANK) is probably the best solution available to you, and I'd check on just how much of a performance hit it causes. If some gaps (especially for deleted rows) should be retained, you almost have to go with one of the options you suggested above. If rollbacks are rare in your application (and I've seen some where that's true), a SEQUENCE is probably your best bet, performance wise. The "virtual sequence" table has to be handled very carefully, or it can be a major performance bottleneck; but, it can work (I should note that most of those deadlock issues I mentioned earlier occurred in jobs processing bulk data inserts, not in normal user activity in the application, and additional routines to grab a range of values instead of one at a time were created - however, using those increased the likelihood of a skipped values again).

Make sure the business owner requesting this understands that providing permanent sequence values with no gaps can have significant impact on performance, etc., and see if the requirement is really a must-have, or a nice-to-have (or, actually requires ROW_NUMBER or DENSE_RANK).

Context

StackExchange Database Administrators Q#174606, answer score: 4

Revisions (0)

No revisions yet.