snippetsqlMinor
How to keep smaller transaction logs files while rebuilding index of very large table
Viewed 0 times
whilelogskeepfilessmallertransactionlargehowindextable
Problem
I need all the help that I can get to get this index rebuild successful. Especially an expert advise on transaction log management.
Background
The target database is a DW database hosting 3 big tables in clustered columnstore index (CCIX). The biggest table called Analog. It holds ~37 billion rows and ~600 GB of high compressed data. Based on our initial estimate using smaller table, a 150 GB CCIX table could expand to 5 TB, so we provisioned additional 29 TB for this rebuild.
Why do we have to do this?
In the first version of our ETL service, the data are not consolidated and sorted first before they are loaded to CCIX. This results to non-optimal use of CCIX as many row segments are not fully filled before they are compressed. An optimal segment must have compressed 1,0485,76 rows and must be sorted in Time so time-based queries can get the most row-group elimination in sql server and less segments to process. We’re learn more CCIX as we go and as more documentatons are available.
Excerpt of my alignment is here
https://drive.google.com/file/d/0BzGLNskaj70UQUtZYW9CZF9iUUk/view?usp=sharing
The new ETL guarantees that data are consolidated and sorted in time before they are loaded to CCIX. So future loads will be sorted, we are concern here with existing data loaded during First Load.
For full case description, please read here.
https://drive.google.com/open?id=0BzGLNskaj70URmZURlVDWVNYd2M
This is our second attempt and I can see the log file is still piling up despite the new script having partition-based index rebuild. We need to keep this in check.
My questions are:
-
I already executed the 2nd attempt (in progress), is it possible to check which partitions are already processed and sorted? I have OFFLINE = ON in the rebuild script.
-
How can we manage the size of transaction log while a partion-based index rebuild is in progress? We need to keep this in check and regularly truncated every parti
Background
The target database is a DW database hosting 3 big tables in clustered columnstore index (CCIX). The biggest table called Analog. It holds ~37 billion rows and ~600 GB of high compressed data. Based on our initial estimate using smaller table, a 150 GB CCIX table could expand to 5 TB, so we provisioned additional 29 TB for this rebuild.
- SQL Server Enterprise 2016
- Recovery mode = Simple
Why do we have to do this?
In the first version of our ETL service, the data are not consolidated and sorted first before they are loaded to CCIX. This results to non-optimal use of CCIX as many row segments are not fully filled before they are compressed. An optimal segment must have compressed 1,0485,76 rows and must be sorted in Time so time-based queries can get the most row-group elimination in sql server and less segments to process. We’re learn more CCIX as we go and as more documentatons are available.
Excerpt of my alignment is here
https://drive.google.com/file/d/0BzGLNskaj70UQUtZYW9CZF9iUUk/view?usp=sharing
The new ETL guarantees that data are consolidated and sorted in time before they are loaded to CCIX. So future loads will be sorted, we are concern here with existing data loaded during First Load.
For full case description, please read here.
https://drive.google.com/open?id=0BzGLNskaj70URmZURlVDWVNYd2M
This is our second attempt and I can see the log file is still piling up despite the new script having partition-based index rebuild. We need to keep this in check.
My questions are:
-
I already executed the 2nd attempt (in progress), is it possible to check which partitions are already processed and sorted? I have OFFLINE = ON in the rebuild script.
-
How can we manage the size of transaction log while a partion-based index rebuild is in progress? We need to keep this in check and regularly truncated every parti
Solution
The most important advice I can give you is to test against a smaller table first. Don't just dive into converting a 37 billion row table. You can refine your method against a smaller table and use that to get an estimate on converting the large table. Also recovering from mistakes is far less costly (no more 24 hour rollback operations). You may also be able to use the smaller table to test query performance to verify that the conversion that you're doing is worth the effort.
The second most important advice is to get someone to check out your storage configuration. If you recently added a lot of storage it may not have been done in a good way to maximize throughput.
To be honest, it's not clear to me why you're seeing so much transaction log activity. Converting the CCI to a rowstore should be a minimally logged operation since you're using a recovery model of simple.
I'll walk through what I know about how SQL Server implements the following statement:
Before writing any rows SQL Server needs to sort the entire table's data by
Along with the sort the data will be written to a new table. The old CCI table will be dropped after the new table is finished. For your biggest table you estimate that this will be 20 TB. It could take a long time to write 20 TB of data.
If you want to convert your data avoiding huge sorts and huge transactions it should be possible to do, but messy. Before going into this I need to point out that if your CCI is already partitioned the right thing to do is to use partition switching to better organize your rowgroups. Based on your question it seems like tables aren't partitioned yet but I want to be sure. I also have not tested any of the following.
Create a partitioned rowstore heap or a partitioned view. The partition scheme of these needs to match your future CCI partition scheme. Delete the
The above insert should require a sort for the heap and will probably require a sort for the partitioned view as well. However, since you control the value for
Note that as you continue to do loops your query will probably take longer and longer to read data from the CCI. This is because it needs to scan past the deleted rows. Periodically you should issue a
After the loops are finished you'll have all of your data in the correct partitions but unsorted. At this point for each partition or table you can partition switch into a heap, build the rowstore clustered index on the table, build the CCI with
Two notes about loading new data into a partitioned CCI. If you want perfectly ordered, full rowgroups you can't use parallel inserts. That's because the parallel reads of the source data will lead to unevenly distributed rows on threads and each thread will load to a different rowgroup. Also, plan your partitions carefully so that you don't need to split a partition with data. You can't directly SPLIT a partition without dropping the CCI which you obviously don't want to do.
The second most important advice is to get someone to check out your storage configuration. If you recently added a lot of storage it may not have been done in a good way to maximize throughput.
To be honest, it's not clear to me why you're seeing so much transaction log activity. Converting the CCI to a rowstore should be a minimally logged operation since you're using a recovery model of simple.
I'll walk through what I know about how SQL Server implements the following statement:
CREATE CLUSTERED INDEX [Analog_ColumnStoreIndex]
ON [dbo].[Analog]
(
[LogTime] ASC,
[CTDIID] ASC,
[WindFarmId] ASC,
[StationId] ASC
)
WITH (
DROP_EXISTING = ON,
PAD_INDEX = OFF,
STATISTICS_NORECOMPUTE = OFF,
SORT_IN_TEMPDB = OFF,
ALLOW_ROW_LOCKS = ON,
ALLOW_PAGE_LOCKS = ON,
ONLINE = OFF
)
ON [AnalogMonthlyPScheme]([LogTime]);Before writing any rows SQL Server needs to sort the entire table's data by
LogTime, CTDIID, WindFarmId, and StationId. The size of the sort will be massive. I suspect that it will be about the size of the uncompressed clustered rowstore table. It will probably spill over from tempdb to disk. Perhaps that part of the operation is not minimally logged? I can't really guess as to why else you are seeing so much log activity. Note that sorting on LogTime would be sufficient for your conversion and might make the sort cheaper but I don't think it would reduce the workspace required for the sort.Along with the sort the data will be written to a new table. The old CCI table will be dropped after the new table is finished. For your biggest table you estimate that this will be 20 TB. It could take a long time to write 20 TB of data.
If you want to convert your data avoiding huge sorts and huge transactions it should be possible to do, but messy. Before going into this I need to point out that if your CCI is already partitioned the right thing to do is to use partition switching to better organize your rowgroups. Based on your question it seems like tables aren't partitioned yet but I want to be sure. I also have not tested any of the following.
Create a partitioned rowstore heap or a partitioned view. The partition scheme of these needs to match your future CCI partition scheme. Delete the
TOP N rows from the CCI into the partitioned heap using the TABLOCK hint. That should get you minimal logging for the insert. You will not get minimal logging for the delete against the CCI. However, a delete against a CCI is just changing values in a bitmap so it should require far less logging than a standard delete.The above insert should require a sort for the heap and will probably require a sort for the partitioned view as well. However, since you control the value for
N you can limit the size of the sort to fit it within memory + tempdb.Note that as you continue to do loops your query will probably take longer and longer to read data from the CCI. This is because it needs to scan past the deleted rows. Periodically you should issue a
REORGANIZE against the CCI to clean up those deleted rows.After the loops are finished you'll have all of your data in the correct partitions but unsorted. At this point for each partition or table you can partition switch into a heap, build the rowstore clustered index on the table, build the CCI with
DROP_EXISTING = ON and MAXDOP 1, and finally partition switch into the new CCI table. After you are done with all of the partition switching you should have a partitioned CCI with excellent rowgroup elimination by LogTime.Two notes about loading new data into a partitioned CCI. If you want perfectly ordered, full rowgroups you can't use parallel inserts. That's because the parallel reads of the source data will lead to unevenly distributed rows on threads and each thread will load to a different rowgroup. Also, plan your partitions carefully so that you don't need to split a partition with data. You can't directly SPLIT a partition without dropping the CCI which you obviously don't want to do.
Code Snippets
CREATE CLUSTERED INDEX [Analog_ColumnStoreIndex]
ON [dbo].[Analog]
(
[LogTime] ASC,
[CTDIID] ASC,
[WindFarmId] ASC,
[StationId] ASC
)
WITH (
DROP_EXISTING = ON,
PAD_INDEX = OFF,
STATISTICS_NORECOMPUTE = OFF,
SORT_IN_TEMPDB = OFF,
ALLOW_ROW_LOCKS = ON,
ALLOW_PAGE_LOCKS = ON,
ONLINE = OFF
)
ON [AnalogMonthlyPScheme]([LogTime]);Context
StackExchange Database Administrators Q#166757, answer score: 2
Revisions (0)
No revisions yet.