patternsqlModerate
SUM of DATALENGTHs not matching table size from sys.allocation_units
Viewed 0 times
allocation_unitstablesizesyssumdatalengthsfromnotmatching
Problem
I was under the impression that if I were to sum the
I used this query below (that I got from online to get table sizes, clustered indexes only so it doesn't include NC indexes) to get the size of a particular table in my database. For billing purposes (we charge our departments by the amount of space they use) I need to figure out how much space each department used in this table. I have a query that identifies each group within the table. I just need to figure out how much space each group is taking up.
Space per row may swing wildly due to
It has been suggested that I create a filtered index for each department or partition the table, so I can directly query the space used per index. Filtered i
DATALENGTH() of all fields for all records in a table that I would get the total size of the table. Am I mistaken?SELECT
SUM(DATALENGTH(Field1)) +
SUM(DATALENGTH(Field2)) +
SUM(DATALENGTH(Field3)) TotalSizeInBytes
FROM SomeTable
WHERE X, Y, and Z are trueI used this query below (that I got from online to get table sizes, clustered indexes only so it doesn't include NC indexes) to get the size of a particular table in my database. For billing purposes (we charge our departments by the amount of space they use) I need to figure out how much space each department used in this table. I have a query that identifies each group within the table. I just need to figure out how much space each group is taking up.
Space per row may swing wildly due to
VARCHAR(MAX) fields in the table, so I can't just take an average size * the ratio of rows for a department. When I use the DATALENGTH() approach described above I only get 85% of the total space used in the query below. Thoughts?SELECT
s.Name AS SchemaName,
t.NAME AS TableName,
p.rows AS RowCounts,
(SUM(a.total_pages) * 8)/1024 AS TotalSpaceMB,
(SUM(a.used_pages) * 8)/1024 AS UsedSpaceMB,
((SUM(a.total_pages) - SUM(a.used_pages)) * 8)/1024 AS UnusedSpaceMB
FROM
sys.tables t with (nolock)
INNER JOIN
sys.schemas s with (nolock) ON s.schema_id = t.schema_id
INNER JOIN
sys.indexes i with (nolock) ON t.OBJECT_ID = i.object_id
INNER JOIN
sys.partitions p with (nolock) ON i.object_id = p.OBJECT_ID AND i.index_id = p.index_id
INNER JOIN
sys.allocation_units a with (nolock) ON p.partition_id = a.container_id
WHERE
t.is_ms_shipped = 0
AND i.OBJECT_ID > 255
AND i.type_desc = 'Clustered'
GROUP BY
t.Name, s.Name, p.Rows
ORDER BY
TotalSpaceMB descIt has been suggested that I create a filtered index for each department or partition the table, so I can directly query the space used per index. Filtered i
Solution
Please note that the following info is not intended to be a comprehensive
description of how data pages are laid out, such that one can calculate
the number of bytes used per any set of rows, as that is very complicated.
Data is not the only thing taking up space on an 8k data page:
-
There is reserved space. You are only allowed to use 8060 of the 8192 bytes (that's 132 bytes that were never yours in the first place):
-
LOB pointers for data that is not stored in row. So that would account for
-
LOB overflow pages: If a value is 10k, then that will require 1 full 8k page of overflow, and then part of a 2nd page. If no other data can take up the remaining space (or is even allowed to, I am unsure of that rule), then you have approx 6kb of "wasted" space on that 2nd LOB overflow datapage.
-
Unused space: An 8k data page is just that: 8192 bytes. It does not vary in size. The data and meta-data placed on it, however, does not always fit nicely into all 8192 bytes. And rows cannot be split onto multiple data pages. So if you have 100 bytes remaining but no row (or no row that would fit in that location, depending on several factors) can fit there, the data page is still taking up 8192 bytes, and your 2nd query is only counting the number of data pages. You can find this va
description of how data pages are laid out, such that one can calculate
the number of bytes used per any set of rows, as that is very complicated.
Data is not the only thing taking up space on an 8k data page:
-
There is reserved space. You are only allowed to use 8060 of the 8192 bytes (that's 132 bytes that were never yours in the first place):
- Page header: This is exactly 96 bytes.
- Slot array: this is 2 bytes per row and indicates the offset of where each row starts on the page. The size of this array is not limited to the remaining 36 bytes (132 - 96 = 36), else you would be effectively limited to only putting 18 rows max on a data page. This means that each row is 2 bytes larger than you think it is. This value is not included in the "record size" as reported by
DBCC PAGE, which is why it is kept separate here instead of being included in the per-row info below.
- Per-Row meta-data (including, but not limited to):
- The size varies depending on the table definition (i.e. number of columns, variable-length or fixed-length, etc). Info taken from @PaulWhite's and @Aaron's comments that can be found in the discussion related to this answer and testing.
- Row-header: 4 bytes, 2 of them denoting the record type, and the other two being an offset to the NULL Bitmap
- Number of columns: 2 bytes
- NULL Bitmap: which columns are currently
NULL. 1 byte per each set of 8 columns. And for all columns, even theNOT NULLones. Hence, minimum 1 byte.
- Variable-length column offset array: 4 bytes minimum. 2 bytes to hold the number of variable-length columns, and then 2 bytes per each variable-length column to hold the offset to where it starts.
- Versioning Info: 14 bytes (this will be present if your database is set to either
ALLOW_SNAPSHOT_ISOLATION ONorREAD_COMMITTED_SNAPSHOT ON).
- Please see the following Question and Answer for more details on this: Slot Array and Total Page Size
- Please see the following blog post from Paul Randall which has several interesting details on how the data pages are laid out: Poking about with DBCC PAGE (Part 1 of ?)
-
LOB pointers for data that is not stored in row. So that would account for
DATALENGTH + pointer_size. But these are not of a standard size. Please see the following blog post for details on this complex topic: What is the Size of the LOB Pointer for (MAX) Types Like Varchar, Varbinary, Etc?. Between that linked post and some additional testing I have done, the (default) rules should be as follows:- Legacy / deprecated LOB types that nobody should be using anymore as of SQL Server 2005 (
TEXT,NTEXT, andIMAGE):
- By default, always store their data on LOB pages and always use a 16-byte pointer to LOB storage.
- IF sp_tableoption was used to set the
text in rowoption, then:
- if there is space on the page to store the value, and value is not greater than the max in-row size (configurable range of 24 - 7000 bytes with a default of 256), then it will be stored in-row,
- else it will be a 16-byte pointer.
- For the newer LOB types introduced in SQL Server 2005 (
VARCHAR(MAX),NVARCHAR(MAX), andVARBINARY(MAX)):
- By default:
- If the value is not greater than 8000 bytes, and there is room on the page, then it will be stored in-row.
- Inline Root — for data between 8001 and 40,000 (really 42,000) bytes, space permitting, there will be 1 to 5 pointers (24 - 72 bytes) IN ROW that point directly to the LOB page(s). 24 bytes for the initial 8k LOB page, and 12 bytes per each additional 8k page for up to four more 8k pages.
- TEXT_TREE — for data over 42,000 bytes, or if the 1 to 5 pointers can't fit in-row, then there will be just a 24 byte pointer to the starting page of a list of pointers to the LOB pages (i.e. the "text_tree" page).
- IF sp_tableoption was used to set the
large value types out of rowoption, then always use a 16-byte pointer to LOB storage.
- I said "default" rules because I did not test in-row values against the impact of certain features such as Data Compression, Column-level Encryption, Transparent Data Encryption, Always Encrypted, etc.
-
LOB overflow pages: If a value is 10k, then that will require 1 full 8k page of overflow, and then part of a 2nd page. If no other data can take up the remaining space (or is even allowed to, I am unsure of that rule), then you have approx 6kb of "wasted" space on that 2nd LOB overflow datapage.
-
Unused space: An 8k data page is just that: 8192 bytes. It does not vary in size. The data and meta-data placed on it, however, does not always fit nicely into all 8192 bytes. And rows cannot be split onto multiple data pages. So if you have 100 bytes remaining but no row (or no row that would fit in that location, depending on several factors) can fit there, the data page is still taking up 8192 bytes, and your 2nd query is only counting the number of data pages. You can find this va
Context
StackExchange Database Administrators Q#127405, answer score: 19
Revisions (0)
No revisions yet.