HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlModerate

SUM of DATALENGTHs not matching table size from sys.allocation_units

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
allocation_unitstablesizesyssumdatalengthsfromnotmatching

Problem

I was under the impression that if I were to sum the DATALENGTH() of all fields for all records in a table that I would get the total size of the table. Am I mistaken?

SELECT 
SUM(DATALENGTH(Field1)) + 
SUM(DATALENGTH(Field2)) + 
SUM(DATALENGTH(Field3)) TotalSizeInBytes
FROM SomeTable
WHERE X, Y, and Z are true


I used this query below (that I got from online to get table sizes, clustered indexes only so it doesn't include NC indexes) to get the size of a particular table in my database. For billing purposes (we charge our departments by the amount of space they use) I need to figure out how much space each department used in this table. I have a query that identifies each group within the table. I just need to figure out how much space each group is taking up.

Space per row may swing wildly due to VARCHAR(MAX) fields in the table, so I can't just take an average size * the ratio of rows for a department. When I use the DATALENGTH() approach described above I only get 85% of the total space used in the query below. Thoughts?

SELECT 
s.Name AS SchemaName,
t.NAME AS TableName,
p.rows AS RowCounts,
(SUM(a.total_pages) * 8)/1024 AS TotalSpaceMB, 
(SUM(a.used_pages) * 8)/1024 AS UsedSpaceMB, 
((SUM(a.total_pages) - SUM(a.used_pages)) * 8)/1024 AS UnusedSpaceMB
FROM 
    sys.tables t with (nolock)
INNER JOIN 
    sys.schemas s with (nolock) ON s.schema_id = t.schema_id
INNER JOIN      
    sys.indexes i with (nolock) ON t.OBJECT_ID = i.object_id
INNER JOIN 
    sys.partitions p with (nolock) ON i.object_id = p.OBJECT_ID AND i.index_id = p.index_id
INNER JOIN 
    sys.allocation_units a with (nolock) ON p.partition_id = a.container_id
WHERE 
    t.is_ms_shipped = 0
    AND i.OBJECT_ID > 255 
    AND i.type_desc = 'Clustered'
GROUP BY 
    t.Name, s.Name, p.Rows
ORDER BY 
    TotalSpaceMB desc


It has been suggested that I create a filtered index for each department or partition the table, so I can directly query the space used per index. Filtered i

Solution

Please note that the following info is not intended to be a comprehensive
description of how data pages are laid out, such that one can calculate
the number of bytes used per any set of rows, as that is very complicated.

Data is not the only thing taking up space on an 8k data page:

-
There is reserved space. You are only allowed to use 8060 of the 8192 bytes (that's 132 bytes that were never yours in the first place):

  • Page header: This is exactly 96 bytes.



  • Slot array: this is 2 bytes per row and indicates the offset of where each row starts on the page. The size of this array is not limited to the remaining 36 bytes (132 - 96 = 36), else you would be effectively limited to only putting 18 rows max on a data page. This means that each row is 2 bytes larger than you think it is. This value is not included in the "record size" as reported by DBCC PAGE, which is why it is kept separate here instead of being included in the per-row info below.



  • Per-Row meta-data (including, but not limited to):



  • The size varies depending on the table definition (i.e. number of columns, variable-length or fixed-length, etc). Info taken from @PaulWhite's and @Aaron's comments that can be found in the discussion related to this answer and testing.



  • Row-header: 4 bytes, 2 of them denoting the record type, and the other two being an offset to the NULL Bitmap



  • Number of columns: 2 bytes



  • NULL Bitmap: which columns are currently NULL. 1 byte per each set of 8 columns. And for all columns, even the NOT NULL ones. Hence, minimum 1 byte.



  • Variable-length column offset array: 4 bytes minimum. 2 bytes to hold the number of variable-length columns, and then 2 bytes per each variable-length column to hold the offset to where it starts.



  • Versioning Info: 14 bytes (this will be present if your database is set to either ALLOW_SNAPSHOT_ISOLATION ON or READ_COMMITTED_SNAPSHOT ON).



  • Please see the following Question and Answer for more details on this: Slot Array and Total Page Size



  • Please see the following blog post from Paul Randall which has several interesting details on how the data pages are laid out: Poking about with DBCC PAGE (Part 1 of ?)



-
LOB pointers for data that is not stored in row. So that would account for DATALENGTH + pointer_size. But these are not of a standard size. Please see the following blog post for details on this complex topic: What is the Size of the LOB Pointer for (MAX) Types Like Varchar, Varbinary, Etc?. Between that linked post and some additional testing I have done, the (default) rules should be as follows:

  • Legacy / deprecated LOB types that nobody should be using anymore as of SQL Server 2005 (TEXT, NTEXT, and IMAGE):



  • By default, always store their data on LOB pages and always use a 16-byte pointer to LOB storage.



  • IF sp_tableoption was used to set the text in row option, then:



  • if there is space on the page to store the value, and value is not greater than the max in-row size (configurable range of 24 - 7000 bytes with a default of 256), then it will be stored in-row,



  • else it will be a 16-byte pointer.



  • For the newer LOB types introduced in SQL Server 2005 (VARCHAR(MAX), NVARCHAR(MAX), and VARBINARY(MAX)):



  • By default:



  • If the value is not greater than 8000 bytes, and there is room on the page, then it will be stored in-row.



  • Inline Root — for data between 8001 and 40,000 (really 42,000) bytes, space permitting, there will be 1 to 5 pointers (24 - 72 bytes) IN ROW that point directly to the LOB page(s). 24 bytes for the initial 8k LOB page, and 12 bytes per each additional 8k page for up to four more 8k pages.



  • TEXT_TREE — for data over 42,000 bytes, or if the 1 to 5 pointers can't fit in-row, then there will be just a 24 byte pointer to the starting page of a list of pointers to the LOB pages (i.e. the "text_tree" page).



  • IF sp_tableoption was used to set the large value types out of row option, then always use a 16-byte pointer to LOB storage.



  • I said "default" rules because I did not test in-row values against the impact of certain features such as Data Compression, Column-level Encryption, Transparent Data Encryption, Always Encrypted, etc.



-
LOB overflow pages: If a value is 10k, then that will require 1 full 8k page of overflow, and then part of a 2nd page. If no other data can take up the remaining space (or is even allowed to, I am unsure of that rule), then you have approx 6kb of "wasted" space on that 2nd LOB overflow datapage.

-
Unused space: An 8k data page is just that: 8192 bytes. It does not vary in size. The data and meta-data placed on it, however, does not always fit nicely into all 8192 bytes. And rows cannot be split onto multiple data pages. So if you have 100 bytes remaining but no row (or no row that would fit in that location, depending on several factors) can fit there, the data page is still taking up 8192 bytes, and your 2nd query is only counting the number of data pages. You can find this va

Context

StackExchange Database Administrators Q#127405, answer score: 19

Revisions (0)

No revisions yet.