HiveBrain v1.2.0
Get Started
← Back to all entries
patternModerate

Estimate a Database Size

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
databaseestimatesize

Problem

I have to do some project and need to estimate the DB size.
I'm going to use Oracle in the project because we will have a lot of transactions and data.
But in the "presentation" I need to specify, in 5 years, how much I expect the DB to grow.

So, I have a PDF with the way of calculating the row size and another stuff, but for SQL-Server.
I want to do it for Oracle.

The pdf is an estimation formula - without having the real DB, so I can't do queries to the DB to check the actual size (That's What I've seen in other posts here when I googled).

The formula for SQL-Server row is


Row_Size = Fixed_Data_Size +Variable_Data_Size +Null_Bitmap + 4

This formula works in Oracle too?
Also, "4" is the size of the header of a row (I don't know how to translate the meaning in English)

So, I have to do this in Oracle. What will be the value 4 in Oracle?
The other things I think I think I've solved them.

Solution

Ideally, you would create the database, load a bit of sample data, measure the size, and extrapolate. That is, by far, the more accurate method of estimating the size of a database in 5 years.

If you do want to compute the database size, you would generally start by figuring out how many rows will fit in a single block. For simplicity, we'll assume that rows are never deleted and that updates never change the size of the row. We'll also assume no compression is being used. Otherwise, things get a bit (more) complicated.

Calculate the size of the data in the row. For fixed-size data types (i.e. DATE, CHAR), that's just the size of the type. For variable-size data types (i.e. NUMBER, VARCHAR2), that's the average size of the data in the column. There are a couple of bytes of additional overhead, but you can pretty safely ignore that-- they're going to be swamped by the errors in estimating the size of your actual data and in the subsequent estimate of the number of rows per block.

If you expect that each row will have x bytes of data, the number of rows per block will be

> = 
  floor( > * 
          (1 - >/100) / 
          > ) + 1


assuming that a row is smaller than > * >/100. If the row is larger, then don't add 1 to the result of the floor.

Once you know the number of rows per block, the estimated size of the table will be

> = 
  ceil( > / > ) *
    >


Walking through an example

My database block size is 8k and we'll assume that I'm using the default PCTFREE of 10 (meaning 10% of the block is reserved for future updates that increase the size of the row). I'll create a simple two-column table

SQL> create table foo(
  2    foo_id number,
  3    foo_str varchar2(100)
  4  );

Table created.


If foo_id is going to be the primary key with values from 1 to 1 million, each foo_id will consume between 1 and 7 bytes of space. But I also know from doing the test that, on average, it'll take ~6 bytes (actually 5.89 bytes). Of course, the larger the foo_id values get, the more space, on average, each foo_id requires. Oracle needs, on average 1.1 bytes per element to store the numbers 1-10, 1.92 bytes to store 1-100, 2.89 bytes to store 1-1,000, 3.89 bytes to store 1-10,000, 4.89 bytes to store 1-100,000, and 5.89 bytes to store 1-1,000,000. So, let's estimate for our example that foo_id will require 6 bytes and foo_str will require 50 bytes because the average foo_str is roughly 50 bytes. So we'll estimate a row size of 56 bytes.

The number of rows per block

> = 
  floor( 8192 * 
          (1 - 10/100) / 
          56 ) + 1


which works out to 132 rows per block. If we want to estimate the size of a 1 million row table,

> = 
  ceil( 1000000 / 132 ) *
    8192


which works out to 59.19 MB.

Now, let's test our estimate

We'll insert 1 million rows where foo_id goes from 1 to 1,000,000 and foo_str is a string with a random length between 1 and 100.

SQL> ed
Wrote file afiedt.buf

  1  insert into foo
  2    select level, dbms_random.string( 'p', dbms_random.value(1,100))
  3      from dual
  4*  connect by level  /

1000000 rows created.


Our estimate of the average row length was spot on (note that, in reality, you won't be nearly this close-- your estimate of variable column sizes will not be nearly so accurate)

SQL> exec dbms_stats.gather_table_stats( 'SCOTT', 'FOO' );

PL/SQL procedure successfully completed.
SQL> ed
Wrote file afiedt.buf

  1  select avg_row_len, num_rows
  2    from user_tables
  3*  where table_name = 'FOO'
SQL> /

AVG_ROW_LEN   NUM_ROWS
----------- ----------
         56    1000000


But how big is the actual table? The most common measure is to look at the size of the segment which is 72 MB.

SQL> select sum(bytes)/1024/1024 mb
  2    from user_segments
  3   where segment_name = 'FOO';

        MB
----------
        72


Our guess was off by ~20% and that was when we were perfect on our estimate of the size of a row. That's because Oracle allocates space to tables in chunks called extents that we've ignored. There are different algorithms for this that depend on the setup of the tablespace. Assuming recent Oracle versions where all tables are in locally managed tablespaces, you would be choosing between uniform extent allocation and automatic extent allocation. In my example, the tablespace I'm using is using automatic extent allocation. The exact algorithm for that, in turn, may depend on the version of Oracle you're using. In my case, though, the first 16 extents are 64 kb, the next 63 extents are 1 MB, and the last extent is 8 MB

SQL> ed
Wrote file afiedt.buf

  1  select bytes, count(*)
  2    from user_extents
  3   where segment_name = 'FOO'
  4   group by bytes
  5*  order by bytes
SQL> /

     BYTES   COUNT(*)
---------- ----------
     65536         16
   1048576         63
   8388608          1


That means that I probably got a bit unlucky and

Code Snippets

<<rows per block>> = 
  floor( <<database block size>> * 
          (1 - <<pctfree of table>>/100) / 
          <<size of row>> ) + 1
<<size of table>> = 
  ceil( <<number of rows in table>> / <<rows per block>> ) *
    <<database block size>>
SQL> create table foo(
  2    foo_id number,
  3    foo_str varchar2(100)
  4  );

Table created.
<<rows per block>> = 
  floor( 8192 * 
          (1 - 10/100) / 
          56 ) + 1
<<size of table>> = 
  ceil( 1000000 / 132 ) *
    8192

Context

StackExchange Database Administrators Q#23147, answer score: 15

Revisions (0)

No revisions yet.