HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

why update a table take so long on oracle

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
whyupdatetakelongoracletable

Problem

I have a table A which has 3800+ rows, another table B with 300+ rows.

when I execute a update like this:

update A set some_colum = '2' where another_column not in (select id from B)


it takes 33 seconds to finish. Is that too bad performance?

how to improve?

I add a non-unique index on A.another_column, but it does't help. B.id is primary key.

here is the explain :

Solution

Looks like my short answer was not to everyone's taste, so let me show you the long answer

First of all, let's create our lab.

@WestFarmer mentioned a couple of tables

SQL>create table A (
  2  some_column    number,
  3  another_column number
  4  );

Table created.

SQL>create table B (
  2  id number
  3  );

Table created.


Also a mentoned that he create an index over table A (another_column) and that B(id) was the primary key.

SQL>create index idx1 on A (another_column);

Index created.

SQL>alter table B add constraint b_pk primary key (id);

Table altered.


Now let's populate both tables, 3800+ for A and 300+ for B. I fix it to 4000 and 400.

SQL>declare
  2    i number;
  3    j number;
  4  begin
  5    for i in 1..400 loop
  6      for j in 1..10 loop
  7        insert into A values (j, i+100);
  8      end loop;
  9      insert into B values (i);
  10   end loop;
  11 end;
  12 /

PL/SQL procedure successfully completed.


Also I am going to add this extra record, just to show you why I suggest NOT EXIST instead NOT IN

SQL>insert into A values (1,NULL);

1 row created.

SQL>commit;

Commit complete.


As you will see below, ANOTHER_COLUMN include values from 101 to 500 for table A

SQL>select max(another_column), min(another_column) from A

MAX(ANOTHER_COLUMN) MIN(ANOTHER_COLUMN)
------------------- -------------------
                500                 101


And id store values from 1 to 400

SQL>select max(id), min(id) from B;

   MAX(ID)    MIN(ID)
---------- ----------
       400          1


Now the logic said that NOT IN should be the same as NOT EXIST, but if we count the record using not in you will see that it does not include NULL values:

SQL>select count(*) from A where another_column not in (select id from B);

  COUNT(*)
----------
      1000


However if we use not exists it include the null record I inserted.

SQL>select count(*) from A where not exists (select null from B where another_column = id);

  COUNT(*)
----------
      1001


Now the question does not mention nothing about NULL values but performance. So, let's remove the null record

SQL>delete A where another_column is null;

1 row deleted.

SQL>commit;

Commit complete.

SQL>select count(*) from A where another_column not in (select id from B);

  COUNT(*)
----------
      1000

SQL>select count(*) from A where not exists (select null from B where another_column = id);

  COUNT(*)
----------
      1000


Now both options looks similar, right? No matter if we use NOT IN or NOT EXIST.

I will show you why is better option NOT EXISTS from the performance perspective

I have activated autotrace option in order to see the execution plan.

First, let's run NOT IN option

SQL>update A set some_column = '2' where another_column not in (select id from B);

1000 rows updated.

Elapsed: 00:00:00.39

Execution Plan
----------------------------------------------------------
Plan hash value: 3794573283

------------------------------------------------------------------
| Id  | Operation                 | Name | Rows  | Bytes | Cost  |
------------------------------------------------------------------
|   0 | UPDATE STATEMENT          |      |  4000 |   152K|     4 |
|   1 |  UPDATE                   | A    |       |       |       |
|*  2 |   HASH JOIN RIGHT ANTI SNA|      |  4000 |   152K|     4 |
|   3 |    INDEX FULL SCAN        | B_PK |   400 |  5200 |     1 |
|   4 |    TABLE ACCESS FULL      | A    |  4000 |   101K|     2 |
------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - access("ANOTHER_COLUMN"="ID")

Note
-----
   - cpu costing is off (consider enabling it)
   - dynamic sampling used for this statement (level=2)

Statistics
----------------------------------------------------------
          7  recursive calls
       1022  db block gets
         44  consistent gets
          0  physical reads
     248832  redo size
        397  bytes sent via SQL*Net to client
        584  bytes received via SQL*Net from client
          3  SQL*Net roundtrips to/from client
          1  sorts (memory)
          0  sorts (disk)
       1000  rows processed


Now, let's run NOT EXIST option

```
SQL>update A set some_column = '2' where not exists (select null from B where another_column = id);

1000 rows updated.

Elapsed: 00:00:00.33

Execution Plan
----------------------------------------------------------
Plan hash value: 2312933728

------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost |
------------------------------------------------------------
| 0 | UPDATE STATEMENT | | 4000 | 152K| 3 |
| 1 | UPDATE | A | | | |
| 2 | NESTED LOOPS ANTI | | 4000 | 152K| 3 |
| 3 | TABLE ACCESS FULL| A | 4000

Code Snippets

SQL>create table A (
  2  some_column    number,
  3  another_column number
  4  );

Table created.

SQL>create table B (
  2  id number
  3  );

Table created.
SQL>create index idx1 on A (another_column);

Index created.

SQL>alter table B add constraint b_pk primary key (id);

Table altered.
SQL>declare
  2    i number;
  3    j number;
  4  begin
  5    for i in 1..400 loop
  6      for j in 1..10 loop
  7        insert into A values (j, i+100);
  8      end loop;
  9      insert into B values (i);
  10   end loop;
  11 end;
  12 /

PL/SQL procedure successfully completed.
SQL>insert into A values (1,NULL);

1 row created.

SQL>commit;

Commit complete.
SQL>select max(another_column), min(another_column) from A

MAX(ANOTHER_COLUMN) MIN(ANOTHER_COLUMN)
------------------- -------------------
                500                 101

Context

StackExchange Database Administrators Q#107279, answer score: 3

Revisions (0)

No revisions yet.