HiveBrain v1.2.0
Get Started
← Back to all entries
gotchasqlMinor

Why does PostgreSQL aggregate result in the Index or Seq scan outputting all columns of the table?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
postgresqlresultwhythescancolumnsalloutputtingseqtable

Problem

Doing a query like this:

SELECT fruit, avg(cost) FROM fruit_table GROUP BY fruit


results in a query plan like this:

Sort  (cost=3713873.41..3713874.46 rows=420 width=16)
  Output: fruit, (avg(cost))
  Sort Key: fruittable.fruit
  ->  HashAggregate  (cost=3713849.87..3713855.12 rows=420 width=16)
      Output: fruit, (avg(cost))
      ->  Seq Scan on public.fruit  (cost=0.00..3677849.91 rows=7199991 width=16)
          Output: fruit, weight, cost, ...


The result is correct, but ideally, for performance reasons I would like to have weight omitted in the output of index/seq scan - what am I doing wrong?

This is the output of EXPLAIN ANALYZE VERBOSE:

Sort  (cost=3713873.41..3713874.46 rows=420 width=16) (actual time=31156.940..31156.958 rows=612 loops=1)
  Output: fruit, (avg(cost))
  Sort Key: fruittable.fruit
  Sort Method: quicksort  Memory: 53kB
  ->  HashAggregate  (cost=3713849.87..3713855.12 rows=420 width=16) (actual time=31156.272..31156.729 rows=612 loops=1)
    Output: fruit, avg(cost)
    ->  Seq Scan on public.fruittable  (cost=0.00..3677849.91 rows=7199991 width=16) (actual time=0.004..15050.636 rows=7199991 loops=1)
      Output: fruit, weight, cost, ...
Total runtime: 31157.097 ms


  1. EDIT



I run the following queries based on Craig's comment:

SELECT
pg_stat_get_live_tuples('fruittable'::regclass), 
pg_stat_get_dead_tuples('fruittable'::regclass), 
pg_relation_size('fruittable'::regclass), 
pg_total_relation_size('fruittable'::regclass); 

-- OUTPUT:
pg_stat_get_live_tuples: 0 
pg_stat_get_dead_tuples: 0 
pg_relation_size: 29539123200 
pg_total_relation_size: 32118743040


and

SELECT avg(pg_column_size(x)) FROM (SELECT * FROM fruittable LIMIT 1000) AS x;

-- OUTPUT
avg: 1354.68


  1. EDIT



This is the table definition

```
CREATE TABLE x_2009
(
c_a character varying(70),
c_b character varying(70),
c_c character varying(70),
c_d character varying(70),
c_e character varying(70),
c_f double precision,

Solution

"Seq scans" means full table scan. It's always going to read all of the columns on your table. It's a full table scan. PostgreSQL uses row-based storage, so it can't just read one column at a time.

The reason your query takes so long is that you're reading 7.2 million rows in the table and aggregating them. Apparently, on a fairly slow computer.

Context

StackExchange Database Administrators Q#25739, answer score: 3

Revisions (0)

No revisions yet.