patternsqlMinor
Does Postgres execute non-parallel plan nodes sequentially, IO-optimized, or in parallel threads?
Viewed 0 times
nodespostgresnonthreadsoptimizedplanparalleldoesexecutesequentially
Problem
It's hard to find a proper title to this, so any suggestions and edits are appreciated.
When ANALZYING an plan, you get told when certain nodes emitted the first row and when they emitted their last row. Some nodes can start operating even without having the full result set (a nested loop that does an index lookup receiving rows from an inner node can theoretically already start doing this even before the inner node is fully evaluated, when the inner node is already returning an subset of the results).
I can see these
My questions:
I know internally things are more complicated than this, but I think this is a good high-level explanations of my thoughts.
When ANALZYING an plan, you get told when certain nodes emitted the first row and when they emitted their last row. Some nodes can start operating even without having the full result set (a nested loop that does an index lookup receiving rows from an inner node can theoretically already start doing this even before the inner node is fully evaluated, when the inner node is already returning an subset of the results).
I can see these
actual times overlapping in the evaluations, meaning that some nodes must already start being evaluated even before the inner nodes are finished.My questions:
- Do all the nodes (unless an parallel processing node is specified) run sequentually, meaning they only actually start doing work once their inner node is finished? (nb: considering the actual time overlaps in analyze, I don't think so)
- If they do not run sequentually, but only run in an single thread, do they run IO-optimized, as in, while an inner node is doing something IO-heavy, is an outer node doing computations if possible, and yielding back once the inner node is ready?
- Or, alternatively, do they actually just have multiple threads, one for each node, waking up once they have something to do?
I know internally things are more complicated than this, but I think this is a good high-level explanations of my thoughts.
Solution
The answer is three times “no”.
PostgreSQL produces result rows on demand. To produce a result row, it recurses down the plan tree. If the top node is a nested loop join, it fetches the first row from the outer relation, then it fetches the first matching row from the inner relation. Then it outputs the first result row. These steps are executed sequentially, and all I/O is synchronous.
So the execution of the nodes of a query plan is interleaved, not concurrent. The one exception is parallel query: everything under a “Gather” node is executed in several parallel processes concurrently. But each of the parallel processes proceeds as described above.
PostgreSQL produces result rows on demand. To produce a result row, it recurses down the plan tree. If the top node is a nested loop join, it fetches the first row from the outer relation, then it fetches the first matching row from the inner relation. Then it outputs the first result row. These steps are executed sequentially, and all I/O is synchronous.
So the execution of the nodes of a query plan is interleaved, not concurrent. The one exception is parallel query: everything under a “Gather” node is executed in several parallel processes concurrently. But each of the parallel processes proceeds as described above.
Context
StackExchange Database Administrators Q#320020, answer score: 3
Revisions (0)
No revisions yet.