HiveBrain v1.2.0
Get Started
← Back to all entries
patternMajor

Why are graphs represented as adjacency lists instead of adjacency sets?

Submitted by: @import:stackexchange-cs··
0
Viewed 0 times
representedwhyaregraphsinsteadlistsadjacencysets

Problem

In answering this question, I was looking for references (textbooks, papers, or implementations) which represent a graph using a set (e.g. hashtable) for the adjacent vertices, rather than a list. That is, the graph is a map from vertex labels to sets of adjacent vertices:

graph: Map>


In fact, I thought that this representation was completely standard and commonly used, since it allows O(1) querying for an edge existence, O(1) edge deletion, and O(1) iterating over the elements of the adjacency set. I have always represented graphs this way both in my own implementations and teaching.

To my surprise, most algorithms textbooks do not cover this directly, and instead represent it using a list of labels:

graph: Map>


As far as I understand, adjacency lists seem strictly worse: both representations support O(1) vertex additions and iteration over adjacent edges, but adjacency lists require O(m) for edge removal or edge existence (in the worst case).

Yet I am baffled that, for example Cormen Leiserson Rivest Stein: Introduction to Algorithms, Morin: Open Data Structures, and Wikipedia all suggest using adjacency lists. They mainly contrast adjacency lists with adjacency matrices, but the idea of storing adjacent elements as a set is only mentioned briefly in an off-hand comment as an alternative to the list representation, if at all. (For example, Morin mentions this on page 255, "What type of collection should be used to store each element of adj?") I must be missing something basic.
Q: What is the advantage of using a list instead of a set for adjacent vertices?

-
Is this a pedagogical choice, an aversion to hashmaps/hashsets, a historical accident, or something else?

-
This question is closely related, but asks about the representation graph: Set. The top answer suggests using my representation. Looking for a bit more context on this.

-
The second answer suggests hash collisions are a problem. But if hash sets are not preferred, another re

Solution

In many algorithms we don't need to check whether two vertices are adjacent, like in search algorithms, DFS, BFS, Dijkstra's, and many other algorithms.

In the cases where we only need to enumerate the neighborhoods, a list/vector/array far outperforms typical set structures. Python's set uses a hashtable underneath, which is both much slower to iterate over, and uses much more memory.

If you want really efficient algorithms (and who doesn't), you take this into account.

If you need $O(1)$ lookup of adjacencies and don't intend to do much neighborhood enumeration (and can afford the space), you use an adjacency matrix. If expected $O(1)$ is good enough, you can use hashtables, sets, trees, or other datastructures with the performance you need.

I suspect, however, that you don't hear about this so often, is because in algorithms classes, it makes analysis much simpler to use lists, because we don't need to talk about expected running time and hash functions.

Editing in two comments from leftaroundabout and jwezorek. Many real world graphs are very sparse and you often see $O(1)$-sized degrees for most of the graphs. This means that even if you want to do lookup, looping through a list is not necessarily much slower, and can in many cases be much faster.

As a "proof", I add some statistics from the graphs from Stanford Network Analysis Platform. Out of approximately 100 large graphs, the average degrees are

Avg. degree
number of graphs

35

43

10

4

2

3

1

1

Context

StackExchange Computer Science Q#146553, answer score: 29

Revisions (0)

No revisions yet.