PostgreSQL recursive query explained with a bare minimum dataset and example
Postgres allows to run recursive queries, by using a special syntax that allows to name the final data set and recursively refer to it via Common Table Expressions.
When do you need them ?
In case you are storing hierarchical data (e.g. a tree) in a relational format (e.g. where rows have a foreign key to another row — the parent — in the same table), you can basically return all the subtree records (all the levels) starting from an initial filter (e.g. selection of the root node). It’ll be very clear after the following example that I’ve kept as simple as possible.
Data Example
Memorize this structure. A grandfather has a father that has two children.
grandfather
father
child1
child2
We’ll store this in Postgres. I didn’t add any PK or FK to keep it simple and more readable.
create table people (id integer, nome varchar, parent_id integer);
INSERT INTO people VALUES
(1,'grandfather', null),
(2,'father', 1),
(3,'child1', 2),
(4,'child2', 2),
(5,'other person non connected to the hierarchy', null);
Note that I’ve voluntarily added a fifth record with data not related to the grandfather nor any other node.
Query to return all the hierarchy
Consider the table above with lots of records. If we want to extract the data starting from grandfather (id=1) to the furthest children, we’d have to run a query to get the father (WHERE parent_id=1), then another query to get the children (WHERE parent_id=2). That is repeated for as many level as our tree has.
The following recursive query allows to solve the return the whole hierarchy in a single query.
WITH RECURSIVE x AS (
-- base case starting from grandfather
SELECT
id,
parent_id,
nome
FROM
people
WHERE
id =1
UNION
--- recursive query (note it adds to the partial table "x")
SELECT
p.id,
p.parent_id,
p.nome
FROM
people p
INNER JOIN x x1
ON p.parent_id = x1.id
) SELECT
*
FROM
x ;
and the result is what we want. Note that the other records (e.g. the 5th one) are not displayed. Only the records under the grandfather with id=1 are displayed.
+----+-----------+-------------+
| id | parent_id | nome |
|----+-----------+-------------|
| 1 | <null> | grandfather |
| 2 | 1 | father |
| 3 | 2 | child1 |
| 4 | 2 | child2 |
+----+-----------+-------------+
How it works
Postgres executes the base case first and constructs the result set (that I called “x” to make it clear and visible it’s an internal virtual result-set name and not a real table).
The base query only filters by the grandfather therefore so far we have:
+----+-----------+-------------+
| id | parent_id | nome |
|----+-----------+-------------|
| 1 | <null> | grandfather | <-- added as id=1
+----+-----------+-------------+
After that we have a UNION that adds the recursive query, that adds records having parent equal to a record in “x”, so it’s now adding the father to the result-set.
+----+-----------+-------------+
| id | parent_id | nome |
|----+-----------+-------------|
| 1 | <null> | grandfather |
| 2 | 1 | father | <-- added as its parent_id=1
+----+-----------+-------------+
and at next iteration is also adding child1 and child2 as the father is in the result-set.
+----+-----------+-------------+
| id | parent_id | nome |
|----+-----------+-------------|
| 1 | <null> | grandfather |
| 2 | 1 | father |
| 3 | 2 | child1 | <-- added as its parent_id=2
| 4 | 2 | child2 | <-- added as its parent_id=2
+----+-----------+-------------+
The iterations now stop as there are no queries with children of child1 or child2.
Links
Read the official Common Table Expression (CTE, basically the “WITH” part) along with some examples on the official Postgres website or an alternative explanation.
Clap if useful. Follow me for further related articles