0 90 en

How to store hierarchical data in the database

Categories: 💻 Programming

This article considers popular options for storing hierarchical data in SQL and NoSQL databases.

Materialized Path (Simple, supported by SQL/NoSQL)

Materialized Path

Materialized Path represents hierarchy as a string path (e.g., "1/3/3.2/3.2.1").

How Materialized Path Works

Each row stores its full hierarchical path.

Queries use LIKE filters to fetch descendants.

Schema in SQL

CREATE TABLE table_example (
    id SERIAL PRIMARY KEY,
    name TEXT NOT NULL,
    path TEXT NOT NULL
);

Query Example

SELECT * FROM table_example WHERE path LIKE '3/%';

DB structure in NoSQL

{
    "_id": 53,
    "name": "Step 3.2.1",
    "path": "3/3.2/3.2.1"
}

Query Example

{ "path": { "$regex": "^3/" } }

Materialized Path Pros

  • ✅ Simple to implement
  • ✅ Fast reads using string pattern matching.

Materialized Path cons

  • ❌ Updates require rewriting affected paths.
  • ❌ Less flexible for profound or frequently changing hierarchies.

Closure Table (Recommended for Complex Queries & Large Trees)

Closure Table

Closure Table stores all ancestor-descendant relationships explicitly in a separate table.

How Closure Table Works

  • Each node links to all its ancestors.
  • Enables fast parent-child lookups.

Schema in SQL

CREATE TABLE File_Name (
    id SERIAL PRIMARY KEY,
    name TEXT NOT NULL
);

CREATE TABLE tree_path (
    ancestor_id INT NOT NULL,
    descendant_id INT NOT NULL,
    depth INT NOT NULL,
    PRIMARY KEY (ancestor_id, descendant_id),
    FOREIGN KEY (ancestor_id) REFERENCES hierarchy(id),
    FOREIGN KEY (descendant_id) REFERENCES hierarchy(id)
);

Query Example (Get all descendants of step 3)

SELECT ps.* FROM File_Name  ps
JOIN tree_path ph ON ps.id = ph.descendant_id
WHERE ph.ancestor_id = 3;

Closure Table Pros

  • ✅ Fast queries for deep hierarchies.
  • ✅ Supports flexible parent-child lookups.

Closure Table Cons

  • ❌ Requires an extra table.
  • ❌ Inserts and deletions involve multiple updates.

Adjacency List (Recommended for Small, Shallow Trees)

Adjacency List (Recommended for Small, Shallow Trees)

Adjacency List represents hierarchy by linking each node to its direct parent.

Schema in SQL

CREATE TABLE tree (
    id SERIAL PRIMARY KEY,
    name TEXT NOT NULL,
    parent_id INT NULL,
    FOREIGN KEY (parent_id) REFERENCES tree (id)
);

Query Example (Get children of a specific step)

SELECT * FROM tree WHERE parent_id = 3;

Adjacency List Pros

  • ✅ Very simple schema, easy to understand.
  • ✅ Easy inserts/deletes.

Adjacency List Cons

  • ❌ Requires recursive queries to retrieve complete hierarchy.
  • ❌ Poor performance for deep hierarchies.

Ancestors Array (Recommended for Flexible NoSQL Queries)

Each document stores an array of ancestor IDs, allowing quick lookups.

The following example models the tree using an Array of Ancestors. In addition to the ancestor's field, these documents also store the reference to the immediate parent category in the parent field:

Example Document in MongoDB

db.categories.insertMany( [
  { _id: "MongoDB", ancestors: [ "Books", "Programming", "Databases" ], parent: "Databases" },
  { _id: "dbm", ancestors: [ "Books", "Programming", "Databases" ], parent: "Databases" },
  { _id: "Databases", ancestors: [ "Books", "Programming" ], parent: "Programming" },
  { _id: "Languages", ancestors: [ "Books", "Programming" ], parent: "Programming" },
  { _id: "Programming", ancestors: [ "Books" ], parent: "Books" },
  { _id: "Books", ancestors: [ ], parent: null }
] )

Query Example

db.categories.findOne( { _id: "MongoDB" } ).ancestors

Ancestors Array Pros 

  • ✅ Fast retrieval using indexes.
  • ✅ Supports flexible queries.

Ancestors Array Cons

❌ Updating ancestor paths requires updating multiple documents.

Nested Sets Model (Alternative for Read-Optimized NoSQL Trees)

MongoDB also suggests a nested sets model, where each node has left and right values to define its position in the hierarchy.

Consider the following hierarchy of categories:

The following example models the tree using Nested Sets:

db.categories.insertMany( [
   { _id: "Books", parent: 0, left: 1, right: 12 },
   { _id: "Programming", parent: "Books", left: 2, right: 11 },
   { _id: "Languages", parent: "Programming", left: 3, right: 4 },
   { _id: "Databases", parent: "Programming", left: 5, right: 10 },
   { _id: "MongoDB", parent: "Databases", left: 6, right: 7 },
   { _id: "dbm", parent: "Databases", left: 8, right: 9 }
] )

Query Example

var databaseCategory = db.categories.findOne( { _id: "Databases" } );
db.categories.find( { left: { $gt: databaseCategory.left }, right: { $lt: databaseCategory.right } } );

Nested Sets Pros 

  • ✅ Very fast queries for deep trees.
  • ✅ No recursion needed.

Nested Sets Cons

  • ❌ Complex inserts and updates.
  • ❌ Not as commonly used in NoSQL setups.

Comparison table:

ApproachBest ForRead SpeedWrite ComplexityFlexibility
Materialized Path (SQL/ NoSQL)Simple hierarchies, fast reads✅ Fast❌ Expensive updates❌ Less flexible
Closure Table (SQL)Complex queries, deep trees✅ Fast❌ More storage required✅ Very flexible
Adjacency List (SQL)Small trees, easy updates❌ Slower✅ Easy updates❌ Limited flexibility
Ancestors Array (NoSQL)MongoDB, flexible queries✅ Fast❌ Complex updates✅ Very flexible
Nested Sets (NoSQL)Read-optimized deep trees✅ Very fast❌ Hard updates❌ Less flexible

Key takeaways

Choosing the proper structure depends on your use case, query performance needs, and update frequency:

  • For SQL: Use Closure Table for deep hierarchies, Materialized Path for simple trees, and Adjacency List for small structures.
  • For NoSQL: Use Materialized Path for simplicity, Ancestors Array for flexibility, or Nested Sets for read-heavy use cases.

Comments:

Please log in to be able add comments.