feature: add create_subgraph() #2441

Open: jrgemignani wants to merge 1 commit into apache:master from jrgemignani:feature_add_subgraph

Conversation

@jrgemignani (Contributor)

Add the feature create_subgraph() for materialized induced-subgraph extraction.

Add ag_catalog.create_subgraph(new_graph, from_graph, node_filter, relationship_filter) which materializes a new, persistent, fully Cypher-queryable AGE graph as the induced subgraph of an existing graph.

Selection follows the graph-theory induced-subgraph definition as operationalized by Neo4j GDS gds.graph.filter():

  • a vertex is kept iff node_filter holds ('*' keeps all);
  • an edge is kept iff relationship_filter holds AND both of its endpoints were kept (no dangling edges).
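
The two rules above can be modeled in a few lines. The following is a minimal Python sketch of the selection semantics only, not the PL/pgSQL implementation in this PR: the dictionary layout, the `induced_subgraph` helper, and the `KEEP_ALL` stand-in for `'*'` are illustrative assumptions, and filters are plain Python callables rather than Cypher predicates.

```python
from typing import Callable, Dict, List, Tuple

Entity = Dict[str, object]  # hypothetical record shape, for illustration only


def KEEP_ALL(_entity: Entity) -> bool:
    """Stand-in for the '*' filter: keep every entity."""
    return True


def induced_subgraph(
    vertices: List[Entity],
    edges: List[Entity],
    node_filter: Callable[[Entity], bool],
    rel_filter: Callable[[Entity], bool],
) -> Tuple[List[Entity], List[Entity]]:
    # Rule 1: a vertex is kept iff node_filter holds.
    kept_vertices = [v for v in vertices if node_filter(v)]
    kept_ids = {v["id"] for v in kept_vertices}
    # Rule 2: an edge is kept iff rel_filter holds AND both endpoints were kept.
    kept_edges = [
        e for e in edges
        if rel_filter(e) and e["start"] in kept_ids and e["end"] in kept_ids
    ]
    return kept_vertices, kept_edges


vertices = [
    {"id": 1, "label": "Person", "age": 40},
    {"id": 2, "label": "Person", "age": 20},
    {"id": 3, "label": "City"},
]
edges = [
    {"id": 10, "label": "KNOWS", "start": 1, "end": 2},     # endpoint 2 is dropped
    {"id": 11, "label": "LIVES_IN", "start": 1, "end": 3},  # endpoint 3 is dropped
    {"id": 12, "label": "KNOWS", "start": 1, "end": 1},     # self-loop on a kept vertex
]

kept_v, kept_e = induced_subgraph(
    vertices,
    edges,
    node_filter=lambda v: v["label"] == "Person" and v["age"] >= 30,
    rel_filter=KEEP_ALL,
)
```

Here only vertex 1 passes the node filter, so edges 10 and 11 are dropped even though the relationship filter accepts everything; the self-loop 12 survives because both of its endpoints are the kept vertex.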

Filters are arbitrary Cypher predicates bound to n (nodes) and r (relationships) and are evaluated by AGE's own Cypher engine against the source graph, so the full predicate language is available; label selection uses label(n)/label(r) since the match pattern is fixed.
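
To illustrate why label selection goes through `label(n)`/`label(r)` and why a dollar-quote token has to be reserved, here is a hedged Python sketch of how a filter might be spliced into a fixed-pattern `cypher()` call. This is a guess at the general shape, not the code in `sql/age_subgraph.sql`: the `$ag_sg$` token, the helper names, and the exact query text are all assumptions.

```python
RESERVED_TAG = "$ag_sg$"  # hypothetical token; the PR does not name the real one

# The match pattern is fixed per entity kind, so the filter can only
# constrain labels through label(n) / label(r) inside the WHERE clause.
PATTERNS = {
    "node": ("MATCH (n)", "n"),
    "relationship": ("MATCH ()-[r]->()", "r"),
}


def sql_literal(text: str) -> str:
    """Quote a string as a SQL literal by doubling single quotes."""
    return "'" + text.replace("'", "''") + "'"


def build_filter_query(graph: str, kind: str, predicate: str) -> str:
    # A predicate containing the tag would terminate the dollar-quoted
    # Cypher body early, so it is rejected up front.
    if RESERVED_TAG in predicate:
        raise ValueError("predicate contains the reserved dollar-quote token")
    match, var = PATTERNS[kind]
    # '*' means "keep all", modeled here by omitting the WHERE clause.
    where = "" if predicate.strip() == "*" else f" WHERE {predicate}"
    cypher = f"{match}{where} RETURN id({var})"
    return (
        f"SELECT * FROM ag_catalog.cypher({sql_literal(graph)}, "
        f"{RESERVED_TAG} {cypher} {RESERVED_TAG}) AS (id agtype)"
    )


node_query = build_filter_query(
    "social", "node", "label(n) = 'Person' AND n.age > 30"
)
```

Because the predicate is passed through verbatim, anything the Cypher engine accepts in a `WHERE` clause works as a filter, which is the property the description relies on.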

Implementation notes:

  • Result is a real, ACID, registered graph (built with create_graph plus create_vlabel/create_elabel), not a virtual view; it can be queried with cypher() and used as the source of a further create_subgraph() call.
  • Entity graphids are reassigned from the destination labels' own sequences (graphid encodes a per-graph label id), and edge endpoints are remapped through an old->new vertex map, enforcing the induced rule via inner joins.
  • Source label tables are read with FROM ONLY to avoid double-copying children under PostgreSQL table inheritance.
  • Properties of any agtype are preserved; self-loops and parallel edges (multigraph structure) are retained.
  • SECURITY INVOKER: reads respect the caller's table privileges and RLS; the new graph is owned by the caller.
  • Validates NULL/identical graph names, missing source, pre-existing destination, and a reserved dollar-quote token in predicates.
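
The graphid reassignment and endpoint remapping described in the notes above can be sketched as follows. This is an illustrative Python model, not the PR's PL/pgSQL: it assumes AGE's graphid layout of a 16-bit label id in the high bits and a 48-bit entry id in the low bits, uses per-label counters in place of the destination sequences, and the `remap` helper and tuple shapes are hypothetical.

```python
from typing import Dict, List, Tuple

ENTRY_ID_BITS = 48  # assumption: graphid = (label_id << 48) | entry_id
ENTRY_ID_MASK = (1 << ENTRY_ID_BITS) - 1


def make_graphid(label_id: int, entry_id: int) -> int:
    """Pack a label id and an entry id into one 64-bit graphid."""
    return (label_id << ENTRY_ID_BITS) | (entry_id & ENTRY_ID_MASK)


def remap(
    vertices: List[Tuple[int, str]],            # (old_id, label)
    edges: List[Tuple[int, str, int, int]],     # (old_id, label, start_id, end_id)
    dest_label_ids: Dict[str, int],             # label ids in the new graph
) -> Tuple[Dict[int, int], List[Tuple[int, str, int, int]]]:
    counters: Dict[str, int] = {}

    def next_id(label: str) -> int:
        # Plays the role of nextval() on the destination label's sequence.
        counters[label] = counters.get(label, 0) + 1
        return make_graphid(dest_label_ids[label], counters[label])

    # old -> new vertex map, built from the already-filtered vertices.
    vmap = {old_id: next_id(label) for old_id, label in vertices}

    new_edges = []
    for _old_id, label, start, end in edges:
        # Equivalent of inner-joining both endpoints against the map:
        # an edge with a missing endpoint simply produces no row.
        if start in vmap and end in vmap:
            new_edges.append((next_id(label), label, vmap[start], vmap[end]))
    return vmap, new_edges


a, b = make_graphid(5, 70), make_graphid(5, 90)  # ids from the source graph
vmap, new_edges = remap(
    vertices=[(a, "Person"), (b, "Person")],
    edges=[(make_graphid(6, 1), "KNOWS", a, b),
           (make_graphid(6, 2), "KNOWS", a, make_graphid(5, 99))],  # dangling
    dest_label_ids={"Person": 3, "KNOWS": 4},
)
```

The source ids cannot be copied as-is because the label id baked into each graphid belongs to the source graph; minting fresh ids and translating endpoints through the map keeps the new graph self-consistent, and the membership check is what drops the dangling edge.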

Wire-up:

  • sql/age_subgraph.sql (new) registered in sql/sql_files after age_pg_upgrade; identical body added to age--1.7.0--y.y.y.sql so the upgrade-path catalog comparison matches.
  • regress/sql/subgraph.sql + expected output (new), added to REGRESS. Covers full copy, vertex-induced, node+rel, label-only edge drop, bipartite, empty result, composability, self-loops/parallel edges, property fidelity, and error cases over a ~4500-vertex / 2000-edge source graph.

All 38 regression tests pass against PostgreSQL 18.

Co-authored-by: GitHub Copilot (Claude Opus 4.8) <[email protected]>

modified: Makefile
modified: age--1.7.0--y.y.y.sql
new file: regress/expected/subgraph.out
new file: regress/sql/subgraph.sql
new file: sql/age_subgraph.sql
modified: sql/sql_files

Copilot AI left a comment

Pull request overview

Adds a new SQL API ag_catalog.create_subgraph() that materializes an induced subgraph from an existing AGE graph into a new persistent graph schema, plus regression coverage and extension wiring so the function is available on fresh install and via upgrades.

Changes:

  • Introduce ag_catalog.create_subgraph(new_graph, from_graph, node_filter, relationship_filter) implemented in PL/pgSQL, copying filtered vertices/edges label-by-label while remapping graphids.
  • Wire the new SQL file into extension build/install and the upgrade template.
  • Add a dedicated regression test (subgraph) with expected output.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| sql/sql_files | Registers age_subgraph.sql for extension build ordering. |
| sql/age_subgraph.sql | Implements create_subgraph() and its catalog comment. |
| age--1.7.0--y.y.y.sql | Mirrors create_subgraph() in the upgrade template so the upgrade-path catalog matches. |
| Makefile | Adds subgraph to the regression test suite list. |
| regress/sql/subgraph.sql | New regression test exercising correctness, edge cases, and composability. |
| regress/expected/subgraph.out | Expected output for the new regression test. |

Comment thread sql/age_subgraph.sql
EXECUTE format(
'CREATE TEMP TABLE _ag_sg_vstage ON COMMIT DROP AS '
'SELECT t.id AS old_id, '
' ag_catalog._graphid(%s, nextval(%L)) AS new_id, '
Comment thread sql/age_subgraph.sql
DROP TABLE IF EXISTS _ag_sg_estage;
EXECUTE format(
'CREATE TEMP TABLE _ag_sg_estage ON COMMIT DROP AS '
'SELECT ag_catalog._graphid(%s, nextval(%L)) AS new_id, '
Comment thread age--1.7.0--y.y.y.sql
EXECUTE format(
'CREATE TEMP TABLE _ag_sg_vstage ON COMMIT DROP AS '
'SELECT t.id AS old_id, '
' ag_catalog._graphid(%s, nextval(%L)) AS new_id, '
Comment thread age--1.7.0--y.y.y.sql
DROP TABLE IF EXISTS _ag_sg_estage;
EXECUTE format(
'CREATE TEMP TABLE _ag_sg_estage ON COMMIT DROP AS '
'SELECT ag_catalog._graphid(%s, nextval(%L)) AS new_id, '