Rules on Graphs in Graphs of Rules, Part 1
Speed up queries and decouple domain logic from apps
In the previous post, we saw how inference rules illustrate the different meanings of rules. Some of those meanings, like paradigm, model and straightedge, were lost to history in favor of algorithm and regulation.
Inference rules are useful. They bring declarative expressivity — stating what follows from facts rather than how to compute it. But for users, their most salient feature is that they make queries simpler and faster.
This post is part of the Rules series.
I use rules daily in my personal knowledge graph. The most frequent pattern in my queries links a block with a page referred to in it. The standard path in Roam is to go from the block to the referenced page via :block/refs and then to the page title via :node/title.
Just as, in the example from the previous post, “has uncle” makes a shortcut from X to Z instead of going along “has parent” and “has brother,” the refs-page rule replaces the route pattern
:block/refs → ?page → :node/title.
In the essay Graph Pruning, for example, both queries for counting typed pages can be simplified with this rule, making them shorter and faster. With the rule, the query for counting typed pages looks like this:
[:find (count ?page)
:in $ % ; % binds the rule set that defines refs-page
:where
(refs-page "is a" ?isA)
[?isA :block/page ?page]
]
refs-page is a Datalog rule written in Datomic syntax. Instead of using :- to separate the rule head from the rule body, the rule head is put in parentheses. In other words, a rule is a vector whose first element is a list, the rule head, and whose remaining elements are vectors, the clauses of the rule body. A query that uses rules receives them through the % input. The refs-page rule looks like this:
[(refs-page ?page-title ?block) ; links a block to the title of a page it references
[?block :block/refs ?page]
[?page :node/title ?page-title]
]
The refs-page rule will halve the size of the query for the number of words I shared in Writing with Roam.
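To make the shortcut concrete, here is a small Python sketch that evaluates the same joins over a handful of invented Roam-like datoms. It is not how Roam or Datascript execute queries: the entity ids, page titles and the helpers `refs_page` and `count_typed_pages` are hypothetical, and the sketch assumes the rule returns the referring block as its second argument.

```python
# Invented Roam-like datoms as (entity, attribute, value) tuples; ids and titles are hypothetical.
DATOMS = {
    (1, ":node/title", "is a"),
    (2, ":node/title", "Page A"),
    (3, ":node/title", "Page B"),
    (4, ":node/title", "Topic"),
    (10, ":block/refs", 1),  # block 10 refers to the page "is a"
    (10, ":block/refs", 4),  # ...and to the page "Topic"
    (10, ":block/page", 2),  # block 10 sits on "Page A"
    (11, ":block/refs", 1),
    (11, ":block/page", 3),
    (12, ":block/refs", 4),  # block 12 refers to "Topic" only
    (12, ":block/page", 3),
}


def refs_page(datoms, page_title):
    """Blocks whose :block/refs point at a page whose :node/title is page_title."""
    pages = {e for (e, a, v) in datoms if a == ":node/title" and v == page_title}
    return {e for (e, a, v) in datoms if a == ":block/refs" and v in pages}


def count_typed_pages(datoms):
    """Distinct pages holding a block that refers to "is a", mirroring the query above."""
    is_a_blocks = refs_page(datoms, "is a")
    return len({v for (e, a, v) in datoms if a == ":block/page" and e in is_a_blocks})


print(count_typed_pages(DATOMS))  # 2
```

The caller never mentions :block/refs or :node/title; that path lives only inside `refs_page`, which is the point of the rule.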
An important feature of such rules is that they separate domain logic from application code. When rules are themselves nodes in the graph, they are easy to access and inspect. If the graph is based on interoperable standards, the rules can also be fully decoupled from any specific applications and have independent governance. In corporate settings, this becomes critical when organisations need to integrate new data sources, respond to changing business requirements or legislation, or replace a vendor. I will return to this point in the next post.
Now, let’s see how these rules work.
Let’s extend the graph of relations from the previous post with a few more.
If this is an RDF graph, we can define Datalog rules for parent and uncle and use open-source tools like Maplib or commercial ones like RDFox to materialize these inferred relations. That’s how, for example, the personal knowledge graph in the latest Samsung phones works.
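As a rough illustration of what materializing these relations means, the Python sketch below applies the parent and uncle rules by naive forward chaining until nothing new can be derived. It is not the API of Maplib or RDFox, which do this far more efficiently; the triples and names are placeholders, and plain predicate strings stand in for proper IRIs.

```python
# Placeholder family triples (subject, predicate, object); all names are invented.
FACTS = {
    ("Child1", "has mother", "Mother1"),
    ("Child1", "has father", "Father1"),
    ("Father1", "has brother", "Uncle1"),
    ("Mother1", "has brother", "Uncle2"),
}


def rule_parent(triples):
    """has parent(X, Y) :- has mother(X, Y).   has parent(X, Y) :- has father(X, Y)."""
    return {(s, "has parent", o) for (s, p, o) in triples if p in ("has mother", "has father")}


def rule_uncle(triples):
    """has uncle(X, Z) :- has parent(X, Y), has brother(Y, Z)."""
    brothers = {}
    for (s, p, o) in triples:
        if p == "has brother":
            brothers.setdefault(s, set()).add(o)
    return {
        (x, "has uncle", z)
        for (x, p, y) in triples if p == "has parent"
        for z in brothers.get(y, ())
    }


def materialize(facts, rules):
    """Naive forward chaining: apply every rule until no new triples appear (a fixpoint)."""
    graph = set(facts)
    while True:
        inferred = set().union(*(rule(graph) for rule in rules)) - graph
        if not inferred:
            return graph
        graph |= inferred


closed = materialize(FACTS, [rule_parent, rule_uncle])
print(sorted(t for t in closed if t[1] == "has uncle"))
# [('Child1', 'has uncle', 'Uncle1'), ('Child1', 'has uncle', 'Uncle2')]
```

The first pass only adds “has parent”; the uncle rule finds something to do in the second pass, and the third pass adds nothing, which ends the loop.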
But if we want to manage these rules as part of the graph, we need to use either N3 rules or SHACL rules. SHACL has broader tool support and allows more sophisticated rules than N3. It also has the benefit of expressing rules in SPARQL. This way, SPARQL, apart from querying and graph manipulation, can be used both for generating graphs from heterogeneous data sources, as shown in the post about Facade-X, and for expressing rules. Since SPARQL is not as good as Datalog for recursive rules, the next version of SHACL will either allow Datalog rules in SHACL shapes graphs or enhance SHACL rules so that they overcome these limitations.
SHACL rules are either triple rules or SPARQL rules linked to a node shape (SHACL can be extended to include other ways of expressing rules). The node shapes declare the target nodes, which are the computed subset of the nodes in the graph on which the rules are applied. As such, the target node declaration is also an implicit rule.
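To show how a target and a rule cooperate, here is a Python sketch of a triple rule that, for every focus node, adds “has parent” wherever “has mother” holds. It loosely mimics a SHACL triple rule whose subject is sh:this, whose predicate is the inferred relation and whose object is a path from the focus node, and it treats the target as the implicit rule that computes the focus nodes. The function names and data are hypothetical, and real SHACL engines work on RDF terms rather than strings.

```python
def target_subjects_of(graph, predicate):
    """The target acts as an implicit rule: it computes the focus nodes from the data."""
    return {s for (s, p, o) in graph if p == predicate}


def triple_rule(graph, focus_nodes, predicate, object_path):
    """Rough analogue of a triple rule: subject = this, object = values of object_path from this."""
    return {
        (this, predicate, o)
        for this in focus_nodes
        for (s, p, o) in graph
        if s == this and p == object_path
    }


# Placeholder data; the names are invented.
graph = {
    ("Kid", "has mother", "Mum"),
    ("Mum", "has mother", "Gran"),
    ("Mum", "has brother", "Uncle"),
}
focus = target_subjects_of(graph, "has mother")
inferred = triple_rule(graph, focus, "has parent", "has mother")
print(sorted(inferred))
# [('Kid', 'has parent', 'Mum'), ('Mum', 'has parent', 'Gran')]
```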
In this post, I’ll show the graph of rules just as a diagram so that it’s easier to follow. At the end of the next part, I’ll include all the code and the instructions needed to reproduce the examples.
In SHACL, node shapes define the target nodes on which the validation constraints or the rules are applied. A common approach is to target instances of a class. But I want to focus only on relations here and not depend on whether the nodes linked with such relations are declared or inferred as instances of some classes.
There are plenty of ways to express the rules that produce parent and uncle relations. The same set of rules, structured differently, creates a system with different efficiency, robustness and maintainability. That’s interesting to study, and I will do so on a much bigger dataset, one representing actual family relations. Comparing different rule graph structures is useful for finding the best way to use SHACL in a given context, but more importantly, it demonstrates how the balance between Autonomy and Cohesion plays out even in simple technical systems. All that in the next part.
For the purposes of this post, I’ll use only one node shape and link all rules to it. To make sure all rules have the right set of focus nodes (that’s what the targeted nodes are called at run time), the targeted nodes are declared as the union of the subjects of the relations “has mother” and “has father.” Here, “subjects” refers to the subject position in the subject-predicate-object structure of a triple, with “has mother,” “has father,” “has parent,” and “has uncle” being the predicates we’ll focus on.
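In SHACL Core, such a target can be declared with two sh:targetSubjectsOf values on the same node shape, because the focus nodes of a shape are the union of what its individual targets produce. The Python sketch below mirrors that union over a placeholder graph; `focus_nodes` is a hypothetical helper, not part of any SHACL library.

```python
def focus_nodes(graph, *predicates):
    """Union of the subjects of each predicate, like several sh:targetSubjectsOf values."""
    targets = set()
    for predicate in predicates:
        targets |= {s for (s, p, o) in graph if p == predicate}
    return targets


# Placeholder data; the names are invented.
graph = {
    ("Kid", "has mother", "Mum"),
    ("Kid", "has father", "Dad"),
    ("Cousin", "has father", "Dad2"),
    ("Dad", "has brother", "Uncle"),
}
print(sorted(focus_nodes(graph, "has mother", "has father")))
# ['Cousin', 'Kid']
```

“Dad” is not a focus node here: it appears as a subject only of “has brother”.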
The node shape is linked to each rule with the predicate sh:rule, which the diagram below shows without a prefix. There are also statements for prefix declarations, order of execution, and activation switches, which we’ll skip for now and look into in the next part.
The diagram shows the four main nodes with their types: one node shape and three SPARQL rules. The dashed arrows represent rule dependencies: the third rule uses the “has parent” relation, which can only be produced by the first and second rules, when their conditions are satisfied.
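The dashed arrows can be derived mechanically if each rule is described by the predicates it reads and writes. The Python sketch below does that and uses the standard library’s `graphlib.TopologicalSorter` (Python 3.9+) to get an order in which the third rule runs after the first two. The read/write descriptions are an assumption made for illustration; this is one way to derive an order, separate from the order-of-execution statements that SHACL rules can carry.

```python
from graphlib import TopologicalSorter

# Hypothetical description of the three rules by the predicates they read and write.
RULES = {
    "rule 1": {"reads": {"has mother"}, "writes": {"has parent"}},
    "rule 2": {"reads": {"has father"}, "writes": {"has parent"}},
    "rule 3": {"reads": {"has parent", "has brother"}, "writes": {"has uncle"}},
}


def dependencies(rules):
    """Map each rule to the rules that produce a predicate it reads (the dashed arrows)."""
    return {
        name: {
            other
            for other, other_spec in rules.items()
            if other != name and spec["reads"] & other_spec["writes"]
        }
        for name, spec in rules.items()
    }


deps = dependencies(RULES)
order = list(TopologicalSorter(deps).static_order())
print(sorted(deps["rule 3"]))  # ['rule 1', 'rule 2']
print(order[-1])               # rule 3
```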
The target declaration in the node shape yields three focus nodes: Mai, Sophie, and Pierre, since these are the only nodes in the graph that are subjects of “has mother” or “has father”.
Once “has parent” relations are available, the third rule, with $this bound in turn to each focus node (each subject of “has mother” or “has father”), will produce “has uncle” relations between the nodes linked along the path “has parent” followed by “has brother”. This will be the resulting graph, shown in green:
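The third rule can be pictured as a SPARQL CONSTRUCT evaluated once per focus node with $this pre-bound. The Python sketch below imitates that binding over a placeholder graph in which the first two rules have already added “has parent”; the ex: IRIs in the docstring and all names in the data are hypothetical.

```python
# Placeholder graph after rules 1 and 2 have added "has parent"; all names are invented.
graph = {
    ("Kid", "has mother", "Mum"),
    ("Kid", "has parent", "Mum"),
    ("Mum", "has brother", "Uncle"),
    ("Cousin", "has father", "Dad"),
    ("Cousin", "has parent", "Dad"),
    ("Dad", "has brother", "Uncle2"),
    ("Dad", "has sister", "Aunt"),
}


def focus_nodes(graph):
    """Subjects of "has mother" or "has father": the nodes $this gets bound to."""
    return {s for (s, p, o) in graph if p in ("has mother", "has father")}


def rule_3(graph, this):
    """Sketch of CONSTRUCT { $this ex:hasUncle ?z } WHERE { $this ex:hasParent ?y . ?y ex:hasBrother ?z }."""
    parents = {o for (s, p, o) in graph if s == this and p == "has parent"}
    return {(this, "has uncle", z) for (y, p, z) in graph if y in parents and p == "has brother"}


inferred = set()
for this in focus_nodes(graph):
    inferred |= rule_3(graph, this)
print(sorted(inferred))
# [('Cousin', 'has uncle', 'Uncle2'), ('Kid', 'has uncle', 'Uncle')]
```

Without the “has parent” triples, `rule_3` would return nothing for any focus node, which is exactly the dependency on the first two rules.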
What will happen if the third rule is made in a way that uses only asserted relations and is independent of any other rules? And what if each rule is in a separate node shape? Stay tuned for the next part, where we’ll see the answers to these and other questions, using a dataset from Wikidata to generate twenty thousand uncles.





