Relational Inductive Bias

There is a recent surge in papers which are using Relational Inductive Bias with Deep Reinforcement Learning. So here is my investigation on what is it and how is this connected to the inductive biases used in Logic.

(Inductive) Bias refers to any basis for choosing one generalization over another, other than strict consistency with the observed training instances [Mitchell 80]. Relational inductive bias refers to biases which impose constraints on relationships and interactions among entities. Logic Programming extensively deals with the entities and relationships. So, it seems to me there should be some equivalence between the biases used in ILP with the Relational Inductive biases.

Bias in ILP

… bias is anything which influences how the concept learner draws inductive inferences based on the evidence. There are two fundamentally different forms of bias: declarative bias, which defines the space of hypotheses to be considered by the learner, i.e., what to search, and preference bias, which determines how to search that space, which hypotheses to focus on, and which ones to prune, etc.
…
ILP systems distinguish two kinds of declarative bias: syntactic bias (sometimes also called language bias) and semantic bias. Syntactic bias imposes restrictions on the form (syntax) of clauses allowed in hypothesis. … Semantic bias imposes restrictions on the meaning, or the behavior of hypotheses.
– Muggleton and De Raedt, 1994

Another typology of biases include following categorizations

Language Bias - specifies set of syntactically acceptable clauses.
Search Bias - specifies which region of hypothesis set should be searched.
Validation Bias - specifies when the search should stop.

Language Bias

Some straight forward syntactic constraints can be maximum length of clause, number of new variables that can be introduced in the clause, number of function operators used in horn clauses, etc. All of these are form of language-bias.

More evolved language biases include

specify the set of clauses allowed in hypotheses
each clause in this set can further contain set of literals which are allowed
$\quad$ e.g, $\operatorname{father}(X, Y) \leftarrow { \operatorname{male}(X), \operatorname{female}(X) }, \operatorname{parent}(X, Y);$
second-order schemata
Second-order schemata defines a language bias as the set of all clauses that can be obtained by instantiating a second-order schema with a second-order substitution.
A second-order schema is a clause, where some of the predicate names are (existentially quantified) predicate variables.
$\quad$ e.g., $S = \exists p, q, r : p(X, Y) \leftarrow q(X, XW), q(YW, Y), r(XW, YW);$
A second-order substitution is a substitution that replaces predicate-variables by predicate names.
$\quad$ e.g., $\theta = { p /\operatorname{connected}, q/\operatorname{part-oj}, r/\operatorname{touches}}$
gives
$\quad$ $S\theta = \operatorname{connected(X, Y)} \leftarrow \operatorname{part-of}(X,XW), \operatorname{part-ox}(YW, Y), \operatorname{touches}(XW, YW)$

Search Bias

Search bias can be a restriction or preference. With restriction, some hypothesis space is ignored. While with preference, some space is prioritized while searching. Search bias is given by $types$ and $modes$ in ILP. This can also be semantic bias

One important search/semantic bias used in most ILP systems is notion of determinate clause. “A clause is determinate if all of its literals are determinate; and a literal is determinate if each of its variables that does not appear in preceding literals has only one possible binding given the bindings of its variables that appear in preceding literals.” (from Muggleton and De Raedt, 94).

Search biases can be broadly categorized as:

Ordering
Specification of ordering for pre-compiled hypotheses or relations to be considered for learning.
Example selection criteria
How the examples are selected to verify some candidate hypothesis
Coverage function
Intermediate validation criteria

Bias in Graph Networks

In Graph Networks, inductive biases used include non-relational inductive biases like choice of activation function, dropout, weight decay, training curricula, algorithm optimization etc. Relational inductive bias, on other hand are the biases that are constrain the interactions of entities, relationships and rules while learning the model. Entities/Nodes are elements with attributes, relations/edges are property between entities and rule is a function mapping set of entities and relations to another entity and relation.

Three main relational inductive bias introduced in graph networks are

Locality
Translation invariance
Permutation invariance

For fully-connected (FC) layers, all the entities (nodes in network) are connected to all other entities and there is no restriction on rules that can be used. So, relational-inductive bias is very weak for FC layers.

For Convolutional layer, the nodes are the pixels of images and edges are the relation between neighbors. Rules are defined by the weights and biases of the hidden network. One relational-inductive bias in here is constraint of locality. Rules should use related nodes, i.e. convolution function should only use the neighbors, it can not use arbitrary pixels. Another relational-inductive bias used is that the kernel matrix is same for all the neighborhoods, i.e. all the localities are related in same way so rule is translation invariant.

In ILP, the locality bias is, perhaps, induced by use of $\operatorname{mode}$. A $+ve , \operatorname{mode}$ is preferred in a literal when adding it as candidate to a hypothesis clause. In a sense, we prefer edges/connected nodes from previously selected nodes in the hypothesis clause. The translation invariance bias is, perhaps, integral in ILP since horn clause by definition use $\forall$ operator.

In images, there is a natural ordering of the pixels and hence the nodes are ordered. But in most graphs, nodes do not have a natural ordering, nor does edges. One of the most important contribution of the Graph Neural Networks and Graph Convolutional Networks is to provide a way to operate on nodes in graph which is permutation invariant. So, this becomes the third and the most important relational inductive bias.

ILP does not assume any natural ordering of the entities or relations. Hence there is no need of an equivalent bias.

Conclusion

There are many more biases in ILP which are not yet accounted for in Graph Networks and hence if we find a way to include those biases for Graph Networks, we might be able to achieve sample efficient and effective learning in Graph Networks.

References

Inductive Logic Programming: theory and methods, Stephen Muggleton and Luc De Raedt, J. Logic Programming 1994
Claire Nédellec, Céline Rouveirol, Hilde Adé, Francesco Bergadano, and Birgit Tausend. Declarative Bias in ILP, 1996.
Tom Mitchell, The need for biases in learning generalizations, 1980.
Battaglia et al. Relational inductive biases, deep learning, and graph network, arXiv:1806.01261, 2018

Previous post Model-Agnostic Meta-Learning Next post Few-Shot Learning with GNN