### harsha kokel

## Relational Reinforcement Learning

My notes on Džeroski, Sašo, Luc De Raedt, and Kurt Driessens, Machine Learning 2001

The paper came before the goal-conditioned RL, Multi-task RL or Graph Neural Network literature. Major motivation of this paper is to learn a **generalizable policy**. Generalization in terms of **varying number of objects** in the domain (for example, in blocks-world number of blocks can change) or **change in the goal state** (for example, stack red block on blue block instead if green on yellow).

Authors demonstrate that by using approaches from inductive logic programming literature, first-order policy can be learnt which naturally supports both the generalization discussed above.

In particular, author propose to learn **Q-Tree** i.e. TILDE-RT (Top-down Induction of Logical decision trees for regression) as Q-Function which take the state and action pair and predict q values. The policy function, **P-Tree**, can then be induced from Q-Tree.

Following Q-RRL algorithm is proposed to learn Q-Tree, which updates the TILDE-RT after every episode. Data set used to learn the TILDE-RT is generated by exploring the environment and P-RRL algorithm is proposed for inducing P-Tree from Q-Tree

### Critique

- Proposed solution does not seem to scale well specifically because the Logical programs do not scale with higher number of data points or high dimensional data.
- The relational trees might be able to generalize to various number of blocks but I think it will not generalize to different goals. For e.g. if all the training examples had
`goal(on(.,.))`

and if the test examples have`goal(clear(.))`

, I do not think the TILDE-RT will be able to achieve that goal. - With Graph Neural Network and goal-conditioned RL, both the generalizations targeted by Q-RRL are achieved in a scalable manner. So, the only additional benefit RRL really has is the use of domain knowledge, which comes from ILP.