BLOG, by Milch et al., provides a language that helps us model random functions and probabilistic properties of unknown objects, going beyond the Herbrand universe. This post reviews the key contributions and limitations of that paper. It was written as part of the Statistical Relational Learning course taught by Prof. Sriraam Natarajan.
Most of the first-order probabilistic models we have seen so far, such as Bayesian Logic Programs (BLPs) or Markov Logic Networks (MLNs), assume that we are given a fixed set of objects (a Herbrand universe) and a possible world (a Herbrand interpretation) on which we build the BLP or MLN. In many practical problems, however, there is no fixed Herbrand universe, or we do not know how many objects the universe contains. This paper introduces BLOG (Bayesian LOGic), which can perform inference even when the objects are unknown. BLOG is a representation language that can generate objects for such an unknown universe and create a Bayesian network that can then be used for inference. A BLOG model describes a stochastic process that creates a world by generating objects in order. The existence of a generated object is governed by the unknown number of objects in the universe and by the random functions defined in the model. The key idea is that any property of a newly created object depends only on objects that have already been created. Using this idea together with context-specific independence, BLOG generates a countably infinite Bayesian network in which each node has a finite set of parents. The finite parent sets and prior knowledge about the distributions are what make inference possible.
In my view, the key contribution of this paper is the representation language, which helps us model random functions and probabilistic properties of unknown objects. Consider the balls-in-an-urn example from the paper. We do not know the number of balls in the urn; we repeatedly draw a ball at random and put it back. We have no way to tell whether the same ball was drawn twice. We can only observe the color of the ball, and even that observation is wrong with probability 0.2. With this information, we are supposed to infer the number of balls in the urn. Most other models would give up at the first step. BLOG provides a simple language that can model such a stochastic scenario using just a handful of statement kinds: type declarations, random function declarations, non-random function definitions, guaranteed object statements, dependency statements, generating functions, and number statements. Once we have the model, we can represent any possible instantiation of the world.
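The generative reading of the urn scenario can be sketched in a few lines of Python. This is only an illustration of the "generate objects in order" idea, not BLOG syntax or the authors' implementation. The function name `sample_urn_world`, the uniform prior over 1 to `max_balls`, and the 50/50 prior on true color are assumptions made for this sketch; only the 0.2 observation noise comes from the description above.

```python
import random

def sample_urn_world(num_draws, max_balls=8, noise=0.2, rng=None):
    """Sample one possible world of the urn scenario, generating objects in order."""
    rng = rng or random.Random()
    # Number statement: how many balls exist (ASSUMED uniform over 1..max_balls).
    num_balls = rng.randint(1, max_balls)
    # Dependency statement: each ball gets a true color (ASSUMED 50/50 prior).
    true_color = [rng.choice(["Blue", "Green"]) for _ in range(num_balls)]
    observations = []
    for _ in range(num_draws):
        # Each draw picks one of the already generated balls, with replacement.
        ball = rng.randrange(num_balls)
        color = true_color[ball]
        # The observed color is flipped with probability `noise`.
        if rng.random() < noise:
            color = "Green" if color == "Blue" else "Blue"
        observations.append(color)
    return num_balls, true_color, observations

num_balls, true_color, observations = sample_urn_world(10, rng=random.Random(0))
print(num_balls, observations)
```

Note how every step reads only from objects created earlier: colors depend on the balls that exist, and observations depend on the ball chosen for that draw. The observer sees only `observations`, never `num_balls` or which ball was drawn.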
Because BLOG creates a random variable for every function and object type, we can end up with a Bayesian network that has an exponential number of random variables. Although the context-specific independence assumption helps with inference, working with such a huge Bayesian network is computationally infeasible. This makes the approach less viable in practice, since the whole purpose of the model is to provide inference over unknown or non-finite universes.
The representation can contain unnecessary variables. For example, if we have evidence that the balls drawn in Draw1 and Draw2 are the same, the model creates a new constant for that. We would also still have all the random variables for both draws, such as ObsColor[Draw2]. So there will be 2n extra random variables, where n is the number of functions defined on the Draw object.
To make inference on the Bayesian network tractable, BLOG relies on the context-specific independence assumption. But this assumption is not straightforward. For instance, in the aircraft example, a blip at time t depends on the state of all the aircraft at time t. To make it depend only on its source aircraft, we would need evidence or prior knowledge about the source of the blip.
Because BLOG assumes that any newly created object depends only on objects that already exist, it implicitly assumes that the dependencies in the universe are acyclic and that the objects admit a topological order.
The model cannot learn prior distributions. It assumes that all of them are supplied by a domain expert.
Because each object is stored as a tuple, objects can grow to a considerable size. For example, in the aircraft model each blip is represented as (Blip, (Source, (Aircraft, 2)), (Time, 8), 1). If a blip had k such properties, the tuple would grow further.
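To make the growth concrete, here is a small Python sketch of the nested-tuple representation. The helpers `make_object` and `flat_size` are hypothetical names introduced for this illustration and are not part of BLOG; the blip tuple itself is the one from the aircraft example above.

```python
def make_object(type_name, origins, index):
    """Build a nested-tuple identifier: (type, (function, parent), ..., index).

    Hypothetical helper for illustration only.
    """
    return (type_name, *origins, index)

def flat_size(obj):
    """Count the atomic entries (names and numbers) inside a nested tuple."""
    if isinstance(obj, tuple):
        return sum(flat_size(part) for part in obj)
    return 1

aircraft = ("Aircraft", 2)
blip = make_object("Blip", [("Source", aircraft), ("Time", 8)], 1)
print(blip)
print(flat_size(aircraft), flat_size(blip))
```

Each extra (function, parent) pair adds at least two entries, and because the parent is itself a tuple, an object generated from other generated objects carries their full identifiers inside its own.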
- The example stochastic worlds are motivating. The paper gives a beautiful explanation of the difference between conditioning on a constant and conditioning on existence:
- “These new constants resemble Skolem constants, but conditioning on assertions about the new constants is not the same as conditioning on an existential sentence. For example, suppose you go into a new wine shop, pick up a bottle at random, and observe that it costs USD 40. This scenario is correctly modeled by introducing a new constant Bottle1 with a Uniform CPD. Then observing that Bottle1 costs over USD 40 suggests that this is a fancy wine shop. On the other hand, the mere existence of a USD 40+ bottle does not suggest this, because almost every shop has some bottle at over USD 40.”
- The paper expresses the model, functions, worlds, etc. in mathematical notation but does not provide a strong mathematical proof or explanation that the model defines a unique joint distribution.
- The example models are inconsistent with one another. For instance, the aircraft example has a number statement for creating blips that assigns the source and the time simultaneously. In the ball example, however, the true color of a ball is not assigned when the ball is created by its number statement. Similarly, the number statement for Publication assigns an Author, but the generating function for Author is missing. It therefore appears that several different models can represent the same scenario, yet the paper does not explain whether different representations result in the same joint distribution.
- The paper does not put forward enough experimental evidence. The experiment asserts that 10 balls were drawn and all appeared blue, and then queries the number of balls in the urn. The results show that when the prior for the number of balls is uniform over 1, ..., 8, the posterior puts more weight on small numbers of balls. This result is justified only by the remark that “this makes sense because the more balls there are in the urn, the less likely it is that they are all blue”. A few more experiments would have helped.
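The urn experiment mentioned in the last comment is small enough to check by exact enumeration. The sketch below is my own calculation and not the sampling-based inference the paper describes. It assumes each ball is independently blue with probability 0.5 (the `p_blue` parameter, an assumption of this sketch) and uses the 0.2 observation noise and the uniform prior over 1 to 8 balls from the text. Given k blue balls out of n, each draw is observed as blue with probability (k/n)(0.8) + (1 - k/n)(0.2), and the draws are independent given the ball colors.

```python
from math import comb

def likelihood_all_blue(num_balls, num_draws=10, noise=0.2, p_blue=0.5):
    """P(every draw is observed as blue | num_balls), summing over true colorings."""
    total = 0.0
    for k in range(num_balls + 1):  # k = number of truly blue balls
        p_k = comb(num_balls, k) * p_blue**k * (1 - p_blue) ** (num_balls - k)
        frac_blue = k / num_balls
        # One draw looks blue if a blue ball is read correctly
        # or a green ball is misread.
        p_obs_blue = frac_blue * (1 - noise) + (1 - frac_blue) * noise
        total += p_k * p_obs_blue**num_draws
    return total

def posterior_num_balls(max_balls=8, **kwargs):
    """Posterior over the number of balls under a uniform prior on 1..max_balls."""
    weights = {n: likelihood_all_blue(n, **kwargs) for n in range(1, max_balls + 1)}
    z = sum(weights.values())
    return {n: w / z for n, w in weights.items()}

posterior = posterior_num_balls()
for n, p in posterior.items():
    print(n, round(p, 3))
```

Under these assumptions the posterior decreases as the number of balls grows, which agrees with the qualitative claim quoted from the paper.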
- Brian Milch, Bhaskara Marthi, Stuart Russell, David Sontag, Daniel L. Ong and Andrey Kolobov, BLOG: Relational Modeling with Unknown Objects, IJCAI 2005
- Lise Getoor and Ben Taskar, Introduction to Statistical Relational Learning (Adaptive Computation and Machine Learning), The MIT Press.