## Due December 6 In Class

### Overview

Markov Logic Networks (MLNs) provide a powerful framework for combining the relative strengths of probability and first-order logic. In this assignment you will explore several tasks involving MLNs within Alchemy, namely training, structure learning, and inference of missing data.

You will work with data describing the UW CSE department. The domain consists of 12 predicates and 2707 constants divided into 10 types. Predicates include `Professor(person)`, `Student(person)`, `AdvisedBy(person, person)`, and `YearsInProgram(person, integer)`. Types include `person`, `publication`, and `course`.

Using typed variables, the database contains approximately 3,000 true ground atoms, obtained by scraping the pages of the department's website as well as the bibliographies of department members. The data were further split across 5 disciplines: `ai`, `graphics`, `languages`, `systems`, and `theory`, and anonymized to protect the privacy of the individuals.

You are also given an initial knowledge base (kb) containing a set of first-order logic formulas that describe the domain. This kb, written by several human authors, contains statements such as: students are not professors; if a student is an author of a paper, so is her advisor; and advanced students only TA courses taught by their advisors. Note that these statements are not always true! (But they do hold in some cases.)
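Written as first-order formulas, those example statements might look like the following. This is only a sketch: the actual clause syntax and predicate names in the provided kb may differ, and `Publication`, `Ta`, and `TaughtBy` are assumed predicate names.

```latex
% Students are not professors.
\forall p\;\; \mathrm{Student}(p) \Rightarrow \lnot \mathrm{Professor}(p)

% If a student is an author of a paper, so is her advisor.
\forall s, a, t\;\; \mathrm{AdvisedBy}(s, a) \land \mathrm{Publication}(t, s)
  \Rightarrow \mathrm{Publication}(t, a)

% Advanced students only TA courses taught by their advisors.
\forall s, a, c\;\; \mathrm{Ta}(c, s) \land \mathrm{AdvisedBy}(s, a)
  \Rightarrow \mathrm{TaughtBy}(c, a)
```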

### Part 1: MLN Training

#### Generative vs. Discriminative Learning

An MLN consists of a set of first-order logic statements with a weight attached to each formula. While these weights can be specified by a "domain expert", they can also be learned directly from data.

MLN weights can be learned generatively by maximizing the likelihood of a given database, though this procedure is often too slow in practice. Instead, one can maximize the pseudo-likelihood of the data, which involves computing each variable's probability given only its Markov blanket.  Generative learning approaches such as likelihood and pseudo-likelihood attempt to maximize the joint distribution of all variables at once. However, often we know ahead of time that we plan to compute conditional probabilities, P(q | e), for a set of query variables q given a set of evidence variables e.  In this case discriminative methods which maximize the conditional likelihood of a set of outputs given a set of inputs can lead to better results, since no time is spent modeling dependencies between evidence variables. The voted perceptron algorithm is one such method for performing discriminative training. For further details, see Singla & Domingos, 2005.
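To make the contrast concrete, here is a toy sketch comparing the exact log-likelihood, which requires a partition function over all possible worlds, with the pseudo-log-likelihood, which only needs each atom's conditional probability given the rest of the world (containing its Markov blanket). The atoms and weights below are invented for illustration; they are not from uw.mln.

```python
import itertools
import math

# Hypothetical four-atom ground network standing in for a tiny MLN.
ATOMS = ["StudentA", "ProfA", "AdvByAB", "ProfB"]

def score(world):
    """Sum of the weights of the satisfied ground formulas."""
    s = 0.0
    if (not world["StudentA"]) or (not world["ProfA"]):
        s += 2.0            # w=2.0 : Student(A) => !Professor(A)
    if (not world["AdvByAB"]) or world["ProfB"]:
        s += 1.5            # w=1.5 : AdvisedBy(A,B) => Professor(B)
    return s

def all_worlds():
    for bits in itertools.product([False, True], repeat=len(ATOMS)):
        yield dict(zip(ATOMS, bits))

def log_likelihood(db):
    # Exact likelihood: needs the partition function over all 2^n worlds.
    log_z = math.log(sum(math.exp(score(w)) for w in all_worlds()))
    return score(db) - log_z

def pseudo_log_likelihood(db):
    # Each atom conditioned on the rest of the world; no partition
    # function over all worlds is ever computed.
    pll = 0.0
    for a in ATOMS:
        on, off = dict(db), dict(db)
        on[a], off[a] = True, False
        pll += score(db) - math.log(math.exp(score(on)) + math.exp(score(off)))
    return pll

db = {"StudentA": True, "ProfA": False, "AdvByAB": True, "ProfB": True}
ll, pll = log_likelihood(db), pseudo_log_likelihood(db)
```

Note that the exact likelihood sums over 2^n worlds, which is what makes generative learning slow, while each pseudo-likelihood term only compares two states of a single atom.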

Task 1: Learn weights for the UW MLN using generative learning by executing the `learnwts -g` command (refer to the manual for more details). Specify AdvisedBy as the query predicate using `-ne AdvisedBy` (this causes only the pseudo-likelihood of the non-evidence predicates to be optimized). Perform 5-fold cross-validation, each time training with all but 1 of the 5 areas and using the left-out area for testing. Note that multiple databases can be specified using `-t ai.db,graphics.db`, etc.

• Using the MLN you learned and your left-out database, compute the average CLP of AdvisedBy (the query predicate) given all other predicates as evidence. Use Gibbs sampling to perform inference. Do this for each fold of the data and take the average. For example, in one run, use {`ai`, `graphics`, `languages`, `systems`} to learn the weights of the MLN, and `theory` as the database whose values you will try to infer.
• In general, which rules were learned as having the most weight? The least?
• Note any interesting differences in the relative importance of rules across different training folds.
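The leave-one-area-out loop above can be scripted. A minimal sketch, assuming `learnwts` accepts the same `-i`/`-o`/`-t` options that `learnstruct` takes later in this handout; the output MLN file names are a hypothetical convention, not required names:

```python
# Generate the five training commands for leave-one-area-out CV.
AREAS = ["ai", "graphics", "languages", "systems", "theory"]

def fold_commands(mode_flag="-g"):
    cmds = []
    for held_out in AREAS:
        # Train on the other four areas; hold one out for testing.
        train_dbs = ",".join(f"{a}.db" for a in AREAS if a != held_out)
        cmds.append(
            f"learnwts {mode_flag} -ne AdvisedBy -i uw.mln "
            f"-o uw-trained-{held_out}.mln -t {train_dbs}"
        )
    return cmds

for cmd in fold_commands():
    print(cmd)
```

Passing `mode_flag="-d"` produces the corresponding discriminative-training commands for Task 2.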
Task 2: Repeat the procedure using discriminative training by specifying `learnwts -d -ne AdvisedBy`, again using AdvisedBy as your query predicate.

• Compute the average CLP of AdvisedBy as described in Task 1 for this MLN.
• How are the weights affected by discriminative training compared to the generative case?

Task 3: Develop at least 5 new formulas for this domain. They can be new rules or refinements of existing rules. Add them to the beginning of the existing set of clauses (i.e. immediately following the line containing the "// Clauses" comment), and include a brief comment that explains each rule. If you do not follow these instructions precisely you will be penalized.

Using either generative or discriminative learning, relearn the weights of this MLN, again specifying AdvisedBy as your non-evidence predicate. Answer the following questions:
• How much weight was given to the rules you added? Propose an explanation to account for the differences you observed.
• How did the rules you added affect the weights of the rules contained in the original MLN?
• Compute the average CLP of AdvisedBy according to this MLN, using 5-fold cross-validation and Gibbs sampling.
• Hand in uw-newrules.mln, the output MLN that includes the rules you added along with their learned weights, making sure an explanatory comment is included for each rule you introduced.

### Part 2: Structure Learning

While it is sometimes possible to begin with an existing kb, structure learning makes it possible to learn clauses directly from a given database. The procedure can begin from an empty kb (i.e. just the set of unit clauses), or it can try to add rules to an existing kb. When learning an MLN from scratch, the natural approach is to try to add a literal (or its negation) to a clause. When beginning with an existing kb, the system should try to correct any errors made by the authors of the kb by deleting conditions from existing clauses, in addition to introducing new conditions. Additional errors can be corrected by flipping the sign of predicates, such as in the case when implications are written in the wrong direction.

The approach to structure learning in Alchemy you'll experiment with involves performing a beam search over the space of possible clauses, given (1) a kb, (2) a set of operators (addition, deletion, negation), and (3) an evaluation metric (in our case WPLL, a weighted form of pseudo-likelihood). Starting with the kb, apply all legal operations to each clause in the kb, keeping the b best ones. Each operator is then applied to those, and the process repeats until no new clause improves the WPLL of the data. After each iteration of the search, the clause resulting in the greatest increase in WPLL is added to the kb. For more details, refer to Kok and Domingos, 2005.
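The loop just described can be sketched generically. Everything domain-specific below is a toy stand-in: bit-flips in place of the add/delete/negate operators, and a simple match count in place of WPLL; Alchemy's real implementation differs in many details.

```python
def beam_search(initial, neighbors, score, beam_size=4, max_iters=50):
    """Keep the beam_size best candidates at each step; stop when no new
    candidate improves on the best score seen so far."""
    beam = [initial]
    best, best_score = initial, score(initial)
    for _ in range(max_iters):
        # Apply all legal operators to every candidate in the beam.
        candidates = [n for c in beam for n in neighbors(c)]
        if not candidates:
            break
        candidates.sort(key=score, reverse=True)
        beam = candidates[:beam_size]
        if score(beam[0]) <= best_score:
            break                       # no candidate improves the metric
        best, best_score = beam[0], score(beam[0])
    return best

# Toy instantiation: "clauses" are bit tuples, the operators flip one
# bit, and the score counts agreement with a hidden target.
target = (1, 0, 1, 1)
toy_score = lambda c: sum(int(a == b) for a, b in zip(c, target))
toy_neighbors = lambda c: [c[:i] + (1 - c[i],) + c[i + 1:]
                           for i in range(len(c))]
best = beam_search((0, 0, 0, 0), toy_neighbors, toy_score)
```

With `beam_size=1` this degenerates to greedy hill-climbing; a wider beam lets the search carry several competing candidate clauses forward, which is the point of the `-beamSize` parameter below.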

There are several parameters of interest when running structure learning via the `learnstruct` command in Alchemy:

• `-maxNumPredicates` The maximum number of predicates allowed in any learned clause.
• `-maxVars` The maximum number of variables in any learned clause.
• `-beamSize` The size of the beam, i.e. how many candidate clauses to retain at each iteration.
• `-minWt` Candidate clauses must have a weight at least as large as this value or else they are discarded. Thus, assigning a larger minWt will prune more candidates.
• `-fractAtoms` If sampling ground atoms, the fraction of each predicate's ground atoms to draw.
• `-penalty` Each difference between the current and previous version of a candidate clause penalizes the (weighted) pseudo-log-likelihood by this amount.

Task 4: Starting with an empty MLN, experiment with structure learning. To get a feel for the parameters, you can start with the following settings (plan to let this run for approximately 2 hrs).

```
learnstruct -maxNumPredicates 3 -maxVars 3 -noSampleClauses -fractAtoms 0.5 -minWt 0.1 -penalty 0.05 -i uw-empty.mln -o uw-ls.mln -t <dbfiles>
```

Task 5: Experiment with different parameter settings as time permits and discuss your findings. You can increase the maximum number of predicates and/or variables (> 5 is not recommended), experiment with sampling, adjust the size of the beam, alter the penalty assigned to candidates, and more as you see fit. You can also try adding more than the unit clauses to the initial MLN.
• Describe which parameters you experimented with and the effect they had on (1) output quality and (2) run-time.
• Did the learned rules replicate any of those specified in the human-engineered KB?
• Using the output you believe contains the best learned domain representation, once again compute cross-validated average CLP of AdvisedBy using all other predicates as evidence.
• Hand in uw-ls.mln, the output you believe contains the best domain representation. Include a comment at the top of the file indicating what final parameter settings you used.

### Part 3: Inference of Missing Values

In some cases, databases will be incomplete; for instance, we may know that a person P was the author of a publication, but be uncertain as to whether he/she is a professor or a student. The voted perceptron algorithm (VP) can be used to approximate the counts over missing values using the MAP state given the current model, and subsequently update the weights of the MLN formulas. The algorithm proceeds as follows:

1. Initialize all weights to zero.
2. For t = 1 to T iterations, perform gradient ascent:
   • Find the MAP configuration using the current weights.
   • Δw_i,t = training count - MAP count
3. Return the parameters averaged over all iterations: w_i = Σ_{t=1..T} w_i,t / T

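Transcribed directly into code, the procedure looks like this. It is only a sketch: the MAP step is abstracted into a callback, since Alchemy performs real MAP inference there, and the learning rate `eta` is an assumed detail the pseudocode leaves implicit.

```python
def voted_perceptron(training_counts, map_counts, T=100, eta=1.0):
    """training_counts: true-grounding counts of each formula in the data.
    map_counts(w): per-formula counts in the MAP state under weights w
    (a stand-in for real MAP inference)."""
    n = len(training_counts)
    w = [0.0] * n                       # 1. initialize all weights to zero
    w_sum = [0.0] * n
    for _ in range(T):                  # 2. T gradient-ascent iterations
        counts = map_counts(w)          #    MAP configuration under current w
        for i in range(n):
            w[i] += eta * (training_counts[i] - counts[i])
            w_sum[i] += w[i]
    return [s / T for s in w_sum]       # 3. average over all iterations

# Toy stand-in for MAP counts: formula i is "fully true" in the MAP
# state only when its weight is positive.
def fake_map_counts(w):
    return [10.0 if wi > 0 else 0.0 for wi in w]

avg = voted_perceptron([6.0, 0.0], fake_map_counts)
```

Averaging the weights over all iterations, rather than keeping only the final ones, is what damps the oscillation the raw updates produce.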
Task 6: Retrain the weights of uw.mln where a percentage of the known values for the Professor and Student predicates is missing, with AdvisedBy as your query predicate. You will need to write a script to alter the data, removing a given percentage of Professor and Student groundings at random. Perform this procedure when the percentage of missing data is 1%, 10%, and 100%, and compare the results. Turn on VP by giving the `-withEM` option to `learnwts` when using discriminative training. Allow 40-60 minutes for training.
• Compute the average CLP of AdvisedBy for each different value of missing data.
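A minimal sketch of the data-alteration script, assuming each `.db` file lists one ground atom per line (e.g. `Professor(Person123)`); the file names in the usage comment are placeholders:

```python
import random

def remove_groundings(lines, fraction, predicates=("Professor", "Student"),
                      rng=random):
    """Drop `fraction` of the groundings of the given predicates at
    random; every other line passes through untouched."""
    kept = []
    for line in lines:
        pred = line.split("(", 1)[0].strip()
        if pred in predicates and rng.random() < fraction:
            continue                    # this grounding becomes "missing"
        kept.append(line)
    return kept

# Usage (placeholder file names):
# with open("ai.db") as f:
#     lines = f.read().splitlines()
# with open("ai-missing10.db", "w") as f:
#     f.write("\n".join(remove_groundings(lines, 0.10)) + "\n")
```

Run it once per missing-data level (0.01, 0.10, 1.00) over each training database before invoking `learnwts`.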

### What to Hand In

Please bring hard copies to class of the following:

1. A write-up that includes:
• A table organizing the CLPs you found for each learned MLN. Each value is the average over the 5 folds obtained by performing leave-one-out area training/testing.
The table should contain data for each of the following:

• MLN-Generative
• MLN-Discriminative
• MLN-Your-Rules
• MLN-StructLearn
• MLN-VP-1
• MLN-VP-10
• MLN-VP-100
2. Printouts of the following MLNs:
• uw-newrules.mln    (the original mln with your new rules at the top)
• uw-ls.mln  (the best result you could obtain using structure learning)