## Due December 6 In Class

### Overview

Markov Logic Networks (MLNs) provide a powerful framework for combining the relative strengths of probability and first-order logic. In this assignment you will explore several tasks involving MLNs within Alchemy, namely training, structure learning, and inference of missing data.

You will work with data describing the UW CSE department. The domain consists of 12 predicates and 2707 constants divided into 10 types. Predicates include `Professor(person)`, `Student(person)`, `AdvisedBy(person, person)`, and `YearsInProgram(person, integer)`. Types include `person`, `publication`, and `course`.

Using typed variables, the database contains approximately 3,000 true ground atoms, obtained by scraping the pages of the department's website as well as the bibliographies of department members. The data were further split across 5 disciplines: `ai`, `graphics`, `languages`, `systems`, and `theory`, and anonymized to protect the privacy of the individuals.

You are also given an initial knowledge base (kb) containing a set of first-order logic formulas that describe the domain. This kb, written by several human authors, contains statements such as: students are not professors; if a student is an author of a paper, so is her advisor; and advanced students only TA courses taught by their advisors. Note that these statements are not always true! (But they do hold in some cases.)
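Written as first-order formulas, those example statements might look like the following. This is only a sketch: the actual clause syntax and predicate names in the provided kb may differ, and `Publication`, `Ta`, and `TaughtBy` are assumed predicate names.

```latex
% Students are not professors.
\forall p\;\; \mathrm{Student}(p) \Rightarrow \lnot \mathrm{Professor}(p)

% If a student is an author of a paper, so is her advisor.
\forall s, a, t\;\; \mathrm{AdvisedBy}(s, a) \land \mathrm{Publication}(t, s)
  \Rightarrow \mathrm{Publication}(t, a)

% Advanced students only TA courses taught by their advisors.
\forall s, a, c\;\; \mathrm{Ta}(c, s) \land \mathrm{AdvisedBy}(s, a)
  \Rightarrow \mathrm{TaughtBy}(c, a)
```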

### Part 1: MLN Training

#### Generative vs. Discriminative Learning

An MLN consists of a set of first-order logic statements with a weight attached to each formula. While these weights can be specified by a "domain expert", they can also be learned directly from data.

MLN weights can be learned generatively by maximizing the likelihood of a given database, though this procedure is often too slow in practice. Instead, one can maximize the pseudo-likelihood of the data, which involves computing each variable's probability given only its Markov blanket.  Generative learning approaches such as likelihood and pseudo-likelihood attempt to maximize the joint distribution of all variables at once. However, often we know ahead of time that we plan to compute conditional probabilities, P(q | e), for a set of query variables q given a set of evidence variables e.  In this case discriminative methods which maximize the conditional likelihood of a set of outputs given a set of inputs can lead to better results, since no time is spent modeling dependencies between evidence variables. The voted perceptron algorithm is one such method for performing discriminative training. For further details, see Singla & Domingos, 2005.
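To make the contrast concrete, here is a toy sketch comparing the exact log-likelihood, which requires a partition function over all possible worlds, with the pseudo-log-likelihood, which only needs each atom's conditional probability given the rest of the world (containing its Markov blanket). The atoms and weights below are invented for illustration; they are not from uw.mln.

```python
import itertools
import math

# Hypothetical four-atom ground network standing in for a tiny MLN.
ATOMS = ["StudentA", "ProfA", "AdvByAB", "ProfB"]

def score(world):
    """Sum of the weights of the satisfied ground formulas."""
    s = 0.0
    if (not world["StudentA"]) or (not world["ProfA"]):
        s += 2.0            # w=2.0 : Student(A) => !Professor(A)
    if (not world["AdvByAB"]) or world["ProfB"]:
        s += 1.5            # w=1.5 : AdvisedBy(A,B) => Professor(B)
    return s

def all_worlds():
    for bits in itertools.product([False, True], repeat=len(ATOMS)):
        yield dict(zip(ATOMS, bits))

def log_likelihood(db):
    # Exact likelihood: needs the partition function over all 2^n worlds.
    log_z = math.log(sum(math.exp(score(w)) for w in all_worlds()))
    return score(db) - log_z

def pseudo_log_likelihood(db):
    # Each atom conditioned on the rest of the world; no partition
    # function over all worlds is ever computed.
    pll = 0.0
    for a in ATOMS:
        on, off = dict(db), dict(db)
        on[a], off[a] = True, False
        pll += score(db) - math.log(math.exp(score(on)) + math.exp(score(off)))
    return pll

db = {"StudentA": True, "ProfA": False, "AdvByAB": True, "ProfB": True}
ll, pll = log_likelihood(db), pseudo_log_likelihood(db)
```

Note that the exact likelihood sums over 2^n worlds, which is what makes generative learning slow, while each pseudo-likelihood term only compares two states of a single atom.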

Task 1: Learn weights for the UW MLN using generative learning by executing the `learnwts -g` command (refer to the manual for more details). Specify AdvisedBy as the query predicate using `-ne AdvisedBy` (this causes only the pseudo-likelihood of the non-evidence predicates to be optimized). Perform 5-fold cross-validation, each time training with all but 1 of the 5 areas and using the left-out area for testing. Note that multiple databases can be specified using `-t ai.db,graphics.db`, etc.

• Using the MLN you learned and your left-out database, compute the average CLP of AdvisedBy (the query predicate) given all other predicates as evidence. Use Gibbs sampling to perform inference. Do this for each fold of the data and take the average. For example, in one run, use {`ai`, `graphics`, `languages`, `systems`} to learn the weights of the MLN, and `theory` as the database whose values you will try to infer.
• In general, which rules were learned as having the most weight? The least?
• Note any interesting differences in the relative importance of rules across different training folds.
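The leave-one-area-out loop above can be scripted. A minimal sketch, assuming `learnwts` accepts the same `-i`/`-o`/`-t` options that `learnstruct` takes later in this handout; the output MLN file names are a hypothetical convention, not required names:

```python
# Generate the five training commands for leave-one-area-out CV.
AREAS = ["ai", "graphics", "languages", "systems", "theory"]

def fold_commands(mode_flag="-g"):
    cmds = []
    for held_out in AREAS:
        # Train on the other four areas; hold one out for testing.
        train_dbs = ",".join(f"{a}.db" for a in AREAS if a != held_out)
        cmds.append(
            f"learnwts {mode_flag} -ne AdvisedBy -i uw.mln "
            f"-o uw-trained-{held_out}.mln -t {train_dbs}"
        )
    return cmds

for cmd in fold_commands():
    print(cmd)
```

Passing `mode_flag="-d"` produces the corresponding discriminative-training commands for Task 2.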
Task 2: Repeat the procedure using discriminative training by specifying `learnwts -d -ne AdvisedBy`, again using AdvisedBy as your query predicate.

• Compute the average CLP of AdvisedBy as described in Task 1 for this MLN.
• How are the weights affected by discriminative training compared to the generative case?

Task 3: Develop at least 5 new formulas for this domain. They can be new rules or refinements of existing rules. Add them to the beginning of the existing set of clauses (i.e. immediately following the line containing the "// Clauses" comment), and include a brief comment that explains each rule. If you do not follow these instructions precisely you will be penalized.

Using either generative or discriminative learning, relearn the weights of this MLN, again specifying AdvisedBy as your non-evidence predicate. Answer the following questions:
• How much weight was given to the rules you added? Propose an explanation to account for the differences you observed.
• How did the rules you added affect the weights of the rules contained in the original MLN?
• Compute the average CLP of AdvisedBy according to this MLN, using 5-fold cross-validation and Gibbs sampling.
• Hand in uw-newrules.mln, the output MLN that includes the rules you added along with their learned weights, making sure an explanatory comment is included for each rule you introduced.

### Part 2: Structure Learning

While it is sometimes possible to begin with an existing kb, structure learning makes it possible to learn clauses directly from a given database. The procedure can begin from an empty kb (i.e. just the set of unit clauses), or it can try to add rules to an existing kb. When learning an MLN from scratch, the natural approach is to try to add a literal (or its negation) to a clause. When beginning with an existing kb, the system should try to correct any errors made by the authors of the kb by deleting conditions from existing clauses, in addition to introducing new conditions. Additional errors can be corrected by flipping the sign of predicates, such as in the case when implications are written in the wrong direction.

The approach to structure learning in Alchemy you'll experiment with involves performing a beam search over the space of possible clauses, given (1) a kb, (2) a set of operators (addition, deletion, negation), and (3) an evaluation metric (in our case WPLL, a weighted form of pseudo-likelihood). Starting with the kb, apply all legal operations to each clause in the kb, keeping the b best ones. Each operator is then applied to those, and the process repeats until no new clause improves the WPLL of the data. After each iteration of the search, the clause resulting in the greatest increase in WPLL is added to the kb. For more details, refer to Kok and Domingos, 2005.
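The loop just described can be sketched generically. Everything domain-specific below is a toy stand-in: bit-flips in place of the add/delete/negate operators, and a simple match count in place of WPLL; Alchemy's real implementation differs in many details.

```python
def beam_search(initial, neighbors, score, beam_size=4, max_iters=50):
    """Keep the beam_size best candidates at each step; stop when no new
    candidate improves on the best score seen so far."""
    beam = [initial]
    best, best_score = initial, score(initial)
    for _ in range(max_iters):
        # Apply all legal operators to every candidate in the beam.
        candidates = [n for c in beam for n in neighbors(c)]
        if not candidates:
            break
        candidates.sort(key=score, reverse=True)
        beam = candidates[:beam_size]
        if score(beam[0]) <= best_score:
            break                       # no candidate improves the metric
        best, best_score = beam[0], score(beam[0])
    return best

# Toy instantiation: "clauses" are bit tuples, the operators flip one
# bit, and the score counts agreement with a hidden target.
target = (1, 0, 1, 1)
toy_score = lambda c: sum(int(a == b) for a, b in zip(c, target))
toy_neighbors = lambda c: [c[:i] + (1 - c[i],) + c[i + 1:]
                           for i in range(len(c))]
best = beam_search((0, 0, 0, 0), toy_neighbors, toy_score)
```

With `beam_size=1` this degenerates to greedy hill-climbing; a wider beam lets the search carry several competing candidate clauses forward, which is the point of the `-beamSize` parameter below.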

There are several parameters of interest when running structure learning via the `learnstruct` command in Alchemy:

• `-maxNumPredicates` The maximum number of predicates allowed in any learned clause.
• `-maxVars` The maximum number of variables in any learned clause.
• `-beamSize` The size of the beam, i.e. how many candidate clauses to retain at each iteration.
• `-minWt` Candidate clauses must have a weight at least as large as this value or else they are discarded. Thus, assigning a larger minWt will prune more candidates.
• `-fractAtoms` If sampling ground atoms, the fraction of each predicate's ground atoms to draw.
• `-penalty` Each difference between the current and previous version of a candidate clause penalizes the (weighted) pseudo-log-likelihood by this amount.

Task 4: Starting with an empty MLN, experiment with structure learning. To get a feel for the parameters, you can start with the following settings (plan to let this run for approximately 2 hrs).

```
learnstruct -maxNumPredicates 3 -maxVars 3 -noSampleClauses -fractAtoms 0.5 -minWt 0.1 -penalty 0.05 -i uw-empty.mln -o uw-ls.mln -t <dbfiles>
```

Task 5: Experiment with different parameter settings as time permits and discuss your findings. You can increase the maximum number of predicates and/or variables (> 5 is not recommended), experiment with sampling, adjust the size of the beam, alter the penalty assigned to candidates, and more as you see fit. You can also try adding more than the unit clauses to the initial MLN.
• Describe which parameters you experimented with and the effect they had on (1) output quality and (2) run-time.
• Did the learned rules replicate any of those specified in the human-engineered KB?
• Using the output you believe contains the best learned domain representation, once again compute cross-validated average CLP of AdvisedBy using all other predicates as evidence.
• Hand in uw-ls.mln, the output you believe contains the best domain representation. Include a comment at the top of the file indicating what final parameter settings you used.

### Part 3: Inference of Missing Values

In some cases, databases will be incomplete; for instance, we may know that a person P was the author of a publication, but be uncertain as to whether he/she is a professor or a student. The voted perceptron algorithm (VP) can be used to approximate the counts over missing values using the MAP state given the current model, and subsequently update the weights of the MLN formulas. The algorithm proceeds as follows:

1. Initialize all weights to zero.
2. For t = 1 to T iterations, perform gradient ascent:
   • Find the MAP configuration using the current weights.
   • Δw_i,t = training count - MAP count
3. Return the parameters averaged over all iterations: w_i = Σ_{t=1..T} w_i,t / T

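Transcribed directly into code, the procedure looks like this. It is only a sketch: the MAP step is abstracted into a callback, since Alchemy performs real MAP inference there, and the learning rate `eta` is an assumed detail the pseudocode leaves implicit.

```python
def voted_perceptron(training_counts, map_counts, T=100, eta=1.0):
    """training_counts: true-grounding counts of each formula in the data.
    map_counts(w): per-formula counts in the MAP state under weights w
    (a stand-in for real MAP inference)."""
    n = len(training_counts)
    w = [0.0] * n                       # 1. initialize all weights to zero
    w_sum = [0.0] * n
    for _ in range(T):                  # 2. T gradient-ascent iterations
        counts = map_counts(w)          #    MAP configuration under current w
        for i in range(n):
            w[i] += eta * (training_counts[i] - counts[i])
            w_sum[i] += w[i]
    return [s / T for s in w_sum]       # 3. average over all iterations

# Toy stand-in for MAP counts: formula i is "fully true" in the MAP
# state only when its weight is positive.
def fake_map_counts(w):
    return [10.0 if wi > 0 else 0.0 for wi in w]

avg = voted_perceptron([6.0, 0.0], fake_map_counts)
```

Averaging the weights over all iterations, rather than keeping only the final ones, is what damps the oscillation the raw updates produce.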
Task 6: Retrain the weights of uw.mln where a percentage of the known values for the Professor and Student predicates is missing, with AdvisedBy as your query predicate. You will need to write a script to alter the data, removing a given percentage of Professor and Student groundings at random. Perform this procedure when the percentage of missing data is 1%, 10%, and 100%, and compare the results. Turn on VP by giving the `-withEM` option to `learnwts` when using discriminative training. Allow 40-60 minutes for training.
• Compute the average CLP of AdvisedBy for each different value of missing data.
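A minimal sketch of the data-alteration script, assuming each `.db` file lists one ground atom per line (e.g. `Professor(Person123)`); the file names in the usage comment are placeholders:

```python
import random

def remove_groundings(lines, fraction, predicates=("Professor", "Student"),
                      rng=random):
    """Drop `fraction` of the groundings of the given predicates at
    random; every other line passes through untouched."""
    kept = []
    for line in lines:
        pred = line.split("(", 1)[0].strip()
        if pred in predicates and rng.random() < fraction:
            continue                    # this grounding becomes "missing"
        kept.append(line)
    return kept

# Usage (placeholder file names):
# with open("ai.db") as f:
#     lines = f.read().splitlines()
# with open("ai-missing10.db", "w") as f:
#     f.write("\n".join(remove_groundings(lines, 0.10)) + "\n")
```

Run it once per missing-data level (0.01, 0.10, 1.00) over each training database before invoking `learnwts`.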

### What to Hand In

Please bring hard copies to class of the following:

1. A write-up that includes:
• A table organizing the CLPs you found for each learned MLN. Each value is the average over the 5 folds obtained by performing leave-one-out area training/testing.
The table should contain data for each of the following:

• MLN-Generative
• MLN-Discriminative
• MLN-Your-Rules
• MLN-StructLearn
• MLN-VP-1
• MLN-VP-10
• MLN-VP-100
2. Printouts of the following MLNs:
• uw-newrules.mln    (the original mln with your new rules at the top)
• uw-ls.mln  (the best result you could obtain using structure learning)