You will work with data describing the UW CSE department. The domain consists of 12 predicates and 2707 constants divided into 10 types. Predicates include `Professor(person)`, `Student(person)`, `AdvisedBy(person, person)`, and `YearsInProgram(person, integer)`. Types include `person`, `publication`, and `course`.
Using typed variables, the database contains approximately 3,000 true ground atoms, obtained by scraping the pages of the department's website as well as the bibliographies of department members. The database was further split across 5 areas — `ai`, `graphics`, `languages`, `systems`, and `theory` — and anonymized to protect the privacy of the individuals.

You are also given an initial knowledge base (KB) containing a set of first-order logic formulas that describe the domain. This KB, the work of several human authors, contains statements such as: students are not professors; if a student is an author of a paper, so is her advisor; and advanced students only TA courses taught by their advisors. Note that these statements are not always true (though they are true in some cases).

- View the complete set of predicates and variable types. The equality predicates (SamePerson, SameCourse, etc) have been removed for this assignment.
- Get the knowledge base and the set of databases by area.
- Obtain and reinstall the latest version of Alchemy.

MLN weights can be learned generatively by maximizing the likelihood of a given database, though this procedure is often too slow in practice. Instead, one can maximize the pseudo-likelihood of the data, which involves computing each variable's probability given only its Markov blanket. Generative approaches such as maximum likelihood and maximum pseudo-likelihood model the joint distribution of all variables at once. However, we often know ahead of time that we plan to compute conditional probabilities, P(q | e), for a set of query variables q given a set of evidence variables e. In this case, discriminative methods, which maximize the conditional likelihood of a set of outputs given a set of inputs, can lead to better results, since no time is spent modeling dependencies among the evidence variables. The voted perceptron algorithm is one such method for performing discriminative training. For further details, see Singla & Domingos, 2005.
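To make the pseudo-likelihood idea concrete, here is a minimal sketch for a toy binary model. The representation (formula weights plus feature-count functions) is illustrative only, not Alchemy's API:

```python
import math

def pseudo_log_likelihood(weights, features, x):
    """Sum over ground atoms l of log P(X_l = x_l | Markov blanket of X_l).

    weights:  list of formula weights w_i
    features: list of functions mapping an assignment (tuple of 0/1)
              to n_i(x), the number of true groundings of formula i
    x:        the observed assignment (tuple of 0/1)
    """
    total = 0.0
    for l in range(len(x)):
        def score(v):
            y = list(x)
            y[l] = v
            return sum(w * f(tuple(y)) for w, f in zip(weights, features))
        s0, s1 = score(0), score(1)
        m = max(s0, s1)  # log-sum-exp for numerical stability
        log_z = m + math.log(math.exp(s0 - m) + math.exp(s1 - m))
        total += score(x[l]) - log_z
    return total

# Toy example: x = (Student(a), Professor(a)); a single formula,
# "a person is not both a student and a professor", with weight 2.0.
w = [2.0]
feats = [lambda y: 0 if (y[0] == 1 and y[1] == 1) else 1]
pll_consistent = pseudo_log_likelihood(w, feats, (1, 0))
pll_violating = pseudo_log_likelihood(w, feats, (1, 1))
```

As expected, the assignment satisfying the formula scores higher (`pll_consistent > pll_violating`); each term only conditions on the atom's Markov blanket, which is what makes this objective cheap to compute.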

Task 1: Learn weights for the UW MLN using generative learning by executing the `learnwts -g` command (refer to the manual for more details). Specify AdvisedBy as the query predicate using `-ne AdvisedBy`, and list the training databases using `-t ai.db,graphics.db`, etc. Answer the following questions:

- Using the MLN you learned and your held-out database, compute the average CLP of AdvisedBy (the query predicate) given all other predicates as evidence. Use Gibbs sampling to perform inference. Do this for each fold of the data and take the average. For example, in one run, use {`ai`, `graphics`, `languages`, `systems`} to learn the weights of the MLN, and `theory` as the database whose values you will try to infer.
- In general, which rules were learned as having the most weight? The least?
- Note any interesting differences in the relative importance of rules across different training folds.
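The fold averaging described above can be sketched as follows, assuming you have already parsed each fold's inference output into (probability, is-true) pairs for the query atoms. The `average_clp` helper and its input format are hypothetical (not Alchemy output parsing), and CLP is taken here to mean the average conditional log-probability assigned to each query atom's true value:

```python
import math

def average_clp(folds):
    """folds: one list per held-out area, each a list of (p, truth) pairs,
    where p is the inferred probability that the atom is true and truth is
    its actual value in the held-out database. Returns the mean over folds
    of the per-atom average log-probability of the truth."""
    fold_means = []
    for pairs in folds:
        logs = [math.log(p if truth else 1.0 - p) for p, truth in pairs]
        fold_means.append(sum(logs) / len(logs))
    return sum(fold_means) / len(fold_means)

# For the 5 leave-one-area-out folds, folds would hold one list of pairs
# per held-out area (theory, ai, graphics, languages, systems).
```

Averaging per fold first, then across folds, keeps each held-out area equally weighted even though the areas contain different numbers of query atoms.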

Task 2: Learn weights for the UW MLN discriminatively by executing `learnwts -d -ne AdvisedBy`, again using AdvisedBy as your query predicate. Answer the following questions:

- Compute the average CLP of AdvisedBy as described in Task 1 for this MLN.
- How are the weights affected by discriminative training compared to the generative case?

Task 3: Using either generative or discriminative learning, relearn the weights of this MLN, again specifying AdvisedBy as your non-evidence predicate. Answer the following questions:

- How much weight was given to the rules you added? Propose an explanation to account for the differences you observed.
- How did the rules you added affect the weights of the rules contained in the original MLN?
- Compute the average CLP of AdvisedBy according to this MLN, using 5-fold cross-validation and Gibbs sampling.
- Hand in uw-newrules.mln, the output MLN that includes the rules you added and their learned weights, making sure an explanatory comment is included for each rule you introduced.

The approach to structure learning in Alchemy you'll experiment with involves performing a beam search over the space of possible clauses, given (1) a KB, (2) a set of operators (addition, deletion, negation), and (3) an evaluation metric (in our case, WPLL, a weighted form of pseudo-likelihood). Starting with the KB, apply all legal operations to each clause in the KB, keeping the b best resulting clauses. Each operator is then applied to those, and the process repeats until no new clause improves the WPLL of the data. After each iteration of the search, the clause yielding the greatest increase in WPLL is added to the KB. For more details, refer to Kok and Domingos, 2005.
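The search loop can be caricatured with this deliberately simplified sketch (not Alchemy's implementation): "clauses" are stand-ins, `operators` generate candidate clauses, and `score` stands in for the WPLL gain; unlike the real algorithm, this version returns a single best clause rather than repeatedly adding clauses to the KB:

```python
def beam_search(initial, operators, score, beam_size):
    """Generic beam search: expand every beam element with every operator,
    keep the beam_size best candidates, and stop when no candidate
    improves on the best score seen so far."""
    beam = list(initial)
    best, best_score = None, float("-inf")
    improved = True
    while improved:
        improved = False
        candidates = [op(c) for c in beam for op in operators]
        candidates.sort(key=score, reverse=True)
        beam = candidates[:beam_size]
        if beam and score(beam[0]) > best_score:
            best, best_score = beam[0], score(beam[0])
            improved = True
    return best

# Toy run: "clauses" are integers, the operators increment/decrement,
# and the score peaks at 5 -- the search climbs to it and stops.
result = beam_search([0], [lambda x: x + 1, lambda x: x - 1],
                     lambda x: -(x - 5) ** 2, beam_size=2)
```

The `beam_size` parameter plays the same role as `-beamSize` below: a wider beam explores more candidates per iteration at the cost of more WPLL evaluations.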

There are several parameters of interest when running structure learning via the `learnstruct` command in Alchemy:

| Parameter | Description |
| --- | --- |
| `-maxNumPredicates <int>` | The maximum number of predicates allowed in any learned clause. |
| `-maxVars <int>` | The maximum number of variables in any learned clause. |
| `-beamSize <int>` | The size of the beam, i.e. how many candidate clauses to retain at each iteration. |
| `-minWt <double>` | Candidate clauses must have a weight at least as large as this value or else they are discarded. Thus, assigning a larger minWt will prune more candidates. |
| `-fractAtoms` | If sampling ground atoms, the fraction of each predicate's ground atoms to draw. |
| `-penalty <double>` | Each difference between the current and previous version of a candidate clause penalizes the (weighted) pseudo-log-likelihood by this amount. |

Task 4: Starting with an empty MLN, experiment with structure learning. To get a feel for the parameters, you can start with the following settings (plan to let this command run for approximately 2 hours).

```
learnstruct
-maxNumPredicates 3 -maxVars 3 -noSampleClauses -fractAtoms 0.5 -minWt
0.1 -penalty 0.05 -i uw-empty.mln -o uw-ls.mln -t <dbfiles>
```

Task 5: Experiment with different parameter settings as time permits and discuss your findings. You can increase the maximum number of predicates and/or variables (> 5 is not recommended), experiment with sampling, adjust the size of the beam, alter the penalty assigned to candidates, and more as you see fit. You can also try adding more than the unit clauses to the initial MLN.

- Describe which parameters you experimented with and the effect they had on (1) output quality and (2) run-time.
- Did the learned rules replicate any of those specified in the human-engineered KB?
- Using the output you believe contains the best learned domain representation, once again compute cross-validated average CLP of AdvisedBy using all other predicates as evidence.
- Hand in uw-ls.mln, the output you believe contains the best domain representation. Include a comment at the top of the file indicating the final parameter settings you used.

The voted perceptron algorithm proceeds as follows:

1. Initialize all weights to zero.
2. For t = 1 to T iterations, perform gradient ascent:
   - Find the MAP configuration using the current weights.
   - Δw_{i,t} = training count − MAP count
3. Return the parameters averaged over all iterations: w_i = Σ_{t=1 to T} w_{i,t} / T

Pass the `-withEM` option to `learnwts` when using discriminative training. Allow 40-60 minutes for training.

- Compute the average CLP of AdvisedBy for each different value of missing data.
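The voted perceptron update can be sketched as follows. This is a toy sketch: `map_counts_fn` stands in for running MAP inference under the current weights (which Alchemy does internally), and the constant counts in the example are fabricated for illustration:

```python
def voted_perceptron(training_counts, map_counts_fn, T, lr=1.0):
    """Voted perceptron: T gradient-ascent steps, each moving weight i by
    lr * (training count - MAP count); returns the per-iteration weights
    averaged over all T iterations."""
    n = len(training_counts)
    w = [0.0] * n
    history = []
    for _ in range(T):
        map_counts = map_counts_fn(w)  # MAP configuration under current w
        for i in range(n):
            w[i] += lr * (training_counts[i] - map_counts[i])
        history.append(list(w))
    # Average the weights over all iterations (the "voting" step)
    return [sum(h[i] for h in history) / T for i in range(n)]

# Toy run: one feature, true-data count 3, a fake constant MAP count 2;
# the weight grows each step and the iteration average is returned.
avg_w = voted_perceptron([3.0], lambda w: [2.0], T=3)
```

Averaging over iterations, rather than returning the final weights, damps the oscillation that the raw perceptron updates can exhibit.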

1. A write-up that includes:

- Your answers to the questions from each section
- A table organizing the CLPs you found for each learned MLN. Each value is the average over the 5 folds obtained by performing leave-one-out area training/testing.

The table should contain data for each of the following:

- MLN-Generative
- MLN-Discriminative
- MLN-Your-Rules
- MLN-StructLearn
- MLN-VP-1
- MLN-VP-10
- MLN-VP-100

- uw-newrules.mln (the original mln with your new rules at the top)
- uw-ls.mln (the best result you could obtain using structure learning)