Classification with WEKA

The Data File

The data file describes a medical prognosis problem:  given that a patient has been diagnosed with hepatitis, will the patient live or die?

Download the data file here:  hepatitis.arff

The Data

The data has the following features:

Feature          discrete/continuous  Test type      Description
Age              continuous           question       age of the patient
Sex              discrete             question       gender of the patient
Steroid          discrete             question       whether the patient is taking steroids
Antivirals       discrete             question       whether the patient is taking antiviral medication
Fatigue          discrete             question       whether the patient reports fatigue
Malaise          discrete             question       whether the patient reports malaise
Anorexia         discrete             question       whether the patient is anorexic
Histology        discrete             question       whether a liver histology was performed for the diagnosis
Liver big        discrete             physical exam  whether liver appears enlarged
Liver firm       discrete             physical exam  whether liver is firm
Spleen palpable  discrete             physical exam  whether spleen is palpable
Spider           discrete             physical exam  spider veins visible
Ascites          discrete             physical exam  ascites (fluid in abdominal cavity) detected
Varices          discrete             physical exam  varices (abnormally dilated veins) detected
Bilirubin        continuous           blood test
Alk Phosphate    continuous           blood test
SGOT             continuous           blood test
Albumin          continuous           blood test
Protime          continuous           blood test
class            discrete             n/a            the observed outcome
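
The discrete/continuous distinction above is visible in the ARFF file itself: each feature is declared on an "@attribute" line, with nominal (discrete) features listing their values in braces. The following is a minimal sketch for listing those declarations without opening WEKA. The function name and the sample header are hypothetical illustrations, not an excerpt of the real hepatitis.arff, and the sketch assumes unquoted attribute names without spaces and only nominal or numeric attributes.

```python
def read_arff_attributes(lines):
    """Return (name, kind) pairs from the header of an ARFF file.

    Assumption: every attribute is either nominal ({...}) or numeric,
    and attribute names contain no spaces.
    """
    attributes = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("%"):   # skip blanks and comments
            continue
        lower = line.lower()
        if lower.startswith("@data"):          # header ends here
            break
        if lower.startswith("@attribute"):
            _, name, type_spec = line.split(None, 2)
            kind = "discrete" if type_spec.startswith("{") else "continuous"
            attributes.append((name, kind))
    return attributes


# Hypothetical header for illustration only.
sample_header = """% illustrative excerpt, not the real file
@relation hepatitis
@attribute AGE integer
@attribute SEX {male, female}
@attribute Class {DIE, LIVE}
@data
30,male,LIVE
""".splitlines()

for name, kind in read_arff_attributes(sample_header):
    print(name, kind)
```

To inspect the real file, pass an open file object instead of the sample, for example read_arff_attributes(open("hepatitis.arff")), assuming the file is in the working directory.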

Exercise

Using WEKA, induce two C4.5 decision trees over the hepatitis data.  In WEKA, the C4.5 implementation is called J48 and is found under the "trees" group after clicking the "Choose" button on the "Classify" tab.

First, create an unpruned tree:  click the text field showing the classifier name to open its options, and set the "unpruned" option to true.

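Evaluate the unpruned tree both on the training data and with 10-fold cross-validation.  (These options are set in the "Test options" panel below the "Choose" button:  select "Use training set" for the first run, and "Cross-validation" with "Folds" set to 10 for the second.)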
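
To make the difference between the two evaluation modes concrete, here is a minimal sketch of how 10-fold cross-validation partitions a data set. It is an illustration, not WEKA's implementation: WEKA additionally stratifies the folds by class, which this sketch does not. The function name is hypothetical, and the instance count of 155 is an assumption that you should check against the "Current relation" panel on the "Preprocess" tab.

```python
import random


def k_fold_splits(n_instances, k=10, seed=1):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_instances))
    random.Random(seed).shuffle(indices)        # shuffle once, reproducibly
    folds = [indices[i::k] for i in range(k)]   # k nearly equal-sized folds
    for i, test in enumerate(folds):
        # Train on the other k-1 folds, test on the held-out fold.
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


N_INSTANCES = 155  # assumed size of hepatitis.arff; verify in WEKA

for train, test in k_fold_splits(N_INSTANCES):
    print(len(train), "training instances,", len(test), "test instances")
```

When testing on the training data, the tree is scored on the same instances it was built from, so the estimate is typically optimistic. In cross-validation, every instance is scored exactly once, by a tree that never saw it during training.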

Next, create a pruned tree (set "unpruned" back to false) and evaluate it the same way:  on the training data and with 10-fold cross-validation.

After inducing a tree, you can right-click its entry in the lower-left "Result list" panel and select "Visualize tree" to see the tree that was learned.
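
The same runs can also be started from the command line instead of the Explorer. The sketch below only builds and prints the commands; it does not run WEKA. It relies on J48's "-U" flag (unpruned tree) and the general "-t" (training file) and "-x" (number of folds) options. The function name, the location of weka.jar, and the location of hepatitis.arff are assumptions to adjust for your installation.

```python
import shlex


def j48_command(arff_path, unpruned=False, folds=10, weka_jar="weka.jar"):
    """Build a command line that trains and evaluates J48 on an ARFF file.

    weka_jar is an assumed path; point it at the weka.jar of your install.
    """
    cmd = [
        "java", "-cp", weka_jar,
        "weka.classifiers.trees.J48",
        "-t", arff_path,      # training file
        "-x", str(folds),     # number of cross-validation folds
    ]
    if unpruned:
        cmd.append("-U")      # build an unpruned tree
    return cmd


for unpruned in (True, False):
    print(shlex.join(j48_command("hepatitis.arff", unpruned=unpruned)))
```

When only a training file is given, WEKA's command-line output should report both the error on the training data and the cross-validation results, so each command covers both evaluations for one tree.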

Questions

Additional Exercises

Try some of the other classification algorithms built into WEKA on the hepatitis data.

For example, a Naive Bayes classifier is found at "bayes → NaiveBayesSimple" (if your WEKA version does not list it, use "bayes → NaiveBayes" instead).  A Support Vector Machine implementation is at "functions → SMO".