This data file describes a medical prognosis problem: given that a patient has been diagnosed with hepatitis, will the patient live or die?
Download the data file here: hepatitis.arff
The data has the following features:
Feature | discrete/continuous | Description | Test Type |
Age | continuous | age of the patient | question |
Sex | discrete | gender of the patient | question |
Steroid | discrete | whether the patient is taking steroids | question |
Antivirals | discrete | whether the patient is taking antiviral medication | question |
Fatigue | discrete | whether the patient reports fatigue | question |
Malaise | discrete | whether the patient reports malaise | question |
Anorexia | discrete | whether the patient is anorexic | question |
Histology | discrete | whether a liver histology was performed for the diagnosis | question |
Liver big | discrete | whether liver appears enlarged | physical exam |
Liver firm | discrete | whether liver is firm | physical exam |
Spleen palpable | discrete | whether spleen is palpable | physical exam |
Spider | discrete | spider veins visible | physical exam |
Ascites | discrete | ascites (fluid in abdominal cavity) detected | physical exam |
Varices | discrete | varices (abnormally enlarged veins) detected | physical exam |
Bilirubin | continuous | | blood test |
Alk Phosphate | continuous | | blood test |
SGOT | continuous | | blood test |
Albumin | continuous | | blood test |
Protime | continuous | | blood test |
class | discrete | the observed outcome | n/a |
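If you want to check the discrete/continuous column against the file itself, the attribute declarations in an ARFF header can be read with a few lines of Python. This is only a rough sketch: the header below is a made-up sample rather than the contents of hepatitis.arff, the function name `attribute_kinds` is hypothetical, and quoted attribute names that contain spaces are not handled.

```python
# Hypothetical ARFF header; the attribute names and values are illustrative,
# not copied from hepatitis.arff.
SAMPLE_ARFF = """\
@relation hepatitis-sample
@attribute AGE numeric
@attribute SEX {male,female}
@attribute BILIRUBIN real
@attribute Class {DIE,LIVE}
@data
30,male,1.0,LIVE
"""

def attribute_kinds(arff_text):
    """Return (name, kind) pairs, where kind is 'discrete' or 'continuous'."""
    kinds = []
    for line in arff_text.splitlines():
        line = line.strip()
        if line.lower().startswith("@data"):
            break  # attribute declarations end where the data section starts
        if not line.lower().startswith("@attribute"):
            continue
        # Split into the keyword, the attribute name, and the type declaration.
        # Assumption: the name contains no spaces.
        _, name, type_decl = line.split(None, 2)
        if type_decl.startswith("{"):
            kind = "discrete"      # nominal: an explicit set of values
        elif type_decl.lower() in ("numeric", "integer", "real"):
            kind = "continuous"
        else:
            kind = "other"         # e.g. string or date attributes
        kinds.append((name.strip("'\""), kind))
    return kinds

for name, kind in attribute_kinds(SAMPLE_ARFF):
    print(f"{name}: {kind}")
```

To try it on the real file, pass the text of hepatitis.arff to `attribute_kinds` in place of `SAMPLE_ARFF`.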
Using Weka, induce two C4.5 decision trees over the hepatitis data. In Weka, C4.5 is called J48; it is found under the "trees" group after clicking the "Choose" button on the "Classify" tab.
First create an unpruned tree: click the text area showing the classifier name and set the "unpruned" option to true.
Test the unpruned tree both on the training data and with 10-fold cross-validation. (These options are set in the "Test Options" panel below the "Choose" button.)
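To see why the two test options can give very different numbers, here is a minimal Python sketch of both. It is not Weka's procedure: the data is made up, the "memorizer" is a toy stand-in for a learner that fits its training data perfectly, and the fold assignment is a plain shuffle (Weka's own cross-validation also stratifies the folds by class, which this sketch does not do). All function names are hypothetical.

```python
import random

def k_fold_splits(n, k=10, seed=1):
    """Shuffle indices 0..n-1 and yield (train, test) index lists for k folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most 1.
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

def train_memorizer(rows, labels):
    """Toy learner: memorize the training rows, else predict the majority class."""
    table = dict(zip(rows, labels))
    majority = max(set(labels), key=labels.count)
    return lambda row: table.get(row, majority)

def accuracy(model, rows, labels):
    """Fraction of rows for which the model predicts the given label."""
    return sum(model(r) == y for r, y in zip(rows, labels)) / len(rows)

# Made-up data: 20 distinct rows, with 14 "LIVE" and 6 "DIE" labels.
rows = [(i,) for i in range(20)]
labels = ["LIVE"] * 14 + ["DIE"] * 6

# Evaluation on the training data: the model is scored on rows it has seen.
train_acc = accuracy(train_memorizer(rows, labels), rows, labels)

# 10-fold cross-validation: each row is scored by a model that never saw it.
correct = 0
for train, test in k_fold_splits(len(rows), k=10):
    model = train_memorizer([rows[i] for i in train], [labels[i] for i in train])
    correct += sum(model(rows[i]) == labels[i] for i in test)
cv_acc = correct / len(rows)

print(f"training accuracy: {train_acc:.2f}")     # 1.00
print(f"10-fold CV accuracy: {cv_acc:.2f}")      # 0.70
```

The memorizer scores perfectly on its own training data but only matches the majority-class rate on held-out rows. Keep this gap in mind when you compare the two results for the unpruned tree.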
Next, create a pruned tree (set the "unpruned" option back to false) and test it both on the training data and with 10-fold cross-validation.
After you've induced the tree, you can right-click the results in the lower-left "Result list" panel and select "Visualize tree" to see the tree that was learned.
Try some of the other classification algorithms built into Weka on the hepatitis data.
For example, a Naive Bayes classifier is found at "bayes → NaiveBayesSimple", and a Support Vector Machine implementation is at "functions → SMO".