For this project we'll be using WEKA, a machine learning package written in Java. It implements a large variety of learners and is set up to run experiments automatically using cross-validation.
You can install WEKA and the Java runtime environment from this location.
It's very easy to run. Here's an example:
java -cp weka.jar weka.classifiers.neural.NeuralNetwork -t data\iris.arff
This will run the neural network classifier on the "data\iris.arff" file, show the neural network model, and evaluate it using cross-validation.
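To make the cross-validation step more concrete, here is a minimal sketch of how a dataset can be split into folds. It is not WEKA's own code: the class name KFoldSketch and its methods are made up for illustration, the fold count of 10 and the seed are assumptions, and the split is a plain shuffle rather than a stratified one. Each fold serves once as the test set while the remaining folds are used for training.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical illustration of k-fold splitting; not part of WEKA.
public class KFoldSketch {

    /** Splits the indices 0..n-1 into k shuffled folds of near-equal size. */
    public static List<List<Integer>> makeFolds(int n, int k, long seed) {
        if (k < 2 || k > n) {
            throw new IllegalArgumentException("need 2 <= k <= n");
        }
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            indices.add(i);
        }
        // Shuffle so that any ordering in the data file does not bias the folds.
        Collections.shuffle(indices, new Random(seed));

        List<List<Integer>> folds = new ArrayList<>();
        for (int f = 0; f < k; f++) {
            folds.add(new ArrayList<>());
        }
        // Deal the shuffled indices out to the folds in round-robin order.
        for (int i = 0; i < n; i++) {
            folds.get(i % k).add(indices.get(i));
        }
        return folds;
    }

    /** Training indices for one test fold: every index outside that fold. */
    public static List<Integer> trainingIndices(List<List<Integer>> folds, int testFold) {
        List<Integer> train = new ArrayList<>();
        for (int f = 0; f < folds.size(); f++) {
            if (f != testFold) {
                train.addAll(folds.get(f));
            }
        }
        return train;
    }

    public static void main(String[] args) {
        // The iris data has 150 instances; 10 folds is assumed here.
        List<List<Integer>> folds = makeFolds(150, 10, 1L);
        for (int f = 0; f < folds.size(); f++) {
            System.out.println("fold " + f
                    + ": test=" + folds.get(f).size()
                    + " train=" + trainingIndices(folds, f).size());
        }
    }
}
```

With 150 instances and 10 folds, every fold holds 15 test instances and leaves 135 for training. The cross-validation accuracy is then the accuracy over all held-out predictions, which is why it is usually lower than the accuracy measured on the training set itself.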
You can also download extra datasets from the UCI Machine Learning Repository in the WEKA arff file format (the datasets are described here).
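When choosing a dataset, it can help to look at the header of the ARFF file, which declares the relation name and the attributes before the @data section. The sketch below lists the attribute names from such a header. The class ArffHeaderSketch is hypothetical and handles only the simple cases (comment lines starting with %, quoted or unquoted attribute names); WEKA's own loader is more thorough.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for inspecting an ARFF header; not part of WEKA.
public class ArffHeaderSketch {

    /** Returns the attribute names declared before the @data line, in order. */
    public static List<String> attributeNames(String arffText) {
        List<String> names = new ArrayList<>();
        for (String raw : arffText.split("\\r?\\n")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("%")) {
                continue; // blank line or comment
            }
            String lower = line.toLowerCase();
            if (lower.startsWith("@data")) {
                break; // the header ends where the data rows begin
            }
            if (lower.startsWith("@attribute")) {
                String rest = line.substring("@attribute".length()).trim();
                names.add(firstToken(rest));
            }
        }
        return names;
    }

    /** Extracts the attribute name, which may be quoted if it contains spaces. */
    static String firstToken(String s) {
        if (s.startsWith("'") || s.startsWith("\"")) {
            char quote = s.charAt(0);
            int end = s.indexOf(quote, 1);
            return end > 0 ? s.substring(1, end) : s.substring(1);
        }
        return s.split("\\s+", 2)[0];
    }

    public static void main(String[] args) {
        // A made-up toy file, not one of the UCI datasets.
        String sample = "% toy example\n"
                + "@relation toy\n"
                + "@attribute outlook {sunny,rainy}\n"
                + "@attribute 'wind speed' numeric\n"
                + "@data\n"
                + "sunny,3\n";
        System.out.println(attributeNames(sample));
    }
}
```

By convention the last attribute in the file is usually the class to be predicted, so the final name in the list is a reasonable first guess at what the classifiers will be learning.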
Run the following three classifiers on the labor data included with WEKA.
This is the decision tree classifier. It is based on C4.5.
java -cp weka.jar weka.classifiers.j48.J48 -t data\labor.arff
This is the neural network classifier.
java -cp weka.jar weka.classifiers.neural.NeuralNetwork -t data\labor.arff
This is a simple naive Bayes classifier.
java -cp weka.jar weka.classifiers.NaiveBayes -t data\labor.arff
Note the model, training set accuracy, and cross-validation accuracy in the output for each execution.
Pick a dataset in arff format from the UCI machine learning datasets, run the experiments described above, and answer the same questions. You can extract the data files from the datasets-UCI.jar file with the following command.
jar xvf datasets-UCI.jar