Mushroom Poisonous Prediction (Decision Tree) in CSharp
System Requirement
Component | Requirement | Detail |
---|---|---|
Emgu CV | Version 2.0.0.0 Alpha | |
Operating System | Cross Platform | |
What is a Decision Tree
According to Wikipedia:
- A decision tree (or tree diagram) is a decision support tool that uses a tree-like graph or model of decisions and their possible consequences, including chance event outcomes, resource costs, and utility. Decision trees are commonly used in operations research, specifically in decision analysis, to help identify a strategy most likely to reach a goal. Another use of decision trees is as a descriptive means for calculating conditional probabilities.
In this example, we train a decision tree to identify poisonous mushrooms. The example is part of the Emgu.CV.Test project in SVN and is a C# port of OpenCV's mushroom.exe example.
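A trained tree classifies a sample by starting at the root, testing one attribute at each internal node, and following the matching branch until it reaches a leaf that holds a class label. The plain C# sketch below illustrates that walk on a hand-built tree. The types ToyNode, ToyLeaf, ToySplit and ToyTreeDemo are hypothetical, the attribute indices and split letters are invented rather than learned from the mushroom data, and the sketch does not use Emgu CV.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical toy decision tree over single-letter categorical attributes.
// The splits below are invented for illustration; they are NOT the splits
// that DTree learns from agaricus-lepiota.data.
public abstract class ToyNode
{
    public abstract char Classify(char[] sample);
}

// Leaf node: always returns its class label ('e' = edible, 'p' = poisonous).
public sealed class ToyLeaf : ToyNode
{
    private readonly char _label;
    public ToyLeaf(char label) { _label = label; }
    public override char Classify(char[] sample) => _label;
}

// Internal node: if attribute `index` is one of the listed categories,
// continue in the left subtree, otherwise continue in the right subtree.
public sealed class ToySplit : ToyNode
{
    private readonly int _index;
    private readonly HashSet<char> _leftCategories;
    private readonly ToyNode _left, _right;

    public ToySplit(int index, IEnumerable<char> leftCategories, ToyNode left, ToyNode right)
    {
        _index = index;
        _leftCategories = new HashSet<char>(leftCategories);
        _left = left;
        _right = right;
    }

    public override char Classify(char[] sample) =>
        _leftCategories.Contains(sample[_index]) ? _left.Classify(sample) : _right.Classify(sample);
}

public static class ToyTreeDemo
{
    // Hypothetical two-level tree: split on attribute 0, then on attribute 1.
    public static ToyNode Build() =>
        new ToySplit(0, new[] { 'a', 'l', 'n' },
            new ToySplit(1, new[] { 'w' }, new ToyLeaf('p'), new ToyLeaf('e')),
            new ToyLeaf('p'));
}
```

For example, `ToyTreeDemo.Build().Classify(new[] { 'n', 'k' })` goes left at the root ('n' is in the set) and right at the second split ('k' is not 'w'), returning 'e'. Sending a subset of category letters down one branch is a natural kind of test for categorical attributes like the ones in this data set.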
Data Set
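The data set used in this example is Agaricus-lepiota.txt. To run the example, download the data set, rename it to agaricus-lepiota.data (the file name the code opens; the lowercase "a" matters on case-sensitive file systems) and place it in the program's working directory.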
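Each line of the file is comma-separated: the first field is the class label ('e' for edible, 'p' for poisonous) and the remaining fields are single-letter categorical attributes (22 of them in the UCI Mushroom data set, where '?' marks a missing value). ReadMushroomData in the source code stores every letter as its character code. The hypothetical MushroomRow.Parse helper below sketches that encoding for a single line without Emgu CV; the sample line used with it is illustrative and not quoted from the file.

```csharp
using System;
using System.Linq;

// Hypothetical helper mirroring the encoding used by ReadMushroomData:
// every single-letter field is stored as its character code.
public static class MushroomRow
{
    // Field 0 is the class label; the rest are categorical attributes.
    // Convert.ToChar(string) throws unless the field is exactly one character.
    public static (float Label, float[] Features) Parse(string line)
    {
        string[] fields = line.Split(',');
        float label = Convert.ToChar(fields[0]);              // 'p' -> 112, 'e' -> 101
        float[] features = fields.Skip(1)
                                 .Select(f => (float)Convert.ToChar(f)) // '?' -> 63
                                 .ToArray();
        return (label, features);
    }
}
```

`MushroomRow.Parse("p,x,s,n,t,p,f,c,n,k,e,e,s,s,w,w,p,w,o,p,k,s,u")` returns the label 112 ('p') and 22 feature codes starting with 120 ('x'). Because '?' is simply encoded as 63, a missing value ends up as one more category, which is also how ReadMushroomData handles it.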
Source Code
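```csharp
using System;
using System.Diagnostics;               // Trace
using System.Drawing;
using System.Runtime.InteropServices;   // GCHandle
using Emgu.CV;
using Emgu.CV.Structure;
using Emgu.CV.ML;
using Emgu.CV.ML.Structure;
using NUnit.Framework;                  // [Test]
...
// Reads agaricus-lepiota.data from the working directory. Column 0 of each
// row is the class label ('e' or 'p'); the remaining columns are single-letter
// categorical attributes. Every letter is stored as its character code.
private void ReadMushroomData(out Matrix<float> data, out Matrix<float> response)
{
    string[] rows = System.IO.File.ReadAllLines("agaricus-lepiota.data");
    int varCount = rows[0].Split(',').Length - 1;
    data = new Matrix<float>(rows.Length, varCount);
    response = new Matrix<float>(rows.Length, 1);
    int count = 0;
    foreach (string row in rows)
    {
        string[] values = row.Split(',');
        // Class label, stored as its character code
        char c = System.Convert.ToChar(values[0]);
        response[count, 0] = System.Convert.ToInt32(c);
        // Attributes; a missing value '?' becomes just another category code
        for (int i = 1; i < values.Length; i++)
            data[count, i - 1] = System.Convert.ToByte(System.Convert.ToChar(values[i]));
        count++;
    }
}

[Test]
public void TestDTreesMushroom()
{
    Matrix<float> data, response;
    ReadMushroomData(out data, out response);

    // Use the first 80% of the data as training samples
    int trainingSampleCount = (int)(data.Rows * 0.8);

    // One type entry per variable plus one for the response; all are categorical
    Matrix<byte> varType = new Matrix<byte>(data.Cols + 1, 1);
    varType.SetValue((byte)MlEnum.VAR_TYPE.CATEGORICAL);

    // Sample mask: 255 marks a training row; rows left at 0 are excluded
    Matrix<byte> sampleIdx = new Matrix<byte>(data.Rows, 1);
    using (Matrix<byte> sampleRows = sampleIdx.GetRows(0, trainingSampleCount, 1))
        sampleRows.SetValue(255);

    // Class priors, ordered by class label value ('e' = 101 before 'p' = 112)
    float[] priors = new float[] { 1, 0.5f };
    // Pin the array so the native code can read it through a stable pointer
    GCHandle priorsHandle = GCHandle.Alloc(priors, GCHandleType.Pinned);
    try
    {
        MCvDTreeParams param = new MCvDTreeParams();
        param.maxDepth = 8;               // maximum tree depth
        param.minSampleCount = 10;        // do not split nodes with fewer samples
        param.regressionAccuracy = 0;     // only used by regression trees
        param.useSurrogates = true;       // compute surrogate splits
        param.maxCategories = 15;         // limit before categorical values are clustered
        param.cvFolds = 10;               // cross-validation folds used for pruning
        param.use1seRule = true;          // stronger pruning, smaller tree
        param.truncatePrunedTree = true;  // discard pruned branches
        param.priors = priorsHandle.AddrOfPinnedObject();

        using (DTree dtree = new DTree())
        {
            bool success = dtree.Train(
                data,
                Emgu.CV.ML.MlEnum.DATA_LAYOUT_TYPE.ROW_SAMPLE,
                response,
                null,       // varIdx: use all variables
                sampleIdx,  // train on the first 80% of the rows only
                varType,
                null,       // missingDataMask: none
                param);
            if (!success) return;

            // Count correct predictions separately for training and test rows
            double trainDataCorrectRatio = 0;
            double testDataCorrectRatio = 0;
            for (int i = 0; i < data.Rows; i++)
            {
                using (Matrix<float> sample = data.GetRow(i))
                {
                    double r = dtree.Predict(sample, null, false).value;
                    r = Math.Abs(r - response[i, 0]);
                    if (r < 1.0e-5)
                    {
                        if (i < trainingSampleCount)
                            trainDataCorrectRatio++;
                        else
                            testDataCorrectRatio++;
                    }
                }
            }
            trainDataCorrectRatio /= trainingSampleCount;
            testDataCorrectRatio /= (data.Rows - trainingSampleCount);
            Trace.WriteLine(String.Format("Prediction accuracy for training data :{0}%", trainDataCorrectRatio * 100));
            Trace.WriteLine(String.Format("Prediction accuracy for test data :{0}%", testDataCorrectRatio * 100));
        }
    }
    finally
    {
        // Unpin the priors even if training fails (early return) or throws
        priorsHandle.Free();
    }
}
```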
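The accuracy bookkeeping in TestDTreesMushroom does not depend on Emgu CV: rows before trainingSampleCount are training samples, the remaining rows are test samples, and a prediction counts as correct when it is within 1e-5 of the response. The sketch below restates that logic in plain C# so it can be checked on small arrays. SplitAccuracy and Compute are hypothetical names; in the real test the predicted values come from DTree.Predict.

```csharp
using System;

// Hypothetical helper mirroring the accuracy bookkeeping in
// TestDTreesMushroom: rows [0, trainingCount) are training samples, the rest
// are test samples, and a prediction is correct when it is within 1e-5 of the
// expected response.
public static class SplitAccuracy
{
    public static (double Train, double Test) Compute(
        float[] predicted, float[] actual, int trainingCount)
    {
        if (predicted.Length != actual.Length)
            throw new ArgumentException("predicted and actual must have the same length");
        // Both partitions must be non-empty so neither ratio divides by zero.
        if (trainingCount <= 0 || trainingCount >= actual.Length)
            throw new ArgumentOutOfRangeException(nameof(trainingCount));

        int trainCorrect = 0, testCorrect = 0;
        for (int i = 0; i < actual.Length; i++)
        {
            if (Math.Abs(predicted[i] - actual[i]) >= 1.0e-5)
                continue; // wrong prediction
            if (i < trainingCount) trainCorrect++;
            else testCorrect++;
        }
        return ((double)trainCorrect / trainingCount,
                (double)testCorrect / (actual.Length - trainingCount));
    }
}
```

For instance, with predicted = {101, 112, 112, 101, 112}, actual = {101, 112, 101, 101, 101} and trainingCount = 4, Compute returns (0.75, 0.0): three of the four training rows match, and the single test row does not.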
Result
The result of running this unit test:
```
Prediction accuracy for training data :99.8769041390983%
Prediction accuracy for test data :99.2615384615385%
```
Both prediction rates are above 99%, which is a very good result. A big thanks to the OpenCV developers.
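To translate the percentages into sample counts, assume the data file contains the 8,124 records of the UCI Mushroom data set (the row count is not stated on this page). The 80% split then gives 6,499 training rows and 1,625 test rows, and the reported rates correspond to 8 and 12 misclassified samples respectively. The small C# sketch below does that arithmetic; ResultCounts and its methods are hypothetical names.

```csharp
using System;

// Hypothetical helper: recovers sample counts from the reported accuracy
// percentages, given an assumed total row count.
public static class ResultCounts
{
    // Same split rule as the unit test: (int)(rows * 0.8).
    public static (int Train, int Test) Split(int rows)
    {
        int train = (int)(rows * 0.8);
        return (train, rows - train);
    }

    // Number of wrong predictions implied by an accuracy given in percent.
    public static int Misclassified(int sampleCount, double accuracyPercent) =>
        (int)Math.Round(sampleCount * (1.0 - accuracyPercent / 100.0));
}
```

With 8,124 rows, `Split` returns (6499, 1625); `Misclassified(6499, 99.8769041390983)` gives 8 and `Misclassified(1625, 99.2615384615385)` gives 12, i.e. 6,491/6,499 and 1,613/1,625 correct predictions.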