Wednesday, 11 September 2013

How to make sense of the output of DecisionTreeClassifier in scikit-learn?

I'm learning ML and am using scikit-learn to build a basic decision tree
classifier. The feature values are categorical, so I used DictVectorizer to
convert them to numeric columns. Here's my code:
from sklearn import tree
from sklearn.feature_extraction import DictVectorizer

# training_set: list of dicts representing the training set
# labels: the corresponding labels of the training set
vec = DictVectorizer()
vectorized = vec.fit_transform(training_set)
clf = tree.DecisionTreeClassifier()
clf.fit(vectorized.toarray(), labels)
with open("output.dot", "w") as output_file:
    tree.export_graphviz(clf, out_file=output_file)
But I don't understand the output graph. It contains a tree in which each
node is labeled X[1] <= 0.5000 or something similar. What I expected was
nodes labeled FEATURE_1 == VALUE_1, so that the un-vectorized information
shows on the tree.
Is that possible?
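
To illustrate what a label such as X[1] <= 0.5000 refers to, here is a minimal pure-Python sketch. It assumes every feature value is a string, in which case DictVectorizer creates one 0/1 column per "name=value" pair and orders the columns by sorted name. The helpers one_hot_feature_names and readable_split and the sample data are hypothetical and written only for this illustration; they are not part of scikit-learn.

```python
import re


def one_hot_feature_names(samples):
    """Return sorted "name=value" column names for string-valued features.

    Mimics the naming DictVectorizer uses for categorical (string) values;
    numeric values are not handled in this sketch.
    """
    names = {f"{key}={value}" for sample in samples for key, value in sample.items()}
    return sorted(names)


def readable_split(label, feature_names):
    """Rewrite a node label like 'X[1] <= 0.5000' using the original names."""
    match = re.fullmatch(r"X\[(\d+)\] <= ([0-9.]+)", label.strip())
    if match is None:
        return label  # leave labels we do not recognise untouched
    index, threshold = int(match.group(1)), float(match.group(2))
    name = feature_names[index]
    # A one-hot column is either 0 or 1, so "<= 0.5" means the column is 0,
    # i.e. the sample does NOT have this feature value.
    if "=" in name and threshold == 0.5:
        feature, value = name.split("=", 1)
        return f"{feature} != {value}"
    return f"{name} <= {threshold}"


# Hypothetical training data, only for illustration.
training_set = [
    {"color": "red", "shape": "round"},
    {"color": "green", "shape": "square"},
]
names = one_hot_feature_names(training_set)
print(names)
print(readable_split("X[1] <= 0.5000", names))
```

In scikit-learn itself, export_graphviz accepts a feature_names argument, and the fitted vectorizer can supply the column names (get_feature_names_out() in recent releases, get_feature_names() in older ones). With that argument the nodes should show names such as color=red in place of X[1]. The test is still a threshold on a 0/1 column, not a literal FEATURE_1 == VALUE_1 comparison.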
