Computes impurity measures for each subtree in a sequence of subtrees, using an independent validation set. The result is used to select the best subtree in the sequence.
impurity.cartgv(validation, tree_seq, tree)
Argument | Description |
---|---|
validation | a new data frame containing the same variables that " |
tree_seq | the object returned by the function " |
tree | a fitted CARTGV tree. It is an object returned by the function " |
a list with elements
impurete: a data frame containing the values of several impurity functions (in this order: Gini, entropy, misclassification rate) for each subtree of the sequence. The i-th row corresponds to the i-th subtree of the sequence.
pred: a list containing the predicted labels for the data set "validation" based on each subtree. Precisely, the i-th element is the object returned by the function "predict_cartgv" for the i-th subtree, using the data set "validation".
summary_noeuds: a list containing, for each subtree, information about the nodes (nom_noeuds: node name; N: number of observations in the node; N[Y=1]: number of observations with "Y=1" in the node; N[Y=0]: number of observations with "Y=0" in the node; P[Y=1]: estimated probability that an observation in the node is assigned to the label "Y=1"; P[Y=0]: estimated probability that an observation in the node is assigned to the label "Y=0"; and P[hat.Y!=Y]: misclassification rate in the node).
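
Because the i-th row of "impurete" corresponds to the i-th subtree, selecting the best subtree can be sketched as picking the row that minimises one of the impurity columns. The snippet below is only an illustration. It uses a mock data frame in place of the real return value, the column names (Gini, Entropy, Misclass) are hypothetical, and the numbers are made up. Check names(res$impurete) on an actual result before relying on any of them.

```r
# Mock stand-in for the `impurete` element of the returned list.
# Column names and values are hypothetical, chosen only for illustration.
impurete <- data.frame(
  Gini     = c(0.42, 0.35, 0.31, 0.38),
  Entropy  = c(0.61, 0.52, 0.47, 0.56),
  Misclass = c(0.30, 0.24, 0.21, 0.27)
)

# Row i describes the i-th subtree, so the row index of the smallest
# misclassification rate identifies the selected subtree.
best <- which.min(impurete$Misclass)

# Show the index of the selected subtree and its impurity values.
best
impurete[best, ]
```

The same pattern applies if the Gini or entropy column is preferred as the selection criterion. Note that which.min returns the first minimum when there are ties.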