Computes impurity measures for each subtree in a sequence of subtrees, using an independent (validation) data set. The subtrees are derived from a modified CARTGV tree (i.e. a CARTGV tree that is part of an RFGV forest).
impurity.cartgv.rf(validation, tree_seq, tree)
Argument | Description
---|---|
validation | a new data frame containing the same variables as " |
tree_seq | the object returned by the function " |
tree | a fitted modified CARTGV tree. It is an output of the function " |
A list with the following elements:
- impurete: a data frame containing the values of several impurity functions (in this order: Gini, entropy, misclassification rate) for each subtree of the sequence. The i-th row corresponds to the i-th subtree of the sequence.
- pred: a list containing the predicted labels for the data set "validation" based on each subtree. Precisely, the i-th element is the object returned by the function "predict_cartgv.rf" for the i-th subtree, using the data set "validation".
- summary_noeuds: a list containing, for each subtree, information about its nodes:
  - nom_noeuds: node name;
  - N: number of observations in the node;
  - N[Y=1]: number of observations with "Y=1" in the node;
  - N[Y=0]: number of observations with "Y=0" in the node;
  - P[Y=1]: estimated probability that an observation in the node is assigned the label "Y=1";
  - P[Y=0]: estimated probability that an observation in the node is assigned the label "Y=0";
  - P[hat.Y!=Y]: misclassification rate in the node.
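The node statistics and the three impurity functions described above can be illustrated with a small sketch. It is written in Python, does not call the CARTGV package, and the helpers `node_summary` and `subtree_impurity` are hypothetical. It assumes that (i) each validation observation has already been routed to a leaf of the subtree, (ii) the node probabilities are estimated from those observations and each node predicts its majority label, and (iii) a subtree's impurity is the size-weighted average of its leaf impurities, with entropy computed in base 2. The package itself may estimate these quantities differently.

```python
import math
from collections import defaultdict


def node_summary(leaf_ids, y):
    """Per-node statistics mirroring the fields of summary_noeuds (hypothetical helper).

    leaf_ids: name of the leaf reached by each observation (assumed already known).
    y: observed labels coded 0/1.
    """
    counts = defaultdict(lambda: [0, 0])  # node name -> [N[Y=0], N[Y=1]]
    for node, label in zip(leaf_ids, y):
        counts[node][label] += 1
    rows = []
    for node in sorted(counts):
        n0, n1 = counts[node]
        n = n0 + n1
        p0, p1 = n0 / n, n1 / n
        rows.append({
            "nom_noeuds": node, "N": n, "N[Y=1]": n1, "N[Y=0]": n0,
            "P[Y=1]": p1, "P[Y=0]": p0,
            # Assumption: the node predicts its majority label, so its error rate is min(p0, p1)
            "P[hat.Y!=Y]": min(p0, p1),
        })
    return rows


def subtree_impurity(rows):
    """Size-weighted average of leaf impurities, in the order Gini, entropy, misclassification."""
    total = sum(r["N"] for r in rows)
    gini = entropy = misclass = 0.0
    for r in rows:
        w = r["N"] / total  # share of observations falling in this leaf
        p0, p1 = r["P[Y=0]"], r["P[Y=1]"]
        gini += w * (1 - p0 ** 2 - p1 ** 2)
        entropy += w * -sum(p * math.log2(p) for p in (p0, p1) if p > 0)
        misclass += w * r["P[hat.Y!=Y]"]
    return {"Gini": gini, "Entropy": entropy, "Misclassification": misclass}


# Toy usage: five observations falling into two leaves, "A" and "B"
rows = node_summary(["A", "A", "A", "B", "B"], [1, 1, 0, 0, 0])
print(subtree_impurity(rows))
```

Repeating `subtree_impurity` for every subtree of the sequence would give one row per subtree, in the spirit of the `impurete` data frame.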