Computes impurity measures for a sequence of subtrees using an independent validation set. The subtrees are derived from a modified CARTGV tree (i.e. a CARTGV tree that is part of an RFGV forest).

impurity.cartgv.rf(validation, tree_seq, tree)

Arguments

validation

a new data frame containing the same variables as "data".

tree_seq

the object returned by the function "extract_subtrees". Each element of this sequence is a "tree" object as returned by the function "cartgv.rf".

tree

a fitted modified CARTGV tree. It is an output of the function "cartgv.rf".

Value

a list with elements:

- impurete: a data frame containing the values of several impurity functions (in this order: Gini, entropy, misclassification rate) for each subtree of the sequence. The i-th row corresponds to the i-th subtree of the sequence.
- pred: a list containing the predicted labels for the data set "validation" based on each subtree. Precisely, the i-th element is the object returned by the function "predict_cartgv.rf" for the i-th subtree applied to the data set "validation".
- summary_noeuds: a list containing, for each subtree, information about the nodes (nom_noeuds: node name, N: number of observations in the node, N[Y=1]: number of observations with "Y=1" in the node, N[Y=0]: number of observations with "Y=0" in the node, P[Y=1]: estimated probability that an observation in the node is assigned to the label "Y=1", P[Y=0]: estimated probability that an observation in the node is assigned to the label "Y=0" and P[hat.Y!=Y]: misclassification rate in the node).
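
Examples

The three impurity measures reported in "impurete" can be illustrated from node counts such as those listed in "summary_noeuds". The following is a minimal base R sketch, not the package's implementation. The node counts are made up, the column names N1, N0, p1 and p0 are hypothetical stand-ins for N[Y=1], N[Y=0], P[Y=1] and P[Y=0], and three choices are assumptions: entropy uses log base 2, each node is assumed to predict its majority label, and node impurities are combined by a node-size weighted average.

```r
# Hypothetical node summary for one subtree (counts are illustrative only).
nodes <- data.frame(
  nom_noeuds = c("leaf_1", "leaf_2", "leaf_3"),
  N1 = c(30, 5, 10),   # stands in for N[Y=1]
  N0 = c(10, 25, 20)   # stands in for N[Y=0]
)
nodes$N  <- nodes$N1 + nodes$N0
nodes$p1 <- nodes$N1 / nodes$N   # stands in for P[Y=1]
nodes$p0 <- nodes$N0 / nodes$N   # stands in for P[Y=0]

# Gini index for a binary label: 1 - p1^2 - p0^2 = 2 * p1 * p0.
gini <- 2 * nodes$p1 * nodes$p0

# Entropy with the convention 0 * log(0) = 0 (log base 2 is an assumption).
plogp <- function(p) ifelse(p > 0, p * log2(p), 0)
entropy <- -(plogp(nodes$p1) + plogp(nodes$p0))

# Misclassification rate, assuming each node predicts its majority label.
misclass <- pmin(nodes$p1, nodes$p0)

# Assumed aggregation over the subtree: average weighted by node size.
w <- nodes$N / sum(nodes$N)
impurity <- data.frame(
  Gini     = sum(w * gini),
  Entropy  = sum(w * entropy),
  Misclass = sum(w * misclass)
)
print(impurity)
```

Under these assumptions, repeating the computation for each subtree of the sequence and stacking the resulting rows gives a data frame with the same layout as "impurete": one row per subtree and one column per impurity function.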