Compute impurity measures for each subtree in a sequence of subtrees, using an independent validation set. The result is used to select the best subtree in the sequence.

impurity.cartgv(validation, tree_seq, tree)

Arguments

validation

a new data frame containing the same variables as "data".

tree_seq

the object returned by the function "extract_subtrees". It is a sequence of optimal subtrees obtained by applying the cost-complexity pruning method.

tree

a fitted CARTGV tree. It is an object returned by the function "cartgv".

Value

a list with elements

  • impurete: a data frame containing the values of several impurity functions (in this order: Gini, entropy, misclassification rate) for each subtree of the sequence. The i-th row corresponds to the i-th subtree of the sequence.

  • pred: a list containing the predicted labels for the data set "validation" based on each subtree. Precisely, the i-th element is the object returned by the function "predict_cartgv" for the i-th subtree, using the data set "validation".

  • summary_noeuds: a list containing, for each subtree, information about the nodes (nom_noeuds: node name, N: number of observations in the node, N[Y=1]: number of observations with "Y=1" in the node, N[Y=0]: number of observations with "Y=0" in the node, P[Y=1]: estimated probability that an observation in the node is assigned to the label "Y=1", P[Y=0]: estimated probability that an observation in the node is assigned to the label "Y=0", and P[hat.Y!=Y]: misclassification rate in the node).
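
Examples

The following is a minimal base-R sketch, not taken from the package, of how the three impurity measures could be derived from per-node counts and how the "impurete" table might then be used to pick a subtree. It assumes the standard textbook definitions of Gini, entropy and misclassification rate, weighted by node size; the package may use different conventions. The data, the helper "leaf_impurity" and the column names (n1, n0, pred, gini, entropy, misclass) are hypothetical and chosen only for illustration.

```r
# Hypothetical summary of the leaves of one subtree (made-up counts).
# n1 and n0 play the role of N[Y=1] and N[Y=0] on the validation set;
# pred is the label each leaf predicts.
nodes <- data.frame(
  node = c("leaf_1", "leaf_2", "leaf_3"),
  n1   = c(30, 5, 10),
  n0   = c(10, 35, 10),
  pred = c(1, 0, 1)
)

# Node-size-weighted impurity over the leaves of a subtree (binary label).
# Assumption: standard definitions; not necessarily what the package computes.
leaf_impurity <- function(n1, n0, pred) {
  n  <- n1 + n0
  p1 <- n1 / n
  p0 <- n0 / n
  w  <- n / sum(n)                                   # share of observations per leaf
  plogp <- function(p) ifelse(p > 0, p * log(p), 0)  # convention: 0 * log(0) = 0
  err <- ifelse(pred == 1, n0, n1) / n               # per-leaf error rate
  c(
    gini     = sum(w * 2 * p1 * p0),
    entropy  = sum(w * -(plogp(p1) + plogp(p0))),
    misclass = sum(w * err)
  )
}

imp <- leaf_impurity(nodes$n1, nodes$n0, nodes$pred)

# Hypothetical table shaped like "impurete": one row per subtree, columns in
# the documented order (Gini, entropy, misclassification rate).
impurete <- data.frame(
  gini     = c(0.42, 0.34, 0.36),
  entropy  = c(0.61, 0.52, 0.55),
  misclass = c(0.31, 0.25, 0.27)
)

# Index of the subtree with the smallest validation misclassification rate.
best <- which.min(impurete[[3]])
```

Selecting by position (third column) rather than by name relies only on the documented column order. Any of the three measures could serve as the selection criterion; the misclassification rate is used here purely as an example.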