impurity.cartgv

Computes impurity measures for each subtree in a sequence of subtrees, using an independent validation set. This function is used to select the best subtree in the sequence.
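A minimal sketch of the selection step follows. It assumes that the retained subtree is the one with the smallest validation misclassification rate. The source does not state the exact criterion, so this is an assumption. The data frame is a hypothetical stand-in for the "impurete" component returned by this function. Its numbers and column names are illustrative only. Only the column order (Gini, entropy, misclassification rate) and the one-row-per-subtree layout follow the documentation.

```r
# Hypothetical stand-in for the "impurete" component: one row per subtree,
# columns in the documented order Gini, entropy, misclassification rate.
# The values are illustrative, not output of impurity.cartgv.
impurete <- data.frame(gini     = c(0.30, 0.22, 0.25),
                       entropy  = c(0.61, 0.48, 0.52),
                       misclass = c(0.20, 0.14, 0.16))

# Assumed rule: keep the subtree with the lowest validation misclassification
# rate (third column). which.min returns the first index in case of ties.
best <- which.min(impurete[[3]])
best
```

Because the i-th row corresponds to the i-th subtree, the returned index also identifies the chosen subtree in "tree_seq".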

impurity.cartgv(validation, tree_seq, tree)

Arguments

validation	a new data frame containing the same variables as "data".
tree_seq	the object returned by the function "extract_subtrees": a sequence of optimal subtrees obtained by applying the cost-complexity pruning method.
tree	a fitted CARTGV tree, i.e. an object returned by the function "cartgv".

Value

A list with the following elements:

impurete: a data frame containing the values of several impurity functions (in this order: Gini, entropy, misclassification rate) for each subtree of the sequence. The i-th row corresponds to the i-th subtree of the sequence.
pred: a list containing the predicted labels for the data set "validation" under each subtree. Precisely, the i-th element is the object returned by the function "predict_cartgv" for the i-th subtree applied to the data set "validation".
summary_noeuds: a list containing, for each subtree, information about its nodes (nom_noeuds: node name; N: number of observations in the node; N[Y=1]: number of observations with "Y=1" in the node; N[Y=0]: number of observations with "Y=0" in the node; P[Y=1]: estimated probability that an observation in the node is assigned to the label "Y=1"; P[Y=0]: estimated probability that an observation in the node is assigned to the label "Y=0"; P[hat.Y!=Y]: misclassification rate in the node).
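The base-R sketch below illustrates the per-node quantities listed for "summary_noeuds" and the three impurity functions. The helpers summarise_nodes and tree_impurity are hypothetical and are not part of CARTGV. The sketch rests on three assumptions:

- The response is binary 0/1.
- Each node predicts the majority label among the observations it receives. CARTGV may instead use the label learned on the training data.
- A subtree's impurity is the size-weighted average of its terminal-node impurities.

```r
# Hypothetical helper (not CARTGV API): summarise the terminal nodes of one
# subtree. `node` is the node reached by each validation observation and
# `y` is its observed 0/1 label.
summarise_nodes <- function(node, y) {
  n1 <- tapply(y == 1, node, sum)  # N[Y=1]
  n0 <- tapply(y == 0, node, sum)  # N[Y=0]
  n  <- n1 + n0                    # N
  p1 <- n1 / n                     # P[Y=1]
  p0 <- n0 / n                     # P[Y=0]
  # Assumed majority-label prediction: the node error is the minority share.
  err  <- pmin(p1, p0)             # P[hat.Y!=Y]
  gini <- 2 * p1 * p0              # equals 1 - p1^2 - p0^2 for two classes
  # Entropy in bits, with the convention 0 * log(0) = 0.
  ent <- -(ifelse(p1 > 0, p1 * log2(p1), 0) + ifelse(p0 > 0, p0 * log2(p0), 0))
  data.frame(node = names(n), N = as.vector(n),
             N1 = as.vector(n1), N0 = as.vector(n0),
             P1 = as.vector(p1), P0 = as.vector(p0),
             err = as.vector(err), gini = as.vector(gini),
             entropy = as.vector(ent))
}

# Hypothetical aggregation: weight each node's impurity by its share of
# the validation observations.
tree_impurity <- function(tab) {
  w <- tab$N / sum(tab$N)
  c(gini = sum(w * tab$gini),
    entropy = sum(w * tab$entropy),
    misclass = sum(w * tab$err))
}

# Toy input: five validation observations falling into two nodes.
node <- c("a", "a", "a", "b", "b")
y    <- c(1, 1, 0, 0, 0)
tab  <- summarise_nodes(node, y)
tree_impurity(tab)
```

With this toy input, node "a" holds two observations with Y=1 and one with Y=0, so its misclassification rate is 1/3. Node "b" is pure, so the subtree-level misclassification rate is 3/5 * 1/3 = 1/5.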
