Calculate the best split of a node for each group of input variables when building a CARTGV tree.

split_cartgv(node, group, label, maxdepth = 2, penalty = "No")

Arguments

node

a data frame containing the observations in the node. The first column is the response vector, named "Y", with labels "0" and "1". The p-1 other variables are continuous; categorical variables must be coded as sets of dummy variables.

group

a vector giving the group number of each variable. (WARNING: if there are "p" groups, the groups must be numbered from "1" to "p" in increasing order. The group number of the response variable is missing (i.e. NA).)

label

an integer indicating the label of the node (the majority class).

maxdepth

an integer indicating the maximal depth of a splitting tree. The default value is 2.

penalty

a character string indicating whether the decrease in node impurity must account for the group size. Four penalties are available: "No", "Size", "Root.size" or "Log". The default value is "No".

Value

a list with the following elements:

- Gain_Gini: a vector containing, for each group, the reduction of the Gini index in the node obtained by splitting on that group,

- Gain_Ent: a vector containing, for each group, the reduction of entropy in the node obtained by splitting on that group,

- Gain_Mis: a vector containing, for each group, the reduction of the number of misclassified observations in the node obtained by splitting on that group,

- carts: a list containing, for each group, the CART object which summarizes the splitting tree,

- pred: a matrix with "nrow(node)" rows and "length(unique(group))" columns containing, for each group, the predictions resulting from the splitting tree.
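Examples

A minimal usage sketch. The data frame, the group numbering, and the call below are illustrative: they follow the argument descriptions above, assuming the package providing split_cartgv is loaded.

```r
## Illustrative node: a binary response "Y" followed by four
## continuous predictors organized into two groups of variables.
set.seed(1)
node <- data.frame(
  Y  = factor(sample(c("0", "1"), 50, replace = TRUE)),
  X1 = rnorm(50), X2 = rnorm(50),   # group 1
  X3 = rnorm(50), X4 = rnorm(50)    # group 2
)

## Group number of each column, in increasing order from 1 to 2;
## NA for the response variable "Y".
group <- c(NA, 1, 1, 2, 2)

## The label of the node is its majority class.
label <- as.integer(names(which.max(table(node$Y))))

res <- split_cartgv(node, group, label, maxdepth = 2, penalty = "No")

## The group achieving the largest Gini reduction gives the best split.
which.max(res$Gain_Gini)
```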