Classification And Regression Trees for Grouped Variables.

cartgv(data, group, crit = 1, case_min = 1, maxdepth = 2,
  penalty = "No", IMPORTANCE = TRUE)

Arguments

data

a data frame containing the response variable (in the first column) and the predictors, used to grow the tree. The response variable must be named "Y", must be the first column of the data frame, and must be coded with the two levels "0" and "1".

group

a vector giving the group number of each variable. WARNING: if there are p groups, the groups must be numbered from 1 to p in increasing order. The group label of the response variable must be missing (i.e. NA).

crit

an integer indicating the impurity function used (1 = Gini index, 2 = entropy, 3 = misclassification rate). The default is 1.

case_min

an integer giving the minimum number of cases/non-cases in a terminal node. The default is 1.

maxdepth

the maximum depth of a split tree. The default is 2.

penalty

a character string indicating whether, and how, the decrease in node impurity takes the group size into account. Four penalties are available: "No" (the default), "Size", "Root.size" and "Log".

IMPORTANCE

a logical value indicating whether the importance of each group should be computed. The default is TRUE.
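As a sketch of what the crit argument selects, the R code below computes the three impurity functions for a binary node using their standard textbook definitions, together with the weighted decrease in impurity produced by splitting a node into two children. The helpers node_impurity and impurity_decrease are hypothetical and not part of the package; the scaling used internally by cartgv (for example the logarithm base of the entropy) is an assumption and may differ.

```r
# Hypothetical helper: impurity of a binary node, where p is the
# proportion of observations with Y == "1" in the node.
# crit = 1: Gini index, crit = 2: entropy, crit = 3: misclassification rate.
node_impurity <- function(p, crit = 1) {
  stopifnot(p >= 0, p <= 1)
  switch(as.character(crit),
    "1" = 2 * p * (1 - p),          # Gini: 1 - p^2 - (1 - p)^2
    "2" = {                         # entropy, base 2 assumed, 0 * log(0) = 0
      probs <- c(p, 1 - p)
      probs <- probs[probs > 0]
      -sum(probs * log2(probs))
    },
    "3" = 1 - max(p, 1 - p),        # misclassification rate
    stop("crit must be 1, 2 or 3")
  )
}

# Hypothetical helper: decrease in impurity when the observations y of a
# node are sent to a left child (left == TRUE) or a right child
# (left == FALSE); each child is weighted by its share of the node.
impurity_decrease <- function(y, left, crit = 1) {
  prop1 <- function(v) mean(v == "1")
  n  <- length(y)
  nL <- sum(left)
  nR <- n - nL
  node_impurity(prop1(y), crit) -
    (nL / n) * node_impurity(prop1(y[left]), crit) -
    (nR / n) * node_impurity(prop1(y[!left]), crit)
}

y    <- factor(c("0", "0", "0", "1", "1", "1"), levels = c("0", "1"))
left <- c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE)
impurity_decrease(y, left, crit = 1)  # 0.5: a pure split of a balanced node
```

With this balanced node and a perfectly separating split, both children are pure, so the decrease equals the impurity of the parent node: 0.5 for the Gini index, 1 for the base-2 entropy and 0.5 for the misclassification rate.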

Value

a list with elements

  • tree : a data frame that summarizes the resulting CARTGV tree.

  • carts : a list containing all the CART objects used to build the CARTGV tree.

  • splits : a list containing information about the splits. Each element is an object returned by the function "split_cartgv".

  • pop : a list containing, for each node, the indices (row names) of the observations that belong to it.

  • tables_coupures : a list containing data frames that summarize the splits.

  • importance_rand : a list providing the importance of each group at each node. Calculation of the importance is based on the Group Rand Importance.

  • importance_sur : a list providing the importance of each group at each node. Calculation of the importance is based on the Group Surrogate Importance.

  • agreement : a list containing, at each node and for each group, the measure of agreement between the selected split and the other possible splits. This proximity measure is based on the Rand Index.
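The agreement element is described as being based on the Rand Index. A minimal sketch of the Rand Index between two binary splits of the same observations follows: it is the proportion of pairs of observations on which the two splits agree, either by putting both observations in the same child or by separating them. The helper rand_index is hypothetical, and the exact way cartgv computes agreement (for example which observations are compared) is not covered here.

```r
# Hypothetical helper: Rand Index between two partitions a and b of the
# same observations (e.g. the child each observation is sent to).
rand_index <- function(a, b) {
  stopifnot(length(a) == length(b), length(a) >= 2)
  # same_x[i, j] is TRUE when observations i and j fall in the same block
  same_a <- outer(a, a, "==")
  same_b <- outer(b, b, "==")
  pairs  <- upper.tri(same_a)       # each unordered pair i < j counted once
  mean(same_a[pairs] == same_b[pairs])
}

split_selected <- c("L", "L", "L", "R", "R", "R")
split_other    <- c("L", "L", "R", "R", "R", "R")
rand_index(split_selected, split_other)  # 10 of 15 pairs agree, i.e. 2/3
```

Because the index only asks whether two observations end up together, it does not depend on which child is labelled left or right, which makes it a natural proximity measure between splits.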

Details

Implemented for binary classification problems only.
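The following example shows how to build arguments that satisfy the requirements on data and group, assuming the function cartgv documented above is available in the session. The simulated data are hypothetical, and the call itself is skipped when cartgv is not defined.

```r
set.seed(1)
n <- 100

# Hypothetical simulated predictors: X1-X3 form group 1, X4-X5 form group 2.
X <- matrix(rnorm(n * 5), nrow = n,
            dimnames = list(NULL, paste0("X", 1:5)))

# Binary response coded with the two levels "0" and "1".
Y <- factor(ifelse(X[, 1] + X[, 4] > 0, "1", "0"), levels = c("0", "1"))

# The response must be the first column of the data frame and be named "Y".
dat <- data.frame(Y = Y, X)

# One group label per column of dat: NA for the response, then the group
# numbers 1..p in increasing order for the predictors.
group <- c(NA, 1, 1, 1, 2, 2)

stopifnot(names(dat)[1] == "Y",
          length(group) == ncol(dat),
          is.na(group[1]),
          all(diff(group[-1]) >= 0))

# Only run the fit if cartgv has been loaded (assumption).
if (exists("cartgv", mode = "function")) {
  fit <- cartgv(dat, group, crit = 1, case_min = 1, maxdepth = 2,
                penalty = "No", IMPORTANCE = TRUE)
  str(fit$tree)
}
```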