Variant of the CARTGV approach used to build RFGV forests. Implemented for binary classification problems.

cartgv.rf(data, group, crit = 1, case_min = 1, maxdepth = 2,
  mtry_group = floor(sqrt(length(unique(group[!is.na(group)])))),
  penalty = "No",
  mtry_var = sapply(as.numeric(table(group[!is.na(group)])), function(x)
  floor(sqrt(x))))

Arguments

data

a data frame containing the response variable and the predictors, used to grow the tree. The response variable must be the first column of the data frame, must be named "Y", and must be coded with the two levels "0" and "1".

group

a vector with the group number of each variable. (WARNING: if there are "p" groups, the groups must be numbered from "1" to "p" in increasing order. The group label of the response variable is missing, i.e. NA.)
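As an illustration of the layout described for "data" and "group", here is a minimal sketch on simulated data. The sample size, the nine predictors and the three-group split are made up for the example, and coding "Y" as a factor is an assumption (the text only says it has the two levels "0" and "1").

```r
set.seed(1)
n <- 20

# Response first, named "Y", with the two levels "0" and "1"
# (a factor is assumed here), followed by nine simulated predictors.
data <- data.frame(
  Y = factor(rbinom(n, 1, 0.5), levels = c("0", "1")),
  matrix(rnorm(n * 9), nrow = n)
)

# One entry per column of `data`: NA for the response, then group
# labels 1..p (here p = 3) in increasing order.
group <- c(NA, 1, 1, 1, 2, 2, 3, 3, 3, 3)

# Basic consistency checks before passing both objects to cartgv.rf
stopifnot(
  names(data)[1] == "Y",
  length(group) == ncol(data),
  is.na(group[1]),
  identical(sort(unique(group[!is.na(group)])), c(1, 2, 3))
)
```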

crit

an integer indicating the impurity function used (1 = Gini index / 2 = Entropy / 3 = Misclassification rate).
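For reference, the three impurity measures named above can be written for a binary node as functions of p, the proportion of observations with Y = "1". This is a textbook sketch, not the package's internal code: the function names are hypothetical, and the log base and any scaling used inside the package are assumptions.

```r
# p is the proportion of class "1" in a node (0 <= p <= 1).

# crit = 1: Gini index, 1 - p^2 - (1 - p)^2 = 2 * p * (1 - p)
gini <- function(p) 2 * p * (1 - p)

# crit = 2: entropy (base-2 logarithm assumed), with 0 * log(0) taken as 0
entropy <- function(p) {
  terms <- c(p, 1 - p)
  terms <- terms[terms > 0]
  -sum(terms * log2(terms))
}

# crit = 3: misclassification rate of the majority-class prediction
misclass <- function(p) min(p, 1 - p)
```

All three are zero for a pure node (p = 0 or p = 1) and largest at p = 0.5.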

case_min

an integer indicating the minimum number of cases/non-cases in a terminal node. The default is 1.

maxdepth

the maximum depth of a splitting tree. The default is 2.

mtry_group

an integer, the number of groups randomly sampled as candidates at each split. The default is the floor of the square root of the number of groups.

penalty

a character string indicating whether the decrease in node impurity must take the group size into account. Four penalties are available: "No", "Size", "Root.size" or "Log". The default is "No".

mtry_var

a vector whose length equals the number of groups. Its j-th element gives the number of variables drawn from group j. The default is the floor of the square root of each group's size.
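The default values of "mtry_group" and "mtry_var" follow directly from the expressions in the usage line. A small sketch, evaluated on a hypothetical "group" vector with three groups of sizes 3, 2 and 4:

```r
# Hypothetical group vector: NA for the response, then three groups.
group <- c(NA, 1, 1, 1, 2, 2, 3, 3, 3, 3)
g <- group[!is.na(group)]

# Default mtry_group: floor(sqrt(number of groups)) = floor(sqrt(3))
mtry_group <- floor(sqrt(length(unique(g))))

# Default mtry_var: floor(sqrt(group size)) for each group
mtry_var <- sapply(as.numeric(table(g)), function(x) floor(sqrt(x)))

print(mtry_group)  # 1
print(mtry_var)    # 1 1 2
```

With these defaults a single group is sampled at each split, and one or two of its variables are drawn depending on the group.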

Value

a list with elements

  • tree: a data frame which summarizes the resulting CARTGV tree.

  • tree_split: a list containing information about the splitting trees. Each element is an object returned by the function "cartgv_split".

  • pop: a list containing the indices (row names) of the observations belonging to each node.

  • groups_selec: a matrix containing, for each splitting tree, the indices of the sampled groups. Precisely, the i-th row corresponds to the i-th splitting tree.