## Classification And Regression Tree

Classification And Regression Tree (CART) is a decision tree model that can be used for both classification and regression. CART models are always binary trees.

### Classification Tree

The main process of building a classification tree is shown below.

- Choose a variable $x_i$ and a *split point* $v_i$, then split the data space into two parts: all data in the first part satisfy $x_i \leq v_i$ and all data in the second part satisfy $x_i > v_i$. For discrete data, the condition is equivalent to $x_i = v_i$ and $x_i \neq v_i$.
- Split the space recursively until the stopping condition is met.
- The usual stopping condition is that all data in the subspace belong to the same class. Other conditions are also used, such as a $\chi^2$ or other independence test, stopping the splitting when the candidate split is independent of the class labels.
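The recursive procedure above can be sketched as follows. This is a minimal illustration with names of my own choosing; the split rule here (median of the first variable) is only a stand-in, since how CART actually chooses the split point is the subject of the rest of this section.

```python
import statistics

def build_tree(X, y):
    """Recursive binary partitioning; stops when a node is pure.

    The split rule (median of the first variable) is a placeholder:
    CART instead chooses the variable and split point that minimize
    the Gini impurity of the resulting children.
    """
    if len(set(y)) <= 1:  # stopping condition: all data in one class
        return {"class": y[0]}
    i = 0
    v = statistics.median(row[i] for row in X)
    left = [(row, lab) for row, lab in zip(X, y) if row[i] <= v]
    right = [(row, lab) for row, lab in zip(X, y) if row[i] > v]
    if not left or not right:  # degenerate split: stop and take majority
        return {"class": statistics.mode(y)}
    return {
        "var": i, "value": v,
        "left": build_tree(*map(list, zip(*left))),
        "right": build_tree(*map(list, zip(*right))),
    }

# Toy data: the first variable separates the two classes at 2.5.
X = [[1.0], [2.0], [3.0], [4.0]]
y = ["a", "a", "b", "b"]
tree = build_tree(X, y)
```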

A question is how to choose the split point. For classification tasks, the Gini impurity is widely used. The Gini impurity can be intuitively understood as **the probability of misclassification**.

$$

Gini(p) = \sum_{k = 1}^mp_k(1-p_k) = 1-\sum_{k = 1}^mp_k^2

$$

Under this situation, $p_k = \frac{|C_k|}{|D|}$, where $C_k$ is the subset of $D$ containing the data labeled as the $k$-th class.
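The empirical Gini impurity can be computed directly from the class counts, as in this short sketch (the function name `gini` is my own):

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum of squared empirical class probabilities."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

# A pure node has impurity 0; a balanced two-class node has impurity 0.5.
print(gini(["a", "a", "a"]))       # 0.0
print(gini(["a", "a", "b", "b"]))  # 0.5
```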

If $D_1 = \{X \mid x_i \leq v_i\}$ and $D_2 = \{X \mid x_i > v_i\}$, we have $D_1 \cup D_2 = D$ and $D_1 \cap D_2 = \emptyset$. The Gini gain of the split is shown below.

$$

Gain(D, x_i) = \sum_{j=1}^2\frac{|D_j|}{|D|}Gini(D_j)

$$
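As a small worked check of this formula (the numbers and the names `gini` and `gini_gain` are my own), the gain is just the size-weighted average of the children's impurities:

```python
from collections import Counter

def gini(labels):
    """Gini impurity of one node."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def gini_gain(left, right):
    """Gain(D, x_i): the |D_j|/|D|-weighted sum of the children's impurities."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

# A split sending both "a"s left and both "b"s right is perfectly pure:
print(gini_gain(["a", "a"], ["b", "b"]))  # 0.0
# A split that mixes the classes keeps the full impurity of the parent:
print(gini_gain(["a", "b"], ["a", "b"]))  # 0.5
```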

Here, the smaller the Gini gain is, the less misclassification the split produces, so we always choose the variable $x_i$ and split point $v_i$ that make the Gini gain smallest. **If a discrete variable has more than two categories, CART combines them into two super-categories before splitting.**
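The split selection described above can be sketched as an exhaustive search over variables and candidate split points. This is a simplified illustration under my own naming (`gini`, `gini_gain`, `best_split`); it does not include the super-category merging for multi-category discrete variables.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of one node."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def gini_gain(left, right):
    """Size-weighted Gini impurity of the two children; smaller is better."""
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

def best_split(X, y):
    """Return (variable index, split value, gain) minimizing the Gini gain."""
    best = (None, None, float("inf"))
    for i in range(len(X[0])):
        for v in sorted({row[i] for row in X}):
            left = [y[j] for j, row in enumerate(X) if row[i] <= v]
            right = [y[j] for j, row in enumerate(X) if row[i] > v]
            if not left or not right:  # skip splits that leave a child empty
                continue
            g = gini_gain(left, right)
            if g < best[2]:
                best = (i, v, g)
    return best

# Toy data: variable 0 separates the classes perfectly at value 1.
X = [[1, 5], [1, 7], [2, 5], [2, 7]]
y = ["a", "a", "b", "b"]
print(best_split(X, y))  # (0, 1, 0.0)
```

For continuous variables, practical implementations search midpoints between consecutive sorted values rather than the raw values themselves; the observed values are used here only to keep the sketch short.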