Decision Tree using CART algorithm Solved Example 1

In this tutorial, we will see how to apply the Classification and Regression Trees (CART) decision tree algorithm to construct an optimal decision tree for the Play Tennis data below, and then use that tree to predict the class label of a new example.

Training data:

| Outlook  | Temp | Humidity | Windy | Play |
|----------|------|----------|-------|------|
| Sunny    | Hot  | High     | False | No   |
| Sunny    | Hot  | High     | True  | No   |
| Overcast | Hot  | High     | False | Yes  |
| Rainy    | Mild | High     | False | Yes  |
| Rainy    | Cool | Normal   | False | Yes  |
| Rainy    | Cool | Normal   | True  | No   |
| Overcast | Cool | Normal   | True  | Yes  |
| Sunny    | Mild | High     | False | No   |
| Sunny    | Cool | Normal   | False | Yes  |
| Rainy    | Mild | Normal   | False | Yes  |
| Sunny    | Mild | Normal   | True  | Yes  |
| Overcast | Mild | High     | True  | Yes  |
| Overcast | Hot  | Normal   | False | Yes  |
| Rainy    | Mild | High     | True  | No   |

New example to classify:

| Outlook | Temp | Humidity | Windy | Play |
|---------|------|----------|-------|------|
| Sunny   | Hot  | Normal   | True  | ?    |
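If you would like to follow along in code, here is one way to encode the two tables above in Python (a minimal sketch; the list-of-dicts layout and the names `dataset` and `new_example` are my own choices, not part of the original tutorial):

```python
# Play Tennis training data: one dict per row of the table above.
# Windy is kept as the string "False"/"True" so that all four
# attributes are handled uniformly as categorical values.
dataset = [
    {"Outlook": "Sunny",    "Temp": "Hot",  "Humidity": "High",   "Windy": "False", "Play": "No"},
    {"Outlook": "Sunny",    "Temp": "Hot",  "Humidity": "High",   "Windy": "True",  "Play": "No"},
    {"Outlook": "Overcast", "Temp": "Hot",  "Humidity": "High",   "Windy": "False", "Play": "Yes"},
    {"Outlook": "Rainy",    "Temp": "Mild", "Humidity": "High",   "Windy": "False", "Play": "Yes"},
    {"Outlook": "Rainy",    "Temp": "Cool", "Humidity": "Normal", "Windy": "False", "Play": "Yes"},
    {"Outlook": "Rainy",    "Temp": "Cool", "Humidity": "Normal", "Windy": "True",  "Play": "No"},
    {"Outlook": "Overcast", "Temp": "Cool", "Humidity": "Normal", "Windy": "True",  "Play": "Yes"},
    {"Outlook": "Sunny",    "Temp": "Mild", "Humidity": "High",   "Windy": "False", "Play": "No"},
    {"Outlook": "Sunny",    "Temp": "Cool", "Humidity": "Normal", "Windy": "False", "Play": "Yes"},
    {"Outlook": "Rainy",    "Temp": "Mild", "Humidity": "Normal", "Windy": "False", "Play": "Yes"},
    {"Outlook": "Sunny",    "Temp": "Mild", "Humidity": "Normal", "Windy": "True",  "Play": "Yes"},
    {"Outlook": "Overcast", "Temp": "Mild", "Humidity": "High",   "Windy": "True",  "Play": "Yes"},
    {"Outlook": "Overcast", "Temp": "Hot",  "Humidity": "Normal", "Windy": "False", "Play": "Yes"},
    {"Outlook": "Rainy",    "Temp": "Mild", "Humidity": "High",   "Windy": "True",  "Play": "No"},
]

# The new example whose class label we want to predict.
new_example = {"Outlook": "Sunny", "Temp": "Hot", "Humidity": "Normal", "Windy": "True"}
```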

Solution:

First, we need to determine the root node of the tree.

In this example, there are four choices of question, one for each of the four variables: Outlook, Temp, Humidity, and Windy.

Start with any variable, in this case Outlook. It can take three values: Sunny, Overcast, and Rainy.

Start with the Sunny value of Outlook. There are five instances where the outlook is Sunny.

In two of the five instances the play decision was Yes, and in the other three the decision was No.

Thus, if the decision rule were Sunny → No, then three out of five decisions would be correct, while two out of five would be incorrect: two errors out of five. This is recorded against the Sunny value in the tables below.
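To make that counting concrete, here is a quick check in Python (assuming the `dataset` list defined above; `Counter` is from the standard library):

```python
from collections import Counter

# Collect the Play labels of the five rows where Outlook is Sunny.
sunny_labels = [row["Play"] for row in dataset if row["Outlook"] == "Sunny"]
counts = Counter(sunny_labels)          # Counter({'No': 3, 'Yes': 2})

# The rule Sunny -> No predicts the majority class, so the minority
# count is the number of errors the rule makes.
errors = len(sunny_labels) - counts.most_common(1)[0][1]
print(f"Sunny -> No: {errors}/{len(sunny_labels)} errors")   # Sunny -> No: 2/5 errors
```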

Similarly, we will write all rules for the Outlook attribute.

Outlook

| Value    | Count | Play = Yes | Play = No |
|----------|-------|------------|-----------|
| Overcast | 4     | 4          | 0         |
| Sunny    | 5     | 2          | 3         |
| Rainy    | 5     | 3          | 2         |

Rules, individual errors, and total error for the Outlook attribute. The total error is the sum of the per-value errors over all 14 instances: (2 + 0 + 2)/14 = 4/14.

| Attribute | Rules          | Error | Total Error |
|-----------|----------------|-------|-------------|
| Outlook   | Sunny → No     | 2/5   | 4/14        |
|           | Overcast → Yes | 0/4   |             |
|           | Rainy → Yes    | 2/5   |             |

Temp

| Value | Count | Play = Yes | Play = No |
|-------|-------|------------|-----------|
| Hot   | 4     | 2          | 2         |
| Mild  | 6     | 4          | 2         |
| Cool  | 4     | 3          | 1         |

Rules, individual errors, and total error for the Temp attribute:

| Attribute | Rules      | Error | Total Error |
|-----------|------------|-------|-------------|
| Temp      | Hot → No   | 2/4   | 5/14        |
|           | Mild → Yes | 2/6   |             |
|           | Cool → Yes | 1/4   |             |

Humidity

| Value  | Count | Play = Yes | Play = No |
|--------|-------|------------|-----------|
| High   | 7     | 3          | 4         |
| Normal | 7     | 6          | 1         |

Rules, individual errors, and total error for the Humidity attribute:

| Attribute | Rules        | Error | Total Error |
|-----------|--------------|-------|-------------|
| Humidity  | High → No    | 3/7   | 4/14        |
|           | Normal → Yes | 1/7   |             |

Windy

| Value | Count | Play = Yes | Play = No |
|-------|-------|------------|-----------|
| False | 8     | 6          | 2         |
| True  | 6     | 3          | 3         |

Rules, individual errors, and total error for the Windy attribute:

| Attribute | Rules       | Error | Total Error |
|-----------|-------------|-------|-------------|
| Windy     | True → No   | 3/6   | 5/14        |
|           | False → Yes | 2/8   |             |

Consolidated rules, errors for the individual attribute values, and the total error of each attribute are given below.

| Attribute | Rules          | Error | Total Error |
|-----------|----------------|-------|-------------|
| Outlook   | Sunny → No     | 2/5   | 4/14        |
|           | Overcast → Yes | 0/4   |             |
|           | Rainy → Yes    | 2/5   |             |
| Temp      | Hot → No       | 2/4   | 5/14        |
|           | Mild → Yes     | 2/6   |             |
|           | Cool → Yes     | 1/4   |             |
| Humidity  | High → No      | 3/7   | 4/14        |
|           | Normal → Yes   | 1/7   |             |
| Windy     | False → Yes    | 2/8   | 5/14        |
|           | True → No      | 3/6   |             |

From the above table, we can see that the attributes Outlook and Humidity tie for the minimum total error of 4/14, so we break the tie using the individual attribute-value errors. Outlook has one rule that generates zero errors, namely Overcast → Yes, so we choose Outlook as the splitting attribute.
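This whole consolidated table can be reproduced mechanically: for every value of an attribute, predict the majority class of the matching rows and count the rows that rule misclassifies. A short sketch, assuming the `dataset` list from above (the helper name `rule_errors` is my own):

```python
from collections import Counter

def rule_errors(rows, attribute):
    """For each value of `attribute`, predict the majority Play class
    and count how many of the matching rows that rule gets wrong."""
    by_value = {}
    for row in rows:
        by_value.setdefault(row[attribute], []).append(row["Play"])
    table = {}
    for value, labels in by_value.items():
        majority, hits = Counter(labels).most_common(1)[0]
        table[value] = (majority, len(labels) - hits, len(labels))
    return table

for attribute in ["Outlook", "Temp", "Humidity", "Windy"]:
    table = rule_errors(dataset, attribute)
    total = sum(errs for _, errs, _ in table.values())
    for value, (majority, errs, n) in table.items():
        print(f"  {value} -> {majority}: {errs}/{n} errors")
    print(f"{attribute} total error: {total}/{len(dataset)}")
```

For tied values (Hot is 2 Yes / 2 No, and Windy = True is 3 Yes / 3 No) either prediction gives the same error count, so the totals come out as in the table above: 4/14, 5/14, 4/14, and 5/14.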

Now we build the tree with Outlook as the root node, with one branch for each of its three values. Since the rule Overcast → Yes generates zero errors, the Overcast branch immediately becomes a leaf labelled Yes. For the remaining two attribute values we take the corresponding subset of the data and continue building the tree. The tree with Outlook as the root node is:

Outlook
├── Sunny → ?
├── Overcast → Yes
└── Rainy → ?
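In code, growing the remaining branches just means repeating the same procedure on the matching subsets of rows (continuing the sketch above):

```python
# Overcast rows are pure (all Yes), so that branch is already a leaf.
# The Sunny and Rainy branches are grown from their own row subsets.
sunny_rows = [row for row in dataset if row["Outlook"] == "Sunny"]
rainy_rows = [row for row in dataset if row["Outlook"] == "Rainy"]
```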

Now, for the left (Sunny) and right (Rainy) subtrees, we write out all possible rules and find their total errors. Based on the total-error tables, we construct each subtree.

Left subtree (Outlook = Sunny)

The five instances where Outlook is Sunny are:

| Temp | Humidity | Windy | Play |
|------|----------|-------|------|
| Hot  | High     | False | No   |
| Hot  | High     | True  | No   |
| Mild | High     | False | No   |
| Cool | Normal   | False | Yes  |
| Mild | Normal   | True  | Yes  |

Consolidated rules, errors for the individual attribute values, and the total error of each attribute on this subset are given below (for a tied value such as Mild, with one Yes and one No, either prediction makes one error):

| Attribute | Rules        | Error | Total Error |
|-----------|--------------|-------|-------------|
| Temp      | Hot → No     | 0/2   | 1/5         |
|           | Mild → No    | 1/2   |             |
|           | Cool → Yes   | 0/1   |             |
| Humidity  | High → No    | 0/3   | 0/5         |
|           | Normal → Yes | 0/2   |             |
| Windy     | False → No   | 1/3   | 2/5         |
|           | True → Yes   | 1/2   |             |

From the above table, we can see that Humidity has the lowest total error (0/5), so Humidity is chosen as the splitting attribute here. When Humidity is High the answer is No, and when Humidity is Normal the answer is Yes; both rules produce zero errors, so both branches become leaves.
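The `rule_errors` helper from above confirms this on the Sunny subset (Outlook itself is excluded, since it is constant here):

```python
for attribute in ["Temp", "Humidity", "Windy"]:
    table = rule_errors(sunny_rows, attribute)
    total = sum(errs for _, errs, _ in table.values())
    print(f"{attribute}: {table}  total {total}/{len(sunny_rows)}")
# Humidity gives {'High': ('No', 0, 3), 'Normal': ('Yes', 0, 2)} -- total 0/5.
```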

Right subtree (Outlook = Rainy)

The five instances where Outlook is Rainy are:

| Temp | Humidity | Windy | Play |
|------|----------|-------|------|
| Mild | High     | False | Yes  |
| Cool | Normal   | False | Yes  |
| Cool | Normal   | True  | No   |
| Mild | Normal   | False | Yes  |
| Mild | High     | True  | No   |

Consolidated rules, errors for the individual attribute values, and the total error of each attribute on this subset are given below.

| Attribute | Rules        | Error | Total Error |
|-----------|--------------|-------|-------------|
| Temp      | Mild → Yes   | 1/3   | 2/5         |
|           | Cool → Yes   | 1/2   |             |
| Humidity  | High → Yes   | 1/2   | 2/5         |
|           | Normal → Yes | 1/3   |             |
| Windy     | False → Yes  | 0/3   | 0/5         |
|           | True → No    | 0/2   |             |

From the above table, we can see that Windy has the lowest total error (0/5), so Windy is chosen as the splitting attribute here. When Windy is False the answer is Yes, and when Windy is True the answer is No; both rules produce zero errors, so both branches become leaves.
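And the same check on the Rainy subset:

```python
for attribute in ["Temp", "Humidity", "Windy"]:
    table = rule_errors(rainy_rows, attribute)
    total = sum(errs for _, errs, _ in table.values())
    print(f"{attribute}: {table}  total {total}/{len(rainy_rows)}")
# Windy gives {'False': ('Yes', 0, 3), 'True': ('No', 0, 2)} -- total 0/5.
```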

The final decision tree for the given Play Tennis data set is:

Outlook
├── Sunny → Humidity
│   ├── High → No
│   └── Normal → Yes
├── Overcast → Yes
└── Rainy → Windy
    ├── False → Yes
    └── True → No

Also, from the above decision tree, the prediction for the new example (Outlook = Sunny, Temp = Hot, Humidity = Normal, Windy = True) follows the Sunny branch to the Humidity test, and Humidity = Normal leads to the leaf Yes. The predicted class label is Yes.
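As a final sanity check, the finished tree is small enough to write out as a plain function (a sketch; the nested-if encoding is my own):

```python
def predict(row):
    """Classify one example with the decision tree derived above."""
    if row["Outlook"] == "Overcast":
        return "Yes"
    if row["Outlook"] == "Sunny":
        return "Yes" if row["Humidity"] == "Normal" else "No"
    # Outlook == "Rainy"
    return "Yes" if row["Windy"] == "False" else "No"

print(predict(new_example))                                 # Yes
# The tree also reproduces all 14 training labels:
print(all(predict(row) == row["Play"] for row in dataset))  # True
```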

Summary:

In this tutorial, we saw how to apply the Classification and Regression Trees (CART) decision tree algorithm (solved example 1) to construct and find the optimal decision tree for the Play Tennis data set, and how to use the resulting tree to predict the class label of a new example.
