Decision Tree using CART algorithm Solved Example 3

In this tutorial, we will work through a third solved example of the Classification And Regression Trees (CART) decision tree algorithm. We construct the decision tree for the given data set, which has the attributes City Size, Avg. Income, Local Investors, and LOHAS Awareness, and then predict the class label for a new example.

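| City Size | Avg. Income | Local Investors | LOHAS Awareness | Decision |
| --- | --- | --- | --- | --- |
| Big | High | Yes | High | Yes |
| Medium | Medium | No | Med | No |
| Small | Low | Yes | Low | No |
| Big | High | No | High | Yes |
| Small | Medium | Yes | High | No |
| Medium | High | Yes | Med | Yes |
| Medium | Medium | Yes | Med | No |
| Big | Medium | No | Med | No |
| Medium | High | Yes | Low | No |
| Small | High | No | High | Yes |
| Small | Medium | No | High | No |
| Medium | High | No | Med | No |

Example to classify:

| City Size | Avg. Income | Local Investors | LOHAS Awareness | Decision |
| --- | --- | --- | --- | --- |
| Medium | Medium | No | Med | ? |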

Solution:

First, we need to determine the root node of the tree.

Start with any variable, in this case, City Size. It can take three values: Big, Medium, and Small.

Start with the Big value of City Size. There are three instances where the City Size is Big.

In one of the three instances, the decision was no, and in the other two, the decision was yes.

Thus, if the decision rule was that City Size: Big → Yes, then two out of three decisions would be correct, while one out of three decisions would be incorrect. There is one error out of three instances. This can be recorded in Row 1.
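The count behind this rule can be checked with a small sketch. It assumes the (City Size, Decision) pairs are copied from the training table with the spelling "Medium" used for every medium-sized city; the names `pairs` and `rule_for` are illustrative only and not part of the tutorial.

```python
from collections import Counter

# (City Size, Decision) pairs copied from the training table.
# Assumption: "Med" and "Medium" in the City Size column mean the same value.
pairs = [
    ("Big", "Yes"), ("Medium", "No"), ("Small", "No"), ("Big", "Yes"),
    ("Small", "No"), ("Medium", "Yes"), ("Medium", "No"), ("Big", "No"),
    ("Medium", "No"), ("Small", "Yes"), ("Small", "No"), ("Medium", "No"),
]

def rule_for(value, pairs):
    """Return (majority decision, errors, instances) for one attribute value."""
    counts = Counter(decision for v, decision in pairs if v == value)
    decision, hits = counts.most_common(1)[0]  # most frequent decision
    total = sum(counts.values())
    return decision, total - hits, total

print(rule_for("Big", pairs))  # ('Yes', 1, 3): rule Big -> Yes, 1 error out of 3
```

The errors are simply the instances that do not follow the majority decision for that value.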

Similarly, we will write all rules for the City Size attribute.

City Size Attribute

| Value | Instances | Yes | No |
| --- | --- | --- | --- |
| Big | 3 | 2 | 1 |
| Medium | 5 | 1 | 4 |
| Small | 4 | 1 | 3 |

Rules, individual error, and total for City Size attribute

| Attribute | Rules | Error | Total Error |
| --- | --- | --- | --- |
| City Size | Big->Yes | 1/3 | 3/12 |
| | Medium->No | 1/5 | |
| | Small->No | 1/4 | |
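The rules table for City Size can be reproduced with a short sketch that groups the instances by value, takes the majority decision as the rule, and adds up the errors. The helper name `attribute_rules` is hypothetical, and the value spellings are normalized as an assumption.

```python
from collections import Counter, defaultdict

# (City Size, Decision) pairs from the training table ("Med" read as "Medium").
pairs = [
    ("Big", "Yes"), ("Medium", "No"), ("Small", "No"), ("Big", "Yes"),
    ("Small", "No"), ("Medium", "Yes"), ("Medium", "No"), ("Big", "No"),
    ("Medium", "No"), ("Small", "Yes"), ("Small", "No"), ("Medium", "No"),
]

def attribute_rules(pairs):
    """Build one majority rule per value and sum the errors of all rules."""
    by_value = defaultdict(Counter)
    for value, decision in pairs:
        by_value[value][decision] += 1
    rules = {}
    for value, counts in by_value.items():
        decision, hits = counts.most_common(1)[0]
        n = sum(counts.values())
        rules[value] = (decision, n - hits, n)  # (rule, errors, instances)
    total_errors = sum(err for _, err, _ in rules.values())
    return rules, total_errors

rules, total = attribute_rules(pairs)
for value, (decision, err, n) in rules.items():
    print(f"{value}->{decision}: {err}/{n}")
print(f"Total error: {total}/{len(pairs)}")  # Total error: 3/12
```

The same function can be applied to the other three attributes by swapping in their columns.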

Average Income Attribute

| Value | Instances | Yes | No |
| --- | --- | --- | --- |
| High | 6 | 4 | 2 |
| Medium | 5 | 0 | 5 |
| Low | 1 | 0 | 1 |

Rules, individual error, and total for Average Income attribute

| Attribute | Rules | Error | Total Error |
| --- | --- | --- | --- |
| Average Income | High->Yes | 2/6 | 2/12 |
| | Medium->No | 0/5 | |
| | Low->No | 0/1 | |

Local Investors Attribute

| Value | Instances | Yes | No |
| --- | --- | --- | --- |
| Yes | 6 | 2 | 4 |
| No | 6 | 2 | 4 |

Rules, individual error, and total for Local Investors attribute

| Attribute | Rules | Error | Total Error |
| --- | --- | --- | --- |
| Local Investors | Yes->No | 2/6 | 4/12 |
| | No->No | 2/6 | |

LOHAS Awareness Attribute

| Value | Instances | Yes | No |
| --- | --- | --- | --- |
| High | 5 | 3 | 2 |
| Med | 5 | 1 | 4 |
| Low | 2 | 0 | 2 |

Rules, individual error, and total for LOHAS Awareness attribute

| Attribute | Rules | Error | Total Error |
| --- | --- | --- | --- |
| LOHAS Awareness | High->Yes | 2/5 | 3/12 |
| | Med->No | 1/5 | |
| | Low->No | 0/2 | |

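The consolidated rules, the error for each attribute value, and the total error of each attribute are given below.

| Attribute | Rules | Error | Total Error |
| --- | --- | --- | --- |
| City Size | Big->Yes | 1/3 | 3/12 |
| | Medium->No | 1/5 | |
| | Small->No | 1/4 | |
| Avg. Income | High->Yes | 2/6 | 2/12 |
| | Medium->No | 0/5 | |
| | Low->No | 0/1 | |
| Local Investors | Yes->No | 2/6 | 4/12 |
| | No->No | 2/6 | |
| LOHAS Awareness | High->Yes | 2/5 | 3/12 |
| | Med->No | 1/5 | |
| | Low->No | 0/2 | |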

From the above table, we can see that the Average Income attribute has the minimum total error, 2/12 (2 errors out of 12 examples).
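The comparison of the four attributes can be sketched in a few lines. This is a minimal illustration of the error-count criterion used in this tutorial, not a general CART implementation; `COLUMNS`, `ROWS`, and `total_error` are names chosen for the sketch, and the value spellings ("Medium", "Med", "High") are normalized as an assumption.

```python
from collections import Counter, defaultdict

COLUMNS = ("City Size", "Avg. Income", "Local Investors", "LOHAS Awareness")
# Training rows; the last field is the Decision.
# Assumption: "Med"/"Medium" and "Heigh"/"High" are spelling variants.
ROWS = [
    ("Big", "High", "Yes", "High", "Yes"),
    ("Medium", "Medium", "No", "Med", "No"),
    ("Small", "Low", "Yes", "Low", "No"),
    ("Big", "High", "No", "High", "Yes"),
    ("Small", "Medium", "Yes", "High", "No"),
    ("Medium", "High", "Yes", "Med", "Yes"),
    ("Medium", "Medium", "Yes", "Med", "No"),
    ("Big", "Medium", "No", "Med", "No"),
    ("Medium", "High", "Yes", "Low", "No"),
    ("Small", "High", "No", "High", "Yes"),
    ("Small", "Medium", "No", "High", "No"),
    ("Medium", "High", "No", "Med", "No"),
]

def total_error(rows, col):
    """Rows misclassified when each value predicts its majority decision."""
    by_value = defaultdict(Counter)
    for row in rows:
        by_value[row[col]][row[-1]] += 1
    return sum(sum(c.values()) - max(c.values()) for c in by_value.values())

errors = {COLUMNS[i]: total_error(ROWS, i) for i in range(len(COLUMNS))}
root = min(errors, key=errors.get)  # attribute with the fewest errors
print(errors)  # {'City Size': 3, 'Avg. Income': 2, 'Local Investors': 4, 'LOHAS Awareness': 3}
print(root)    # Avg. Income
```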

Now we build the tree with Average Income as the root node. It has one branch for each of the three possible values of the Average Income attribute. The rules Medium->No and Low->No generate zero errors, so when the Average Income value is Medium or Low the result is No. For the remaining value, High, we consider the corresponding subset of the data and continue building the tree. The tree with Average Income as the root node is:

Average Income
  High -> (to be split further)
  Medium -> No
  Low -> No

Now, for the High branch, we write all possible rules for the remaining attributes and find their total errors. Based on the total error table, we continue building the tree.

Subtree for Average Income -> High:

The consolidated rules, the error for each attribute value, and the total error of each attribute are given below.

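| Attribute | Rules | Error | Total Error |
| --- | --- | --- | --- |
| City Size | Big->Yes | 0/2 | 1/6 |
| | Medium->No | 1/3 | |
| | Small->Yes | 0/1 | |
| Local Investors | Yes->Yes | 1/3 | 2/6 |
| | No->Yes | 1/3 | |
| LOHAS Awareness | High->Yes | 0/3 | 1/6 |
| | Med->No/Yes | 1/2 | |
| | Low->No | 0/1 | |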

From the above table, we can see that City Size and LOHAS Awareness have the same lowest total error, 1/6, so we have a tie. Both attributes have two rules with zero errors and one rule with errors, so the tie remains. To break it, we compare the number of examples that still have to be split further: City Size leaves 3 examples (the Medium branch), while LOHAS Awareness leaves only 2 (the Med branch). Hence we choose LOHAS Awareness as the splitting attribute. It has three values, High, Med, and Low; the rules for High and Low generate no errors.
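The tie-break can be sketched by comparing tuples of (total errors, rules with errors, examples left to split) and picking the smallest. This ordering is an assumption that mirrors the steps described above; the helper `split_stats` and the normalized value spellings are illustrative, not part of the original tutorial.

```python
from collections import Counter, defaultdict

COLUMNS = ("City Size", "Avg. Income", "Local Investors", "LOHAS Awareness")
# Training rows; the last field is the Decision (spellings normalized).
ROWS = [
    ("Big", "High", "Yes", "High", "Yes"),
    ("Medium", "Medium", "No", "Med", "No"),
    ("Small", "Low", "Yes", "Low", "No"),
    ("Big", "High", "No", "High", "Yes"),
    ("Small", "Medium", "Yes", "High", "No"),
    ("Medium", "High", "Yes", "Med", "Yes"),
    ("Medium", "Medium", "Yes", "Med", "No"),
    ("Big", "Medium", "No", "Med", "No"),
    ("Medium", "High", "Yes", "Low", "No"),
    ("Small", "High", "No", "High", "Yes"),
    ("Small", "Medium", "No", "High", "No"),
    ("Medium", "High", "No", "Med", "No"),
]

def split_stats(rows, col):
    """Return (total errors, rules with errors, examples left to split)."""
    by_value = defaultdict(Counter)
    for row in rows:
        by_value[row[col]][row[-1]] += 1
    errors = {v: sum(c.values()) - max(c.values()) for v, c in by_value.items()}
    impure = [v for v, e in errors.items() if e > 0]
    remaining = sum(sum(by_value[v].values()) for v in impure)
    return sum(errors.values()), len(impure), remaining

# Subset for the branch Average Income = High; Avg. Income itself is excluded.
high_rows = [row for row in ROWS if row[1] == "High"]
stats = {COLUMNS[c]: split_stats(high_rows, c) for c in (0, 2, 3)}
best = min(stats, key=stats.get)  # tuples compare element by element
print(stats)
# {'City Size': (1, 1, 3), 'Local Investors': (2, 2, 6), 'LOHAS Awareness': (1, 1, 2)}
print(best)  # LOHAS Awareness
```

City Size and LOHAS Awareness agree on the first two entries, so the third entry (3 versus 2 examples left) decides the split.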

The tree with LOHAS Awareness as the splitting attribute is shown below:

Average Income
  High -> LOHAS Awareness
    High -> Yes
    Med -> (to be split further)
    Low -> No
  Medium -> No
  Low -> No

Subtree for LOHAS Awareness -> Med:

The consolidated rules, the error for each attribute value, and the total error of each attribute are given below.

| Attribute | Rules | Error | Total Error |
| --- | --- | --- | --- |
| City Size | Medium->Yes/No | 1/2 | 1/2 |
| Local Investors | Yes->Yes | 0/1 | 0/2 |
| | No->No | 0/1 | |

From the above table, we can see that Local Investors has the lowest total error, 0/2. Hence we choose Local Investors as the splitting attribute. Both of its rules generate zero errors.

The final decision tree for the given data set is:

Average Income
  High -> LOHAS Awareness
    High -> Yes
    Med -> Local Investors
      Yes -> Yes
      No -> No
    Low -> No
  Medium -> No
  Low -> No
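The whole procedure can be sketched as a short recursion: pick the attribute with the smallest total error (breaking ties as described earlier), split on it, and repeat on every branch that still contains both decisions. This is a minimal sketch of the error-count procedure followed in this tutorial under the stated tie-break assumption; `split_stats`, `build`, and the normalized value spellings are illustrative choices.

```python
from collections import Counter, defaultdict

COLUMNS = ("City Size", "Avg. Income", "Local Investors", "LOHAS Awareness")
# Training rows; the last field is the Decision (spellings normalized).
ROWS = [
    ("Big", "High", "Yes", "High", "Yes"),
    ("Medium", "Medium", "No", "Med", "No"),
    ("Small", "Low", "Yes", "Low", "No"),
    ("Big", "High", "No", "High", "Yes"),
    ("Small", "Medium", "Yes", "High", "No"),
    ("Medium", "High", "Yes", "Med", "Yes"),
    ("Medium", "Medium", "Yes", "Med", "No"),
    ("Big", "Medium", "No", "Med", "No"),
    ("Medium", "High", "Yes", "Low", "No"),
    ("Small", "High", "No", "High", "Yes"),
    ("Small", "Medium", "No", "High", "No"),
    ("Medium", "High", "No", "Med", "No"),
]

def split_stats(rows, col):
    """Return (total errors, rules with errors, examples left to split)."""
    by_value = defaultdict(Counter)
    for row in rows:
        by_value[row[col]][row[-1]] += 1
    errors = {v: sum(c.values()) - max(c.values()) for v, c in by_value.items()}
    impure = [v for v, e in errors.items() if e > 0]
    remaining = sum(sum(by_value[v].values()) for v in impure)
    return sum(errors.values()), len(impure), remaining

def build(rows, cols):
    """Return a decision label, or {attribute: {value: subtree}}."""
    decisions = Counter(row[-1] for row in rows)
    if len(decisions) == 1 or not cols:
        return decisions.most_common(1)[0][0]  # leaf: majority decision
    best = min(cols, key=lambda c: split_stats(rows, c))
    groups = defaultdict(list)
    for row in rows:
        groups[row[best]].append(row)
    rest = [c for c in cols if c != best]
    return {COLUMNS[best]: {v: build(g, rest) for v, g in groups.items()}}

tree = build(ROWS, [0, 1, 2, 3])
print(tree)
```

On this data set the sketch selects Avg. Income at the root, LOHAS Awareness under High, and Local Investors under Med, matching the tree above.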

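From the above decision tree, the prediction for the new example is No: its Average Income is Medium, and the Medium branch of the root node leads directly to No.

| City Size | Avg. Income | Local Investors | LOHAS Awareness | Decision |
| --- | --- | --- | --- | --- |
| Medium | Medium | No | Med | No |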
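The lookup can be illustrated by writing the final tree as nested dictionaries and walking it from the root. The dictionary layout and the `predict` helper are assumptions made for this sketch; the branches themselves are taken from the rules derived in this tutorial.

```python
# Final tree from the tutorial, written as {attribute: {value: subtree or label}}.
TREE = {
    "Avg. Income": {
        "High": {
            "LOHAS Awareness": {
                "High": "Yes",
                "Med": {"Local Investors": {"Yes": "Yes", "No": "No"}},
                "Low": "No",
            }
        },
        "Medium": "No",
        "Low": "No",
    }
}

def predict(tree, example):
    """Follow the branch matching the example until a label is reached."""
    while isinstance(tree, dict):
        attribute, branches = next(iter(tree.items()))
        tree = branches[example[attribute]]
    return tree

new_example = {
    "City Size": "Medium",
    "Avg. Income": "Medium",
    "Local Investors": "No",
    "LOHAS Awareness": "Med",
}
print(predict(TREE, new_example))  # No
```

Only the Avg. Income value is consulted for this example, because the Medium branch is already a leaf.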

Summary:

In this tutorial, we saw how to apply the Classification And Regression Trees (CART) decision tree algorithm (solved example 3) to construct the decision tree for the given data set with the City Size, Avg. Income, Local Investors, and LOHAS Awareness attributes, and how to use the tree to predict the class label of a new example.
