After completing this week's reading, answer the following questions on Chapter 3:
- Note the basic concepts in data classification.
- Discuss the general framework for classification.
- What is a decision tree and a decision tree classifier? Note their importance.
- What is a hyper-parameter?
- Note the pitfalls of model selection and evaluation.
Read:
- Chapter 3 in textbook: Classification: Basic Concepts and Techniques
Watch:
- Attached content
Dr. Oner Celepcikay
ITS 632
Week 4: Classification
Machine Learning Methods – Classification
Given a collection of records (the training set):
- Each record contains a set of attributes; one of the attributes is the class.
- Find a model that expresses the class attribute as a function of the values of the other attributes.
- A test set is used to estimate the accuracy of the model.
- Goal: previously unseen records (the test set) should be assigned a class as accurately as possible.
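As a concrete illustration (not part of the original slides), here is a minimal Python sketch using scikit-learn, assuming the ten training records shown on the next slide and a hypothetical numeric encoding for the categorical attributes:

```python
# Minimal sketch: learn a classifier from a training set and estimate its accuracy
# on a held-out test set. The records mirror the slide's table (Refund, Marital
# Status, Taxable Income -> Cheat); the integer encoding is a hypothetical choice,
# not something the slides specify.
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Columns: refund (1=Yes, 0=No), marital status (0=Single, 1=Married, 2=Divorced),
# taxable income in thousands.
X = [
    [1, 0, 125], [0, 1, 100], [0, 0, 70], [1, 1, 120], [0, 2, 95],
    [0, 1, 60],  [1, 2, 220], [0, 0, 85], [0, 1, 75],  [0, 0, 90],
]
y = ["No", "No", "No", "No", "Yes", "No", "No", "Yes", "No", "Yes"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

model = DecisionTreeClassifier(random_state=0)   # induction: learn the model
model.fit(X_train, y_train)

y_pred = model.predict(X_test)                   # deduction: apply the model
print("Estimated accuracy:", accuracy_score(y_test, y_pred))
```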
Machine Learning – Classification Example

[Slide figure: a training set with attributes Refund (categorical), Marital Status (categorical), Taxable Income (continuous), and the class attribute Cheat is fed to a learning algorithm ("Learn Classifier") to induce a model; a test set of unlabeled records is then classified with that model.]

Training set:

Tid | Refund | Marital Status | Taxable Income | Cheat
1   | Yes    | Single         | 125K           | No
2   | No     | Married        | 100K           | No
3   | No     | Single         | 70K            | No
4   | Yes    | Married        | 120K           | No
5   | No     | Divorced       | 95K            | Yes
6   | No     | Married        | 60K            | No
7   | Yes    | Divorced       | 220K           | No
8   | No     | Single         | 85K            | Yes
9   | No     | Married        | 75K            | No
10  | No     | Single         | 90K            | Yes

Test set (class unknown):

Refund | Marital Status | Taxable Income | Cheat
No     | Single         | 75K            | ?
Yes    | Married        | 50K            | ?
No     | Married        | 150K           | ?
Yes    | Divorced       | 90K            | ?
No     | Single         | 40K            | ?
No     | Married        | 80K            | ?

Model: Decision Tree (splitting attributes: Refund, MarSt, TaxInc)

Refund?
  Yes -> NO
  No  -> MarSt?
           Married          -> NO
           Single, Divorced -> TaxInc?
                                 < 80K -> NO
                                 > 80K -> YES
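A minimal sketch (my illustration, not the slides' code) of this particular tree written out as nested conditionals; the function name, string values, and the handling of exactly 80K are hypothetical choices:

```python
def classify_cheat(refund: str, marital_status: str, taxable_income_k: float) -> str:
    """Hand-coded version of the decision tree on this slide (hypothetical encoding).

    refund: "Yes" or "No"; marital_status: "Single", "Married", or "Divorced";
    taxable_income_k: taxable income in thousands (e.g., 80 for 80K).
    """
    if refund == "Yes":                      # root split on Refund
        return "NO"
    if marital_status == "Married":          # MarSt split
        return "NO"
    # Single or Divorced: split on Taxable Income. The slide labels the branches
    # "< 80K" and "> 80K"; treating exactly 80K as the right branch is my choice.
    return "NO" if taxable_income_k < 80 else "YES"
```

For example, classify_cheat("No", "Married", 80) returns "NO", which matches the "Apply Model to Test Data" walkthrough later in the slides.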
Machine Learning – Classification Example

[Slide figure: the same training data, with an alternative decision tree that splits on MarSt first.]

MarSt?
  Married          -> NO
  Single, Divorced -> Refund?
                        Yes -> NO
                        No  -> TaxInc?
                                 < 80K -> NO
                                 > 80K -> YES

There could be more than one tree that fits the same data!
Another Example of Decision Tree

[Slide figure: the Refund-first tree (Refund -> MarSt -> TaxInc, with the same splits as before) shown alongside the test data, ready to be applied starting from the root of the tree.]
Apply Model to Test Data

[Slide sequence: the Refund-first tree is applied, one split at a time, to the test record Refund = No, Marital Status = Married, Taxable Income = 80K, Cheat = ?]

Start from the root of the tree:
- Refund? The record has Refund = No, so follow the No branch to the MarSt node.
- MarSt? The record has Marital Status = Married, so follow the Married branch, which leads to a leaf.
- The leaf is labeled NO, so assign "Cheat" = No to the test record.
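A small sketch (my addition; the function name and dictionary keys are hypothetical) that traces the same traversal for the test record and prints the path taken:

```python
# Walk the Refund-first tree for one test record and record the decisions made.
# The record below is the one from the slides (Refund=No, Married, 80K).
def classify_with_path(record):
    path = []
    if record["Refund"] == "Yes":
        path.append("Refund = Yes -> leaf NO")
        return "No", path
    path.append("Refund = No -> go to MarSt node")
    if record["Marital Status"] == "Married":
        path.append("MarSt = Married -> leaf NO")
        return "No", path
    path.append("MarSt = Single/Divorced -> go to TaxInc node")
    if record["Taxable Income"] < 80:          # income in thousands
        path.append("TaxInc < 80K -> leaf NO")
        return "No", path
    path.append("TaxInc > 80K -> leaf YES")
    return "Yes", path

test_record = {"Refund": "No", "Marital Status": "Married", "Taxable Income": 80}
label, path = classify_with_path(test_record)
print(label)               # -> "No", i.e. assign "Cheat" = No
print(*path, sep="\n")
```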
Machine Learning – Classification Example

[Slide figure: the general classification framework. A learning algorithm is applied to the training set (induction) to build the model; the model is then applied to the test set (deduction) to assign class labels.]
General Structure of Hunt's Algorithm
Let Dt be the set of training records that reach a node t.
General procedure:
- If Dt contains records that all belong to the same class yt, then t is a leaf node labeled yt.
- If Dt is an empty set, then t is a leaf node labeled with the default class yd.
- If Dt contains records that belong to more than one class, use an attribute test to split the data into smaller subsets, and recursively apply the procedure to each subset.
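A simplified sketch of this recursion (my illustration, assuming a naive split-selection step that just takes the next remaining attribute; a real tree inducer would choose the best split using an impurity measure such as the Gini index discussed below):

```python
from collections import Counter

def hunt(records, attributes, default_class=None):
    """Simplified Hunt's algorithm: records is a list of dicts with a 'class' key.

    Returns either a class label (leaf) or a nested dict
    {attribute: {value: subtree, ...}}. Split selection here is naive (first
    remaining attribute), purely for illustration.
    """
    if not records:                              # empty Dt -> default class yd
        return default_class
    classes = [r["class"] for r in records]
    majority = Counter(classes).most_common(1)[0][0]
    if len(set(classes)) == 1:                   # all records in the same class yt
        return classes[0]
    if not attributes:                           # no attribute test left -> majority class
        return majority
    attr, rest = attributes[0], attributes[1:]   # naive choice of splitting attribute
    tree = {attr: {}}
    for value in sorted({r[attr] for r in records}):
        subset = [r for r in records if r[attr] == value]
        tree[attr][value] = hunt(subset, rest, default_class=majority)
    return tree

# Tiny example drawn from the slides' attributes (a projected subset of the table).
data = [
    {"Refund": "Yes", "MarSt": "Single",   "class": "No"},
    {"Refund": "No",  "MarSt": "Married",  "class": "No"},
    {"Refund": "No",  "MarSt": "Single",   "class": "Yes"},
    {"Refund": "No",  "MarSt": "Divorced", "class": "Yes"},
]
print(hunt(data, ["Refund", "MarSt"]))
```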
Hunt's Algorithm

[Slide figure: Hunt's algorithm applied to the training data, shown in four stages.]
- Stage 1: a single leaf labeled with the majority class, "Don't Cheat".
- Stage 2: split on Refund; Refund = Yes -> "Don't Cheat", Refund = No -> "Don't Cheat".
- Stage 3: under Refund = No, split on Marital Status; Married -> "Don't Cheat", Single/Divorced -> "Cheat".
- Stage 4: under Single/Divorced, split on Taxable Income; < 80K -> "Don't Cheat", >= 80K -> "Cheat".
Decision Tree Application to Oil & Gas Data
British Petroleum designed a decision tree for gas-oil separation for offshore oil platforms that replaced an earlier rule-based expert system.
We will do a similar (but simpler) decision tree example towards the end of the semester.
Tree Induction
Greedy strategy: split the records based on an attribute test that optimizes a certain criterion.
Issues:
- Determine how to split the records
  - How to specify the attribute test condition?
  - How to determine the best split?
- Determine when to stop splitting
How to determine the Best Split
Before splitting: 10 records of class C0 and 10 records of class C1.

[Slide figure: three candidate test conditions on the 20 records.]
- Own Car?  Yes: C0: 6, C1: 4   No: C0: 4, C1: 6
- Car Type?  Family: C0: 1, C1: 3   Sports: C0: 8, C1: 0   Luxury: C0: 1, C1: 7
- Student ID?  c1 ... c20: one record per child node (e.g., C0: 1, C1: 0 or C0: 0, C1: 1)

Which test condition is the best?

Greedy approach: nodes with a homogeneous class distribution are preferred, so we need a measure of node impurity:
- Non-homogeneous (e.g., C0: 5, C1: 5): high degree of impurity
- Homogeneous (e.g., C0: 9, C1: 1): low degree of impurity
Measures of Node Impurity
- Gini Index
- Entropy
- Misclassification error
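A short sketch (my addition; the entropy formula is the standard textbook definition and is not spelled out on these slides) comparing the three impurity measures on the two node distributions mentioned above:

```python
import math

def gini(counts):
    """GINI(t) = 1 - sum_j p(j|t)^2, with p(j|t) the relative class frequencies."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

def entropy(counts):
    """Entropy(t) = -sum_j p(j|t) * log2 p(j|t), skipping empty classes."""
    n = sum(counts)
    return -sum((c / n) * math.log2(c / n) for c in counts if c > 0)

def misclassification_error(counts):
    """Error(t) = 1 - max_j p(j|t)."""
    n = sum(counts)
    return 1.0 - max(counts) / n

for counts in ([5, 5], [9, 1]):   # non-homogeneous vs nearly homogeneous node
    print(counts, gini(counts), entropy(counts), misclassification_error(counts))
```

All three measures are largest for the 5/5 node and much smaller for the 9/1 node, which is why the more homogeneous split is preferred.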
How to Find the Best Split

[Slide figure: the parent node, with class counts C0/C1, has impurity M0 before splitting. Candidate attribute A splits it into nodes N1 (Yes) and N2 (No) with impurities M1 and M2, which combine into the weighted impurity M12; candidate attribute B splits it into N3 and N4 with impurities M3 and M4, which combine into M34.]

Gain = M0 – M12 vs. M0 – M34: choose the split with the larger gain, i.e., the lower weighted impurity of the children.
Measure of Impurity: GINI
Gini Index for a given node t:

GINI(t) = 1 – Σ_j [ p(j | t) ]²

(NOTE: p(j | t) is the relative frequency of class j at node t.)
- Maximum (0.5 for two classes) when records are equally distributed among the classes, implying the least interesting information.
- Minimum (0.0) when all records belong to one class, implying the most interesting information.

Example class distributions and their Gini values:
  C1: 0, C2: 6 -> Gini = 0.000
  C1: 1, C2: 5 -> Gini = 0.278
  C1: 2, C2: 4 -> Gini = 0.444
  C1: 3, C2: 3 -> Gini = 0.500
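A minimal sketch (my addition) of this formula, checked against the example class distributions listed on this slide:

```python
def gini(counts):
    """GINI(t) = 1 - sum_j [p(j|t)]^2 for a node with the given class counts."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

# Class counts (C1, C2) from the slides:
print(round(gini([0, 6]), 3))   # 0.0   -- all records in one class
print(round(gini([1, 5]), 3))   # 0.278
print(round(gini([2, 4]), 3))   # 0.444
print(round(gini([3, 3]), 3))   # 0.5   -- records equally distributed
```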
Examples for computing GINI

C1 = 0, C2 = 6:  P(C1) = 0/6 = 0,  P(C2) = 6/6 = 1
  Gini = 1 – P(C1)² – P(C2)² = 1 – 0 – 1 = 0

C1 = 1, C2 = 5:  P(C1) = 1/6,  P(C2) = 5/6
  Gini = 1 – (1/6)² – (5/6)² = 0.278

C1 = 2, C2 = 4:  P(C1) = 2/6,  P(C2) = 4/6
  Gini = 1 – (2/6)² – (4/6)² = 0.444
Examples for computing GINI (splitting on attribute A)

Parent node: C1 = 6, C2 = 6, Gini = 0.500
Split on A?  Yes -> Node N1 (C1 = 4, C2 = 3),  No -> Node N2 (C1 = 2, C2 = 3)

Gini(N1) = 1 – (4/7)² – (3/7)² = 0.4898
Gini(N2) = 1 – (2/5)² – (3/5)² = 0.480
Gini(Children) = 7/12 × 0.4898 + 5/12 × 0.480 = 0.486
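A short sketch (my addition) reproducing this weighted-Gini calculation; the gini helper is redefined so the snippet stands alone:

```python
def gini(counts):
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

parent = [6, 6]                 # C1 = 6, C2 = 6
n1, n2 = [4, 3], [2, 3]         # children after splitting on A

g1, g2 = gini(n1), gini(n2)     # 0.4898..., 0.48
total = sum(n1) + sum(n2)       # 12 records
weighted = (sum(n1) / total) * g1 + (sum(n2) / total) * g2

print(round(gini(parent), 3))               # 0.5
print(round(weighted, 3))                   # 0.486
print(round(gini(parent) - weighted, 3))    # gain of the split, about 0.014
```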
Examples for computing GINI (splitting on attribute B – exercise)

Parent node: C1 = 6, C2 = 6, Gini = 0.500
Split on B?  Yes -> Node N1 (C1 = 1, C2 = 4),  No -> Node N2 (C1 = 5, C2 = 2)

Gini(N1) = 1 – ( / )² – ( / )² =
Gini(N2) = 1 – ( / )² – ( / )² =
Gini(Children) =
Splitting Criteria based on Classification Error

Classification error at a node t:

Error(t) = 1 – max_i P(i | t)

- Measures the misclassification error made by a node.
- Maximum (0.5 for two classes) when records are equally distributed among the classes, implying the least interesting information.
- Minimum (0) when all records belong to one class, implying the most interesting information.

Examples:

C1 = 0, C2 = 6:  P(C1) = 0/6 = 0,  P(C2) = 6/6 = 1
  Error = 1 – max(0, 1) = 1 – 1 = 0

C1 = 1, C2 = 5:  P(C1) = 1/6,  P(C2) = 5/6
  Error = 1 – max(1/6, 5/6) = 1 – 5/6 = 1/6

C1 = 2, C2 = 4:  P(C1) = 2/6,  P(C2) = 4/6
  Error = 1 – max(2/6, 4/6) = 1 – 4/6 = 1/3
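A minimal sketch (my addition) of the classification-error measure, reproducing the three example values:

```python
from fractions import Fraction

def classification_error(counts):
    """Error(t) = 1 - max_i P(i|t) for a node with the given class counts."""
    n = sum(counts)
    return 1 - Fraction(max(counts), n)

print(classification_error([0, 6]))   # 0
print(classification_error([1, 5]))   # 1/6
print(classification_error([2, 4]))   # 1/3
```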
Tree Induction
Greedy strategy: split the records based on an attribute test that optimizes a certain criterion.
Issues:
- Determine how to split the records
  - How to specify the attribute test condition?
  - How to determine the best split?
- Determine when to stop splitting (next class!)
ANY IDEAS??
Classification Methods
- Decision Tree-based methods
- Rule-based methods
- Memory-based reasoning
- Neural Networks
- Naïve Bayes and Bayesian Belief Networks
- Support Vector Machines