Data Mining Classification: Basic Concepts, Decision Trees, and Model Evaluation
Lecture Notes for Chapter 4
Introduction to Data Mining
by Tan, Steinbach, Kumar
© Tan, Steinbach, Kumar
Introduction to Data Mining
4/18/2004
Classification: Definition
• Given a collection of records (training set):
  - Each record contains a set of attributes; one of the attributes is the class.
• Find a model for the class attribute as a function of the values of the other attributes.
• Goal: previously unseen records should be assigned a class as accurately as possible.
  - A test set is used to determine the accuracy of the model. Usually, the given data set is divided into training and test sets, with the training set used to build the model and the test set used to validate it.
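The holdout procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not code from the book: `holdout_split` and `accuracy` are hypothetical helper names, and the 30% test fraction and the fixed random seed are arbitrary assumptions.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

R = TypeVar("R")


def holdout_split(records: Sequence[R], test_fraction: float = 0.3,
                  seed: int = 0) -> Tuple[List[R], List[R]]:
    """Randomly partition records into (training set, test set)."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed -> reproducible split
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(predicted: Sequence[str], actual: Sequence[str]) -> float:
    """Fraction of test records whose predicted class equals the true class."""
    if not actual or len(predicted) != len(actual):
        raise ValueError("expected two non-empty sequences of equal length")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


train, test = holdout_split(list(range(1, 11)), test_fraction=0.3)
print(len(train), len(test))                              # 7 3
print(accuracy(["No", "Yes", "No"], ["No", "No", "No"]))  # 0.6666666666666666
```

The model is built only from `train`; `accuracy` is then computed on predictions for `test`, whose records the model never saw.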
Illustrating Classification Task
Training Set
Tid  Attrib1  Attrib2  Attrib3  Class
1    Yes      Large    125K     No
2    No       Medium   100K     No
3    No       Small    70K      No
4    Yes      Medium   120K     No
5    No       Large    95K      Yes
6    No       Medium   60K      No
7    Yes      Large    220K     No
8    No       Small    85K      Yes
9    No       Medium   75K      No
10   No       Small    90K      Yes

Test Set
Tid  Attrib1  Attrib2  Attrib3  Class
11   No       Small    55K      ?
12   Yes      Medium   80K      ?
13   Yes      Large    110K     ?
14   No       Small    95K      ?
15   No       Large    67K      ?

[Figure: the Training Set is fed to a Learning algorithm, which learns a Model (Induction); the Model is then applied to the Test Set (Deduction).]
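To make the induction/deduction loop concrete, here is a minimal sketch on the two tables above. The learner is OneR, a simple one-attribute rule learner swapped in purely for illustration (it is not the learning algorithm the slide refers to), and it only considers the categorical attributes Attrib1 and Attrib2. `learn_model` and `apply_model` are hypothetical names chosen to echo the figure labels.

```python
from collections import Counter, defaultdict

# Training set from the slide: (Attrib1, Attrib2, Attrib3 in K, Class)
TRAIN = [
    ("Yes", "Large", 125, "No"), ("No", "Medium", 100, "No"),
    ("No", "Small", 70, "No"), ("Yes", "Medium", 120, "No"),
    ("No", "Large", 95, "Yes"), ("No", "Medium", 60, "No"),
    ("Yes", "Large", 220, "No"), ("No", "Small", 85, "Yes"),
    ("No", "Medium", 75, "No"), ("No", "Small", 90, "Yes"),
]
# Test set from the slide: the class is unknown, so it is omitted.
TEST = [
    ("No", "Small", 55), ("Yes", "Medium", 80), ("Yes", "Large", 110),
    ("No", "Small", 95), ("No", "Large", 67),
]
CATEGORICAL = [0, 1]  # column indices of Attrib1 and Attrib2


def learn_model(train):
    """Induction (OneR): for each categorical attribute, map every value to its
    majority class, and keep the attribute whose rule makes the fewest errors."""
    best = None
    for col in CATEGORICAL:
        counts = defaultdict(Counter)
        for row in train:
            counts[row[col]][row[-1]] += 1
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        errors = sum(sum(c.values()) - c[rule[value]] for value, c in counts.items())
        if best is None or errors < best[0]:
            best = (errors, col, rule)
    _, col, rule = best
    fallback = Counter(row[-1] for row in train).most_common(1)[0][0]
    # Values never seen in training fall back to the overall majority class.
    return lambda record: rule.get(record[col], fallback)


def apply_model(model, test):
    """Deduction: assign a class to each previously unseen record."""
    return [model(record) for record in test]


model = learn_model(TRAIN)
print(apply_model(model, TEST))  # ['Yes', 'No', 'No', 'Yes', 'No']
```

With this training set OneR picks Attrib2 (2 training errors, versus 3 for Attrib1), so records 11 to 15 are labeled by their Attrib2 value alone.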
Examples of Classification Task
• Predicting tumor cells as benign or malignant
• Classifying credit card transactions as legitimate or fraudulent
• Classifying secondary structures of protein as alpha-helix, beta-sheet, or random coil
• Categorizing news stories as finance, weather, entertainment, sports, etc.
Classification Techniques
• Decision Tree-based Methods
• Rule-based Methods
• Memory-based Reasoning
• Neural Networks
• Naïve Bayes and Bayesian Belief Networks
• Support Vector Machines
Example of a Decision Tree
Training Data
Refund and Marital Status are categorical attributes, Taxable Income is continuous, and Cheat is the class.

Tid  Refund  Marital Status  Taxable Income  Cheat
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes

Model: Decision Tree (splitting attributes: Refund, MarSt, TaxInc)
Refund = Yes: NO
Refund = No:
    MarSt = Married: NO
    MarSt = Single, Divorced:
        TaxInc < 80K: NO
        TaxInc > 80K: YES
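One way to represent the tree above as data is a nested dictionary whose internal nodes hold a test and whose leaves are class labels. The sketch below uses that hypothetical encoding (`MODEL` and `classify` are illustrative names, and sending exactly 80K to the upper branch is an assumption, since the slide only writes `< 80K` and `> 80K`). It also checks that the tree reproduces the Cheat label of all ten training records.

```python
from typing import Union

# A leaf is a class label (str); an internal node is a dict with a "test"
# function and the subtrees ("children") keyed by the test's outcome.
Tree = Union[str, dict]

MODEL: Tree = {
    "test": lambda r: r["Refund"],
    "children": {
        "Yes": "NO",
        "No": {
            "test": lambda r: "Married" if r["MarSt"] == "Married" else "Single, Divorced",
            "children": {
                "Married": "NO",
                "Single, Divorced": {
                    # Assumption: exactly 80K goes to the upper branch.
                    "test": lambda r: "< 80K" if r["TaxInc"] < 80 else ">= 80K",
                    "children": {"< 80K": "NO", ">= 80K": "YES"},
                },
            },
        },
    },
}


def classify(tree: Tree, record: dict) -> str:
    """Start at the root and follow test outcomes until a leaf label is reached."""
    while isinstance(tree, dict):
        tree = tree["children"][tree["test"](record)]
    return tree


# Training data from the slide: (Refund, MarSt, TaxInc in K, Cheat)
TRAINING = [
    ("Yes", "Single", 125, "No"), ("No", "Married", 100, "No"),
    ("No", "Single", 70, "No"), ("Yes", "Married", 120, "No"),
    ("No", "Divorced", 95, "Yes"), ("No", "Married", 60, "No"),
    ("Yes", "Divorced", 220, "No"), ("No", "Single", 85, "Yes"),
    ("No", "Married", 75, "No"), ("No", "Single", 90, "Yes"),
]
training_errors = sum(
    classify(MODEL, {"Refund": a, "MarSt": b, "TaxInc": c}) != cheat.upper()
    for a, b, c, cheat in TRAINING
)
print(training_errors)  # 0
```

Keeping the tree as data rather than code makes the same `classify` loop work for any tree of this shape.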
Another Example of Decision Tree
Training Data: the same ten records (Tid 1 to 10) as in the previous example.

Model: Decision Tree
MarSt = Married: NO
MarSt = Single, Divorced:
    Refund = Yes: NO
    Refund = No:
        TaxInc < 80K: NO
        TaxInc > 80K: YES
There could be more than one tree that fits the same data!
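A quick check that both trees fit the data, sketched with each tree hand-coded as a hypothetical Python function (again treating exactly 80K as the upper branch, which is an assumption). Both trees reproduce all ten training labels. They test the attributes in a different order, yet in this case they happen to encode the same rule (YES only when Refund = No, MarSt is Single or Divorced, and TaxInc is above the threshold), so they also agree on every record in a small grid of attribute combinations.

```python
from itertools import product


def tree_refund_first(refund: str, marst: str, taxinc: float) -> str:
    """Tree from the previous slide: Refund, then MarSt, then TaxInc."""
    if refund == "Yes":
        return "NO"
    if marst == "Married":
        return "NO"
    return "NO" if taxinc < 80 else "YES"  # assumption: 80K goes to "> 80K"


def tree_marst_first(refund: str, marst: str, taxinc: float) -> str:
    """Tree from this slide: MarSt, then Refund, then TaxInc."""
    if marst == "Married":
        return "NO"
    if refund == "Yes":
        return "NO"
    return "NO" if taxinc < 80 else "YES"


# Training data: (Refund, MarSt, TaxInc in K, Cheat)
TRAINING = [
    ("Yes", "Single", 125, "No"), ("No", "Married", 100, "No"),
    ("No", "Single", 70, "No"), ("Yes", "Married", 120, "No"),
    ("No", "Divorced", 95, "Yes"), ("No", "Married", 60, "No"),
    ("Yes", "Divorced", 220, "No"), ("No", "Single", 85, "Yes"),
    ("No", "Married", 75, "No"), ("No", "Single", 90, "Yes"),
]

for tree in (tree_refund_first, tree_marst_first):
    fits = all(tree(a, b, c) == cheat.upper() for a, b, c, cheat in TRAINING)
    print(tree.__name__, fits)  # True for both trees

grid = product(["Yes", "No"], ["Single", "Married", "Divorced"], [60, 79, 80, 85, 220])
disagreements = [x for x in grid if tree_refund_first(*x) != tree_marst_first(*x)]
print(disagreements)  # []
```

In general, different trees that fit the same training data need not agree on unseen records; here the agreement is a property of these two particular trees.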
Decision Tree Classification Task
Training Set: Tid 1 to 10, and Test Set: Tid 11 to 15 (the same records as in the Illustrating Classification Task slide).

[Figure: the Training Set is fed to a Tree Induction algorithm, which learns a Model, here a Decision Tree (Induction); the Decision Tree is then applied to the Test Set (Deduction).]
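The slide leaves the tree induction algorithm abstract. As a rough sketch only, the code below grows a tree greedily from the top down: at each node it splits on the categorical attribute with the lowest weighted Gini index, and it stops when a node is pure or no attributes remain. The choice of the Gini index, the helper names (`induce`, `predict`, `gini`, `majority`), and leaving out the continuous Attrib3 are all assumptions made to keep the example short; this is not presented as the book's algorithm.

```python
from collections import Counter

# Training set from the slide, keeping only the categorical attributes
# (Attrib3 is continuous and is left out of this sketch).
TRAIN = [
    ({"Attrib1": "Yes", "Attrib2": "Large"}, "No"),
    ({"Attrib1": "No", "Attrib2": "Medium"}, "No"),
    ({"Attrib1": "No", "Attrib2": "Small"}, "No"),
    ({"Attrib1": "Yes", "Attrib2": "Medium"}, "No"),
    ({"Attrib1": "No", "Attrib2": "Large"}, "Yes"),
    ({"Attrib1": "No", "Attrib2": "Medium"}, "No"),
    ({"Attrib1": "Yes", "Attrib2": "Large"}, "No"),
    ({"Attrib1": "No", "Attrib2": "Small"}, "Yes"),
    ({"Attrib1": "No", "Attrib2": "Medium"}, "No"),
    ({"Attrib1": "No", "Attrib2": "Small"}, "Yes"),
]
TEST = [
    {"Attrib1": "No", "Attrib2": "Small"}, {"Attrib1": "Yes", "Attrib2": "Medium"},
    {"Attrib1": "Yes", "Attrib2": "Large"}, {"Attrib1": "No", "Attrib2": "Small"},
    {"Attrib1": "No", "Attrib2": "Large"},
]


def gini(labels):
    """Gini index: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((k / n) ** 2 for k in Counter(labels).values())


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def induce(rows, attrs):
    """Grow a tree top-down, splitting greedily on the lowest weighted Gini."""
    labels = [y for _, y in rows]
    if len(set(labels)) == 1 or not attrs:
        return majority(labels)  # leaf: pure node or nothing left to split on

    def weighted_gini(attr):
        groups = {}
        for x, y in rows:
            groups.setdefault(x[attr], []).append(y)
        return sum(len(g) / len(rows) * gini(g) for g in groups.values())

    best = min(attrs, key=weighted_gini)
    rest = [a for a in attrs if a != best]
    children = {
        value: induce([(x, y) for x, y in rows if x[best] == value], rest)
        for value in {x[best] for x, _ in rows}
    }
    # "default" handles attribute values never seen at this node.
    return {"attr": best, "children": children, "default": majority(labels)}


def predict(tree, x):
    """Deduction: follow the record's attribute values down to a leaf."""
    while isinstance(tree, dict):
        tree = tree["children"].get(x[tree["attr"]], tree["default"])
    return tree


tree = induce(TRAIN, ["Attrib1", "Attrib2"])
print(tree["attr"])                      # Attrib2
print([predict(tree, x) for x in TEST])  # ['Yes', 'No', 'No', 'Yes', 'Yes']
```

On these ten records the sketch splits on Attrib2 first (weighted Gini about 0.27, versus about 0.34 for Attrib1), makes Medium a pure No leaf, and refines the Large and Small branches on Attrib1.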
Apply Model to Test Data
Start from the root of the tree.

Test Data
Refund  Marital Status  Taxable Income  Cheat
No      Married         80K             ?

Model: Decision Tree
Refund = Yes: NO
Refund = No:
    MarSt = Married: NO
    MarSt = Single, Divorced:
        TaxInc < 80K: NO
        TaxInc > 80K: YES
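Applying the model means following the path of test outcomes from the root down to a leaf. A minimal sketch for the test record above, with the tree hand-coded as nested `if` tests in a hypothetical `classify_with_path` function that also records the path it takes:

```python
def classify_with_path(record):
    """Walk the slide's tree from the root, logging each test outcome.
    Hand-coded tree; sending exactly 80K to "> 80K" is an assumption."""
    path = [f"Refund = {record['Refund']}"]
    if record["Refund"] == "Yes":
        return "NO", path
    path.append(f"MarSt = {record['MarSt']}")
    if record["MarSt"] == "Married":
        return "NO", path
    path.append(f"TaxInc = {record['TaxInc']}K")
    return ("NO" if record["TaxInc"] < 80 else "YES"), path


label, path = classify_with_path({"Refund": "No", "MarSt": "Married", "TaxInc": 80})
print(" -> ".join(path), "=>", label)  # Refund = No -> MarSt = Married => NO
```

Because Refund = No and MarSt = Married, the record reaches a NO leaf without the TaxInc test ever being evaluated, so the ambiguous 80K boundary does not matter for this record.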