Using Weka 3 for clustering - Computer Science



Assignment 10: Clustering

Try the Following:

(Taken from material posted by Zdravko Markov.)

K-Means Clustering

In the Weka Explorer, load the training file weather.arff. Switch to Cluster mode (click the Cluster tab) and select the clustering algorithm SimpleKMeans. Then click Start; the clustering result appears in the output window. For this algorithm, the clustering is represented by one instance per cluster: the cluster centroid.
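SimpleKMeans implements k-means: pick k initial centroids, assign each instance to its nearest centroid, recompute each centroid as the mean (numeric attributes) or mode (nominal attributes) of its members, and repeat until no instance changes cluster. The Java sketch below is a minimal, self-contained illustration of that loop on the 14 weather instances, not Weka's code. Several details are assumptions made for illustration: the distance (min-max normalised numeric differences plus 1 per nominal mismatch, assumed to mirror Weka's default), first-seen tie-breaking for modes, and the hypothetical seeds (the 1st and 7th instances). Weka picks its initial centroids at random using the seed given by -S, so this sketch may end in a different clustering.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

/** Minimal k-means (Lloyd) sketch on the numeric weather data; not Weka's implementation. */
public class KMeansSketch {
    // The 14 weather instances in file order: outlook, temperature, humidity, windy, play.
    static final String[] OUTLOOK = {"sunny", "sunny", "overcast", "rainy", "rainy", "rainy", "overcast",
            "sunny", "sunny", "rainy", "sunny", "overcast", "overcast", "rainy"};
    static final double[] TEMP = {85, 80, 83, 70, 68, 65, 64, 72, 69, 75, 75, 72, 81, 71};
    static final double[] HUM = {85, 90, 86, 96, 80, 70, 65, 95, 70, 80, 70, 90, 75, 91};
    static final String[] WINDY = {"FALSE", "TRUE", "FALSE", "FALSE", "FALSE", "TRUE", "TRUE",
            "FALSE", "FALSE", "FALSE", "TRUE", "TRUE", "FALSE", "TRUE"};
    static final String[] PLAY = {"no", "no", "yes", "yes", "yes", "no", "yes",
            "no", "yes", "yes", "yes", "yes", "yes", "no"};
    static final int N = TEMP.length;

    /** A centroid: mean for numeric attributes, mode for nominal ones. */
    static final class Centroid {
        double temp, hum;
        String outlook, windy, play;
    }

    /** Min-max normalisation of v using the range of the whole column (assumption). */
    static double norm(double v, double[] col) {
        double min = Arrays.stream(col).min().getAsDouble();
        double max = Arrays.stream(col).max().getAsDouble();
        return (v - min) / (max - min);
    }

    /** Squared distance: normalised numeric differences plus 1 per nominal mismatch. */
    static double dist2(int i, Centroid c) {
        double dt = norm(TEMP[i], TEMP) - norm(c.temp, TEMP);
        double dh = norm(HUM[i], HUM) - norm(c.hum, HUM);
        double d = dt * dt + dh * dh;
        if (!OUTLOOK[i].equals(c.outlook)) d += 1;
        if (!WINDY[i].equals(c.windy)) d += 1;
        if (!PLAY[i].equals(c.play)) d += 1;
        return d;
    }

    /** Most frequent value of col among members of cluster k (first-seen value wins ties). */
    static String mode(String[] col, int[] assign, int k) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (int i = 0; i < N; i++) if (assign[i] == k) counts.merge(col[i], 1, Integer::sum);
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> e : counts.entrySet())
            if (e.getValue() > bestCount) { best = e.getKey(); bestCount = e.getValue(); }
        return best;
    }

    /** Runs Lloyd iterations from the seed instances, fills cents, returns each instance's cluster. */
    static int[] cluster(int[] seeds, Centroid[] cents) {
        int k = seeds.length;
        for (int c = 0; c < k; c++) {
            int i = seeds[c];
            Centroid s = new Centroid();
            s.temp = TEMP[i]; s.hum = HUM[i];
            s.outlook = OUTLOOK[i]; s.windy = WINDY[i]; s.play = PLAY[i];
            cents[c] = s;
        }
        int[] assign = new int[N];
        Arrays.fill(assign, -1);
        for (int iter = 0; iter < 100; iter++) {
            // Assignment step: move each instance to its nearest centroid.
            boolean changed = false;
            for (int i = 0; i < N; i++) {
                int best = 0;
                for (int c = 1; c < k; c++) if (dist2(i, cents[c]) < dist2(i, cents[best])) best = c;
                if (best != assign[i]) { assign[i] = best; changed = true; }
            }
            if (!changed) break; // converged: centroids are consistent with the assignment
            // Update step: recompute mean/mode of every non-empty cluster.
            for (int c = 0; c < k; c++) {
                int size = 0;
                double t = 0, h = 0;
                for (int i = 0; i < N; i++) if (assign[i] == c) { size++; t += TEMP[i]; h += HUM[i]; }
                if (size == 0) continue; // keep the old centroid for an empty cluster
                cents[c].temp = t / size;
                cents[c].hum = h / size;
                cents[c].outlook = mode(OUTLOOK, assign, c);
                cents[c].windy = mode(WINDY, assign, c);
                cents[c].play = mode(PLAY, assign, c);
            }
        }
        return assign;
    }

    public static void main(String[] args) {
        Centroid[] cents = new Centroid[2];
        int[] assign = cluster(new int[] {0, 6}, cents); // hypothetical seeds: 1st and 7th instance
        double sse = 0;
        for (int i = 0; i < N; i++) sse += dist2(i, cents[assign[i]]);
        for (int c = 0; c < cents.length; c++)
            System.out.printf("Cluster %d: %s %.4f %.4f %s %s%n", c, cents[c].outlook,
                    cents[c].temp, cents[c].hum, cents[c].windy, cents[c].play);
        System.out.printf("Within cluster sum of squared errors: %.4f%n", sse);
    }
}
```

Running main prints each centroid in the same Mean/Mode order as Weka's listing, followed by the within-cluster sum of squared errors of the final assignment.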

    Scheme:       weka.clusterers.SimpleKMeans -N 2 -S 10
    Relation:     weather
    Instances:    14
    Attributes:   5
                  outlook
                  temperature
                  humidity
                  windy
                  play
    Test mode:    evaluate on training data

    === Model and evaluation on training set ===

    kMeans
    ======

    Number of iterations: 3
    Within cluster sum of squared errors: 16.23745631138724

    Cluster centroids:

    Cluster 0
        Mean/Mode:  sunny 75.8889 84.1111 FALSE yes
        Std Devs:   N/A 6.4893 8.767 N/A N/A
    Cluster 1
        Mean/Mode:  overcast 69.4 77.2 TRUE yes
        Std Devs:   N/A 4.7223 12.3167 N/A N/A

    Clustered Instances

    0       9 ( 64%)
    1       5 ( 36%)
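The centroid listing is enough to recover the rest of this summary. The plain-Java sketch below (no Weka dependency; class and method names are hypothetical) assigns each weather instance to the nearer of the two printed centroids, using min-max normalised temperature and humidity plus a distance of 1 for every nominal mismatch. That distance is our assumption about Weka's default, but under it the sketch reproduces the cluster sizes and the within-cluster sum of squared errors reported above.

```java
import java.util.Arrays;

/** Re-derives cluster sizes and SSE from the printed SimpleKMeans centroids (sketch, not Weka code). */
public class CentroidCheck {
    // The 14 weather instances in file order.
    static final String[] OUTLOOK = {"sunny", "sunny", "overcast", "rainy", "rainy", "rainy", "overcast",
            "sunny", "sunny", "rainy", "sunny", "overcast", "overcast", "rainy"};
    static final double[] TEMP = {85, 80, 83, 70, 68, 65, 64, 72, 69, 75, 75, 72, 81, 71};
    static final double[] HUM = {85, 90, 86, 96, 80, 70, 65, 95, 70, 80, 70, 90, 75, 91};
    static final String[] WINDY = {"FALSE", "TRUE", "FALSE", "FALSE", "FALSE", "TRUE", "TRUE",
            "FALSE", "FALSE", "FALSE", "TRUE", "TRUE", "FALSE", "TRUE"};
    static final String[] PLAY = {"no", "no", "yes", "yes", "yes", "no", "yes",
            "no", "yes", "yes", "yes", "yes", "yes", "no"};
    static final int N = TEMP.length;

    // Mean/Mode rows copied from the output (cluster 0, cluster 1).
    static final String[] C_OUTLOOK = {"sunny", "overcast"};
    static final double[] C_TEMP = {75.8889, 69.4};
    static final double[] C_HUM = {84.1111, 77.2};
    static final String[] C_WINDY = {"FALSE", "TRUE"};
    static final String[] C_PLAY = {"yes", "yes"};

    /** Min-max normalisation using the range of the whole column (assumed Weka default). */
    static double norm(double v, double[] col) {
        double min = Arrays.stream(col).min().getAsDouble();
        double max = Arrays.stream(col).max().getAsDouble();
        return (v - min) / (max - min);
    }

    /** Squared distance from instance i to centroid c; each nominal mismatch adds 1. */
    static double dist2(int i, int c, boolean usePlay) {
        double dt = norm(TEMP[i], TEMP) - norm(C_TEMP[c], TEMP);
        double dh = norm(HUM[i], HUM) - norm(C_HUM[c], HUM);
        double d = dt * dt + dh * dh;
        if (!OUTLOOK[i].equals(C_OUTLOOK[c])) d += 1;
        if (!WINDY[i].equals(C_WINDY[c])) d += 1;
        if (usePlay && !PLAY[i].equals(C_PLAY[c])) d += 1;
        return d;
    }

    /** Index of the nearer centroid (ties go to cluster 0). */
    static int nearest(int i, boolean usePlay) {
        return dist2(i, 0, usePlay) <= dist2(i, 1, usePlay) ? 0 : 1;
    }

    /** Number of instances assigned to each cluster. */
    static int[] sizes() {
        int[] sizes = new int[2];
        for (int i = 0; i < N; i++) sizes[nearest(i, true)]++;
        return sizes;
    }

    /** Sum over instances of the squared distance to their assigned centroid. */
    static double sse(boolean usePlay) {
        double total = 0;
        for (int i = 0; i < N; i++) total += dist2(i, nearest(i, usePlay), usePlay);
        return total;
    }

    public static void main(String[] args) {
        int[] sizes = sizes();
        for (int c = 0; c < sizes.length; c++)
            System.out.printf("%d %d (%3d%%)%n", c, sizes[c], Math.round(100.0 * sizes[c] / N));
        System.out.printf("SSE with play:    %.4f%n", sse(true));
        System.out.printf("SSE without play: %.4f%n", sse(false));
    }
}
```

Running main prints `0 9 ( 64%)`, `1 5 ( 36%)`, an SSE of 16.2375 with play and 11.2375 without it. The second figure matches the classes-to-clusters run later in this assignment: both centroids have play = yes, so each of the 5 play = no instances contributes exactly 1 to the first run's error, and leaving play out removes exactly 5.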


Evaluation

How Weka evaluates a clustering depends on the cluster mode you select. Four cluster modes are available (as buttons in the Cluster mode panel):

1. Use training set (default). After generating the clustering, Weka assigns each training instance to a cluster according to the cluster representation and computes the percentage of instances falling in each cluster. For example, the k-means clustering above puts 64% (9 instances) in cluster 0 and 36% (5 instances) in cluster 1.

2. Supplied test set or Percentage split. In these modes Weka can evaluate clusterings on separate test data if the cluster representation is probabilistic (e.g. for EM).

3. Classes to clusters evaluation. In this mode Weka first ignores the class attribute and generates the clustering. Then, during the test phase, it assigns classes to the clusters based on the majority value of the class attribute within each cluster. Finally, it computes the classification error based on this assignment and shows the corresponding confusion matrix. An example of this for k-means is shown below.

    Scheme:       weka.clusterers.SimpleKMeans -N 2 -S 10
    Relation:     weather
    Instances:    14
    Attributes:   5
                  outlook
                  temperature
                  humidity
                  windy
    Ignored:
                  play
    Test mode:    Classes to clusters evaluation on training data

    === Model and evaluation on training set ===

    kMeans
    ======

    Number of iterations: 3
    Within cluster sum of squared errors: 11.237456311387238

    Cluster centroids:

    Cluster 0
        Mean/Mode:  sunny 75.8889 84.1111 FALSE
        Std Devs:   N/A 6.4893 8.767 N/A
    Cluster 1
        Mean/Mode:  overcast 69.4 77.2 TRUE
        Std Devs:   N/A 4.7223 12.3167 N/A

    Clustered Instances

    0       9 ( 64%)
    1       5 ( 36%)

    Class attribute: play
    Classes to Clusters:

     0 1 ................
    ................
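The classes-to-clusters step can be sketched as follows: count how often each class occurs in each cluster (the confusion matrix), give each cluster a class, and count the instances whose class differs from their cluster's label. The cluster labels hardcoded below are our own nearest-centroid assignment of the 14 weather instances to the printed centroids, not Weka output, and all names are hypothetical. One further assumption concerns Weka's ClusterEvaluation: as far as we know, it lets each class label at most one cluster and searches for the mapping with the fewest errors, so this sketch does the same. With these counts, a plain per-cluster majority vote would label both clusters yes.

```java
import java.util.Arrays;

/** Classes-to-clusters evaluation sketch (not Weka's code). */
public class ClassesToClusters {
    static final String[] CLASSES = {"yes", "no"};

    /** counts[cls][cluster] = number of instances of class cls that fell in cluster. */
    static int[][] confusion(int[] cluster, int[] cls, int numClusters, int numClasses) {
        int[][] counts = new int[numClasses][numClusters];
        for (int i = 0; i < cluster.length; i++) counts[cls[i]][cluster[i]]++;
        return counts;
    }

    /** Best mapping cluster -> class (-1 = no class), each class used at most once. */
    static int[] bestMapping(int[][] counts) {
        int numClusters = counts[0].length;
        int[] best = new int[numClusters];
        int[] bestHits = {-1};
        search(counts, 0, new int[numClusters], new boolean[counts.length], 0, best, bestHits);
        return best;
    }

    /** Exhaustive search over mappings; keeps the one that labels the most instances correctly. */
    static void search(int[][] counts, int cluster, int[] map, boolean[] used, int hits,
                       int[] best, int[] bestHits) {
        if (cluster == map.length) {
            if (hits > bestHits[0]) { bestHits[0] = hits; System.arraycopy(map, 0, best, 0, map.length); }
            return;
        }
        map[cluster] = -1; // option: leave this cluster unlabelled
        search(counts, cluster + 1, map, used, hits, best, bestHits);
        for (int c = 0; c < counts.length; c++) {
            if (used[c]) continue;
            used[c] = true;
            map[cluster] = c;
            search(counts, cluster + 1, map, used, hits + counts[c][cluster], best, bestHits);
            used[c] = false;
        }
    }

    /** Instances whose class differs from the class assigned to their cluster. */
    static int errors(int[][] counts, int[] mapping) {
        int total = 0, hits = 0;
        for (int c = 0; c < counts.length; c++)
            for (int k = 0; k < counts[c].length; k++) {
                total += counts[c][k];
                if (mapping[k] == c) hits += counts[c][k];
            }
        return total - hits;
    }

    public static void main(String[] args) {
        // Cluster of each weather instance in file order: our own nearest-centroid
        // assignment to the two printed centroids (an assumption, not Weka output).
        int[] cluster = {0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1};
        // Class (play) of each instance: 0 = yes, 1 = no.
        int[] play = {1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1};
        int[][] counts = confusion(cluster, play, 2, 2);
        int[] mapping = bestMapping(counts);
        for (int c = 0; c < counts.length; c++)
            System.out.println(Arrays.toString(counts[c]) + " | " + CLASSES[c]);
        for (int k = 0; k < mapping.length; k++)
            System.out.println("Cluster " + k + " <-- " + (mapping[k] < 0 ? "No class" : CLASSES[mapping[k]]));
        int err = errors(counts, mapping);
        System.out.printf("Incorrectly clustered: %d (%.4f%%)%n", err, 100.0 * err / cluster.length);
    }
}
```

With these inputs the matrix rows are [6, 3] for yes and [3, 2] for no, the chosen mapping is cluster 0 to yes and cluster 1 to no, and 6 of the 14 instances (about 42.86%) count as errors. A majority vote that allowed both clusters to be labelled yes would count only 5 errors.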

