Using Weka 3 for clustering - Computer Science
Assignment 10: Clustering
Try the Following:
(Taken from material posted by Zdravko Markov.)
K-Means Clustering
In the Weka Explorer, load the training file weather.arff. Switch to the Cluster panel (by clicking on the Cluster tab) and select the clustering algorithm SimpleKMeans. Then click Start; the clustering result appears in the output window. For this algorithm the clustering is reported as one instance per cluster, which represents that cluster's centroid.
Scheme: weka.clusterers.SimpleKMeans -N 2 -S 10
Relation: weather
Instances: 14
Attributes: 5
outlook
temperature
humidity
windy
play
Test mode: evaluate on training data
=== Model and evaluation on training set ===
kMeans
======
Number of iterations: 3
Within cluster sum of squared errors: 16.23745631138724
Cluster centroids:
Cluster 0
Mean/Mode: sunny 75.8889 84.1111 FALSE yes
Std Devs: N/A 6.4893 8.767 N/A N/A
Cluster 1
Mean/Mode: overcast 69.4 77.2 TRUE yes
Std Devs: N/A 4.7223 12.3167 N/A N/A
Clustered Instances
0 9 ( 64%)
1 5 ( 36%)
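The Mean/Mode and Clustered Instances lines in the output above can be illustrated with a small sketch. It assumes, as the output labels suggest, that a centroid takes the mean of each numeric attribute and the most frequent value of each nominal one. The helper names centroid and cluster_shares and the three toy rows are hypothetical; they are not taken from weather.arff or from Weka's source code.

```python
from collections import Counter


def centroid(rows, numeric_cols):
    """Summarise a cluster: mean for numeric columns, mode for nominal ones."""
    summary = []
    for index, column in enumerate(zip(*rows)):
        if index in numeric_cols:
            summary.append(round(sum(column) / len(column), 4))
        else:
            # Most frequent nominal value in this column.
            summary.append(Counter(column).most_common(1)[0][0])
    return summary


def cluster_shares(assignments):
    """Map each cluster id to (instance count, rounded percentage)."""
    counts = Counter(assignments)
    total = len(assignments)
    return {c: (n, round(100 * n / total)) for c, n in sorted(counts.items())}


# Hypothetical toy rows (outlook, temperature, humidity, windy).
rows = [
    ("sunny", 80, 90, "TRUE"),
    ("sunny", 72, 95, "FALSE"),
    ("rainy", 70, 70, "FALSE"),
]
print(centroid(rows, numeric_cols={1, 2}))  # ['sunny', 74.0, 85.0, 'FALSE']

# Cluster sizes as reported in the output above: 9 and 5 of 14 instances.
print(cluster_shares([0] * 9 + [1] * 5))  # {0: (9, 64), 1: (5, 36)}
```

The second call reproduces the 64% and 36% figures from the Clustered Instances lines using only the instance counts.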
Evaluation
The way Weka evaluates the clusterings depends on the cluster mode you select. Four different cluster modes are available (as buttons in the Cluster mode panel):
1. Use training set (default). After generating the clustering, Weka classifies the training instances into clusters according to the cluster representation and computes the percentage of instances falling in each cluster. For example, the above clustering produced by k-means shows 64% (9 instances) in cluster 0 and 36% (5 instances) in cluster 1.
2. Supplied test set or Percentage split. In these modes Weka can evaluate clusterings on separate test data if the cluster representation is probabilistic (e.g. for EM).
3. Classes to clusters evaluation. In this mode Weka first ignores the class attribute and generates the clustering. Then during the test phase it assigns classes to the clusters, based on the majority value of the class attribute within each cluster. Then it computes the classification error, based on this assignment and also shows the corresponding confusion matrix. An example of this for k-means is shown below.
Scheme: weka.clusterers.SimpleKMeans -N 2 -S 10
Relation: weather
Instances: 14
Attributes: 5
outlook
temperature
humidity
windy
Ignored:
play
Test mode: Classes to clusters evaluation on training data
=== Model and evaluation on training set ===
kMeans
======
Number of iterations: 3
Within cluster sum of squared errors: 11.237456311387238
Cluster centroids:
Cluster 0
Mean/Mode: sunny 75.8889 84.1111 FALSE
Std Devs: N/A 6.4893 8.767 N/A
Cluster 1
Mean/Mode: overcast 69.4 77.2 TRUE
Std Devs: N/A 4.7223 12.3167 N/A
Clustered Instances
0 9 ( 64%)
1 5 ( 36%)
Class attribute: play
Classes to Clusters:
0 1 ................
................
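The classes to clusters idea described above can be sketched in a few lines. This is a minimal illustration that follows the description in the text (each cluster is labelled with its majority class, then the error is counted); Weka's own implementation may differ in detail. The function name classes_to_clusters and the six toy assignments are hypothetical and do not reproduce the truncated output above.

```python
from collections import Counter


def classes_to_clusters(cluster_ids, class_labels):
    """Label each cluster with its majority class and compute the error rate."""
    # Confusion counts: cluster id -> Counter of class labels seen in it.
    table = {}
    for cluster, label in zip(cluster_ids, class_labels):
        table.setdefault(cluster, Counter())[label] += 1

    # Majority class per cluster.
    mapping = {c: counts.most_common(1)[0][0] for c, counts in table.items()}

    # An instance is misclassified if its class differs from its cluster's label.
    wrong = sum(
        1 for c, label in zip(cluster_ids, class_labels) if mapping[c] != label
    )
    return table, mapping, wrong / len(class_labels)


# Hypothetical toy data: cluster assignments and the ignored class attribute.
cluster_ids = [0, 0, 0, 1, 1, 1]
class_labels = ["yes", "yes", "no", "no", "no", "yes"]

table, mapping, error = classes_to_clusters(cluster_ids, class_labels)
print(mapping)          # {0: 'yes', 1: 'no'}
print(round(error, 3))  # 0.333
```

Here table plays the role of the confusion matrix: one row per cluster, with a count for each class value.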