Andy Menz - University of Washington



SpamDiagnostic Documentation

SpamDiagnostic is a GUI program for viewing FeatureFinder diagnostic output files. Diagnostic files are named in the form “diagnostic_*.txt”, where the variable part of the name is the parameter passed to FeatureFinder; see the FeatureFinder documentation for more details. SpamDiagnostic is a Visual Basic 6.0 program that has been compiled into an executable. It displays diagnostic data in fields that include the elapsed time, all parameters used when running FeatureFinder, and the feature vectors used, among others. SpamDiagnostic was written by Andy Menz solely for this project.

Using SpamDiagnostic

1. Open SpamDiagnostic.exe in a Microsoft Windows environment.

2. Select a diagnostic_*.txt file from the file windows.

3. Double-click on the file or click the “Open File” button.

4. View the diagnostic information in the text boxes.
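
For readers who want to locate diagnostic files outside the GUI, the following is a minimal Python sketch of the file selection in step 2. It relies only on the diagnostic_*.txt naming pattern described above. The function names find_diagnostic_files and run_label are hypothetical and are not part of SpamDiagnostic or FeatureFinder.

```python
from pathlib import Path


def find_diagnostic_files(directory):
    """Return the diagnostic_*.txt files in `directory`, sorted by path."""
    return sorted(Path(directory).glob("diagnostic_*.txt"))


def run_label(path):
    """Return the part of the file name between 'diagnostic_' and '.txt'.

    Per the documentation, this is the parameter that was passed to
    FeatureFinder when the diagnostic file was written.
    """
    return Path(path).stem[len("diagnostic_"):]
```

This sketch only lists candidate files; it does not read their contents, since the layout of a diagnostic file is not described here.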

Run Data Frame

The Run Data Frame contains basic run time information.

Output File Name – the name of the .arff output file produced by this instance of FeatureFinder

Start Time – the time the FeatureFinder instance was run

End Time – the time this instance of FeatureFinder terminated

Elapsed Time (s) – the time, in seconds, that this instance took to run

Total Messages – the number of email messages analyzed

Total Features – the total number of features (words) analyzed

Parameters Frame

The Parameters Frame includes all parameters used when this instance of FeatureFinder was called.

Feature Vector Type – the type of feature vector used

2-Value Discretized – whether the feature values were converted to 2-value nominal attributes

Word Stemming – whether word stemming was applied

Stop Terms – whether stop terms were applied

Feature Vector Size – the size of the final feature vector (# of features used per message)

Num Words Stemmed – the number of words changed by the word stemmer

Top Features Frame

The Top Features Frame includes all the data in the other frames, plus a list of the features in the final feature vector. The features are ordered from greatest to least mutual information gain, so features appearing at the top of the list are more useful in classifying a message.
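
To illustrate the ordering described above, here is a minimal Python sketch that ranks words by mutual information with the message class. It is not FeatureFinder's implementation, which this document does not describe. It assumes each feature is the presence or absence of a word in a message and that each message carries a class label such as "spam" or "ham". The names mutual_information and rank_features are hypothetical.

```python
import math
from collections import Counter


def mutual_information(presence, labels):
    """Mutual information (in bits) between a binary feature and class labels.

    presence: one True/False per message (assumed: word present or absent).
    labels:   one class label per message, in the same order.
    """
    n = len(labels)
    joint = Counter(zip(presence, labels))
    feature_counts = Counter(presence)
    label_counts = Counter(labels)
    mi = 0.0
    for (x, c), count in joint.items():
        p_xc = count / n
        p_x = feature_counts[x] / n
        p_c = label_counts[c] / n
        mi += p_xc * math.log2(p_xc / (p_x * p_c))
    return mi


def rank_features(messages, labels):
    """Return (word, score) pairs ordered from greatest to least score.

    messages: a list of sets of words, one set per message.
    Ties are broken alphabetically so the order is deterministic.
    """
    vocabulary = set().union(*messages)
    scored = []
    for word in vocabulary:
        presence = [word in message for message in messages]
        scored.append((word, mutual_information(presence, labels)))
    return sorted(scored, key=lambda item: (-item[1], item[0]))
```

Under these assumptions, a word that appears in every spam message and in no other message scores highest, while a word that appears equally often in both classes scores zero and falls to the bottom of the list.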
