Knock Email into shape

An email management tool is needed to keep spam from filling an entire mailbox volume and to retrieve relevant messages. Email is one of the most efficient, fastest, and least expensive ways to communicate with a large number of people, but at the same time unsolicited bulk email is a nuisance.

One solution for spam filtering is greylisting, which blocks significant amounts of spam at the mail-server level. Greylisting is implemented without resorting to statistical analysis. Consequently, implementations are fairly lightweight and may even decrease network traffic and processor load on your mail server.
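As a rough illustration of why such a filter is lightweight, the sketch below assumes the common greylisting scheme: the first delivery attempt from an unknown (client IP, sender, recipient) triplet gets a temporary rejection, and a retry after a minimum delay is accepted. The class name `Greylist`, the `min_delay` parameter, and the injectable `clock` are hypothetical choices made for this example, not part of any particular mail server.

```python
import time


class Greylist:
    """Toy greylist keyed on the (client IP, sender, recipient) triplet."""

    def __init__(self, min_delay=300.0, clock=time.time):
        self.min_delay = min_delay   # seconds a new triplet must wait (assumed value)
        self.clock = clock           # injectable clock, so the logic is easy to test
        self.first_seen = {}         # triplet -> time of the first delivery attempt

    def check(self, client_ip, sender, recipient):
        """Return an SMTP-style reply code: 451 (try again later) or 250 (accept)."""
        triplet = (client_ip, sender.lower(), recipient.lower())
        now = self.clock()
        # Remember when this triplet was first seen; later calls reuse that time.
        first = self.first_seen.setdefault(triplet, now)
        if now - first < self.min_delay:
            return 451  # temporary failure: legitimate servers will retry
        return 250      # the sender retried after the delay, so accept
```

The only state is one dictionary lookup per message, with no content analysis at all, which is where the low processor load comes from. A real deployment would also expire old entries.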

On the other hand, greylisting has weaknesses. A spammer can easily trick it by using a common personal name or place name as the sender, so that the sender is accepted onto the whitelist (the e-mail addresses of the people and companies you wish to hear from). Conversely, a message from a friend may be put on the blacklist because it resembles blacklisted mail.

Generally, email classification methods are based on text data mining. A text classifier is a system that assigns texts to a discrete set of predefined categories. For email, incoming messages are classified as spam or legitimate.

However, no algorithm can classify spam perfectly, because the dataset contains a large and varied set of features. In many document datasets only a few percent of the features are useful for classification, and a classifier that uses all the features may produce different results. The quality of the training dataset determines the performance of both the text classification algorithms and the feature selection algorithms. An ideal training dataset for a given category includes all the important terms and their likely distribution in that category.
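To make the idea of keeping only a small share of the features concrete, here is a minimal sketch of one possible selection rule: rank each term by how much its document frequency differs between spam and legitimate mail, then keep the top k. The scoring rule, the function name `select_features`, and the labels "spam" and "ham" are assumptions for illustration; the text does not say which feature selection algorithm is meant.

```python
from collections import Counter


def select_features(docs, labels, k):
    """Keep the k terms whose document frequency differs most between classes."""
    spam_df, ham_df = Counter(), Counter()
    n_spam = sum(1 for y in labels if y == "spam")
    n_ham = len(labels) - n_spam
    for text, y in zip(docs, labels):
        terms = set(text.lower().split())  # count each term once per document
        (spam_df if y == "spam" else ham_df).update(terms)

    def score(term):
        # Difference between the fraction of spam and of ham documents
        # that contain the term; 0 means the term does not discriminate.
        return abs(spam_df[term] / max(n_spam, 1) - ham_df[term] / max(n_ham, 1))

    vocab = set(spam_df) | set(ham_df)
    # Sort by descending score, breaking ties alphabetically for stable output.
    return sorted(vocab, key=lambda t: (-score(t), t))[:k]
```

A term such as "now" that appears in both classes scores low and is dropped early, while terms confined to one class are kept.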

This challenge of spam filtering has long been recognized, and a dozen different approaches to the problem have been implemented. Recently, most filtering algorithms have used techniques such as decision trees, naive Bayesian classifiers, and neural networks. To address the growing volume of unsolicited email, many different filtering methods are being deployed in commercial products.

Generally, classification with a neural network (NN) consists of three steps: data preprocessing, training, and testing. The k-nearest neighbor (k-NN) algorithm, which despite the similar abbreviation is not a neural network, classifies objects based on the closest training examples in the feature space, and has been applied to graph mining as well as text data mining. k-NN is a type of instance-based learning, or lazy learning, where the function is only approximated locally and all computation is deferred until classification. It can also be used for regression.
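The k-NN description can be sketched in a few lines. The example below assumes a bag-of-words representation and cosine similarity as the closeness measure; both are illustrative choices, as are the function names `vectorize`, `cosine`, and `knn_classify`. Note that nothing is computed at training time: the training examples are only stored, and all work happens when a message is classified.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words vector: term -> count."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def knn_classify(text, training, k=3):
    """Label `text` by majority vote among its k most similar training examples.

    `training` is a list of (text, label) pairs.
    """
    query = vectorize(text)
    ranked = sorted(training,
                    key=lambda ex: cosine(query, vectorize(ex[0])),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

For example, with a handful of labeled messages as `training`, `knn_classify("cheap watches now", training)` returns the label shared by most of the three nearest stored messages.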

Like k-NN, the support vector machine (SVM) is a supervised classifier. It is a relatively new learning method strongly influenced by advances in statistical learning theory. SVMs have led to a growing number of applications in image classification and handwriting recognition.

The naive Bayes classifier is a simple probabilistic classifier based on applying Bayes' theorem with strong (naive) independence assumptions. A more descriptive term for the underlying probability model would be "independent feature model".
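Written out, the independence assumption gives the standard form below. The symbols are notation introduced here for illustration: C is the class (spam or legitimate) and x_1, ..., x_n are the features of a message, for instance the words it contains.

```latex
P(C \mid x_1, \dots, x_n) \;\propto\; P(C) \prod_{i=1}^{n} P(x_i \mid C)
```

The classifier picks the class C for which the right-hand side is largest.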

It is true that spammers are continually developing new tricks to get their messages past spam filters, including tricks aimed specifically at Bayesian filters. A Bayesian filter estimates the probability that an email is spam by computing the product of the individual probabilities for each feature. The effectiveness of the estimate depends largely on the set of features selected and on the accuracy of the individual feature probabilities. As spammers' tricks grow more sophisticated, spam filtering techniques advance as well.
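A minimal sketch of such a Bayesian filter follows. It assumes words as features, add-one (Laplace) smoothing for the per-word probabilities, and sums of logarithms in place of the raw product to avoid numerical underflow; these are common choices, not details given in the text. The class name `NaiveBayesFilter` and the labels "spam" and "ham" are hypothetical.

```python
import math
from collections import Counter


class NaiveBayesFilter:
    """Toy word-based naive Bayes spam filter."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = Counter()

    def train(self, text, label):
        self.doc_counts[label] += 1
        self.word_counts[label].update(text.lower().split())

    def spam_probability(self, text):
        """Estimate P(spam | text). Assumes both classes have training data."""
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_docs = sum(self.doc_counts.values())
        log_score = {}
        for label in ("spam", "ham"):
            counts = self.word_counts[label]
            total_words = sum(counts.values())
            # Log prior plus the sum of log likelihoods equals the log of the
            # product of individual feature probabilities.
            score = math.log(self.doc_counts[label] / total_docs)
            for word in text.lower().split():
                # Add-one smoothing keeps unseen words from zeroing the product.
                score += math.log((counts[word] + 1) / (total_words + len(vocab)))
            log_score[label] = score
        # Normalise the two joint scores into a probability.
        top = max(log_score.values())
        weights = {k: math.exp(v - top) for k, v in log_score.items()}
        return weights["spam"] / (weights["spam"] + weights["ham"])
```

The sketch also shows why feature choice matters: a spammer who pads a message with words that are common in legitimate mail pulls the product toward the legitimate class.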

Still, there is no universal solution to spam. Each user has specific requirements. For instance, a person working for an e-commerce site will probably not be willing to lose a single client because of greylist challenges. In such a case, SMTP filtering has to be applied with care. Perhaps we can filter e-mail coming from regions with which our company has no trade relations. In addition, some people may worry about losing an important e-mail because of spam filtering.

However, a spam filter built on advanced algorithms can be user-customized, scalable, and modular, so it can be embedded in many other systems. Moreover, if such applications become popular, people sending e-mail to an address for the first time will gradually get used to possibly receiving a confirmation request from the recipient. Guidelines on e-mail etiquette may even come to be considered. All these small inconveniences are, after all, the result of some people abusing shared Internet e-mail resources.

................
................

In order to avoid copyright disputes, this page is only a partial summary.
