Welcome to IJSDR ISSN: 2455-2631



Heart Disease Prediction System Using Machine Learning & Data Mining Technique

Teena Varma#1, Gaurav Kanojia#2, Hemant Gosavi#3, Atharva Jadhav#4, Vijayram Kanojiya#5
Department of Computer Engineering, Mumbai University
#1teena.v@xavier.ac.in, #2kanojiagaurav12345@, #3hghemantgosavi777@, #4atharvasj780@, #5vijaykanojiya101@

Abstract - Machine learning algorithms have applications in many fields, and healthcare is one of them. Medical facilities need to be updated so that better decisions about patient diagnosis and treatment options can be made. Machine learning helps doctors process large and complex medical datasets and turn them into clinical observations, which doctors can then use in providing medical care. Hence, machine learning, when implemented in healthcare, can lead to increased patient satisfaction. In this paper, we apply machine learning to the prediction of heart disease. In some cases early diagnosis of a disease is not possible, and disease prediction can then be used effectively. As the saying goes, "prevention is better than cure": predicting diseases and epidemic outbreaks allows the occurrence of a disease to be prevented early.

Keywords - Big Data, healthcare, Machine learning, K-means algorithm, Random Forest

INTRODUCTION
Data mining is the process of examining data in different ways and extracting important knowledge from it. The discovered knowledge is used in various applications, for instance in the healthcare industry. Nowadays the healthcare industry generates a great deal of data about patients, disease diagnosis, etc. Data mining provides a set of techniques for finding hidden patterns in data. A serious challenge facing the healthcare industry is quality of service. The better the quality, the better the treatment provided to patients.
Poor diagnosis can have disastrous consequences, which are unacceptable.

The main focus is on using machine learning in healthcare to supplement patient care for better results. Machine learning has made it easier to identify different diseases and diagnose them correctly. Predictive analysis with multiple efficient machine learning algorithms helps to predict disease more accurately and to treat patients.

The healthcare industry produces large amounts of healthcare data daily. This data can be used to extract information for predicting diseases that a patient may develop in the future, using the patient's treatment history and health data. The hidden information in healthcare data can later be used for effective decision making about a patient's health. The medical field also needs to improve by making use of the informative data available in healthcare.

One such application of machine learning algorithms is in the field of healthcare. Medical facilities need to be advanced so that better decisions about patient diagnosis and treatment options can be made. Machine learning in healthcare helps people process large and complex medical datasets and turn them into clinical insights, which physicians can then use in providing medical care. Hence, machine learning, when implemented in healthcare, can result in increased patient satisfaction.

The heart is a vital organ of the body, and life itself depends on its efficient working. If the heart does not function properly, it affects other parts of the body such as the brain and kidneys. The heart is nothing more than a pump, which pumps blood through the body. If blood circulation in the body is inefficient, organs such as the brain suffer, and if the heart stops working, death follows rapidly.
The term heart disease refers to disease of the heart and of the blood vessel system within it.

BACKGROUND

Heart Disease Prediction
Heart disease prediction (HDP) aims to help determine an appropriate treatment, which may be more or less effective depending on the likelihood that the patient has the disease. To make a good prediction, HDP needs 12 characteristics:
1. Age (in years)
2. Sex
3. Cigarettes smoked by the person per day
4. Resting blood pressure
5. Serum cholesterol in mg/dl
6. Fasting blood sugar
7. Resting electrocardiographic results
8. Maximum heart rate achieved
9. Angina caused by exercise
10. ST depression caused by exercise relative to rest
11. The slope of the peak exercise ST segment
12. Count of major vessels

Machine Learning Approaches
Machine learning is a branch of artificial intelligence that gives a computer or any computing device the capability to learn without being explicitly programmed. It can use many different techniques, such as statistics, probability, Boolean logic, absolute conditionality, and unconventional optimization strategies, to classify patterns or to build prediction models. Machine learning is of two types, supervised learning (for example, classification) and unsupervised learning, depending on the data previously used and its availability. The following are some of the algorithms used.

Decision Trees
One of the most commonly used algorithms in medical applications is the decision tree. It uses a graph to represent the decisions as a tree. Research has shown that, among three diverse data mining techniques, the decision tree is the best, with a reported figure of 99.6%. Within decision trees, further research on various medical datasets showed that CART is the best in terms of accuracy and time complexity.

Support Vector Machines
Support Vector Machine is a supervised learning algorithm, and it reduces overfitting to the training data.
Its goal is to find the optimal decision boundary to help predict heart disease at an earlier stage.

Random Forest Classifier
The Random Forest Classifier is a supervised learning algorithm. It can perform both classification and regression. It is also a very flexible and easy-to-use algorithm. A forest contains multiple trees, and the robustness of the forest depends on the number of trees in it. A random forest creates decision trees on randomly selected data samples, gets a prediction from each tree, and obtains the best solution by voting. It also helps to determine feature importance.

K-Nearest Neighbor (KNN) Classification
K-Nearest Neighbor (KNN) is a very simple, easy-to-understand and effective machine learning algorithm. KNN has many applications in finance, healthcare, political science, handwriting detection, image recognition and video recognition. In credit rating, financial institutions predict the credit rating of customers. In loan disbursement, banks predict whether a loan is safe or risky. KNN can be used for both classification and regression problems. It is based on a feature similarity approach.

PROPOSED ALGORITHMS
This paper examines the performance of several machine learning algorithms, such as the Random Forest Classifier, the Decision Tree Classifier and K-Means, which are popular choices for classification problems. However, other algorithms such as the Naïve Bayesian Classifier and k-Nearest Neighbours have been rising in popularity in the field of medical research. Therefore, we compare a few popular algorithms (SVM, K-Means, DTC) with some other algorithms that are gaining popularity (Naïve Bayes and k-Nearest Neighbours).

KNN Classification
Algorithm:
1. Divide the dataset into a training set and a testing set.
2. Select an instance and find its distance from each instance in the training set.
3. Arrange the distances in ascending order.
4. The class of the instance is the most common class among the first "k" training instances.

Decision Tree Algorithm
Algorithm:
1. Choose the best attribute using Attribute Selection Measures (ASM) to divide the records.
2. Make it a decision node and split the dataset into smaller sub-datasets.
3. Build the tree by repeating this process recursively for each child dataset until one of the following conditions is met:
   a. All the tuples belong to the same attribute value.
   b. There are no more remaining attributes.
   c. There are no more instances.

Random Forest Classifier
Algorithm:
1. Randomly choose samples from the given dataset.
2. Construct a decision tree for each chosen sample and get a prediction result from each decision tree.
3. Perform a vote over the predicted results.
4. The prediction result with the most votes is the final prediction.

Support Vector Machine
Algorithm:
1. Prepare and format the dataset.
2. Normalize the dataset.
3. Select the kernel (activation) function, usually sigmoid.
4. Optimize the parameters c and g using a search algorithm with cross-validation.
5. Train the SVM.
6. Test the SVM.

Naïve Bayesian Classifier
Algorithm:
1. Split the data into blocks by class (2 classes) and into 2 sets of features.
2. Calculate the mean and standard deviation of each feature for each class.
3. Calculate the probability of each feature value using the density of the normal distribution.
4. Calculate the probability of each class as the product of the probabilities of all features.
5. Predict the class of an instance using these probabilities.

OBSERVATIONS AND RESULTS
The performance of Support Vector Machine, K-Means and Logistic Regression was poor before data scaling and cross-validation were applied. The observations are as follows.

Table 1: Comparison between KNN, LR, RFC, DTC, SVM, K-Means and NB

Algorithm      Accuracy
KNN            95.4295 %
LR             85.00 %
RFC            99.29 %
DTC            96.07 %
SVM            86.43 %
K-Means        81.2332 %
Naive Bayes    84.29 %

Fig 1: Boxplot of performance of RFC, KNN, DTC, NB, SVM, LR and K-Means
Fig 2: Histogram of the frequency of values of all columns
Fig 3: K Neighbors Classifier scores for different K values

CONCLUSION
The seven algorithms were implemented on a dataset from Kaggle () to predict whether a person is suffering, or will suffer, from heart disease. The Random Forest Classifier had the highest accuracy, 99.29 %, while the Decision Tree Classifier also achieved a good accuracy of 96.07 %. If the dataset is larger, the computational cost of the Random Forest Classifier will increase.

FUTURE SCOPE
In the future we will test these algorithms on different datasets with a larger number of instances so that we can confirm the conclusions made in this paper. It would be better to use real-life datasets from different fields of science to exhaustively test these algorithms and compare their performance.
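
The four KNN steps listed earlier under KNN Classification can be sketched in plain Python. This is only an illustration and not the implementation behind Table 1: the Euclidean distance metric, the function name knn_predict, and the toy two-feature records are all assumptions made for the example, and the data is assumed to have been split into training and testing sets already (step 1).

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training rows."""
    # Step 2: distance from the query to every training instance.
    # Euclidean distance is an assumption; the paper does not name a metric.
    distances = [
        (math.dist(row, query), label)
        for row, label in zip(train_X, train_y)
    ]
    # Step 3: arrange the distances in ascending order.
    distances.sort(key=lambda pair: pair[0])
    # Step 4: most common class among the first k training instances.
    top_k_labels = [label for _, label in distances[:k]]
    return Counter(top_k_labels).most_common(1)[0][0]

# Hypothetical toy data: (age, resting blood pressure); 1 = disease, 0 = none.
train_X = [(63, 145), (67, 160), (62, 140), (37, 130), (41, 130), (44, 120)]
train_y = [1, 1, 1, 0, 0, 0]

print(knn_predict(train_X, train_y, (65, 150), k=3))
print(knn_predict(train_X, train_y, (40, 125), k=3))
```

Because the prediction depends directly on distances, features measured on large scales can dominate the vote, which is one reason scaling the data before applying KNN is usually worthwhile.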
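
The Naïve Bayesian Classifier steps listed earlier (per-class mean and standard deviation, normal density per feature, product of feature probabilities) can be sketched in the same way. The function names, the toy records, and the multiplication by a class prior are assumptions added for this example; the paper's step list does not mention a prior, and this is not the code that produced the reported accuracy.

```python
import math
from statistics import mean, stdev

def fit_gaussian_nb(X, y):
    """Estimate, per class, a prior and the (mean, std) of every feature."""
    model = {}
    for label in set(y):
        rows = [row for row, lab in zip(X, y) if lab == label]
        # Mean and sample standard deviation of each feature within this class.
        stats = [(mean(col), stdev(col)) for col in zip(*rows)]
        # The class prior is an assumption not listed in the paper's steps.
        model[label] = (len(rows) / len(X), stats)
    return model

def normal_pdf(x, mu, sigma):
    """Density of the normal distribution N(mu, sigma^2) at x."""
    exponent = -((x - mu) ** 2) / (2 * sigma ** 2)
    return math.exp(exponent) / (sigma * math.sqrt(2 * math.pi))

def predict_gaussian_nb(model, row):
    """Return the class whose prior times product of densities is largest."""
    scores = {}
    for label, (prior, stats) in model.items():
        score = prior
        for x, (mu, sigma) in zip(row, stats):
            score *= normal_pdf(x, mu, sigma)
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical toy data: (age, resting blood pressure); 1 = disease, 0 = none.
X = [(63, 145), (67, 160), (62, 140), (37, 130), (41, 130), (44, 120)]
y = [1, 1, 1, 0, 0, 0]

nb_model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(nb_model, (65, 150)))
print(predict_gaussian_nb(nb_model, (40, 125)))
```

With many features, the product of small densities can underflow to zero, so practical implementations typically sum log-densities instead of multiplying raw values.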

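
The results section notes that Support Vector Machine, K-Means and Logistic Regression performed poorly before data scaling. The paper does not say which scaling method was used; the sketch below assumes z-score standardization, and the function name standardize and the sample rows are hypothetical.

```python
from statistics import mean, pstdev

def standardize(rows):
    """Rescale every column to zero mean and unit variance (z-scores)."""
    columns = list(zip(*rows))
    mus = [mean(col) for col in columns]
    # A constant column has zero spread; divide by 1.0 to avoid a ZeroDivisionError.
    sigmas = [pstdev(col) or 1.0 for col in columns]
    return [
        tuple((x - mu) / sigma for x, mu, sigma in zip(row, mus, sigmas))
        for row in rows
    ]

# Hypothetical records: (age, serum cholesterol in mg/dl).
rows = [(63, 233), (37, 250), (41, 204), (56, 236)]
scaled = standardize(rows)
for row in scaled:
    print(row)
```

When scaling is combined with a train/test split or cross-validation, the column means and standard deviations should be computed on the training portion only and then reused for the test portion, so that no information leaks from the test data.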
