


Program 1: Implement and demonstrate the FIND-S algorithm for finding the most specific hypothesis based on a given set of training data samples. Read the training data from a .CSV file.

Algorithm: Find-S, a Maximally Specific Hypothesis Learning Algorithm
Step 1: Initialize h to the most specific hypothesis in H.
Step 2: For each positive training instance x:
            For each attribute constraint ai in h:
                If the constraint ai in h is satisfied by x, then do nothing;
                else replace ai in h by the next more general constraint that is satisfied by x.
Step 3: Output hypothesis h.

input.csv:
Sunny,Warm,Normal,Strong,Warm,Same,Yes
Sunny,Warm,High,Strong,Warm,Same,Yes
Rainy,Cold,High,Strong,Warm,Change,No
Sunny,Warm,High,Strong,Cool,Change,Yes

Program:
import csv

attributes = [['Sunny', 'Rainy'],
              ['Warm', 'Cold'],
              ['Normal', 'High'],
              ['Strong', 'Weak'],
              ['Warm', 'Cool'],
              ['Same', 'Change']]
num_attributes = len(attributes)

print("\n The most general hypothesis : ['?','?','?','?','?','?']\n")
print("\n The most specific hypothesis : ['0','0','0','0','0','0']\n")

a = []
print("\n The Given Training Data Set \n")
with open('input.csv', 'r') as csvFile:
    reader = csv.reader(csvFile)
    for row in reader:
        a.append(row)
        print(row)

print("\n The initial value of hypothesis: ")
hypothesis = ['0'] * num_attributes
print(hypothesis)

# Initialize the hypothesis from the first training example
# (the first example in this data set is positive).
for j in range(0, num_attributes):
    hypothesis[j] = a[0][j]

# Compare with the remaining training examples of the given data set.
print("\n Find S: Finding a Maximally Specific Hypothesis\n")
for i in range(0, len(a)):
    if a[i][num_attributes] == 'Yes':
        for j in range(0, num_attributes):
            if a[i][j] != hypothesis[j]:
                hypothesis[j] = '?'
            else:
                hypothesis[j] = a[i][j]
    print(" For Training Example No :{0} the hypothesis is ".format(i), hypothesis)

print("\n The Maximally Specific Hypothesis for a given Training Examples :\n")
print(hypothesis)

Output:
The most general hypothesis : ['?','?','?','?','?','?']
The most specific hypothesis : ['0','0','0','0','0','0']

The Given Training Data Set
['sunny', 'warm', 'normal', 'strong', 'warm', 'same', 'Yes']
['sunny', 'warm', 'high', 'strong', 'warm', 'same', 'Yes']
['rainy', 'cold', 'high', 'strong', 'warm', 'change', 'No']
['sunny', 'warm', 'high', 'strong', 'cool', 'change', 'Yes']

The initial value of hypothesis:
['0', '0', '0', '0', '0', '0']

Find S: Finding a Maximally Specific Hypothesis
For Training Example No :0 the hypothesis is  ['sunny', 'warm', 'normal', 'strong', 'warm', 'same']
For Training Example No :1 the hypothesis is  ['sunny', 'warm', '?', 'strong', 'warm', 'same']
For Training Example No :2 the hypothesis is  ['sunny', 'warm', '?', 'strong', 'warm', 'same']
For Training Example No :3 the hypothesis is  ['sunny', 'warm', '?', 'strong', '?', '?']

The Maximally Specific Hypothesis for a given Training Examples :
['sunny', 'warm', '?', 'strong', '?', '?']
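A minimal follow-up sketch (not part of the original program): once FIND-S has produced hypothesis, a new instance can be labelled by checking that every non-'?' constraint matches. classify_with_hypothesis is a hypothetical helper and the two test instances below are made up for illustration.

def classify_with_hypothesis(instance, hypothesis):
    # an instance is positive only if it satisfies every concrete constraint
    for value, constraint in zip(instance, hypothesis):
        if constraint != '?' and constraint != value:
            return 'No'
    return 'Yes'

print(classify_with_hypothesis(['sunny', 'warm', 'high', 'strong', 'cool', 'same'], hypothesis))   # Yes
print(classify_with_hypothesis(['rainy', 'warm', 'high', 'strong', 'cool', 'same'], hypothesis))   # No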
Program 2: For a given set of training data examples stored in a .CSV file, implement and demonstrate the Candidate-Elimination algorithm to output a description of the set of all hypotheses consistent with the training examples.

Algorithm: Candidate Elimination
G : maximally general hypotheses in H
S : maximally specific hypotheses in H
For each training example d = <x, c(x)>:
Case 1: If d is a positive example
    Remove from G any hypothesis that is inconsistent with d
    For each hypothesis s in S that is not consistent with d
        Remove s from S
        Add to S all minimal generalizations h of s such that
            h is consistent with d, and
            some member of G is more general than h
        Remove from S any hypothesis that is more general than another hypothesis in S
Case 2: If d is a negative example
    Remove from S any hypothesis that is inconsistent with d
    For each hypothesis g in G that is not consistent with d
        Remove g from G
        Add to G all minimal specializations h of g such that
            h is consistent with d, and
            some member of S is more specific than h
        Remove from G any hypothesis that is less general than another hypothesis in G

input.csv:
Sunny,Warm,Normal,Strong,Warm,Same,Y
Sunny,Warm,High,Strong,Warm,Same,Y
Rainy,Cold,High,Strong,Warm,Change,N
Sunny,Warm,High,Strong,Cool,Change,Y

Program:
import csv

def g_0(n):
    return ("?",) * n

def s_0(n):
    return ('0',) * n

def more_general(h1, h2):
    more_general_parts = []
    for x, y in zip(h1, h2):
        mg = x == "?" or (x != "0" and (x == y or y == "0"))
        more_general_parts.append(mg)
    return all(more_general_parts)

# quick check of zip behaviour (notebook cell)
l1 = [1, 2, 3]
l2 = [3, 4, 5]
list(zip(l1, l2))

# min_generalizations
def fulfills(example, hypothesis):
    # the implementation is the same as for hypotheses:
    return more_general(hypothesis, example)

def min_generalizations(h, x):
    h_new = list(h)
    for i in range(len(h)):
        if not fulfills(x[i:i+1], h[i:i+1]):
            h_new[i] = '?' if h[i] != '0' else x[i]
    return [tuple(h_new)]

min_generalizations(h=('0', '0', 'sunny'), x=('rainy', 'windy', 'cloudy'))

def min_specializations(h, domains, x):
    results = []
    for i in range(len(h)):
        if h[i] == "?":
            for val in domains[i]:
                if x[i] != val:
                    h_new = h[:i] + (val,) + h[i+1:]
                    results.append(h_new)
        elif h[i] != "0":
            h_new = h[:i] + ('0',) + h[i+1:]
            results.append(h_new)
    return results

min_specializations(h=('?', 'x',), domains=[['a', 'b', 'c'], ['x', 'y']], x=('b', 'x'))

with open('input.csv') as csvFile:
    examples = [tuple(line) for line in csv.reader(csvFile)]
examples

def get_domains(examples):
    d = [set() for i in examples[0]]
    for x in examples:
        for i, xi in enumerate(x):
            d[i].add(xi)
    return [list(sorted(x)) for x in d]

get_domains(examples)

def candidate_elimination(examples):
    domains = get_domains(examples)[:-1]
    G = set([g_0(len(domains))])
    S = set([s_0(len(domains))])
    i = 0
    print("\n G[{0}]:".format(i), G)
    print("\n S[{0}]:".format(i), S)
    for xcx in examples:
        i = i + 1
        x, cx = xcx[:-1], xcx[-1]  # split the row into attributes and decision
        if cx == 'Y':  # x is a positive example
            G = {g for g in G if fulfills(x, g)}
            S = generalize_S(x, G, S)
        else:  # x is a negative example
            S = {s for s in S if not fulfills(x, s)}
            G = specialize_G(x, domains, G, S)
        print("\n G[{0}]:".format(i), G)
        print("\n S[{0}]:".format(i), S)
    return

def generalize_S(x, G, S):
    S_prev = list(S)
    for s in S_prev:
        if s not in S:
            continue
        if not fulfills(x, s):
            S.remove(s)
            Splus = min_generalizations(s, x)
            # keep only generalizations that have a counterpart in G
            S.update([h for h in Splus if any([more_general(g, h) for g in G])])
            # remove hypotheses less specific than any other in S
            S.difference_update([h for h in S if
                                 any([more_general(h, h1) for h1 in S if h != h1])])
    return S

def specialize_G(x, domains, G, S):
    G_prev = list(G)
    for g in G_prev:
        if g not in G:
            continue
        if fulfills(x, g):
            G.remove(g)
            Gminus = min_specializations(g, domains, x)
            # keep only specializations that have a counterpart in S
            G.update([h for h in Gminus if any([more_general(h, s) for s in S])])
            # remove hypotheses less general than any other in G
            G.difference_update([h for h in G if
                                 any([more_general(g1, h) for g1 in G if h != g1])])
    return G

candidate_elimination(examples)
Output:
[(1, 3), (2, 4), (3, 5)]
[('rainy', 'windy', '?')]
[('a', 'x'), ('c', 'x'), ('?', '0')]
[('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same', 'Y'), ('Sunny', 'Warm', 'High', 'Strong', 'Warm', 'Same', 'Y'), ('Rainy', 'Cold', 'High', 'Strong', 'Warm', 'Change', 'N'), ('Sunny', 'Warm', 'High', 'Strong', 'Cool', 'Change', 'Y')]
[['Rainy', 'Sunny'], ['Cold', 'Warm'], ['High', 'Normal'], ['Strong'], ['Cool', 'Warm'], ['Change', 'Same'], ['N', 'Y']]

 G[0]: {('?', '?', '?', '?', '?', '?')}
 S[0]: {('0', '0', '0', '0', '0', '0')}
 G[1]: {('?', '?', '?', '?', '?', '?')}
 S[1]: {('Sunny', 'Warm', 'Normal', 'Strong', 'Warm', 'Same')}
 G[2]: {('?', '?', '?', '?', '?', '?')}
 S[2]: {('Sunny', 'Warm', '?', 'Strong', 'Warm', 'Same')}
 G[3]: {('Sunny', '?', '?', '?', '?', '?'), ('?', 'Warm', '?', '?', '?', '?'), ('?', '?', '?', '?', '?', 'Same')}
 S[3]: {('Sunny', 'Warm', '?', 'Strong', 'Warm', 'Same')}
 G[4]: {('Sunny', '?', '?', '?', '?', '?'), ('?', 'Warm', '?', '?', '?', '?')}
 S[4]: {('Sunny', 'Warm', '?', 'Strong', '?', '?')}

Program 3: Write a program to demonstrate the working of the decision tree based ID3 algorithm. Use an appropriate data set for building the decision tree and apply this knowledge to classify a new sample.

input.csv:
PlayTennis,Outlook,Temperature,Humidity,Wind
No,Sunny,Hot,High,Weak
No,Sunny,Hot,High,Strong
Yes,Overcast,Hot,High,Weak
Yes,Rain,Mild,High,Weak
Yes,Rain,Cool,Normal,Weak
No,Rain,Cool,Normal,Strong
Yes,Overcast,Cool,Normal,Strong
No,Sunny,Mild,High,Weak
Yes,Sunny,Cool,Normal,Weak
Yes,Rain,Mild,Normal,Weak
Yes,Sunny,Mild,Normal,Strong
Yes,Overcast,Mild,High,Strong
Yes,Overcast,Hot,Normal,Weak
No,Rain,Mild,High,Strong

Program:
# Import the Play Tennis data
import math
import pandas as pd
import numpy as np
from collections import Counter
from pprint import pprint

df_tennis = pd.read_csv('input.csv')

def entropy(probs):
    return sum([-prob * math.log(prob, 2) for prob in probs])

def entropy_list(a_list):
    cnt = Counter(x for x in a_list)
    num_instance = len(a_list) * 1.0
    probs = [x / num_instance for x in cnt.values()]
    return entropy(probs)

def info_gain(df, split, target, trace=0):
    df_split = df.groupby(split)
    nobs = len(df.index) * 1.0
    df_agg_ent = df_split.agg({target: [entropy_list, lambda x: len(x) / nobs]})
    df_agg_ent.columns = ['Entropy', 'PropObserved']
    new_entropy = sum(df_agg_ent['Entropy'] * df_agg_ent['PropObserved'])
    old_entropy = entropy_list(df[target])
    return old_entropy - new_entropy

def id3(df, target, attribute_name, default_class=None):
    cnt = Counter(x for x in df[target])
    if len(cnt) == 1:
        return next(iter(cnt))
    elif df.empty or (not attribute_name):
        return default_class
    else:
        default_class = max(cnt.keys())
        gains = [info_gain(df, attr, target) for attr in attribute_name]
        index_max = gains.index(max(gains))
        best_attr = attribute_name[index_max]
        tree = {best_attr: {}}
        remaining_attr = [x for x in attribute_name if x != best_attr]
        for attr_val, data_subset in df.groupby(best_attr):
            subtree = id3(data_subset, target, remaining_attr, default_class)
            tree[best_attr][attr_val] = subtree
        return tree

def classify(instance, tree, default=None):
    attribute = next(iter(tree))
    if instance[attribute] in tree[attribute].keys():
        result = tree[attribute][instance[attribute]]
        if isinstance(result, dict):
            return classify(instance, result)
        else:
            return result
    else:
        return default

attribute_names = list(df_tennis.columns)
attribute_names.remove('PlayTennis')  # remove the class attribute

tree = id3(df_tennis, 'PlayTennis', attribute_names)
print("\n\nThe Resultant Decision Tree is :\n")
pprint(tree)

training_data = df_tennis.iloc[1:-4]   # all rows except the first and the last four
test_data = df_tennis.iloc[-4:]        # just the last four rows
train_tree = id3(training_data, 'PlayTennis', attribute_names)
print("\n\nThe Resultant Decision train_tree is :\n")
pprint(train_tree)

test_data['predicted2'] = test_data.apply(classify, axis=1, args=(train_tree, 'Yes'))
print('\n\n Training the model on a few samples, and again predicting \'PlayTennis\' for the remaining rows')
print('The Accuracy for new trained data is : ' +
      str(sum(test_data['PlayTennis'] == test_data['predicted2']) / (1.0 * len(test_data.index))))

Output:
The Resultant Decision Tree is :

{'Outlook': {'Overcast': 'Yes',
             'Rainy': {'Windy': {'Strong': 'No', 'Weak': 'Yes'}},
             'Sunny': {'Humidity': {'High': 'No', 'Normal': 'Yes'}}}}

The Resultant Decision train_tree is :

{'Outlook': {'Overcast': 'Yes',
             'Rainy': {'Windy': {'Strong': 'No', 'Weak': 'Yes'}},
             'Sunny': {'Temperature': {'Cool': 'Yes', 'Hot': 'No', 'Mild': 'No'}}}}

Training the model on a few samples, and again predicting 'PlayTennis' for the remaining rows
The Accuracy for new trained data is : 0.75
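For reference, entropy_list and info_gain above compute the usual ID3 quantities (this restatement is added for clarity and uses standard notation, not symbols from the listing):

\[
\mathrm{Entropy}(S) = \sum_{c} -p_c \log_2 p_c,
\qquad
\mathrm{Gain}(S, A) = \mathrm{Entropy}(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|}\,\mathrm{Entropy}(S_v)
\]

where p_c is the proportion of examples in S belonging to class c and S_v is the subset of S for which attribute A takes value v. At each node, id3 selects the attribute with the largest gain, which is why Outlook ends up at the root of both trees above.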
Program 4: Build an Artificial Neural Network by implementing the Backpropagation algorithm and test the same using appropriate data sets.
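In outline, the program below trains a feed-forward network with one hidden layer of sigmoid units; the rules it applies are the standard backpropagation updates (this summary is added here, with t the target, o a unit's output, and eta the learning rate l_rate):

\[
o = \sigma(w \cdot x + b), \quad \sigma(a) = \frac{1}{1 + e^{-a}} \quad \text{(activate and transfer)}
\]
\[
\delta_k = (t_k - o_k)\, o_k (1 - o_k) \quad \text{(backward\_prop, output layer)}
\]
\[
\delta_j = o_j (1 - o_j) \sum_k w_{kj}\, \delta_k \quad \text{(backward\_prop, hidden layer)}
\]
\[
w \leftarrow w + \eta\, \delta\, x_{\mathrm{in}} \quad \text{(update\_weights; the bias uses } x_{\mathrm{in}} = 1\text{)}
\]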
Program:
from math import exp
from random import random, seed

def initialize(n_inputs, n_hidden, n_output):
    network = list()
    hidden_layer = [{'weights': [random() for i in range(n_inputs + 1)]} for i in range(n_hidden)]
    network.append(hidden_layer)
    output_layer = [{'weights': [random() for i in range(n_hidden + 1)]} for i in range(n_output)]
    network.append(output_layer)
    return network

def activate(weights, inputs):
    # weighted sum of inputs; the last weight is the bias
    activation = weights[-1]
    for x in range(len(weights) - 1):
        activation += weights[x] * inputs[x]
    return activation

def transfer(activation):
    # sigmoid transfer function
    return 1.0 / (1.0 + exp(-activation))

def forward_prop(network, row):
    inputs = row
    for layer in network:
        new_inputs = []
        for neuron in layer:
            activation = activate(neuron['weights'], inputs)
            neuron['output'] = transfer(activation)
            new_inputs.append(neuron['output'])
        inputs = new_inputs
    return inputs

def transfer_derivative(output):
    return output * (1.0 - output)

def backward_prop(network, expected):
    for i in reversed(range(len(network))):
        layer = network[i]
        errors = list()
        if i != len(network) - 1:
            for j in range(len(layer)):
                error = 0.0
                for neuron in network[i + 1]:
                    error += (neuron['weights'][j] * neuron['delta'])
                errors.append(error)
        else:
            for j in range(len(layer)):
                neuron = layer[j]
                errors.append(expected[j] - neuron['output'])
        for j in range(len(layer)):
            neuron = layer[j]
            neuron['delta'] = errors[j] * transfer_derivative(neuron['output'])

def update_weights(network, row, l_rate):
    for i in range(len(network)):
        inputs = row[:-1]
        if i != 0:
            inputs = [neuron['output'] for neuron in network[i - 1]]
        for neuron in network[i]:
            for j in range(len(inputs)):
                neuron['weights'][j] += l_rate * neuron['delta'] * inputs[j]
            neuron['weights'][-1] += l_rate * neuron['delta']

def train_network(network, train, l_rate, n_epoch, n_outputs):
    for epoch in range(n_epoch):
        sum_error = 0
        for row in train:
            outputs = forward_prop(network, row)
            expected = [0 for i in range(n_outputs)]
            expected[row[-1]] = 1
            sum_error += sum([(expected[i] - outputs[i]) ** 2 for i in range(len(expected))])
            backward_prop(network, expected)
            update_weights(network, row, l_rate)
        print('>epoch=%d, lrate=%.3f, error=%.3f' % (epoch, l_rate, sum_error))

seed(1)
dataset = [[2.7810836, 2.550537003, 0],
           [1.465489372, 2.362125076, 0],
           [3.396561688, 4.400293529, 0],
           [1.38807019, 1.850220317, 0],
           [3.06407232, 3.005305973, 0],
           [7.627531214, 2.759262235, 1],
           [5.332441248, 2.088626775, 1],
           [6.922596716, 1.77106367, 1],
           [8.675418651, -0.242068655, 1],
           [7.673756466, 3.508563011, 1]]

n_inputs = len(dataset[0]) - 1
n_outputs = len(set([row[-1] for row in dataset]))
network = initialize(n_inputs, 2, n_outputs)
train_network(network, dataset, 0.5, 20, n_outputs)
for layer in network:
    print(layer)

Output:
>epoch=0, lrate=0.500, error=6.350
>epoch=1, lrate=0.500, error=5.531
>epoch=2, lrate=0.500, error=5.221
>epoch=3, lrate=0.500, error=4.951
>epoch=4, lrate=0.500, error=4.519
>epoch=5, lrate=0.500, error=4.173
>epoch=6, lrate=0.500, error=3.835
>epoch=7, lrate=0.500, error=3.506
>epoch=8, lrate=0.500, error=3.192
>epoch=9, lrate=0.500, error=2.898
>epoch=10, lrate=0.500, error=2.626
>epoch=11, lrate=0.500, error=2.377
>epoch=12, lrate=0.500, error=2.153
>epoch=13, lrate=0.500, error=1.953
>epoch=14, lrate=0.500, error=1.774
>epoch=15, lrate=0.500, error=1.614
>epoch=16, lrate=0.500, error=1.472
>epoch=17, lrate=0.500, error=1.346
>epoch=18, lrate=0.500, error=1.233
>epoch=19, lrate=0.500, error=1.132
[{'delta': -0.0059546604162323625, 'weights': [-1.4688375095432327, 1.850887325439514, 1.0858178629550297], 'output': 0.029980305604426185}, {'delta': 0.0026279652850863837, 'weights': [0.37711098142462157, -0.0625909894552989, 0.2765123702642716], 'output': 0.9456229000211323}]
[{'delta': -0.04270059278364587, 'weights': [2.515394649397849, -0.3391927502445985, -0.9671565426390275], 'output': 0.23648794202357587}, {'delta': 0.03803132596437354, 'weights': [-2.5584149848484263, 1.0036422106209202, 0.42383086467582715], 'output': 0.7790535202438367}]

Program 5: Write a program to implement the naïve Bayesian classifier for a sample training data set stored as a .CSV file. Compute the accuracy of the classifier, considering a few test data sets.
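The classifier below treats each numeric attribute as normally distributed within a class: summarizeByClass stores the per-class mean mu and standard deviation sigma of every attribute, calculateProbability evaluates the Gaussian density, and calculateClassProbabilities multiplies these densities per class (note that this listing does not multiply in the class prior). In standard notation, added here for reference:

\[
P(x \mid c) = \frac{1}{\sqrt{2\pi}\,\sigma_{c}} \exp\!\left(-\frac{(x-\mu_{c})^2}{2\sigma_{c}^2}\right),
\qquad
\hat{c} = \arg\max_{c} \prod_{i} P(x_i \mid c)
\]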
Program:
import math
import pandas as pd

df_data = pd.read_csv('input.csv', header=None).astype('float')
size = int(len(df_data) * 0.67)
train_df = df_data.iloc[:size].copy()
test_df = df_data.iloc[size:].copy()

def mean(numbers):
    return sum(numbers) / float(len(numbers))

def stdev(numbers):
    avg = mean(numbers)
    variance = sum([pow(x - avg, 2) for x in numbers]) / float(len(numbers) - 1)
    return math.sqrt(variance)

def summarize(data):
    summaries = [(mean(data[attribute]), stdev(data[attribute]))
                 for attribute in data.columns[0:-1]]
    return summaries

def summarizeByClass(train_df):
    summary = {}
    for key, value in train_df.groupby(train_df.columns[-1]):
        summary[key] = summarize(value)
    return summary

def calculateProbability(x, mean, stdev):
    exponent = math.exp(-(math.pow(x - mean, 2) / (2 * math.pow(stdev, 2))))
    return (1 / (math.sqrt(2 * math.pi) * stdev)) * exponent

def calculateClassProbabilities(summaries, inputVector):
    probabilities = {}
    for classValue, classSummaries in summaries.items():
        probabilities[classValue] = 1
        for i in range(len(classSummaries)):
            mean, stdev = classSummaries[i]
            x = inputVector[i]
            probabilities[classValue] *= calculateProbability(x, mean, stdev)
    return probabilities

def predict(summaries, inputVector):
    probabilities = calculateClassProbabilities(summaries, inputVector)
    bestLabel, bestProb = None, -1
    for classValue, probability in probabilities.items():
        if bestLabel is None or probability > bestProb:
            bestProb = probability
            bestLabel = classValue
    return bestLabel

def getPredictions(summaries, testSet):
    predictions = []
    for i in range(len(testSet)):
        result = predict(summaries, testSet.iloc[i])
        predictions.append(result)
    return predictions

def getAccuracy(testSet, predictions):
    correct = 0
    for x in range(len(testSet)):
        if testSet.iloc[x][8] == predictions[x]:  # column 8 is the last (class) column of this data set
            correct += 1
    return (correct / float(len(testSet))) * 100.0

summaries = summarizeByClass(train_df)
predictions = getPredictions(summaries, test_df)
accuracy = getAccuracy(test_df, predictions)
print('Accuracy is ' + str(accuracy))

Output:
Accuracy is 76.77165354330708

Program 6: Assuming a set of documents that need to be classified, use the naïve Bayesian Classifier model to perform this task. Built-in Java classes/API can be used to write the program. Calculate the accuracy, precision, and recall for your data set.

Program:
import numpy as np
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline
from sklearn import metrics

categories = ['alt.atheism', 'soc.religion.christian', 'comp.graphics', 'sci.med']
news_train = fetch_20newsgroups(subset='train', shuffle=True, categories=categories)
news_test = fetch_20newsgroups(subset='test', shuffle=True, categories=categories)

text_clf = Pipeline([('vect', TfidfVectorizer()),
                     ('clf', MultinomialNB()),
                     ])
text_clf.fit(news_train.data, news_train.target)
predicted = text_clf.predict(news_test.data)

print('Accuracy achieved is ' + str(np.mean(predicted == news_test.target)))
print(metrics.classification_report(news_test.target, predicted,
                                    target_names=news_test.target_names))
metrics.confusion_matrix(news_test.target, predicted)

Output:
Accuracy achieved is 0.8348868175765646
                        precision    recall  f1-score   support

           alt.atheism       0.97      0.60      0.74       319
         comp.graphics       0.96      0.89      0.92       389
               sci.med       0.97      0.81      0.88       396
soc.religion.christian       0.65      0.99      0.78       398

             micro avg       0.83      0.83      0.83      1502
             macro avg       0.89      0.82      0.83      1502
          weighted avg       0.88      0.83      0.84      1502

array([[192,   2,   6, 119],
       [  2, 347,   4,  36],
       [  2,  11, 322,  61],
       [  2,   2,   1, 393]])

Program 7: Write a program to construct a Bayesian network considering medical data. Use this model to demonstrate the diagnosis of heart patients using the standard Heart Disease Data Set. You can use Java/Python ML library classes/API.
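The edge list passed to BayesianModel below fixes the factorization that model.fit estimates and that VariableElimination later queries. Written out (a restatement added here, using the column names that remain after the deletions in the code):

\[
P(\text{age}, \text{sex}, \text{exang}, \text{trestbps}, \text{fbs}, \text{chol}, \text{restecg}, \text{thalach}, \text{heartdisease}) =
P(\text{age})\,P(\text{sex})\,P(\text{exang})\,
P(\text{trestbps} \mid \text{age}, \text{sex}, \text{exang})\,
P(\text{fbs} \mid \text{age})\,
P(\text{heartdisease} \mid \text{trestbps}, \text{fbs})\,
P(\text{restecg} \mid \text{heartdisease})\,
P(\text{thalach} \mid \text{heartdisease})\,
P(\text{chol} \mid \text{heartdisease})
\]

Each query then computes the posterior P(heartdisease | evidence) by variable elimination, which is what the three probability tables in the output show.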
Program:
import numpy as np
import pandas as pd
from pgmpy.models import BayesianModel
from pgmpy.estimators import MaximumLikelihoodEstimator, BayesianEstimator
from pgmpy.inference import VariableElimination

heartDisease = pd.read_csv('heart_disease_data.csv')
print(heartDisease.head())

# drop attributes that are not part of the network
del heartDisease['ca']
del heartDisease['slope']
del heartDisease['thal']
del heartDisease['oldpeak']
heartDisease = heartDisease.replace('?', np.nan)

model = BayesianModel([('age', 'trestbps'), ('age', 'fbs'),
                       ('sex', 'trestbps'), ('exang', 'trestbps'),
                       ('trestbps', 'heartdisease'), ('fbs', 'heartdisease'),
                       ('heartdisease', 'restecg'), ('heartdisease', 'thalach'),
                       ('heartdisease', 'chol')])
model.fit(heartDisease, estimator=MaximumLikelihoodEstimator)

HeartDisease_infer = VariableElimination(model)

q = HeartDisease_infer.query(variables=['heartdisease'], evidence={'age': 32})
print(q['heartdisease'])
q = HeartDisease_infer.query(variables=['heartdisease'], evidence={'chol': 100})
print(q['heartdisease'])
q = HeartDisease_infer.query(variables=['heartdisease'], evidence={'trestbps': 10})
print(q['heartdisease'])

Output:
╒════════════════╤═════════════════════╕
│ heartdisease   │   phi(heartdisease) │
╞════════════════╪═════════════════════╡
│ heartdisease_0 │              0.6000 │
├────────────────┼─────────────────────┤
│ heartdisease_1 │              0.4000 │
╘════════════════╧═════════════════════╛
╒════════════════╤═════════════════════╕
│ heartdisease   │   phi(heartdisease) │
╞════════════════╪═════════════════════╡
│ heartdisease_0 │              0.5510 │
├────────────────┼─────────────────────┤
│ heartdisease_1 │              0.4490 │
╘════════════════╧═════════════════════╛
╒════════════════╤═════════════════════╕
│ heartdisease   │   phi(heartdisease) │
╞════════════════╪═════════════════════╡
│ heartdisease_0 │              0.5000 │
├────────────────┼─────────────────────┤
│ heartdisease_1 │              0.5000 │
╘════════════════╧═════════════════════╛

Program 8: Apply the EM algorithm to cluster a set of data stored in a .CSV file. Use the same data set for clustering using the k-Means algorithm. Compare the results of these two algorithms and comment on the quality of clustering. You can add Java/Python ML library classes/API in the program.

Program:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import sklearn.metrics as sm
from sklearn.datasets import load_iris
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

iris = load_iris()
X = pd.DataFrame(iris.data)
y = pd.DataFrame(iris.target)

colormap = np.array(['red', 'lime', 'black'])
plt.figure(figsize=(14, 7))

# k-Means clustering
model = KMeans(n_clusters=3)
model.fit(X)
plt.subplot(1, 2, 2)
plt.scatter(X[2], X[3], c=colormap[model.labels_])
plt.title('K Mean Classification')
print(sm.accuracy_score(y, model.labels_))

# EM clustering with a Gaussian mixture model
gmm = GaussianMixture(n_components=3)
gmm.fit(X)
y_cluster_gmm = gmm.predict(X)
plt.subplot(1, 2, 1)
plt.scatter(X[2], X[3], c=colormap[y_cluster_gmm])
plt.title('GMM Classification')
print(sm.accuracy_score(y, y_cluster_gmm))
print(sm.confusion_matrix(y, y_cluster_gmm))

Output:
0.8933333333333333
0.3333333333333333
[[ 0 50  0]
 [45  0  5]
 [ 0  0 50]]
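The two accuracy figures above are not directly comparable, because accuracy_score matches raw cluster numbers against the true species labels and clustering assigns those numbers arbitrarily. The confusion matrix shows that the GMM clusters are essentially the species with their labels permuted, so a fairer comparison remaps each cluster to the majority true class inside it (from the matrix above this would give (50 + 45 + 50) / 150, roughly 0.97). A small sketch of such a remapping; remap_labels is an added helper, not part of the listing, and it reuses np, sm, iris and y_cluster_gmm from the program above.

def remap_labels(true_labels, cluster_labels):
    # relabel each cluster with the most common true class found inside it
    remapped = np.zeros_like(cluster_labels)
    for k in np.unique(cluster_labels):
        mask = cluster_labels == k
        remapped[mask] = np.bincount(true_labels[mask]).argmax()
    return remapped

print(sm.accuracy_score(iris.target, remap_labels(iris.target, y_cluster_gmm)))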
Program 9: Write a program to implement the k-Nearest Neighbour algorithm to classify the iris data set. Print both correct and wrong predictions. Java/Python ML library classes can be used for this problem.

Program:
import numpy as np
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

iris_dataset = load_iris()
X_train, X_test, y_train, y_test = train_test_split(iris_dataset["data"],
                                                    iris_dataset["target"],
                                                    random_state=0)
kn = KNeighborsClassifier()
kn.fit(X_train, y_train)
prediction = kn.predict(X_test)

print("ACCURACY:" + str(kn.score(X_test, y_test)))
target_names = iris_dataset.target_names
for pred, actual in zip(prediction, y_test):
    print("Prediction is " + str(target_names[pred]) + ", Actual is " + str(target_names[actual]))

Output:
ACCURACY:0.9736842105263158
Prediction is virginica, Actual is virginica
Prediction is versicolor, Actual is versicolor
Prediction is setosa, Actual is setosa
Prediction is virginica, Actual is virginica
Prediction is setosa, Actual is setosa
Prediction is virginica, Actual is virginica
Prediction is setosa, Actual is setosa
Prediction is versicolor, Actual is versicolor
Prediction is versicolor, Actual is versicolor
Prediction is versicolor, Actual is versicolor
Prediction is virginica, Actual is virginica
Prediction is versicolor, Actual is versicolor
Prediction is versicolor, Actual is versicolor
Prediction is versicolor, Actual is versicolor
Prediction is versicolor, Actual is versicolor
Prediction is setosa, Actual is setosa
Prediction is versicolor, Actual is versicolor
Prediction is versicolor, Actual is versicolor
Prediction is setosa, Actual is setosa
Prediction is setosa, Actual is setosa
Prediction is virginica, Actual is virginica
Prediction is versicolor, Actual is versicolor
Prediction is setosa, Actual is setosa
Prediction is setosa, Actual is setosa
Prediction is virginica, Actual is virginica
Prediction is setosa, Actual is setosa
Prediction is setosa, Actual is setosa
Prediction is versicolor, Actual is versicolor
Prediction is versicolor, Actual is versicolor
Prediction is setosa, Actual is setosa
Prediction is virginica, Actual is virginica
Prediction is versicolor, Actual is versicolor
Prediction is setosa, Actual is setosa
Prediction is virginica, Actual is virginica
Prediction is virginica, Actual is virginica
Prediction is versicolor, Actual is versicolor
Prediction is setosa, Actual is setosa
Prediction is virginica, Actual is versicolor
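As a small follow-up sketch (not part of the listing above), the fitted classifier can also label a single new flower; the measurements used here are made-up illustrative values, and the snippet reuses np, kn and target_names from the program above.

x_new = np.array([[5.0, 2.9, 1.0, 0.2]])  # sepal length, sepal width, petal length, petal width (cm); illustrative values
print("Predicted class: " + target_names[kn.predict(x_new)[0]])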
Program 10: Implement the non-parametric Locally Weighted Regression algorithm in order to fit data points. Select an appropriate data set for your experiment and draw graphs.

Program:
import numpy as np
from bokeh.plotting import figure, show, output_notebook
from bokeh.layouts import gridplot
from bokeh.io import push_notebook

output_notebook()

def radial_kernel(x0, X, tau):
    # weight (radial kernel) function
    return np.exp(np.sum((X - x0) ** 2, axis=1) / (-2 * tau * tau))

def local_regression(x0, X, Y, tau):
    # add bias term (prepend a 1 so the intercept is learned)
    x0 = np.r_[1, x0]
    X = np.c_[np.ones(len(X)), X]
    # fit model: normal equations with kernel weights
    xw = X.T * radial_kernel(x0, X, tau)      # X transpose times W
    beta = np.linalg.pinv(xw @ X) @ xw @ Y    # @ is matrix multiplication (dot product)
    # predict value at x0
    return x0 @ beta

n = 1000
# generate dataset
X = np.linspace(-3, 3, num=n)
print("The Data Set ( 10 Samples) X :\n", X[1:10])
Y = np.log(np.abs(X ** 2 - 1) + .5)
print("The Fitting Curve Data Set (10 Samples) Y :\n", Y[1:10])
# jitter X
X += np.random.normal(scale=.1, size=n)
print("Normalised (10 Samples) X :\n", X[1:10])

domain = np.linspace(-3, 3, num=300)
print(" Xo Domain Space(10 Samples) :\n", domain[1:10])

def plot_lwr(tau):
    # prediction through regression over the whole domain
    prediction = [local_regression(x0, X, Y, tau) for x0 in domain]
    plot = figure(plot_width=400, plot_height=400)
    plot.title.text = 'tau=%g' % tau
    plot.scatter(X, Y, alpha=.3)
    plot.line(domain, prediction, line_width=2, color='red')
    return plot

# plotting the curves with different tau
show(gridplot([
    [plot_lwr(10.), plot_lwr(1.)],
    [plot_lwr(0.1), plot_lwr(0.01)]
]))

Output:
The Data Set ( 10 Samples) X :
 [-2.99399399 -2.98798799 -2.98198198 -2.97597598 -2.96996997 -2.96396396
  -2.95795796 -2.95195195 -2.94594595]
The Fitting Curve Data Set (10 Samples) Y :
 [ 2.13582188  2.13156806  2.12730467  2.12303166  2.11874898  2.11445659
   2.11015444  2.10584249  2.10152068]
Normalised (10 Samples) X :
 [-3.17013248 -2.87908581 -3.37488159 -2.90743352 -2.93640374 -2.97978828
  -3.0549104  -3.0735006  -2.88552749]
 Xo Domain Space(10 Samples) :
 [-2.97993311 -2.95986622 -2.93979933 -2.91973244 -2.89966555 -2.87959866
  -2.85953177 -2.83946488 -2.81939799]
...
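For reference, radial_kernel and local_regression above solve a separate weighted least-squares problem at every query point x0 (standard notation, added here for clarity):

\[
w_i(x_0) = \exp\!\left(-\frac{(x_i - x_0)^2}{2\tau^2}\right),
\qquad
\hat{\beta}(x_0) = (X^{\mathsf{T}} W X)^{-1} X^{\mathsf{T}} W y,
\qquad
\hat{y}(x_0) = x_0^{\mathsf{T}} \hat{\beta}(x_0)
\]

where W = diag(w_i(x_0)) and the code uses a pseudo-inverse in place of the plain inverse. Smaller values of tau weight only nearby points and so give more local fits, which is what the four tau values passed to plot_lwr are meant to illustrate.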

