Birmingham



Machine Learning (Extended)

Resit paper for August 2014.

Answer all questions.

A non-alpha calculator may be used.

Question 1 – Probabilistic generative classifiers

a) What is the Naive Bayes assumption and why is it called Naive? [5%]

b) State an advantage and a disadvantage of using the Naive Bayes assumption in classification. [5%]

c) How many parameters do you need to specify P(A,B|C) if A, B and C are discrete random variables that can take on a, b and c different values respectively? [5%]
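For illustration, a sketch of the usual counting argument, assuming each conditional distribution must sum to one:

    c \times (ab - 1) \quad \text{free parameters},

since for each of the c values of C the table P(A,B | C=c) has ab entries and one sum-to-one constraint; without that constraint the count would simply be abc.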

d) Assume we have a data set described by the following three variables:

Hair = {B,D}, where B=blonde, D=dark.

Height = {T,S}, where T=tall, S=short.

Country = {G,P}, where G=Greenland, P=Poland.

You are given the following training data set (Hair, Height, Country):

(B,T,G), (D,T,G), (D,T,G), (D,T,G), (B,T,G), (B,S,G), (B,S,G), (D,S,G),

(B,T,G), (D,T,G), (D,T,G), (D,T,G), (B,T,G), (B,S,G), (B,S,G), (D,S,G),

(B,T,P), (B,T,P), (B,T,P), (D,T,P), (D,T,P), (D,S,P), (B,S,P), (D,S,P).

Now, suppose you observe a new individual who is tall with blonde hair, and you want to use these training data to determine the most likely country of origin.

i) Give the maximum a posteriori (MAP) answer to the above question, using the Naive Bayes assumption. Show all of your working. [5%]

ii) Give the Maximum Likelihood (ML) answer to the above question, using the Naive Bayes assumption, and explain how it differs from the method used in i). [5%]

iii) Explain how you would solve i) or ii) if, instead of blonde/dark, we were given some continuous-valued measurements of the hair colour, and instead of tall/short we measured the height in centimetres. [5%]
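For illustration, a minimal Python sketch of the counting needed in i) and ii); the data list is copied from the question, while the function and variable names are chosen here purely for illustration:

    from collections import Counter

    # Training data from the question: (Hair, Height, Country)
    data = [
        ("B","T","G"), ("D","T","G"), ("D","T","G"), ("D","T","G"),
        ("B","T","G"), ("B","S","G"), ("B","S","G"), ("D","S","G"),
        ("B","T","G"), ("D","T","G"), ("D","T","G"), ("D","T","G"),
        ("B","T","G"), ("B","S","G"), ("B","S","G"), ("D","S","G"),
        ("B","T","P"), ("B","T","P"), ("B","T","P"), ("D","T","P"),
        ("D","T","P"), ("D","S","P"), ("B","S","P"), ("D","S","P"),
    ]

    country_counts = Counter(c for _, _, c in data)

    def naive_bayes_score(hair, height, country, use_prior):
        """Score proportional to P(hair|country) * P(height|country) [* P(country)]."""
        n_c = country_counts[country]
        p_hair = sum(1 for h, _, c in data if c == country and h == hair) / n_c
        p_height = sum(1 for _, t, c in data if c == country and t == height) / n_c
        score = p_hair * p_height
        if use_prior:                      # MAP also multiplies by the class prior
            score *= n_c / len(data)
        return score

    for method, use_prior in [("MAP", True), ("ML", False)]:
        scores = {c: naive_bayes_score("B", "T", c, use_prior) for c in ("G", "P")}
        print(method, scores, "->", max(scores, key=scores.get))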

e) Consider a 1-dimensional Gaussian classifier, that is, a classifier that models each class by a Gaussian having its own mean and variance. Assume that the class prior probabilities are equal for both classes. Draw an example of a 1-dimensional Gaussian classifier with two classes, and indicate where the decision boundary is on your plot. [5%]
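For e), with equal priors the decision boundary lies where the two class-conditional densities are equal; written out (with each class having its own mean and variance):

    \mathcal{N}(x \mid \mu_1, \sigma_1^2) = \mathcal{N}(x \mid \mu_2, \sigma_2^2)
    \iff
    \frac{(x-\mu_1)^2}{2\sigma_1^2} + \ln\sigma_1 = \frac{(x-\mu_2)^2}{2\sigma_2^2} + \ln\sigma_2 ,

which reduces to the midpoint of the two means when the variances are equal.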

Question 2 – Non-probabilistic classifiers

a) Consider the following training data from 2 categories:

Class 1: (1,1)’

Class 2: (-1,-1)’, (1,0)’, (0,1)’

i) Plot these four points, draw the linear separation boundary that an SVM would give for these data, and list the support vectors. [5%]

ii) Consider training data of 1-dimensional points from two classes:

Class 1: -5,5

Class 2: -2,1

A) Consider the transformation f: R → R^2, f(x) = (x, x^2). Transform the data and plot these transformed points. Are these linearly separable? [5%]

B) Draw the optimal separating hyper-plane in the transformed space, and explain in one or two sentences how this linear boundary helps us to separate the original data points. [5%]
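For illustration, A) and B) can be checked numerically with scikit-learn (a sketch only; the exam expects the plot and boundary to be produced by hand):

    import numpy as np
    from sklearn.svm import SVC

    # 1-D points from the question and the transform f(x) = (x, x^2)
    x = np.array([-5.0, 5.0, -2.0, 1.0])
    y = np.array([1, 1, 2, 2])               # class labels
    X = np.column_stack([x, x ** 2])         # transformed 2-D points

    # A linear SVM with a large C approximates the hard-margin solution
    clf = SVC(kernel="linear", C=1e6).fit(X, y)

    print("weights:", clf.coef_, "bias:", clf.intercept_)
    print("support vectors:", clf.support_vectors_)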

b) Consider the following data set with two real-valued inputs x (i.e. the coordinates of the points) and one binary output y (taking values + or -). We want to use k-nearest neighbours (K-NN) with Euclidean distance to predict y from x.

i) Calculate the leave-one-out cross-validation error of 1-NN on this data set. That is, for each point in turn, try to predict its label y using the rest of the points, and count up the number of misclassification errors. [5%]

ii) Calculate the leave-one-out cross-validation error of 3-NN on the same data set. [5%]

iii) Describe how you would choose the number of neighbours K in K-NN in general. [5%]
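The data set referred to in b) is not reproduced above, so the points in the sketch below are placeholders; it only illustrates the leave-one-out procedure asked for in i) and ii):

    import numpy as np

    # Placeholder points for illustration only (not the exam's data set);
    # each row is (x1, x2), labels are +1 / -1.
    X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0],
                  [3.0, 3.0], [3.0, 4.0], [4.0, 3.0]])
    y = np.array([+1, +1, +1, -1, -1, -1])

    def loocv_knn_errors(X, y, k):
        """Leave-one-out error count for K-NN with Euclidean distance."""
        errors = 0
        for i in range(len(X)):
            dists = np.linalg.norm(X - X[i], axis=1)
            dists[i] = np.inf                    # leave the point itself out
            neighbours = np.argsort(dists)[:k]
            vote = np.sign(y[neighbours].sum())  # majority vote (k odd, so no ties)
            errors += int(vote != y[i])
        return errors

    print("1-NN LOOCV errors:", loocv_knn_errors(X, y, 1))
    print("3-NN LOOCV errors:", loocv_knn_errors(X, y, 3))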

c) Suppose you have a data set to classify and you have several classification methods that you can try. Explain how you would decide which of these methods to choose. [5%]
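A common answer is to compare the candidate methods by cross-validated performance on the available data (keeping a separate test set aside for the final estimate); a minimal scikit-learn sketch, with the candidate classifiers and the synthetic data chosen here only as examples:

    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_score
    from sklearn.naive_bayes import GaussianNB
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.svm import SVC

    X, y = make_classification(n_samples=200, n_features=10, random_state=0)

    candidates = {
        "naive Bayes": GaussianNB(),
        "3-NN": KNeighborsClassifier(n_neighbors=3),
        "linear SVM": SVC(kernel="linear"),
    }
    for name, model in candidates.items():
        scores = cross_val_score(model, X, y, cv=5)   # 5-fold cross-validation
        print(name, "mean accuracy:", scores.mean())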

Question 3 – Learning theory

a) State three questions that are studied by learning theory. [5%]

b) A learning theory framework is the PAC model of learning, where PAC stands for Probably Approximately Correct. Explain in plain English when we say that a concept class is PAC-learnable. [5%]
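For reference, the formal statement behind the plain-English answer: a concept class C is PAC-learnable if there is a learning algorithm such that, for every target concept in C, every data distribution D, and every ε, δ in (0, 1), after seeing a number of labelled examples polynomial in 1/ε and 1/δ (and the relevant size parameters) it outputs, with probability at least 1 − δ, a hypothesis h with

    \mathrm{err}_D(h) \le \varepsilon ,

i.e. the hypothesis is approximately correct (error at most ε) with high probability (at least 1 − δ).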

Question 4 – Unsupervised learning: Clustering

a) What methods do you know for data clustering? [5%]

b) Describe how you would use a clustering method to do image segmentation. [5%]
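A minimal sketch of the usual recipe (cluster the per-pixel feature vectors, then map each pixel back to its cluster label), assuming scikit-learn and an RGB image stored as a height × width × 3 array:

    import numpy as np
    from sklearn.cluster import KMeans

    def segment_image(image, n_segments=4):
        """Segment an RGB image by clustering per-pixel colour vectors with K-means."""
        h, w, c = image.shape
        pixels = image.reshape(-1, c).astype(float)          # one feature vector per pixel
        labels = KMeans(n_clusters=n_segments, n_init=10).fit_predict(pixels)
        return labels.reshape(h, w)                          # segment label for each pixel

    # Example call on a random array standing in for a real image
    segments = segment_image(np.random.randint(0, 256, size=(32, 32, 3)))

Appending the pixel coordinates to the colour features is a common variant that makes the segments more spatially coherent.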

c) Describe two limitations of the K-means clustering algorithm. [5%]

d) Suppose you have run K-means clustering on a data set and later you get more data points for the same data set. How would you cluster the new points without re-running the algorithm? [5%]
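One standard answer is to keep the centroids found on the original data and assign each new point to its nearest centroid; a minimal numpy sketch (the centroid values shown are illustrative):

    import numpy as np

    def assign_to_nearest_centroid(new_points, centroids):
        """Cluster new points using previously computed K-means centroids."""
        # Distances from every new point to every centroid, shape (n_points, k)
        dists = np.linalg.norm(new_points[:, None, :] - centroids[None, :, :], axis=2)
        return dists.argmin(axis=1)          # index of the closest centroid

    centroids = np.array([[0.0, 0.0], [5.0, 5.0]])   # e.g. from the earlier K-means run
    new_points = np.array([[0.2, -0.1], [4.8, 5.3]])
    print(assign_to_nearest_centroid(new_points, centroids))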

Link to the learning outcomes assessed by the examination

1. Demonstrate a knowledge and understanding of the main approaches to machine learning.

Q1 a,c,e; Q2 a, b; Q3; Q4 a.

2. Demonstrate the ability to apply the main approaches to unseen examples.

Q1 d; Q2 b; Q3; Q4 d.

3. Demonstrate an understanding of the differences, advantages and problems of the main approaches in machine learning.

Q1 d iii; Q2 c.

4. Demonstrate an understanding of the main limitations of current approaches to machine learning, and be able to discuss possible extensions to overcome these limitations.

Q1 b; Q2 c; Q4 c.
