University of Wisconsin – Madison
Computer Sciences Department
CS 760 - Machine Learning
Fall 2004
Exam
11am-12:45pm, December 10, 2004
Room 107 Psychology
CLOSED BOOK
(one sheet of notes and a calculator allowed)
Write your answers on these pages and show your work. If you feel that a question is not fully specified, state any assumptions you need to make in order to solve the problem. You may use the backs of these sheets for scratch work.
Neatly write your name on this and all other pages of this exam.
Name ________________________________________________________________
Problem Score Max Score
1 ______ 45
2 ______ 13
3 ______ 15
4 ______ 12
5 ______ 15
TOTAL ______ 100
Problem 1 – Learning from Labeled Examples (45 points)
Imagine you want to learn to predict which flu patients will respond well to the new drug Pacix, based on three measurements you take. The table below contains the results on a training set of volunteers.
Temp BP Age Result
Ex1 100.0 ave 20 helps
Ex2 98.5 low 41 helps
Ex3 98.6 ave 29 no-effect
Ex4 103.0 high 72 no-effect
Ex5 99.9 low 87 no-effect
a) Assume we are considering the feature BP (blood pressure) for the root of a decision tree. After splitting on this feature, on average how many bits of information would still be needed in order to classify a future example?
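For reference, a minimal sketch of the conditional-entropy calculation this part asks for, using the (BP, Result) pairs from the table above:

    import math

    def entropy(labels):
        """Entropy, in bits, of a list of class labels."""
        n = len(labels)
        return -sum((labels.count(c) / n) * math.log2(labels.count(c) / n)
                    for c in set(labels))

    # (BP value, Result) pairs taken from the training table above.
    data = [("ave", "helps"), ("low", "helps"), ("ave", "no-effect"),
            ("high", "no-effect"), ("low", "no-effect")]

    remainder = 0.0
    for v in {"ave", "low", "high"}:
        branch = [result for bp, result in data if bp == v]
        remainder += len(branch) / len(data) * entropy(branch)

    print(remainder)  # weighted average entropy of the BP branches, in bits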
b) Now consider using Age as the root of a decision tree. Which “decisions” (i.e., possible tests at an interior node) do we need to score? Simply list them and briefly explain why these are the candidates we need to consider; you do not need to calculate the information gain of each.
c) Imagine a medical expert suggests that the drug Pacix will help patients whose Age is less than half their Temp. Assume we want to create a decision node that tests this hypothesis. What would this test be, and how much information does it gain (when considered as the root)?
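For reference, a sketch of how the suggested test could be scored, using the Temp and Age values from the table above and the same entropy function as in Part a:

    import math

    def entropy(labels):
        n = len(labels)
        return -sum((labels.count(c) / n) * math.log2(labels.count(c) / n)
                    for c in set(labels))

    # (Temp, Age, Result) triples from the training table.
    data = [(100.0, 20, "helps"), (98.5, 41, "helps"), (98.6, 29, "no-effect"),
            (103.0, 72, "no-effect"), (99.9, 87, "no-effect")]

    true_branch  = [r for t, a, r in data if a < t / 2]
    false_branch = [r for t, a, r in data if not (a < t / 2)]

    prior = entropy([r for _, _, r in data])
    remainder = (len(true_branch) / 5 * entropy(true_branch) +
                 len(false_branch) / 5 * entropy(false_branch))
    print(prior - remainder)  # information gain of the "Age < Temp/2" test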
d) You are boosting “decision stumps” and in the first round choose the decision
If age < 50 then helps else no-effect
Show the weights on the five training examples after this first round.
Show your work here.
Weight(Ex1) = __________
Weight(Ex2) = __________
Weight(Ex3) = __________
Weight(Ex4) = __________
Weight(Ex5) = __________
On the second round of boosting, what would be the information gain of the feature BP?
Show your work.
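For reference, a sketch of one common AdaBoost reweighting scheme (scale each weight by e^{-alpha} if classified correctly, e^{+alpha} if not, then renormalize); on the table above, the stump "if age < 50 then helps" misclassifies only Ex3:

    import math

    correct = [True, True, False, True, True]   # stump vs. true labels, Ex1-Ex5
    w = [1 / 5] * 5                             # uniform starting weights

    eps = sum(wi for wi, ok in zip(w, correct) if not ok)   # weighted error
    alpha = 0.5 * math.log((1 - eps) / eps)                 # the stump's vote

    w = [wi * math.exp(-alpha if ok else alpha) for wi, ok in zip(w, correct)]
    total = sum(w)
    w = [wi / total for wi in w]                            # renormalize
    print(w)

The resulting weights are also what the second-round information-gain calculation for BP would be computed over, by using weighted rather than raw counts in the entropy formula.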
e) Show how a one-nearest-neighbor algorithm would classify the following new example (a distance-computation sketch follows the repeated table):
Temp = 100, BP = high, Age = 50
The training examples repeated for your convenience:
Temp BP Age Result
Ex1 100.0 ave 20 helps
Ex2 98.5 low 41 helps
Ex3 98.6 ave 29 no-effect
Ex4 103.0 high 72 no-effect
Ex5 99.9 low 87 no-effect
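A sketch under assumptions the exam leaves open (how the real-valued features are scaled and how the categorical BP feature enters the distance; different choices can change which neighbor is nearest):

    train = [(100.0, "ave", 20, "helps"), (98.5, "low", 41, "helps"),
             (98.6, "ave", 29, "no-effect"), (103.0, "high", 72, "no-effect"),
             (99.9, "low", 87, "no-effect")]
    query = (100.0, "high", 50)

    def dist(ex):
        temp, bp, age, _ = ex
        return (abs(temp - query[0]) / 100 +       # assumed scaling by 100
                (0 if bp == query[1] else 1) +     # assumed 0/1 BP mismatch cost
                abs(age - query[2]) / 100)

    nearest = min(train, key=dist)
    print(nearest[-1])  # predicted class = label of the nearest training example

Under these particular choices the categorical mismatch term dominates, so the one training example sharing the query's BP value comes out nearest.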
f) Show how a perceptron could be applied to this data set. Draw the initial network, initializing all free parameters to 0. Then calculate, showing your work, how the delta rule would change the network upon processing the first training example above (let η=0.1). Assume the two real-valued features are normalized by dividing them by 100. Represent helps by 1 and no-effect by 0. Also assume that if the weighted sum equals the threshold, the perceptron outputs 0.
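A sketch of the single delta-rule step this part asks about, under one assumption the exam leaves open (the categorical BP feature is one-hot encoded) and with the threshold folded in as a bias weight on a constant-1 input:

    eta = 0.1

    # Inputs for Ex1 after the specified normalization: Temp/100 and Age/100,
    # plus an assumed one-hot BP encoding (low, ave, high) and a bias input.
    x = [100.0 / 100, 0, 1, 0, 20 / 100, 1]   # [temp, low, ave, high, age, bias]
    w = [0.0] * 6                              # all free parameters start at 0
    target = 1                                 # "helps" is represented by 1

    net = sum(wi * xi for wi, xi in zip(w, x))
    out = 1 if net > 0 else 0                  # output 0 when the sum equals the threshold
    for i in range(6):
        w[i] += eta * (target - out) * x[i]    # delta rule: w += eta * error * x
    print(w)   # [0.1, 0.0, 0.1, 0.0, 0.02, 0.1]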
g) Describe how a linear support-vector machine (SVM) could be applied to this problem. Show the expression that would be minimized and the constraints that would need to be satisfied. (Do not assume the data is linearly separable.) Be sure to explain your answer. You do not need to express your solution in terms of the detailed Ax ≥ b matrices and vectors of linear programming, nor do you need to calculate the optimal parameter values that the SVM would find.
minimize: ____________________ explanation: ____________________

subject to: ____________________ explanation: ____________________
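For reference, one standard soft-margin formulation that an answer of this shape could take, written here with the 1-norm penalty so the optimization stays a linear program (matching the Ax ≥ b framing mentioned above); the common 2-norm variant uses ½||w||² instead:

    minimize    ||w||_1 + C Σ_i ξ_i
    subject to  y_i (w · x_i + b) ≥ 1 − ξ_i    for each training example i
                ξ_i ≥ 0                         for each i

where y_i = +1 for helps and −1 for no-effect, the slack variables ξ_i allow the constraints to be violated when the data is not linearly separable, and C trades off margin width against training errors.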
h) Verbally relate your solution to Part g to:
i. minimal description length
ii. weight decay
Problem 2 – Reinforcement Learning (13 points)
Consider the deterministic reinforcement environment drawn below (let γ=0.5). The numbers on the arcs indicate the immediate rewards.
a) Assume we use a Q table for this task and initialize all of its entries to the immediate rewards obtained from each state-action pair.
i. A learner follows the path start → a → b → end. Using one-step, regular Q learning, which entries in the Q table change? Show the calculations that produce the new Q table entries.
ii. Which of the learner’s steps in Part i are exploration steps? Explain.
iii. Repeat Part i, but this time use one-step SARSA (start with the original Q table, not the one that results from the learning in Part i). For those Q updates that are the same as in Part i, simply say so rather than repeating the calculations. Both update rules are sketched below.
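For reference, a minimal sketch of the two one-step updates contrasted in parts i and iii, written in the deterministic, learning-rate-free form used in Mitchell; Q is assumed to be a dictionary keyed by (state, action), and actions_from(s) is a placeholder for whatever returns the actions available in state s:

    GAMMA = 0.5

    def q_learning_update(Q, s, a, r, s_next, actions_from):
        # Regular (off-policy) Q learning backs up the BEST successor action;
        # a terminal state contributes 0 future value.
        successors = [Q[(s_next, a2)] for a2 in actions_from(s_next)]
        Q[(s, a)] = r + GAMMA * (max(successors) if successors else 0.0)

    def sarsa_update(Q, s, a, r, s_next, a_next):
        # SARSA (on-policy) backs up the action the learner ACTUALLY takes next;
        # pass a_next=None when s_next is terminal.
        future = Q[(s_next, a_next)] if a_next is not None else 0.0
        Q[(s, a)] = r + GAMMA * future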
b) Why does the convergence proof for Q learning given in class and in Mitchell not apply when a function approximator is used to represent the Q function?
Problem 3 – Experimental Methodology (15 points)
a) You run your favorite algorithm on a training set of 900 examples, and then find it properly classifies 80 out of 100 test-set examples. In what interval are you 95% confident that the true accuracy on future examples lies?
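For reference, a sketch of the usual normal-approximation interval for a test-set accuracy estimate:

    import math

    acc, n = 80 / 100, 100
    se = math.sqrt(acc * (1 - acc) / n)        # standard error of the estimate
    lo, hi = acc - 1.96 * se, acc + 1.96 * se  # 1.96 = z value for 95% confidence
    print(lo, hi)                              # roughly 0.72 to 0.88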
b) Assume two algorithms get the following test-set accuracies in a four-fold cross-validation:
Algo A: 90 92 87 93
Algo B: 91 95 84 97
Is the difference between A and B statistically significant at the 95% confidence level?
(t_{0.95,3} = 3.18, t_{0.95,4} = 2.78, t_{0.95,5} = 2.57 for a two-sided test, and
t_{0.95,3} = 2.35, t_{0.95,4} = 2.13, t_{0.95,5} = 2.02 for a one-sided test)
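For reference, a sketch of the paired t statistic this comparison calls for:

    import math

    a = [90, 92, 87, 93]
    b = [91, 95, 84, 97]
    d = [ai - bi for ai, bi in zip(a, b)]                       # paired fold-by-fold differences
    n = len(d)
    mean = sum(d) / n
    s = math.sqrt(sum((di - mean) ** 2 for di in d) / (n - 1))  # sample std deviation
    t = mean / (s / math.sqrt(n))
    print(t)   # compare |t| against the tabled value for n - 1 = 3 degrees of freedom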
c) Assume your probabilistic learner produces the following test-set results:
Example Probability Correct Category
1 0.99 negative
2 0.90 positive
3 0.75 positive
4 0.25 negative
5 0.05 positive
Draw the ROC curve for this learner to the right of this table. Be sure to label your axes.
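For reference, a sketch of how the ROC points could be generated by sweeping a classification threshold down through the predicted probabilities in the table:

    results = [(0.99, "negative"), (0.90, "positive"), (0.75, "positive"),
               (0.25, "negative"), (0.05, "positive")]
    num_pos = sum(1 for _, c in results if c == "positive")
    num_neg = len(results) - num_pos

    points = [(0.0, 0.0)]
    tp = fp = 0
    for prob, cat in sorted(results, reverse=True):  # highest probability first
        if cat == "positive":
            tp += 1
        else:
            fp += 1
        points.append((fp / num_neg, tp / num_pos))  # (FP rate, TP rate)
    print(points)   # plot with FP rate on the x-axis, TP rate on the y-axis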
Problem 4 – Miscellaneous Questions (12 points)
a) Assume that you are considering a noise-free data set that contains N Boolean-valued features and are using the hypothesis space:
All possible Boolean functions (there are 2^(2^N) possibilities)
According to the PAC model, how many examples do you need to collect in order to be 99% confident that you will learn the underlying Boolean concept to 90% accuracy when N = 10?
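For reference, a sketch of the standard consistent-learner bound m ≥ (1/ε)(ln|H| + ln(1/δ)) instantiated for this hypothesis space:

    import math

    eps, delta, N = 0.10, 0.01, 10       # 90% accuracy, 99% confidence
    ln_H = (2 ** N) * math.log(2)        # |H| = 2^(2^N), so ln|H| = 2^N ln 2
    m = (1 / eps) * (ln_H + math.log(1 / delta))
    print(math.ceil(m))                  # sample-complexity bound for N = 10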
b) “Hardwire” a neural network that memorizes the following two positive examples, and explain your answer (one possible construction is sketched after the examples):
f1 = true f2 = false f3 = true f4 = false
f1 = true f2 = true f3 = false f4 = false
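One possible construction, sketched below (not the only answer): dedicate a hidden unit to each memorized example so that it fires only on an exact match, and OR the hidden units at the output; the ±1 weights and the "count of true features minus ½" thresholds are one conventional choice:

    import itertools

    examples = [(1, 0, 1, 0), (1, 1, 0, 0)]   # the two positive examples

    def unit(inputs, pattern):
        # Weight +1 where the pattern is true, -1 where it is false;
        # threshold = (number of true features) - 0.5 forces an exact match.
        s = sum((1 if p else -1) * x for x, p in zip(inputs, pattern))
        return 1 if s > sum(pattern) - 0.5 else 0

    def network(inputs):
        hidden = [unit(inputs, ex) for ex in examples]
        return 1 if sum(hidden) > 0.5 else 0   # output unit acts as an OR

    for x in itertools.product([0, 1], repeat=4):  # fires only on the two examples
        print(x, network(x))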
c) Show how the KBANN algorithm would initialize a neural network given the following domain theory (assume all examples are represented by four Boolean-valued features, f1-f4). Be sure to indicate the initial weights and node thresholds; a sketch of the standard mapping follows the rules.
If (f2 and f3) then category = positive
If (f1 and NOT f3) then category = positive
If (f1) then category = positive
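For reference, a sketch of the usual KBANN mapping (one AND unit per rule and one OR unit for the shared consequent); the weight magnitude ω = 4 and the half-step thresholds below follow the conventions of Towell and Shavlik, and full KBANN would also add low-weight links from the remaining features so they can be recruited during training:

    OMEGA = 4.0   # the conventional "large" weight; the exact value is a convention

    rules = [(["f2", "f3"], []),      # If (f2 and f3)      then positive
             (["f1"], ["f3"]),        # If (f1 and not f3)  then positive
             (["f1"], [])]            # If (f1)             then positive

    units = []
    for pos, neg in rules:
        weights = {f: OMEGA for f in pos}            # unnegated antecedents: +omega
        weights.update({f: -OMEGA for f in neg})     # negated antecedents: -omega
        # AND unit: threshold just below (number of unnegated antecedents) * omega
        units.append((weights, (len(pos) - 0.5) * OMEGA))

    # Output unit: an OR over the three rule units (threshold just above 0).
    or_unit = ({f"rule{i}": OMEGA for i in range(len(rules))}, 0.5 * OMEGA)
    print(units, or_unit)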
Problem 5 – Miscellaneous Short Answers (15 points)
Briefly discuss the importance in machine learning of each of the following:
tile coding
random forests
recall-precision curves
options
feature-independence assumption
[Figure for Problem 2: a deterministic environment with states start, a, b, c, and end; the directed arcs between the states are labeled with the immediate rewards -2, 1, -8, 8, -4, -4, 2, and 0.]