Solution Sketches Midterm2 Exam
COSC 4368 Fundamentals of Artificial Intelligence
April 10, 2019

Your Name:
Your student id:

Problem 1 --- Constraint Satisfaction Problems [8]
Problem 2 --- Supervised Learning in General [8]
Problem 3 --- Neural Networks [11]
Problem 4 --- Support Vector Machines [12]
Problem 5 --- Using First Order Predicate Logic as a Language [8]
Problem 6 --- Reinforcement Learning [13]

Σ:
Number Grade:

The exam is “open books and notes” but the use of computers is not allowed; you have 75 minutes to complete the exam. The exam will count approx. 15% towards the course grade.

1) Constraint Satisfaction Problems [8]

Provide a definition of the letter constraint satisfaction problem given below:
1. Define the variables
2. Define the set of values each variable can take
3. Define all constraints!

    T W O
  + T W O
  -------
  F O U R

Assume each letter can take only one digit, and reciprocally each digit can be associated with at most one letter.

Variables: T, W, O, F, U, R in 0…9; carry variables X1, X2, X3 in {0,1}

Constraints:
DIFF(T, W, O, F, U, R)
O + O = R + 10*X1
X1 + W + W = U + 10*X2
X2 + T + T = O + 10*X3
X3 = F
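
As a sanity check on the formulation above, here is a minimal brute-force sketch in Python (my own illustration, not part of the exam) that enumerates distinct-digit assignments and tests the column constraints; the no-leading-zero rule is a standard cryptarithmetic convention I am assuming, as it is not stated explicitly in the problem:

```python
from itertools import permutations

# Brute-force check of the TWO + TWO = FOUR CSP: every letter gets a
# distinct digit (the DIFF constraint), and the column equations with
# carries X1, X2, X3 must hold.
for t, w, o, f, u, r in permutations(range(10), 6):
    if t == 0 or f == 0:      # leading digits nonzero (assumed convention)
        continue
    x1 = (o + o) // 10        # carry out of the ones column
    x2 = (x1 + w + w) // 10   # carry out of the tens column
    x3 = (x2 + t + t) // 10   # carry out of the hundreds column
    if (o + o == r + 10 * x1 and
            x1 + w + w == u + 10 * x2 and
            x2 + t + t == o + 10 * x3 and
            x3 == f):
        print(f"{t}{w}{o} + {t}{w}{o} = {f}{o}{u}{r}")
```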
2) Supervised Learning in General [8]

a) What is the purpose of using N-fold Cross Validation? Explain in a few sentences how 2-fold cross validation works! [4]

To determine the generalization error/testing accuracy of the learnt model (if they say just “accuracy”, give them only 0.5 points) [1.5]
Correct description of 2-fold cross validation [2.5]: split the dataset into two equal folds; train on the first fold and test on the second, then swap the roles of the two folds, and average the two test accuracies.

b) What is overfitting? [2]

The model is too complex [1]; the testing accuracy is not optimal, although the training error is quite low. [1]

c) Deep Neural Networks usually employ very complex models; what can be done to alleviate the problem of overfitting when using deep neural networks? [2]

Use very large training sets [2].
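
A minimal 2-fold cross validation sketch (my own illustration, not from the exam; the dataset and the decision-tree classifier are assumptions chosen purely so the example runs):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
fold_accuracies = []

# Split the data into 2 folds; each fold serves once as the test set
# while the other fold is used for training.
for train_idx, test_idx in KFold(n_splits=2, shuffle=True,
                                 random_state=0).split(X):
    model = DecisionTreeClassifier().fit(X[train_idx], y[train_idx])
    fold_accuracies.append(model.score(X[test_idx], y[test_idx]))

# The average test accuracy estimates the generalization accuracy.
print(sum(fold_accuracies) / len(fold_accuracies))
```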
3) Neural Networks [11]

a) How do neural networks compute the value/activation of a node? [2]

By applying the activation function to the weighted sum of the activations of its parent nodes.

b) Describe how multi-layer neural networks, consisting of 3+ layers, learn a model for a training set! Limit your answer to at most 9 sentences! [7]

Neural network learning tries to find weights that minimize the error of the neural network's predictions for a training set [1]. Neural networks employ gradient descent hill climbing to find the “best” weights [1]. In particular, neural network learning adjusts weights example by example [1]; weights are adjusted in the direction of the steepest negative gradient of the error function---that is, weights are updated moving in the direction that reduces the error the most [2]. The step width of the weight update in the direction of the steepest gradient depends on the learning rate and other factors [0.5]. In order to apply this procedure, the error of each non-input node has to be known; as it is not initially given for intermediate-layer nodes, it is computed using the back-propagation algorithm [2].
Other observations might deserve credit. At most 7 points!

c) 2-Layer Neural Networks do not use the Backpropagation Algorithm---why is this the case? [2]

There is no intermediate layer; consequently, all necessary errors are known [2]
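
A toy sketch of 3a/3b (my own illustration, not exam code): a 3-layer network trained on XOR with gradient descent and back-propagation. For brevity the weights are updated in batch mode rather than example by example, and with this seed and learning rate it typically converges, though that is not guaranteed:

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([[0.], [1.], [1.], [0.]])

W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
learning_rate = 0.5

for _ in range(10000):
    # 3a: a node's activation is the activation function applied to the
    # weighted sum of the activations of its parent nodes.
    hidden = sigmoid(X @ W1 + b1)
    out = sigmoid(hidden @ W2 + b2)

    # 3b: the error of the output nodes is known from the training set,
    # but the error of the hidden nodes is not; back-propagation
    # computes it from the errors of the layer above.
    delta_out = (out - y) * out * (1 - out)
    delta_hidden = (delta_out @ W2.T) * hidden * (1 - hidden)

    # Move the weights in the direction of the steepest negative
    # gradient; the step width is governed by the learning rate.
    W2 -= learning_rate * hidden.T @ delta_out
    b2 -= learning_rate * delta_out.sum(axis=0)
    W1 -= learning_rate * X.T @ delta_hidden
    b1 -= learning_rate * delta_hidden.sum(axis=0)

print(out.round(2))  # should approach [[0], [1], [1], [0]]
```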
4) Support Vector Machines (SVM) [12]

a) What is the margin for an SVM hyperplane? Why do SVM models maximize the margin? What are support vectors? [4]

The margin is the width of the slab parallel to the hyperplane that has no interior data points [1.5]; not mentioning “no interior points” at most 0.5 points. SVMs maximize the margin to better handle noise/to become more fault tolerant [1]. Support vectors are the data points nearest to the hyperplane: the points of a data set that, if removed, would alter the position of the dividing hyperplane. [1.5]

b) There has been a lot of work on designing new kernels in machine learning, including using kernels in conjunction with support vector machines. What do kernels do? Why do most support vector machine approaches employ non-linear kernels? What do you believe is the reason that support vector machines in conjunction with kernels accomplish quite high accuracies for challenging datasets? [5]

Kernels map the dataset into a different, usually higher-dimensional space [1]. Non-linear kernels are employed to deal with datasets whose classes are not linearly separable [2]. By mapping the data to a higher-dimensional space there are more ways to separate the examples of the two classes, improving the potential to obtain higher accuracies [2].
Other answers for the second or third question might deserve partial credit!

c) Assume we have a dataset with numerical attributes x, y, and an attribute c where c is a class variable which we assume takes values in {0,1}. Give the equation of a hyperplane that SVM learning could potentially learn for this dataset! [3]

Any equation of the form w1*x + w2*y + b = 0, e.g. x + 2y - 3 = 0; no partial credit!
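
A small sketch tying 4a-4c together (my own illustration, assuming scikit-learn; the toy data points are made up): fit a linear SVM and read the learned hyperplane w1*x + w2*y + b = 0, the margin width 2/||w||, and the support vectors off the fitted model:

```python
import numpy as np
from sklearn.svm import SVC

# Toy 2-class dataset with numerical attributes x, y (as in 4c).
X = np.array([[1.0, 1.0], [2.0, 1.5], [1.5, 0.5],   # class 0
              [4.0, 4.0], [5.0, 4.5], [4.5, 5.5]])  # class 1
c = np.array([0, 0, 0, 1, 1, 1])

# A very large C approximates the hard-margin, maximum-margin SVM.
model = SVC(kernel="linear", C=1e6).fit(X, c)

w1, w2 = model.coef_[0]          # hyperplane: w1*x + w2*y + b = 0
b = model.intercept_[0]
margin = 2.0 / np.hypot(w1, w2)  # width of the point-free slab (4a)

print(f"hyperplane: {w1:.2f}*x + {w2:.2f}*y + {b:.2f} = 0")
print("margin width:", margin)
print("support vectors:", model.support_vectors_)
# For classes that are not linearly separable, a non-linear kernel
# (e.g. kernel="rbf") implicitly maps the data to a higher-dimensional
# space where a separating hyperplane may exist (4b).
```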

5) First Order Predicate Logic as a Language [8]

Map the following two sentences into First Order Predicate Logic formulas:
a) There are at least two green frogs in room 205 GAR
b) Every house owner in Texas owns a dog.

a) ∃f∃g (frog(g) ∧ frog(f) ∧ green(g) ∧ green(f) ∧ f≠g ∧ in-room(f, 205GAR) ∧ in-room(g, 205GAR))
b) ∀o ((house-owner(o) ∧ lives(o, Texas)) ⇒ (∃d owns(o, d)))

Solutions that specify that the owned house is in Texas, instead of the house owner being in Texas, also deserve full credit. Solutions with one error get up to 1.5 points (e.g. omitting f≠g in a.), but no partial credit if there are 2+ errors or if the formulas do not make any sense at all.

6) Reinforcement Learning [13]

a) What are the main differences between supervised learning and reinforcement learning? [5]

SL: assumes a static world [0.5]; the correct answer/action is known and described in training sets from which models are learnt! [1.5]
RL: can deal with dynamically changing worlds/can adapt [1]; needs to learn from indirect, sometimes delayed feedback/rewards [1]; suitable for exploration of unknown worlds [1]; temporal analysis/worried about the future/interested in an agent’s long-term wellbeing [1]; needs to carry out actions to find out if they are good---which actions/states are good is (usually) not known in advance [1]
Other answers might deserve credit; answers might also use the RL-Paper paragraph on that matter (page 239)! At most 5 points!

b) Assume the following world ABC is given:

[Figure of the ABC World omitted in this copy; from the updates below, action a moves the agent 2→3, 3→1, and 1→2, and the rewards are R(1)=0, R(2)=2, R(3)=-1.]

Assume that SARSA is used for the ABC World; the initial q-values are all 0, the learning rate is 0.5 and the discount rate is 1. The agent begins in state 2 and applies aaaa (a four times). Additionally, you can assume that, according to the agent’s policy, operator a is always applied in state 3. How does the Q-table look after aaaa has been applied? [4]

SARSA update: Q(a,s) ← Q(a,s) + α[R(s) + γ*Q(a’,s’) - Q(a,s)]

Q(a,2) = 0 + 0.5*(2 + 0 - 0) = 1
Q(a,3) = 0 + 0.5*(-1 + 0 - 0) = -0.5
Q(a,1) = 0 + 0.5*(0 + 1 - 0) = 0.5
Q(a,2) = 1 + 0.5*(2 - 0.5 - 1) = 1 + 0.5*0.5 = 1.25

One error at most 2 points; 2 errors at most 0.5 points!

          Value   SARSA-Update
q(a,1)      0         0.5
q(b,1)      0         0
q(a,2)      0         1.25
q(b,2)      0         0
q(a,3)      0        -0.5
q(b,3)      0         0

c) In which cases would you prefer SARSA over Q-Learning? [4]

When the employed policy is quite different from a greedy policy [4]; in worlds that change significantly and need adaptation [2.5]; when dealing with unknown worlds that need to be explored [1.5].
At most 4 points
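
A few lines of Python replaying the four SARSA updates above (my own sketch; the transition and reward structure is inferred from the solution, since the ABC-World figure is not included in this copy):

```python
# Replay the aaaa episode of 6b. Transitions under action a and the
# rewards are inferred from the solution sketch above (assumption):
# a moves 2 -> 3 -> 1 -> 2, with R = {1: 0, 2: 2, 3: -1}.
alpha, gamma = 0.5, 1.0
next_state = {2: 3, 3: 1, 1: 2}
reward = {1: 0, 2: 2, 3: -1}

q = {(s, act): 0.0 for s in (1, 2, 3) for act in ("a", "b")}

s = 2
for _ in range(4):  # apply action a four times
    s_next = next_state[s]
    # SARSA: Q(a,s) <- Q(a,s) + alpha*[R(s) + gamma*Q(a',s') - Q(a,s)],
    # where the policy always picks a on this trajectory.
    q[(s, "a")] += alpha * (reward[s] + gamma * q[(s_next, "a")]
                            - q[(s, "a")])
    print(f"Q(a,{s}) = {q[(s, 'a')]}")  # prints 1.0, -0.5, 0.5, 1.25
    s = s_next
```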
