Smart Dog for Minecraft - Aalborg Universitet

Smart Dog for Minecraft

Kim Arnold Thomsen Rasmus D. C. Dalhoff-Jensen

June 10, 2014

Title: Smart Dog for Minecraft
Subject: Reinforcement Learning
Semester: Spring Semester 2014
Project group: sw103f14

Department of Computer Science
Aalborg University
Selma Lagerlöfs Vej 300
DK-9220 Aalborg Øst
Telephone +45 9940 9940
Telefax +45 9940 9798

Participants:
Rasmus D.C. Dalhoff-Jensen
Kim A. Thomsen

Supervisor: Manfred Jaeger

Number of copies: 4
Number of pages: 109
Number of numbered pages: 101
Number of appendices: 9 Pages
Completed: June 10, 2014

Synopsis: This report argues for the benefit of combining tabular reinforcement learning with feature-based reinforcement learning, so that agents can exhibit specific behaviour in specific situations within a general environment that is impractical to express without features. The report describes three different approaches to doing so, as well as an implementation of such a system in the game Minecraft. It also describes a set of tests showing that two of the three approaches are beneficial in such a scenario.

The content of this report is freely accessible. Publication (with source reference) may only take place with the agreement of the authors of this report.

Contents

Chapter 1 Word Definitions and Abbreviations

I Introduction

Chapter 2 Introduction to Smart Dog

Chapter 3 The Minecraft World
3.1 The technical aspects
3.1.1 Minecraft Forge

Chapter 4 Problem Domain
4.1 Problem Specification
4.1.1 Develop an Intelligent Agent
4.1.2 Enable the Agent to Learn an Optimal Behaviour for a Given Situation
4.1.3 Enable Knowledge Transfer between Different Situations

Chapter 5 Our Thesis

II Theory

Chapter 6 Reinforcement Learning
6.1 Markov Decision Process
6.2 Optimal Policy
6.3 ε-greedy
6.4 Q-Values

Chapter 7 Tabular Q-Learning
7.1 Size Factor

Chapter 8 Feature-based Q-Learning
8.1 Features
8.2 Q-values
8.3 Weights
8.3.1 Weight Adjustments
8.3.2 Gradient Descent

Chapter 9 The Combined Approach
9.1 Basics
9.2 Decision Making
9.3 Approaches for Q-value Updates
9.3.1 Separate Update
9.3.2 Unified-Q Update
9.3.3 Unified-a Update
9.4 Transferring Knowledge

III Components

Chapter 10 Overview of the system
10.1 Why do we use Minecraft?
10.2 Conceptual System Structure
10.3 System Flow Chart
10.4 The EntitySmartDog Class
10.5 The Following Chapters
10.5.1 Source Code

Chapter 11 State and State Space
11.1 State-Attributes and Features
11.2 Possible Actions
11.3 Changes from previous semester

Chapter 12 Virtual Real-Time Perspective
12.1 Using the onLivingUpdate Method
12.2 Synchronisation
12.3 Changes in The Entity Smart Dog Class

Chapter 13 Sensory Module
13.1 Checking for State-Changes
13.2 Obtaining a state
13.2.1 Health
13.3 Collecting Rewards
13.4 Changes from Previous Semester

Chapter 14 Decision Making Module
14.1 Making Decisions
14.2 Updating Q-values
14.2.1 Separate Update
14.2.2 Unified-Q Update
14.2.3 Unified-a Update
14.3 Changes since Previous Semester

Chapter 15 Actuator Module
15.1 Actions and Possible Actions
15.1.1 Wait
15.1.2 Activate Button
15.1.3 Move to block
15.1.4 Pick Up Edible/Non-Edible Item
15.1.5 Drop Item
15.1.6 Eat item
15.2 Performing an Action
15.3 Changes Since Previous Semester

Chapter 16 Knowledge Base
16.1 Q-Learning Document Handler and Feature Document Handler
16.2 Changes Since Previous Semester

Chapter 17 Testing Component
17.1 Analysing the Test Results

IV Testing

Chapter 18 The First Potion Test
18.1 Test Environment
18.2 Expectations
18.3 Test Factors
18.4 Results
18.5 Discussion

Chapter 19 Second Potion Test
19.1 Test Environment
19.2 Expectations
19.3 Test Factors
19.4 Results
19.5 Discussion

Chapter 20 Testing with Food
20.1 Test Environment
20.2 Expectations
20.3 Test Factors
20.4 Results
20.5 Discussion

Chapter 21 Testing Knowledge Transfer
21.1 Test Environment
21.2 Expectations
21.3 Test Factors
21.4 Results
21.5 Discussion

Chapter 22 Test Evaluation

V Evaluation

Chapter 23 Discussion and Conclusion
23.1 Conclusion

Chapter 24 Reflection

Bibliography

VI Appendix
