Google Go! A look behind the scenes

University of Salzburg Department of Computer Science

Google Go!

A look behind the scenes

Seminar for Computer Science Summer 2010

Martin Aigner Alexander Baumgartner

July 15, 2010

Contents

1 Introduction

3

2 Data representation in Go

5

2.1 Basic types and arrays . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5

2.2 Structs and pointers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

2.3 Strings and slices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

2.4 Dynamic allocation with "new" and "make" . . . . . . . . . . . . . . . . 9

2.5 Maps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

2.6 Implementation of interface values . . . . . . . . . . . . . . . . . . . . . . 11

3 The Go Runtime System

14

3.1 Library dependencies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

3.2 Memory safety by design . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

3.3 Limitations of multi-threading . . . . . . . . . . . . . . . . . . . . . . . . 15

3.4 Segmented stacks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

4 Concurrency

17

4.1 Share by communicating . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

4.2 Goroutines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

4.2.1 Once . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20

4.3 Channels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

4.3.1 Channels of channels . . . . . . . . . . . . . . . . . . . . . . . . . 22

4.4 Parallelization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

4.4.1 Futures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

4.4.2 Generators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

4.4.3 Parallel For-Loop . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

4.4.4 Semaphores . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

4.4.5 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

1 Introduction

Go is a programming language with a focus on systems programming, i.e. writing code for servers, databases, system libraries, browsers and so on. It was created because the authors of Go were frustrated with the existing languages for systems programming and the tradeoff one had to make between execution time, compilation time and the time for development. Dynamic languages like Python allow fast development but suffer from runtime overhead. Static languages like C allow fast execution but suffer from flexibility when writing large scale programs. C++ was designed to add some of this missing features to C but this leads to complex compilers, long compilation time and huge object files with a lot of duplicated code which requires smart linkers to keep the code footprint small.

Go is designed to meet the needs of system programmers combining the useful concepts of different worlds. It is a compiled language with a static type system i.e. strong type checking is performed at compile time which avoids a lot of errors that programmers tend to make in dynamic typed languages. Dynamic memory allocation is done with garbage collection. Thereby the complexity of correct memory deallocation drops. It gets hard to write unsafe code in Go!

Go is object-oriented but unlike other object-oriented languages like Java or C++ Go is not type-oriented. There is no subclassing because there are no classes at all. Instead methods can be defined on any type such as basic types like int, float or bool but also on composite types like maps or structs. Hence, objects in Go are simply types with methods defined on them.

The behavior or abilities of objects in Go can be specified using interfaces. Interfaces are a set of method declarations and if this set is contained in the set of methods on a given type then this type satisfies implicitly the interface. Note that an object may satisfy multiple interfaces. In Go interfaces represent abstraction and structs (or even simpler types) represent data. Due to this orthogonality it becomes very easy to change Go programs to new requirements.

Concurrency is important in systems programming especially for reactive systems and parallel computation. The main challenge in concurrent programming is to get the synchronization on shared resources right. Go is intended to aid the programmer with concepts and idioms that make concurrent programming easier. Parallelism is not the primary goal behind the concurrent concept. Nevertheless if a program is structured

1 Introduction

in a way to perform multiple tasks at the same time then it can be run in parallel thus achieving a significant speedup. Still the focus is on writing code that is easy to understand and easy to change without breaking the whole concurrency concept of a program.

In this work we focus on the representation of data of the Go compiler and the Go runtime system so one can get an intuition about which operations are expensive and which are not. We will also discuss the concepts that allow elegant concurrent programming and the runtime support that enables thousands of concurrent routines to be executed efficiently.

Google Go! | A look behind the scenes

4

2 Data representation in Go

Good systems programmers know what the compiler or interpreter does with the code they write and how it affects the execution of the program. A good starting point to understand how a Go program is executed is to understand the data structures Go uses to represent data.[1]

2.1 Basic types and arrays

The basic types in Go are similar to other languages and most programmers should be familiar with them. There are types like int or float which represent values of a size that might depend on the target machine (e.g. 32 or 64-bit). But there are also types with explicit size like int8 or float64. Note that on a 32-bit machine int and int32 may have the same size but nevertheless are distinct types and assigning a value of type int32 to a variable of type int will cause a compile time error.

Figure 2.1: Memory Layout of basic types The examples used here show a 32-bit memory layout. Note that in the current imple-

2.2 Structs and pointers

mentation of Go on a 64-bit machine only pointers are 64 bits long while int still uses 32 bits.

Figure 2.1 shows the memory layout of some basic types. The variable i has the implicit 32-bit type int while the variable j has the explicit 32-bit type. As mentioned above an assignment i = j would fail and must be written with an explicit conversion i = int(j).

The variable f has the same memory footprint as i or j but a different interpretation.

Arrays in Go are similar to arrays in C. The array b consists of 3 bytes of contiguous memory where type byte is a synonym for uint8 and is used as the element type of strings. Similarly primes is an array of int with length 3 which consumes 3 * 4 = 12 bytes of memory.

2.2 Structs and pointers

Go lets the programmer decide what is a pointer and what is not. In Figure 2.2, a struct named Point is defined which is represented as two adjacent words (of type int) in memory.

Figure 2.2: Memory Layout of structs

The composite literal syntax Point{10, 20} creates the initialized Point p. Like in C, the address operator & takes the address of a newly created and initialized Point in memory where pp points to. Composite literals are often used in Go to construct values for structs, arrays, slices, and maps. Given the composite literal, the compiler automatically determines the types of p and pp. The type of p is Point whereas the type of pp is *Point.

As we saw, the fields in a struct are laid out contiguously in memory. This also applies for composite types e.g. structs of structs. Figure 2.3 shows the memory layout of

Google Go! | A look behind the scenes

6

2.3 Strings and slices

variables of the type Rect1 and Rect2. Rect1 is a structure that contains two fields of type Point whereas Rect2 consists of two pointers to values of type Point.

Figure 2.3: Memory Layout of composite structs

Programmers who are familiar with C will not be surprised about the distinction between values and pointers. It gives the programmer the freedom do control the memory layout of the data structures she builds. Often this is important for building well performing systems. Consider data structures that have to fit in one cache line for example.

2.3 Strings and slices

Go's strings are implemented in a different way than in C. In C, a string is the same as an array of char which is terminated with '\0'. In the following examples the gray arrows denote pointers that exist in the implementation of Go but are not visible to the program. Neither can they be read nor altered.

Figure 2.4: Memory Layout of a string

Google Go! | A look behind the scenes

7

2.3 Strings and slices

In Go, strings are represented as a two-word structure in memory. The first word is a pointer to the string data and the second word contains the length. Figure 2.4 depicts the declaration of a variable s. The Go compiler allocates the 5 bytes for the data and creates the two-word structure representing s.

Go defines strings to be immutable. So it is save for multiple strings to share the same underlying data. The variable t is of type string and is created using the slice operation on s. The slice operation is a primary expression on a string, array or slice which creates a substring or slice on the underlying data. If the sliced operand is a string or slice, the result of the slice operation is a string or slice of the same type. If the sliced operand is an array, the result of the slice operation is a slice with the same element type as the array [2]. The syntax of the slice operation is a[lo:hi] where the index expressions lo and hi select which elements appear in the result. The result has indexes starting at 0 and length equal to hi-lo. The example in Figure 2.4 leads to a string t that contains the single character "l".

Figure 2.5: Slicing an array of integers

Now that we discussed the slice operation on strings we will take a look at the slice type. As shown in Figure 2.5, a slice consists of a tree-word data structure containing the pointer to the underlying data, the length and - in contrast to strings - the capacity of the slice. The length is the upper bound for indexing operations like x[i] while the capacity is the upper bound for slice operations like x[i:j]. Note the difference of the composite literal creating x and the one creating primes in Figure 2.1. The only difference is in omitting the size of the array which is determined automatically.

The advantage of slicing is that creating a new slice of a string or an array does not create a copy of the underlying data. It does not even invoke the allocator because most of the time the three-word slice will reside on the stack. This makes passing a slice as a function parameter as efficient as passing a pointer and length pair in C. However, like in Java there is a well-known drawback when keeping a small slice of a huge array. The existence of the slice keeps the entire data in live memory.

Google Go! | A look behind the scenes

8

................
................

In order to avoid copyright disputes, this page is only a partial summary.

Google Online Preview   Download