SFO17-314 Optimizing Golang for High Performance with ARM64 Assembly

Wei Xiao, Staff Software Engineer, Wei.Xiao@

September 27, 2017 Linaro Connect SFO17

© 2017 Arm Limited

Agenda

• Introduction
• Differences from GNU Assembly
• Integrate assembly into Golang
• Optimize CRC32 for arm64
• Optimize SHA256 for arm64
• Optimize IndexByte for arm64
• Work summary and next steps


Introduction

• Assembly optimization benefits

    - Take advantage of ARMv8 capabilities
        - Hardware-specific instructions (such as SVC, AES, SHA, etc.)
        - Vector (SIMD: Single Instruction, Multiple Data) instructions

    - Others
        - No cgo dependency
        - Avoids the runtime context-switching overhead of cgo calls
        - Better-optimized code than the Go compiler generates
        - Faster compilation than with cgo


Assembly Optimization Current Status

• Go standard packages with assembly optimization

    crypto/aes   crypto/elliptic   crypto/internal/cipherhw   crypto/md5
    crypto/rc4   crypto/sha1       crypto/sha256              crypto/sha512
    hash/crc32   math              math/big                   reflect
    runtime      runtime/cgo       runtime/internal/atomic    runtime/internal/sys
    strings      sync/atomic       syscall                    ...

Legend: red = arm64 optimization ongoing; black = no arm64 optimization


Assembly Terminology

• Mnemonic
    - CALL, MOVW, MOVD, ...

• Register
    - R1, F0, V3, ...

• Immediate
    - $1, $0x100, ...

• Memory
    - (R1), 8(R3), ...


Registers in AArch64

Instruction Differences from GNU Assembly

• Semi-abstract instruction set (inherited from Plan 9 from Bell Labs)

    - Architecture-independent mnemonics such as MOVD
    - Some architecture-specific aspects still show through
    - The assembler may insert prologues and remove "unreachable" instructions
    - Instructions may be expanded by the assembler

    // func Add(a, b int) int
    TEXT ·Add(SB), $0-24
        MOVD a+0(FP), R0
        MOVD b+8(FP), R1
        ADD  R1, R0, R0
        MOVD R0, ret+16(FP)
        RET

• Not all instructions are available

    - Use BYTE/WORD/LONG directives to lay down opcodes directly in the instruction stream
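When the assembler lacks an instruction, its 32-bit encoding can be emitted as data instead. Below is a minimal sketch of computing such a word. The hypothetical helper `encodeADD` builds the A64 encoding of the 64-bit ADD (shifted register, LSL #0) form. ADD is chosen only because its encoding is simple; the Go assembler does support it. The resulting value could then be written into a .s file as, for example, `WORD $0x8b010000`.

```go
package main

import "fmt"

// encodeADD is a hypothetical helper returning the A64 machine word for
// the 64-bit "ADD (shifted register)" form with LSL #0, i.e. GNU
// "add xd, xn, xm". Each register number is masked to 5 bits.
func encodeADD(d, n, m uint32) uint32 {
	const base = 0x8B000000 // sf=1, op=0, S=0, shift=LSL, imm6=0
	return base | (m&0x1f)<<16 | (n&0x1f)<<5 | d&0x1f
}

func main() {
	// Go assembly "ADD R1, R0, R0" is GNU "add x0, x0, x1".
	fmt.Printf("WORD $0x%08x\n", encodeADD(0, 0, 1)) // WORD $0x8b010000
}
```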


Operand Differences from GNU Assembly

• Data flows from left to right (sources first, destination last)

    - ADD R1, R2        means  R2 += R1

    - SUBW R12 ...
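To compare the two operand orders side by side, here is a minimal sketch. `toGNU` is a hypothetical helper that rewrites a register-only Go arm64 instruction into GNU operand order. It assumes ADD/SUB-style semantics (`OP Rm, Rn, Rd` means `Rd = Rn OP Rm`, and `OP Rm, Rd` is shorthand for `OP Rm, Rd, Rd`) and 64-bit X registers. It does not handle immediates, memory operands, W registers, ZR/RSP, or instructions whose operands are not simply reversed.

```go
package main

import (
	"fmt"
	"strings"
)

// toGNU is a hypothetical helper: it converts a register-only Go (Plan 9)
// arm64 instruction such as "ADD R1, R0, R2" to GNU syntax. Go lists the
// sources first and the destination last; GNU lists the destination first.
// It expects at least one operand after the mnemonic.
func toGNU(inst string) string {
	fields := strings.SplitN(strings.TrimSpace(inst), " ", 2)
	op := strings.ToLower(fields[0])
	var ops []string
	for _, o := range strings.Split(fields[1], ",") {
		ops = append(ops, "x"+strings.TrimPrefix(strings.TrimSpace(o), "R"))
	}
	if len(ops) == 2 {
		// "ADD R1, R2" is shorthand for "ADD R1, R2, R2".
		ops = append(ops, ops[1])
	}
	// Reverse: Go order (Rm, Rn, Rd) becomes GNU order (xd, xn, xm).
	for i, j := 0, len(ops)-1; i < j; i, j = i+1, j-1 {
		ops[i], ops[j] = ops[j], ops[i]
	}
	return op + " " + strings.Join(ops, ", ")
}

func main() {
	for _, s := range []string{"ADD R1, R2", "ADD R1, R0, R0", "SUB R1, R2, R3"} {
		fmt.Printf("%-16s => %s\n", s, toGNU(s))
	}
}
```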
