SFO17-314 Optimizing Golang for High Performance with ARM64 Assembly

Wei Xiao, Staff Software Engineer
Wei.Xiao@

September 27, 2017 Linaro Connect SFO17

© 2017 Arm Limited

Agenda

• Introduction
• Differences from GNU Assembly
• Integrate assembly into Golang
• Optimize CRC32 for arm64
• Optimize SHA256 for arm64
• Optimize IndexByte for arm64
• Work Summary and Next steps

Introduction

• Assembly optimization benefits
  • Take advantage of ARMv8 capabilities
    • Hardware-specific instructions (such as SVC, AES, SHA, etc.)
    • Vector (Single Instruction, Multiple Data) instructions
  • Others
    • No need for a CGo dependency
    • Avoids runtime context-switching overhead
    • Optimized code (vs. the Go compiler)
    • Faster compilation

Assembly Optimization Current Status

• Go standard packages with assembly optimization

crypto/aes crypto/elliptic crypto/internal/cipherhw crypto/md5

crypto/rc4 crypto/sha1 crypto/sha256 crypto/sha512

hash/crc32 math

math/big

reflect

runtime runtime/cgo runtime/internal/atomic runtime/internal/sys

strings sync/atomic syscall

......

Legend: red = arm64 optimization ongoing; black = no arm64 optimization
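
Callers of these packages do not change their code to benefit from an assembly implementation; the package picks it internally. As a minimal sketch, the program below checksums a buffer with hash/crc32, one of the packages listed above. The helper name `checksums` and the sample input are illustrative assumptions, and whether a hardware-accelerated path is actually taken depends on the Go version and the CPU it runs on.

```go
package main

import (
	"fmt"
	"hash/crc32"
)

// checksums is a hypothetical helper: it returns the CRC-32 of data under
// the IEEE and Castagnoli polynomials using only the public hash/crc32 API.
// Any assembly fast path is selected inside the package, not by the caller.
func checksums(data []byte) (ieee, castagnoli uint32) {
	ieee = crc32.ChecksumIEEE(data)
	castagnoli = crc32.Checksum(data, crc32.MakeTable(crc32.Castagnoli))
	return ieee, castagnoli
}

func main() {
	ieee, c := checksums([]byte("123456789"))
	fmt.Printf("IEEE: %#x  Castagnoli: %#x\n", ieee, c)
}
```

The same source builds unchanged for any GOARCH, which is the practical meaning of "no need for a CGo dependency" here.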

Assembly Terminology

• Mnemonic
  • CALL, MOVW, MOVD, ...
• Register
  • R1, F0, V3, ...
• Immediate
  • $1, $0x100, ...
• Memory
  • (R1), 8(R3), ...
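
The operand categories above can be told apart by their surface syntax alone. The following is a rough sketch, not part of any Go toolchain API: `operandKind` and `isRegister` are hypothetical helpers that recognize only the forms shown on this slide (a `$` prefix for immediates, a parenthesized base register for memory, and R/F/V followed by a number for registers).

```go
package main

import (
	"fmt"
	"strings"
)

// operandKind classifies one operand written in Go assembler syntax.
// It is an illustrative assumption and covers only the slide's examples.
func operandKind(op string) string {
	op = strings.TrimSpace(op)
	switch {
	case op == "":
		return "unknown"
	case strings.HasPrefix(op, "$"):
		return "immediate" // e.g. $1, $0x100
	case strings.Contains(op, "(") && strings.HasSuffix(op, ")"):
		return "memory" // e.g. (R1), 8(R3)
	case isRegister(op):
		return "register" // e.g. R1, F0, V3
	default:
		return "unknown"
	}
}

// isRegister reports whether op is R, F, or V followed by decimal digits.
func isRegister(op string) bool {
	if len(op) < 2 || !strings.ContainsRune("RFV", rune(op[0])) {
		return false
	}
	for _, c := range op[1:] {
		if c < '0' || c > '9' {
			return false
		}
	}
	return true
}

func main() {
	for _, op := range []string{"R1", "F0", "V3", "$1", "$0x100", "(R1)", "8(R3)"} {
		fmt.Printf("%-7s %s\n", op, operandKind(op))
	}
}
```

Mnemonics such as CALL or MOVD are not operands, so the sketch reports them as "unknown".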

Registers in AArch64
