SFO17-314 Optimizing Golang for High Performance with ARM64 Assembly
Wei Xiao, Staff Software Engineer, Wei.Xiao@
September 27, 2017, Linaro Connect SFO17
© 2017 Arm Limited
Agenda
• Introduction
• Differences from GNU Assembly
• Integrate assembly into Golang
• Optimize CRC32 for arm64
• Optimize SHA256 for arm64
• Optimize IndexByte for arm64
• Work Summary and Next steps
Introduction
• Assembly optimization benefits
  • Take advantage of ARMv8 capabilities
    • Hardware-specific instructions (such as SVC, AES, SHA, etc.)
    • Vector (Single Instruction Multiple Data) instructions
  • Others
    • No need for CGo dependency
    • Avoid runtime context switching overhead
    • Optimized code (vs Go compiler)
    • Faster compilation
Assembly Optimization Current Status
• Go standard packages with assembly optimization
  crypto/aes
  crypto/elliptic
  crypto/internal/cipherhw
  crypto/md5
  crypto/rc4
  crypto/sha1
  crypto/sha256
  crypto/sha512
  hash/crc32
  math
  math/big
  reflect
  runtime
  runtime/cgo
  runtime/internal/atomic
  runtime/internal/sys
  strings
  sync/atomic
  syscall
  ...
Legend: red = arm64 optimization ongoing; black = no arm64 optimization
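As a rough illustration of why this list matters to ordinary Go programs: callers do not invoke the assembly directly. They use the normal package API, and the toolchain links in an assembly implementation where one exists for the target. The sketch below calls three of the routines named in the agenda (CRC32, SHA256, IndexByte); the helper names `checksumIEEE` and `sha256Hex` are hypothetical wrappers introduced only for this example.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"hash/crc32"
)

// checksumIEEE returns the CRC-32 (IEEE polynomial) of data.
// Hypothetical helper; it simply forwards to hash/crc32.
func checksumIEEE(data []byte) uint32 {
	return crc32.ChecksumIEEE(data)
}

// sha256Hex returns the SHA-256 digest of data as lowercase hex.
// Hypothetical helper; it simply forwards to crypto/sha256.
func sha256Hex(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

func main() {
	fmt.Printf("crc32  = %#x\n", checksumIEEE([]byte("123456789")))
	fmt.Println("sha256 =", sha256Hex([]byte("abc")))
	fmt.Println("index  =", bytes.IndexByte([]byte("golang"), 'l'))
}
```

The same source builds unchanged on every architecture; whether a call is served by assembly or by a pure Go fallback depends on the target and Go version.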
Assembly Terminology
• Mnemonic
  • CALL, MOVW, MOVD, ...
• Register
  • R1, F0, V3, ...
• Immediate
  • $1, $0x100, ...
• Memory
  • (R1), 8(R3), ...
Registers in AArch64
Instruction Differences from GNU Assembly
• Semi-abstract instruction set (Plan 9 from Bell Labs)
  • Architecture-independent mnemonics like MOVD
  • Some architecture aspects shine through
  • Assembler may insert prologues and remove "unreachable" instructions
  • Instructions may be expanded by the assembler

    // func Add(a, b int) int
    TEXT ·Add(SB),$0-24
        MOVD a+0(FP), R0      // load first argument
        MOVD b+8(FP), R1      // load second argument
        ADD  R1, R0, R0       // R0 = R0 + R1
        MOVD R0, ret+16(FP)   // store the result
        RET

• Not all instructions are available
  • BYTE/WORD/LONG directives lay opcodes directly into the instruction stream
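The offsets 0, 8 and 16 and the `$0-24` in the Add listing follow from the stack-based convention used by Go assembly functions: arguments are laid out first, then results, and `$0-24` means a 0-byte local frame with 24 bytes of arguments plus results. On the Go side the function is declared without a body (`func Add(a, b int) int`) in a .go file of the same package. The following is a minimal sketch of that layout arithmetic, assuming a 64-bit target where `int` is 8 bytes; the `argSlot` type and `addFrame` function are hypothetical names used only for illustration.

```go
package main

import (
	"fmt"
	"unsafe"
)

// argSlot describes one argument or result slot relative to FP.
// Hypothetical type for this illustration.
type argSlot struct {
	Name   string
	Offset uintptr
	Size   uintptr
}

// addFrame lays out func Add(a, b int) int: arguments first, then the
// result, each at the next offset. All three values are int-sized, so
// no alignment padding needs to be modelled here.
func addFrame() (slots []argSlot, argSize uintptr) {
	word := unsafe.Sizeof(int(0))
	for _, name := range []string{"a", "b", "ret"} {
		slots = append(slots, argSlot{name, argSize, word})
		argSize += word
	}
	return slots, argSize
}

func main() {
	slots, argSize := addFrame()
	for _, s := range slots {
		fmt.Printf("%s+%d(FP)\n", s.Name, s.Offset)
	}
	// Frame size 0: Add needs no local stack space.
	fmt.Printf("TEXT ·Add(SB),$0-%d\n", argSize)
}
```

Functions with mixed-size arguments need alignment padding between slots, which this sketch deliberately leaves out.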
Operand Differences from GNU Assembly
• Data flow from left to right
  • ADD R1, R2
    R2 += R1
  • SUBW R12 ...
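To make the left-to-right rule concrete, here is a toy model of the two-operand `ADD R1, R2` form from the slide, in which the last operand is the destination. GNU assembly writes the destination first, so the equivalent there would be `add x2, x2, x1`. The `regs` type and `add2` method are hypothetical and model only this one form.

```go
package main

import "fmt"

// regs is a toy register file keyed by Go assembler register name.
// Hypothetical type for this illustration.
type regs map[string]int64

// add2 models the two-operand form `ADD src, dst`: dst += src.
// The source register is left unchanged.
func (r regs) add2(src, dst string) {
	r[dst] += r[src]
}

func main() {
	r := regs{"R1": 5, "R2": 7}
	r.add2("R1", "R2") // ADD R1, R2
	fmt.Println("R1 =", r["R1"], "R2 =", r["R2"])
}
```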