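I found that this is caused by an SSA optimization. It becomes obvious at the lower pass, which converts the intermediate representation into machine-specific instructions.
At the writebarrier pass (one step before lower), the instructions for both struct sizes are still identical.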
v22 (7) = Phi <*SomeStruct> v14 v45
v28 (7) = Phi <int> v16 v37
v23 (7) = Phi <mem> v12 v27
v37 (+7) = Add64 <int> v28 v36
v39 (7) = Less64 <bool> v37 v8
v25 (7) = VarDef <mem> {.autotmp_7} v23
v26 (7) = LocalAddr <*SomeStruct> {.autotmp_7} v2 v25
v27 (+7) = Move <mem> {SomeStruct} [72] v26 v22 v25 # <-- copy operation
As you can see, there is a Move operation at v27.
After the lower pass, however, the instructions differ.
With 9 int64 fields (72 bytes):
v22 (7) = Phi <*SomeStruct> v14 v45
v28 (7) = Phi <int> v16 v37
v23 (7) = Phi <mem> v12 v27
v37 (+7) = ADDQconst <int> [1] v28
v25 (7) = VarDef <mem> {.autotmp_7} v23
v26 (7) = LEAQ <*SomeStruct> {.autotmp_7} v2
v44 (7) = CMPQconst <flags> [1000000] v37
v32 (+7) = LEAQ <*SomeStruct> {.autotmp_7} [8] v2
v31 (+7) = ADDQconst <*SomeStruct> [8] v22
v29 (+7) = MOVQload <uint64> v22 v25
v24 (+7) = LEAQ <*SomeStruct> {.autotmp_7} [40] v2
v15 (+7) = ADDQconst <*SomeStruct> [40] v22
v46 (+7) = LEAQ <*SomeStruct> {.autotmp_7} [56] v2
v35 (+7) = ADDQconst <*SomeStruct> [56] v22
v21 (+7) = LEAQ <*SomeStruct> {.autotmp_7} [24] v2
v17 (+7) = ADDQconst <*SomeStruct> [24] v22
v39 (7) = SETL <bool> v44
v42 (7) = TESTB <flags> v39 v39
v30 (+7) = MOVQstore <mem> {.autotmp_7} v2 v29 v25
v41 (+7) = MOVOload <int128> [8] v22 v30
v20 (+7) = MOVOstore <mem> {.autotmp_7} [8] v2 v41 v30
v34 (+7) = MOVOload <int128> [24] v22 v20
v19 (+7) = MOVOstore <mem> {.autotmp_7} [24] v2 v34 v20
v33 (+7) = MOVOload <int128> [40] v22 v19
v38 (+7) = MOVOstore <mem> {.autotmp_7} [40] v2 v33 v19
v47 (+7) = MOVOload <int128> [56] v22 v38
v27 (+7) = MOVOstore <mem> {.autotmp_7} [56] v2 v47 v38
With 10 int64 fields (80 bytes), the Move is optimized using Duff's device (DUFFCOPY):
v22 (7) = Phi <*SomeStruct> v14 v45
v28 (7) = Phi <int> v16 v37
v23 (7) = Phi <mem> v12 v27
v37 (+7) = ADDQconst <int> [1] v28
v25 (7) = VarDef <mem> {.autotmp_7} v23
v26 (7) = LEAQ <*SomeStruct> {.autotmp_7} v2
v44 (7) = CMPQconst <flags> [1000000] v37
v32 (+7) = LEAQ <*SomeStruct> {.autotmp_7} [8] v2
v31 (+7) = ADDQconst <*SomeStruct> [8] v22
v29 (+7) = MOVQload <uint64> v22 v25
v39 (7) = SETL <bool> v44
v42 (7) = TESTB <flags> v39 v39
v30 (+7) = MOVQstore <mem> {.autotmp_7} v2 v29 v25
v27 (+7) = DUFFCOPY <mem> [826] v32 v31 v30 # <---
This optimization comes from the following rule in rewriteAMD64.go:
match: (Move [s] dst src mem)
cond: s > 64 && s <= 16*64 && s%16 == 0 && !config.noDuffDevice
result: (DUFFCOPY [14*(64-s/16)] dst src mem)
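The condition and result of the rule above can be checked by hand for the two sizes in question. The following is only a sketch that restates the quoted rule as plain Go; the function names `matchesDuffCopy` and `duffCopyAux` are made up for illustration and are not part of the compiler.

```go
package main

import "fmt"

// matchesDuffCopy restates the "cond" line of the quoted rule.
// noDuffDevice stands in for config.noDuffDevice (hypothetical parameter).
func matchesDuffCopy(s int64, noDuffDevice bool) bool {
	return s > 64 && s <= 16*64 && s%16 == 0 && !noDuffDevice
}

// duffCopyAux restates the bracketed value from the "result" line.
func duffCopyAux(s int64) int64 {
	return 14 * (64 - s/16)
}

func main() {
	for _, s := range []int64{72, 80} {
		if matchesDuffCopy(s, false) {
			fmt.Printf("Move [%d] -> DUFFCOPY [%d]\n", s, duffCopyAux(s))
		} else {
			fmt.Printf("Move [%d] -> rule does not match\n", s)
		}
	}
}
```

A size of 72 fails the `s%16 == 0` check (72 % 16 is 8), while 80 passes every check and yields 14*(64-5) = 826, the same value that appears in `DUFFCOPY <mem> [826]` in the dump.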
In a later pass (elim unread autos), the SSA optimizer can detect that the temporary variable autotmp_7 is never read and remove it. That does not happen for the larger struct, which uses DUFFCOPY.
I wrote about this in more detail here.