If the intermediate array fits in memory, the following should be reasonably efficient:
import numpy as np
from scipy.signal import fftconvolve,convolve
# example
rng = np.random.default_rng()
A = rng.random((5,6,2,3))
B = rng.random((4,3,3,4))
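# custom matmul: add singleton axes so the shared index k lines up
# Ae has axes (x, y, row, k, 1); Be has axes (x, y, 1, k, column)
Ae, Be = A[..., None], B[:, :, None]
# common shape of the trailing (row, k, column) axes
shsh = np.maximum(Ae.shape[2:], Be.shape[2:])
Ae = np.broadcast_to(Ae, (*Ae.shape[:2], *shsh))
Be = np.broadcast_to(Be, (*Be.shape[:2], *shsh))
# convolve over the two leading axes for every (row, k, column) at once,
# then sum over k (axis 3) to complete the matrix product
C = fftconvolve(Ae, Be, axes=(0, 1), mode='valid').sum(3)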
# original loop for reference
out = np.zeros_like(C)
for row in range(A.shape[2]):
    for column in range(B.shape[3]):
        for index in range(B.shape[2]):  # Could also be "A.shape[3]"
            out[:, :, row, column] += convolve(
                B[:, :, :, column][:, :, index],
                A[:, :, row, :][:, :, index],
                mode='valid'
            )
print(np.allclose(C,out))
# True
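By doing the convolutions in one batch, we reduce the total number of FFTs we have to compute.
If needed, speed and memory could be optimized further by doing the sum reduction in Fourier space using einsum. This does, however, require doing the FFT convolution by hand.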
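A rough sketch of what that manual version might look like is below. It is not part of the tested answer above: the helper name conv_matmul_fft is made up for illustration, and it assumes A is at least as large as B along the two convolved axes, so that the 'valid' region is determined by A. It zero-pads both inputs to the full convolution size, multiplies and sums over the shared index in Fourier space with einsum, transforms back once, and then crops to the 'valid' part.

```python
import numpy as np
from scipy.signal import convolve


def conv_matmul_fft(A, B):
    """Hypothetical helper: matrix product whose scalar products are
    2-D 'valid' convolutions over the two leading axes.

    Assumes A has shape (X, Y, rows, K), B has shape (x, y, K, cols),
    with X >= x and Y >= y.
    """
    (ax, ay), (bx, by) = A.shape[:2], B.shape[:2]
    # size of the full linear convolution along the convolved axes
    full = (ax + bx - 1, ay + by - 1)
    # zero-padded real FFTs over axes 0 and 1 only
    FA = np.fft.rfftn(A, s=full, axes=(0, 1))
    FB = np.fft.rfftn(B, s=full, axes=(0, 1))
    # pointwise product in Fourier space, summed over the shared index k
    FC = np.einsum('xyik,xykj->xyij', FA, FB)
    # a single inverse transform per (row, column) pair
    C_full = np.fft.irfftn(FC, s=full, axes=(0, 1))
    # crop the full convolution down to the 'valid' region
    return C_full[bx - 1:ax, by - 1:ay]


# same example shapes as in the answer above
rng = np.random.default_rng(0)
A = rng.random((5, 6, 2, 3))
B = rng.random((4, 3, 3, 4))
C = conv_matmul_fft(A, B)

# loop-based reference for comparison
ref = np.zeros_like(C)
for row in range(A.shape[2]):
    for column in range(B.shape[3]):
        for index in range(A.shape[3]):
            ref[:, :, row, column] += convolve(
                A[:, :, row, index], B[:, :, index, column], mode='valid'
            )
print(np.allclose(C, ref))
```

Because the sum over the shared index happens before the inverse transform, the broadcast (row, k, column) intermediate is never built, and only one inverse FFT per output (row, column) pair is needed.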