
Matrix multiplication benchmarking with double and single precision on Titan RTX

Python

www wrote on 2023-08-15 17:28:09

I am trying to understand the performance difference between single and double precision on our GPU workstation. The workstation has two TITAN RTX GPUs, but I am running the benchmark on a single Titan RTX. I am testing performance with cublas matrix multiplication: I multiply 8192x8192 matrices filled with random floats or doubles. To make sure the mistake is not on my side, I repeated the experiment in Python with the cupy library, and the results are very similar.

The results are about 75 ms per multiplication for floats and about 2,000 ms for doubles. If I had an older GPU this would make perfect sense, because 75 * 32 = 2,400, which is close to 2,000, so double precision would be about 32 times slower, as expected from the table at https://docs.nvidia. However, my GPU has compute capability 7.5, so I would expect double precision to be slower by only a factor of 2.

Other info: Ubuntu 18 LTS, nvcc 10.2, driver 440.82.

This is the CUDA code:

    #include <iostream>
    #include <chrono>
    #include <string>
    #include <cuda_runtime.h>
    #include "cublas_v2.h"
    #include <math.h>
    #include <stdio.h>
    #include <stdlib.h>   // srand
    #include <cuda.h>
    #include <device_functions.h>
    #include <sstream>
    #include <time.h>

    // Mixes three values into one; used only to derive a seed for srand().
    unsigned long mix(unsigned long a, unsigned long b, unsigned long c)
    {
        a=a-b;  a=a-c;  a=a^(c >> 13);
        b=b-c;  b=b-a;  b=b^(a << 8);
        c=c-a;  c=c-b;  c=c^(b >> 13);
        a=a-b;  a=a-c;  a=a^(c >> 12);
        b=b-c;  b=b-a;  b=b^(a << 16);
        c=c-a;  c=c-b;  c=c^(b >> 5);
        a=a-b;  a=a-c;  a=a^(c >> 3);
        b=b-c;  b=b-a;  b=b^(a << 10);
        c=c-a;  c=c-b;  c=c^(b >> 15);
        return c;
    }

    using namespace std;

    int main()
    {
        // Query device 0 and print its basic limits.
        int deviceCount;
        cudaGetDeviceCount(&deviceCount);
        cudaDeviceProp deviceProp;
        cublasStatus_t err;
        cudaGetDeviceProperties(&deviceProp, 0);
        printf("Detected %d devices \n", deviceCount);
        // sharedMemPerBlock is a size_t, so it is printed with %zu.
        printf("Device %d has compute capability %d.%d:\n\t maxshmem %zu. \n\t maxthreads per block %d. \n\t max threads dim %d. %d. %d.\n ", 0,
               deviceProp.major, deviceProp.minor, deviceProp.sharedMemPerBlock, deviceProp.maxThreadsPerBlock, deviceProp.maxThreadsDim[0],
               deviceProp.maxThreadsDim[1], deviceProp.maxThreadsDim[2]);

        // CUDA events for timing on the device.
        cudaEvent_t start_d, stop_d;
        cudaEventCreate(&start_d);
        cudaEventCreate(&stop_d);

        // RNG initialization
        unsigned long seed = mix(clock(), time(NULL), 0);
        srand(seed);

        int N=8192;
        int Nloops=2;
        // [the rest of main() is cut off in the post]
    }
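
As a sanity check on the two timings quoted in the question, they can be converted into achieved throughput. The sketch below assumes the conventional 2·N³ operation count for an N×N matrix product (one multiply and one add per inner-product term). The helper names `gemm_flops` and `tflops` are made up for this illustration and do not come from cublas or cupy.

```python
def gemm_flops(n: int) -> int:
    """Operation count for one n x n by n x n product, using the usual 2*n^3 convention."""
    return 2 * n ** 3


def tflops(n: int, milliseconds: float) -> float:
    """Achieved throughput in TFLOP/s for one product that took `milliseconds`."""
    seconds = milliseconds * 1e-3
    return gemm_flops(n) / seconds / 1e12


N = 8192
FLOAT_MS = 75.0      # approximate single-precision time reported in the question
DOUBLE_MS = 2000.0   # approximate double-precision time reported in the question

float_tflops = tflops(N, FLOAT_MS)
double_tflops = tflops(N, DOUBLE_MS)
slowdown = DOUBLE_MS / FLOAT_MS

if __name__ == "__main__":
    print(f"float : {float_tflops:.2f} TFLOP/s")    # float : 14.66 TFLOP/s
    print(f"double: {double_tflops:.2f} TFLOP/s")   # double: 0.55 TFLOP/s
    print(f"double is {slowdown:.1f}x slower")      # double is 26.7x slower
```

With the reported numbers the measured slowdown is roughly 27x, which is much closer to the 32x figure than to the 2x the question expects.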
