https://wu-kan.cn/2019/12/13/CUDA%E7%9F%A9%E9%98%B5%E4%B9%98%E6%B3%95%E7%9A%84%E4%BC%98%E5%8C%96/ #42
utterances-bot announced in Announcements
const cublasOperation_t opA = CUBLAS_OP_N, opB = CUBLAS_OP_N, opC = CUBLAS_OP_N;
const int m = 1 << 13, n = 1 << 13, k = 1 << 13,
lda = opA == CUBLAS_OP_N ? k : m, ldb = opB == CUBLAS_OP_N ? n : k,
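ldc = opC == CUBLAS_OP_N ? n : m;

I think there is a problem here. A, B, and C are all stored in column-major order (CUBLAS_OP_N, where N means no transpose), so lda should equal m, the number of rows of A. The same reasoning applies to ldb and ldc. Nothing goes wrong here only because m, n, and k are all equal, so lda takes the same value whichever branch is chosen.

Changing it as follows should fix it:

const cublasOperation_t opA = CUBLAS_OP_N, opB = CUBLAS_OP_N, opC = CUBLAS_OP_N;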
const int m = 1 << 13, n = 1 << 13, k = 1 << 13,
lda = opA == CUBLAS_OP_N ? m : k, ldb = opB == CUBLAS_OP_N ? k : n,
ldc = opC == CUBLAS_OP_N ? m : n;
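
To make the leading-dimension rule concrete, here is a minimal host-side sketch that does not call cuBLAS at all. The names `Op`, `LeadingDims`, `min_leading_dims`, `gemm_ref`, and `demo_non_square` are hypothetical helpers invented for this illustration, and the sizes m = 2, n = 3, k = 4 are chosen only so that the three dimensions differ. It assumes the usual BLAS column-major convention, where element (i, j) of a matrix with leading dimension ld lives at index `i + j * ld`. In the cuBLAS gemm interface, as far as I know, C has no transpose flag, so the sketch always uses m for ldc.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins for CUBLAS_OP_N / CUBLAS_OP_T (illustration only).
enum class Op { N, T };

struct LeadingDims { int lda, ldb, ldc; };

// Smallest valid leading dimensions for column-major storage:
// the leading dimension is the number of rows of the matrix as stored.
//   A is stored m x k if opA == N, otherwise k x m.
//   B is stored k x n if opB == N, otherwise n x k.
//   C is always stored m x n.
LeadingDims min_leading_dims(Op opA, Op opB, int m, int n, int k) {
    return {opA == Op::N ? m : k, opB == Op::N ? k : n, m};
}

// CPU reference for C = op(A) * op(B) with column-major indexing.
void gemm_ref(Op opA, Op opB, int m, int n, int k,
              const float* A, int lda, const float* B, int ldb,
              float* C, int ldc) {
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < m; ++i) {
            float acc = 0.0f;
            for (int p = 0; p < k; ++p) {
                const float a = opA == Op::N ? A[i + p * lda] : A[p + i * lda];
                const float b = opB == Op::N ? B[p + j * ldb] : B[j + p * ldb];
                acc += a * b;
            }
            C[i + j * ldc] = acc;
        }
    }
}

// Non-square check: A(i, p) = i + 1 and B(p, j) = j + 1,
// so every entry of C should be k * (i + 1) * (j + 1).
bool demo_non_square() {
    const int m = 2, n = 3, k = 4;
    const LeadingDims ld = min_leading_dims(Op::N, Op::N, m, n, k);
    std::vector<float> A(ld.lda * k), B(ld.ldb * n), C(ld.ldc * n, 0.0f);
    for (int p = 0; p < k; ++p)
        for (int i = 0; i < m; ++i) A[i + p * ld.lda] = float(i + 1);
    for (int j = 0; j < n; ++j)
        for (int p = 0; p < k; ++p) B[p + j * ld.ldb] = float(j + 1);
    gemm_ref(Op::N, Op::N, m, n, k, A.data(), ld.lda, B.data(), ld.ldb,
             C.data(), ld.ldc);
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < m; ++i)
            if (C[i + j * ld.ldc] != float(k * (i + 1) * (j + 1))) return false;
    return true;
}
```

With m = n = k, both versions of the ternary expressions give the same numbers, which is why the mismatch stays hidden in the original test sizes. Picking three different sizes, as in `demo_non_square`, is a cheap way to expose it.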
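Optimizing CUDA Matrix Multiplication · wu-kan
2021-10-08: Re-ran this experiment on [email protected] and [email protected]. The new third-year members of the supercomputing team are working on a CUDA matrix multiplication assignment, so I set up an internal training session on CUDA for them. I looked back at the experiment I did a year ago, when I had just started learning CUDA, and it was a bit painful to read, so let's redo the experiment.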
https://wu-kan.cn/2019/12/13/CUDA%E7%9F%A9%E9%98%B5%E4%B9%98%E6%B3%95%E7%9A%84%E4%BC%98%E5%8C%96/