Understanding this CUDA kernel's launch parameters
I am trying to analyze some code I found online, and I keep thinking myself into a corner. I am looking at a histogram kernel launched with the following parameters:
histogram<<<2500, numbins, numbins * sizeof(unsigned int)>>>(...);
I know the parameters are the grid, block, and shared memory sizes.
So does that mean there are 2500 blocks of numbins threads each, with each block also having a numbins * sizeof(unsigned int) chunk of shared memory available to its threads?
Also, within the kernel itself there are calls to __syncthreads(). Are there then 2500 sets of numbins calls to __syncthreads() over the course of the kernel call?
"So does that mean there are 2500 blocks of numbins threads each, with each block also having a numbins * sizeof(unsigned int) chunk of shared memory available to its threads?"
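From the CUDA Toolkit documentation:
The execution configuration (of a global function call) is specified by inserting an expression of the form <<<Dg, Db, Ns, S>>>, where:
Dg is of type dim3 and specifies the dimension and size of the grid (the number of blocks launched);
Db is of type dim3 and specifies the dimension and size of each block (the number of threads per block);
Ns is of type size_t and specifies the number of bytes of shared memory that is dynamically allocated per block for this call, in addition to the statically allocated memory; it is optional and defaults to 0;
S is of type cudaStream_t and specifies the associated stream; it is optional and defaults to 0.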
So, as @fazar pointed out, the answer is yes. This memory is allocated per block.
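To make the launch shape concrete, here is a minimal sketch, not the histogram kernel from the question. The kernel countThreads and the host helper launchMatchesExpectation are hypothetical names introduced only for illustration. The sketch launches with the same <<<2500, numbins, numbins * sizeof(unsigned int)>>> configuration and checks that 2500 * numbins threads actually ran, assuming numbins does not exceed the device's maximum threads per block (1024 on current GPUs).

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel, for illustration only.
__global__ void countThreads(unsigned int *total, unsigned int *dims)
{
    // Dynamic shared memory: its size in bytes is the third launch
    // parameter (Ns), and every block gets its own private copy.
    extern __shared__ unsigned int s_slots[];

    s_slots[threadIdx.x] = 1u;               // each thread uses only its own slot
    atomicAdd(total, s_slots[threadIdx.x]);  // count this thread globally

    if (blockIdx.x == 0 && threadIdx.x == 0) {
        dims[0] = gridDim.x;   // number of blocks (Dg)
        dims[1] = blockDim.x;  // threads per block (Db)
    }
}

// Hypothetical host helper: returns true if the kernel observed
// 2500 blocks of numbins threads each.
bool launchMatchesExpectation(unsigned int numbins)
{
    const unsigned int numBlocks = 2500;
    unsigned int *d_total = nullptr;
    unsigned int *d_dims = nullptr;
    if (cudaMalloc(&d_total, sizeof(unsigned int)) != cudaSuccess) return false;
    if (cudaMalloc(&d_dims, 2 * sizeof(unsigned int)) != cudaSuccess) {
        cudaFree(d_total);
        return false;
    }
    cudaMemset(d_total, 0, sizeof(unsigned int));
    cudaMemset(d_dims, 0, 2 * sizeof(unsigned int));

    // Same shape as the launch in the question: grid, block, shared bytes.
    countThreads<<<numBlocks, numbins, numbins * sizeof(unsigned int)>>>(d_total, d_dims);
    cudaError_t launchErr = cudaGetLastError();      // invalid configuration shows up here
    cudaError_t syncErr = cudaDeviceSynchronize();   // errors during execution show up here

    unsigned int h_total = 0;
    unsigned int h_dims[2] = {0, 0};
    if (launchErr == cudaSuccess && syncErr == cudaSuccess) {
        cudaMemcpy(&h_total, d_total, sizeof(unsigned int), cudaMemcpyDeviceToHost);
        cudaMemcpy(h_dims, d_dims, 2 * sizeof(unsigned int), cudaMemcpyDeviceToHost);
    }
    cudaFree(d_total);
    cudaFree(d_dims);

    return launchErr == cudaSuccess && syncErr == cudaSuccess &&
           h_total == numBlocks * numbins &&
           h_dims[0] == numBlocks && h_dims[1] == numbins;
}
```

Calling launchMatchesExpectation(256) from a host program should return true on a working CUDA device. The shared array is declared without a size because its size comes from the launch configuration rather than from the kernel source.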
"Also, within the kernel itself there are calls to __syncthreads(). Are there then 2500 sets of numbins calls to __syncthreads() over the course of the kernel call?"
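__syncthreads() waits until all threads in the thread block have reached this point. It is used to coordinate communication between threads in the same block.
So, there is one __syncthreads() barrier per block: every one of the numbins threads in a block executes the call, but the barrier only synchronizes the threads of that block, never threads in different blocks.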
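As an illustration of where such barriers typically sit, here is a hedged sketch of a shared-memory histogram. It is an assumption about what a kernel like this might look like, not the code from the question. The names histogramSketch, binIds, globalBins, and histogramSketchIsCorrect are hypothetical, and the input is assumed to already hold bin indices smaller than numbins.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel: launched with blockDim.x == numbins, so thread t owns bin t.
// Assumes every binIds[i] is less than blockDim.x.
__global__ void histogramSketch(const unsigned int *binIds, unsigned int n,
                                unsigned int *globalBins)
{
    extern __shared__ unsigned int s_bins[];  // numbins counters, private to this block

    s_bins[threadIdx.x] = 0;
    __syncthreads();  // barrier 1: all of this block's bins are zeroed

    // Grid-stride loop: the blocks together cover the whole input.
    unsigned int stride = gridDim.x * blockDim.x;
    for (unsigned int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride)
        atomicAdd(&s_bins[binIds[i]], 1u);
    __syncthreads();  // barrier 2: this block's local histogram is complete

    // Merge the block-local result into the global histogram.
    atomicAdd(&globalBins[threadIdx.x], s_bins[threadIdx.x]);
}

// Hypothetical host check: evenly distributed input, so every bin
// should end up with n / numbins entries.
bool histogramSketchIsCorrect()
{
    const unsigned int numbins = 64;
    const unsigned int n = 1u << 20;
    std::vector<unsigned int> h_ids(n);
    for (unsigned int i = 0; i < n; ++i) h_ids[i] = i % numbins;

    unsigned int *d_ids = nullptr;
    unsigned int *d_bins = nullptr;
    if (cudaMalloc(&d_ids, n * sizeof(unsigned int)) != cudaSuccess) return false;
    if (cudaMalloc(&d_bins, numbins * sizeof(unsigned int)) != cudaSuccess) {
        cudaFree(d_ids);
        return false;
    }
    cudaMemcpy(d_ids, h_ids.data(), n * sizeof(unsigned int), cudaMemcpyHostToDevice);
    cudaMemset(d_bins, 0, numbins * sizeof(unsigned int));

    histogramSketch<<<2500, numbins, numbins * sizeof(unsigned int)>>>(d_ids, n, d_bins);
    bool ok = (cudaGetLastError() == cudaSuccess) &&
              (cudaDeviceSynchronize() == cudaSuccess);

    std::vector<unsigned int> h_bins(numbins, 0);
    cudaMemcpy(h_bins.data(), d_bins, numbins * sizeof(unsigned int), cudaMemcpyDeviceToHost);
    cudaFree(d_ids);
    cudaFree(d_bins);

    for (unsigned int b = 0; b < numbins; ++b)
        ok = ok && (h_bins[b] == n / numbins);
    return ok;
}
```

Both barriers sit outside the loop, so every thread of a block reaches them regardless of how many input elements it processed. Each of the 2500 blocks passes its two barriers independently of the others.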
cuda