Understanding this CUDA kernel's launch parameters
I am trying to analyze some code I found online, and I keep thinking myself into a corner. I am looking at a histogram kernel that is launched with the following parameters:
histogram<<<2500, numbins, numbins * sizeof(unsigned int)>>>(...);
I know that the parameters are the grid, block, and shared memory sizes.
So does that mean that there are 2500 blocks of numbins threads each, with each block also having a numbins * sizeof(unsigned int) chunk of shared memory available to its threads?
Also, within the kernel itself there are calls to __syncthreads(). Are there then 2500 sets of numbins calls to __syncthreads() over the course of the kernel call?
So does that mean that there are 2500 blocks of numbins threads each, with each block also having a numbins * sizeof(unsigned int) chunk of shared memory available to its threads?
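From the CUDA Toolkit documentation:
The execution configuration (of a global function call) is specified by inserting an expression of the form <<<Dg, Db, Ns, S>>>, where:
Dg is of type dim3 and specifies the dimension and size of the grid;
Db is of type dim3 and specifies the dimension and size of each block;
Ns is of type size_t and specifies the number of bytes of shared memory that is dynamically allocated per block for this call, in addition to the statically allocated memory (optional, defaults to 0);
S is of type cudaStream_t and specifies the associated stream (optional, defaults to 0).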
So, as @fazar pointed out, the answer is yes. This memory is allocated per block.
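To make the per-block allocation concrete, here is a minimal sketch of what such a kernel might look like. The original histogram kernel is not shown in the question, so the kernel body, the names histogramSketch and runHistogram, and the assumption that every input value is smaller than numbins are all hypothetical. What the sketch illustrates is only that the third launch parameter sizes the extern __shared__ array, and that each block gets its own copy of it.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical kernel (the original is not shown in the question).
// Assumes blockDim.x == numbins and that every input value is < numbins.
__global__ void histogramSketch(const unsigned int *in, unsigned int *out, int n)
{
    // Sized by the third launch parameter; every block has a private copy.
    extern __shared__ unsigned int localBins[];

    localBins[threadIdx.x] = 0;   // one bin per thread
    __syncthreads();              // this block's bins are zeroed before counting

    // Grid-stride loop, so any number of blocks covers all n elements.
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n;
         i += gridDim.x * blockDim.x)
        atomicAdd(&localBins[in[i]], 1u);
    __syncthreads();              // this block's counts are complete before merging

    // Merge the block-private histogram into the global one.
    atomicAdd(&out[threadIdx.x], localBins[threadIdx.x]);
}

// Hypothetical host helper: copies data to the device, launches, copies bins back.
// Error checking is omitted to keep the sketch short.
std::vector<unsigned int> runHistogram(const std::vector<unsigned int> &data,
                                       int numbins, int numBlocks)
{
    unsigned int *dIn = nullptr, *dOut = nullptr;
    size_t inBytes = data.size() * sizeof(unsigned int);
    size_t binBytes = numbins * sizeof(unsigned int);
    cudaMalloc(&dIn, inBytes);
    cudaMalloc(&dOut, binBytes);
    cudaMemcpy(dIn, data.data(), inBytes, cudaMemcpyHostToDevice);
    cudaMemset(dOut, 0, binBytes);

    // Same shape as the launch in the question: grid, block, shared bytes per block.
    histogramSketch<<<numBlocks, numbins, binBytes>>>(dIn, dOut, (int)data.size());

    std::vector<unsigned int> bins(numbins);
    cudaMemcpy(bins.data(), dOut, binBytes, cudaMemcpyDeviceToHost);
    cudaFree(dIn);
    cudaFree(dOut);
    return bins;
}
```

Calling runHistogram(data, numbins, 2500) would reproduce the launch shape from the question. Note that numbins is also the block size here, so it cannot exceed the per-block thread limit of the device.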
Also, within the kernel itself there are calls to __syncthreads(). Are there then 2500 sets of numbins calls to __syncthreads() over the course of the kernel call?
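__syncthreads() waits until all threads in the thread block have reached this point. It is used to coordinate communication between the threads of the same block.
So each __syncthreads() call acts per block: all numbins threads of a block execute it and wait for one another there, and this happens independently in each of the 2500 blocks. Blocks never wait on each other at that barrier.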
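A small sketch may help show the block-level scope of the barrier. This is not the histogram kernel from the question; reverseSegments and runReverse are hypothetical names, and the helper assumes the input length is a multiple of the block size. Each block reverses only its own segment through shared memory, and the __syncthreads() between the write and the read only has to hold back the threads of that one block.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical kernel: each block reverses its own blockDim.x-sized segment.
__global__ void reverseSegments(int *data)
{
    extern __shared__ int tile[];          // private to this block
    int base = blockIdx.x * blockDim.x;

    tile[threadIdx.x] = data[base + threadIdx.x];

    // Barrier for this block only. Other blocks may be ahead of this point,
    // behind it, or not yet scheduled at all.
    __syncthreads();

    // Safe: every thread of this block has finished writing its tile entry.
    data[base + threadIdx.x] = tile[blockDim.x - 1 - threadIdx.x];
}

// Hypothetical host helper. Assumes v.size() is a multiple of threadsPerBlock.
// Error checking is omitted to keep the sketch short.
std::vector<int> runReverse(std::vector<int> v, int threadsPerBlock)
{
    int numBlocks = (int)v.size() / threadsPerBlock;
    size_t bytes = v.size() * sizeof(int);

    int *d = nullptr;
    cudaMalloc(&d, bytes);
    cudaMemcpy(d, v.data(), bytes, cudaMemcpyHostToDevice);

    reverseSegments<<<numBlocks, threadsPerBlock,
                      threadsPerBlock * sizeof(int)>>>(d);

    cudaMemcpy(v.data(), d, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d);
    return v;
}
```

With eight elements and four threads per block, the two halves are reversed independently, which is what you would expect if the barrier and the shared array are both scoped to a single block.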
cuda