Understanding this CUDA kernel's launch parameters




I am trying to analyze some code I found online, and I keep thinking myself into a corner. I am looking at a histogram kernel launched with the following parameters:

histogram<<<2500, numbins, numbins * sizeof(unsigned int)>>>(...);

I know that the parameters are the grid, block, and shared memory sizes.

So does that mean there are 2500 blocks of numbins threads each, with each block also having a numbins * sizeof(unsigned int) chunk of shared memory available to its threads?

Also, within the kernel itself there are calls to __syncthreads(). Are there then 2500 sets of numbins calls to __syncthreads() over the course of the kernel call?

So does that mean there are 2500 blocks of numbins threads each, with each block also having a numbins * sizeof(unsigned int) chunk of shared memory available to its threads?

From the CUDA Toolkit documentation:

The execution configuration (of a global function call) is specified by inserting an expression of the form <<<Dg, Db, Ns, S>>>, where:

Dg (dim3) specifies the dimension and size of the grid.

Db (dim3) specifies the dimension and size of each block.

Ns (size_t) specifies the number of bytes in shared memory that is dynamically allocated per block for this call, in addition to the statically allocated memory.

S (cudaStream_t) specifies the associated stream. It is an optional parameter which defaults to 0.

So, as @Fazar pointed out, the answer is yes. This memory is allocated per block.
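One way to see the per-block allocation in practice is a small test program. The sketch below is not the histogram kernel from the question. The kernel `sumSharedPerBlock`, the helper `checkPerBlockSharedMemory`, and the value of 256 bins are all assumptions made for illustration. Every thread writes its block index into the dynamically sized shared array, and thread 0 sums the array. The sum should come out as `blockIdx.x * blockDim.x` only if each block has its own copy.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical kernel (not from the original post): every thread stores its
// block index in the block's dynamically sized shared array, then thread 0
// sums the array and writes the result to global memory.
__global__ void sumSharedPerBlock(unsigned int *blockSums)
{
    extern __shared__ unsigned int s_data[];   // Ns bytes, one copy per block

    s_data[threadIdx.x] = blockIdx.x;
    __syncthreads();                           // wait for all writes in this block

    if (threadIdx.x == 0) {
        unsigned int sum = 0;
        for (unsigned int i = 0; i < blockDim.x; ++i)
            sum += s_data[i];
        blockSums[blockIdx.x] = sum;
    }
}

// Launches the kernel and returns true if every block saw only its own data.
bool checkPerBlockSharedMemory(unsigned int numBlocks, unsigned int numBins)
{
    unsigned int *d_sums = nullptr;
    if (cudaMalloc(&d_sums, numBlocks * sizeof(unsigned int)) != cudaSuccess)
        return false;

    dim3 Dg(numBlocks);                          // grid: numBlocks blocks
    dim3 Db(numBins);                            // block: numBins threads
    size_t Ns = numBins * sizeof(unsigned int);  // dynamic shared bytes per block
    sumSharedPerBlock<<<Dg, Db, Ns>>>(d_sums);
    cudaError_t err = cudaGetLastError();        // catches an invalid launch configuration

    std::vector<unsigned int> h_sums(numBlocks);
    if (err == cudaSuccess)
        err = cudaMemcpy(h_sums.data(), d_sums, numBlocks * sizeof(unsigned int),
                         cudaMemcpyDeviceToHost);
    cudaFree(d_sums);
    if (err != cudaSuccess)
        return false;

    for (unsigned int b = 0; b < numBlocks; ++b)
        if (h_sums[b] != b * numBins)
            return false;
    return true;
}

int main()
{
    bool ok = checkPerBlockSharedMemory(2500, 256);   // 256 bins is an assumed value
    std::printf("per-block shared memory check: %s\n", ok ? "passed" : "failed");
    return ok ? 0 : 1;
}
```

The stream parameter S is omitted here, so the launch uses the default stream.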

Also, within the kernel itself there are calls to __syncthreads(). Are there then 2500 sets of numbins calls to __syncthreads() over the course of the kernel call?

__syncthreads() waits until all threads in the thread block have reached this point. It is used to coordinate communication between threads in the same block.

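So the barrier is per block: every thread in a block executes the __syncthreads() call, and each of the 2500 blocks synchronizes independently of the others.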
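The kernel body is not shown in the question, so the following is only a guess at a common shared-memory histogram pattern, written to show where per-block barriers usually sit. The kernel signature, the `% blockDim.x` binning rule, the helper `runHistogram`, and the input size are all assumptions. It relies on the launch using exactly one thread per bin, as in the question.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical histogram kernel, assuming blockDim.x == number of bins.
__global__ void histogram(const unsigned int *d_in, unsigned int *d_bins, unsigned int n)
{
    extern __shared__ unsigned int s_bins[];   // one private copy per block

    s_bins[threadIdx.x] = 0;                   // each thread clears one bin
    __syncthreads();                           // barrier 1: this block's bins are all zero

    unsigned int stride = gridDim.x * blockDim.x;
    for (unsigned int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride)
        atomicAdd(&s_bins[d_in[i] % blockDim.x], 1u);
    __syncthreads();                           // barrier 2: this block has finished counting

    // Merge this block's partial histogram into the global result.
    atomicAdd(&d_bins[threadIdx.x], s_bins[threadIdx.x]);
}

// Runs the kernel on the inputs 0..n-1 and checks every bin against the
// count computed on the host. Returns true on success.
bool runHistogram(unsigned int numBlocks, unsigned int numBins, unsigned int n)
{
    std::vector<unsigned int> h_in(n);
    for (unsigned int i = 0; i < n; ++i)
        h_in[i] = i;

    unsigned int *d_in = nullptr, *d_bins = nullptr;
    if (cudaMalloc(&d_in, n * sizeof(unsigned int)) != cudaSuccess)
        return false;
    if (cudaMalloc(&d_bins, numBins * sizeof(unsigned int)) != cudaSuccess) {
        cudaFree(d_in);
        return false;
    }
    cudaMemcpy(d_in, h_in.data(), n * sizeof(unsigned int), cudaMemcpyHostToDevice);
    cudaMemset(d_bins, 0, numBins * sizeof(unsigned int));

    histogram<<<numBlocks, numBins, numBins * sizeof(unsigned int)>>>(d_in, d_bins, n);
    cudaError_t err = cudaGetLastError();

    std::vector<unsigned int> h_bins(numBins);
    if (err == cudaSuccess)
        err = cudaMemcpy(h_bins.data(), d_bins, numBins * sizeof(unsigned int),
                         cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_bins);
    if (err != cudaSuccess)
        return false;

    for (unsigned int b = 0; b < numBins; ++b) {
        // Number of values i in [0, n) with i % numBins == b.
        unsigned int expected = n / numBins + (b < n % numBins ? 1u : 0u);
        if (h_bins[b] != expected)
            return false;
    }
    return true;
}

int main()
{
    bool ok = runHistogram(2500, 256, 1u << 20);   // 256 bins and 2^20 inputs are assumed values
    std::printf("histogram check: %s\n", ok ? "passed" : "failed");
    return ok ? 0 : 1;
}
```

In this sketch each thread passes two barriers, and neither barrier makes one block wait for another. The only cross-block interaction is the final atomicAdd into global memory.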

