loops - Parallel for inside a while using OpenMP on C -
loops - Parallel for inside a while using OpenMP on C -
i'm trying parallel within while, somothing this:
while(!end){ for(...;...;...) // parallel ... // serial code }
the loop parallel section of while loop. if this, have lot of overhead:
cycles = 0; while(!end){ // 1k 1000000 iterations aprox #pragma omp parallel for(i=0;i<n;i++) // parallel 256 iteration aprox if(time[i] == cycles){ if (wbusy[i]){ wbusy[i] = 0; wfinished[i] = 1; } } // serial code ++cycles; }
each iteration of loop indepent each other.
there dependencies between serial code , parallel code.
so 1 doesn't have worry much putting parallel regions loops, modern openmp implementations pretty efficient using things thread teams , long there's lots of work in loop you're fine. here, outer loop count of ~1e9 , inner loop count of ~256 - , little work beingness done per iteration - overhead comparable or worse amount of work beingness done , performance suffer.
so there noticeable difference between this:
cycles = 0; while(!end){ // 1k 1000000 iterations aprox #pragma omp parallel for(i=0;i<n;i++) // parallel 256 iteration aprox if(time[i] == cycles){ if (wbusy[i]){ wbusy[i] = 0; wfinished[i] = 1; } } // serial code ++cycles; }
and this:
cycles = 0; #pragma omp parallel while(!end){ // 1k 1000000 iterations aprox #pragma omp for(i=0;i<n;i++) // parallel 256 iteration aprox if(time[i] == cycles){ if (wbusy[i]){ wbusy[i] = 0; wfinished[i] = 1; } } // serial code #pragma omp single { ++cycles; } }
but really, scan across time array every iteration unfortunately both (a) slow , (b) not plenty work maintain multiple cores busy - it's memory intensive. more couple of threads have worse performance serial, without overheads, because of memory contention. admittedly have posted here example, not real code, why don't preprocess time array can check see when next task ready update:
#include <stdio.h> #include <stdlib.h> struct tasktime_t { long int time; int task; }; int stime_compare(const void *a, const void *b) { homecoming ((struct tasktime_t *)a)->time - ((struct tasktime_t *)b)->time; } int main(int argc, char **argv) { const int n=256; const long int niters = 100000000l; long int time[n]; int wbusy[n]; int wfinished[n]; (int i=0; i<n; i++) { time[i] = rand() % niters; wbusy[i] = 1; wfinished[i] = 0; } struct tasktime_t stimes[n]; (int i=0; i<n; i++) { stimes[i].time = time[i]; stimes[i].task = i; } qsort(stimes, n, sizeof(struct tasktime_t), stime_compare); long int cycles = 0; int next = 0; while(cycles < niters){ // 1k 1000000 iterations aprox while ( (next < n) && (stimes[next].time == cycles) ) { int = stimes[next].task; if (wbusy[i]){ wbusy[i] = 0; wfinished[i] = 1; } next++; } ++cycles; } homecoming 0; }
this ~5 times faster serial version of scanning approach (and much faster openmp versions). if updating time/wbusy/wfinished arrays in serial code, can maintain track of completion times using priority queue each update taking o(ln(n)) time instead of scanning every iteration taking o(n) time.
c loops parallel-processing openmp
Comments
Post a Comment