Channel: Intel® Software - Intel® Visual Fortran Compiler for Windows*

Questions about DO CONCURRENT

I am experimenting a bit with the DO CONCURRENT construct to see if it would improve the performance of one of our programs. I am currently using Intel Fortran 15, so my observations may no longer hold for newer versions.

Anyway, here is the basic code I use:

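program doconcurrent
    implicit none

    integer, parameter  :: sz = 10000000
    real, dimension(sz) :: array
    ! chunk, ibgn and iend are only used in the chunked variant further down
    integer             :: i, j, chunk, ibgn, iend, tstart, tstop, rate

    ! system_clock counts ticks; rate is the number of ticks per second
    call system_clock( tstart, rate )
    do j = 1,10000
        do concurrent (i = 1:sz)
            array(i) = 10.0 * i * j
        enddo
    enddo
    call system_clock( tstop )

    ! Print one element so the result of the loop is used
    write(*,*) array(1)
    ! Elapsed wall-clock time in seconds
    write(*,*) real(tstop - tstart) / real(rate)

end program doconcurrent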

It does not do anything useful except exercise the DO CONCURRENT construct. But:

- Compiling it with and without -Qparallel gives roughly the same runtime, about 25 seconds, so there is no improvement whatsoever.

- I can see that the program runs with 9 threads if I compile it with -Qparallel and with only one thread if I leave out that flag. The -Qpar-report flag also indicates that the loop is parallelized.

- If I insert a write statement to see whether the iterations are run in a non-deterministic order, the loop is no longer parallelized.

- My theory was that the runtime is determined by storing the new values in the array and that the threads get in each other's way. So instead of this one loop, I used an outer loop that splits the work into large chunks, something like:

    do j = 1,10000
        do concurrent (chunk = 1:8)
            ibgn = 1 + (chunk-1) * (sz+7)/8
            iend = min( chunk * (sz+7)/8, sz )
            do concurrent (i = ibgn:iend)
                array(i) = 10.0 * i * j
            enddo
        enddo
    enddo

But then only the inner loop is parallelized; if I use an ordinary do-loop for the inner one, nothing gets parallelized at all.
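As a rough plausibility check of the theory that storing the array dominates the runtime, the amount of data written can be estimated from the numbers above. The following is only a sketch: the program name store_traffic and its variable names are made up for illustration, the 25 seconds is the observed runtime quoted above, and it assumes that every store actually reaches main memory. With 4-byte default reals it works out to roughly 400 GB written, or about 16 GB/s. Whether that is close to what the memory system of the test machine can sustain has not been measured here.

```fortran
program store_traffic
    use iso_fortran_env, only: int64, real64
    implicit none

    ! Values taken from the test program and the observed timing (assumptions)
    integer(int64), parameter :: sz      = 10000000_int64   ! array elements
    integer(int64), parameter :: nouter  = 10000_int64      ! outer iterations
    real(real64),   parameter :: elapsed = 25.0_real64      ! runtime in seconds

    integer(int64) :: bytes_per_real, total_bytes

    ! storage_size returns the size in bits of a default real
    bytes_per_real = storage_size(1.0) / 8
    total_bytes    = sz * nouter * bytes_per_real

    write(*,'(a,f8.1,a)') 'Data stored:       ', &
        real(total_bytes, real64) / 1.0e9_real64, ' GB'
    write(*,'(a,f8.1,a)') 'Implied bandwidth: ', &
        real(total_bytes, real64) / elapsed / 1.0e9_real64, ' GB/s'
end program store_traffic
```

If the implied bandwidth is near the limit of the machine, adding threads would not be expected to help much, regardless of how the loop is split.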

Any comments? An alternative, in this case, would be to use OpenMP, but the drawback of that is that I have to define the "privateness" and "sharedness" of the variables involved myself ;).
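For comparison, here is a minimal sketch of what the OpenMP variant of the loop could look like. The program name openmp_variant is made up, the outer loop count is reduced to keep the run short, and the array is made allocatable purely for convenience. The default(none) clause is optional; it is used here to make the compiler insist on an explicit data-sharing attribute for every variable, which is exactly the bookkeeping mentioned above.

```fortran
program openmp_variant
    implicit none

    integer, parameter :: sz = 10000000
    real, allocatable  :: array(:)
    integer            :: i, j

    allocate( array(sz) )

    do j = 1,10
        ! default(none): every variable used in the loop must be listed.
        ! sz is a named constant, not a variable, so it is not listed.
        !$omp parallel do default(none) shared(array, j) private(i)
        do i = 1,sz
            array(i) = 10.0 * i * j
        enddo
        !$omp end parallel do
    enddo

    write(*,*) array(1), array(sz)
end program openmp_variant
```

With Intel Fortran on Windows this would be compiled with /Qopenmp; without that flag the directives are treated as ordinary comments and the loop runs serially.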

 

