I have some Fortran code from a benchmark suite that uses OpenMP 4.0 features. Among them is the new "omp simd" directive, used to vectorize parallelized loop nests.
If I omit the "omp simd" directives, the code actually runs faster: around 15% less runtime on a large dataset with roughly 25 minutes of total runtime, where the modified loops account for approximately 25-40% of that total. This was tested on an Intel MIC via "omp target".
I checked with -vec-report1 and compared the outputs of both builds. Every one of the loops reports "OpenMP SIMD LOOP WAS VECTORIZED" (with the directive) or "LOOP WAS VECTORIZED" (without it), so there should not be such a big difference in runtime.
I suppose this is most likely a compiler bug. Can you explain the behavior?
Typical usage looks like the following:
!$omp do
DO k=y_min,y_max+1
   !$omp simd
   DO j=x_min-1,x_max+2
      someArray(j,k)=someOtherArray(j,k)-foo(j-1,k)+bar(j,k)
   ENDDO
ENDDO

!$omp do
DO k=y_min,y_max+1
   !$omp simd PRIVATE(xFoo) ! When removing the simd here, place the private clause on the "omp do"
   DO j=x_min-1,x_max+1
      IF(someCondition)THEN
         xFoo=1
      ELSE
         xFoo=j
      ENDIF
      ! Some more code
      someArray(j,k)=foo(xFoo,k)*bar(j,k)
   ENDDO
ENDDO
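Not from the benchmark itself, but one experiment that might help narrow this down: OpenMP 4.0 lets you add safelen and aligned clauses to the simd construct, and comparing such a hinted build against the bare "omp simd" can reveal whether the slowdown comes from the compiler generating peel/remainder loops or conservative unaligned accesses. A minimal sketch, where the 64-byte alignment value and the array names are assumptions modeled on the snippet above, not the real code:

```fortran
!$omp do
DO k=y_min,y_max+1
   ! Assumed: arrays are 64-byte aligned (MIC vector width).
   ! Drop or adjust the clause if the real allocations are not aligned,
   ! since a wrong aligned() hint produces incorrect code.
   !$omp simd aligned(someArray,someOtherArray:64)
   DO j=x_min-1,x_max+2
      someArray(j,k)=someOtherArray(j,k)-foo(j-1,k)+bar(j,k)
   ENDDO
ENDDO
```

If the hinted version closes the gap, the regression is in the generated prologue/remainder code rather than the vector body; -vec-report output alone ("LOOP WAS VECTORIZED") does not distinguish these cases.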
Please note that I cannot show the real code here in public, but rest assured it does pretty much exactly that. I may send the actual code to Intel for investigation purposes, though.