Problem distributing data across ccNUMA nodes

I have previously written ccNUMA-aware code in Fortran by initializing my arrays in parallel using the "first touch" principle, but something seems to have changed recently and this no longer works. For memory-bandwidth-bound code I used to see performance scale linearly with the number of NUMA nodes in the system, but with the code below I now get virtually identical results for the NUMA-aware and non-NUMA-aware versions ...
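For reference, first-touch placement amounts to a parallel initialization loop. Under the default placement policy on Linux and Windows, a page is typically mapped on the NUMA node of the thread that first writes to it, so each thread should initialize the same SCHEDULE(STATIC) chunk it will later compute on. This is a minimal sketch, not the poster's code: the module and routine names are hypothetical, and it assumes threads are pinned to cores (for example via KMP_AFFINITY) so they do not migrate between nodes after initialization.

```fortran
! Minimal first-touch initialization sketch (hypothetical module/routine names).
! Assumption: pages are mapped on the NUMA node of the thread that first
! writes them, and threads stay pinned to the same cores afterwards.
module numa_first_touch
  implicit none
contains
  subroutine init_first_touch(iVector, oVector)
    ! Assumes oVector has at least as many elements as iVector.
    real(8), intent(out) :: iVector(:), oVector(:)
    integer(8) :: i, n

    n = size(iVector, kind=8)
    ! SCHEDULE(STATIC) without a chunk size gives each thread one contiguous
    ! block; the compute loop must use the same schedule, N and thread count
    ! for each thread to find its block in local memory.
!$OMP PARALLEL DO SCHEDULE(STATIC)
    do i = 1, n
      iVector(i) = 1d0
      oVector(i) = 0d0
    end do
!$OMP END PARALLEL DO
  end subroutine init_first_touch
end module numa_first_touch
```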

Any suggestions as to what is causing this? I have tested the code on both Intel 2-socket and AMD 4-socket systems, with the same result ...

Best regards,

    program Console6
    use ifport
    use omp_lib
    implicit none
    integer*8          :: I,J,N
    integer            :: Repetitions
    real*8,allocatable :: iVector(:),oVector(:)
    real*8             :: Runtimebegin,RuntimeEnd,FLops
    logical            :: Success

    N=2e8
    allocate(iVector(N))
    allocate(oVector(N))
    success = SETENVQQ("KMP_AFFINITY=verbose,scatter")

!$OMP PARALLEL
!Do nothing except for initializing the OMP threads ...
!$OMP END PARALLEL

   call omp_set_num_Threads(8)
   Repetitions=50

   !initialize the data structure using first touch - everything will reside on the NUMA node of the master thread
   do i=1,N
     iVector(i)=1d0
     oVector(i)=0d0
   end do

   !Perform calculation
   RuntimeBegin=omp_get_wtime()

!$OMP PARALLEL private(i) shared(iVector,oVector,N)
!$OMP DO SCHEDULE(STATIC)
   do j=1,Repetitions
     do i=1,N
      oVector(i)=oVector(i)+iVector(i)*0.01d0
     end do
   end do
!$OMP END DO
!$OMP END PARALLEL

    print *,(oVector(1))
    RuntimeEnd=omp_get_wtime()
    Flops=2d0*N*Repetitions/((RunTimeEnd-RunTimeBegin)*1d9)
    print *,'NO DISTRIBUTION ACROSS NUMA NODES ...'
    print *,'Time=',RunTimeEnd-RuntimeBegin,'GFlops=',Flops

   !Deallocate the data and repeat the calculation with the data distributed across the NUMA nodes of the system
   deallocate(iVector)
   deallocate(oVector)
   allocate(iVector(N))
   allocate(oVector(N))

   !Distribute the data across NUMA nodes using the first touch principle ...
!$OMP PARALLEL private(i) shared(iVector,oVector,N)
!$OMP DO SCHEDULE(STATIC)
     do i=1,N
       iVector(i)=1d0
       oVector(i)=0d0
     end do
!$OMP END DO
!$OMP END PARALLEL

    RuntimeBegin=omp_get_wtime()

!$OMP PARALLEL private(i) shared(iVector,oVector,N)
!$OMP DO SCHEDULE(STATIC)
   do j=1,Repetitions
     do i=1,N
      oVector(i)=oVector(i)+iVector(i)*0.01d0
     end do
   end do
!$OMP END DO
!$OMP END PARALLEL

    print *,(oVector(1))
    RuntimeEnd=omp_get_wtime()
    Flops=2d0*N*Repetitions/((RunTimeEnd-RunTimeBegin)*1d9)
    print *,'DATA DISTRIBUTED ACROSS NUMA NODES ...'
    print *,'Time=',RunTimeEnd-RuntimeBegin,'GFlops=',Flops

    end program Console6
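One likely cause, worth ruling out first: in both timed regions the !$OMP DO is attached to the outer j (repetition) loop rather than the inner i loop. The repetitions are therefore divided among the threads and each thread sweeps the entire arrays, so the access pattern no longer matches the chunks each thread first-touched, and the threads also race on their updates to oVector. Below is a sketch of the kernel with the worksharing moved to the inner loop. The module and routine names are hypothetical, and it assumes the arrays were initialized by a SCHEDULE(STATIC) loop over the same N with the same thread count, so that (as implementations do in practice) each thread gets the same chunk in both loops. A small helper computes GFLOP/s with a decimal 1d9 scale, counting one multiply and one add per element per pass.

```fortran
! Sketch of the timed kernel with worksharing on the inner loop
! (hypothetical module/routine names, not a drop-in replacement).
module numa_kernel
  implicit none
contains
  subroutine update_kernel(iVector, oVector, repetitions)
    real(8), intent(in)    :: iVector(:)
    real(8), intent(inout) :: oVector(:)
    integer, intent(in)    :: repetitions
    integer(8) :: i, n
    integer    :: j

    n = size(oVector, kind=8)
!$OMP PARALLEL PRIVATE(i, j) SHARED(iVector, oVector, n, repetitions)
    do j = 1, repetitions
      ! Every thread runs all repetitions but only its own static chunk of i,
      ! i.e. the pages it first-touched (assuming pinned threads and the same
      ! schedule, N and thread count as the initialization loop).
!$OMP DO SCHEDULE(STATIC)
      do i = 1, n
        oVector(i) = oVector(i) + iVector(i) * 0.01d0
      end do
!$OMP END DO
      ! The implicit barrier at END DO keeps the threads in step per repetition.
    end do
!$OMP END PARALLEL
  end subroutine update_kernel

  ! GFLOP/s for this kernel: 2 flops per element per pass, decimal giga (1d9).
  pure function kernel_gflops(n, repetitions, seconds) result(rate)
    integer(8), intent(in) :: n
    integer, intent(in)    :: repetitions
    real(8), intent(in)    :: seconds
    real(8) :: rate
    rate = 2d0 * real(n, 8) * real(repetitions, 8) / (seconds * 1d9)
  end function kernel_gflops
end module numa_kernel
```

In the program above this would replace each timed region, e.g. call update_kernel(iVector, oVector, Repetitions) between the two omp_get_wtime() calls. Thread pinning matters as well: KMP_AFFINITY (or the standard OMP_PROC_BIND and OMP_PLACES variables) has to be in effect when the OpenMP runtime initializes, so setting it in the environment before launching the program is likely more reliable than calling SETENVQQ from inside it.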