Channel: Intel® Software - Intel® Visual Fortran Compiler for Windows*

Performance when running one compiled app vs. multiple ones simultaneously


So, this confused the daylights out of me.

I'm running on an Intel Core i7-3770 @ 3.4 GHz: 4 physical cores with Hyper-Threading, so 8 logical cores.

For Case 1, I compiled with ifort 2013 SP1 Update 4, using these settings:

      <Tool Name="VFFortranCompilerTool" SuppressStartupBanner="true" MultiProcessorCompilation="true" GenAlternateCodePaths="codeForAVX" IntegerKIND="integerKIND8" RealKIND="realKIND8" LocalSavedScalarsZero="true" FloatingPointExceptionHandling="fpe0" FloatingPointModel="source" FlushDenormalResultsToZero="true" Traceback="true" />

Case 2 is not important, other than the fact that it ran and consumed CPU cycles. It was compiled with 2015 Update 3.

For Case 3, I compiled with ifort 2015 Update 4, and I added GenAlternateCodePaths="codeForCommonAVX512".

I ran all 3 cases at the same time on the same machine, and the run times are as follows:

Case 1 239.852 s

Case 2 267.979 s

Case 3 260.182 s
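
For anyone who wants to reproduce this kind of test, here is a minimal Python sketch of a harness that starts several executables at nearly the same time and records the wall-clock time of each. It is not how the numbers above were collected; the function names `time_command` and `time_concurrently` and the executable names in the usage comment are hypothetical.

```python
import subprocess
import time
from concurrent.futures import ThreadPoolExecutor


def time_command(command):
    """Run one command to completion and return its wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(command, check=True)
    return time.perf_counter() - start


def time_concurrently(commands):
    """Launch every command from its own thread so they run side by side.

    Returns the elapsed seconds for each command, in the order given.
    """
    with ThreadPoolExecutor(max_workers=len(commands)) as pool:
        return list(pool.map(time_command, commands))


# Hypothetical usage (the executable names are placeholders):
#   times = time_concurrently([["case1.exe"], ["case2.exe"], ["case3.exe"]])
#   standalone = time_concurrently([["case3.exe"]])
```

Timing the same executable once alone and once alongside the others, with nothing else changed, separates the effect of the compiler from the effect of sharing the machine.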

Naturally, I wonder what is wrong with the 2015 Update 4 compiler, since the running time went up! So, my first question is: why did the running time with 2015U4 go up?

Then I tried running Case 3 with nothing else running on my computer, and the running time was 198.092 seconds. So, is there a huge improvement using this compiler after all?

The last test I did was to compile the same code using 2013SP1U4, but I compiled it to have AVX2 instructions (instead of AVX512), and I also added O3. Then I ran this standalone, and the running time was 228.417 seconds, or about 5% faster than Case 1.

I would attribute that 5% improvement (standalone vs. 3 cases running simultaneously) to adding O3 optimization.

However, with 2015U4, if I'm running only 1 case, the run is much, much faster, and if I'm running 3 simultaneously (on a CPU with 8 logical cores!) it is much, much slower.
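
To put the timings side by side, here is a small Python sketch that computes the relative differences from the run times quoted in this post. The constant names and the helper `percent_change` are made up for illustration; only the four timing values come from the measurements above.

```python
# Run times in seconds, copied from this post.
CASE1_CONCURRENT = 239.852     # 2013 SP1 U4, run alongside Cases 2 and 3
CASE3_CONCURRENT = 260.182     # 2015 U4, run alongside Cases 1 and 2
CASE3_STANDALONE = 198.092     # 2015 U4, nothing else running
V2013_O3_STANDALONE = 228.417  # 2013 SP1 U4 with O3, nothing else running


def percent_change(baseline, measured):
    """Relative change of measured vs. baseline, in percent (positive = slower)."""
    return (measured - baseline) / baseline * 100.0


comparisons = {
    "Case 3 vs. Case 1, both concurrent": (CASE1_CONCURRENT, CASE3_CONCURRENT),
    "Case 3 concurrent vs. standalone": (CASE3_STANDALONE, CASE3_CONCURRENT),
    "2013 O3 standalone vs. Case 1 concurrent": (CASE1_CONCURRENT, V2013_O3_STANDALONE),
    "2015U4 standalone vs. 2013 O3 standalone": (V2013_O3_STANDALONE, CASE3_STANDALONE),
}

for label, (baseline, measured) in comparisons.items():
    print(f"{label}: {percent_change(baseline, measured):+.1f}%")
```

By these numbers, Case 3 is roughly 8.5% slower than Case 1 when everything runs together, and takes roughly 31% longer concurrently than it does standalone. Note that the 5% figure compares a standalone run against a concurrent one, so it mixes the effect of O3 with the effect of an otherwise idle machine.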

- I'm seeing the same performance degradation of Case 3 vs. Case 1 on a Xeon E5-2699 v3 and also on an E5-1650 v3, but it's almost negligible on a Xeon X5690. Why is there this performance degradation when running more than one case at the same time on what seems like any newer CPU?

- Why are the 2015U4 results so much slower when the compiled executable isn't the only thing running?

- Or am I the only person who has seen this?

I am tempted to use 2015U4 for the running speed I get when I run just one case ... but that means I need to buy a new CPU for each case I want to run.

