Channel: Intel® Software - Intel® Visual Fortran Compiler for Windows*

Volatile variable in a DLL - why?


Hi All,

This problem is solved, but I would greatly value any insights or explanations any members of this group could provide.

A user discovered that an old, reliable DLL failed with a message asking for a system DLL I had not noticed before: svml_dispmd.dll. I checked and found the same problem on 3 machines, but there was no problem on my development machine. So I found the missing DLL and copied it to a test machine, which got rid of the first problem but gave rise to an error 193 (Invalid Win 32 application) when LoadLibrary was called on the original DLL.

After much effort, I pinned the problem down to an integer loop counter variable in a minor subroutine - when I declared it as VOLATILE, the problem was solved.

As far as I know, the only thing different about this DLL in recent years is that it is now built using IVF Composer XE 2013 under Windows 7, whereas it was formerly built using IVF 10 under Windows XP. All versions have been Win 32. Why did it fail on other machines but not on my development machine?

Many thanks in advance,

Mike

 


Coarrays


I am wondering if coarrays could be used for the following scenario:


I have a subroutine that calculates Y=F(X), X and Y being vectors of dimension, say, 10000. F is called hundreds or thousands of times in an iterative process.


F has properties such that the calculation of Y(i) is completely independent from other calculations; in other words we can write Y(i) = Fi(X(i)), where Fi is a specific function (distinct subroutine) which may not be related at all to Fj, i/=j . Each individual calculation Y(i) = Fi(X(i)) is already fairly substantial and we can for now neglect concerns due to communication overheads between images. In practice, there may only be dozens of such individual "basic" functions Fi, that would be applied to all terms of X.


The basic idea is the following:


1. The master thread provides X to F.
2. A bunch of slave images are assigned the calculation of the Y(i); a task stack is used so that the images that complete faster than the others can grab another calculation for another element of X.
3. The master thread collects the components of Y as they are calculated.


It's nice on paper, but the synchronization mechanisms required seem a bit tedious to implement. I am wondering if anybody started looking into similar applications of coarrays. Ideally, this would be used on a cluster with a large number of nodes.


Thanks!
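
One way the task-stack idea above might be sketched with Fortran 2008 coarrays is a shared counter on image 1, protected by a lock, from which every image takes the next index. This is only a sketch under assumptions: the names task_pool_sketch, queue_lock, next_task and fi are hypothetical, fi stands in for the real functions Fi, X is assumed to be already known on every image, and image 1 takes part in the work instead of acting as a dedicated master.

```fortran
program task_pool_sketch
  use, intrinsic :: iso_fortran_env, only: lock_type
  implicit none
  integer, parameter :: n = 10000      ! length of X and Y
  type(lock_type) :: queue_lock[*]     ! guards the shared counter
  integer :: next_task[*]              ! only the copy on image 1 is used
  real :: x(n)                         ! input, assumed known on every image
  real :: y(n)[*]                      ! results are gathered in the copy on image 1
  integer :: i

  x = 1.0
  y = 0.0
  if (this_image() == 1) next_task = 1
  sync all                             ! counter and Y are initialised before work starts

  do
    ! Take the next unprocessed index from the counter on image 1.
    lock (queue_lock[1])
    i = next_task[1]
    next_task[1] = i + 1
    unlock (queue_lock[1])
    if (i > n) exit

    y(i)[1] = fi(i, x(i))              ! each element is written by exactly one image
  end do

  sync all                             ! all results have arrived on image 1
  if (this_image() == 1) print *, 'sum of Y =', sum(y)

contains

  ! Hypothetical placeholder for the real per-element functions Fi.
  pure function fi(i, xi) result(yi)
    integer, intent(in) :: i
    real, intent(in) :: xi
    real :: yi
    yi = real(i) * xi
  end function fi

end program task_pool_sketch
```

Because every index is handed out exactly once, images that finish early simply come back for more work, and the only synchronization needed is the lock around the counter plus one SYNC ALL at each end. Handing out blocks of indices instead of single ones would reduce traffic on the lock if that turns out to matter.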

Coarrays - "Attempting to use an MPI routine before initializing MPI"


I just tried to run a very simple code (see below) and got this error message "Attempting to use an MPI routine before initializing MPI".


The problem occurs with IVF 15 Beta, only with a Release x64 configuration.


Do I get a gold star for the smallest bug reproduced ever? :p


PROGRAM A
SYNC ALL
END

C Null char in format statement


Hopefully an easy question, consider the code snip:

        character(20) :: gbuf
        write(gbuf,1) flt,char(0)
        1 format(f10.3,A)

Given that format 1 is used many times, it would be neater if the null character termination were in the format rather than in the write. I just spent a few minutes scratching my head and googling... Any ideas?
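
One possible workaround, rather than an answer to the format question itself, is to hide the write behind a small helper so the NUL is appended in one place only. The sketch below assumes a hypothetical function real_to_cstr; it keeps the f10.3 edit descriptor from the snippet above.

```fortran
program nul_terminated_write
  implicit none
  character(20) :: gbuf
  real :: flt

  flt = 3.14159
  gbuf = real_to_cstr(flt)
  ! Character code of the position right after the 10-character number.
  print *, iachar(gbuf(11:11))

contains

  ! Hypothetical helper: formats X with F10.3 and appends a NUL character.
  function real_to_cstr(x) result(s)
    real, intent(in) :: x
    character(11) :: s
    write (s, '(f10.3,a)') x, char(0)
  end function real_to_cstr

end program nul_terminated_write
```

Every call site then reads gbuf = real_to_cstr(flt) and no longer has to remember the trailing char(0).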

ifort 15.0 beta update


Is there going to be another (a second?) update?  My premier support issue 6000054145 (to do with move_alloc and argument intent checking - I've got that right haven't I????) means that the latest update (15.0.0.054) won't successfully compile any of my real stuff, which makes it a bit hard to test.

Quickwin appears incompatible with Win 8.1 re screen resolution


My Quickwin programs feature project frame and child windows that have been carefully sized and positioned on the screen, with automatic detection for maximum screen size and appropriate adjustment when the exe's are run on various hardware setups. The programs were developed on XP and Win7.

I now have a Dell Venue 11 Pro running Win8.1. The window sizing and placing fail on this computer. They are generally too big; e.g. a project frame that is supposed to occupy say 0.9 of the screen width and height appears too big to even fit on the screen. This happens even when I build the program on the Dell.

I have traced the issue down to this. The Windows screen utility claims that the screen resolution is 1680 x 1050. But the Quickwin call GETWSIZEQQ (QWIN$FRAMEWINDOW, QWIN$SIZEMAX, frinfo) returns frinfo%w = 1344 and frinfo%h = 790. If I code my program to create a window say 1400 pixels wide, I would normally expect it to occupy 1400/1680 = .83 of the screen width, but in fact it occupies 1400/1344 = 1.04 of the width; in other words, it won't fit. It is doing what Quickwin tells it, but it no longer matches the screen's physical characteristics like it did in the past.

So the evidence points to the screen having significantly fewer pixels than what Windows (or Dell) claims it has. But maybe the problem is just Quickwin: is it trying to draw everything using pixels that are bigger than what is on my screen?

Why doesn't the info returned by GETWSIZEQQ match the info returned by the Windows screen properties tool? It has always matched on my previous systems.
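
The two widths quoted above differ by a clean ratio, which can be checked with a few lines of arithmetic. One possible reading, which nothing in the post confirms, is that Windows display scaling at 125% is in effect and the program is being given scaled coordinates. The program and variable names below are illustrative only.

```fortran
program scale_check
  implicit none
  real :: reported_w, quickwin_w

  reported_w = 1680.0    ! width shown by the Windows screen utility
  quickwin_w = 1344.0    ! width returned in frinfo%w

  print *, 'apparent scale factor      =', reported_w / quickwin_w
  print *, '1400 px as share of 1680   =', 1400.0 / reported_w
  print *, '1400 px as share of 1344   =', 1400.0 / quickwin_w
end program scale_check
```

The first ratio is exactly 1.25, so comparing against the display scaling setting on the Dell would be a cheap test of that guess.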

Data Prefetching using Fortran Directives


Hi everyone,

I am optimizing sparse algorithms with Intel's Fortran compiler. After applying other optimization features, I want to make good use of data prefetching and the cache. To that end I tested several plausible combinations of prefetching directives and intrinsic functions on both Intel Core i7 and AMD APU processors, but did not get the expected results. In one specific case, however, I think I do get real prefetching, which gives a 3-4 times speedup.

Following is the faster code:

    DOUBLE PRECISION, DIMENSION(:), ALLOCATABLE :: A2D, X, TEMP
    DOUBLE PRECISION :: SUM
    INTEGER :: SIZE, I, J, COUNT, BLS, I0

    SIZE = 1000000
    BLS = 21 * 25

    ALLOCATE(A2D(0:BLS * SIZE - 1))
    ALLOCATE(X(0:SIZE - 1))
    ALLOCATE(TEMP(0:BLS - 1))

    !DEC$ SIMD
    DO J = 0, SIZE - 1
        DO I = 0, BLS - 1
            A2D(BLS * J + I) = I + J
        END DO
    END DO

    DO COUNT = 0, 50
        !$OMP PARALLEL SHARED(A2D, X, SIZE, BLS)
        !$OMP DO SCHEDULE(STATIC) PRIVATE(J, I, SUM, TEMP, I0)
        !DEC$ SIMD
        DO J = 0, SIZE - 1
            I0 = BLS * J
            DO I = 0, BLS - 1
                TEMP(I) = A2D(I0 + I)
            END DO
            SUM = 0.D0
            DO I = 0, BLS - 1
                SUM = SUM + TEMP(I) * 2.D0
            END DO
            X(J) = SUM
        END DO
        !$OMP END DO
        !$OMP END PARALLEL
    END DO

And the following is the code I expect to be correct but is around 4 times slower (I think because the prefetch directive does not work):

    DOUBLE PRECISION, DIMENSION(:), ALLOCATABLE :: A2D, X
    DOUBLE PRECISION :: SUM
    INTEGER :: SIZE, I, J, COUNT, BLS, I0

    SIZE = 1000000
    BLS = 21 * 25

    ALLOCATE(A2D(0:BLS * SIZE - 1))
    ALLOCATE(X(0:SIZE - 1))

    !DEC$ SIMD
    DO J = 0, SIZE - 1
        DO I = 0, BLS - 1
            A2D(BLS * J + I) = I + J
        END DO
    END DO

    DO COUNT = 0, 50
        !$OMP PARALLEL SHARED(A2D, X, SIZE, BLS)
        !$OMP DO SCHEDULE(STATIC) PRIVATE(J, I, SUM, TEMP, I0, J_CACHE)
        !DEC$ PREFETCH A2D
        DO J = 0, SIZE - 1
            I0 = BLS * J
            SUM = 0.D0
            !DEC$ SIMD
            DO I = 0, BLS - 1
                SUM = SUM + A2D(I0 + I) * 2.D0
            END DO
            X(J) = SUM
        END DO
        !$OMP END DO
        !$OMP END PARALLEL
    END DO

I am really confused and need your help.

Are Update 2 for Visual Studio 2013 and Intel Fortran 2015 Beta Update 1 compatible?


Are Update 2 for Visual Studio 2013 and Intel Fortran 2015 Beta Update 1 compatible?  If not, will the Intel Fortran 2015 Beta Update 2 (expected soon, right?) be tested and validated with VS 2013, Update 2?

Silly me, I should have checked on this before installing VS2013, Update 2.  I'm running into strange problems with inter-procedural optimizations (IPO) related errors while compiling code in Release mode, getting "out of virtual memory" errors with ALLOCATE statements in code that had worked reliably just before I upgraded VS 2013.

Anyone else noticing such issues or other problems along the same lines with VS2013 Update 2 and Intel Fortran?

Thanks,

 


Error LNK2019: unresolved external symbol _IARGC

Software Setup:  Windows 7 Ultimate Service Pack 1
                 Visual Studio Professional 2012 Version 11.0.6.61030.00 Update 4
                 Intel Parallel Studio XE 2013  Version 14.0.3.202 Build 20140422

The following error was encountered while compiling a project with:

Configuration: Active(Debug)
Platform: Active(Win32)

error LNK2019: unresolved external symbol _IARGC referenced in function _COMMANDLINEARGUMENTS
error LNK2019: unresolved external symbol _GETARG referenced in function _COMMANDLINEARGUMENTS

The error does not occur when the Configuration is set to: Release

Any suggestions?

John
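
This does not explain why the Debug configuration fails to link, but one way to sidestep the dependency on IARGC and GETARG is to use the Fortran 2003 standard intrinsics COMMAND_ARGUMENT_COUNT and GET_COMMAND_ARGUMENT. A minimal sketch, with illustrative program and variable names:

```fortran
program args_sketch
  implicit none
  integer :: i, nargs, arglen
  character(len=:), allocatable :: arg

  nargs = command_argument_count()        ! standard replacement for IARGC()
  print *, 'number of arguments:', nargs

  do i = 1, nargs
    ! First query the length, then fetch the text into a buffer of that size.
    call get_command_argument(i, length=arglen)
    allocate(character(len=arglen) :: arg)
    call get_command_argument(i, arg)     ! standard replacement for GETARG(i, arg)
    print *, i, ': ', arg
    deallocate(arg)
  end do
end program args_sketch
```

Since these are intrinsic procedures, no portability library or external name is involved in the call.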

 

 

How do I specify a c-string


I'm trying some code Steve posted years ago:

    ! Code for message box
    ret = MessageBox (         &
        GetForegroundWindow(), & ! Handle to window
        "Hello World!"C,           & ! Text (don't forget C-string)
        "Example of using MessageBox"C, & ! Caption for title bar
        MB_ICONINFORMATION + MB_OK) ! Type flags

which works. In reality I want to post some useful information that is contained in a character variable but I can't figure out the correct syntax. In fact the only thing I've tried is using LOC but that doesn't compile.

    character(len=:), allocatable :: myvar
    myvar = 'Hello World!'
    ! Code for message box
    ret = MessageBox (         &
        GetForegroundWindow(), & ! Handle to window
        LOC(myvar),           & ! Text (don't forget C-string)
        "Example of using MessageBox"C, & ! Caption for title bar
        MB_ICONINFORMATION + MB_OK) ! Type flags

Compiling with Intel(R) Visual Fortran Compiler XE 14.0.1.139 [Intel(R) 64]...

error #6633: The type of the actual argument differs from the type of the dummy argument.   [LOC]

 

Thanks.
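
Since the working call above passes a character literal with the C suffix, a character variable that ends in a NUL should be acceptable in the same argument position. A minimal sketch of building such a variable follows; the name ctext is hypothetical, and the MessageBox call is left as a comment because it needs the Windows API modules.

```fortran
program cstring_sketch
  implicit none
  character(len=:), allocatable :: myvar, ctext

  myvar = 'Hello World!'
  ctext = trim(myvar) // char(0)     ! append the terminating NUL by hand

  ! Length is one more than the text, and the last character has code 0.
  print *, len(ctext), iachar(ctext(len(ctext):len(ctext)))

  ! Assumed usage, mirroring the call in the post:
  ! ret = MessageBox(GetForegroundWindow(), ctext, &
  !                  "Example of using MessageBox"C, MB_ICONINFORMATION + MB_OK)
end program cstring_sketch
```

LOC returns an integer address, which is consistent with the type mismatch reported in error #6633.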

Implicit array lengths

Question about local variable of recursive subroutine


Dear all,

For the following code, the compiler reports error #8000 if I use the /warn:interfaces option. Could anyone tell me what is wrong with the code?

After disabling this option, the result is not what I expected. I want each call to print its own local value of i, but every line printed is 11. Could anyone help me take a look at the problem?

 

Thanks,

Zhanghong Tang

 

 

    subroutine test_recursive(i)
    implicit none
    integer::i
    i=i+1
    if(i<=10)call test_recursive(i)
    print *,i
    end subroutine

    program testrecursive
    implicit none
    ! Variables
    integer::i=0
    ! Body of testrecursive
    call test_recursive(i)
    end program testrecursive
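
Two things stand out in the code above. A subroutine that calls itself needs the RECURSIVE keyword (before Fortran 2018), which may be what the interface check is objecting to. And the dummy argument i is associated with the caller's variable, so every level increments the same i and all of them print its final value, 11. A sketch of one possible rewrite, in which each call keeps its own copy in a local variable (the names recursive_demo and local_i are illustrative):

```fortran
module recursive_demo
  implicit none
contains
  recursive subroutine test_recursive(i)
    integer, intent(in) :: i      ! the caller's value is not modified
    integer :: local_i            ! a separate copy for every call level

    local_i = i + 1
    if (local_i <= 10) call test_recursive(local_i)
    print *, local_i
  end subroutine test_recursive
end module recursive_demo

program testrecursive
  use recursive_demo
  implicit none
  call test_recursive(0)
end program testrecursive
```

This version prints 11, 10, 9 and so on down to 1, one value per call level, as the stack unwinds. Putting the subroutine in a module also gives the compiler an explicit interface to check calls against.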

 

Ubound gives incorrect result


For the two-dimensional array, UBOUND should give 29x5, but it gives 5x5.

Notice that SIZE does give the correct count = 29*5 =145

Attachment: main_2.f90 (369 bytes)

Coarrays, parallelization, vectorization


Colleagues,

This is meant to be a summary of one coding group's experience with the above 3 aspects of Fortran programming and to elicit other opinions, experiences, and insights.

Background: Our group provides commercially available software for the building design and construction industries. Our most extensive code is, essentially, an elaborate radiative transfer analysis of a building. Input is user-produced CAD along with supporting data that describes building equipment. Essential and important computational tasks involve computational geometry, radiative transfer analysis, setting up and solving systems of equations, and so on. Typical tasks of most large-scale engineering analysis systems. Over the past 10 years we have generated and now modify/maintain/update about 250,000 lines of code.

Vectorization: This has proved to be (for our work, at least) the most important and efficacious optimization technique -- by far. The analysis of even a modest-sized project involves tens of millions of dot-products, cross-products, and geometric bounds checks. Good practice of a decade ago had data arranged so that, say, the x,y,z Cartesian coordinates of a vertex were contiguous in memory (x:y:z), which was best for individual dot-products and cross-products. Now it is best to arrange arrays so that all the x coordinates are contiguous (x1:x2:x3: . . . :xn), and similarly for y and z, or at least to maintain a duplicate data set with the coordinates arranged so. The processing of the x-part of a large set of dot-products is then vectorizable.

DotProd(1:N) =  CoorA(1:N,1)*CoorB(1:N,1)+CoorA(1:N,2)*CoorB(1:N,2)+CoorA(1:N,3)*CoorB(1:N,3)

Where CoorA(1:N,1) points to the x-coordinates of all N surfaces; and so on. We have found the speed-up to be larger than that expected from just the use of SIMD (4, in our case). Evidently, memory is (much) better used/accessed in this way. In general, we have found this to be (much) faster, even when some of the dot-products produced are not used or inappropriate. That is, it's better to throw away some of the vectorized results, than to trouble not computing them. We have found the speed-up even greater for the cross-product intensive part of our code. 
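
For readers who want to try the layout, here is a small self-contained sketch of the same expression. The array names follow the post; the size N = 4 and the sample values are made up for illustration.

```fortran
program soa_dot_products
  implicit none
  integer, parameter :: n = 4
  real :: coora(n,3), coorb(n,3), dotprod(n)
  integer :: k

  ! Column 1 holds all x values, column 2 all y values, column 3 all z values,
  ! so each column is contiguous in memory (Fortran is column-major).
  do k = 1, n
    coora(k,:) = [real(k), 0.0, 1.0]
    coorb(k,:) = [2.0, 5.0, real(k)]
  end do

  ! All N dot-products in one array expression over contiguous columns.
  dotprod = coora(:,1)*coorb(:,1) + coora(:,2)*coorb(:,2) + coora(:,3)*coorb(:,3)

  print *, dotprod    ! element k is 2k + 0 + k = 3k
end program soa_dot_products
```

Declaring the arrays as (n,3) rather than (3,n) is what makes each coordinate column a unit-stride vector.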

We have found that axis-aligned bounding box checking is an important opportunity for vectorization: InOut is a vector of integers

InOut(1:N) = ( Coor(1:N,1) < BoxMaxX )*( Coor(1:N,2) < BoxMaxY )*( Coor(1:N,3) < BoxMaxZ)

The check against the bounding box minimum coordinates can be (and often is) concatenated onto the Max check. In general (that is, statistically) we find this to be considerably faster than an explicit, early-out loop that checks x, then y, then z. Obviously, if any of the checks fail, the value of InOut for that surface will be zero. In this regard, we have been looking for an efficient way to pack the zeros out of a long vector -- without success so far. The intrinsic Pack routine is hopelessly slow. We also wonder (we've made no investigation yet) whether such results are better stored in vectors with smaller elements: 2-byte or 1-byte integers. If the results are later operated on repeatedly, and SIMD is used, then instead of 4-at-a-time, the testing/evaluating/checking can be done 8-at-a-time, or in even larger clumps.
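
The expression in the post multiplies logical results, which relies on a compiler extension. A standard-conforming sketch of the same test uses MERGE on a combined mask. The explicit loop that collects the surviving indices is shown only as one alternative to PACK, with no claim about its speed; the sizes, values and the names boxmax and idx are illustrative.

```fortran
program box_check
  implicit none
  integer, parameter :: n = 5
  real, parameter :: boxmax(3) = [1.0, 1.0, 1.0]
  real :: coor(n,3)
  integer :: inout(n), idx(n), k, m

  coor(:,1) = [0.5, 2.0, 0.1, 0.9, 0.2]
  coor(:,2) = [0.5, 0.5, 0.1, 3.0, 0.2]
  coor(:,3) = [0.5, 0.5, 0.1, 0.9, 0.2]

  ! 1 where the point is below the box maximum in x, y and z, else 0.
  inout = merge(1, 0, coor(:,1) < boxmax(1) .and. &
                      coor(:,2) < boxmax(2) .and. &
                      coor(:,3) < boxmax(3))

  ! Collect the indices of the non-zero entries into the front of idx.
  m = 0
  do k = 1, n
    if (inout(k) /= 0) then
      m = m + 1
      idx(m) = k
    end if
  end do

  print *, inout       ! 1 0 1 0 1
  print *, idx(1:m)    ! 1 3 5
end program box_check
```

Storing indices rather than copying coordinates keeps the compaction step small, and later passes can gather through idx(1:m).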

All this is obvious. But what is important (to us, at least) is that in general, in practice (statistically, for most projects) the speed-up is significant and worth the significant and widespread changes in code required. This is an important aspect for those dealing with valuable, legacy code. And to some extent it requires a different type of thought (maybe even different algorithms) when generating new code. We imagine that as the SIMD registers get larger, these effects will be even more pronounced.

Parallelization: We have found that in general, and for our code, parallelization by threading is essentially useless. (Our team jokes that parallelization/OpenMP isn't a false promise, it's a cruel hoax.) To be sure, there is lots of evidence that there are many cases where sharing work among multiple threads is very efficacious. But we find, almost always, that the overhead involved completely swamps whatever gain there might be. Some of this is due to the nature of what we are computing. There are very, very few places in our analysis where the work to be done is "tight"; that is, expressible or accomplishable with just a few operations and so just a few lines of code, as when one multiplies a matrix or manipulates 10^8 pixels in an image. In general, the work to be done in our code is elaborate and so the work necessary to establish threads is also elaborate. If, for example, we have 10^4 surfaces, then we have 10^8 occlusion analyses to do (can one surface "see" another?). There might be 10^3 potential blocking surfaces to check, with each check requiring a relatively elaborate analysis. By the time we back out of the nested loops far enough to prevent overhead/setup time from being prohibitive, it proves better (by far) to use the Coarray Fortran paradigm. We are particularly interested in others' experience (and advice!) in this regard.

Having written that, I should add that there are some (very few) times when threading is efficacious, as in matrix multiplication. By the way, if you are interested in a crystal-clear, practical, detailed exposition of how such a task can be handled, we suggest you view the series of videos that Jim Dempsey (frequent and important contributor to this forum) has produced. You can find the link at his web site.

In general, we have found that evaluations of various optimization techniques that use matrix multiplication are not useful, because they are NOT indicative of what is required for scientific/engineering work that involves repeated use of an elaborate or lengthy process. I don't mean to sound silly, but we no longer pay attention to claims (or evaluations) that involve matrix multiplication. The problem is, in many ways, trivial and not sufficiently indicative. The difficult and expensive work is setting up the matrices or system of equations, not multiplying the matrices or solving the system.

Coarray Fortran: We have had considerable success with this. Very considerable. Our approach does not focus on the shared data between images (the coarrays), but rather on the opportunity to have multiple instances of (very nearly) identical code working on pieces of very large problems. We note the following. The most difficult part of making effective use of multiple images is predicting the work load. We have had to spend considerable time developing quick, effective ways to predict work and so generate more-or-less even workloads for each image. In our case, for example, we use simple functions involving surface area, orientation, square of separating distance, and so on. This turns out to be important (and non-trivial) since it doesn't help to have 1 or 2 of the images doing all the heavy lifting. In this regard, we have found it useful to have a non-coarray Fortran program do an initial analysis and determine workload, and then have it launch a coarray Fortran program that establishes multiple images and performs the work.

As Steve Lionel has mentioned several times, the implementation of coarrays in the Intel Fortran compiler is a work in progress, and aspects of it will improve over time. For the present, we find communication between images using coarrays directly to be too slow. Communication using files is faster. (We were surprised, too.) This may change. Currently, we limit communication between images that uses coarrays (usually to the start and end of the work to be done), and each image writes its result to a file. The "launcher" Fortran program (having waited for all images to finish) then gathers the results into a single, neat package.

We have found it important to limit the number of images to the number of physical cores present on the host machine. Using the virtual cores in addition to the physical ones generally slows the overall process. And so, setting the appropriate number-of-cores environment variable is very important since we have found the slowing effect can be considerable. Several months ago, Steve provided a routine that can be called from Fortran that returns this information about a host.

We strongly suspect that Coarray Fortran will likely be our team's most significant investment in optimizing our engineering code in the future.

Perhaps I should apologize for such a long post, but it is a very interesting subject and we are interested in others' experiences and findings.

David

 

 

 

 

NAN


Hello,

I wrote this program:

real :: x,y,z,result
x=2
y=1
z=1
result=acos((x-y)/z)

It returns result = NAN while it should be zero, and I don't know why! Any help?

Thanks,

Andrew.
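
With exactly the values shown, the argument of ACOS is 1 and a NaN would not be expected, so the real program may differ from the snippet. A common cause in practice is an argument that lands slightly outside [-1, 1] through rounding, for which ACOS has no real result. A minimal sketch of guarding against that by clamping (the variable arg is introduced here for illustration):

```fortran
program acos_clamp
  implicit none
  real :: x, y, z, arg, result

  x = 2
  y = 1
  z = 1

  arg = (x - y) / z
  ! Keep the argument inside the domain of ACOS in case rounding pushed it out.
  arg = max(-1.0, min(1.0, arg))
  result = acos(arg)

  print *, 'arg =', arg, ' result =', result
end program acos_clamp
```

Printing arg before the call in the original program would show quickly whether it is out of range or already a NaN.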


More problems with LBOUND and UBOUND


Ok, this pretty well nails it -

This simplified test case shows what happens when I have a lower dimension of ZERO,

for both the INPUT arrays AND the OUTPUT array. Using one dimensional arrays for simplicity.

Apparently the CALLED routine does not give the correct values for LBOUND and UBOUND.

It crashes thinking that the lower dimension is ONE, not Zero.

The upper dimension is also off by ONE, it should be 5, not 6.

BTW, the Fortran texts you referred to do not accurately portray the actual

behavior of the compiler, they portray the way it is SUPPOSED to work.

So, who is right?

If there is something subtle I missed, it sure never gets properly explained as far as I can tell.

Maybe the pre Fortran 95 approach would work better?

 

The problem goes away when I explicitly give the lower dimensions of ZERO.

Attachment: test49_0.f90 (830 bytes)
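
The attachment is not reproduced here, so the following is only a sketch of the general rule rather than a diagnosis of test49_0.f90. An assumed-shape dummy declared as x(:) takes the extent of the actual argument but not its lower bound, which defaults to 1. Declaring it as x(0:) sets the lower bound explicitly. The routine names are illustrative.

```fortran
program bounds_demo
  implicit none
  real :: a(0:5)

  a = 0.0
  print *, 'caller:      ', lbound(a, 1), ubound(a, 1)   ! 0 5
  call show_default(a)
  call show_zero(a)

contains

  subroutine show_default(x)
    real, intent(in) :: x(:)        ! lower bound defaults to 1
    print *, 'dummy x(:):  ', lbound(x, 1), ubound(x, 1)   ! 1 6
  end subroutine show_default

  subroutine show_zero(x)
    real, intent(in) :: x(0:)       ! lower bound stated explicitly
    print *, 'dummy x(0:): ', lbound(x, 1), ubound(x, 1)   ! 0 5
  end subroutine show_zero

end program bounds_demo
```

The 1 and 6 seen inside show_default match the values described in the post, and the x(0:) form matches the observation that giving the lower bound explicitly makes the problem go away.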

Application Distribution, side-by-side configuration is incorrect error


I have a VB.NET application that calls several Fortran DLLs. The program runs fine on computers with the Intel compiler installed, but on computers without the compiler, the application crashes when the DLLs are called. The error says: "The application has failed to start because its side-by-side configuration is incorrect"

The target computer is running Windows 7 and has a 64 bit intel processor, which is the same as the development computer.

I have the document Redistributing Application Binaries Built with 11.x Intel® Compiler Professional Editions for Microsoft Windows*. Per the directions in the document, I installed the redistributable library package for Intel 64 and ia32 on the target machine (from the Intel site). I also tried each redistributable package independently. I ran dependency Walker on the application and the dll and all of the dependencies are present on the target machine. I found from another post that Microsoft C++ redistributables must be installed  as well, which I did. I'm still getting the error.

Any help would be appreciated. 

PS, I tried doing this several years ago when we purchased the compiler but could never get it to work, so I went back to using my old trusty Compaq Fortran compiler from 1997. I figure it's time to bite the bullet and see if I can get the Intel compiler working  again. :)

 

 

 

 

Another problem with ASSUMED SHAPE ARRAYS


I took a previously existing program and converted it to use assumed shape arrays,

ran into this which looks like a BUG anyway. Or maybe something subtle I overlooked ?

I have these two routines STUFFIT and FILLIT, and to keep the coding simple I am using a lower bound of ZERO

on the matrices.

When it calls FILLIT, the called routine crashes, thinking that the lower bound is ONE.

But you can see that the lower bound is ZERO in the calling routine.

This might have to do with the third argument is an OUTPUT as I have specifically

declared. I was under the impression I can pass the lower dimension to an output

array as well. We don't see that problem with the INPUT arrays to FILLIT.

When I check the results of LBOUND it gives me back ONE rather than ZERO as it should.

Of course I could "kluge" this by putting the result in an array with a lower bound of ONE,

but that makes the code a LOT MESSIER cause I then have to copy it elsewhere.

Previously the problem did not appear when I had regular arrays with an explicit lower bound of ZERO.

I was wondering if perhaps the problem appears with just REAL(16) arrays. I was trying to

have a high precision summation, otherwise they would be REAL(8).

See attached routines. I hope they both upload this time.

Maybe I have to give the output array an EXPLICIT lower bound? It's always ZERO anyway.

And * for the upper bound ?

Attachments: fillit.f90 (860 bytes), stuffit.f90 (441 bytes)

Trouble accessing Fortran Help


Our IT has set me up with a new computer, new OS (Win8.1), and new IVF (2013_sp1.3.202 on VS2010). Not being satisfied with the standard Help, which lacks features I was used to, I also installed MS Help Viewer 1.1. So several things have changed.

Everything is fine except access to Help, which is intermittent. Initially (after any restart) it works fine, but after some time--which includes shutting VS down and running other programs--any attempt to access Help results in an error popup window as follows:

Window title: Microsoft Help Viewer 1.1 - Catalog VS_100_EN_US

Message:

     [BrowserSafeguard] The socket connection to 127.0.0.1 failed.

     ErrorCode: 10061

     No connection could be made because the target machine actively refused it 127.0.0.1:47873

Rebooting always fixes it, but the fix is short-lived.

Any ideas?

Setting the value to CLASS(*)


Hi All,

I am trying to create a derived data type whose component can take on values of different types at run time. The following is a snippet of code:

PROGRAM TEST
!
TYPE FLEXIBLE
  CLASS(*),         ALLOCATABLE:: VAL
END TYPE
TYPE(FLEXIBLE)::A
!
CLASS(*),         ALLOCATABLE:: B
INTEGER:: I
!IT SEEMS LIKE THE FOLLOWING SHOULD WORK:
I=5
ALLOCATE(A%VAL,SOURCE=I)
ALLOCATE(B,SOURCE=I)
!
A%VAL=10
B=50
!
END PROGRAM

When I try to assign a new value, I get a compiler error. Even if I remove the A%VAL= and B= lines, the variables do not seem to hold the value of I.

On a side note, what does MOLD do within an ALLOCATE statement?

Thanks as always for your help.
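
A sketch of one way to work with a CLASS(*) component without relying on intrinsic assignment to it: the stored value is only visible inside SELECT TYPE, a new value can be stored by deallocating and allocating again with SOURCE=, and MOLD= gives the new object the type of its expression without copying a value. The subroutine show and the sample values are illustrative, and support for each feature should be checked against the compiler version in use.

```fortran
program class_star_demo
  implicit none
  type flexible
    class(*), allocatable :: val
  end type flexible
  type(flexible) :: a

  allocate(a%val, source=5)        ! dynamic type integer, value 5
  call show(a%val)

  deallocate(a%val)
  allocate(a%val, source=10)       ! store a new value by reallocating
  call show(a%val)

  deallocate(a%val)
  allocate(a%val, mold=1.0)        ! dynamic type real, value not yet defined
  select type (v => a%val)
  type is (real)
    v = 2.5                        ! assignment is possible once the type is known
  end select
  call show(a%val)

contains

  subroutine show(v)
    class(*), intent(in) :: v
    select type (v)
    type is (integer)
      print *, 'integer:', v
    type is (real)
      print *, 'real:', v
    class default
      print *, 'some other type'
    end select
  end subroutine show

end program class_star_demo
```

The SELECT TYPE construct is also the answer to "they do not seem to hold the value": the value is there, but it cannot be printed or used until the construct has resolved its type.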
