I am wondering if coarrays could be used for the following scenario:
I have a subroutine that calculates Y=F(X), X and Y being vectors of length, say, 10000. F is called hundreds or thousands of times in an iterative process.
F is such that the calculation of each Y(i) is completely independent of the others; in other words we can write Y(i) = Fi(X(i)), where Fi is a specific function (a distinct subroutine) which may not be related at all to Fj for i /= j. Each individual calculation Y(i) = Fi(X(i)) is already fairly substantial, so for now we can neglect the communication overhead between images. In practice, there may only be dozens of such individual "basic" functions Fi, which would be applied to all the terms of X.
The basic idea is the following:
1. The master image provides X to F.
2. A bunch of slave images are assigned the calculation of the Y(i); a task stack is used so that images that finish faster than the others can grab the calculation for another element of X.
3. The master image collects the components of Y as they are calculated.
It looks nice on paper, but the synchronization mechanisms required seem a bit tedious to implement. I am wondering if anybody has started looking into similar applications of coarrays. Ideally, this would be used on a cluster with a large number of nodes.
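To make steps 1 to 3 concrete, here is a minimal sketch of the simplest version I can picture, in Fortran 2008. It is an illustration under assumptions, not a tested implementation: the names `task_pool_sketch`, `next_task` and `f` are hypothetical, `n = 16` stands in for the 10000 elements, and the `select case` in `f` stands in for the distinct routines Fi. The "task stack" is reduced to a shared counter on image 1 protected by a `critical` construct, and each image writes its result straight into Y on image 1, so the master does not have to collect anything actively. In this sketch image 1 also takes tasks itself.

```fortran
program task_pool_sketch
  implicit none
  integer, parameter :: n = 16       ! stand-in for the real problem size
  real    :: x(n)[*], y(n)[*]        ! the copies on image 1 are the master copies
  integer :: next_task[*]            ! shared task counter; only image 1's copy is used
  integer :: i, my_task

  ! Step 1: the master image provides X and resets the task counter.
  if (this_image() == 1) then
    x = [(real(i), i = 1, n)]
    next_task = 1
  end if
  sync all                           ! image 1's x and next_task are now defined

  ! Every other image fetches its own local copy of X once.
  if (this_image() /= 1) x(:) = x(:)[1]

  ! Step 2: each image repeatedly grabs the next free index.
  do
    critical                         ! one image at a time reads and bumps the counter
      my_task = next_task[1]
      next_task[1] = my_task + 1
    end critical
    if (my_task > n) exit            ! no work left
    ! Step 3: write the result directly into Y on the master image.
    y(my_task)[1] = f(my_task, x(my_task))
  end do

  sync all                           ! all components of Y have arrived on image 1
  if (this_image() == 1) print *, y

contains

  ! Hypothetical dispatcher standing in for the unrelated routines Fi.
  pure function f(i, xi) result(yi)
    integer, intent(in) :: i
    real,    intent(in) :: xi
    real :: yi
    select case (mod(i, 3))
    case (0)
      yi = 2.0 * xi
    case (1)
      yi = xi**2
    case default
      yi = sqrt(xi)
    end select
  end function f

end program task_pool_sketch
```

Inside the iterative process, the block between the two `sync all` statements would be repeated for each call to F. The `critical` construct serializes every grab of the counter, which I assume is acceptable as long as each Fi is expensive compared to one remote read and write; Fortran 2018's `atomic_fetch_add` on the counter might be a lighter alternative where the compiler supports it.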
Thanks!