Deployment on GPUs of an Application in Computational Atomic Physics, 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum, pp.1359-1366, 2011. ,
DOI : 10.1109/IPDPS.2011.285
URL : https://hal.archives-ouvertes.fr/hal-01285671
Numerical Validation and GPU Performance in Atomic Physics, Designing Scientific Applications on GPUs, 2013. ,
Generating data transfers for distributed GPU parallel programs, Journal of Parallel and Distributed Computing, vol.73, issue.12, pp.1649-1660, 2013. ,
DOI : 10.1016/j.jpdc.2013.07.022
URL : https://hal.archives-ouvertes.fr/hal-00925733
Numerical linear algebra on emerging architectures: The PLASMA and MAGMA projects, Journal of Physics: Conference Series, vol.180, issue.1, 2009. ,
DOI : 10.1088/1742-6596/180/1/012037
Validity of the single processor approach to achieving large scale computing capabilities, Proceedings of the April 18-20, 1967, spring joint computer conference, AFIPS '67 (Spring), pp.483-485, 1967. ,
DOI : 10.1145/1465482.1465560
Static Compilation Analysis for Host-Accelerator Communication Optimization, Languages and Compilers for Parallel Computing, pp.237-251, 2013. ,
DOI : 10.1007/978-3-642-36036-7_16
URL : https://hal.archives-ouvertes.fr/hal-00743496
Par4all : From Convex Array Regions to Heterogeneous Computing, IMPACT 2012 : Second International Workshop on Polyhedral Compilation Techniques HiPEAC 2012, 2012. ,
URL : https://hal.archives-ouvertes.fr/hal-00744733
TreadMarks: shared memory computing on networks of workstations, Computer, vol.29, issue.2, pp.18-28, 1996. ,
DOI : 10.1109/2.485843
A Linear Algebra Framework for Static High Performance Fortran Code Distribution, Scientific Programming, pp.3-27, 1997. ,
DOI : 10.1155/1997/195689
Data-Aware Task Scheduling on Multi-accelerator Based Platforms, 2010 IEEE 16th International Conference on Parallel and Distributed Systems, pp.291-298, 2010. ,
DOI : 10.1109/ICPADS.2010.129
URL : https://hal.archives-ouvertes.fr/inria-00523937
StarPU : a Unified Platform for Task Scheduling on Heterogeneous Multicore Architectures. Concurrency and Computation : Practice and Experience, pp.187-198, 2011. ,
URL : https://hal.archives-ouvertes.fr/inria-00384363
Putting Automatic Polyhedral Compilation for GPGPU to Work, Proceedings of the 15th Workshop on Compilers for Parallel Computers (CPC'10), 2010. ,
URL : https://hal.archives-ouvertes.fr/inria-00551517
The NAS parallel benchmarks, International Journal of High Performance Computing Applications, vol.5, issue.3, pp.63-73, 1991. ,
The Paradigm compiler for distributed-memory multicomputers, Computer, vol.28, issue.10, pp.2837-2884, 1995. ,
DOI : 10.1109/2.467577
Automatic C-to-CUDA code Generation for Affine Programs, Compiler Construction, pp.244-263, 2010. ,
Towards automatic translation of OpenMP to MPI, Proceedings of the 19th annual international conference on Supercomputing, ICS '05, pp.189-198, 2005. ,
DOI : 10.1145/1088149.1088174
Programming Distributed Memory Sytems Using OpenMP, 2007 IEEE International Parallel and Distributed Processing Symposium, pp.1-8, 2007. ,
DOI : 10.1109/IPDPS.2007.370397
The Polyhedral Model Is More Widely Applicable Than You Think, Compiler Construction, pp.283-303, 2010. ,
DOI : 10.1007/978-3-642-11970-5_16
URL : https://hal.archives-ouvertes.fr/inria-00551087
Grid'5000: A Large Scale And Highly Reconfigurable Experimental Grid Testbed, International Journal of High Performance Computing Applications, vol.20, issue.4, pp.481-494, 2006. ,
DOI : 10.1177/1094342006070078
URL : https://hal.archives-ouvertes.fr/hal-00684943
GASNet Specification, V1.1, 2002. ,
SPOC : GPGPU Programming Through Stream Processing with OCaml, Parallel Processing Letters, 2012. ,
URL : https://hal.archives-ouvertes.fr/hal-00697257
Efficient Abstractions for GPGPU Programming, International Journal of Parallel Programming, vol.34, issue.5, pp.583-600, 2014. ,
DOI : 10.1007/s10766-013-0261-x
URL : https://hal.archives-ouvertes.fr/hal-01216144
hwloc: A Generic Framework for Managing Hardware Affinities in HPC Applications, 2010 18th Euromicro Conference on Parallel, Distributed and Network-based Processing, pp.180-186, 2010. ,
DOI : 10.1109/PDP.2010.67
URL : https://hal.archives-ouvertes.fr/inria-00429889
Productive Cluster Programming with OmpSs, Euro-Par 2011 Parallel Processing, pp.555-566, 2011. ,
DOI : 10.1147/rd.515.0593
Implementing OmpSs support for regions of data in architectures with multiple address spaces, Proceedings of the 27th international ACM conference on International conference on supercomputing, ICS '13, pp.359-368, 2013. ,
DOI : 10.1145/2464996.2465017
Parallel Programmability and the Chapel Language, International Journal of High Performance Computing Applications, vol.21, issue.3, pp.291-312, 2007. ,
X10 : an Object-oriented Approach to Non-uniform Cluster Computing, ACM SIGPLAN Notices, vol.40, issue.10, pp.519-538, 2005. ,
Unified Parallel C for GPU Clusters: Language Extensions and Compiler Implementation, Languages and Compilers for Parallel Computing, pp.151-165, 2011. ,
DOI : 10.1007/978-3-642-03869-3_82
Interprocedural array region analyses, Languages and Compilers for Parallel Computing, pp.46-60, 1996. ,
DOI : 10.1007/BFb0014191
URL : https://hal.archives-ouvertes.fr/hal-00752611
OpenMP: an industry standard API for shared-memory programming, IEEE Computational Science and Engineering, vol.5, issue.1, pp.46-55, 1998. ,
DOI : 10.1109/99.660313
The SPMD Model: Past, Present and Future, Recent Advances in Parallel Virtual Machine and Message Passing Interface, pp.1-1, 2001. ,
DOI : 10.1007/3-540-45417-9_1
An Approach To Data Distributions in Chapel, International Journal of High Performance Computing Applications, vol.21, issue.3, pp.313-335, 2007. ,
DOI : 10.1177/1094342007078451
HMPP : A Hybrid Multi-core Parallel Programming Environment, Workshop on General Purpose Processing on Graphics Processing Units, 2007. ,
High-Performance Computing: Clusters, Constellations, MPPs, and Future Directions, Computing in Science and Engineering, vol.7, issue.2, pp.51-59, 2005. ,
DOI : 10.1109/MCSE.2005.34
OmpSs : A Proposal For Programming Heterogeneous Multi-Core Architectures. Parallel Processing Letters, pp.173-193, 2011. ,
A survey of parallel computer architectures, Computer, vol.23, issue.2, pp.5-16, 1990. ,
DOI : 10.1109/2.44900
A Comprehensive Performance Comparison of CUDA and OpenCL, 2011 International Conference on Parallel Processing, pp.216-225, 2011. ,
DOI : 10.1109/ICPP.2011.45
Dataflow analysis of array and scalar references, International Journal of Parallel Programming, vol.24, issue.4, 1991. ,
DOI : 10.1007/BF01407931
Some Computer Organizations and Their Effectiveness, IEEE Transactions on Computers, vol.21, issue.9, pp.948-960, 1972. ,
DOI : 10.1109/TC.1972.5009071
Implementation of NAS Parallel Benchmarks in High Performance Fortran, 1998. ,
Open MPI: Goals, Concept, and Design of a Next Generation MPI Implementation, Recent Advances in Parallel Virtual Machine and Message Passing Interface, pp.97-104, 2004. ,
DOI : 10.1007/978-3-540-30218-6_19
A novel approach towards automatic data distribution, Proceedings of the 1995 ACM/IEEE conference on Supercomputing (CDROM), Supercomputing '95, pp.78-78, 1995. ,
DOI : 10.1145/224170.224500
MPICH2: A New Start for MPI Implementations, Recent Advances in Parallel Virtual Machine and Message Passing Interface, 2002. ,
DOI : 10.1007/3-540-45825-5_5
Beyond Do Loops: Data Transfer Generation with Convex Array Regions, Languages and Compilers for Parallel Computing, pp.249-263, 2013. ,
DOI : 10.1007/978-3-642-37658-0_17
URL : https://hal.archives-ouvertes.fr/hal-00742583
Demonstration of Automatic Data Partitioning Techniques for Parallelizing Compilers on Multicomputers, IEEE Transactions on Parallel and Distributed Systems, vol.3, issue.2, pp.179-193, 1992. ,
hiCUDA : A High-level Directive-based Language for GPU Programming, Proceedings of 2nd Workshop on General Purpose Processing on Graphics Processing Units, pp.52-61, 2009. ,
Data Parallel Algorithms, Communications of the ACM, vol.29, issue.12, pp.1170-1183, 1986. ,
Extending OpenMP to Clusters. White Paper, Intel Corporation, 2006. ,
An Analytical Model for a GPU Architecture with Memory-level and Thread-level Parallelism Awareness ,
Semantical Interprocedural Parallelization : An Overview of the PIPS Project, Proceedings of the 5th international conference on Supercomputing, ICS '91, pp.244-251, 1991. ,
URL : https://hal.archives-ouvertes.fr/hal-00984684
Parameterized and Multi-level Tiled Loop Generation ,
Performance Evaluation of the Omni OpenMP Compiler, High Performance Computing, pp.403-414, 2000. ,
DOI : 10.1007/3-540-39999-2_39
Automatic data and computation decomposition on distributed memory parallel computers, ACM Transactions on Programming Languages and Systems, vol.24, issue.1, pp.1-50, 2002. ,
DOI : 10.1145/509705.509706
Cetus - An Extensible Compiler Infrastructure for Source-to-Source Transformation, Languages and Compilers for Parallel Computing, pp.539-553, 2004. ,
DOI : 10.1007/978-3-540-24644-2_35
OpenMP to GPGPU, ACM SIGPLAN Notices, vol.44, issue.4, pp.101-110, 2009. ,
DOI : 10.1145/1594835.1504194
A mapping path for multi-GPGPU accelerated computers from a portable high level programming abstraction, Proceedings of the 3rd Workshop on General-Purpose Computation on Graphics Processing Units, GPGPU '10, pp.51-61, 2010. ,
DOI : 10.1145/1735688.1735698
URL : https://hal.archives-ouvertes.fr/inria-00551084
Index domain alignment: minimizing cost of cross-referencing between distributed arrays, [1990 Proceedings] The Third Symposium on the Frontiers of Massively Parallel Computation, pp.424-433, 1990. ,
DOI : 10.1109/FMPC.1990.89493
A New Vision for Co-Array Fortran, Proceedings of the Third Conference on Partitioned Global Address Space Programing Models, PGAS '09, pp.1-5, 2009. ,
Advanced optimization strategies in the Rice dHPF compiler, Concurrency and Computation : Practice and Experience, pp.741-767, 2002. ,
DOI : 10.1002/cpe.647
Distributed OMP : Extensions to OpenMP for SMP Clusters, Second European Workshop on OpenMP (EWOMP), pp.14-15, 2000. ,
STEP: A Distributed OpenMP for Coarse-Grain Parallelism Tool, OpenMP in a New Era of Parallelism, pp.83-99, 2008. ,
DOI : 10.1007/978-3-540-79561-2_8
URL : https://hal.archives-ouvertes.fr/hal-01373120
Christian Parrot, and Frédérique Silber-Chaussumier. From OpenMP to MPI : First Experiments of the STEP Source-to-source Transformation Tool, The international Parallel Computing Conference (ParCo), pp.669-676, 2009. ,
Cramming More Components onto Integrated Circuits, 1965. ,
Dynamic Load-Balancing for the STEM-II Air Quality Model, Computational Science and Its Applications-ICCSA 2006, pp.701-710, 2006. ,
XcalableMP implementation and performance of NAS Parallel Benchmarks, Proceedings of the Fourth Conference on Partitioned Global Address Space Programming Model, PGAS '10, pp.1-11, 2010. ,
DOI : 10.1145/2020373.2020384
Productivity and Performance of Global-View Programming with XcalableMP PGAS Language, 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (ccgrid 2012), pp.402-409, 2012. ,
DOI : 10.1109/CCGrid.2012.118
Global arrays: A nonuniform memory access programming model for high-performance computers, The Journal of Supercomputing, vol.10, issue.2, pp.169-189, 1996. ,
DOI : 10.1007/BF00130708
Co-Array Fortran for Parallel Programming, SIGPLAN Fortran Forum, vol.17, issue.2, pp.1-31, 1998. ,
A Survey of General-Purpose Computation on Graphics Hardware, Computer graphics forum, pp.80-113, 2007. ,
Hybrid Iterative and Model-driven Optimization in the Polyhedral Model ,
The Polyhedral Benchmark suite, 2014. ,
The capacity of low-density parity-check codes under message-passing decoding, IEEE Transactions on Information Theory, vol.47, issue.2, pp.599-618, 2001. ,
Understanding the Behavior and Performance of Non-blocking Communications in MPI, Euro-Par 2004 Parallel Processing, pp.173-182, 2004. ,
DOI : 10.1007/978-3-540-27866-5_22
Run-time scheduling and execution of loops on message passing machines, Journal of Parallel and Distributed Computing, vol.8, issue.4, pp.303-312, 1990. ,
DOI : 10.1016/0743-7315(90)90129-D
OpenCL : A Parallel Programming Standard for Heterogeneous Computing Systems, Computing in Science & Engineering, p.66, 2010. ,
The Free Lunch is Over : A Fundamental Turn Toward Concurrency in Software, Dr. Dobb's Journal, vol.30, issue.3, pp.202-210, 2005. ,
A multithreaded communication engine for multicore architectures, 2008 IEEE International Symposium on Parallel and Distributed Processing, pp.1-7, 2008. ,
DOI : 10.1109/IPDPS.2008.4536139
URL : https://hal.archives-ouvertes.fr/inria-00224999
Vienna-Fortran/HPF Extensions for Sparse and Irregular Problems and their Compilation, IEEE Transactions on Parallel and Distributed Systems, vol.8, issue.10, pp.1068-1083, 1997. ,
A Bridging Model for Parallel Computation, Communications of the ACM, vol.33, issue.8, pp.103-111, 1990. ,
NAS Parallel Benchmarks Version 2.4, NAS Technical Report NAS-02-007, 2002. ,
Polyhedral parallel code generation for CUDA, ACM Transactions on Architecture and Code Optimization, vol.9, issue.4, p.54, 2013. ,
DOI : 10.1145/2400682.2400713
URL : https://hal.archives-ouvertes.fr/hal-00786677
Active Messages : A Mechanism for Integrating Communication and Computation, 25 Years of the International Symposia on Computer Architecture ISCA '98, pp.430-440, 1998. ,
OpenACC - First Experiences with Real-World Applications, Euro-Par 2012 Parallel Processing, pp.859-870, 2012. ,
DOI : 10.1007/978-3-642-32820-6_85
The potential of the cell processor for scientific computing, Proceedings of the 3rd conference on Computing frontiers, CF '06, pp.9-20, 2006. ,
DOI : 10.1145/1128022.1128027
Productivity and performance using partitioned global address space languages, Proceedings of the 2007 international workshop on Parallel symbolic computation, PASCO '07, pp.24-32, 2007. ,
DOI : 10.1145/1278177.1278183
Parametrically Tiled Distributed Memory Parallelization of Polyhedral Programs, 2013. ,