M. Serge-guelton and . Amini, Ten tiles. Personal Bibliography [Guelton et al. 2011a (perso)] Ronan Keryell and Béatrice Creusillet. PyPS, a Programmable Pass Manager. Poster at the 24th International Workshop on Languages and Compilers for Parallel Computing, pp.143-144, 2011.

. Guelton, Ronan Keryell and Béatrice Creusillet. PyPS, a Programmable Pass Manager, pp.144-145, 2011.

. Guelton, Beyond Do Loops: Data Transfer Generation with Convex Array Regions, The 25th International Workshop on Languages and Compilers for Parallel Computing, p.2012, 2012.
DOI : 10.1007/978-3-642-37658-0_17

URL : https://hal.archives-ouvertes.fr/hal-00742583

S. V. Adve, Designing Memory Consistency Models for Shared-Memory Multiprocessors, 1993.

S. V. Adve, Rethinking shared-memory languages and hardware, Proceedings of the international conference on Supercomputing, ICS '11, pp.1-1, 2011.
DOI : 10.1145/1995896.1995898

A. V. Aho, R. Sethi, and J. D. Ullman, Compilers: principles, techniques, and tools, p.113, 1986.

C. Alias, A. Darte, and A. Plesco, Program Analysis and Source-Level Communication Optimizations for High-Level Synthesis. Rapport de recherche RR-7648, pp.62-90, 2011.

C. Alias, A. Darte, and A. Plesco, Optimizing Remote Accesses for Offloaded Kernels: Application to High-Level Synthesis for FPGA, 2nd International Workshop on Polyhedral Compilation Techniques, IMPACT (in conjunction with HiPEAC 2012), pp.90-91, 2012.
URL : https://hal.archives-ouvertes.fr/hal-00761477

J. Enmyren and C. W. Kessler, SkePU, Proceedings of the fourth international workshop on High-level parallel programming and applications, HLPP '10, pp.5-14, 2010.
DOI : 10.1145/1863482.1863487

. Eyles, PixelFlow, Proceedings of the ACM SIGGRAPH/EUROGRAPHICS workshop on Graphics hardware , HWWS '97, pp.57-68, 1997.
DOI : 10.1145/258694.258714

P. Feautrier, Some efficient solutions to the affine scheduling problem. Part II. Multidimensional time, International Journal of Parallel Programming, vol.21, pp.389-420, 1992.

W. Feng and K. Cameron, The Green500 List: Encouraging Sustainable Supercomputing, Computer, vol.40, issue.12, pp.50-55, 2007.
DOI : 10.1109/MC.2007.445

J. Ferrante, K. J. Ottenstein, and J. D. Warren, The program dependence graph and its use in optimization, ACM Transactions on Programming Languages and Systems, vol.9, issue.3, pp.319-349, 1987.
DOI : 10.1145/24039.24041

A. L. Fisher and A. M. Ghuloum, Parallelizing complex scans and reductions, Proceedings of the ACM SIGPLAN 1994 conference on Programming language design and implementation, PLDI '94, pp.135-146, 1994.

M. Frigo and S. G. Johnson, The Design and Implementation of FFTW3, Proceedings of the IEEE, special issue on "Program Generation, Optimization, and Platform Adaptation", pp.216-231, 2005.

M. Frigo, C. E. Leiserson, and K. H. Randall, The Implementation of the Cilk-5 Multithreaded Language, Proceedings of the SIGPLAN Conference on Program Language Design and Implementation, PLDI, pp.212-223, 1998.

F. Irigoin, F. Coelho, and B. Creusillet, Dependencies between Analyses and Transformations in the Middle-End of a Compiler, In Analyse to Compile, Compile to Analyse Workshop (ACCA), in conjunction with CGO 2011, p.111, 2011.

. Jablin, Automatic CPU-GPU communication management and optimization, Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation, PLDI '11, pp.142-151, 2011.

J. Ji and X. Ma, Using Shared Memory to Accelerate MapReduce on Graphics Processing Units, 2011 IEEE International Parallel & Distributed Processing Symposium, pp.805-816, 2011.
DOI : 10.1109/IPDPS.2011.80

P. Jouvelot and B. Dehbonei, A unified semantic approach for the vectorization and parallelization of generalized reductions, Proceedings of the 3rd international conference on Supercomputing, ICS '89, pp.186-194, 1989.

G. Kahn, The Semantics of a Simple Language for Parallel Programming, IFIP Congress, pp.471-475, 1974.

Kalray, MPPA: Multi-Purpose Processor Array. Online, 2012.

G. Karypis and V. Kumar, Parallel multilevel k-way partitioning scheme for irregular graphs, Proceedings of the 1996 ACM/IEEE conference on Supercomputing (CDROM), Supercomputing '96, pp.96-129, 1998.
DOI : 10.1145/369028.369103

G. Kedem and Y. Ishihara, Brute force attack on UNIX passwords with SIMD computer, SSYM'99: Proceedings of the 8th conference on USENIX Security Symposium, pp.8-8, 1999.

R. Membarth, J.-H. Lupp, F. Hannig, J. Teich, M. Körner, and W. Eckert, Dynamic task-scheduling and resource management for GPU accelerators in medical imaging, Proceedings of the 25th international conference on Architecture of Computing Systems, ARCS'12, pp.147-159, 2012.

. Millot, STEP: a distributed OpenMP for coarse-grain parallelism tool. In Proceedings of the 4th international conference on OpenMP in a new era of parallelism, IWOMP'08, pp.83-99, 2008.
URL : https://hal.archives-ouvertes.fr/hal-01373120

G. E. Moore, Cramming more components onto integrated circuits, Electronics, vol.38, issue.8, 1965.

E. Munk, C. Ayguadé, P. Bastoul, Z. Carpenter, A. Chamski et al., Acotes project: Advanced [Wikipedia 2012c] Wikipedia. OpenHMPP - Wikipedia, The Free Encyclopedia, 2012. Online; accessed 24, pp.27-262, 2012.

M. E. Wolf and M. S. Lam, A loop transformation theory and an algorithm to maximize parallelism, IEEE Transactions on Parallel and Distributed Systems, vol.2, issue.4, pp.452-471, 1991.
DOI : 10.1109/71.97902

M. E. Wolf and M. S. Lam, A data locality optimizing algorithm, Proceedings of the ACM SIGPLAN 1991 conference on Programming language design and implementation, PLDI '91, pp.30-44, 1991.

M. Wolfe, Iteration Space Tiling for Memory Hierarchies, Proceedings of the Third SIAM Conference on Parallel Processing for Scientific Computing, pp.357-361, 1989.

M. Wolfe, Optimizing supercompilers for supercomputers, pp.112-115, 1990.

M. Wolfe, Implementing the PGI Accelerator model, Proceedings of the 3rd Workshop on General-Purpose Computation on Graphics Processing Units, GPGPU '10, pp.43-50, 2010.
DOI : 10.1145/1735688.1735697

M. Wolfe, Optimizing Data Movement in the PGI Accelerator Programming Model, February 2011. Online; accessed 24-February

. Wong, Demystifying GPU microarchitecture through microbenchmarking, 2010 IEEE International Symposium on Performance Analysis of Systems & Software (ISPASS), pp.235-246, 2010.
DOI : 10.1109/ISPASS.2010.5452013

S. Xiao and W. Feng, Inter-Block GPU Communication via Fast Barrier Synchronization, Proceedings of the 24th IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2010.

. Yang and . Chang, A Parallel Loop Self-Scheduling on Extremely Heterogeneous PC Clusters, Proc. of Intl Conf. on Computational Science, pp.1079-1088, 2003.

. Yelick, Titanium: a high-performance Java dialect, Concurrency: Practice and Experience, pp.10-11, 1998.
DOI : 10.1002/(SICI)1096-9128(199809/11)10:11/13<825::AID-CPE383>3.0.CO;2-H

FIFO: First In, First Out. FIR: Finite Impulse Response.

Figure 6.4b illustrates this transformation and shows the array regions computed by PIPS on this example. The final code, with the transfer instructions, is shown in Figure 6.4c. The calls to functions prefixed with P4A

At run time, OpenCL queues are created and associated with accelerators, and the iteration space is dynamically split into as many tiles as there are queues, using the dynamic tiling method. A runtime support system was developed for this mode of operation. It implicitly records the data to transfer for each tile
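The splitting step described above can be sketched as follows. This is only an illustrative sketch, not the runtime developed in the thesis: the function name `split_iteration_space`, the one-dimensional iteration space, and the even contiguous partition are all assumptions made for the example.

```python
def split_iteration_space(n_iterations, n_queues):
    """Split range(n_iterations) into at most n_queues contiguous tiles.

    Hypothetical helper: returns a list of (start, end) half-open intervals,
    one per queue. The even partition is an assumption; the text only says
    that there are as many tiles as queues.
    """
    if n_queues <= 0:
        raise ValueError("n_queues must be positive")
    base, extra = divmod(n_iterations, n_queues)
    tiles = []
    start = 0
    for q in range(n_queues):
        # The first `extra` queues receive one additional iteration.
        size = base + (1 if q < extra else 0)
        if size == 0:
            continue  # fewer iterations than queues: leave this queue idle
        tiles.append((start, start + size))
        start += size
    return tiles


# Ten iterations shared among three queues.
tiles = split_iteration_space(10, 3)
for queue_id, (lo, hi) in enumerate(tiles):
    print(f"queue {queue_id}: iterations [{lo}, {hi})")
```

A real runtime would also attach to each tile the array regions it reads and writes, so that only the required data is transferred. That bookkeeping is left out here.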

Finally, Chapter 7 serves to validate all the proposed transformations and the implemented runtime supports. Twenty benchmarks from the Polybench suite, three from Rodinia, and the Stars-PM cosmological N-body simulation were used. I obtained a geometric-mean speedup of fourteen over the original sequential code compiled with GCC. The Stars-PM numerical simulation is accelerated by a factor of fifty-two

SILKAN has integrated the techniques presented in this dissertation into a user environment tailored to Scilab developers, offering in one "click"

The three Ps, Performance, Portability, and Programmability, are nowadays complemented by the energy constraint. Some libraries and domain-specific languages are designed to support execution that adapts automatically to power consumption. Making trade-offs among the three Ps while taking energy consumption into account offers interesting opportunities for future research

While the authors of ΣC expect that it can be targeted by tools such as Par4All

Finally, hardware progresses year after year. This thesis began before the Fermi architecture was available, and the next generation, Kepler, is