CUDA C Programming Guide.

GNU UPC.

HPF: High Performance Fortran.

HPX: High Performance ParalleX.

Intel® Cilk™ Plus.

MPI: Message Passing Interface.

OpenCL.

OpenMP.

PolyBench: Polyhedral Benchmark Suite.

POSIX Threads.

RedBaron.

Clan, Representation Extraction Tool for C-Based High Level Languages, 2014.

NumPy, 2017.

A. V. Aho, R. Sethi, and J. D. Ullman, Compilers: Principles, Techniques, and Tools, 1986.

F. E. Allen, Control flow analysis, Proceedings of a Symposium on Compiler Optimization, pp.1-19, 1970.

R. Baghdadi, U. Beaugnon, A. Cohen, T. Grosser, M. Kruse et al., PENCIL: A Platform-Neutral Compute Intermediate Language for Accelerator Programming, Proceedings of the 24th International Conference on Parallel Architectures and Compilation Techniques, p.15, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01257236

R. Baghdadi, A. Cohen, T. Grosser, S. Verdoolaege, A. Lokhmotov et al., PENCIL Language Specification, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01154812

L. Bagnères, O. Zinenko, S. Huot, and C. Bastoul, Opening polyhedral compiler's black box, Proceedings of the 2016 International Symposium on Code Generation and Optimization, pp.128-138, 2016.

C. Bastoul, Code Generation in the Polyhedral Model is Easier than You Think, Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT'04), 2004.
URL : https://hal.archives-ouvertes.fr/hal-00017260

V. Basupalli, T. Yuki, S. Rajopadhye, A. Morvan, S. Derrien et al., Polyhedral Analysis for the OpenMP Programmer, Proceedings of the 7th International Conference on OpenMP in the Petascale Era, pp.37-53, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00752626

G. Baumgartner, A. Auer, D. E. Bernholdt, A. Bibireata, V. Choppella et al., Synthesis of high-performance parallel programs for a class of ab initio quantum chemistry models, Proceedings of the IEEE, vol.93, issue.2, pp.276-292, 2005.

U. Beaugnon, A. Kravets, S. Van-haastregt, R. Baghdadi, D. Tweed et al., VOBLA: A Vehicle for Optimized Basic Linear Algebra, SIGPLAN Notices, vol.49, pp.115-124, 2014.
URL : https://hal.archives-ouvertes.fr/hal-01508181

M. Belwal and T. Sudarshan, Intermediate Representation for Heterogeneous Multi-Core: A Survey, VLSI Systems, Architecture, Technology and Applications, pp.1-6, 2015.

N. Benoit and S. Louise, Extending GCC with a Multi-Grain Parallelism Adaptation Framework for MPSoCs, GCC for Research Opportunities Workshop, 2010.

N. Benoit and S. Louise, Kimble: a Hierarchical Intermediate Representation for Multi-Grain Parallelism, Proceedings of the Workshop on Intermediate Representations, pp.21-28, 2011.

N. Benoit and S. Louise, Using an Intermediate Representation to Map Workloads on Heterogeneous Parallel Systems, PDP'16, pp.811-819, 2016.

J. Bergstra, O. Breuleux, F. Bastien, P. Lamblin, R. Pascanu et al., Theano: a CPU and GPU math expression compiler, Proceedings of the Python for Scientific Computing Conference (SciPy), 2010.

J. Bircsak, P. Craig, R. Crowell, Z. Cvetanovic, J. Harris et al., Extending OpenMP for NUMA Machines, Proceedings of the ACM/IEEE Conference on Supercomputing, 2000.

U. Bondhugula, Compiling Affine Loop Nests for Distributed-memory Parallel Architectures, Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, vol.33, p.12, 2013.

U. Bondhugula, A. Hartono, J. Ramanujam, and P. Sadayappan, A practical automatic polyhedral program optimization system, ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 2008.

F. Broquedis, J. Clet-ortega, S. Moreaud, N. Furmento, B. Goglin et al., hwloc: a Generic Framework for Managing Hardware Affinities in HPC Applications, PDP 2010 - The 18th Euromicro International Conference on Parallel, Distributed and Network-Based Computing, 2010.

F. Broquedis, N. Furmento, B. Goglin, P. Wacrenier, and R. Namyst, ForestGOMP: An efficient OpenMP environment for NUMA architectures, International Journal of Parallel Programming, vol.38, pp.418-439, 2010.

K. J. Brown, A. K. Sujeeth, H. J. Lee, T. Rompf, H. Chafi et al., A Heterogeneous Parallel Framework for Domain-Specific Languages, Proceedings of the 2011 International Conference on Parallel Architectures and Compilation Techniques, pp.89-100, 2011.

D. Bruening, S. Devabhaktuni, and S. Amarasinghe, Softspec: Software-based speculative parallelism, 3rd ACM Workshop on Feedback-Directed and Dynamic Optimization, p.3, 1998.

V. Cavé, J. Zhao, J. Shirako, and V. Sarkar, Habanero-Java, Proceedings of the 9th International Conference on Principles and Practice of Programming in Java, pp.51-61, 2011.

D. R. Chakrabarti and P. Banerjee, Static Single Assignment Form for Message-Passing Programs, Int. J. Parallel Program, vol.29, pp.139-184, 2001.

P. Chatarasi, J. Shirako, and V. Sarkar, Polyhedral Transformations of Explicitly Parallel Programs, Proceedings of the Fifth International Workshop on Polyhedral Compilation Techniques, p.15, 2015.

P. Chatarasi, J. Shirako, and V. Sarkar, Static Data Race Detection for SPMD Programs via an Extended Polyhedral Representation, Proceedings of the Sixth International Workshop on Polyhedral Compilation Techniques, p.16, 2016.

C. Chen, J. Chame, and M. Hall, Chill: A framework for composing high-level loop transformations, 2008.

T. Chen, Typesafe abstractions for tensor operations (short paper), Proceedings of the 8th ACM SIGPLAN International Symposium on Scala, pp.45-50, 2017.
DOI : 10.1145/3136000.3136001
URL : http://arxiv.org/pdf/1710.06892

T. Chen, T. Moreau, Z. Jiang, H. Shen, E. Q. Yan et al., TVM: end-to-end optimization stack for deep learning, 2018.

Y. Choi, Y. Lin, N. Chong, S. Mahlke, and T. Mudge, Stream Compilation for Real-Time Embedded Multicore Systems, Proceedings of the 7th Annual IEEE/ACM International Symposium on Code Generation and Optimization, pp.210-220, 2009.
DOI : 10.1109/cgo.2009.27
URL : http://cccp.eecs.umich.edu/papers/ychoi-cgo09.pdf

A. Cohen, A. Darte, and P. Feautrier, Static Analysis of OpenStream Programs, Proceedings of the Sixth International Workshop on Polyhedral Compilation Techniques, p.16, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01251845

A. Cohen, S. Girbal, and O. Temam, A Polyhedral Approach to Ease the Composition of Program Transformations, pp.292-303, 2004.
URL : https://hal.archives-ouvertes.fr/hal-01257301

A. Cohen, M. Sigler, S. Girbal, O. Temam, D. Parello et al., Facilitating the search for compositions of program transformations, Proceedings of the 19th Annual International Conference on Supercomputing, pp.151-160, 2005.
URL : https://hal.archives-ouvertes.fr/hal-01257296

J. Collard, Array SSA for Explicitly Parallel Programs, Proceedings of the 5th International Euro-Par Conference on Parallel Processing, pp.383-390, 1999.

R. Cytron, J. Ferrante, B. K. Rosen, M. N. Wegman, and F. K. Zadeck, Efficiently Computing Static Single Assignment Form and the Control Dependence Graph, ACM Trans. Program. Lang. Syst, vol.13, pp.451-490, 1991.

A. Darte, A. Isoard, and T. Yuki, Liveness Analysis in Explicitly Parallel Programs, Proceedings of the Sixth International Workshop on Polyhedral Compilation Techniques, p.16, 2016.
URL : https://hal.archives-ouvertes.fr/hal-01251843

M. Dashti, A. Fedorova, J. Funston, F. Gaud, R. Lachaize et al., Traffic management: A holistic approach to memory placement on numa systems, Proceedings of the Eighteenth International Conference on Architectural Support for Programming Languages and Operating Systems, pp.381-394, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00945758

J. W. Davidson and S. Jinturkar, Memory access coalescing: A technique for eliminating redundant memory accesses, Proceedings of the ACM SIGPLAN 1994 Conference on Programming Language Design and Implementation, pp.186-195, 1994.

X. Deng, M. Mao, G. Tu, H. Zhang, and Y. Zhang, High-order and high accurate cfd methods and their applications for complex grid problems, Communications in Computational Physics, vol.11, pp.1081-1102, 2012.

S. Donadio, J. Brodman, T. Roeder, K. Yotov, D. Barthou et al., A Language for the Compact Representation of Multiple Program Versions, pp.136-151, 2006.
URL : https://hal.archives-ouvertes.fr/hal-00141067

I. Fassi and P. Clauss, Xfor: Filling the gap between automatic loop optimization and peak performance, 14th International Symposium on Parallel and Distributed Computing, pp.100-109, 2015.
DOI : 10.1109/ispdc.2015.19
URL : https://hal.archives-ouvertes.fr/hal-01155144

P. Feautrier and C. Lengauer, Polyhedron Model. Springer US, pp.1581-1592, 2011.

J. Ferrante, K. J. Ottenstein, and J. D. Warren, The Program Dependence Graph and Its Use in Optimization, ACM Trans. Program. Lang. Syst, vol.9, issue.3, pp.319-349, 1987.
DOI : 10.1007/3-540-12925-1_33
URL : http://www.cs.utexas.edu/users/less/reading/spring00/ferrante.pdf

B. Goglin and N. Furmento, Enabling High-performance Memory Migration for Multithreaded Applications on LINUX, Proceedings of the 2009 IEEE International Symposium on Parallel&Distributed Processing, pp.1-9, 2009.
URL : https://hal.archives-ouvertes.fr/inria-00358172

T. Grosser, H. Zheng, R. Aloor, A. Simbürger, A. Grösslinger et al., Polly - Polyhedral Optimization in LLVM, Proceedings of the Sixth International Workshop on Polyhedral Compilation Techniques, p.11, 2011.

T. Henretty, K. Stock, L. Pouchet, F. Franchetti, J. Ramanujam et al., Data layout transformation for stencil computations on short-vector SIMD architectures, Proceedings of the 20th International Conference on Compiler Construction: Part of the Joint European Conferences on Theory and Practice of Software, pp.225-245, 2011.
DOI : 10.1007/978-3-642-19861-8_13
URL : http://users.ece.cmu.edu/~franzf/papers/cc2011.pdf

L. Huang, H. Jin, L. Yi, and B. Chapman, Enabling Locality-aware Computations in OpenMP, Sci. Program, vol.18, pp.169-181, 2010.

I. Huismann, J. Stiller, and J. Fröhlich, Fast Static Condensation for the Helmholtz Equation in a Spectral-Element Discretization, pp.371-380, 2016.

I. Huismann, J. Stiller, and J. Fröhlich, Factorizing the factorization - a spectral-element solver for elliptic equations with linear operation count, Journal of Computational Physics, vol.346, pp.437-448, 2017.

K. Z. Ibrahim, S. W. Williams, E. Epifanovsky, and A. I. Krylov, Analysis and tuning of libtensor framework on multicore architectures, 21st International Conference on High Performance Computing, HiPC, pp.1-10, 2014.

A. Jimborean, P. Clauss, J. Dollinger, V. Loechner, and J. M. Martinez-caamaño, Dynamic and speculative polyhedral parallelization using compiler-generated skeletons, Int. J. Parallel Program, vol.42, pp.529-545, 2014.
URL : https://hal.archives-ouvertes.fr/hal-00825738

H. Jordan, S. Pellegrini, P. Thoman, K. Kofler, and T. Fahringer, INSPIRE: The Insieme Parallel Intermediate Representation, Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques, pp.7-18, 2013.

J. M. Martinez Caamaño, A. Sukumaran-Rajam, A. Baloian, M. Selva, and P. Clauss, APOLLO: Automatic speculative POLyhedral Loop Optimizer, Proceedings of the Sixth International Workshop on Polyhedral Compilation Techniques, IMPACT '17
URL : https://hal.archives-ouvertes.fr/hal-01533692

K. Kennedy and J. R. Allen, Optimizing Compilers for Modern Architectures: A Dependence-based Approach, 2002.

D. Khaldi, Automatic Resource-Constrained Static Task Parallelization, 2013.
URL : https://hal.archives-ouvertes.fr/pastel-00935483

D. Khaldi, P. Jouvelot, F. Irigoin, and C. Ancourt, SPIRE: A Methodology for Sequential to Parallel Intermediate Representation Extension, HiPEAC Computing Systems Week, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00823324

D. Khaldi, P. Jouvelot, F. Irigoin, C. Ancourt, and B. Chapman, LLVM Parallel Intermediate Representation: Design and Evaluation Using OpenSHMEM Communications, Proceedings of the Second Workshop on the LLVM Compiler Infrastructure in HPC, vol.2, pp.1-2, 2015.
URL : https://hal.archives-ouvertes.fr/hal-01254368

F. Kjolstad, S. Kamil, S. Chou, D. Lugato, and S. Amarasinghe, The tensor algebra compiler, Proc. ACM Program. Lang, vol.1, p.29, 2017.

A. Klöckner, Loo.py: Transformation-based code generation for GPUs and CPUs, Proceedings of ACM SIGPLAN International Workshop on Libraries, Languages, and Compilers for Array Programming, vol.82, p.87, 2014.

P. M. Knijnenburg, T. Kisuki, and M. F. Boyle, Embedded processor design challenges, pp.171-187, 2002.

K. Knobe and V. Sarkar, Array SSA Form and Its Use in Parallelization, Proceedings of the 25th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pp.107-120, 1998.

W. Landi and B. G. Ryder, A safe approximate algorithm for interprocedural aliasing, Proceedings of the ACM SIGPLAN 1992 Conference on Programming Language Design and Implementation, pp.235-248, 1992.

J. Lee, D. A. Padua, and S. P. Midkiff, Basic Compiler Algorithms for Parallel Programs, Proceedings of the Seventh ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp.1-12, 1999.

Y. Lin, Static Nonconcurrency Analysis of OpenMP Programs, OpenMP Shared Memory Parallel Programming, pp.36-50, 2008.

W. Liu, J. Tuck, L. Ceze, W. Ahn, K. Strauss et al., Posh: A tls compiler that exploits program structure, Proceedings of the Eleventh ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp.158-167, 2006.

V. Loechner, PolyLib: A library for manipulating parameterized polyhedra, 1999.

F. Luporini, A. L. Varbanescu, F. Rathgeber, G. Bercea, J. Ramanujam et al., COFFEE: an optimizing compiler for finite element local assembly, 2014.

Z. Majo and T. R. Gross, Matching Memory Access Patterns and Data Placement for NUMA Systems, Proceedings of the Tenth International Symposium on Code Generation and Optimization, pp.230-241, 2012.

Z. Majo and T. R. Gross, A Library for Portable and Composable Data Locality Optimizations for NUMA Systems, Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp.227-238, 2015.

B. Meister, N. Vasilache, D. Wohlford, M. M. Baskaran, A. Leung et al., , pp.1756-1765, 2011.

C. Miranda, Erbium: Reconciling Languages, Runtimes, Compilation and Optimizations for Streaming Applications, 2013.
URL : https://hal.archives-ouvertes.fr/tel-00840333

C. Miranda, A. Pop, P. Dumont, A. Cohen, and M. Duranton, Erbium: A Deterministic, Concurrent Intermediate Representation to Map Data-flow Tasks to Scalable, Persistent Streaming Processes, Proceedings of the 2010 International Conference on Compilers, Architectures and Synthesis for Embedded Systems, pp.11-20, 2010.
URL : https://hal.archives-ouvertes.fr/inria-00551510

A. Muddukrishna, P. A. Jonsson, and M. Brorsson, Locality-Aware Task Scheduling and Data Distribution for OpenMP Programs on NUMA Systems and Manycore Processors, Scientific Programming, p.2015, 2015.

R. Müller-pfefferkorn, W. E. Nagel, and B. Trenkler, Optimizing cache access: A tool for source-to-source transformations and real-life compiler tests, Euro-Par 2004 Parallel Processing, pp.72-81, 2004.

P. Nandy, M. Hall, E. C. Davis, C. Olschanowsky, M. S. Mohammadi et al., Abstractions for Specifying Sparse Matrix Data Transformations, Proceedings of the Eighth International Workshop on Polyhedral Compilation Techniques, 2018.

G. C. Necula, S. Mcpeak, S. P. Rahul, and W. Weimer, Cil: Intermediate language and tools for analysis and transformation of c programs, Proceedings of the 11th International Conference on Compiler Construction, pp.213-228, 2002.

D. Novillo, R. C. Unrau, and J. Schaeffer, Concurrent SSA Form in the Presence of Mutual Exclusion, Proceedings of the 1998 International Conference on Parallel Processing, p.356, 1998.

S. Pai, R. Govindarajan, and M. J. Thazhuthaveetil, PLASMA: Portable Programming for SIMD Heterogeneous Accelerators, Proceedings of the Workshop on Language, Compiler, and Architecture Support for GPGPU, 2010.

S. Pellegrini, On Simplifying and Optimizing Message Passing Programs: a Compiler and Runtime-Based Approach, 2011.

S. Pellegrini, T. Hoefler, and T. Fahringer, Exact dependence analysis for increased communication overlap, Proceedings of the 19th European Conference on Recent Advances in the Message Passing Interface, pp.89-99, 2012.

M. Pérache, H. Jourdren, and R. Namyst, MPC: A Unified Parallel Runtime for Clusters of NUMA Machines, Proceedings of the 14th International Euro-Par Conference on Parallel Processing, pp.78-88, 2008.

A. Plesco, Program Transformations and Memory Architecture Optimizations for High-Level Synthesis of Hardware Accelerators, 2010.
URL : https://hal.archives-ouvertes.fr/tel-00544349

A. Pop and A. Cohen, Preserving High-Level Semantics of Parallel Programming Annotations Through the Compilation Flow of Optimizing Compilers, Proceedings of the 15th Workshop on Compilers for Parallel Computers, p.10, 2010.
URL : https://hal.archives-ouvertes.fr/inria-00551518

A. Pop and A. Cohen, OpenStream: Expressiveness and Data-flow Compilation of OpenMP Streaming Programs, ACM Transactions on Architecture and Code Optimization (TACO), vol.9, p.25, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00710409

S. Pop, A. Cohen, C. Bastoul, S. Girbal, G. Silber et al., GRAPHITE: Polyhedral Analyses and Optimizations for GCC, Proceedings of the 2006 GCC Developers Summit, 2006.

C. Pousa-ribeiro, M. Castro, J. Méhaut, and A. Carissimi, Improving Memory Affinity of Geophysics Applications on NUMA Platforms Using Minas, High Performance Computing for Computational Science - VECPAR, pp.279-292, 2010.

B. Pradelle, B. Meister, M. Baskaran, J. Springer, and R. Lethin, Polyhedral optimization of tensorflow computation graphs, Proceedings of the 6th Workshop on Extreme-scale Programming Tools at The International Conference for High Performance Computing, Networking, Storage and Analysis, 2017.

Z. Qu, Static Condensation, pp.47-70, 2004.

J. Ragan-kelley, C. Barnes, A. Adams, S. Paris, F. Durand et al., Halide: A language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines, Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation, pp.519-530, 2013.

N. A. Rink, Modeling of languages for tensor manipulation, 2018.

N. A. Rink, I. Huismann, A. Susungi, J. Castrillon, J. Stiller et al., Cfdlang: High-level code generation for high-order methods in fluid dynamics, Proceedings of the Real World Domain Specific Languages Workshop, vol.5, pp.1-5, 2018.
URL : https://hal.archives-ouvertes.fr/hal-01857925

G. Rudy, M. M. Khan, M. Hall, C. Chen, and J. Chame, A Programming Language Interface to Describe Transformations and Code Generation, pp.136-150, 2011.

S. Rus, G. He, C. Alias, and L. Rauchwerger, Region Array SSA, Proceedings of the 15th International Conference on Parallel Architectures and Compilation Techniques, pp.43-52, 2006.

V. Sarkar, Analysis and optimization of explicitly parallel programs using the parallel program graph representation, Proceedings of the 10th International Workshop on Languages and Compilers for Parallel Computing, pp.94-113, 1998.

V. Sarkar and B. Simons, Parallel Program Graphs and Their Classification, Proceedings of the 6th International Workshop on Languages and Compilers for Parallel Computing, pp.633-655, 1994.

T. B. Schardl, W. S. Moses, and C. E. Leiserson, Tapir: Embedding fork-join parallelism into LLVM's intermediate representation, Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp.249-265, 2017.

J. Shirako, J. M. Zhao, V. K. Nandivada, and V. N. Sarkar, Chunking Parallel Loops in the Presence of Synchronization, Proceedings of the 23rd International Conference on Supercomputing, pp.181-192, 2009.

D. R. Shires and L. Pollock, Program Flow Graph Construction for Static Analysis of Explicitly Parallel Message-Passing Programs, Army Research Laboratory, 2000.

D. G. Spampinato, D. Fabregat-traver, P. Bientinesi, and M. Püschel, Program generation for small-scale linear algebra applications, Proceedings of the 2018 International Symposium on Code Generation and Optimization, pp.327-339, 2018.

D. G. Spampinato and M. Püschel, A basic linear algebra compiler, Proceedings of Annual IEEE/ACM International Symposium on Code Generation and Optimization, vol.23, p.32, 2014.

D. G. Spampinato and M. Püschel, A basic linear algebra compiler for structured matrices, International Symposium on Code Generation and Optimization (CGO), pp.117-127, 2016.

P. Springer, A. Sankaran, and P. Bientinesi, TTC: A tensor transposition compiler for multiple architectures, 2016.

H. Srinivasan, J. Hook, and M. Wolfe, Static Single Assignment for Explicitly Parallel Programs, Proceedings of the 20th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pp.260-272, 1993.

H. Srinivasan and M. Wolfe, Analyzing Programs with Explicit Parallelism, Languages and Compilers for Parallel Computing, vol.589, pp.405-419, 1992.

J. Stanier and D. Watson, Intermediate Representations in Imperative Compilers: A Survey, ACM Computing Surveys, vol.45, issue.3, p.27, 2013.

M. Steuwer, T. Remmelg, and C. Dubach, Lift: A functional data-parallel ir for high-performance gpu code generation, Proceedings of the 2017 International Symposium on Code Generation and Optimization, pp.74-85, 2017.

E. Stoltz, M. P. Gerlek, and M. Wolfe, Extended ssa with factored use-def chains to support optimization and parallelism, Proceedings of the Twenty-Seventh Hawaii International Conference on, vol.2, pp.43-52, 1994.

M. M. Strout, B. Kreaseck, and P. D. Hovland, Data-flow Analysis for MPI Programs, International Conference on Parallel Processing, pp.175-184, 2006.

A. Susungi, A. Cohen, and C. Tadonki, More data locality for static control programs on numa architectures, Proceedings of the 7th International Workshop on Polyhedral Compilation Techniques, p.17, 2017.
URL : https://hal.archives-ouvertes.fr/hal-01529354

P. Thoman, Insieme-RS: A Compiler-supported Parallel Runtime System, 2013.

M. Valiev, E. Bylaska, N. Govind, K. Kowalski, T. Straatsma et al., Nwchem: A comprehensive and scalable open-source solution for large scale molecular simulations, Computer Physics Communications, vol.181, pp.1477-1489, 2010.

N. Vasilache, O. Zinenko, T. Theodoridis, P. Goyal, Z. Devito et al., Tensor comprehensions: Framework-agnostic high-performance machine learning abstractions, 2018.

S. Verdoolaege and T. Grosser, Polyhedral Extraction Tool, Proceedings of the Sixth International Workshop on Polyhedral Compilation Techniques, p.12, 2012.

S. Verdoolaege, An Integer Set Library for the Polyhedral Model, Mathematical Software (ICMS'10), vol.6327, pp.299-302, 2010.

S. Verdoolaege, Counting Affine Calculator and Applications, Proceedings of the Sixth International Workshop on Polyhedral Compilation Techniques, p.11, 2011.

S. Verdoolaege, J. C. Juega, A. Cohen, J. I. Gómez, C. Tenllado et al., Polyhedral Parallel Code Generation for CUDA, ACM Trans. Archit. Code Optim, vol.9, p.23, 2013.

J. Xue, Loop Tiling for Parallelism, 2000.

Q. Yi, K. Seymour, H. You, R. Vuduc, and D. Quinlan, POET: Parameterized optimizations for empirical tuning, IEEE International Parallel and Distributed Processing Symposium, pp.1-8, 2007.

T. Yuki, P. Feautrier, S. Rajopadhye, and V. Saraswat, Array Dataflow Analysis for Polyhedral X10 Programs, Proceedings of the 18th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp.23-34, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00761537

J. Zhao and V. Sarkar, Intermediate Language Extensions for Parallelism, Proceedings of the Compilation of the Co-located Workshops on DSM'11, TMC'11, AGERE! 2011, AOOPES'11, NEAT'11, & VMIL'11, pp.329-340, 2011.

J. Zory and F. Coelho, Using algebraic transformations to optimize expression evaluation in scientific code, Proceedings of the 1998 International Conference on Parallel Architectures and Compilation Techniques, p.376, 1998.