Faust, an Audio Signal Processing Language

StreamIt, a Programming Language and a Compilation Infrastructure

B. Ackland, A. Anesko, D. Brinthaupt, S. Daubert, A. Kalavade et al., A single-chip, 1.6-billion, 16-b MAC/s multiprocessor DSP, IEEE Journal of Solid-State Circuits, vol.35, issue.3, pp.412-424, 2000.

T. L. Adam, K. M. Chandy, and J. R. Dickson, A comparison of list schedules for parallel processing systems, Communications of the ACM, vol.17, issue.12, pp.685-690, 1974.
DOI : 10.1145/361604.361619

S. V. Adve and K. Gharachorloo, Shared memory consistency models: a tutorial, Computer, vol.29, issue.12, pp.66-76, 1996.
DOI : 10.1109/2.546611

S. Alam, R. Barrett, J. Kuehn, and S. Poole, Performance Characterization of a Hierarchical MPI Implementation on Large-scale Distributed-memory Platforms, 2009 International Conference on Parallel Processing, pp.132-139, 2009.
DOI : 10.1109/ICPP.2009.51

E. Allen, D. Chase, J. Hallett, V. Luchangco, J. Maessen et al., The Fortress Language Specification, 2007.

R. Allen and K. Kennedy, Automatic translation of FORTRAN programs to vector form, ACM Transactions on Programming Languages and Systems, vol.9, issue.4, pp.491-542, 1987.
DOI : 10.1145/29873.29875

S. P. Amarasinghe and M. S. Lam, Communication Optimization and Code Generation for Distributed Memory Machines, Proceedings of the ACM SIGPLAN 1993 conference on Programming language design and implementation, PLDI '93, pp.126-138, 1993.

M. Amini, F. Coelho, F. Irigoin, and R. Keryell, Static Compilation Analysis for Host-Accelerator Communication Optimization, 24th Int. Workshop on Languages and Compilers for Parallel Computing (LCPC), Fort Collins, 2011.
DOI : 10.1007/978-3-642-36036-7_16

URL : https://hal.archives-ouvertes.fr/hal-00743496

C. Ancourt, F. Coelho, and F. Irigoin, A Modular Static Analysis Approach to Affine Loop Invariants Detection, Electronic Notes in Theoretical Computer Science, vol.267, issue.1, pp.3-16, 2010.
DOI : 10.1016/j.entcs.2010.09.002

URL : https://hal.archives-ouvertes.fr/hal-00586338

C. Ancourt, B. Creusillet, F. Coelho, F. Irigoin, P. Jouvelot et al., PIPS: a Workbench for Interprocedural Program Analyses and Parallelization, Meeting on data parallel languages and compilers for portable parallel computing, 1994.

C. Ancourt and F. Irigoin, Scanning Polyhedra with DO Loops, Proceedings of the third ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPOPP '91, pp.39-50, 1991.
DOI : 10.1145/109625.109631

URL : https://hal.archives-ouvertes.fr/hal-00752774

E. Ayguadé, R. M. Badia, P. Bellens, D. Cabrera, A. Duran et al., Extending OpenMP to Survive the Heterogeneous Multi-Core Era, International Journal of Parallel Programming, vol.41, issue.1, pp.440-459, 2010.
DOI : 10.1007/s10766-010-0135-4

H. Bao, J. Bielak, O. Ghattas, L. F. Kallivokas, D. R. O'Hallaron et al., Large-scale simulation of elastic wave propagation in heterogeneous media on parallel computers, Computer Methods in Applied Mechanics and Engineering, vol.152, issue.1-2, pp.85-102, 1998.
DOI : 10.1016/S0045-7825(97)00183-7

A. Basumallik, S. Min, and R. Eigenmann, Programming Distributed Memory Sytems Using OpenMP, 2007 IEEE International Parallel and Distributed Processing Symposium, pp.1-8, 2007.
DOI : 10.1109/IPDPS.2007.370397

N. Benoit and S. Louise, Kimble: a Hierarchical Intermediate Representation for Multi-Grain Parallelism, Proceedings of the Workshop on Intermediate Representations, pp.21-28, 2011.

G. Blake, R. Dreslinski, and T. Mudge, A survey of multicore processors, IEEE Signal Processing Magazine, vol.26, issue.6, pp.26-37, 2009.
DOI : 10.1109/MSP.2009.934110

R. D. Blumofe, C. F. Joerg, B. C. Kuszmaul, C. E. Leiserson, K. H. Randall et al., Cilk: An Efficient Multithreaded Runtime System, Journal of Parallel and Distributed Computing, pp.207-216, 1995.
DOI : 10.1006/jpdc.1996.0107

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.18.3175

U. Bondhugula, Automatic Distributed Memory Code Generation using the Polyhedral Framework, Indian Institute of Science, 2011.

C. T. Brown, L. S. Liebovitch, and R. Glendon, Lévy Flights in Dobe Ju/'hoansi Foraging Patterns, Human Ecology, vol.15, issue.3, pp.129-138, 2007.
DOI : 10.1007/s10745-006-9083-4

S. Campanoni, T. Jones, G. Holloway, V. J. Reddi, G. Wei et al., HELIX, Proceedings of the Tenth International Symposium on Code Generation and Optimization, CGO '12, pp.84-93, 2012.
DOI : 10.1145/2259016.2259028

V. Cavé, J. Zhao, and V. Sarkar, Habanero-Java, Proceedings of the 9th International Conference on Principles and Practice of Programming in Java, PPPJ '11, 2011.
DOI : 10.1145/2093157.2093165

Y. Choi, Y. Lin, N. Chong, S. Mahlke, and T. Mudge, Stream Compilation for Real-Time Embedded Multicore Systems, 2009 International Symposium on Code Generation and Optimization, pp.210-220, 2009.
DOI : 10.1109/CGO.2009.27

B. Cirou and E. Jeannot, Triplet: A clustering scheduling algorithm for heterogeneous systems, Proceedings International Conference on Parallel Processing Workshops, pp.231-236, 2001.
DOI : 10.1109/ICPPW.2001.951956

URL : https://hal.archives-ouvertes.fr/inria-00100488

F. Coelho, P. Jouvelot, C. Ancourt, and F. Irigoin, Data and Process Abstraction in PIPS Internal Representation, Proceedings of the Workshop on Intermediate Representations, pp.77-84, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00744291

B. Creusillet, Analyses de Régions de Tableaux et Applications, MINES ParisTech, 1996.

B. Creusillet and F. Irigoin, Interprocedural Array Region Analyses, International Journal of Parallel Programming, vol.2, issue.3, pp.513-546, 1996.
DOI : 10.1007/BF03356758

URL : https://hal.archives-ouvertes.fr/hal-00752611

E. Cuevas, A. Garcia, F. J. Fernandez, R. J. Gadea, and J. Cordon, Importance of Simulations for Nuclear and Aeronautical Inspections with Ultrasonic and Eddy Current Testing, Simulation in NDT, 2010.

R. Cytron, J. Ferrante, B. K. Rosen, M. N. Wegman, and F. K. Zadeck, Efficiently computing static single assignment form and the control dependence graph, ACM Transactions on Programming Languages and Systems, vol.13, issue.4, pp.451-490, 1991.
DOI : 10.1145/115372.115320

M. I. Daoud and N. N. Kharma, GATS 1.0, Proceedings of the 2005 conference on Genetic and evolutionary computation, GECCO '05, pp.2209-2210, 2005.
DOI : 10.1145/1068009.1068378

J. B. Dennis, G. R. Gao, and K. W. Todd, Modeling the Weather with a Data Flow Supercomputer, IEEE Transactions on Computers, vol.33, issue.7, pp.592-603, 1984.
DOI : 10.1109/TC.1984.5009332

E. W. Dijkstra, M. Van-lamsweerde, and . Sintzoff, Formal Derivation of Strongly Correct Parallel Programs, 1977.

E. Ehrhart, Polynômes arithmétiques et méthode de polyèdres en combinatoire, International Series of Numerical Mathematics, p.35, 1977.

P. Feautrier, Dataflow analysis of array and scalar references, International Journal of Parallel Programming, vol.24, issue.4, 1991.
DOI : 10.1007/BF01407931

J. Ferrante, K. J. Ottenstein, and J. D. Warren, The program dependence graph and its use in optimization, ACM Transactions on Programming Languages and Systems, vol.9, issue.3, pp.319-349, 1987.
DOI : 10.1145/24039.24041

M. J. Flynn, Some Computer Organizations and Their Effectiveness, IEEE Transactions on Computers, vol.21, issue.9, pp.948-960, 1972.
DOI : 10.1109/TC.1972.5009071

M. R. Garey and D. S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness, 1990.

A. Gerasoulis, S. Venugopal, and T. Yang, Clustering task graphs for message passing architectures, ACM SIGARCH Computer Architecture News, vol.18, issue.3, pp.447-456, 1990.
DOI : 10.1145/255129.255188

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.49.1744

M. Girkar and C. D. Polychronopoulos, Automatic extraction of functional parallelism from ordinary programs, IEEE Transactions on Parallel and Distributed Systems, vol.3, issue.2, pp.166-178, 1992.
DOI : 10.1109/71.127258

G. Goumas, N. Drosinos, M. Athanasaki, and N. Koziris, Message-passing code generation for non-rectangular tiling transformations, Parallel Computing, vol.32, issue.10, 2006.
DOI : 10.1016/j.parco.2006.07.003

Graphviz, Graph Visualization Software

L. Griffiths, A simple adaptive algorithm for real-time processing in antenna arrays, Proceedings of the IEEE, vol.57, issue.10, pp.1696-1704, 1969.
DOI : 10.1109/PROC.1969.7385

R. Habel, F. Silber-Chaussumier, and F. Irigoin, Generating Efficient Parallel Programs for Distributed Memory Systems, 2013.

C. Harris and M. Stephens, A Combined Corner and Edge Detector, Proceedings of the Alvey Vision Conference 1988, pp.147-151, 1988.
DOI : 10.5244/C.2.23

M. J. Harrold, B. Malloy, and G. Rothermel, Efficient construction of program dependence graphs, ACM SIGSOFT Software Engineering Notes, vol.18, issue.3, pp.160-170, 1993.
DOI : 10.1145/174146.154268

J. Howard, S. Dighe, Y. Hoskote, S. Vangal, D. Finan et al., A 48-Core IA-32 message-passing processor with DVFS in 45nm CMOS, 2010 IEEE International Solid-State Circuits Conference (ISSCC), pp.108-109, 2010.
DOI : 10.1109/ISSCC.2010.5434077

A. Hurson, J. T. Lim, K. M. Kavi, and B. Lee, Parallelization of DOALL and DOACROSS Loops - a Survey, Emphasizing Parallel Programming Techniques, pp.53-103, 1997.
DOI : 10.1016/S0065-2458(08)60706-8

Insieme, an Optimization System for OpenMP, MPI and OpenCL Programs, 2011.

F. Irigoin, P. Jouvelot, and R. Triolet, Semantical Interprocedural Parallelization: An Overview of the PIPS Project, ICS, pp.244-251, 1991.
URL : https://hal.archives-ouvertes.fr/hal-00984684

K. Jainandunsing, Optimal Partitioning Scheme for Wavefront/Systolic Array Processors, Proceedings of IEEE Symposium on Circuits and Systems, pp.940-943, 1986.

P. Jouvelot and R. Triolet, Newgen: A Language Independent Program Generator, 1989.

G. Karypis and V. Kumar, A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs, SIAM Journal on Scientific Computing, vol.20, issue.1, 1998.
DOI : 10.1137/S1064827595287997

H. Kasahara, H. Honda, A. Mogi, A. Ogura, K. Fujiwara et al., A Multi-Grain Parallelizing Compilation Scheme for OSCAR (Optimally Scheduled Advanced Multiprocessor), Proceedings of the Fourth International Workshop on Languages and Compilers for Parallel Computing, pp.283-297, 1992.

P. Kenyon, P. Agrawal, and S. Seth, High-level microprogramming: an optimizing C compiler for a processing element of a CAD accelerator, Proceedings of the 23rd Annual Workshop and Symposium, MICRO 23: Microprogramming and Microarchitecture, pp.97-106, 1990.
DOI : 10.1109/MICRO.1990.151431

D. Khaldi, P. Jouvelot, and C. Ancourt, Parallelizing with BDSC, a resource-constrained scheduling algorithm for shared and distributed memory systems, MINES ParisTech, 2012.
DOI : 10.1016/j.parco.2014.11.004

URL : https://hal.archives-ouvertes.fr/hal-01097328

D. Khaldi, P. Jouvelot, C. Ancourt, and F. Irigoin, Task Parallelism and Data Distribution: An Overview of Explicit Parallel Programming Languages, Lecture Notes in Computer Science, vol.7760, pp.174-189, 2012.
DOI : 10.1007/978-3-642-37658-0_12

URL : https://hal.archives-ouvertes.fr/hal-00742536

D. Khaldi, P. Jouvelot, F. Irigoin, and C. Ancourt, SPIRE: A Methodology for Sequential to Parallel Intermediate Representation Extension, Proceedings of the 17th Workshop on Compilers for Parallel Computing, CPC'13, 2013.
URL : https://hal.archives-ouvertes.fr/hal-00823324

M. A. Khan, Scheduling for heterogeneous Systems using constrained critical paths, Parallel Computing, vol.38, issue.4-5, pp.175-193, 2012.
DOI : 10.1016/j.parco.2012.01.001

B. Kruatrachue and T. G. Lewis, Duplication Scheduling Heuristics (DSH): A New Precedence Task Scheduler for Parallel Processor Systems, 1987.

S. Kumar, D. Kim, M. Smelyanskiy, Y. Chen, J. Chhugani et al., Atomic Vector Operations on Chip Multiprocessors, ACM SIGARCH Computer Architecture News, vol.36, issue.3, pp.441-452, 2008.

Y. Kwok and I. Ahmad, Benchmarking the task graph scheduling algorithms, Proceedings of the First Merged International Parallel Processing Symposium and Symposium on Parallel and Distributed Processing, pp.531-537, 1998.
DOI : 10.1109/IPPS.1998.669967

Y. Kwok and I. Ahmad, Static scheduling algorithms for allocating directed task graphs to multiprocessors, ACM Computing Surveys, vol.31, issue.4, pp.406-471, 1999.
DOI : 10.1145/344588.344618

J. Larus and C. Kozyrakis, Transactional memory, Communications of the ACM, vol.51, issue.7, pp.80-88, 2008.
DOI : 10.1145/1364782.1364800

S. Lee, S. Min, and R. Eigenmann, OpenMP to GPGPU: a Compiler Framework for Automatic Translation and Optimization, Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of parallel programming, PPoPP '09, pp.101-110, 2009.

V. Maisonneuve, Convex Invariant Refinement by Control Node Splitting: a Heuristic Approach, Electronic Notes in Theoretical Computer Science, vol.288, pp.49-59, 2012.
DOI : 10.1016/j.entcs.2012.10.007

URL : https://hal.archives-ouvertes.fr/hal-00833344

J. Merrill, GENERIC and GIMPLE: a New Tree Representation for Entire Functions, GCC developers summit 2003, pp.171-180, 2003.

D. Millot, A. Muller, C. Parrot, and F. Silber-Chaussumier, STEP: A Distributed OpenMP for Coarse-Grain Parallelism Tool, Proceedings of the 4th international conference on OpenMP in a new era of parallelism, IWOMP'08, pp.83-99, 2008.
DOI : 10.1007/978-3-540-79561-2_8

URL : https://hal.archives-ouvertes.fr/hal-01373120

D. I. Moldovan and J. A. Fortes, Partitioning and Mapping Algorithms into Fixed Size Systolic Arrays, IEEE Transactions on Computers, vol.35, issue.1, pp.1-12, 1986.
DOI : 10.1109/TC.1986.1676652

A. Moody, D. Ahn, and B. Supinski, Exascale Algorithms for Generalized MPI_Comm_split, Recent Advances in the Message Passing Interface, pp.9-18, 2011.
DOI : 10.1007/978-3-642-24449-0_4

G. Moore, Cramming More Components Onto Integrated Circuits, Proceedings of the IEEE, vol.86, issue.1, pp.82-85, 1998.
DOI : 10.1109/JPROC.1998.658762

V. K. Nandivada, J. Shirako, J. Zhao, and V. Sarkar, A Transformation Framework for Optimizing Task-Parallel Programs, ACM Transactions on Programming Languages and Systems, vol.35, issue.1, pp.1-3, 2013.
DOI : 10.1145/2450136.2450138

C. J. Newburn and J. P. Shen, Automatic partitioning of signal processing programs for symmetric multiprocessors, Proceedings of the 1996 Conference on Parallel Architectures and Compilation Technique, pp.269-280, 1996.
DOI : 10.1109/PACT.1996.552675

D. Novillo, OpenMP and Automatic Parallelization in GCC, Proceedings of the GCC Developers Summit, 2006.

Y. Orlarey, D. Fober, and S. Letz, Adding Automatic Parallelization to Faust, Linux Audio Conference, 2009.

D. A. Padua and M. J. Wolfe, Advanced compiler optimizations for supercomputers, Communications of the ACM, vol.29, issue.12, pp.1184-1201, 1986.
DOI : 10.1145/7902.7904

S. Pai, R. Govindarajan, and M. J. Thazhuthaveetil, PLASMA: Portable Programming for SIMD Heterogeneous Accelerators, Workshop on Language, Compiler, and Architecture Support for GPGPU, 2010.

PolyLib, A Library of Polyhedral Functions

A. Pop and A. Cohen, A stream-computing extension to OpenMP, Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers, HiPEAC '11, pp.5-14, 2011.
DOI : 10.1145/1944862.1944867

URL : https://hal.archives-ouvertes.fr/hal-00659411

A. Pop and A. Cohen, OpenStream, ACM Transactions on Architecture and Code Optimization, vol.9, issue.4, article 53, pp.1-25, 2013.
DOI : 10.1145/2400682.2400712

URL : https://hal.archives-ouvertes.fr/hal-00786675

T. Saidani, Optimisation multi-niveau d'une application de traitement d'images sur machines parallèles, 2012.

T. Saidani, L. Lacassagne, J. Falcou, C. Tadonki, and S. Bouaziz, Parallelization Schemes for Memory Optimization on the Cell Processor: A Case Study on the Harris Corner Detector, Transactions on High-Performance Embedded Architectures and Compilers III, pp.177-200, 2011.
DOI : 10.1007/s10766-007-0034-5

URL : https://hal.archives-ouvertes.fr/hal-00753708

V. Sarkar, Synchronization using counting semaphores, Proceedings of the 2nd international conference on Supercomputing, ICS '88, pp.627-637, 1988.
DOI : 10.1145/55364.55426

V. Sarkar, Partitioning and Scheduling Parallel Programs for Multiprocessors, 1989.

V. Sarkar, COMP 322: Principles of Parallel Programming, 2009.

V. Sarkar and B. Simons, Parallel Program Graphs and their classification, Lecture Notes in Computer Science, vol.768, pp.633-655, 1993.
DOI : 10.1007/3-540-57659-2_36

J. Shirako, D. M. Peixotto, V. Sarkar, and W. N. Scherer, Phasers, Proceedings of the 22nd annual international conference on Supercomputing, ICS '08, pp.277-288, 2008.
DOI : 10.1145/1375527.1375568

M. Solar, A Scheduling Algorithm to Optimize Parallel Processes, 2008 International Conference of the Chilean Computer Science Society, pp.73-78, 2008.
DOI : 10.1109/SCCC.2008.8

M. Solar and M. Inostroza, A scheduling algorithm to optimize real-world applications, Proceedings of the 24th International Conference on Distributed Computing Systems Workshops, pp.858-862, 2004.
DOI : 10.1109/ICDCSW.2004.1284133

H. Topcuoglu, S. Hariri, and M. Wu, Performance-effective and low-complexity task scheduling for heterogeneous computing, IEEE Transactions on Parallel and Distributed Systems, vol.13, issue.3, pp.260-274, 2002.
DOI : 10.1109/71.993206

S. Verdoolaege, A. Cohen, and A. Beletska, Transitive Closures of Affine Integer Tuple Relations and Their Overapproximations, Proceedings of the 18th International Conference on Static Analysis, pp.216-232, 2011.
DOI : 10.1007/978-3-642-02658-4_44

URL : https://hal.archives-ouvertes.fr/hal-00645221

M. Wu and D. D. Gajski, Hypertool: a programming aid for message-passing systems, IEEE Transactions on Parallel and Distributed Systems, vol.1, issue.3, pp.330-343, 1990.
DOI : 10.1109/71.80160

T. Yang and A. Gerasoulis, PYRROS: Static Task Scheduling and Code Generation for Message Passing Multiprocessors, Proceedings of the 6th International Conference on Supercomputing, ICS '92, pp.428-437, 1992.

T. Yang and A. Gerasoulis, DSC: scheduling parallel tasks on an unbounded number of processors, IEEE Transactions on Parallel and Distributed Systems, vol.5, issue.9, pp.951-967, 1994.
DOI : 10.1109/71.308533

X. Yang and S. Deb, Cuckoo Search via Lévy Flights, 2009 World Congress on Nature & Biologically Inspired Computing (NaBIC), pp.210-214, 2009.
DOI : 10.1109/nabic.2009.5393690

K. Yelick, D. Bonachea, W. Chen, P. Colella, K. Datta et al., Productivity and Performance Using Partitioned Global Address Space Languages, Proceedings of the 2007 International Workshop on Parallel Symbolic Computation, PASCO '07, pp.24-32, 2007.
DOI : 10.1145/1278177.1278183

URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.126.6770

J. Zhao and V. Sarkar, Intermediate language extensions for parallelism, Proceedings of the compilation of the co-located workshops on DSM'11, TMC'11, AGERE!'11, AOOPES'11, NEAT'11, & VMIL'11, SPLASH '11 Workshops, pp.329-340, 2011.
DOI : 10.1145/2095050.2095103