F. Li, A. Pop, and A. Cohen, Advances in parallel-stage decoupled software pipelining, Proceedings of the Workshop on Intermediate Representations, pp.29-36, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00870687

C. Miranda, P. Dumont, A. Cohen, M. Duranton, and A. Pop, ERBIUM, Proceedings of the 7th ACM international conference on Computing frontiers, CF '10, pp.119-120, 2010.
DOI : 10.1145/1787275.1787312
URL : https://hal.archives-ouvertes.fr/inria-00551510

C. Miranda, A. Pop, P. Dumont, A. Cohen, and M. Duranton, Erbium, Proceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems, CASES '10, pp.11-20, 2010.
DOI : 10.1145/1878921.1878924
URL : https://hal.archives-ouvertes.fr/inria-00551510

A. Pop and A. Cohen, A Stream-Computing Extension to OpenMP, International Workshop on OpenMP (IWOMP'10), 2010.
URL : https://hal.archives-ouvertes.fr/inria-00551507

A. Pop and A. Cohen, Preserving high-level semantics of parallel programming annotations through the compilation flow of optimizing compilers, Proceedings of the 15th Workshop on Compilers for Parallel Computers (CPC'10), 2010.
URL : https://hal.archives-ouvertes.fr/inria-00551518

A. Pop and A. Cohen, A stream-computing extension to OpenMP, Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers, HiPEAC '11, pp.5-14, 2011.
DOI : 10.1145/1944862.1944867
URL : https://hal.archives-ouvertes.fr/hal-00659411

A. Pop, S. Pop, H. Jagasia, J. Sjödin, and P. H. Kelly, Improving GNU compiler collection infrastructure for streamization, Proceedings of the 2008 GCC Developers' Summit, pp.77-86, 2008.
URL : https://hal.archives-ouvertes.fr/hal-00817445

A. Pop, S. Pop, and J. Sjödin, Automatic streamization in GCC, Proceedings of the 2009 GCC Developers' Summit, 2009.
URL : https://hal.archives-ouvertes.fr/hal-00817455

J. Sjödin, S. Pop, H. Jagasia, T. Grosser, and A. Pop, Design of Graphite and the polyhedral compilation package, Proceedings of the 2009 GCC Developers' Summit, pp.33-45, 2009.

S. V. Adve and K. Gharachorloo, Shared memory consistency models: a tutorial, Computer, vol.29, issue.12, pp.66-76, 1996.
DOI : 10.1109/2.546611

M. Aldinucci, M. Torquati, and M. Meneghin, FastFlow: Efficient Parallel Streaming Applications on Multi-core, CoRR, abs/0909.1187, 2009.

J. Alglave, L. Maranget, S. Sarkar, and P. Sewell, Fences in Weak Memory Models, Computer Aided Verification, pp.258-272, 2010.
DOI : 10.1007/978-3-642-14295-6_25
URL : https://hal.archives-ouvertes.fr/hal-01100859

Arvind, R. S. Nikhil, and K. Pingali, I-structures: data structures for parallel computing, ACM Transactions on Programming Languages and Systems, vol.11, issue.4, pp.598-632, 1989.
DOI : 10.1145/69558.69562

E. A. Ashcroft and W. W. Wadge, Lucid, a nonprocedural language with iteration, Communications of the ACM, vol.20, issue.7, pp.519-526, 1977.
DOI : 10.1145/359636.359715

H. Attiya, R. Guerraoui, D. Hendler, P. Kuznetsov, M. M. Michael et al., Laws of order, ACM SIGPLAN Notices, vol.46, issue.1, pp.487-498, 2011.
DOI : 10.1145/1925844.1926442

D. Barthou, J. Collard, and P. Feautrier, Fuzzy Array Dataflow Analysis, Journal of Parallel and Distributed Computing, vol.40, issue.2, pp.210-226, 1997.
DOI : 10.1006/jpdc.1996.1261
URL : https://hal.archives-ouvertes.fr/hal-00551673

P. Bellens, J. M. Pérez, R. M. Badia, and J. Labarta, CellSs: a Programming Model for the Cell BE Architecture, ACM/IEEE SC 2006 Conference (SC'06), 2006.
DOI : 10.1109/SC.2006.17

A. Bernstein, Analysis of Programs for Parallel Processing, IEEE Transactions on Electronic Computers, vol.15, issue.5, pp.757-762, 1966.
DOI : 10.1109/PGEC.1966.264565

G. Berry and G. Gonthier, The Esterel synchronous programming language: design, semantics, implementation, Science of Computer Programming, vol.19, issue.2, pp.87-152, 1992.
DOI : 10.1016/0167-6423(92)90005-V
URL : https://hal.archives-ouvertes.fr/inria-00075711

G. Bilsen, M. Engels, R. Lauwereins, and J. Peperstraete, Cyclo-static data flow, 1995 International Conference on Acoustics, Speech, and Signal Processing, pp.3255-3258, 1995.
DOI : 10.1109/ICASSP.1995.479579

R. H. Bisseling, Parallel Scientific Computation: A Structured Approach using BSP and MPI, 2004.
DOI : 10.1093/acprof:oso/9780198529392.001.0001

P. M. Carpenter, D. Ródenas, X. Martorell, A. Ramírez, and E. Ayguadé, A Streaming Machine Description and Programming Model, SAMOS, pp.107-116, 2007.
DOI : 10.1007/978-3-540-73625-7_13

P. Caspi, G. Hamon, and M. Pouzet, Synchronous Functional Programming with Lucid Synchrone, chapter in Real-Time Systems: Models and Verification - Theory and Tools, ISTE, 2007.

P. Caspi and M. Pouzet, Synchronous Kahn networks, Proceedings of the first ACM SIGPLAN international conference on Functional programming, ICFP '96, pp.226-238, 1996.

R. Cytron, Doacross: Beyond vectorization for multiprocessors, Intl. Conf. on Parallel Processing (ICPP), 1986.

R. Cytron, J. Ferrante, B. K. Rosen, M. N. Wegman, and F. K. Zadeck, Efficiently computing static single assignment form and the control dependence graph, ACM Transactions on Programming Languages and Systems, vol.13, issue.4, pp.451-490, 1991.
DOI : 10.1145/115372.115320

J. B. Dennis and G. R. Gao, An efficient pipelined dataflow processor architecture, Proceedings. SUPERCOMPUTING '88, pp.368-373, 1988.
DOI : 10.1109/SUPERC.1988.44674

P. Feautrier, Array Dataflow Analysis, Lecture Notes in Computer Science, vol.1808, pp.173-219, 2001.
DOI : 10.1007/3-540-45403-9_6
URL : https://hal.archives-ouvertes.fr/hal-00761537

P. Feautrier, Scalable and Structured Scheduling, International Journal of Parallel Programming, vol.28, issue.6, pp.459-487, 2006.
DOI : 10.1007/s10766-006-0011-4

M. Frigo, C. E. Leiserson, and K. H. Randall, The Implementation of the Cilk-5

J. Gaudiot, T. Deboni, J. Feo, W. Böhm, W. Najjar et al., The Sisal model of functional programming and its implementation, Proceedings of IEEE International Symposium on Parallel Algorithms Architecture Synthesis, p.112, 1997.
DOI : 10.1109/AISPAS.1997.581640

M. Gordon, W. Thies, and S. Amarasinghe, Exploiting Coarse-Grained Task, Data, and Pipeline Parallelism in Stream Programs, International Conference on Architectural Support for Programming Languages and Operating Systems, 2006.

P. Le Guernic, A. Benveniste, P. Bournai, and T. Gautier, Signal: A data flow-oriented language for signal processing, IEEE Transactions on Acoustics, Speech, and Signal Processing, vol.34, issue.2, pp.362-374, 1986.
DOI : 10.1109/TASSP.1986.1164809
URL : https://hal.archives-ouvertes.fr/inria-00076178

N. Halbwachs, P. Caspi, P. Raymond, and D. Pilaud, The synchronous data flow programming language LUSTRE, Proceedings of the IEEE, vol.79, issue.9, pp.1305-1320, 1991.
DOI : 10.1109/5.97300

M. P. Herlihy and J. M. Wing, Axioms for concurrent objects, Proceedings of the 14th ACM SIGACT-SIGPLAN symposium on Principles of programming languages, POPL '87, pp.13-26, 1987.
DOI : 10.1145/41625.41627

C. A. R. Hoare, Communicating sequential processes, Communications of the ACM, vol.21, issue.8, pp.666-677, 1978.
DOI : 10.1145/359576.359585

Intel, Intel 64 Architecture Memory Ordering White Paper, 2007.

F. Irigoin, P. Jouvelot, and R. Triolet, Semantical interprocedural parallelization: an overview of the PIPS project, Proceedings of the 5th international conference on Supercomputing, ICS '91, pp.244-251, 1991.
URL : https://hal.archives-ouvertes.fr/hal-00984684

G. Kahn, The semantics of a simple language for parallel programming, Information processing, pp.471-475, 1974.

R. Kalla, B. Sinharoy, and J. Tendler, IBM POWER5 chip: a dual-core multithreaded processor, IEEE Micro, vol.24, issue.2, pp.40-47, 2004.
DOI : 10.1109/MM.2004.1289290

P. Kongetira, K. Aingaran, and K. Olukotun, Niagara: A 32-Way Multithreaded Sparc Processor, IEEE Micro, vol.25, issue.2, pp.21-29, 2005.
DOI : 10.1109/MM.2005.35

M. Kudlur and S. Mahlke, Orchestrating the execution of stream programs on multicore platforms, Proceedings of the 2008 ACM SIGPLAN conference on Programming language design and implementation, PLDI '08, pp.114-124, 2008.

C. Kyriacou, P. Evripidou, and P. Trancoso, Data-Driven Multithreading Using Conventional Microprocessors, IEEE Transactions on Parallel and Distributed Systems, vol.17, issue.10, pp.1176-1188, 2006.
DOI : 10.1109/TPDS.2006.136

L. Lamport, How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs, IEEE Transactions on Computers, vol.28, issue.9, pp.690-691, 1979.

E. A. Lee and D. G. Messerschmitt, Static Scheduling of Synchronous Data Flow Programs for Digital Signal Processing, IEEE Transactions on Computers, vol.36, issue.1, pp.24-35, 1987.
DOI : 10.1109/TC.1987.5009446

E. A. Lee and A. Sangiovanni-Vincentelli, A framework for comparing models of computation, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol.17, issue.12, pp.1217-1229, 1998.
DOI : 10.1109/43.736561

F. Li, A. Pop, and A. Cohen, Advances in Parallel-Stage Decoupled Software Pipelining, Proceedings of the Workshop on Intermediate Representations, pp.29-36, 2011.
URL : https://hal.archives-ouvertes.fr/hal-00870687

V. Marjanovic, J. Labarta, E. Ayguadé, and M. Valero, Effective communication and computation overlap with hybrid MPI/SMPSs, PPOPP, 2010.

D. Millot, A. Muller, C. Parrot, and F. Silber-Chaussumier, STEP: A Distributed OpenMP for Coarse-Grain Parallelism Tool, OpenMP in a New Era of Parallelism, pp.83-99, 2008.
DOI : 10.1007/978-3-540-79561-2_8
URL : https://hal.archives-ouvertes.fr/hal-01373120

C. Miranda, A. Pop, P. Dumont, A. Cohen, and M. Duranton, Erbium, Proceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems, CASES '10, pp.11-20, 2010.
DOI : 10.1145/1878921.1878924
URL : https://hal.archives-ouvertes.fr/inria-00551510

S. Owens, S. Sarkar, and P. Sewell, A Better x86 Memory Model: x86-TSO, Theorem Proving in Higher Order Logics, pp.391-407, 2009.
DOI : 10.1007/11817963_46

V. Pankratius, A. Jannesari, and W. F. Tichy, Parallelizing Bzip2: A Case Study in Multicore Software Engineering, IEEE Software, vol.26, issue.6, pp.70-77, 2009.
DOI : 10.1109/MS.2009.183

T. M. Parks, Bounded scheduling of process networks, Ph.D. thesis, University of California, Berkeley, UMI Order No. 96-21312, 1995.

J. Planas, R. M. Badia, E. Ayguadé, and J. Labarta, Hierarchical Task-Based Programming With StarSs, International Journal of High Performance Computing Applications, vol.23, issue.3, pp.284-299, 2009.
DOI : 10.1177/1094342009106195

A. Pop and A. Cohen, A Stream-Computing Extension to OpenMP, International Workshop on OpenMP (IWOMP'10), 2010.
URL : https://hal.archives-ouvertes.fr/inria-00551507

A. Pop and A. Cohen, Preserving high-level semantics of parallel programming annotations through the compilation flow of optimizing compilers, Proceedings of the 15th Workshop on Compilers for Parallel Computers (CPC'10), 2010.
URL : https://hal.archives-ouvertes.fr/inria-00551518

A. Pop and A. Cohen, A stream-computing extension to OpenMP, Proceedings of the 6th International Conference on High Performance and Embedded Architectures and Compilers, HiPEAC '11, pp.5-14, 2011.
DOI : 10.1145/1944862.1944867
URL : https://hal.archives-ouvertes.fr/hal-00659411

A. Pop and S. Pop, A proposal for lastprivate clause on OpenMP task pragma, MINES ParisTech, CRI - Centre de Recherche en Informatique, Mathématiques et Systèmes, 35 rue Saint-Honoré, 77305 Fontainebleau, 2009.

A. Pop, S. Pop, H. Jagasia, J. Sjödin, and P. H. Kelly, Improving GNU compiler collection infrastructure for streamization, Proceedings of the 2008 GCC Developers' Summit, pp.77-86, 2008.
URL : https://hal.archives-ouvertes.fr/hal-00817445

A. Pop, S. Pop, and J. Sjödin, Automatic streamization in GCC, Proceedings of the 2009 GCC Developers' Summit, 2009.
URL : https://hal.archives-ouvertes.fr/hal-00817455

R. Dolbeau, S. Bihan, and F. Bodin, HMPP: A Hybrid Multi-core Parallel Programming Environment, Workshop on General Purpose Processing on Graphics Processing Units, 2007.

R. Ramanathan, Intel multi-core processors: Making the move to quad-core and beyond, Technology@Intel Magazine, vol.4, issue.1, pp.2-4, 2006.

R. Rangan, N. Vachharajani, M. Vachharajani, and D. August, Decoupled software pipelining with the synchronization array, Proceedings of the 13th International Conference on Parallel Architecture and Compilation Techniques, PACT 2004, 2004.
DOI : 10.1109/PACT.2004.1342552

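S. Sarkar, P. Sewell, J. Alglave, L. Maranget, and D. Williams, Understanding POWER multiprocessors, Proceedings of the 32nd ACM SIGPLAN conference on Programming language design and implementation, PLDI '11, 2011.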
URL : https://hal.archives-ouvertes.fr/hal-01100824

S. Sarkar, P. Sewell, F. Z. Nardelli, S. Owens, T. Ridge et al., The semantics of x86-CC multiprocessor machine code, Proceedings of the 36th annual ACM SIGPLAN-SIGACT symposium on Principles of programming languages, POPL '09, pp.379-391, 2009.

SPARC International, Inc., The SPARC Architecture Manual (Version 9), 1994.

K. Stavrou, M. Nikolaides, D. Pavlou, S. Arandi, P. Evripidou et al., TFlux: A Portable Platform for Data-Driven Multithreading on Commodity Multicore Systems, 2008 37th International Conference on Parallel Processing, pp.25-34, 2008.
DOI : 10.1109/ICPP.2008.74

R. Stephens, A survey of stream processing, Acta Informatica, vol.34, issue.7, pp.491-541, 1997.
DOI : 10.1007/s002360050095

I. Watson and J. R. Gurd, A Practical Data Flow Computer, Computer, vol.15, issue.2, pp.51-57, 1982.
DOI : 10.1109/MC.1982.1653941
