```cpp
  Mat new_speed_up, new_speed_up_inv;
  selectPointsToRecalcFlow(flow, averaging_radius, (float) speed_up_thr,
                           curr_rows, curr_cols, speed_up, new_speed_up, mask);
  selectPointsToRecalcFlow(flow_inv, averaging_radius, (float) speed_up_thr,
                           curr_rows, curr_cols, speed_up_inv, new_speed_up_inv, mask_inv);
  // ...
  flow_inv = upscaleOpticalFlow(curr_rows, curr_cols, prev_to, confidence_inv, flow_inv,
                                upscale_averaging_radius, (float) upscale_sigma_dist,
                                (float) upscale_sigma_color);
  // ...
  calcOpticalFlowSingleScaleSF(curr_from_extended, curr_to_extended, mask, flow,
                               averaging_radius, max_flow, (float) sigma_dist, (float) sigma_color);
  calcConfidence(curr_to, curr_from, flow_inv, confidence_inv, max_flow);
  calcOpticalFlowSingleScaleSF(curr_to_extended, curr_from_extended, mask_inv, flow_inv,
                               averaging_radius, max_flow, (float) sigma_dist, (float) sigma_color);
  extrapolateFlow(flow, speed_up);
  extrapolateFlow(flow_inv, speed_up_inv);
  // should we remove occlusions for the last stage?
  removeOcclusions(flow, flow_inv, (float) occ_thr, confidence);
  removeOcclusions(flow_inv, flow, (float) occ_thr, confidence_inv);
  // ...
  crossBilateralFilter(flow, curr_from, confidence, flow, postprocess_window,
                       (float) sigma_color_fix, (float) sigma_dist_fix);
  // ... (the output flow is a CV_32FC2 matrix)
  Mat resulted_flow = _resulted_flow.getMat();
  // ...
  mixChannels(&flow, 1, &resulted_flow, 1, from_to, 2);
}

CV_EXPORTS_W void calcOpticalFlowSF(InputArray from, InputArray to,
                                    OutputArray flow, int layers,
                                    int averaging_block_size, int max_flow) {
  orig_calcOpticalFlowSF(from, to, flow, layers, averaging_block_size, max_flow
                         /* , ... remaining arguments lost in extraction */);
}
```
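The listing pairs a forward estimate (flow) with a backward one (flow_inv) and calls removeOcclusions(flow_inv, flow, (float) occ_thr, confidence_inv) and its mirror to derive confidence maps. As an illustration only, the following OpenCV-free sketch implements a classic forward/backward consistency check: a pixel is considered reliable when following its forward vector, then the backward vector at the landing pixel, brings it back close to where it started. The FlowField type and the consistencyConfidence function are hypothetical, the nearest-pixel rounding and the squared-distance threshold are assumptions, and OpenCV's removeOcclusions may compare the two fields differently (for example at the same pixel rather than at the displaced one).

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical flow field: one (dx, dy) vector per pixel, stored row-major.
struct FlowField {
  int rows;
  int cols;
  std::vector<float> dx;
  std::vector<float> dy;
};

// Forward/backward consistency check (sketch, not OpenCV's implementation).
// Assumes fwd and bwd have the same size. For each pixel p, jump to
// q = p + f(p) (rounded to the nearest pixel) and test whether the backward
// vector b(q) cancels f(p): |f(p) + b(q)|^2 <= occ_thr.
// Returns 1.0f for consistent pixels and 0.0f for likely occlusions.
std::vector<float> consistencyConfidence(const FlowField& fwd,
                                         const FlowField& bwd,
                                         float occ_thr) {
  std::vector<float> confidence(static_cast<std::size_t>(fwd.rows) * fwd.cols, 0.0f);
  for (int r = 0; r < fwd.rows; ++r) {
    for (int c = 0; c < fwd.cols; ++c) {
      const std::size_t i = static_cast<std::size_t>(r) * fwd.cols + c;
      const int qr = r + static_cast<int>(std::lround(fwd.dy[i]));
      const int qc = c + static_cast<int>(std::lround(fwd.dx[i]));
      if (qr < 0 || qr >= bwd.rows || qc < 0 || qc >= bwd.cols) {
        continue;  // the vector leaves the image: keep confidence at 0
      }
      const std::size_t j = static_cast<std::size_t>(qr) * bwd.cols + qc;
      const float ex = fwd.dx[i] + bwd.dx[j];
      const float ey = fwd.dy[i] + bwd.dy[j];
      confidence[i] = (ex * ex + ey * ey <= occ_thr) ? 1.0f : 0.0f;
    }
  }
  return confidence;
}
```

Rounding to the nearest pixel keeps the sketch short; a more careful version would likely interpolate the backward field bilinearly at the non-integer landing position.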
Evolution of the number of publications indexed by Google Scholar for the keywords GPU and GPGPU
Popularized illustration comparing CPU and GPU architectures. Source: Nvidia
Layout of the sensors embedded in the Model S car
Evolution of the peak performance of different architectures over time. The top plot shows computational performance, the bottom plot memory bandwidth
Overall view of the Nvidia Pascal GP104 architecture used in the GTX 1080
High-level view of the methodology for mapping algorithms onto hybrid CPU/GPU architectures
Details of the static code analysis phase
Spinal representation of the removeOcclusions function
Spinal representation of the removeOcclusions function
Details of the dynamic code analysis phase
Loop nest transformations for architectures
Excerpt of the spinal representation of the crossBilateralFilter function (1/2)
Excerpt of the spinal representation of the crossBilateralFilter function (2/2)
Source code generation for the host and for a GPU-type accelerator
Spinal representation of the calcIrregularityMat function, where blocks b1 and b2 prevent the loops from being perfectly nested
Moving embedded blocks in the calcIrregularityMat function (exclusion method)
Moving embedded blocks in the calcIrregularityMat function (inclusion method)
Moving inter-loop blocks in the calcIrregularityMat function (inclusion and synchronization method)
View of an SMX cluster of the first-generation Nvidia Kepler architecture used in the Quadro K2000
View of an SMM cluster of the second-generation Nvidia Maxwell architecture used in the Tegra X1
Execution of the original simpleFlow algorithm
Execution of the original simpleFlow algorithm
Initial mapping of the Simpleflow algorithm onto the GPU of the Jetson TX1
Initial mapping of the Simpleflow algorithm onto the GPU of Endicott
Excerpt of the spinal representation of the crossBilateralFilter function (1/2)
Excerpt of the spinal representation of the crossBilateralFilter function (2/2)
Excerpt of the spinal representation of the calcOpticalFlowSingleScaleSF function (1/2)
Excerpt of the spinal representation of the calcOpticalFlowSingleScaleSF function (2/2)
Increasing the amount of code mapped onto the GPU of the Jetson
Execution time of the local variance algorithm as a function of neighborhood size
Average read access time for a cyclic distribution of memory accesses on Nvidia Quadro K2000. Access function: R1
Average read access time for a block distribution of memory accesses on Nvidia Quadro K2000. Access function: R2
Average read access time for a cyclic distribution of memory accesses on Nvidia Quadro K2000. Access function: R1
Average read access time for a block distribution of memory accesses on Nvidia Quadro K2000. Access function: R2
Average read access time for a cyclic distribution of memory accesses on Nvidia TX1. Access function: R1. Reference frame: Block
Average read access time for a block distribution of memory accesses on Nvidia TX1. Access function: R2. Reference frame: Block
Average read access time for a cyclic distribution of memory accesses on Nvidia TX1. Access function: R1. Reference frame: Warp
Average read access time for a block distribution of memory accesses on Nvidia Quadro K2000. Access function: R2
Analysis of intra-GPU kernel concurrency on Nvidia architecture
Spinal representation of the simpleflow program (1/18)
Spinal representation of the simpleflow program (2/18)
Spinal representation of the simpleflow program (3/18)
Spinal representation of the simpleflow program (4/18)
Spinal representation of the simpleflow program (5/18)
Spinal representation of the simpleflow program (6/18)
Spinal representation of the simpleflow program (7/18)
Spinal representation of the simpleflow program (8/18)
Spinal representation of the simpleflow program (9/18)
Spinal representation of the simpleflow program (10/18)
Spinal representation of the simpleflow program (11/18)
Spinal representation of the simpleflow program (12/18)
Spinal representation of the simpleflow program (13/18)
Spinal representation of the simpleflow program (14/18)
Spinal representation of the simpleflow program (15/18)
Spinal representation of the simpleflow program (16/18)
Spinal representation of the simpleflow program (17/18)
Spinal representation of the simpleflow program (18/18)
Summary table of mapping solutions for
Summary table of the experimental architectures used
Results of the thread concurrency experiment
Execution time of the original simpleflow algorithm on the Tegra X1
Execution time of the simpleflow algorithm after its initial mapping onto the GPU of the Tegra X1
Execution time of the simpleflow algorithm after increasing the amount of code mapped onto the GPU of the Tegra X1
Execution time of the original simpleflow algorithm on Endicott
Execution time of the simpleflow algorithm after its initial mapping onto the GPU of Endicott
Execution time of the simpleflow algorithm after increasing the amount of code mapped onto the GPU of Endicott
List of listings
Code excerpt from the simpleflow algorithm
Kernel template used to evaluate coarse-grained parallelism on GPU
HMPP: Hybrid Multicore Parallel Programming
HPC: High Performance Computing
IGP: Integrated Graphics Processor
ILP: Instruction Level Parallelism
Intel GMA: Intel Graphics Media Accelerator
Intel IPL: Intel Image Processing Library
Intel IPP: Intel Integrated Performance Primitives
IR: Internal Representation
ISA: Instruction Set Architecture
ISL: Integer Set Library
JIT: Just In Time
LIDAR: LIght Detection And Ranging
MIMD: Multiple Instructions on Multiple Data
MMX: MultiMedia eXtension
MPI: Message Passing Interface
MPPA: Multi-Purpose Processor Array
MSI: Modified Shared Invalid
NASA: National Aeronautics and Space Administration
NPP: Nvidia Performance Primitives
NUMA: Non Uniform Memory Access
NVPTX: NVidia Parallel Thread eXecution
OpenACC: Open ACCelerators
OpenCL: Open Computing Language
OpenCLIPP: OpenCL Image Processing Primitives
OpenCV: Open Source Computer Vision
OpenGL: Open Graphics Library
OpenGL ES: OpenGL for Embedded Systems
OpenGL SC: OpenGL for Safety Critical applications
OpenMP: Open Multi-Processing
OS: Operating System
PCIe: Peripheral Component Interconnect express
PET: Polyhedral Extraction Tool
PGCD: Plus Grand Commun Diviseur (greatest common divisor)
PIPS: Programming Integrated Parallel System
PPCG: Polyhedral Parallel Code Generator