Piecewise Holistic Autotuning of Compiler and Runtime Parameters

Abstract: The complexity of current architectures requires fine-tuning of compiler and runtime parameters to reach full performance potential. Autotuning substantially improves on default parameters in many scenarios, but it is a costly process requiring a long iterative evaluation. We propose an automatic piecewise autotuner based on CERE (Codelet Extractor and REplayer). CERE decomposes applications into small pieces called codelets: each codelet maps to a loop or to an OpenMP parallel region and can be replayed as a standalone program. Codelet autotuning achieves better speedups at a lower tuning cost. By grouping codelet invocations with the same performance behavior, CERE reduces the number of loops or OpenMP regions to be evaluated. Moreover, unlike whole-program tuning, CERE customizes the set of best parameters for each specific OpenMP region or loop. We demonstrate CERE tuning of compiler optimizations, number of threads, and thread affinity on a NUMA architecture. On average over the NAS 3.0 benchmarks, we achieve a speedup of 1.08x after tuning. Tuning a single codelet is 13x cheaper than whole-program evaluation and estimates the tuning impact on the original region with 94.7% accuracy. On a Reverse Time Migration (RTM) proto-application, we achieve a 1.11x speedup with a 200x cheaper exploration.
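
The difference between whole-program and piecewise tuning described in the abstract can be illustrated with a minimal Python sketch. It is not CERE's implementation or interface: the codelet names, configuration labels, and timings are hypothetical, and it assumes that region times are independent and simply add up.

```python
# Hypothetical replay timings in seconds: timings[codelet][configuration].
# Names and numbers are illustrative only; they are not measurements from the paper.
timings = {
    "loop_a":   {"O2/4 threads": 4.0, "O3/4 threads": 3.0, "O3/8 threads": 3.5},
    "loop_b":   {"O2/4 threads": 2.0, "O3/4 threads": 2.5, "O3/8 threads": 1.5},
    "region_c": {"O2/4 threads": 1.0, "O3/4 threads": 1.2, "O3/8 threads": 1.4},
}


def best_global_config(timings):
    """Whole-program tuning: pick one configuration shared by every codelet."""
    configs = next(iter(timings.values())).keys()
    totals = {
        cfg: sum(per_codelet[cfg] for per_codelet in timings.values())
        for cfg in configs
    }
    best = min(totals, key=totals.get)
    return best, totals[best]


def best_piecewise_configs(timings):
    """Piecewise tuning: pick the fastest configuration for each codelet."""
    choice = {name: min(t, key=t.get) for name, t in timings.items()}
    total = sum(timings[name][cfg] for name, cfg in choice.items())
    return choice, total


global_cfg, global_time = best_global_config(timings)
piecewise_cfg, piecewise_time = best_piecewise_configs(timings)
print(f"whole-program: {global_cfg} -> {global_time:.1f} s")
print(f"piecewise:     {piecewise_cfg} -> {piecewise_time:.1f} s")
```

Under the additivity assumption, the piecewise total can never exceed the best whole-program total, because the single shared configuration is one of the candidates available to every codelet.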
Document type:
Conference paper
Springer. Euro-Par 2016 Parallel Processing - 22nd International Conference, Aug 2016, Grenoble, France. 238-250, Euro-Par 2016 Parallel Processing - 22nd International Conference, 9833, Lecture Notes in Computer Science. 〈10.1007/978-3-319-43659-3_18〉

https://hal-uvsq.archives-ouvertes.fr/hal-01417211
Contributor: Pablo De Oliveira Castro
Submitted on: Thursday, December 15, 2016 - 14:01:45
Last modified on: Tuesday, March 6, 2018 - 12:16:01

Citation

Mihail Popov, Chadi Akel, William Jalby, Pablo De Oliveira Castro. Piecewise Holistic Autotuning of Compiler and Runtime Parameters. Springer. Euro-Par 2016 Parallel Processing - 22nd International Conference, Aug 2016, Grenoble, France. 238-250, Euro-Par 2016 Parallel Processing - 22nd International Conference, 9833, Lecture Notes in Computer Science. 〈10.1007/978-3-319-43659-3_18〉. 〈hal-01417211〉
