Making your program faster and greener

Jeremy Bennett, Embecosm

Abstract

Compilers take computer programs and translate them to the binary machine code processors actually run. Two of the most widely used compilers are completely free and open source: GCC and LLVM. In this article we look at two recent industrial research projects supported by Innovate UK, the government’s innovation agency, which advance the state of the art with these compilers.

The MAGEEC project, a joint project between Embecosm and Bristol University, is an open source machine learning framework for any compiler, which allows the compiler to be trained to generate more energy efficient code. A side-benefit is that energy efficient code turns out to be much faster code.

Superoptimization as a technique to achieve the ultimate in compiled performance has been around in academic circles for nearly 30 years.

During the summer of 2014, Innovate UK funded a feasibility study to see whether any of these techniques were commercially viable. The good news is that some techniques could now, or with a modest amount of further industrial R&D, offer exceptional benefit for real-world software. And once again the software is open source.

MAchine Guided Energy Efficient Compilation (MAGEEC)

A study carried out by James Pallister at Bristol University and funded by the UK compiler development company Embecosm in summer 2012 found that choice of compiler optimization flags had a major effect on the energy consumed by the compiled program. The bad news was that the options to be used varied from architecture to architecture and program to program [1].

The MAGEEC project was funded by the Technology Strategy Board (now Innovate UK) under its Energy Efficient Computing initiative, to develop a machine learning based compiler infrastructure capable of optimizing for energy. Running from June 2013 to November 2014, it was a joint feasibility study between Embecosm and Bristol University, to develop a machine learning compiler infrastructure that could optimize for energy. Key criteria were that the infrastructure should be generic, it should optimize for energy, that it should be based on real energy measurements, not models and that it should create a fully working system. The entire project was free and open source.

MAGEEC needs just two interfaces to a compiler, which are typically provided by the compiler plugin interface.

First it needs to know the characteristics of the source code. This sort of language analysis is exactly what the front of the compiler does, so use it to output this information. MAGEEC defines the set of features we are interested in, and the compiler plugin must provide this information. Information like the number of variables in a function, the number of basic blocks and son on. At present we have 55 features defined, and any program can be reduced to a collection of values – one for each of these. The machine learning system will train against this feature vector.

The second interface we need is a way to control which optimization passes the compiler runs. While at the command level, these correspond to optimization flags, at the programming level the compiler has a pass manager, which decides which pass should run. MAGEEC uses a plugin interface to the pass manager to override which pass should run.

MAGEEC

MAGEEC operates in two modes. In training mode, it collects data that will be used to drive the machine learning system. A very large number of data is needed from a large and representative set of programs, which are then run with a wide range of different compiler optimizations to measure how the optimizations affect energy. A key challenge was the lack of free and open source benchmarks suitable for embedded systems, so we created the Bristol/Embecosm Benchmark Suite (BEEBS), which comprises 80 programs with minimal system dependencies. It is freely available from www.beebs.eu.

Selecting which optimizations to use when building up the training data is a huge challenge. GCC for example has over 200 optimizations, many of which affect each other, so we can’t just try each one in isolation. Yet to run 2²⁰⁰ tests would take millions of years, so we turned to Experimental Design theory. MAGEEC is one of the first computer applications to make use of fractional factorial design to optimize the number of measurements to run. This is combined with Plackett-Burman design to identify the most important optimizations to evaluate, reducing the number of runs to a few hundred, which can be completed in a few hours.

The output of all these tests is a large data set, from which we can construct a database correlating source code with the best optimizations to use for energy efficiency. MAGEEC is deliberately independent of any particular machine learning algorithm. By default MAGEEC uses a C5.0 Decision Tree, but others in our team have explored the use of genetic algorithms and inductive logic programming.

With the machine learning database, MAGEEC can be switched to its default mode. In this mode, when supplied with a program, the pass manager plugin will look into the machine learning database to determine which optimization passes to run.

We are in the early stages of evaluating MAGEEC, but initial results using a small training subset with the Cortex M0 processor have shown we can get a 3–7% improvement in energy usage over the best possible today. However with a fully trained dataset our objective is a 40% reduction in energy usage.

There are a number of side benefits that have arisen out of this project. Energy is reasonably well correlated with performance, so energy efficient programs typically run much faster than can be achieved with standard optimizations. A second effect is that the pass manager is typically controlling compilation of individual functions, each of which gets its own optimization settings. This fine-grained optimization adds a further improvement to the code generated.

Superoptimization

We are all used to compilers “optimizing” code. However the code is not truly optimized, in the sense that is the best possible sequence of instructions to perform the desired functionality. Superoptimization works by searching all possible sequences of instructions to find the best possible sequence. The following example shows how a simple C sign function is compiled by a standard compiler and then by a superoptimizer.

MAGEEC code sample

In each case the value is supplied in register d0 and returned in register d1. The superoptimizer code is not at all obvious—although three of the instructions use the carry (x) bit. As an exercise look at the values in d0, d1 and x when the input value is a negative number, a positive number and zero.

This is typical of superoptimizers finding code sequences that no human coder is likely to see. The technique is old, introduced by Henry Massalin in 1997 [2], but has always suffered from the immense computational requirements of the huge search space, and so has remained an academic speciality.

Much research in the intervening 27 years has gone into improving search efficiency. During the summer of 2014, James Pallister of Embecosm and Bristol University spent 4 months funded by Innovate UK investigating whether any of these techniques are now feasible for commercial deployment.

The good news is that the advances in compute power, combined with improvements in the search algorithms mean some types of superoptimization can now be considered for commercial deployment. One of the oldest superoptimization tools is the free and open source GNU Superoptimizer (GSO). This tool can typically optimize sequences up to around 7 instructions on a modern PC. This may not sound much, but if that code sequence is the key inner loop at the heart of a major piece of software, the returns are more than justified.

More generally GSO can be used as an amanuensis for the compiler and library writer, identifying architecture specific peephole optimizations, and efficient implementations of key library functions.

The study also found that some techniques will be commercially feasible with a relatively modest investment in further industrial research. Perhaps most exciting is the use of Monte Carlo methods to explore the search space in the technique known as stochastic superoptimization. This can find much longer code sequences, and remarkably those code sequences may represent completely new algorithms. In other words code sequences that could never be found by a conventional compiler’s stepwise refinement of code. In a paper on the subject [3] a team from Stanford used this approach to discover a completely new implementation of the Montgomery multiplication benchmark.

Embecosm is now working with James Pallister on a new version of the GNU Superoptimizer that will add stochastic superoptimization.

Summary

Some of the leading work on compiler optimization is being carried out in the UK, with contributions from both industry and academia. The output from this work is all open source, meaning the technology can be quickly and widely adopted.

The work from both projects has been presented at meetings of the BCS Open Source Specialist Group, with videos of many of the talks available for download. Its meetings, many held in conjunction with the UK Open Source Hardware User Group, are free to all, whether BCS members or not.

References

1 James Pallister, Simon J Hollis and Jeremy Bennett, Identifying Compiler Options to Minimize Energy Consumption for Embedded Platforms. The Computer Journal 2013, published online 11 Nov 2013, doi:10.1093/comjnl/bxt129.

2 Henry Massalin. Superoptimizer: A Look at the Smallest Program. Second International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS II), pages 122–126, 1987.

3 Eric Schkufza, Rahul Sharma, Alex Aiken, Stochastic Superoptimization. ASPLOS ’13, March 16-20 2013, Houston, Texas, USA.

About the Author

Dr Jeremy Bennett is Chief Executive of Embecosm, a UK software consultancy specializing in open source compiler development and silicon chip modeling. He is author of the standard textbook “Introduction to Compiling Techniques” (McGraw-Hill 1990, 1995, 2003). Dr Bennett holds a MA and PhD from Cambridge University and joined the BCS in 1983. Contact him at jeremy.bennett@embecosm.com.

This paper was prepared for the BCS publication “Digital Leaders: Plan for the Future”.

This work is licensed under the Creative Commons Attribution 2.0 UK: England & Wales License. To view a copy of this license, visit creativecommons.org/licenses/by/2.0/uk/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.

This license means you are free:

to copy, distribute, display, and perform the work
to make derivative works

under the following conditions:

Attribution. You must give the original author, Jeremy Bennett, credit;
For any reuse or distribution, you must make clear to others the license terms of this work;
Any of these conditions can be waived if you get permission from the copyright holder, Embecosm; and
Nothing in this license impairs or restricts the author’s moral rights.

Open Source Specialist Group

A Specialist Group of the British Computer Society