
Performance anxiety: the end of software’s free ride

The factors fuelling advances in computer hardware are drying up. Yoshikazu Tsuno/AFP

We are consumers of software that is ever more capable, diverse and clever.

Our Google queries, our Facebook experience, our ability to play HD movies on our iPads, and the convenience of reading emails on our phones, all depend on computing power that we don’t see and don’t usually give a second thought to.

This progress is quietly driven by improvements in computer hardware. Many of us find it unremarkable that a $600 iPad can outperform the Cray 2, which not so long ago was the fastest computer on Earth.

Unfortunately the source of these endless performance improvements is drying up, and the free ride so long enjoyed by software developers is in jeopardy. Worse, this is occurring at a time when software has become more important than ever.

For decades, hardware advances were fuelled by two givens: Moore’s Law and Dennard Scaling.

Moore’s Law says that the number of transistors on a chip roughly doubles every two years as advances in device physics yield smaller transistors.

Dennard Scaling is less well known but no less significant – it states that as a transistor shrinks, both switching time (the time needed for a transistor to go from a non-conducting state to a conducting state) and power consumption will fall proportionately.

Together these tell us that we should expect transistors to get smaller, faster and more power efficient with every technology generation.
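As a rough sketch of what that combination promised (the scaling factor and the idealised Dennard rules below are textbook approximations, not a model of any real manufacturing process), each technology generation looked something like this:

```python
# Idealised sketch of Moore's Law plus classical Dennard Scaling.
# Assumes each generation shrinks linear dimensions by a factor k;
# real process generations only approximated these numbers.

def next_generation(transistors, delay, power_per_transistor, k=1.4):
    """Project one technology generation under idealised Dennard Scaling."""
    return {
        "transistors": transistors * k**2,                     # roughly 2x more fit in the same area
        "delay": delay / k,                                     # each transistor switches faster
        "power_per_transistor": power_per_transistor / k**2,    # and each uses less power
    }

gen = {"transistors": 1e9, "delay": 1.0, "power_per_transistor": 1.0}
for _ in range(3):
    gen = next_generation(**gen)
    print(gen)

# Total chip power (transistors x power_per_transistor) stays roughly
# constant even as transistor count and speed grow -- the "free ride".
```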

For many years this was true – hardware simply got faster, delivering performance improvements that led to all of our software running faster, as if by magic.

Multicore

Unfortunately, physics got in the way in the end. Wire delay (the time it takes a signal to propagate along a length of microscopic wire) became a limiting factor.

While for many years a signal could traverse the entire chip at each tick of the computer’s clock, today only a tiny fraction of the chip is accessible in the time it takes the clock to tick. This is because today’s clocks run faster and today’s on-chip wires are so small that signals propagate more slowly.
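A back-of-envelope calculation makes the problem concrete (the 3GHz clock and the die size mentioned below are assumptions chosen purely for illustration):

```python
# Back-of-envelope: how far can a signal travel in one clock tick?
# The 3 GHz clock is an illustrative assumption.

clock_hz = 3e9                      # assumed clock frequency
tick_s = 1.0 / clock_hz             # ~0.33 nanoseconds per tick

speed_of_light_m_per_s = 3e8
vacuum_distance_mm = speed_of_light_m_per_s * tick_s * 1000
print(f"Light in vacuum covers ~{vacuum_distance_mm:.0f} mm per tick")  # ~100 mm

# On-chip signals travel far slower than light because of the resistance
# and capacitance of ever-thinner wires (RC delay), so only a fraction of
# a die roughly 20 mm across is reachable within a single tick.
```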

To combat this problem, hardware manufacturers turned to multicore designs (a single computing component with two or more independent processors, or “cores”).

Rather than using the surfeit of transistors to make the chip’s processor ever larger and more capable, they put multiple cores on each chip.

But just as two cars are unlikely to get you to work faster than one, the addition of another core is often unhelpful in completing a computing problem more quickly.

This observation was made famous by Gene Amdahl back in 1967, when he coined what we now know as Amdahl’s Law: the speed-up gained by using multiple processors is limited by the time needed to complete the portion of the program that cannot be made parallel (i.e. spread across multiple cores).
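In formula terms, if a fraction p of a program can be spread across n cores, the best possible speed-up is 1 / ((1 − p) + p/n). A minimal sketch, using an assumed 95% parallel fraction:

```python
# Amdahl's Law: speed-up from n cores when only a fraction p of the
# program can be parallelised.

def amdahl_speedup(p, n):
    """Best-case speed-up with parallel fraction p on n cores."""
    return 1.0 / ((1.0 - p) + p / n)

# Even with 95% of the work parallelisable, the serial 5% caps the gain:
for n in (2, 4, 16, 1024):
    print(n, "cores:", round(amdahl_speedup(0.95, n), 1))
# 2 cores: 1.9, 4 cores: 3.5, 16 cores: 9.1, 1024 cores: 19.6
```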

What this means is that today’s software developers have a major challenge on their hands.

Hardware advances, which were once delivered as transparent performance improvements (a faster car) now increasingly come in the form of hardware parallelism (two cars). The former meant existing programs ran faster as if by magic. The latter is only helpful for particular classes of problem (moving a football team, perhaps).

When this situation became clear in 2007, Stanford University President John Hennessy said:

“When we start talking about parallelism and ease of use of truly parallel computers, we’re talking about a problem that’s as hard as any that computer science has faced … I would be panicked if I were in industry.”

Unfortunately things are set to get worse. Multicore hardware is just the first of three seismic changes that herald the end to software’s free ride.

Heterogeneity

Today’s multicore designs comprise a relatively straightforward combination of orthodox processor cores.

But acknowledging Amdahl’s Law, many designers now believe that we need a more complex combination of simple and powerful cores on each chip.

The portions of a task that do exhibit parallelism can be efficiently solved by many simple cores.

How so? Well, consider the problem of moving a large number of commuters across Manhattan: thousands of unsophisticated yellow taxis would be perfect for the job.

But those portions of the task that lack parallelism still require a large, capable core in order to be solved quickly.

In other words, consider the problem of getting a person to the moon: one very sophisticated Saturn V rocket would be appropriate to the task.

A heterogeneous central processing unit (CPU) – often referred to as “the brain” of computers – may offer both the taxis and the rocket, side by side.
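A toy model (every core count and speed below is invented for illustration) shows why such a mix can pay off:

```python
# Toy model of the taxis-and-rocket idea: run the serial portion of a job
# on one powerful core and the parallel portion on many simple cores.
# All speeds, counts and work units are invented for illustration.

def runtime(serial_work, parallel_work, big_speed, small_speed, n_small):
    serial_time = serial_work / big_speed                      # the "rocket"
    parallel_time = parallel_work / (small_speed * n_small)    # the "taxis"
    return serial_time + parallel_time

job = dict(serial_work=100, parallel_work=900)

# Homogeneous: 4 identical mid-sized cores (speed 2 each).
print(runtime(**job, big_speed=2, small_speed=2, n_small=4))   # 162.5

# Heterogeneous: 1 fast core (speed 4) plus 16 simple cores (speed 1 each),
# plausibly within a similar transistor and energy budget.
print(runtime(**job, big_speed=4, small_speed=1, n_small=16))  # 81.25
```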

Unfortunately, heterogeneity takes us even further from the world of transparent performance improvements.

This second major change in computer hardware means that software must now not only exhibit parallelism, but must also be capable of somehow effectively utilising complex, non-uniform hardware resources.

Customisation and energy

But it’s another major change that’s set to be most disruptive to computer science. Although Moore’s Law continues to deliver us transistors, Dennard Scaling is coming to an end.

In practice, power densities on chip have become so high that we can no longer fully power an entire chip lest we melt the silicon. This radically changes the economics of microarchitecture.

For the past 40 years, a relative scarcity of transistors led to a mantra of generality: when transistors are scarce but energy is plentiful, customisation is an unjustifiable luxury, so each design must be as general as possible.

That mantra of generality, ingrained in the minds of generations of designers, needs a radical rethink as we become energy constrained.

This means that, as energy becomes the dominant concern, we must turn to custom chip designs.

This flies in the face of orthodox hardware design and has the potential to enormously complicate the task for software designers who must efficiently harness a large, complex, non-uniform set of computing resources.

As if this were not enough, programmers, trained for decades to obsess over performance, now have an entirely new focus: energy.

To complicate matters further, programmers are not only untrained in optimising for energy; there are also few tools to help them do so.
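A simple illustration of why the two goals differ: dynamic power grows much faster than clock frequency (roughly with voltage squared times frequency, and voltage tends to rise with frequency), so finishing the same work faster can cost more energy overall. All numbers below are assumed, and static leakage power is ignored:

```python
# Illustrative trade-off between speed and energy (all numbers assumed,
# static/leakage power ignored). Dynamic power scales roughly with
# voltage^2 * frequency, and voltage tends to rise with frequency,
# so power grows much faster than performance.

def dynamic_energy(frequency_ghz, work_cycles=1e9):
    voltage = 0.6 + 0.2 * frequency_ghz         # assumed voltage/frequency curve
    power_w = 2.0 * voltage**2 * frequency_ghz  # assumed switching-capacitance constant
    time_s = work_cycles / (frequency_ghz * 1e9)
    return power_w * time_s                     # joules for the same amount of work

for f in (1.0, 2.0, 3.0):
    print(f"{f} GHz: {dynamic_energy(f):.2f} J")
# 1.0 GHz: 1.28 J, 2.0 GHz: 2.00 J, 3.0 GHz: 2.88 J -- same work, more energy.
```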

Thus software developers suddenly find themselves having to:

1) adapt to parallel hardware
2) adapt to heterogeneous hardware
3) understand and optimise for energy rather than performance.

These are enormous challenges, and it will be fascinating to see how the software industry adapts.

Where to next?

The human capacity for innovation is breathtaking. A case in point is Intel’s announcement earlier this year that its 3D tri-gate transistor is ready for commercial use after about ten years of development.

At a time when we thought there was little room to move in transistor design, a deceptively simple idea has changed the way we build the most fundamental element of computing technology.

This promises great improvements to performance and power consumption – which is particularly important for mobile devices.

The computing industry is extremely competitive, so Intel’s competitors will be hard at work developing competing technologies.

We find ourselves at a point of enormous change. The foundations of the computing landscape are radically shifting at a time when our appetite for software is growing faster than ever.

It’s hard to imagine where these trajectories will take us, but for computer science researchers the challenges are both imposing and exciting.
