It’s not the flop count
But the data location;
The paradigm shifts.
Until recently, the analysis of algorithms emphasized finding the minimum number of operations required to complete a task to a given precision – the algorithmic complexity. This is natural when the operations are both the most time-consuming steps of the computation and a reasonable proxy for all other costs, including data motion.
Today, floating point operations (“flops”) are much cheaper in time and in energy than moving the data into and out of the processing unit while memory hierarchies based on multilevel caches deliver operands at latencies and energy costs that vary by several orders of magnitude, depending upon where the freshest copies of the data are found. This situation results in the resurrection of old algorithms and the design of new ones that may do many more flops than previously “optimal” methods, but move less data back and forth from remote locations, and thus finish in less time, with smaller energy expenditure.
The author’s group has created several pieces of software that have displaced previous choices by optimizing memory transfers rather than flops. An example of a singular value decomposition that overcomes a flops handicap of as much as an order of magnitude is given in Sukkari et al (2016). For a community discussion of new paradigms on the path to exascale, see Dongarra et al (2011).
Sukkari et al, 2016, https://doi.org/10.1145/2894747
Dongarra et al, 2011 https://doi.org/10.1177/1094342010391989
David Keyes directs the Extreme Computing Research Center at KAUST, where he was a founding dean in 2009. He inhabits the intersection of Mathematics, Computer Science, and applications, with a focus on colonizing emerging energy-efficient architectures for scientific computations. He is a Fellow of SIAM and AMS and has received the ACM Gordon Bell Prize and the IEEE Sidney Fernbach Award. As a lover of poetry, he is delighted to discover the Sciku community.