Today, power constraints determine our ability to keep compute units active and busy. Interestingly, storing and moving the data used and produced by a computation consumes more energy than the computation itself. Whether on multicores, GPUs, or fixed-function accelerators, how we move data and feed the compute units has a critical impact on the programming model and on compute efficiency. We observe that, unlike the latency overhead of data movement, which can potentially be hidden, the energy overhead dictates that we fundamentally reduce waste in the memory hierarchy.