Improvements in the performance and energy efficiency of general-purpose processors have slowed dramatically over the last decade. This is due to the combined effect of the breakdown of transistor scaling, which causes severe chip-level power limitations, and of monolithic, inefficient general-purpose microarchitectures.
In this work, I propose and evaluate a concept called “behavioral specialization”, in which the design of general-purpose processors is modularized by adding programmable offload engines, each best suited to different program characteristics. To explore this principle, I designed a modular general-purpose core that transparently improves performance and energy efficiency by integer factors. I also extend these principles to create an architecture for highly regular and parallelizable workloads, yielding a further order-of-magnitude improvement.
This work reveals that 1) a small number of domain-agnostic program behaviors can cover a majority of applications, 2) principles of von Neumann and dataflow architectures can be used synergistically to efficiently target different program behaviors, and 3) programmable architectures can be competitive with domain-specific alternatives, with only small area and negligible energy overheads. Overall, behavioral specialization causes a disruptive change in microprocessor tradeoffs; for example, it enables mobile-class processor energy efficiency with desktop-class performance.