Mark Hill - Efficiently Enabling Conventional Block Sizes for Very Large Die-stacked DRAM Caches
Room: CS 1221
This is a MICRO 2011 practice talk for a paper co-authored with Gabriel H. Loh of AMD.
Die-stacking technology enables multiple layers of DRAM to be integrated with multicore processors. A promising use of stacked DRAM is as a cache, since its capacity is insufficient to be all of main memory (for all but some embedded systems). However, a 1GB DRAM cache with 64-byte blocks requires 96MB of tag storage. Placing these tags on-chip is impractical (larger than on-chip L3s) while putting them in DRAM is slow (two full DRAM accesses for tag and data). Larger blocks and sub-blocking are possible, but less robust due to fragmentation.
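The 96MB figure can be checked with a few lines of arithmetic. The sketch below assumes 6 bytes of tag and metadata per block; that per-entry size is not stated in the abstract and is chosen here because it reproduces the quoted number. The function name is illustrative only.

```python
GIB = 1 << 30
MIB = 1 << 20

def tag_storage_bytes(cache_bytes, block_bytes, tag_entry_bytes):
    """Total tag storage for a cache that keeps one tag entry per block."""
    num_blocks = cache_bytes // block_bytes
    return num_blocks * tag_entry_bytes

# 1GB cache, 64-byte blocks (both from the abstract).
num_blocks = GIB // 64

# ASSUMPTION: 6 bytes per tag entry (tag plus state bits).
tag_mib = tag_storage_bytes(GIB, 64, 6) / MIB

print(num_blocks, tag_mib)
```

With these inputs the cache holds 16,777,216 blocks, and 6 bytes per block gives 96MB of tags.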
This work efficiently enables conventional block sizes for very large die-stacked DRAM caches with two innovations. First, we make hits faster than just storing tags in stacked DRAM by scheduling the tag and data accesses as a compound access so the data access is always a row buffer hit. Second, we make misses faster with a MissMap that eschews stacked-DRAM access on all misses. Like extreme sub-blocking, our implementation of the MissMap stores a vector of block-valid bits for each “page” in the DRAM cache. Unlike conventional sub-blocking, the MissMap (a) points to many more pages than can be stored in the DRAM cache (making the effects of fragmentation rare) and (b) does not point to the “way” that holds a block (but defers to the off-chip tags).
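As a rough illustration of the MissMap idea, the sketch below keeps one bit vector of block-valid bits per page and answers "could this block be in the DRAM cache?" without touching stacked DRAM. It is a minimal software model, not the hardware organization from the paper: the 4KB page size, the LRU replacement of MissMap entries, and all class and method names are assumptions made for illustration. It also assumes that when a MissMap entry is displaced, the blocks it tracked must be evicted from the DRAM cache so that a missing bit always means a true miss.

```python
from collections import OrderedDict

PAGE_BYTES = 4096    # ASSUMPTION: page size tracked by one MissMap entry
BLOCK_BYTES = 64     # block size from the abstract
BLOCKS_PER_PAGE = PAGE_BYTES // BLOCK_BYTES

class MissMapSketch:
    """Hypothetical model: page number -> bit vector of resident blocks."""

    def __init__(self, max_pages):
        self.max_pages = max_pages
        self.pages = OrderedDict()  # oldest entry first (LRU order)

    @staticmethod
    def _split(addr):
        # Return (page number, block index within that page).
        return addr // PAGE_BYTES, (addr % PAGE_BYTES) // BLOCK_BYTES

    def may_hit(self, addr):
        """True if the block is marked resident; False means a sure miss."""
        page, block = self._split(addr)
        return bool((self.pages.get(page, 0) >> block) & 1)

    def on_fill(self, addr):
        """Mark a block as installed in the DRAM cache.

        Returns the addresses of blocks that must be evicted from the
        DRAM cache because their MissMap entry was displaced.
        """
        page, block = self._split(addr)
        victims = []
        if page not in self.pages and len(self.pages) >= self.max_pages:
            old_page, old_bits = self.pages.popitem(last=False)
            victims = [old_page * PAGE_BYTES + b * BLOCK_BYTES
                       for b in range(BLOCKS_PER_PAGE)
                       if (old_bits >> b) & 1]
        self.pages[page] = self.pages.get(page, 0) | (1 << block)
        self.pages.move_to_end(page)
        return victims

    def on_evict(self, addr):
        """Clear the bit for a block the DRAM cache has evicted."""
        page, block = self._split(addr)
        if page in self.pages:
            self.pages[page] &= ~(1 << block)
            if self.pages[page] == 0:
                del self.pages[page]
```

A request first calls `may_hit`; only when it returns True does the controller issue the compound tag-and-data access to stacked DRAM, and otherwise it goes straight to main memory. Note that the entry records only presence, not the way, which matches point (b) above.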
For the evaluated large-footprint commercial workloads, the proposed cache organization delivers 92.9% of the performance benefit of an ideal 1GB DRAM cache with an impractical 96MB on-chip SRAM tag array.
Biography (since conference talks are usually given by grad students, not senior professors):
Hill began graduate studies in 1981 at Berkeley with David Patterson and Alan Smith. Fortunately, he has finished. He worked on the SPUR multiprocessor workstation and his thesis included developing the 3C model of cache behavior (compulsory, capacity, and conflict misses) and identifying when direct-mapped caches can out-perform set-associative ones ("Big and Dumb is Better"). Hill subsequently joined Wisconsin as an assistant professor.
