################################################################################################################ # This package contains an implementation of the cache performance models described in the paper: # # # # Rathijit Sen and David A. Wood, "Reuse-based Online Models for Caches", # # Proceedings of the ACM SIGMETRICS/international conference on Measurement and modeling of computer systems, # # pp. 279-292, 2013. # # # # DOI Link: http://dx.doi.org/10.1145/2465529.2465756 # # # # If your use of this package contributes to a published paper, we request that you cite the above paper. # # # # Please email any comments or suggestions to rathijit at cs dot wisc dot edu. We appreciate your feedback. # ################################################################################################################ PACKAGE CONTENTS: ---------------- This package contains 24 files, described below: * README.txt (this file) * Example r distribution: r(1): example_r_S1.txt (example r distribution for S=1) r(2^10): example_r_S1024.txt (r distribution for S=2^10 generated from example_r_S1.txt using rB.cpp (see below)) * Common files and Utilities: Common Macros: common.h (constants for sizing various arrays) Utilities: util.h, util.cpp (methods to read r from a file, compute d from r) * Per-set Locality transformers: Binomial: binomial.h, binomial.cpp (method to compute r.B, where B is a generalized stochastic Binomial matrix) Poisson: poisson.h, poisson.cpp (method to compute r.BP, where BP is a Poisson approximation to B) * Example Programs: read r and compute d: d.cpp read r and compute r'=rB: rB.cpp * Online Miss ratio predictors: Base predictor class: pred_mr.h, pred_mr.cpp (routines used by predictors for all replacement policies) Derived classes: LRU: pred_lru.h, pred_lru.cpp (LRU-specific predictor) PLRU: pred_plru.h, pred_plru.cpp (PLRU-specific predictor) RANDOM: pred_random.h, pred_random.cpp (RANDOM-specific predictor) NMRU: pred_nmru.h, pred_nmru.cpp (NMRU-specific predictor) Driver: driver.cpp * Makefile INSTALLATION: ------------ 1) Download the tarred and gzipped file: reuse_models_cache_wisc.tar.gz 2) Extract files: tar -xzf reuse_models_cache_wisc.tar.gz 3) cd reuse_models_cache 4) make all. This will create 3 programs: online_mr, gen_d, gen_rB EXPERIMENTS: ----------- * ./online_mr PLRU This program reads in example_r_S1024.txt and predicts miss-ratios for LRU/PLRU/RANDOM/NMRU. You should see an output similar to the following: Avg. Computation Time: 0.017 msec 2MB 2-way 0.24765956 2MB 4-way 0.20913283 2MB 8-way 0.19433198 2MB 16-way 0.18866381 2MB 32-way 0.18325245 4MB 2-way 0.14304311 4MB 4-way 0.11239636 4MB 8-way 0.10281167 4MB 16-way 0.09919409 4MB 32-way 0.09762271 8MB 2-way 0.08368735 8MB 4-way 0.06620499 8MB 8-way 0.06175478 8MB 16-way 0.06021207 8MB 32-way 0.05957142 16MB 2-way 0.05101016 16MB 4-way 0.04160668 16MB 8-way 0.03935462 16MB 16-way 0.03861996 16MB 32-way 0.03834320 32MB 2-way 0.03366558 32MB 4-way 0.02806263 32MB 8-way 0.02663976 32MB 16-way 0.02611063 32MB 32-way 0.02589918 The first two columns specify the target cache configuration. The third column specifies the estimated miss ratio (miss/access). * ./gen_rB This program reads in example_r_S1.txt and prints the transformed distribution as in example_r_S1024.txt. It needs a few seconds to calculate depending on the size of the input r distribution. * ./gen_d This program reads in example_r_S1.txt and generates d. The first column in the output specifies the URD whereas the second column specifies the expected ARD given that URD. IMPORTANT NOTES: --------------- * We tested the package using g++ for x86_64-redhat-linux, gcc version 4.4.6 20120305 (Red Hat 4.4.6-4), on a Linux 2.6.32-358.0.1.el6.x86_64 system (Nehalem, 2.26 GHz). * Each line of the input file for r must be of the form "i P(URD=i)" with i in increasing order. r_i = P(URD=i) must be in [0,1], \sum {r_i} <= 1. Refer to the example files example_r_S1.txt and example_r_S1024.txt. * r_\infty is never explicitly specified in the input file. Whenever needed, e.g., estimating d, it is calculated as 1-\sum {r_i} So, it is usually the case that for the input file \sum {r_i} < 1. r_\infty does not participate in matrix transformations for per-set locality. * Many routines in the implementation use "num_diff_set_bits". This is -log2(S/S') with S <= S'. S = #sets in the source cache configuration, S' = #sets in the target cache configuration * "r_tgt" means the same as "r'" which is r for the target cache configuration (with #sets = S'). * The online miss ratio predictors assume r(2^10) for 512 positions as input. The other programs, gen_rB and gen_d, can work over a wider range. * The online miss ratio predictors use the Poisson per-set locality transformer. * While B is a (lower) triangular matrix, BP is not.