Predicting Web Cache Behavior Using Stochastic State-Space Models

Web caches play an important role in improving the surfing experience of Web clients and reducing the network traffic seen by Web servers. Accurate analytical models of Web caches are desirable as they can provide inexpensive ways to make resource provisioning decisions at a cache itself as well as at the Web servers it is servicing. Explicitly modeling a Web cache has two major shortcomings: (i) it makes several simplifying assumptions about the operations of the cache for mathematical tractability resulting in loss of accuracy, and (ii) it requires measurements of phenomena internal to the cache that may not always be available without adding monitoring hooks within the cache. Therefore, in this paper, we turn toward statistical techniques to develop a model that is non-intrusive (that is, requires no additions to the cache) and treats the Web cache as a black-box (that is, operates solely by observing readily available inputs/outputs and requires no knowledge about the internals of the cache). Relying on the intuition that the internal dynamics of a cache can be captured by a first-order time-dependent process, where the new cache state depends entirely on the current state and the incoming request, we develop a model called State-space Modeled Cache Prediction (SMCP), based on the well-studied linear Gaussian state space model, to observe, characterize, and predict the hit ratios at a Web cache. A comparison with three time-independent models, including one based on Linear Regression (LR), validates our intuition for the need to employ a time-dependent model. A detailed evaluation using the request trace at IRCache shows the efficacy of our model with LRU and LFU, two representative cache replacement policies. In our experiments, SMCP predicts hit ratio within 0.05 (absolute value) of their actual value 81.4% of the times for LRU and 65% of the times for LFU. Secondly, SMCP captures the time-varying behavior more accurately than done by several time-independent models.