Analysis of Working Set Model to enhance the efficiency of Memory Management

Ankit Bagde
Advances in Operating Systems
Oct 13, 2020

This article is submitted as a reading assignment for the course of Advances in Operating Systems by Adhikansh Singh, Ankit Bagde, Gaurav Goyal, Rajat Kumar Jenamani, and Rohan Henry Ekka

Introduction

The computer memory system is organized into several levels of storage media — primary memory, secondary memory, and so on — to optimize memory access and make the best use of the capabilities each level provides. With the advent of multiprogramming and memory virtualization, managing and handling these layers of memory became a critical determinant of memory-system performance. The most basic problem is memory resource allocation, constrained above all by the finite capacity of main memory.

Because primary memory is limited, it is expanded through virtualization, but accessing data held in virtual memory incurs a time delay. The problem thus boils down to:

How and what should we keep in the primary memory so as to minimize the access to virtual memory?

This article surveys several approaches that address this problem efficiently.

The principle of locality serves as the foundation for approaching the problem of memory management. Locality of reference denotes the experimentally observed phenomenon that, for relatively extended periods of time, a program references only some subset of its name space or virtual address space.

The properties and behavior of computations operating in multiprogrammed environments are embodied in a model called the working set model, which uses the principle of locality to help decide which information in memory is in use by a running program and which is not. The properties and behavior of the working set model, discussed in this article, provide the basis for the design of scheduling and paging algorithms that increase the efficiency of memory management.

The design of a working set dispatcher is proposed on the premise that it will decrease the response time for highly interactive user transactions. This is achieved by exploiting the properties of process and program localities.

Properties of program localities, such as bounded locality intervals (BLIs) and activity sets, are exploited to develop concepts that enhance the efficiency of paging. All of these criteria help decide what should be kept in the limited primary memory.

This article therefore broadly tackles the problem of increasing the efficiency of memory management, with a primary focus on paging, from several perspectives: the properties of computations, page accesses, the properties and behavior of localities, and the properties of processes. The results of these approaches are then interrelated and compiled together.

* Characteristics of Program Localities

Here we discuss the concept of “locality of reference,” which plays a central role in characterizing program behavior, in the design of memory management schemes in operating systems, and in the performance of computer systems. We provide a formal definition, a corresponding mechanism for detecting localities, and experiments exploring different characteristics of program localities.

What is Locality of Reference?

It refers to the phenomenon that a computer program tends to access the same set of memory locations over a particular period of time. The intuitive notion of a “locality” has two components:

  • A particular subset of segments (information units such as pages, etc.) is being referenced,
  • This favored subset remains unchanged over a definite time interval

A program’s sequence of information accesses can be represented as a sequence of tuples (li, ti), where li is the set of segments referenced in the ith locality and ti is its lifetime. On its own, however, this is not a useful description of the program’s behavior: the program’s entire information set and period of execution could trivially be considered a single locality.

The challenge, therefore, is to determine those intervals of the program’s execution history that are of significant duration, involve references to a relatively small subset of the information set, and are in some sense “distinctive” compared to neighboring phases.

Bounded Locality Intervals (BLIs) & Activity Sets

The paper derives two entities, the bounded locality interval and the activity set, that satisfy the above-mentioned requirements.

These terms are derived from an extension of the Least Recently Used (LRU) stack,

L(t) = (L1(t), L2(t),…, Li(t)) where Li(t) is the ith most-recently-referenced segment at time t.

The topmost i segments of the stack form the set Si(t) = {L1(t), L2(t), …, Li(t)}, the i most recently referenced segments at time t.

σ(t) = (σ1(t), σ2(t),… σi(t))

T(t) = (Tl(t), T2(t), . . . Ti(t))

where σi(t) is the time at which the segment in the ith stack position was last referenced, and Ti(t) is the formation time of the set Si(t).

An activity set at time t, Ai(t), is defined as any Si(t) for which σi(t) > Ti(t).

A bounded locality interval is defined as the 2-tuple (Ai, ri), where Ai is an activity set and ri is its lifetime (its duration at the top of the LRU stack).
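These definitions can be sketched in code. Below is a minimal Python illustration (function and variable names are my own, not from the paper) that maintains the LRU stack together with each segment's last-reference time σi(t) and each prefix's formation time Ti(t), and reports the activity sets present at the end of a reference string:

```python
def activity_sets(refs):
    """Detect activity sets at the end of a reference string.

    Maintains an LRU stack, each segment's last-reference time, and the
    formation time of every stack prefix S_i. A prefix whose bottom
    segment was referenced after the prefix formed satisfies
    sigma_i(t) > T_i(t) and is therefore an activity set.
    """
    stack = []      # stack[0] = most recently referenced segment
    last_ref = {}   # segment -> time of its last reference
    formed = []     # formed[i] = formation time of prefix S_{i+1}
    for t, seg in enumerate(refs, start=1):
        if seg in stack:
            d = stack.index(seg)   # depth of seg before it moves to the top
            stack.pop(d)
        else:
            d = len(stack)
            formed.append(t)       # a brand-new deepest prefix forms now
        stack.insert(0, seg)
        last_ref[seg] = t
        # prefixes S_1 .. S_d changed membership, so they re-form now;
        # deeper prefixes keep their old formation time
        for i in range(d):
            formed[i] = t
    return [set(stack[:i + 1])
            for i in range(len(stack))
            if last_ref[stack[i]] > formed[i]]
```

For the string A B A B, the only activity set found is {A, B}: both segments have been referenced since the set {A, B} formed, while the singleton {B} has not been re-referenced since it last re-formed.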

A simple example is given below for time t=30:

[Figure: sample reference string and the resulting LRU stack instance at t = 30]

The activity sets are {D}, {C, D}, and {A, B, C, D} (those Si(t) with σi(t) > Ti(t)).

We can see that every activity set is the smallest member of a class of sets having the same formation time.

The hierarchical nature of localities can be observed with this mechanism over the interval t = 0 to t = 30. The level of a bounded locality interval is defined as its depth in this hierarchy.

Various experiments were carried out for level one BLI hierarchy, the results of which are summarized below:

  1. A very large number of level-one BLIs have very short lifetimes.
  2. These short-lived level-one BLIs do not cover much of the program’s execution time and are formed mostly during transition periods.
  3. The short-lived BLIs, by their very nature, have rather small activity sets and make a negligible contribution to the data.

Therefore, these properties of Program localities can be used in practice to come up with efficient Memory Management Algorithms.

* Working Set Model and its Properties

A computation is the fundamental activity in a computer system. It consists of a single process together with the information available to it. The “process” is one manifestation of a computation, in the form of a demand for a processor (a “processor demand”), and the “working set of information” is another manifestation of a computation, in the form of a demand for memory (a “memory demand”). A computation’s “system demand” will consist jointly of its processor and memory demands. The most basic reason for the absence of a general treatment of resource allocation is the lack of an adequate model for program behavior.

The working set model is used to model the behavior of computations and page accesses, and basic properties and quantities are then derived from it. These properties prove extremely helpful in developing paging and scheduling algorithms. This subtopic focuses on the properties of paging in virtual memory and on how they can be exploited using the working set model. Two basic causes of paging problems are:

1. The complexity of computer memory systems

2. The lack of reliable prior information about the memory demands of running programs

As a result, adaptive program-behavior models were developed to tackle the problem of slow paging, and adaptive memory management policies can be derived from them. This subtopic focuses on deriving the properties of the working set model that help increase the efficiency of paging.

The Working Set -

The working set of information is the smallest collection of information that must be present in main memory to assure the efficient execution of a program. To model program behavior in a general-purpose computer system or computer utility, the operating system must determine on its own the behavior of the programs it runs; it cannot count on outside help (such as from the user or the compiler).

The working set of information W(t, τ) of a process at time t is defined to be the collection of information referenced by the process during the process time interval (t-τ, t). The elements of W(t, τ) are regarded as pages, and thus the working set size w(t, τ) is: w(t, τ) = number of pages in W(t, τ).
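A small sketch of these definitions (Python; the names are mine, not from the paper). Here time is measured in reference counts, and `refs[0]` is the reference made at process time 1, so W(t, τ) is the set of distinct pages in the window (t − τ, t]:

```python
def working_set(refs, t, tau):
    """W(t, tau): distinct pages referenced in the window (t - tau, t]."""
    return set(refs[max(0, t - tau):t])

def working_set_size(refs, t, tau):
    """w(t, tau): the number of pages in W(t, tau)."""
    return len(working_set(refs, t, tau))
```

For example, for the reference string 5, 1, 5, 2 the working set at t = 4 with τ = 2 is {5, 2}, and with τ = 4 it grows to {5, 1, 2}.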

The basic allocation problem, “core memory management,” is that of deciding which pages are to occupy main memory. The fundamental strategy advocated here — a compromise made because plentiful main memory is expensive — is to minimize page traffic. Prior work has shown that the “ideal” algorithm should possess much of the simplicity of Random or FIFO selection (for efficiency) and some, though not much, accumulation of data on past reference patterns.

Denning et al. [4] state that the working set W(t, τ) has four important, general properties. All are properties of typical programs and need not hold in special cases:

Property 1 — As a function of τ, w(t, τ) is monotonically increasing, and since more pages can be referenced in longer intervals, w(t, τ) is concave downward. The general character of w(t, τ) is suggested by the smoothed curve of the following figure.

Property 2 — For small time separations a, the set W(t, τ) is a good predictor of the set W(t + a, τ). On the other hand, for large time separations a (say, a ≫ τ), control will have passed through a great many program modules during the interval (t, t + a), and W(t, τ) is not a good predictor of W(t + a, τ).

Property 3 — As τ is reduced, w(t, τ) decreases, so the probability that useful pages are not in W(t, τ) increases; correspondingly, the rate at which pages are recalled to W(t, τ) increases. A process-time reentry rate λ(τ) is defined so that the mean process time between the instants at which a given page reenters W(t, τ) is 1/λ(τ), and a real-time reentry rate φ(τ) is defined so that the mean real time between such instants is 1/φ(τ). A reentry point is a reference instant that finds the page not in W(t, τ); at such an instant the page reenters W(t, τ). “Memory balance” is a condition in which the collection of working sets residing in main memory at any time consumes just some predetermined portion β of the available M pages of main memory. The total return traffic rate Φ(τ) is defined to be the total reentry rate, in real time, to main memory when the working sets contained therein are not interrupted except for page waits. Since memory balance is an equilibrium condition, there must also be a flow of pages Φ(τ) from main to auxiliary memory. Therefore 2Φ(τ) measures the capacity required of the channel bridging the two memory levels.

Property 4 — The sensitivity function σ(τ) of a working set W(t, τ), a measure of the sensitivity of the reentry rate λ(τ) to changes in τ, is defined to be: σ(τ) = -d λ(τ) / dτ

Principle of Memory Management under the Working Set Model-

The working-set principle of memory management states-

1. A program may use a processor only if its working set is in main memory.

2. No working-set page of an active program may be considered for removal from the main memory of the computer.

Simulation results on the RCA Spectra 70/46 and experimental observations of IBM TSS/360 provide evidence that the principle is viable and practical.

Assumptions and Principles for the Model -

The principle of locality asserts the following assumptions about programs:

1. In any interval of time, a program references pages in a non-uniform manner over the set of all the pages

2. The frequency with which a page is referenced by a program tends to change slowly, i.e. it is quasi-stationary

3. The correlation between immediate past and immediate future patterns of page references tends to be high, whereas the correlation between disjoint reference patterns tends to zero as the distance between the patterns tends to infinity

A locality is a subset of a program’s pages. Programs make transitions from time to time among localities, and the pages in the current locality are referenced with higher probability. Thus the program’s working set should comprise the pages of the current locality.

The working set model is analyzed mathematically using time-based analysis to arrive at the following definitions and results, which are then applied to the program’s paging policies.

The basic deduction of this analysis is:

The inter-reference interval density function is the negative slope of the missing-page rate function, which in turn is the slope of the average working set size function.

DEFINITIONS -

1. Program P : an n-page program

Set N = {1, 2, …, n}; rt ∈ N

ρ = r1 r2 r3 … rt …

rt = i means that page i is referenced at the tth reference; t measures discrete process time. Definitions are expressed as time averages rather than stochastic averages -

1. Working Set -

W(t, T) — the set of distinct pages referenced in the interval [t − T + 1, t], at time t

T — window size

w(t, T) — the number of pages in W(t, T)

2. Average working set size -

The average working set size with window T, averaged over the first k references, is

s(T) = lim (k→∞) (1/k) Σ (t=1..k) w(t, T)

The above limit converges under condition W2 and the additional condition W3 assumed in the paper.
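On a finite reference string, this limit can be approximated by the finite average over t = 1..k; a Python sketch with names of my own choosing:

```python
def avg_ws_size(refs, T):
    """s_k(T): the working set size w(t, T) averaged over t = 1..k.

    The paper's s(T) is the k -> infinity limit of this average; on a
    finite string this is only an approximation.
    """
    k = len(refs)
    return sum(len(set(refs[max(0, t - T):t]))  # w(t, T)
               for t in range(1, k + 1)) / k
```

For the string 1, 2, 1, 2 this gives 1.0 at T = 1 and 1.75 at T = 2, illustrating that the average working set size is nondecreasing in the window size.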

3. Missing page rate m(T) -

Measures the number of pages per unit time returning to the working set. Define the binary variable Δ(t, T), for t ≥ 0, to be 1 if the next reference r(t+1) is not in W(t, T), and 0 otherwise. Then

m(T) = lim (k→∞) (1/k) Σ (t=0..k−1) Δ(t, T)
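A finite-string approximation of m(T) under this definition (Python sketch; names are mine). First references count as misses here, a contribution that vanishes in the limit the paper takes:

```python
def missing_page_rate(refs, T):
    """m_k(T): fraction of references that find the referenced page
    outside the current working set W(t, T) of the previous T references."""
    faults = sum(1 for t in range(len(refs))
                 if refs[t] not in set(refs[max(0, t - T):t]))
    return faults / len(refs)
```

For the string 1, 2, 1, 2 the rate is 1.0 at T = 1 and 0.5 at T = 2, illustrating that m(T) is nonincreasing in the window size.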

4. Inter Reference Interval -

Let the page reference string be r1 r2 r3 … rt, and let two successive references to page i occur at times t and t + xi. Then xi is called the inter-reference interval of page i: the time between successive references to it. The inter-reference distribution for page i is defined as -

Fi(x) — fraction of xi’s for which xi <= x

Inter reference density function for the page i is defined to be -

fi(x) — fraction of xi’s for which xi = x
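These quantities can be estimated directly from a reference string; a sketch with hypothetical names:

```python
from collections import defaultdict

def interref_intervals(refs):
    """Collect the inter-reference intervals x_i for each page."""
    last, gaps = {}, defaultdict(list)
    for t, page in enumerate(refs, start=1):
        if page in last:
            gaps[page].append(t - last[page])  # one sample of x_i
        last[page] = t
    return dict(gaps)

def interref_distribution(gaps, x):
    """F_i(x): the fraction of intervals x_i with x_i <= x."""
    return sum(1 for g in gaps if g <= x) / len(gaps)
```

For the string 7, 9, 7, 7, page 7 has intervals [2, 1], so F7(1) = 0.5; page 9 is referenced only once and yields no interval.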

The relative frequency of reference to page i is

λi = lim (k→∞) ki / k, where ki is the number of references to page i among the first k references

Page i is called a recurrent page if λi > 0.

nR — the number of recurrent pages in the set N of pages.

Overall functions, weighted by the relative reference frequencies λi, are defined as -

1. Overall density function: f(x) = Σ (i ∈ N) λi fi(x)

2. Overall distribution function: F(x) = Σ (i ∈ N) λi Fi(x)

The mean overall inter-reference interval is x̄ = Σ (x ≥ 1) x f(x).

DEDUCTIONS -

Analysis of the previously described functions establishes various properties of, and relationships between, the functions s(T), m(T), and fi(x) -

Properties -

Property 1 :

States: s(T) is non-decreasing and bounded above and below

Property 2 :

States: The slope of s(T) is the missing page rate m(T), i.e. s(T + 1) − s(T) = m(T)

Property 3 :

States: m(T) is nonincreasing in T.

Property 4 :

States: m(T) is the probability P[x > T] = 1 − F(T)

Property 5 :

States: f(T + 1) is the negative slope of m(T), i.e. f(T + 1) = m(T) − m(T + 1)

Property 6 :

Property 7 :

States: The graph of s(T) is concave down

Property 8 :

States: The limiting average working set size is nR, i.e. lim (T→∞) s(T) = nR

Property 9 :

A summary of the results can be depicted using two graphs —

1. m(T) vs. window size T

2. s(T) vs. window size T (approaching its limit nR)
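Collecting the relationships among s(T), m(T), f(x), and nR stated in the properties (a reconstruction from the definitions in this section, written with s(0) = 0; not reproduced from the article), the deductions chain together as:

```latex
\begin{aligned}
s(T+1) - s(T) &= m(T) \quad\Rightarrow\quad s(T) = \sum_{y=0}^{T-1} m(y) \\
m(T) &= \Pr[x > T] = 1 - F(T) \\
f(T+1) &= m(T) - m(T+1) \\
\lim_{T\to\infty} s(T) &= \sum_{y \ge 0} \Pr[x > y] = \bar{x} = n_R
\end{aligned}
```

The last line shows why the limiting average working set size equals the number of recurrent pages: summing the missing-page rate telescopes into the mean overall inter-reference interval x̄.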

CONTRIBUTIONS -

1. The assumptions that the reference substrings of interest are quasi-stationary and that references are asymptotically uncorrelated can be used to extend the model. For example, a program’s reference string can be broken down into instruction references and data references, and analogously the working set into instruction and data working sets, allowing locality to be exploited further to increase the efficiency of paging.

2. The effects of overlapping working sets (overlap between programs) can be analyzed with respect to memory demand and efficiency of access, to further improve the paging process.

3. Inter-reference distributions are known to exhibit clustering effects, which can be used to obtain better approximations of locality within this framework and hence better paging algorithms.

4. LRU (Least Recently Used) paging is intimately related to the working set model: the curve m(T) of the working set model can be used to estimate the page-fault rate of LRU paging, and the working set model can be used to simulate LRU paging.
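For comparison with the working-set curve m(T), fixed-allocation LRU paging can be simulated directly; a Python sketch (the fault-counting function is illustrative, not code from the article):

```python
def lru_faults(refs, frames):
    """Count page faults for fixed-allocation LRU with `frames` frames."""
    stack = []                     # most recently used page first
    faults = 0
    for page in refs:
        if page in stack:
            stack.remove(page)     # hit: move the page to the top
        else:
            faults += 1            # fault: page must be loaded
            if len(stack) == frames:
                stack.pop()        # evict the least recently used page
        stack.insert(0, page)
    return faults
```

For the string 1, 2, 3, 1, 2, 3, three frames give 3 faults (only the cold misses), while two frames give 6: every reference evicts the page needed next, the classic worst case for LRU.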

CRITICISMS -

1. The basic problem lies in the assumptions stated in the principle-of-locality section. They may hold for today’s programming practices, but, being based on observation, they may not hold in the future.

2. The assumption that most programs are built from a particular type of construct, as stated in the principle-of-locality section, renders the model inefficient in settings where different kinds of program structure are common.

* Empirical working set behavior

The purpose of this paper is to present empirical data from actual program measurements, so that workers in the field of working set behavior have experimental evidence upon which to base and substantiate theoretical work.

Overview

Definitions:

W(t,T) -> Collection of pages referenced by the process in the time span (t-T, t). [Working set]

w(t,T) -> Number of pages in the working set.

w’(t,T) -> Expected value of w(t,T).

l(t,T) -> Number of pages reentering the working set per unit of time. [Reentry rate]

Design of Experiment

  1. A wholly interpretive simulator designed to monitor any IBM System/360 program was used.
  2. A set of routines gathers information about the pages a program references at every instruction.
  3. The referenced pages are recorded in a Boolean matrix kept in core.
  4. When the program requests an I/O operation, the pages it references are also noted by a routine that examines the channel program.
  5. All data reduction is done after the monitor relinquishes control.
  6. The data are later presented on an IBM 2250 Graphic Display Unit, and a conversational system permits the choice of parameters and of functions to be displayed. The data thus presented can be sent upon request to an offline plotter.
  7. To run the experiments, the monitoring system is loaded in a virtual machine running under the control of the Cambridge Monitor System, CMS.
  8. All programs were run under the operating system CMS, which was itself also monitored and worked under the illusion of having 256K 8-bit bytes of memory available in the virtual machine. This address space was divided into pages of 1K, 2K, 4K, and 8K bytes, giving respectively 256, 128, 64, and 32 pages. This structure permits observation of the influence of page size upon the working set of programs.
  9. Any program running under CMS can be monitored. The programs available under CMS were not written with a paging mechanism in mind. On the other hand, care was taken when developing CMS to reorganize the system nucleus to produce fewer page exceptions.

Results

  1. Halving the page size less than doubles the average working set size. More memory will thus be available for multiprogramming, the drawback being the increased page fault rate. For example, decreasing the page size from 4K to 1K roughly quadruples the page fault rate. This could mean that paging channels which are adequate to handle the page traffic resulting from 4K pages might risk saturation.
  2. The optimal page size is primarily influenced by fragmentation and by the efficiency of page transport operations. From the point of view of fragmentation alone, the optimal page size should be close to 50 words.
  3. This optimal page size depends largely on the operating system.
  4. Careful consideration must be given to the operating system’s structure before attempting to calculate optimal system parameters. Large page sizes are not only due to considerations about the efficiency of page transport operations. It is not clear whether for all operating systems there is a great discrepancy between the page size for maximum storage utilization and the page size for maximum page transport efficiency.
  5. To determine the optimal page size from the point of view of page transport efficiency, many factors come into play. The multiprogramming level of the system affects CPU utilization. Both the choice of page size and the choice of T affect the paging rate. Care must be taken to ensure that paging channels do not become bottlenecks. The hardware characteristics of the devices involved and the system’s configuration must also be taken into consideration.

The object of the paper is to present data about program working set behavior. These data can be compared with data obtained by other experimenters to ascertain patterns in program behavior.

* The design, implementation, and evaluation of a working set dispatcher

We know that an important program property is locality. Intuitively, locality means that during a given interval of execution a program addresses only a subset of the total addressable space. From the notion of locality comes that of “working set”, and in this article, we discuss a theory of resource allocation based on this notion.

The working set, W(t, T), of a process is the set of virtual pages referenced by the process in the time span (t − T, t). Another variable, the working set size, w(t, T), is simply the number of pages in the working set.

As the process executes it will reference pages not present in the working set. We denote by l(t, T) the number of pages reentering the working set per unit time. Its expected value can be written as l(T), a function of T alone.

Types of Processes

The logged-in processes are divided into three classes.

[Figure: structure of the working set dispatcher (arrows indicate the paths followed by processes)]
  1. Processes in the running set compete for the system’s resources.
  2. Processes in the ready set wait until system resources are freed and can be allocated to them.
  3. Processes in the blocked set do not demand either main memory or CPU.

Designing the Dispatcher

Time-sharing systems operate under one very important constraint: the response time for highly interactive user transactions must be low. Noninteractive processes, however, can take more time than usual.

The running set (queue) is therefore divided into two classes. The interactive users belong to the RUnning Set’s CONsole group, henceforth called RUSCON. The other users belong to the RUnning Set’s BATch group, denoted RUSBAT.

The ready set will be similarly divided, the two groups being called, respectively, RESCON and RESBAT. There is no such division in BOS, which is the BlOcked Set.

When processes in the running set become inactive, either because they cause a page fault or because they demand nonconcurrent I/O operations, they are not expelled from the running set unless the I/O operation is directed to a very slow unit, such as the virtual machine’s console. The dispatcher is entered only when an interrupt arrives. When control is to be given to a user program, the dispatcher tries to restart the process that was active at the moment the interrupt arrived.

In order to ensure the shortest response time for interactive machines, RUSCON will first be scanned in a round-robin fashion. Thereafter RUSBAT will be scanned, also in a round-robin fashion. There is no preemptive priority in the system. In other words, an interrupt that arrives and changes the state of a member of RUSCON from nonrunnable to runnable will not take away CPU control from a member of RUSBAT.

Promotion from READY SET to RUNNING SET

1. The primary criterion is memory balance. A process is promoted if the number of free pages in memory is greater than the process’s expected working set size. The last working set size computed serves as a predictor for the next working set size. The memory balance rule applies only to processes migrating from RESBAT to RUSBAT.

2. The dispatcher tries to ensure that computing power is equally distributed among the processes in the system. To do this, the time spent waiting in the ready set since entering it is divided by the time spent executing in the last stay in the running set. This quotient is called G. The process with the largest G that satisfies the memory balance condition is chosen to run. This ensures that no process will remain forever in the ready queue.

The actual values in the current version are a burst size of 30 msec and a quantum size of 1 sec for members of RUSBAT, and a burst and quantum size of 30 msec for members of RUSCON. This last choice means that any request needing more than 30 msec to complete will be denied interactive status. Also, a process will be promoted to the running set without any regard to memory balance if its G factor is more than four times the average G factor.
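The promotion rule can be sketched as follows (Python; the tuple layout, function name, and parameter names are hypothetical — the article does not give code):

```python
def promote(ready_batch, free_pages, avg_g):
    """Choose a RESBAT process to promote to RUSBAT.

    Each process is (name, wait_time, last_run_time, last_ws_size).
    G = wait_time / last_run_time. The process with the largest G that
    satisfies the memory-balance rule wins; a process whose G exceeds
    four times the average G is promoted regardless of memory balance.
    """
    def g(p):
        return p[1] / p[2]              # assumes last_run_time > 0
    starved = [p for p in ready_batch if g(p) > 4 * avg_g]
    # starved processes bypass the memory-balance check entirely
    pool = starved or [p for p in ready_batch if p[3] < free_pages]
    return max(pool, key=g) if pool else None
```

With two candidates of equal working set size, the one that has waited longer relative to its last run (larger G) is chosen; if no candidate fits in the free pages and none is starved, nothing is promoted.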

Conclusion

Various approaches tackling the broad issues of resource allocation and memory management were discussed above, along with experimental observations, in order to enhance the performance of computer systems. Here we summarize the impact of their outcomes.

The Least Recently Used stack served as a starting point for the development of bounded locality intervals and activity sets, which lead to an explicit characterization of the phenomenon of locality of reference in actual programs. Experiments with level-one BLIs show how program execution can be modeled as a sequence of residencies in fairly long-lived stable phases, in each of which a small subset of segments is repeatedly referenced. Apart from this, the results can be used in simulation modeling for computer-system performance studies, and as a guide for the development of models based on mathematical hypotheses.

The concept of the working set model was introduced after observing that the interactions during resource allocation — intuitively a problem of “balancing” processor and memory demands against the available equipment — can easily become exceedingly complex.

The evaluation of the working set dispatcher shows that both CPU utilization and interactive user response time have improved. Apart from this, the thrashing phenomenon has also become negligible.

References
