CSE 594

Autumn 1999

The Minirel Buffer Manager

 
 

Introduction

    Your heapfile manager code will call the Buffer Manager to request pages from the database. A database buffer pool is an array of memory buffers (also called frames) that are used to hold database pages (also called disk blocks) that have been read from disk into memory. A page is the unit of transfer between the disk and main memory and typically varies between 2K and 8K in size. Another important thing to note is that a page in memory is an exact copy of the corresponding page on disk when it is first read in. In other words, the way that you will layout the data structures in a page while it is in memory will be exactly the same as the way they will be laid out on the disk page.

    Since the database itself is usually about 100 times larger than amount of main memory available on a machine, only a subset of the database pages can fit in memory at any given time.  (For an antithetical view on this subject, read this.) The buffer manager is used to control which pages are memory-resident. Whenever a request is made (by higher layers of the database system, like your Heapfile layer) for a data page, the buffer manager must check to see whether the requested page is already in the buffer pool. If so, the buffer manager simply returns a pointer to the page. If not, the buffer manager frees a frame if necessary (possibly by writing to disk the page it contains, in case it is dirty) and then reads in the requested page from disk into the frame that has been freed.
 

Buffer Replacement Policies and the Clock Algorithm

    There are many ways of deciding which page to replace when a free frame is needed. Commonly used policies in operating systems are FIFO, MRU, and LRU. Even though LRU is one of the most commonly used policies it has high overhead and is not the best strategy to use in a number of common cases that occur in database systems. Instead, many systems use the clock algorithm which approximates LRU behavior and is much faster.

 
    The above figure illustrates the execution of the clock algorithm. Conceptually, all the frames in the buffer pool are arranged in a circular list. Associated with each frame is a refbit. Each time a page in the buffer pool is accessed (via a readPage() call) the refbit of the corresponding frame is set to true. If readPage needs a page that is not currently in the buffer pool, the clock algorithm comes into play. At any point in time the clock hand (an integer whose value is between 0 and NUMBUFS - 1) is advanced (using modular arithmetic) in a clockwise fashion. For each frame that the clockhand passes, the refbit is examined and then cleared. If the bit had been set, the corresponding frame has been referenced "recently" and is not replaced. On the other hand, if the refbit is false, the page is selected for replacement (assuming it is not pinned). If the selected buffer frame is dirty (i.e. it has been modified), the page currently occupying the frame is written to disk. Otherwise the frame is just cleared (using the clear() function) and a new page from disk is read in to that location. The details of the algorithm are given below.
 
 
 

The Structure of the Buffer Manager

    The buffer manager contains three classes: the BufHashTbl class that is used to map from file and page number to buffer pool frame, the BufDesc class for capturing the state of each buffer frame, and the main BufMgr class. There will be one instance of the BufHashTbl class, one instance of the BufDesc class for each buffer frame and one instance of the BufMgr class. These classes are described in detail below.
 

The BufHashTbl Class

    Since a buffer pool will typically contain several thousand frames, searching it sequentially each time a page request is made will not provide reasonable performance. One solution is to use a hash table that maps (File *file,  int Page) pairs (contained in the FileAndPage struct) to frame numbers.
 

struct FileAndPage
{
  File* file;    // pointer to a file object
  int   pageNo;  // page number within a file
}

The FileAndPage struct (along with a couple of member functions) is in the file buf.h

The following is the definition of the BufHashTbl class (given in buf.h).

class BufHashTbl
{
private:
  typedef hash_map<FileAndPage, unsigned int> BufHashTable;
  BufHashTable hashTable;  // Hash table for locating frame

public:
 
    // insert entry into hash table mapping (file,pageNo) to frameNo
  const Status insert(File* file, const int pageNo,
                      const unsigned int frameNo);

    // Check if (file,pageNo) is currently in the buffer pool (ie. in
    // the hash table) and return frameNo if found.
  const Status lookup(File* file, const int pageNo,
                      unsigned int & frameNo) const;

    // delete entry (file,pageNo) from hash table.
  const Status remove(File* file, const int pageNo);
};
 
 
    Notice that the BufHashTbl class definition that we have supplied uses the hash_map class from the Standard Template Library (STL) to implement the hash table that maps an instance of the FileAndPage struct to an unsigned int (the frameNo).

const Status insert(File* file, const int pageNo, const unsigned int frameNo)

    Inserts a new entry into the hash table mapping file and pageNo to frameNo. Returns HASHTBLERROR if the entry already exists and OK otherwise.

const Status lookup(File* file, const int pageNo, unsigned int & frameNo) const

    Finds the frameNo that is obtained by hashing file and pageNo. Returns HASHNOTFOUND if entry is not found and OK otherwise.
 
const Status remove(File* file, const int pageNo)

    Removes the entry obtained by hashing file and pageNo. Returns HASHTBLERROR if entry is not found, OK otherwise.
 

 The BufDesc Class

    The BufDesc class is used to keep track of the state of each frame in the buffer pool.

class BufDesc {
    friend class BufMgr;
private:
  File* file;     // pointer to file object
  int   pageNo;   // page within file
  int   pinCnt;   // number of times this page has been pinned
  bool dirty;     // true if dirty;  false otherwise
  bool valid;     // true if page is valid
  bool refbit;    // has this buffer frame been referenced recently?

  void Clear() {  // initialize buffer frame for a new user
        pinCnt = 0;
        file = NULL;
        pageNo = -1;
        dirty = refbit = false;
        valid = false;
  };

  void Set(File* filePtr, int pageNum) {
      file = filePtr;
      pageNo = pageNum;
      pinCnt = 1;
      dirty = false;
      refbit = true;
      valid = true;
  }

  BufDesc() {
      Clear();
  }
};

First notice that all attributes of the BufDesc class are private and that the BufMgr class is defined to be a friend. While this may seem strange, this approach restricts access to only the BufMgr class. The alternative (making everything public) opens up access too far.

The purpose of most of the attributes of the BufDesc class should be pretty obvious. The dirty bit, if true, indicates that the page is dirty (i.e. has been updated) and thus must be written to disk before the frame is used to hold another page. The pinCnt indicates how many times the page has been pinned. The refbit is used by the clock algorithm. The valid bit is used to indicate whether the frame contains a valid page.
 

The BufMgr Class

    The BufMgr class is the heart of the buffer manager.

class BufMgr
{
private:
  unsigned int   clockHand;  // clock hand for clock algorithm
  BufHashTbl     hashTable;  // hash table mapping (File, page)
                             // to frame number
  BufDesc       *bufTable;  // vector of status info, 1 per page
  unsigned int   numBufs;    // Number of pages in buffer pool
  BufStats       bufStats;   // Statistics about buffer pool usage
 

  const Status allocBuf(unsigned int & frame);   // allocate a free
                                // frame using the clock algorithm
public:

  Page           *bufPool;   // actual buffer pool
 

  BufMgr(const unsigned int bufs);
  ~BufMgr();

  const Status readPage(File* file, const int PageNo, Page*& page);

  const Status unPinPage(File* file, const int PageNo,
                         const bool dirty);

  const Status allocPage(File* file, int& PageNo, Page*& page);
 
  const Status flushFile(File* file);

  const Status disposePage(File* file,
                           const int PageNo);

  const BufStats & getBufStats() const // Get buffer pool usage
    {
      return bufStats;
    }

  void clearBufStats()
    {
      bufStats.clear();
    }
};
 
 
 BufMgr(const unsigned int bufs)
 
    Allocates a buffer pool with bufs page frames and a corresponding BufDesc table. The way things are set up all frames will be in the clear state when the buffer pool is allocated. The hash table will also start out in an empty state.

~BufMgr()

    Flushes out all dirty pages and deallocates the buffer pool and the BufDesc table.

const Status allocBuf(unsigned int & frame)

    Allocates a free frame using the clock algorithm; if necessary, writes a dirty page back to disk. Returns BUFFEREXCEEDED if all buffer frames are pinned, UNIXERR if an OS file system error occurred when a dirty page was being written to disk and OK otherwise.

const Status readPage(File* file, const int PageNo, Page*& page)

    If the page is already in the buffer pool, sets the refbit, increments the pinCnt and then returns a pointer to the buffer frame containing the page (via the page parameter). If the page is not in the buffer pool, allocBuf() is called to allocate a buffer frame. Then the method file->readPage() is called to read the page into the buffer pool frame from the disk. Finally, Set() is invoked on the frame to set it up properly.
    Returns OK if no errors occurred, UNIXERR if an OS error occurred, BUFFEREXCEEDED if all buffer frames are pinned, HASHTBLERROR if a hash table error occurred.

const Status unPinPage(File* file, const int PageNo, const bool dirty)

    Decrements the pinCnt of the frame containing (file, PageNo) and if dirty == true, sets the dirty bit.Returns OK if no errors occurred, HASHNOTFOUND if the page is not in the buffer pool hash table, PAGENOTPINNED if the pin count is already 0.

const Status allocPage(File* file, int& PageNo, Page*& page)

    Allocates a new page in the specified file. Returns the page number of the page via the pageNo parameter and a pointer to the buffer frame allocated for the page via the page parameter. Returns OK if no errors occurred, UNIXERR if an OS error occurred, BUFFEREXCEEDED if all buffer frames are pinned and HASHTBLERROR if a hash table error occurred.

const Status disposePage(File* file, const int PageNo)

    Deallocates the page denoted by PageNo from the file. The page may or may not currently exist in the buffer pool. If it exists, then the frame containing the page must be cleared. Returns OK if no errors occurred,  UNIXERR if an OS error occurred,  PAGEPINNED if the page is currently pinned in the buffer pool.

const Status flushFile(File* file)

    This method will be called by DB::closeFile() when all instances of a file have been closed (in which case all pages of the file should have been unpinned). flushFile() should scan bufTable and for each dirty page it should  call file->writePage() to flush the dirty page to disk. Note that if the page is not dirty, nothing needs to be done. In other words some pages of the file might remain in the buffer pool even after the file has been closed. These pages might become useful if the file is opened again. Returns OK if no errors occurred and PAGEPINNED if some page of the file is pinned.
 

Buffer Manager Statistics

    To monitor buffer pool usage and the effectiveness of the buffer replacement strategies, it is useful to have a data structure that counts various buffer manager events. For this purpose, we have defined a structure called BufStats in buf.h:

struct BufStats
{
  unsigned accesses;    // Total number of accesses to buffer pool
  unsigned diskreads;   // Number of pages read from disk
  unsigned diskwrites;  // Number of pages written back to disk

  void clear()
    {
      accesses = diskreads = diskwrites = 0;
    }
 
  BufStats()
    {
      clear();
    }
};

    The statistics can be retrieved using the getBufStats() and cleared using clearBufStats().