CSE 451 05/28/03
prepared by: Beau Crawford and Benjamin Luque

What was the point of today's lecture?
    The file system abstraction exists in order to simplify
    the user's life.      

The file system provides four things for the user. 

*****************************************************************

--1. Storage Abstraction 
    A way of thinking of disk information, without thinking of 
    the physical aspects of the cylinders, tracks, sectors etc.

    Most file systems consider a file to be a named collection 
    of bytes. e.g.
				      -----
    "foo" maps to some blob. "foo" -->|   |    
				      |   |
				      -----

    The data may be laid out in any way on the physical disk;
    the abstraction hides that layout. The contents themselves
    may or may not have some internal structure.

*****************************************************************
    
--2. Organizational Structure 
    A way of grouping/clustering related files.

    Most file systems are based on "directories". Directories
    are represented in a hierarchical fashion (i.e. tree). There
    are several benefits to using this tree structure:
	** Localized naming - files with the same name but in 
			      different directories
			      (e.g. /tempA/foo.c vs. /tempB/foo.c)
	** Provides logical/intuitive grouping
	** Nice mathematical properties - fast searching of files: in a
					  reasonably balanced tree,
					  finding a file takes O(log(n))
					  traversal steps on average
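
    The localized-naming benefit can be sketched with a toy
    in-memory tree. This is only an illustration: 'struct node',
    'add_child', 'lookup' and MAX_CHILDREN are made-up names, and
    real file systems store directories on disk. Each directory
    is scanned linearly here, so the cost of a lookup is roughly
    (path depth) x (entries per directory).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_CHILDREN 8                   /* hypothetical fan-out limit */

/* One node of a toy in-memory directory tree. */
struct node {
    const char  *name;                   /* component name, e.g. "foo.c" */
    int          is_dir;                 /* 1 = directory, 0 = plain file */
    struct node *child[MAX_CHILDREN];    /* entries, if a directory */
    int          nchild;
};

/* Add an entry to a directory and return the entry. */
struct node *add_child(struct node *dir, struct node *entry)
{
    assert(dir->is_dir && dir->nchild < MAX_CHILDREN);
    dir->child[dir->nchild++] = entry;
    return entry;
}

/* Resolve an absolute path one component at a time, starting at root.
 * Returns NULL if any component is missing. */
struct node *lookup(struct node *root, const char *path)
{
    struct node *cur = root;

    while (cur != NULL && *path != '\0') {
        while (*path == '/')             /* skip separators */
            path++;
        size_t len = strcspn(path, "/"); /* length of next component */
        if (len == 0)
            break;                       /* nothing left (trailing slash) */

        struct node *next = NULL;
        for (int i = 0; cur->is_dir && i < cur->nchild; i++) {
            const char *n = cur->child[i]->name;
            if (strlen(n) == len && strncmp(n, path, len) == 0) {
                next = cur->child[i];
                break;
            }
        }
        cur = next;                      /* NULL if no entry matched */
        path += len;
    }
    return cur;
}
```

    With two directories that each hold a "foo.c", the paths
    /tempA/foo.c and /tempB/foo.c resolve to different nodes.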

*****************************************************************
		
--3. An Interface for File Access
    Most file systems provide the same basic interface. The
    operations may have slightly different syntax, but the overall
    behavior is the same. For example, let's compare operations in
    UNIX and NT:

    UNIX				   NT
    fd = creat("name", mode)		   CreateFile()
    read(fd, buf, sz)			   ReadFile()
    write(fd, buf, sz)			   WriteFile()
    ...					   ...
    (note that 'fd' is the file descriptor)
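
    As a rough sketch of the UNIX column above, the function
    below creates a file, writes to it, and reads the data back
    using the POSIX calls open/write/read/close. The helper name
    'write_then_read' and the 0644 permission bits are choices
    made for this example, not something from the lecture.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Write msg to path, then read it back into out (NUL-terminated).
 * Returns the number of bytes read back, or -1 on error. */
ssize_t write_then_read(const char *path, const char *msg,
                        char *out, size_t outsz)
{
    assert(outsz > 0);
    size_t len = strlen(msg);

    /* Create (or truncate) the file and get a file descriptor. */
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    ssize_t w = write(fd, msg, len);
    close(fd);                           /* done with this descriptor */
    if (w != (ssize_t)len)
        return -1;

    /* Open again, this time for reading. */
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    ssize_t n = read(fd, out, outsz - 1);
    close(fd);
    if (n < 0)
        return -1;
    out[n] = '\0';
    return n;
}
```

    The NT calls follow the same create/read/write pattern but
    return a HANDLE instead of a small integer.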

    -------------------------------------------------------------
    
    "Sessions" allow us to interact with a collection of bytes.
    They begin with a call to the 'open' or 'create' operation.
    These operations return the fd (file descriptor), which is
    a small integer used as an index into the PPOFT (Per Process
    Open File Table). See the figure below:

      PCB
    --------
    |      |
    |------|             PPOFT		    SWOFT
    |      |----------> ---------         ---------
    |------|	        |       |         |	  |
    |      |	        |-------|  ptr    |-------|	
    --------	       7|  *----|-------->|	  |
		     	|-------|	  |-------|
		       8|	|	  |       |		
		     	---------         |-------|
					  |	  |		
    \			       /	  ---------
     \________________________/          \_________/
		  |			       |
		  V			       V
      These exist for each process	   Shared by
					   all processes
		  
    PPOFT (Per Process Open File Table)
    SWOFT (System Wide Open File Table)

    Note that 'open' or 'create' searches the PPOFT to find an
    unused entry and returns its index (i.e. the fd, also known
    as the "opaque handle").
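
    The two tables in the figure can be modeled in a few lines
    of C. This is a simplified sketch, not how any particular
    kernel does it: the table sizes, the field names, and the
    functions 'toy_open' and 'toy_close' are all hypothetical,
    and sharing of SWOFT entries between processes is left out.

```c
#include <assert.h>
#include <stddef.h>

#define PPOFT_SIZE 16   /* hypothetical per-process limit */
#define SWOFT_SIZE 64   /* hypothetical system-wide limit */

/* System-wide open file table entry (fields are illustrative). */
struct swoft_entry {
    int  in_use;
    long offset;        /* current position in the file */
    int  file_id;       /* stand-in for the identity of the file */
};

/* Per-process table: each slot points at a system-wide entry. */
struct ppoft {
    struct swoft_entry *slot[PPOFT_SIZE];
};

static struct swoft_entry swoft[SWOFT_SIZE];   /* shared by all processes */

/* Open file_id for the process that owns p: claim a free SWOFT entry,
 * then return the index of the first unused PPOFT slot as the fd.
 * Returns -1 if either table is full. */
int toy_open(struct ppoft *p, int file_id)
{
    assert(p != NULL);
    struct swoft_entry *e = NULL;
    for (int i = 0; i < SWOFT_SIZE; i++) {
        if (!swoft[i].in_use) {
            e = &swoft[i];
            break;
        }
    }
    if (e == NULL)
        return -1;

    for (int fd = 0; fd < PPOFT_SIZE; fd++) {
        if (p->slot[fd] == NULL) {
            e->in_use = 1;
            e->offset = 0;
            e->file_id = file_id;
            p->slot[fd] = e;
            return fd;                   /* the "opaque handle" */
        }
    }
    return -1;
}

/* Close fd: free both entries so a later open can reuse the index. */
int toy_close(struct ppoft *p, int fd)
{
    if (fd < 0 || fd >= PPOFT_SIZE || p->slot[fd] == NULL)
        return -1;
    p->slot[fd]->in_use = 0;
    p->slot[fd] = NULL;
    return 0;
}
```

    Because the search starts at slot 0, closing a descriptor
    and opening another file hands back the same small integer.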

    -------------------------------------------------------------

    We need some way of reading data from the file. There are
    three common classes of files, each with a different
    protocol for reading data:
    
    	  1. Sequential Access Files
	     Read bytes in order from the beginning to the end of
	     the file. If the OS knows a file is accessed
	     sequentially, it can optimize performance by minimizing
	     the process's wait time. For example, consider reading
	     an MP3 file:
			/* loop until end of file (0) or error (-1) */
			while ((n = read(fd, buf, sz)) > 0) {
			   playMusic(buf, n);
			}
	     By using the "Read Ahead Technique" we can read one
	     block ahead while playing the current block.

	     There are two ways of telling whether or not a file
	     is a sequential access file:
	        i.  Assume it is a sequential access file until proven
		    otherwise.
		ii. Be given access type information when the file
		    is opened or created. e.g.
		      open("name", AccessType);

	   2. Direct Access Files
	      Read bytes in arbitrary (random) order. The OS cannot
	      take advantage of techniques that exploit sequential
	      structure. For example, the Read Ahead Technique does
	      not help because the next block cannot be predicted.

	   3. Record Access Files
	      The file system will figure out what to read next. This
	      makes the programmer's job a little easier since there
	      is one less thing to worry about.  These types of files
	      are common in databases that need access to structured
	      data, but not in any particular order.
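
	   The "assume sequential until proven otherwise" policy
	   from item 1 can be sketched as a small piece of per-file
	   state. The names 'ra_state', 'ra_init' and 'ra_on_read'
	   are invented for this example, and a real OS would
	   likely use a more forgiving heuristic.

```c
#include <assert.h>

/* Per-open-file state a (hypothetical) OS might keep for read-ahead. */
struct ra_state {
    long next_expected;  /* block we expect if access stays sequential */
    int  sequential;     /* 1 until a read proves otherwise */
};

void ra_init(struct ra_state *s)
{
    s->next_expected = 0;
    s->sequential = 1;   /* assume sequential until proven otherwise */
}

/* Record a read of block blk. Returns the block number to prefetch,
 * or -1 if no read-ahead should be issued. */
long ra_on_read(struct ra_state *s, long blk)
{
    assert(blk >= 0);
    if (blk != s->next_expected)
        s->sequential = 0;           /* pattern broken: stop prefetching */
    s->next_expected = blk + 1;
    return s->sequential ? blk + 1 : -1;
}
```

	   In this sketch a single out-of-order read turns
	   read-ahead off for good; switching it back on after a
	   run of in-order reads would be a natural refinement.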

	   Note that there exists a "seek pointer". This is an
	   indicator of what will be read next from the file.  This
	   pointer can be moved to a different location when the
	   byte it currently points at is not the one wanted next.
	   Combined with sequential reads, the seek pointer can
	   be used to simulate the other access protocols:
	   Direct Access and Record Access.
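
	   On UNIX the seek pointer is moved with lseek(). As a
	   sketch of the simulation described above, the helper
	   below reads the k-th fixed-size record by seeking first
	   and then doing an ordinary read. 'read_record' and
	   RECORD_SIZE are assumptions made for this example.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

#define RECORD_SIZE 8   /* hypothetical fixed record length in bytes */

/* Read record number recno into buf (at least RECORD_SIZE bytes) by
 * moving the seek pointer, then doing an ordinary sequential read.
 * Returns the number of bytes read (0 at end of file) or -1 on error. */
ssize_t read_record(int fd, long recno, char *buf)
{
    assert(recno >= 0);
    if (lseek(fd, (off_t)recno * RECORD_SIZE, SEEK_SET) == (off_t)-1)
        return -1;
    return read(fd, buf, RECORD_SIZE);
}
```

	   Calling read_record(fd, 2, buf) and then
	   read_record(fd, 0, buf) visits the records out of order
	   even though each underlying read is sequential.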

*****************************************************************

--4. Access Control
    There are three things to keep in mind when doing access
    control.
	1. Principals - subjects/users
	2. Objects - files or directories
	3. Operations - Read, Write, Delete, Create, Search

    We need all three to describe any access policy. Each user
    has a set of operations that they can perform on each object.
    This can be represented by a table called the Access Control
    Matrix. For example (R = Read, W = Write, D = Delete,
    C = Create, S = Search):

	    /f1		/f2	     /f3
    John    R		W	     S
    Fred    RW		S	     D
    Sue     C		D	     CD
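
    One possible encoding of the matrix above, as a sketch:
    each operation gets one bit, and each cell holds the set of
    operations that principal may perform on that object. The
    bit values and the function names 'find' and 'allowed' are
    choices made for this example.

```c
#include <assert.h>
#include <string.h>

/* One bit per operation. */
enum { OP_R = 1, OP_W = 2, OP_D = 4, OP_C = 8, OP_S = 16 };

static const char *const users[]   = { "John", "Fred", "Sue" };
static const char *const objects[] = { "/f1", "/f2", "/f3" };

/* Rows are principals, columns are objects, as in the table above. */
static const int acm[3][3] = {
    /* John */ { OP_R,        OP_W, OP_S        },
    /* Fred */ { OP_R | OP_W, OP_S, OP_D        },
    /* Sue  */ { OP_C,        OP_D, OP_C | OP_D },
};

/* Return the index of key in names, or -1 if it is not present. */
static int find(const char *const *names, int n, const char *key)
{
    for (int i = 0; i < n; i++)
        if (strcmp(names[i], key) == 0)
            return i;
    return -1;
}

/* Return 1 if user may perform every operation in op on object,
 * else 0. Unknown users or objects are denied. */
int allowed(const char *user, const char *object, int op)
{
    assert(op != 0);
    int u = find(users, 3, user);
    int o = find(objects, 3, object);
    if (u < 0 || o < 0)
        return 0;
    return (acm[u][o] & op) == op;
}
```

    For instance, allowed("Fred", "/f1", OP_W) is 1, while
    allowed("John", "/f1", OP_W) is 0.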