CSE 451
05/28/03
prepared by: Beau Crawford and Benjamin Luque

What was the point of today's lecture?

The file system abstraction exists in order to simplify the user's life.
The file system provides 4 things for the user.

*****************************************************************
--1. Storage Abstraction

A way of thinking about disk information without thinking about the
physical aspects of the disk (cylinders, tracks, sectors, etc.).
Most file systems consider a file to be a named collection of bytes.

e.g. "foo" maps to some blob.

              -----
   "foo" -->  |   |
              |   |
              -----

The data may be laid out in any way on the physical disk.
The blob itself may or may not have some internal structure.

*****************************************************************
--2. Organizational Structure

A way of grouping/clustering related files.
Most file systems are based on "directories".
Directories are represented in a hierarchical fashion (i.e. a tree).
There are several benefits to using this tree structure:

** Localized naming - files can have the same name as long as they
   are in different directories (e.g. /tempA/foo.c vs. /tempB/foo.c)
** Provides logical/intuitive grouping
** Nice mathematical properties - fast searching of files: on average
   O(log(n)) traversal (e.g. finding a file), provided the tree is
   reasonably balanced

*****************************************************************
--3. An Interface for File Access

Every file system provides the same basic interface. The operations
may have slightly different syntax, but the overall behavior is the
same. For example, let's compare operations in UNIX and NT:

   UNIX                      NT
   fd = create("name")       CreateFile()
   read(fd, buf, sz)         ReadFile()
   write(fd, buf, sz)        WriteFile()
   ...                       ...

(note that 'fd' is the file descriptor)

-------------------------------------------------------------

"Sessions" allow us to interact with a collection of bytes.
They begin with a call to the 'open' or 'create' operation.
These operations return the fd (file descriptor), which is a small
integer used for indexing into the PPOFT (Per Process Open File Table).
see figure below:

    PCB
  --------
  |      |
  |------|         PPOFT              SWOFT
  |      |------> ---------          ---------
  |------|        |       |          |       |
  |      |        |-------|   ptr    |-------|
  --------      7 |   *---|--------->|       |
                  |-------|          |-------|
                8 |       |          |       |
                  ---------          |-------|
                                     |       |
                                     ---------
  \______________________/          \_________/
             |                           |
             V                           V
  These exist for each process    Shared by all processes

  PPOFT (Per Process Open File Table)
  SWOFT (System Wide Open File Table)

Note that 'open' or 'create' searches the PPOFT to find an unused
entry, and returns the index of that entry (i.e. the fd, aka the
"opaque handle").

-------------------------------------------------------------

We need some way of reading data from the file.
There are three common classifications of files, each having a
different protocol for reading data from it:

1. Sequential Access Files

   Read bytes sequentially, in order, from the beginning to the end
   of the file. If it is a sequential access file, the OS can optimize
   performance by minimizing the wait time of the process.

   For example, consider reading an mp3 file:

      for (;;) {
          n = read(fd, buf, sz);    /* read returns 0 at end of file */
          if (n <= 0) break;
          playMusic(buf, n);
      }

   By using the "Read Ahead Technique" the OS can read one block ahead
   while the process is playing the current block.

   There are two ways of telling whether or not a file is a sequential
   access file:

   i.  Assume it is a sequential access file until proven otherwise.
   ii. By giving access type information when the file is opened or
       created, e.g. open("name", AccessType);

2. Direct Access Files

   Read bytes in arbitrary (random) order. The OS cannot take advantage
   of techniques that exploit the structure of a file. For example, we
   cannot take advantage of the Read Ahead Technique due to the lack
   of sequentiality.

3. Record Access Files

   The file system will figure out what to read next. This makes the
   programmer's job a little easier since there is one less thing to
   worry about. These types of files are common in databases that need
   access to structured data, but not in any particular order.

Note that there exists a "seek pointer".
This is an indicator of what will be read next from the file.
This pointer can be moved to a different location if the byte it
currently points at is not the one that should be read next.
When used with sequential access, the seek pointer can simulate the
other access protocols: Direct Access and Record Access.

*****************************************************************
--4. Access Control

There are three things to keep in mind when doing access control.

1. Principals - subjects/users
2. Objects - files or directories
3. Operations - Read (R), Write (W), Delete (D), Create (C), Search (S)

We need all three to describe any access policy.
Each user has a set of operations that they can perform on each
object. This can be represented by a table called the Access Control
Matrix.

For example,

            /f1     /f2     /f3
   John     R       W       S
   Fred     RW      S       D
   Sue      C       D       CD
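
The example matrix above can be sketched in C as a small 2-D table.
This is only an illustration, not something given in lecture: the
bitmask encoding and the names acm, acm_allowed and OP_* are
assumptions made for this sketch. A real file system would not
necessarily store the matrix this way.

```c
#include <assert.h>
#include <stdbool.h>

/* Operations from the notes, one bit each so a cell can hold a set.
   (Hypothetical encoding, chosen for this sketch.) */
enum {
    OP_READ   = 1 << 0, /* R */
    OP_WRITE  = 1 << 1, /* W */
    OP_DELETE = 1 << 2, /* D */
    OP_CREATE = 1 << 3, /* C */
    OP_SEARCH = 1 << 4  /* S */
};

enum principal { JOHN, FRED, SUE, NUM_PRINCIPALS };
enum object    { F1, F2, F3, NUM_OBJECTS };

/* Rows are principals, columns are objects, as in the example table. */
static const unsigned acm[NUM_PRINCIPALS][NUM_OBJECTS] = {
    [JOHN] = { OP_READ,            OP_WRITE,  OP_SEARCH             },
    [FRED] = { OP_READ | OP_WRITE, OP_SEARCH, OP_DELETE             },
    [SUE]  = { OP_CREATE,          OP_DELETE, OP_CREATE | OP_DELETE },
};

/* Returns true only if every requested operation bit is set in the
   cell for (who, what). */
static bool acm_allowed(enum principal who, enum object what, unsigned ops)
{
    assert(who < NUM_PRINCIPALS && what < NUM_OBJECTS);
    return (acm[who][what] & ops) == ops;
}
```

With this table, acm_allowed(FRED, F1, OP_READ | OP_WRITE) is true,
while acm_allowed(JOHN, F1, OP_WRITE) is false. All three pieces
(principal, object, operation) are needed for a lookup, which matches
the point made above.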