CS322: File Systems

Introduction

  1. Earliest devices were usually serial and therefore needed simple drivers

    In our survey of the history of operating systems, we saw that the first step up from the bare machine was the provision of a library of input/output device drivers. In the earliest days, input/output devices were typically sequential in nature (e.g. cards, tape, printers), so the services provided by these drivers were relatively simple.

  2. Direct-access devices require more sophisticated drivers

    With the introduction of direct-access storage devices (drum and disk), access became more complicated: each read and write must now specify the location on the device where it is to be done - or, alternatively, must be preceded by a position device function that embodies that information.

    1. At the physical level, disks may use both - e.g. a position device function is used to move the head to the correct track, and the read/write operation specifies the desired sector on the track.

    2. However, most of the time programmers want to just issue a read or write operation, leaving the details of positioning to a lower level of software.

  3. Drum/Disk devices contain files (data) belonging to many applications

    Further, with sequential devices the application program can assume that the entire device belongs to it - e.g. if it is reading from a card reader, then all of the cards in the input hopper (at least up to the $EOJ control card) are input to the current job. A drum/disk, by contrast, typically contains files belonging to many applications. Some of these files belong to the one or more jobs currently running on the machine, while others belong to jobs not currently active. Thus, we must ensure that a given application accesses only its own data.

  4. Modern operating systems use a file system to hide the details...

    For these reasons, all modern operating systems incorporate a file system that hides many of the details of the storage device from the application program. The file system in essence creates an abstract data type file, and provides various primitives for operating on it, such as open, close, read, and write.

  5. The file system is a noticeable feature of an operating system

    The file system is often the most noticeable feature of an operating system. It is the portion of the operating system most commonly accessed by both the application program and user interfaces.

    1. Non-file device drivers are often accessed through the file system

      IO accesses to non-file-structured devices generally pass through the file system and are made to look like file operations. Unix uses files in the /dev directory to access devices. The file system translates this into an access to the appropriate driver, transparently to the user. Some systems (Linux, Solaris, etc.) use a /proc directory to access system data structures.

    2. Some systems use logical names for devices

      Some operating systems carry this further through the use of logical names for IO devices. Instead of referring to a specific physical device, a program may refer to a logical name such as SYS$INPUT. Before the program is run, an operating system function is invoked to bind this logical name to a specific device (and possibly a specific file on the device). This binding may vary from run to run.

      1. A given program, run interactively on a timesharing system, may do IO to the interactive terminal.

      2. The same program, run as a batch, may do IO through cards, tape, or a disk file - without modification.

      3. The technical name for this is device independent IO.

    3. The central role of the file system shows up in the occurrence of the abbreviation "DOS" (disk operating system) in the names of some operating systems - e.g. MS/DOS.

  6. File systems are usually built as a layer on top of the physical device drivers

    For example: MS/DOS

             BDOS                User/application interface invokes
        ----------------           a file system operation such as OPEN,
       |      BIOS      |          READ, etc.
       |  ------------  |
       | |  Hardware  | |        File system translates this into device
       |  ------------  |          driver calls to position the head, etc.
        ----------------
    

  7. Our focus in this lecture will be specifically on disk file systems.

    1. We will not dwell further on the role of the file system in handling other types of IO. This is largely a pass-through operation.

    2. Though file systems with directories, etc. can be built on tapes as well as disks, we will say little about this. Many of the same principles apply.

File System Interfaces

A file system must interface to a number of other components of the overall computer system.

  1. Hardware interface

    As noted above, the file system usually interfaces to the hardware through device drivers. From the file system's vantage point, however, we can still speak of a "hardware interface" as being the physical devices as seen through the drivers - i.e. the file system's hardware interface is a set of primitives provided by the disk driver.

    1. These primitives may correspond directly to physical operations of the disk drive...

      For example:

      1. Seek head(s) to a specified track/cylinder.

      2. (On a multi-surface disk) select a specific surface (head).

      3. Wait for a particular sector to come up under the head.

      4. Perform a read transfer of a specified number of bytes to a specified buffer in memory.

      5. Perform a write transfer of a specified number of bytes from a specified buffer in memory.

      This is a lower-level approach, since it requires the file system to "know" about the geometry of the disk (how many tracks, how many sectors per track, etc.)

    2. Or... the device driver may hide some of the details of the device by allowing the file system to view the device as a one-dimensional array of sectors.

      For example, the file system may issue a request to read from sector number 3713. The device driver may then translate this into the correct cylinder, surface, and sector. E.g., if there are 40 cylinders numbered 0..39 and 10 surfaces numbered 0..9, with each track holding 20 sectors numbered 0..19, then the device driver might map sectors 0..19 to cylinder 0, surface 0; 20..39 to cylinder 0, surface 1, etc. In this case, sector 3713 would be cylinder 18, surface 5, sector 13. [(18*10+5)*20+13]

      This is a higher-level approach as far as the file system is concerned, since it pushes the knowledge of disk geometry down to the driver.
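      The mapping just described can be sketched in a few lines. This is Python used purely for illustration; the geometry constants are the hypothetical ones from the example above, not any real drive's:

```python
# Hypothetical geometry from the example: 40 cylinders, 10 surfaces,
# 20 sectors per track. A sketch of the driver's translation from a
# linear sector number to (cylinder, surface, sector) -- not a real driver.

SURFACES = 10
SECTORS_PER_TRACK = 20

def linear_to_chs(n: int) -> tuple[int, int, int]:
    sector = n % SECTORS_PER_TRACK
    track = n // SECTORS_PER_TRACK      # track number counted across the whole disk
    surface = track % SURFACES
    cylinder = track // SURFACES
    return cylinder, surface, sector

def chs_to_linear(cylinder: int, surface: int, sector: int) -> int:
    return (cylinder * SURFACES + surface) * SECTORS_PER_TRACK + sector

print(linear_to_chs(3713))   # -> (18, 5, 13), matching the example
```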

  2. Application program interface

    As noted above, the file system supplies a large number of primitives to applications programs, such as:

    1. open a named file
    2. create a new file
    3. close a file
    4. read from a previously opened file
    5. write to a previously opened file
    6. rename a file
    7. delete a file
    8. Supply information on a file - e.g. size, date created, etc.

    In some cases, these primitives may be used with logical names that refer to sequential IO devices rather than disk files. The file system simply passes these on to the appropriate driver. When they do refer to the disk, the file system translates them into a series of disk driver primitives that may involve accessing one or more directory files as well as the application file.

  3. User interface

    The file system interfaces to the command interpreter that provides the overall user interface to the operating system (e.g. MS/DOS COMMAND.COM). Typically, many of the command interpreter's commands are basic file system operations, such as deleting a file.

  4. Management interface

    On a large system, the file system supports a number of fundamental management capabilities, some through privileged utility programs that only management personnel can run. For example:

    1. Creation of directories (at least in part restricted to management)

    2. Space allocation, either file-by-file or user-by-user (the latter, at least, is restricted to management)

    3. Controlling protection on files (generally a user can control the protection on files he/she owns, with management override)

    4. Performance optimizations regarding physical placement of files, contiguity of files, etc. (generally restricted to management)

  5. Summary

    The file system is structured as follows; the indented items form the interface between two adjacent layers:

    Command Processor (Interpreter); Application & Management Utility Programs
        primitives like open, read, delete
    File System
        primitives like seek, transfer
    Device Drivers
        primitives like position head, read sector
    Hardware

Terminology

  1. Medium or media

    the physical material that stores information - such as a disk or tape.

  2. Volume

    one unit of media - e.g. one disk pack (with one or more platters), one floppy disk, one reel of tape.

  3. Sector / block / cluster

    File storage media are physically organized into units called variously sectors or blocks:

    1. For disks, the typical term is sector. All IO transfers occur in multiples of one sector. These are generally of a fixed size - e.g. 512 bytes for MS/DOS, VMS and many other systems. However, some disks dedicated to large files may offer variable-size sectors - e.g. IBM's count-key-data (CKD) format. (Illustration: one normally buys beans in cans of fixed size, and cannot buy less than one can. Sometimes, however, one can find a market where one can scoop any amount into a bag.) For our discussion, we will assume fixed-size sectors.

    2. For tapes, the typical term is block. Block size on tape can often be tuned to the application - i.e. one block will exactly contain some number of logical records. The term "block" is also sometimes used with disks - either as a synonym for "sector" or to refer to an intermediate grouping of two or more sectors that are normally read and written as a unit. (Notice that the bulk of physical disk access time is occupied with head movement (seek) and search; therefore it is advantageous to transfer as much data as possible and useful on each access.)

    3. As a performance improvement, disks are often clustered. Though disk access is still in terms of sectors, disk space is allocated one cluster at a time. Thus, for example, a file needing five sectors on a disk with a cluster size of 4 would be given 2 clusters, or 8 sectors. It could grow into those 3 spare sectors at any time without further space allocation, but would be allocated a whole new cluster when it needed a ninth sector. This allows a user of the file to read/write an entire cluster in one disk access, though it is still possible to read/write only one sector. Trying to read/write more than one cluster in one access may actually carry the time penalty of multiple accesses if the heads have to be moved.

      1. A cluster size is normally established for a given disk when it is first initialized.

      2. Typically, if a disk has relatively few files of relatively large size, then a large cluster size is chosen. Perhaps an entire cylinder will be made a single cluster. On the other hand, a disk with many small files will typically use a small cluster size - perhaps as small as one sector (i.e. no clustering.)
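    The allocation arithmetic above amounts to a ceiling division. A small sketch (Python for illustration only):

```python
# Sketch of cluster-based space allocation arithmetic. The cluster size is a
# per-disk parameter chosen when the disk is initialized, as described above.

def clusters_needed(file_sectors: int, cluster_size: int) -> int:
    # ceiling division: a partially used cluster still costs a whole cluster
    return -(-file_sectors // cluster_size)

# The example from the text: a 5-sector file on a disk with 4-sector clusters
clusters = clusters_needed(5, 4)
print(clusters, clusters * 4)   # 2 clusters = 8 sectors allocated
# The file can grow into the 3 spare sectors; a ninth sector forces a new cluster
print(clusters_needed(9, 4))    # 3
```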

  4. File

    the logical unit of information storage. Usually, a given volume will contain many files; but it is possible to dedicate a single volume to a single large file or even to have one file spread out over multiple volumes. On a disk, a file is allocated some number of sectors when it is created, and there is generally provision to allow it to grow bigger or (more rarely) to grow smaller.

  5. Records

    the logical subdivisions of a file.

    1. Files are typically structured into smaller units called records - e.g. lines of text, data on one employee, etc. Various record structures are possible:

      1. Fixed length records: all the records of the file are of the same fixed size. This is common in many data processing applications.

      2. Variable length records: each record includes an indication of its actual size - frequently stored in the first few bytes of the record. This is useful for applications where the amount of information stored on an entity may vary widely - e.g. in a medical records system, one patient may have only been seen once, while another may have been seen several times per week for a long time.

      3. Stream records: a variant of variable length records used with text files. Instead of beginning with a length indicator, each record ends with a special delimiter such as CR or LF. Word processing applications typically work with stream files.

    2. Files may contain special information outside of the regular record structure - e.g. indexes, headers etc.

    3. We have seen that the disk is physically organized into sectors which are in turn grouped into tracks (cylinders). A file is logically organized into records. Rarely is it the case that the logical record size bears any direct relationship to the physical sector size. Thus, a single logical record may:

      1. Correspond to a single physical sector or block - possibly with significant wasted space (but access is convenient).

      2. Be one of several packed into a sector or block. If the logical record size does not divide equally into the sector/block size, then there may be wasted space at the end, or the last logical record may span into the next sector/block.

      3. Itself occupy two or more sectors, in whole or in part.

    4. One important issue in designing a file system is the question of how much support the file system should provide for various record organizations.

      1. VMS is at one extreme. VMS's file system (RMS) provides direct support for all three kinds of records (fixed length, variable length, and stream) as well as for non-record structured access to 512-byte blocks.

      2. Unix is at the other extreme. The Unix file system presents a file to the user as a sequence of bytes. The user may position himself at any byte position in the file prior to a read or write. But any division of the bytes into records is entirely the responsibility of the programmer - though there are some standard library routines for dealing with stream files.

      3. Question: which approach is better? Why?
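    To make the Unix end of this spectrum concrete, here is a sketch (in Python, standing in for the programmer-supplied code Unix requires) of layering variable-length records on a bare byte stream. The 4-byte length prefix is an arbitrary choice for the example, not a standard:

```python
import io
import struct

# Sketch: variable-length records built by the programmer on top of a
# Unix-style byte stream. Each record is prefixed with a 4-byte length
# field, as described for variable-length records above.

def write_record(stream, data: bytes) -> None:
    stream.write(struct.pack("<I", len(data)))  # 4-byte little-endian length
    stream.write(data)

def read_record(stream):
    header = stream.read(4)
    if len(header) < 4:
        return None                             # end of file
    (length,) = struct.unpack("<I", header)
    return stream.read(length)

f = io.BytesIO()                                # stands in for an open disk file
write_record(f, b"patient seen once")
write_record(f, b"patient with a long history of weekly visits")
f.seek(0)
while (rec := read_record(f)) is not None:
    print(rec.decode())
```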

  6. File access

    1. In the time when most input-output was done using cards, tapes, and printers, the only practical way of accessing data was sequential. As a result, many efficient, standard algorithms were developed for processing sequential files.

    2. With devices like disk, it becomes possible to process the records of a file in any order. A variety of standard direct-access file organizations have been developed to take advantage of this for applications where direct access is needed (e.g. interactive access to a database.)

      1. Relative file: the file is structured as an array of fixed-length cells, logically numbered 1..n. Each cell is capable of holding a single record (generally a fixed length record of the same size as the cell, though a variable length record can be stored with some wasted space.) Permitted data transfer operations are:

        • read record n
        • write record n

        (note: sequential access is possible by doing direct access and incrementing the record number each time.)

      2. Indexed file: Each record has a designated field called the key - which is usually unique (e.g. SSN, part number etc.). The file consists of two areas:

        • A data area in which the records are actually stored - either as fixed-length or variable length records.

        • An index area in which each key appears together with a pointer to the corresponding record. The index is structured to make looking up a particular key easy.

    3. Again, we face a design question. How much support should the file system provide for these various organizations?

      1. VMS's RMS provides direct support for a set of standard structures including sequential, relative, and indexed files. It also supports an unstructured organization into 512 byte blocks on which the programmer can build his own organization.

      2. In contrast, Unix provides direct support only for sequential access to a file and for direct access at the level of individual bytes. There is no direct support for relative or indexed files.

      3. Including file-organization support in the file system is a great convenience to the user, since he otherwise needs to provide the structure himself or use library routines to do so. Further, it simplifies the transporting of data between application programs if all application programs use a common set of file support routines provided by the file system. But this does complicate the file system, and can make life difficult if a particular need is not supported.
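    As an illustration of what "providing the structure himself" involves, here is a sketch of a relative file built on a plain byte stream. The cell size and the 0-based cell numbering are choices made for the example, not any system's convention:

```python
import io

# Sketch: a relative (direct-access) file layered on a Unix-style byte
# stream, which itself offers no record support. Cells are fixed-length;
# record n lives at byte offset n * CELL_SIZE (0-based here for simplicity).

CELL_SIZE = 32

def write_cell(stream, n: int, record: bytes) -> None:
    assert len(record) <= CELL_SIZE
    stream.seek(n * CELL_SIZE)
    stream.write(record.ljust(CELL_SIZE, b"\0"))   # pad out the cell

def read_cell(stream, n: int) -> bytes:
    stream.seek(n * CELL_SIZE)
    return stream.read(CELL_SIZE).rstrip(b"\0")

f = io.BytesIO()
write_cell(f, 3, b"employee #3")   # direct access: cells written in any order
write_cell(f, 0, b"employee #0")
print(read_cell(f, 3).decode())
```

Sequential access then falls out for free: read cell 0, 1, 2, ... in turn, as the note under relative files observes.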

  7. Binary files versus text files

    One distinction that is of some broad importance is the distinction between files which contain binary information (e.g. executable programs; numeric data stored in internal form; encrypted files) vs text files that contain only ASCII (or EBCDIC) characters. The difference shows up in two places:

    1. Attempts to print the file.

    2. Transmitting the file over communications channels - e.g. between a micro and a mainframe. Many micro-mainframe links are built around the terminal ports of the mainframe, which are designed for ASCII input. These ports may support only 7-bit characters, and may treat certain control characters in a special way (e.g. ^S, ^C). A binary file transmitted over such a channel may lose bits and/or experience control character problems unless some provision is made.

  8. Directory (directory structure)

    If a volume contains more than 1 file, then it must also contain some sort of listing of the files it contains and where they are located, along with other useful information. A given volume may:

    1. Contain a single directory (e.g. micro-computer floppies)

    2. Contain a tree of directories (e.g. most timeshared systems)

    3. Be part of a multi-volume system having a common directory structure, with portions of the directory stored on different volumes in a way that is transparent to the user.

      Note that in many file systems a directory is, in fact, implemented as a file having a special structure known by the file system. (This is true on both VMS and Unix.)

  9. Housekeeping files

    Often, a volume will contain some information that falls outside the regular directory structure - e.g.:

    1. System "files" used to bootstrap the operating system.

      In MS/DOS, sector 1 of track 0 is used for a bootstrap routine, with the BIOS and BDOS contained in two hidden files. These do not show up in regular directory listings.

    2. Bad block file.

      It is very difficult to make a perfect large disk. One way to handle local imperfections is to build a "file" out of those sectors that have failed to perform properly. This file keeps the sectors out of the pool from which user files are built - as long as it is not deleted (which would result in its sectors being recycled to other files.)

    3. Storage allocation table.

      This keeps track of which sectors on the disk are currently in use, and which are available for allocation when a user creates or expands a file. Example: MS/DOS uses sectors 2 and 3 of track 0 for this purpose.
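    A storage allocation table can be modeled as a simple in-memory map of sector states. The first-fit policy and the class below are illustrative assumptions, not how MS/DOS actually stores its table:

```python
# Toy model of a storage allocation table: one flag per sector, marking it
# in use or free. Real systems pack this into reserved sectors on the disk
# (e.g. the MS/DOS area mentioned above).

class AllocationTable:
    def __init__(self, total_sectors: int):
        self.in_use = [False] * total_sectors

    def allocate(self, count: int) -> list[int]:
        """Grab the first `count` free sectors (first-fit, for simplicity)."""
        free = [i for i, used in enumerate(self.in_use) if not used][:count]
        if len(free) < count:
            raise OSError("disk full")
        for i in free:
            self.in_use[i] = True
        return free

    def release(self, sectors: list[int]) -> None:
        for i in sectors:
            self.in_use[i] = False

table = AllocationTable(16)
a = table.allocate(5)
print(a)                     # [0, 1, 2, 3, 4]
table.release(a[:2])
print(table.allocate(3))     # freed sectors are reused first: [0, 1, 5]
```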

Major File System Design Issues

There are several major design issues one must deal with in designing a file system. They are:

  1. What file organizations will be supported?

    To what extent, if any, should the file system provide support for various file organizations? As noted above, this ranges from the approach of Unix - where a file is simply a stream of bytes which the user must interpret - to sophisticated systems like VAX/VMS RMS, which directly supports most of the standard file organizations used by application programmers. Since we have already discussed this, we will not pursue it further.

  2. How is the directory structure organized?

  3. What provisions (if any) are made for protecting files from unauthorized access or accidental damage?

  4. Space allocation: when a file is created, how is its space allocated?

    Related to this, how does the file system keep track of space that is currently unallocated and thus available for new files?

Directory Structures

  1. The most visible part of a file system is its directory structure.

    Since a typical volume contains many files, there must be some mechanism for naming and locating files.

  2. A directory is a special kind of file

    A directory is a special kind of file that is maintained by the file system. It contains file names and associated information about the file.

    1. File-naming conventions vary from system to system.

      1. MS/DOS

        A fairly common approach is to use a two-part name: a name and a type (or extension). For example, MS/DOS uses an 8-character name and a 3-character type, separated by a '.'. The type, by convention, denotes the kind of information in the file:

        • .COM or .EXE: a file containing an executable program (command)

        • .PAS: Pascal program

        • .BAT: a batch file to be executed by the command interpreter

        • .TXT: human readable text

        • etc.

      2. VMS

        Systems like VMS add to this two part name a version number, so that one can list several versions of the same file in the same directory.

      3. Unix

        In contrast, Unix supports neither file types nor version numbers in the operating system. One often sees Unix files with names like proj1.c but the period is part of the file name, not a separator between a file name and type, as far as the Unix file system is concerned. (It is meaningful to programs like the C compiler, but not to Unix per se.)

      4. Defaults to parts of file names

        To save the user the trouble of typing complete file names, most systems make use of various defaulting conventions, either at the operating system or application program level.

        • MS/DOS: to run a program, one simply types its name; MS/DOS assumes a file type of .COM or .EXE.

        • Most Pascal compilers assume a file type of .PAS if the user does not specify one.

        • On VMS, if a version number is not specified, the file system uses the highest numbered version if an existing file is being opened, or creates a new file with version number one higher than the highest existing number if new file creation is requested. One cannot force VMS to create a new file with a version number lower than that of an existing file of the same name and type. The major exception to this is the DELETE command, where the user must specify a version; or one can use PURGE to eliminate all but the highest numbered version.

      5. Wild Cards

        Most systems make some provision for wild-carded filenames. A wild-carded name is one that can potentially match many files; so it can only be used in contexts where this is meaningful (e.g. directory lookup, copying files to another location, deleting files etc.)

        • MS/DOS allows the use of ? in a filename to match any one character, and * to match any number of characters. For example

          DEMO?.TXT would match DEMO1.TXT, DEMO2.TXT, DEMO3.TXT but not DEMO.TXT, DEMO35.TXT

          DEMO*.TXT would match all of the above plus any other file with DEMO as the first four letters of its name and .TXT as its type.

          *.*, of course, matches any file

        • Typically, the wildcarded name is used in a call to the file system to find a matching actual name, and then a second file system call is done with the actual name to open the file.

        • Unix has a very powerful wildcard facility, but it is not implemented in the file system. Rather, it is part of the command interpreter (shell) which does the wildcard processing before starting the application program up. The shell then passes to the application a list of filenames that matched the wildcard.
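      Python's standard fnmatch module happens to implement ? and * with the semantics described above, so it can illustrate the matching (real MS/DOS wildcard handling differs in some corner cases, so treat this as an approximation):

```python
from fnmatch import fnmatch

# Illustration of ?-and-* wildcard matching against the example filenames
# from the text. fnmatch's ? matches exactly one character; * matches any
# run of characters, including none.

names = ["DEMO.TXT", "DEMO1.TXT", "DEMO2.TXT", "DEMO35.TXT", "README.TXT"]

print([n for n in names if fnmatch(n, "DEMO?.TXT")])
# DEMO1.TXT and DEMO2.TXT only: ? must match exactly one character
print([n for n in names if fnmatch(n, "DEMO*.TXT")])
# all four DEMO files, since * may also match nothing
```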

    2. Information stored in a directory entry

      Other information stored in the directory varies from system to system, but may include such things as:

      1. The file's actual location on the disk. (see below)

      2. Its overall size

      3. Protection information (see below).

      4. Information about its record structure and organization, if the file system deals with these matters.

      5. Its creation date and time.

      6. Date and time of last modification.

      7. Date and time of last backup.

    3. Actually, some systems store this "directory information" in two different places - a file header and the directory entry proper.

      1. When a disk is initialized, a region is set aside for file headers. These contain all the information needed to access a file except its name (and type and version if these are supported.)

      2. The actual directory entry for a file contains just its name (and type and version), plus a pointer to its header.

      3. This approach is used by both VMS and Unix.

        • On VMS if you do a DIRECTORY/FULL on a file, the display will include an entry called File ID. This is the pointer to the header for the file. All the rest of the information displayed is stored there.

        • On Unix, the file headers are called i-nodes. They exist as an array at the beginning of the disk structure. Directories point to a file by an index into the i-node array. That is, the entry for a file in a directory consists only of the file's name and an i-node number. (The -i option of the ls command will include the i-node number in the listing.)

      4. This approach does have a couple of weaknesses, of course:

        • An extra disk access is needed when opening the file to get at the header.

        • The number of headers on the disk is fixed when it is initialized. This means either that a situation can arise where a new file cannot be created because there are no more headers (even though there is space for its body), or else space is wasted on unused headers.

    4. However, it does have the advantage of facilitating sharing of files - a topic we will discuss further shortly.
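    A toy model (not any real on-disk format) makes both the convenience and the fixed-header weakness visible: directory entries hold only a name and a header index, i-node style, and creation fails once the header array is exhausted even if data space remains:

```python
# Toy model of the directory-entry / file-header split described above.
# The header array's size is fixed "at disk initialization"; the directory
# maps names to header indices, the way a Unix directory maps names to
# i-node numbers.

headers = [None] * 4          # header slots, fixed when the disk is initialized
directory = {}                # file name -> header index

def create(name: str, size: int) -> None:
    try:
        slot = headers.index(None)            # first free header
    except ValueError:
        raise OSError("out of file headers")  # fails even if data space remains
    headers[slot] = {"size": size, "links": 1}
    directory[name] = slot

create("notes.txt", 1024)
print(directory["notes.txt"], headers[directory["notes.txt"]])
```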

  3. Some systems store the number of users that have opened a file

    Some multiprogrammed systems also use the directory or file header to store a record of how many users currently have the file open. This is necessary because multiple users writing to the same file can produce inconsistent information unless their access is somehow interlocked. (However, some other systems keep this kind of information in main memory in an "open files" table.)

  4. Single directory

    The simplest sort of directory structure is a volume directory: a single directory describing all the files on a particular volume.

    1. Most file-structured tapes are configured this way.

    2. One-user microcomputer systems often take this approach. We will discuss CP/M as an example, because of its extreme simplicity. In CP/M track 2 of a floppy disk is the directory track. (Tracks 0, 1 are used to store CP/M itself. These tracks are not visible to the user; they do not appear in the directory.)

      1. The directory track contains a series of FCB's (file control blocks.) On a single density diskette, there are normally 64 of these; on a double density diskette, 128.

      2. A given file is represented in the directory by one or more FCB's. Each FCB is laid out as follows:

        Bytes    Description
        0        drive flag - 0 if the FCB represents an existing file, E5 (hex) if it is an erased file. (Erased files are invisible to the user, and the FCB is subject to re-use when a new file is created.)
        1-8      filename - padded on the right with spaces if need be
        9-11     filetype - padded on the right with spaces if need be
        12       extent number - to be discussed below
        13-14    information used to partition the directory - see below
        15       record count (number of sectors in the file or file portion addressed by this FCB)
        16-31    pointers to the actual storage allocation (see below)

  5. On a multi-user system, a single volume directory is clearly not satisfactory, since all files of all users are visible to everyone. This structure is even unsatisfactory on many one-user systems, particularly if they are hard disk based. Therefore, most file systems provide some sort of multiple-directory facility.

    1. The simplest such facility is that provided by CP/M.

      1. The common directory is allowed to support up to 16 distinct users, designated user 0 .. user 15. In addition, there are system files which are accessible to all users, but which don't show up in an ordinary directory listing (they must be requested by the DIR/S command.)

      2. By default, a user is user 0. But he can change his user number by using the CP/M utility USER.

      3. When a file is created, the user number in force is placed into the FCB in bytes 13..14 (in a way not publicly documented.)

      4. The CP/M BDOS only "sees" files belonging to the current user, plus certain files belonging to user 0 which have the attribute SYS set (i.e. those that support CP/M commands such as PIP.)

      5. This approach is of limited utility.

    2. A more sophisticated system uses two levels of directories:

      1. A master file directory (MFD), often not visible to users, contains entries of the form:

        UserName    Pointer to UFD

        where UserName is the name by which a user logs into the system and UFD is a user file directory.

      2. Each user has his own UFD, containing his files. When a user requests access to a file, the file system by default looks in the UFD corresponding to his username. (The pointer to this UFD is normally stored in a special memory location when the user logs in to avoid unnecessary MFD accesses.)

      3. Provision is usually made for the user to access files in other UFD's, protection permitting.

      4. Example: RSTS/E

        • Usernames are pairs of numbers - e.g. 100,120.

        • Access to another UFD can be had by prefixing the UFD specifier enclosed in [].

          if user 100,200 requests access to file DEMO.TXT, the file system searches his UFD - [100,200]

          if he wants to access a file called DEMO.TXT in UFD [100,130], he may specify [100,130]DEMO.TXT

        • Certain system UFD's are accessible in special shorthand ways - e.g. $ refers to a system library UFD that contains most of the system executable programs. (This happens to be [1,2]). So, to run the system utility SYSTAT, a user may type

          	RUN [1,2]SYSTAT
          

          or

          	RUN $SYSTAT

    3. Even more sophisticated systems use a tree of directories:

      1. The root of the tree is an MFD, as in the two-level scheme.

      2. Each user has a default directory, which may be implemented as a file in the MFD. He in turn may create subdirectories, which are implemented as files in his default directory. In many cases, the subdirectories may contain sub-sub-directory files.

      3. We said above that the user default directories may be implemented as files in the MFD. If there are many users, it may be expedient to introduce another level of the tree before the user directories. That is, the MFD may contain group directories, each of which contains the actual user directories.

      4. As with the two-level scheme, the system manager establishes a default login directory for each user. Any plain file specification the user issues results in the file system searching his default directory.

      5. Access to other files (protection permitting) is by specifying a path name. This consists of a series of directory names, beginning either at the current default or at the root of the system, and ending with the desired sub-directory.

        • A path beginning at the root directory is called a complete (or absolute) path.

        • A path beginning at the current default directory is called a relative path.

      6. As a convenience, most multi-level systems allow a user to change his default directory to minimize typing. (VMS has the SET DEFAULT command and Unix has the cd command).
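The distinction between complete and relative paths can be sketched with Python's posixpath module (the resolve helper and the directory names are ours, for illustration only):

```python
import posixpath

def resolve(path, default_dir):
    """Resolve a path the way a tree-structured file system might:
    a complete path (starting at the root '/') is used as-is, while a
    relative path is interpreted from the user's default directory."""
    if path.startswith("/"):           # complete path: begins at the root
        full = path
    else:                              # relative path: begins at the default
        full = posixpath.join(default_dir, path)
    return posixpath.normpath(full)    # collapse any '.' or '..' components

# A user whose default directory is /users/smith:
print(resolve("proj/notes.txt", "/users/smith"))   # /users/smith/proj/notes.txt
print(resolve("/lib/helper.txt", "/users/smith"))  # /lib/helper.txt
```

Changing the default directory (cd or SET DEFAULT) simply changes the second argument used for every subsequent plain file specification.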

  • Shared files and subdirectories

    Another capability that can be supplied by a multi-level directory system is sharing of files and/or subdirectories.

    1. In our discussion of multi-level directories, we have used a tree structure: each file or directory has exactly one parent. Thus, the path name to any given file is unique.

    2. It is frequently the case that teams of users working together on a common project need to share one or more files - perhaps a whole subdirectory of files. One approach to allowing this is to allow a file or a subdirectory to have more than one parent. This, however, raises some implementation questions:

      1. What is stored in the directory entries that point to the shared file?

        • One approach is to store complete directory information in each. This is problematic due to space wastage, plus the update problems that always arise with redundant data. For example, if one user extends the file, all the directory entries that point to it must be changed.

        • Alternately, the directory in which the file is first created might contain the full information. Other directories would contain links, in the form of pathnames.

          This avoids the redundancy problem, at the expense of more overhead whenever a user other than the original creator accesses the file.

        • If separate file headers are used (as on VMS and Unix), then sharing files becomes fairly easy. As many directory entries as desired can point to the same file header, where most of the information about the file is stored.

          Note: Though the VMS documentation does not say much about file sharing, its file system structure supports it very cleanly. [However, the HELP text says use of this facility is discouraged - largely due to the way that deletion of shared files is handled. More on this later.]

        • Unix is interesting in that it actually offers two mechanisms for sharing files.

          • Several directory entries can point to the same i-node. We call such entries hard links to the file.

          • A directory may contain a special file that contains the path (absolute or relative) to another file. This is called a symbolic link or a soft link. (Symbolic links were introduced by BSD Unix).

          • Unix restricts hard links: the link and the file it refers to must reside on the same file system (volume), and hard links may not point to directories. Symbolic links are free of both restrictions.
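On a Unix system, both mechanisms can be observed directly from Python (a small demonstration; the file names are arbitrary):

```python
import os, tempfile

# Work in a scratch directory so the demo is self-contained.
d = tempfile.mkdtemp()
orig = os.path.join(d, "data.txt")
with open(orig, "w") as f:
    f.write("hello\n")

hard = os.path.join(d, "data_hard")
soft = os.path.join(d, "data_soft")
os.link(orig, hard)      # hard link: a second directory entry, same i-node
os.symlink(orig, soft)   # symbolic link: a small file holding a path

# Both directory entries refer to the same i-node...
print(os.stat(orig).st_ino == os.stat(hard).st_ino)    # True
# ...and that i-node's link count is now 2.
print(os.stat(orig).st_nlink)                          # 2
# The symbolic link is a distinct file that merely stores a path.
print(os.lstat(soft).st_ino != os.stat(orig).st_ino)   # True
print(os.readlink(soft))                               # the stored path
```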

      2. What happens when a file is deleted?

        • It can be very cumbersome to delete all the references to a shared file.

          Either the file itself must contain back pointers to all the directories referencing it, or else a search of the entire file system for other references must be made when it is deleted. Neither is very workable. Therefore, it is more common to say that if a shared file is deleted, only the path that was actually followed in getting to it is updated. Other pointers to the file are left dangling.

        • Dangling pointers can cause severe problems

          If file sharing is implemented by duplicate directory entries, dangling pointers can cause severe problems. When the file system deletes the file, it reclaims the space for later allocation to another file. This leaves the possibility of dangling references seeming to point into the middle of an altogether unrelated file.

        • If file sharing is implemented by links, dangling references are a less serious problem.

          • If the deletion that was done was through a path that involved a link, then the original file is not really deleted; all that is removed is one path to it.

          • If the deletion that was done was through the original path, then either:

            1. The file is actually deleted. The dangling references are all links, which are path names that correspond to a non-existent file and will be flagged as such when they are used (unless the original creator of the shared file creates a new file of the same name in the same directory, in which case it will look like the original file to the shared users. This may or may not be a problem.)

            2. The file is not deleted until all links to it are deleted. This would require that some link be converted to a regular directory entry, with all other links modified to the new path name. This would be very difficult (since we would have to find all the links), so the previous alternative is to be preferred.

        • When separate file headers are used, the problem becomes less severe.

          For example, Unix handles deletion of shared files this way: one entry in each i-node is a reference count that counts the number of outstanding hard links to the file. When a delete operation on one of these links is done, the reference count is decremented - but the file is not actually deleted until the count goes to 0. This approach is very clean - but suffers from the limitation that if the owner of a file deletes it while someone else has a link to it, the file remains under his ownership and counting against his disk quota, but with no way for him to get at it!
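The reference-count behavior can be demonstrated directly (a small Unix-only sketch; the file names are arbitrary):

```python
import os, tempfile

d = tempfile.mkdtemp()
a = os.path.join(d, "report")
b = os.path.join(d, "report_link")
with open(a, "w") as f:
    f.write("kept alive by the second link\n")
os.link(a, b)                    # i-node reference count is now 2

os.unlink(a)                     # "delete" via the first path...
print(os.path.exists(b))         # True - only the count was decremented
print(os.stat(b).st_nlink)       # 1
with open(b) as f:               # the data is intact via the remaining link
    content = f.read()
print(content, end="")
os.unlink(b)                     # count reaches 0: now the space is freed
```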

          Though VMS's file headers are similar in principle to Unix's i-nodes, VMS does not use reference counts for shared files. Instead, VMS handles deletion of shared files as follows:

          • Any delete over any path deletes the file and returns its header to the common pool. Other references are left dangling, pointing to the original header. Since the header can be re-used to create a new file later, this would seem to pose severe problems.

          • However, associated with each header is a counter that is incremented each time the header is allocated to a new file. The link in each directory contains both the location of the header and the value of the counter. If, when following the link, the counter in the header does not match, then it is known that the file in question does not exist.

          • This mechanism also supports temporary references to file headers in main memory outside the directory structure - as when a file is queued for printing. (The header number is put in the queue.) A reference might be established, but the directory entry - and thus the file - might be deleted before the file is used. This is detected when the file reaches the front of the queue and an attempt is made to access it.
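The counter scheme can be modeled in a few lines (a toy simulation of the idea, not actual VMS structures; all names here are ours):

```python
# Each header slot carries a generation counter, and every directory link
# records both the header number and the counter value it saw at creation.

class HeaderPool:
    def __init__(self, n):
        self.seq = [0] * n        # generation counter per header slot
        self.data = [None] * n    # file data held by the slot (None = free)

    def create(self, slot, contents):
        self.seq[slot] += 1       # bump the counter on every reallocation
        self.data[slot] = contents
        return (slot, self.seq[slot])   # this pair is what a link stores

    def delete(self, slot):
        self.data[slot] = None    # header returns to the common pool

    def follow(self, link):
        slot, seq = link
        if self.data[slot] is None or self.seq[slot] != seq:
            return None           # stale link: the file no longer exists
        return self.data[slot]

pool = HeaderPool(4)
link = pool.create(2, "old file")
pool.delete(2)                        # file deleted; link now dangles
reused = pool.create(2, "new file")   # header 2 reused for a new file
print(pool.follow(link))              # None - counter mismatch is detected
print(pool.follow(reused))            # new file
```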

  • Protection

    1. A file system used on a machine shared by multiple users must provide some means of protecting files from unauthorized access.

    2. Access Control Lists (ACL)

      The most common approach is to associate with the file a list of users authorized to access it, together with their access rights:

      1. Read
      2. Write
      3. Execute
      4. Delete
      5. Control

    3. Access Control Lists can be cumbersome; use classes of users

      The most general scheme would be to have an entry on the access control list for each individual authorized user: but this would be cumbersome both to store and to search, especially for "public" files. A more common approach is to partition all users into three or more classes:

      1. The file's owner
      2. The owner's group
      3. The rest of the world (i.e. all other users)

      Access rights can be established separately for each group, using perhaps 6-12 bits: 1 bit per class per right. (Note that read and execute are often combined, and write and delete are often combined.)

    4. In Unix, when you request a long-format directory listing with ls -l, the protection settings for each file and subdirectory appear at the left of each line. A hyphen indicates that a particular permission is not set; a letter corresponding to the permission indicates that it is set.

      The first character is usually either a hyphen or a "d", indicating that the corresponding entry is a file or a subdirectory, respectively (an "l" marks a symbolic link). The next three characters correspond to the user's own permissions. The three following that specify the group's permissions and the last three show the general world permissions:

         user   group  world
         -----  -----  -----
      -  r w x  r w x  r w x
      

      Consider the following output from the "ls -li" command:

      2098301 -rwxr-xr-x    2 senning  user       12488 Feb 12 16:09 pipe
      2098297 -rw-r--r--    1 senning  user         937 Feb 12 16:08 pipe.c
      2098301 -rwxr-xr-x    2 senning  user       12488 Feb 12 16:09 pipe2
      2098298 lrwxr-xr-x    1 senning  user           4 Apr  5 22:54 pipe3 -> pipe
      7705094 drwxr-xr-x    2 senning  user        4096 Feb 26 23:28 sem
      3568549 drwxr-xr-x    2 senning  user          99 Feb 19 09:00 shm
      7232144 drwxr-xr-x    2 senning  user          63 Mar  4 16:03 socket
      

      In this example, pipe is an executable file and it is readable, writable and executable by the owner (senning). It is readable and executable by anyone in the group user and also by anyone else on the system. The file pipe.c is only readable and writable by the owner. The file pipe2 is a hard link to pipe: as far as Unix is concerned pipe and pipe2 are equivalent. Notice that these two files have the same i-node and that the number just after the protection bits is a 2, indicating that there are two links to this file. The file pipe3 is a symbolic link to pipe. The three remaining entries are subdirectories.
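The nine permission bits can be decoded mechanically. A minimal sketch (the helper name rwx is ours; Python's standard stat.filemode performs the full conversion, including the leading type character):

```python
import stat

def rwx(mode):
    """Render the low nine permission bits in ls style."""
    out = []
    for shift in (6, 3, 0):                 # user, group, world
        bits = (mode >> shift) & 0o7
        out.append("r" if bits & 4 else "-")
        out.append("w" if bits & 2 else "-")
        out.append("x" if bits & 1 else "-")
    return "".join(out)

print(rwx(0o755))   # rwxr-xr-x  - like 'pipe' in the listing above
print(rwx(0o644))   # rw-r--r--  - like 'pipe.c'
print(stat.filemode(stat.S_IFREG | 0o755))  # -rwxr-xr-x
```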

    5. Access control lists revisited...

      Some systems combine a mechanism like this with access control lists. For example, on VMS 4.0 and later, a file may be given an access control list that grants or denies certain access rights to certain specified users. If a file has an ACL (and most won't), the ACL is checked first:

      1. If the ACL grants the access, the access is granted without further checking.

      2. If the ACL denies the access, the access is denied without further checking unless the user has certain privileges.

      3. The regular protection code is checked only if the ACL does not mention the user, or if the user was denied access but has certain privileges.

    6. Where is the access-control information stored?

      1. If all directory information is in the directory per se, then access control information must be there too.

      2. If the file has a separate header, then access control information can be stored either in the directory entry pointing to the header (in which case different users can have different accesses) or in the file header. The latter approach is more secure, since the user in some sense "owns" his directory.

    Space allocation

    1. When a file is created or expanded...

      the file system must allocate space for it from a pool of previously unused space on the disk, must set a pointer or pointers to that space in the directory entry for the file, and must remove the space from its list of available space.

    2. Three basic allocation strategies:

      1. Contiguous Allocation

        The entire allocation for a given file is a series of adjacent sectors.

        1. Conceptually the simplest scheme.

        2. Easy to keep track of where the file is. All one needs to know is where the first sector is, and how big the file is. For example, a file of 1000 sectors whose first sector is number 1379 will occupy sectors 1379, 1380, ..., 2378. We will refer to these as physical sector numbers.

        3. Easy to access the file either sequentially or directly

          • Sequential access: a pointer is initialized to the number of the first sector assigned to the file. As each sector is accessed, the pointer is incremented.

          • Direct access: An application program can view each file as an array of sectors numbered 1 .. size. We will refer to these as virtual sector numbers. In our example above, if the application program issues a request for virtual sector number 3, then the file system will access physical sector number 1381. (1379 + (3-1)).

        4. Processing the entire file, or large portions of it, is very efficient.

          Head movement between accesses is minimized. (This is of lesser consequence on a heavily multiprogrammed system, however, since other users may move the head in between accesses.)

        5. But this scheme requires that all of the space that any file will ever need must be allocated when it is first created.

          This is because once a file is created, it can only grow into adjacent space, and this may well be already allocated to another file. This can be solved by moving the entire file to a larger space (assuming one is available) but that is quite costly, and will not always work.

        6. Eventually the disk can have a large amount of noncontiguous space.

          Further, after a lot of file allocation/deallocation activity on a disk, the disk tends to become checkerboarded. When a request to create a new file comes in, if no one chunk of contiguous space is big enough then either the request must be denied or the disk must be compacted.

        7. Example of a contiguous allocation system: OS/8 for DEC PDP-8's:

          • Disk drive used is a floppy disk, with a single directory.

          • When a file creation request comes in, OS/8 allocates the largest available chunk of contiguous free space to it, and allows the file to grow into the space by successive write operations. When the file is closed, any space left over is put back in the available space pool.

          • System SQUISH command provided for compacting the disk - all files moved toward the front of the disk and all free space collected into a single large chunk at the rear. (Risky command, protected by 'Are you sure' and 'Are you really sure' prompts).

        8. Use of contiguous allocation as the principal strategy is rare except in cases where a single volume is dedicated to a small number of large files of predictable size. Many operating systems allow a file creator to request contiguous allocation, however, with the OS honoring the request if possible.
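The virtual-to-physical mapping that makes contiguous allocation so simple can be sketched in a few lines (the function name is ours):

```python
def physical_sector(start, size, virtual):
    """Map a virtual sector number (1..size, as in the notes) to a
    physical sector number under contiguous allocation."""
    if not 1 <= virtual <= size:
        raise ValueError("virtual sector out of range")
    return start + (virtual - 1)

# The example from the notes: a 1000-sector file starting at sector 1379.
print(physical_sector(1379, 1000, 1))     # 1379
print(physical_sector(1379, 1000, 3))     # 1381
print(physical_sector(1379, 1000, 1000))  # 2378
```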

      2. Linked allocation

        The file is allocated space one sector or cluster at a time. The directory maintains a pointer to the first sector/cluster, and each sector/cluster contains a pointer to its successor. The last sector/cluster contains a special pointer value (typically 0).

        1. This approach is also conceptually simple.

          It does allow the file to grow to any size, provided that there is any free space at all on the disk.

        2. However, it is only practical for sequential-access files.

          Direct access to a record near the end of the file might require hundreds of disk accesses! (Which would hardly be direct access performance at all.)

        3. It is dangerous

          If a link field is corrupted in one sector, the entire file structure could be corrupted.

        4. This strategy may be an option in a given system for sequential files.
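The cost of direct access under linked allocation can be seen in a small simulation (a sketch; here 'disk' maps a sector number to a (next, data) pair, with 0 marking the end of the chain):

```python
def read_virtual(disk, first, n):
    """Read virtual sector n (1-based) by following the chain from the
    first sector - n-1 extra disk accesses for a direct access."""
    sector, hops = first, 0
    for _ in range(n - 1):
        sector = disk[sector][0]   # follow the link field
        hops += 1
        if sector == 0:
            raise IndexError("past end of file")
    return disk[sector][1], hops

# A 4-sector file scattered across the disk: 17 -> 85 -> 3 -> 42.
disk = {17: (85, "A"), 85: (3, "B"), 3: (42, "C"), 42: (0, "D")}
print(read_virtual(disk, 17, 1))   # ('A', 0)
print(read_virtual(disk, 17, 4))   # ('D', 3) - three extra accesses
```

For a file hundreds of sectors long, reaching a record near the end in this way requires hundreds of accesses, which is why the scheme suits only sequential files.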

      3. Indexed allocation

        This should be sharply distinguished from indexed files.

        1. File system maintains pointers to each sector/cluster of the file.

          The file is allocated as many sectors/clusters as it needs, and the file system maintains a list of pointers to each one. (This list is generally in the directory or file header, but if large may be stored in one or more sectors of the file itself.) This index is organized as an array, so that finding logical sector/cluster n involves following the pointer stored in index[n].

        2. This organization supports efficient sequential and direct access, but is slightly more complex to implement.

        3. The most difficult issue is the size of the index block.

          • Ideally, it should be just big enough to hold pointers to all of the data sectors/clusters of the file. But setting it up this way would require us to know the size of the file in advance, when we create it.

          • More typically, we use index blocks of some fixed, reasonable size. If a file is large, then it may have several index blocks. These blocks can be pointed to by a higher-level index block, or can be linked together. The former adds the possibility of an additional disk access for each operation on the file (to read the correct index block). The latter suffers from the performance problems of straight linked allocation, but to a lesser degree.

          • Unix uses a combination of approaches. A single index block (the i-node) contains 15 pointers. The first 12 of these point directly to data blocks, which is sufficient for many small files, especially if a large block size is used. The next pointer points to a full-size index block containing pointers to data blocks. For really large files, the 14th pointer points to a double index block - one that contains pointers to index blocks which contain pointers to data. The last pointer is reserved for a triple index block (not needed with current implementations.) Thus, small files can be accessed rapidly, while large files incur some additional (and unavoidable) delay for the extra disk accesses.

          • Generally, the size of the index block is such that we can keep a copy of it in main memory at all times. If the file has just one index block, then all operations on the file can be done with just one disk access.

          • If the file has multiple index blocks, then we can keep the top-level index block and the current second-level index block in memory at all times. This means that sequential access and direct access within a limited region of the file can be done with no time penalty; but truly random access will involve two disk accesses per operation.
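The multi-level index arithmetic of the Unix scheme can be sketched as follows (the pointer count P = 256 is an assumption, corresponding to, say, 1K blocks and 4-byte pointers):

```python
def locate(block, direct=12, P=256):
    """Classify a logical block number under a Unix-style i-node:
    12 direct pointers, then single, double, and triple indirect blocks.
    Returns the level plus the index (or indices) to follow."""
    if block < direct:
        return ("direct", block)
    block -= direct
    if block < P:
        return ("single indirect", block)
    block -= P
    if block < P * P:
        return ("double indirect", block // P, block % P)
    block -= P * P
    return ("triple indirect",
            block // (P * P), (block // P) % P, block % P)

print(locate(5))      # ('direct', 5)
print(locate(100))    # ('single indirect', 88)
print(locate(5000))   # ('double indirect', 18, 124)
```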

        4. An example of a system using indexed allocation: CP/M

          • We mentioned earlier that a CP/M sector is 128 bytes. However, in original CP/M, storage is allocated in groups of 8 sectors, or 1K. This means that the storage allocated to a given file is always rounded up to the next highest 1K - e.g. a 1 byte file requires 1K, as does a 1024 byte file; a 1025 byte file requires 2K.

          • On the systems on which CP/M was first used, disk capacities were typically around 100K. Thus, the entire space on the disk could be divided into 1K groups numbered 1 .. 100 or so. A pointer to a group then requires only a single byte. (When higher capacity disks came along, this posed a problem since they might have more than 255K. The solution is to allocate storage in 2K groups, which is the practice of CP/M systems using higher capacity floppies. Hard disk is another matter which we won't go into.)

          • The directory entry for each file contains an array of 16 bytes, which serves as an index block to up to 16 1K (or 2K) groups. For smaller files, some of these entries are unused and contain zero. A separate field in the directory entry indicates the total number of sectors in use for the file; this can be used to determine how many sectors of the last group allocated are actually in use. A copy of this directory entry is kept in memory for each open file.

          • What if a file requires more than 16 groups (16 K or 32K)? The CP/M solution is to allocate another directory entry to the file. Each directory entry allocated is called an extent, and contains an extent number. Thus, a file requiring 40 groups (40K or 80K) would have three extents - each occupying a directory slot - containing extent numbers 0, 1, and 2.

          • When CP/M looks up a file in the directory, it searches for a match not only on file name and type, but also for an extent number of zero, so as to get the first extent. If the file is processed sequentially, it eventually becomes necessary to fetch a new block from the directory, so CP/M searches for the correct name and extension, but an extent number of 1. Of course, direct access operations can require two reads: one to fetch the correct extent from the directory, if the access is outside the current one, and one to fetch the data. Of course, commands like DIR are smart enough to only display information on a given file once, by only considering extents with a number of zero.

          • Where do these directory entries themselves come from? CP/M reserves one track on each diskette (track 2) for the directory. On a single-density floppy, this typically allows for 64 entries: 64 files up to 16K each, or 1 file up to 1 Meg (obviously impossible) or any combination. On double density diskettes, the number of available slots is typically 128. Unused slots on the directory track are flagged by a special code in the first byte, and can be grabbed by CP/M whenever needed.
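The extent arithmetic can be sketched in a couple of lines (a toy model; the helper name is ours):

```python
GROUPS_PER_EXTENT = 16   # each CP/M directory entry indexes 16 groups

def extent_of(group):
    """Which extent number holds a given group of the file, and in
    which slot of that extent's 16-byte index it appears."""
    return group // GROUPS_PER_EXTENT, group % GROUPS_PER_EXTENT

# A 40-group file spans extents 0, 1 and 2, as in the notes:
print(extent_of(0))    # (0, 0)
print(extent_of(16))   # (1, 0)
print(extent_of(39))   # (2, 7)
```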

    3. Free space control

      1. We have talked a lot about how the file system keeps track of the space allocated to each existing file. An important related question is how does it keep track of space that is not allocated?

      2. One approach is to keep all unallocated space in one or more dummy files that the file system chops up as needed. This is the approach used by OS/8. These files appear in the directory, each occupying a contiguous region on the disk. Though ordinary directory listings do not show them, a special switch on the directory command will print out the empty files as well so that the user can see exactly how fragmented the disk is.

      3. Alternately, one could keep a linked list of contiguous portions of free storage. This would be workable for a system that allocated all disk storage in fixed-size units without regard to contiguity of units within a file (i.e. one based solely on linked allocation), but if the entries are of differing sizes then the overhead of traversing the free list to find an appropriate-size entry would be excessive.

      4. The most common approach is to use a bit map. This is an array of bits, each corresponding to one allocation unit. A value of 0 means the allocation unit is free, and 1 means it is in use. Of course, the bit map must be kept on the disk, but a copy is usually retained in main memory. To keep overhead down, as disk space is allocated the main memory copy alone is updated. Periodically, and when the disk is dismounted, the bit map is written back out to disk. This can pose a problem if the system crashes before the disk copy of the bit map is updated. In this case, it becomes necessary to traverse the directory structure and reconstruct the bit map when the system comes back up. (Note that any allocated block must be pointed to by some directory entry; thus, the information in the bitmap can always be reliably reconstructed.)
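A minimal bit-map allocator can be sketched as follows (an illustrative in-memory model; a real file system would keep the map on disk and cache it, as described above):

```python
class BitMap:
    """Bit i is 1 when allocation unit i is in use, 0 when it is free."""
    def __init__(self, units):
        self.bits = bytearray((units + 7) // 8)
        self.units = units

    def _test(self, i):
        return (self.bits[i // 8] >> (i % 8)) & 1

    def allocate(self):
        for i in range(self.units):           # first-fit scan of the map
            if not self._test(i):
                self.bits[i // 8] |= 1 << (i % 8)
                return i
        return None                           # disk full

    def free(self, i):
        self.bits[i // 8] &= ~(1 << (i % 8))

bm = BitMap(16)
a = bm.allocate(); b = bm.allocate()
print(a, b)           # 0 1
bm.free(a)
print(bm.allocate())  # 0 - the freed unit is found and reused
```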


    $Id: file_systems.html,v 1.4 2000/03/27 01:27:23 senning Exp $

    These notes were written by R. Bjork of Gordon College. They were edited, revised and converted to HTML by J. Senning of Gordon College in April 1998.