CS422/522 Lab 5: File Systems

due 2017-11-28


Introduction

In this lab, we will add the file system support to mCertiKOS. We have already implemented an IDE disk driver and some parts of the file system for you. You will extend the mCertiKOS scheduler with the ability to make threads sleep and wake threads up; this is necessary for the implementation of a more efficient file system. Then you will be asked to implement various parts of the file system across multiple abstraction layers.

* In this lab, you will be reading many file system related code we have (partially) implemented for you. Please START EARLY.

Getting started

In this and future labs you will progressively build up your kernel. We will also provide you with some additional source. To fetch that source, use Git to commit changes you've made since handing in lab 4 (if any), fetch the latest version of the course repository. Then create a local branch called lab5 based on our lab5 branch, origin/lab5. Note that for this assignment, you SHOULD NOT merge the lab5 branch with your previous lab4 branch. We have provided all the code you would need to get started.

$ cd ~/cpsc422/mcertikos
$ /c/cs422/apps/syncrepo.sh lab5
lab5 not set up yet, fetching new branch from release
remote: Counting objects: 105, done.
remote: Compressing objects: 100% (104/104), done.
remote: Total 105 (delta 55), reused 1 (delta 0)
Receiving objects: 100% (105/105), 39.69 KiB | 0 bytes/s, done.
Resolving deltas: 100% (55/55), completed with 46 local objects.
From /c/cs422/repo/mcertikos
8edaa5e..4b96a41  lab4       -> release/lab4
* [new branch]      lab5       -> release/lab5
create new branch lab5
Switched to a new branch 'lab5'
set up push path
Counting objects: 105, done.
Delta compression using up to 20 threads.
Compressing objects: 100% (49/49), done.
Writing objects: 100% (105/105), 39.69 KiB | 0 bytes/s, done.
Total 105 (delta 55), reused 105 (delta 55)
To /c/cs422/SUBMIT/lab/netid.git/
* [new branch]      lab5 -> lab5
Branch lab5 set up to track remote branch lab5 from origin.
$ git pull
Already up-to-date    

Hand-In Procedure

Include the following information in the file called README in the mcertikos directory: who you have worked with; whether you coded this assignment together, and if not, who worked on which part; and brief description of what you have implemented; and anything else you would like us to know. When you are ready to hand in your lab code, add the new files to git, commit your changes, and then run git push in the mcertikos.

$ git commit -am "finished lab5"
[lab5 a823de9] finished lab5
 4 files changed, 87 insertions(+), 10 deletions(-)
$ git push

File System Review

Exercise 1

The implementation of the file system in mCertiKOS is based on the file system in xv6. Read the Chapter 5, "File system", of the xv6 book carefully to get the overall idea of the file system implementation in xv6.

Test The File System

In user/fstest/fstest.c, we have implemented a user process that tests various aspects of the file system. Once you have implemented required modules in this lab, you should expect to see the following test outputs from the fstest program (some tests are big and may run a little bit slowly):

*As in the previous assignments, the provided tests will not pass until you implement required modules.

*******usertests starting*******

=====test file usertests.ran does not exists=====
=====test file usertests.ran does not exists: ok

=====small file test=====
creat small succeeded; ok, fd: 0
writes ok
open small succeeded ok
read succeeded ok
=====small file test ok=====

=====big files test=====
=====big files ok=====

=====many creates, followed by unlink test=====
=====many creates, followed by unlink; ok=====

=====rmdot test=====
=====rmdot ok=====

=====fourteen test=====
=====fourteen ok=====

=====bigfile test=====
=====bigfile test ok=====

=====subdir test=====
=====subdir ok=====

=====linktest=====
=====linktest ok=====

=====unlinkread test=====
=====unlinkread ok=====

=====dir vs file=====
=====dir vs file OK=====

=====empty file name=====
=====empty file name OK=====

=====bigdir test=====
=====bigdir ok=====

*******end of tests*******

To manipulate the files, we need a disk to store the files and directories. We cannot directly use a linux disk partition because linux and mCertiKOS implement different disk and file system structures. Instead, we have created a disk image newfs/certikos_disk_new.img which can be fed into the qemu as an additional disk. DO NOT modify this file, since you will repeatedly use it as the empty disk. That image file is already structured in a way that implements the mCertiKOS file system structures and represents a disk with no files in it. To use the disk, you just need to copy the file into the project root directory and rename it to certikos_disk.img. The line 125 of the Makefile already does this job for you whenever you run make qemu or make qemu-nox. This means whenever you starts mCertiKOS, you will already start with a disk with no files. This does not mean that our file system is not persistent. When you exit from qemu, all of your new changes are already saved into certikos_disk.img. We just manually overwrite it with the empty one so that the above test programs can pass on this empty disk. Once you are done with this lab, feel free to comment out the line in the Makefile and play with the persistent disk.


Part 1: Scheduler Support: Sleep and Wakeup

Before we dive into the disk driver and file system implementation, we first have to add more scheduler support to mCertiKOS. We will implement two functions thread_sleep and thread_wakeup that allow a thread to wait on some resources by temporarily going into sleep, which can be waken up later when the corresponding resource is ready. For example, disk writes are asynchronous. When you write some blocks into disk, even though the function immediately returns, the disk requires some time to finish the operation. One naive implementation of a disk write would be to simply busy-wait until the operation isuering complete, by repeatedly querying the disk to see whether the operation is complete. As you can imagine, the downside of this approach is the performance. You basically waste many CPU cycles by doing this unnecessary busy waiting. Instead, mCertiKOS takes the interrupt driven approach. When mCertiKOS finds that the operation is not finished, it will put the current thread into sleep. We configure the hard disk so that it will issue a special device hardware interrupt when it finishes the disk write operation. Then in the corresponding interrupt handler in mCertiKOS, we wake up all the threads that were waiting upon this signal. This brings us better performance as now the scheduler can schedule and run another thread while that particular thread is in sleep.

On the other hand, this synchronization mechanism can be used in many different scenarios. Thus we need to record the particular reason a thread is put into sleep so that later we only wake up the relevant threads. The object that implements this kind of connections is called a channel. Each channel should be uniquely identifiable by an identifier. In mCertiKOS, we utilize the fact that each variable in the kernel has a unique physical address. Thus, you can allow threads to sleep on address of arbitrary objects in the kernel. If you checkout the new TCB structure in kern/PTCBIntro/PTCBIntro.c, it now has a new attribute called channel indicating which address your thread is currently sleeping on. We use the value 0 to indicate that the thread is not waiting on any object.

In mCertiKOS, we use locks to protect the critical sections in the kernel. When you put the current thread into sleep, you have to release the relevant locks so that other thread can enter the critical sections. However, you cannot release it before you call the sleep. This will allow other threads to enter the critical section even before you execute sleep, which is not what we want. Instead, we should first acquire the scheduler lock before we release the other lock to make sure no other threads can enter the scheduler and they can only be able to access the critical section after I go to sleep. This means that the lock release operation has to be performed within thread_sleep. This is why the thread_sleep in kern/thread/PThread/PThread.c takes a lock as the second argument.

Exercise 2

Based on the information above, implement thread_sleep, and thread_wakeup in kern/thread/PThread/PThread.c. In thread_sleep you need to make sure the following:

  • The lock is released only after you acquired the scheduler lock.
  • The scheduler lock is released before you switch to another thread and reacquired after the switch returns.
  • The scheduler lock is released and the lock is reacquired before you leave the function. (how should they be ordered?)

Exercise 3

As described above, when mCertiKOS receives the interrupt signal (IRQ_IDE1) from the disk, we should notify the disk driver to wake up all the threads waiting for this signal. The corresponding interrupt handler ide_intr is implemented in kern/dev/disk/ide.c. Add another case in the interrupt handler to handle this disk interrupt by calling ide_intr. After that, you should also call intr_eoi to send the "end of interrupt" signal back to the disk so that it knows we have already received and handled the interrupt. Unfortunately, the disk sometimes also generates additional spurious interrupt signals (IRQ_IDE2). If we do not handle this signal, based on the implementation of the current interrupt handler, it will go to the default interrupt handler which immediately calls intr_eoi and returns. This may mislead the disk to believe that we have already handled the disk interrupt signals correctly. Add another case to handle this spurious interrupts by simply returning without calling intr_eoi.


Part 2: Disk Driver

Before we add the file system support, we need a disk driver to manipulate the disk. We have provided a simple IDE disk driver in kern/dev/disk/ide.c.

Exercise 4

Skim through the code to try to get the general idea in the implementation. You will find the "Drivers" section on page 45 of the xv6 book helpful.


Part 3: File System Support

The file system in mCertiKOS (kern/fs/) is implementated in eight layers. We will examine the layers one by one, filling in some missing implementations.

The Buffer Cache Layer

A buffer cache has two jobs:

  • synchronize access to disk blocks to ensure that only one copy of a block is in memory and that only one kernel thread at a time uses that copy, and
  • cache popular blocks so that they don’t have to be re-read from the (slow) disk.

Files: bufcache.h, bufcache.c.

Dependent files:

  • params.h -- file system parameters.
  • kern/lib/buf.h -- definition of buffer data structure.
  • kern/dev/disk/ide.h -- IDE disk driver, which reads and writes blocks on an IDE hard drive.

Exercise 5

Skim through the code to try to get the general idea in the implementation. It is OK if you do not fully understand all the details.


The Block Layer

This layer allocates raw disk blocks. The file system divides the disk blocks into several sections. Block 0 holds the boot sector, so it cannot used by the file system. Block 1 is called the superblock. It contains metadata about the file system (the file system size in blocks, the number of data blocks, the number of inodes, and the number of blocks in the log). Blocks starting at 2 hold inodes, with multiple inodes per block. After those come bitmap blocks tracking which data blocks are in use. Most of the remaining blocks are data blocks; each is either marked free in the bitmap block, or holds content for a file or directory. The blocks at the end of the disk hold the logging layer’s log.

Files: block.h, block.c

Dependent files:

  • stat.h -- defines the file statistics data structure.
  • dinode.h -- defines the on-disk inode data structure, containing a size and an array of block numbers.

Exercise 6

Skim through the code to try to get the general idea in the implementation. It is OK if you do not fully understand all the details.


The Log Layer

Simple version logging mechanism for crash recovery. It allows higher layers to wrap multi-step updates to several blocks in a single transaction, to ensure that blocks are updated atomically.

Files: log.h, log.c

Exercise 7

Skim through the code to try to get the general idea in the implementation. It is OK if you do not fully understand all the details.


The Inode Layer

Inode allocator, reading, writing, metadata. This layer provides unnamed files, each represented using an inode and a sequence of blocks holding the file's data.

Files: inode.h, inode.c

Dependent files:

  • stat.h -- defines the file statistics data structure

Exercise 8

Skim through the code to try to get the general idea in the implementation. It is OK if you do not fully understand all the details.


The Directory Layer

A directory is an inode with special contents (i.e., a list of other inodes). Its inode has type T_DIR and its data is a sequence of directory entries. Each entry is a struct dirent (see kern/fs/dir.h), which contains a name and an inode number.

Files: dir.h, dir.c

Exercise 9

Implement dir_lookup and dir_link in kern/fs/dir.c.

In dir_lookup, you are given an inode pointer (dp) for the current directory, the name of the subdirectory to look up, and a pointer (poff) to write found byte offset to. You should loop through all the subdirectory entries of dp by calling inode_read, and find the one that is allocated (the inum is not 0) and has the matching name. Then you write the offset of the found inode to the address poff and return the pointer to the inode that the subdirectory entry corresponds to by calling inode_get with the inode number in the found subdirectory entry. If not found, it should return 0 (null pointer). Note that the size attribute in the inode structure represents the total sizes, in bytes, of the subdirectory entries, not the number of subdirectory entries. Also, the inode_read takes an offset in bytes, not the index of the subdirectory entry.

In dir_link, you are given an inode pointer (dp) for the current directory, and the name and inode number for the new subdirectory you are going to link. The first thing you should do is to check whether that subdirectory already exists in the current directory (with dir_lookup above), and if so, call inode_put on the inode returned by the dir_lookup and return -1. Recall that dir_lookup calls inode_get in the end, which creates a reference to the in-memory inode structure and increases the reference counter of that inode. Thus, in the case of error, we should call inode_put to drop that unnecessary reference we have created. After the error check, you can loop through the subdirectory structures of dp to find the first unallocated inode (one with the inum field being 0). Then copy the name and inum to the subdirectory structure found and issue a inode_write with the new subdirectory entry and the offset found. You can assume there will always be an unallocated subdirectory entry (or you can also call KERN_PANIC with appropriate error message if not). The function should return 0 in the case of success.


The Path Layer

This layer provides paths like /kern/fs/file.c for convenient naming, and resolves them with recursive lookup. Path name lookup involves a succession of calls to dir_lookup, one for each path component. The function namei evaluates path and returns the corresponding inode. The function nameiparent is just a variant: it stops before the last element, returning the inode of the parent directory and copying the final element into name. Both call the generalized function namex to do the real work.

The function namex starts by deciding where the path evaluation begins. If the path begins with a slash, evaluation begins at the root; otherwise, the current directory. Then it has to use skipelem to consider each element of the path in turn. Each iteration of the loop must look up name in the current inode ip.

Files: path.h, path.c

Exercise 10

Based on the information above, implement the functions skipelem and namex.

The function skipelem(path,name) copies the next path element from path into name. We will use it later in namex to evaluate the succesive elements in the path. It should have the following behavior:

  • skipelem("a/bb/c", name) = "bb/c", setting name to "a"
  • skipelem("///a//bb", name) = "bb", setting name to "a"
  • skipelem("a", name) = "", setting name to "a"
  • skipelem("", name) = skipelem("////", name) = 0

For the implementation, first ignore all leading slash characters in the path by advancing the path pointer until we find a non-slash character. Check that there is something left to process in the new path, and return 0 otherwise. We want to obtain the first group of non-slash characters in the path, and copy them to name. As before, we can advance the pointer until the desired position, but we do not have to forget to record a copy of the original path from where to obtain the first path element. With these two pointers we can obtain the length len of this first path element using pointer arithmetic. We only have to copy len characters to name. Remember to check the case len >= DIRSIZ, and to put 0 at the end of the name string. Finally, as we did at the beginning, ignore the trailing slashes in path.

For the implementation of namex, first take a look at the skeleton code. You will have to process one element in path at a time with the help of skipelem.

Make sure that:

  • Each iteration begins by locking ip and checking that it is a directory. If not, the lookup fails. (Locking ip is necessary not because ip->type can change underfoot —it can’t— but because until inode_lock runs, ip->type is not guaranteed to have been loaded from disk.)
  • If ip's type is not a directory, return 0.
  • If the call is for a parent inode (that is, when nameiparent is true) and this is the last path element (first character in path is '\0'), the loop stops early, as per the definition of nameiparent. We need to copy the final path element into name, so namex need only return the unlocked ip.
  • Then, the loop has to look for the path element using next = dir_lookup(ip, name, 0) and prepare for the next iteration by setting ip = next. When the loop runs out of path elements, it returns ip.
  • Use inode_put every time the inode pointer ip has to be released.
  • Access to the inode pointer ip under evaluation has to be protected with a lock. For every element of the path, lock ip after obtaining the element, and unlock it before returning from namex. The function inode_unlockput may be useful here, which is a shortcut to unlock and put.

Check the comments on the code for more details.


The File Descriptor Layer

Each thread has its own table of open files, or file descriptors. A file descriptor is represented by a struct file, which is a wrapper around an inode.

Files: file.h, file.c

Exercise 11

Read the code carefully and try to understand the code.


The Syscall Layer

This layer is for file system system calls. Most of these syscall functions are just wrappers around the functions in the file descriptor layer, taking care of correctly fetching the syscall arguments.

Files: sysfile.h, sysfile.c

Dependent files:

  • fcntl.h -- defines file modes

Recall that the implementation of the system call handlers you have been working on only had integer arguments. (The system call handler sys_puts for printf passes the pointer to the string constant in the user space, but we have previously implemented the code for you.) When the kernel receives a pointer from a user, you have to be very careful.

  • A user may pass in a bad pointer. If you allow the user to pass in a pointer to arbitrary address (which is possible in C), the user may pass a pointer pointing to kernel data structures or kernel stack, making the kernel crash or even worse, taking control over the kernel. Thus, on each system call handler receiving pointer arguments, we should always check that the pointer does not point to kernel part of the data in memory. Check the implementation of sys_puts in kern/trap/TSyscall/TSyscall.c as an example.

  • We should always insist that the users pass in not only the pointer, but also the length of the content. Assume that we have sys_open system call that passes a pointer to the string containing file name. It may be attempting to just pass in that pointer, and in the kernel you keep reading it until you find '\0'. However, this is very dangerous. The user can actually pass a valid pointer in the user memory, which is close to the kernel-user memory boundary. If the user did not write any '\0' in the user memory, then you will continue to read the kernel part of the memory, thus possibly leaking some secure kernel information to the user, e.g., through error messages. Instead we always insist the user to pass in both the pointer and the length of the content. Then we check that the entire region is located in the user part of the memory.

    On the other hand, e.g., for sys_open, it would not be a good user experience if you always have to pass in both the name of the file and the length of the name, since the length can simply be computed from the name. Instead, we still expose a nice system call interface to the user, but when you actually traps into the kernel (user/include/syscall.h), we automatically add the length of the string as an extra argument. The length of the string can be computed using our user level string library function strlen defined in user/lib/string.c.

  • The pointer passed from the user is a pointer to the user's virual memory space, while in mCertiKOS, kernel always have identity page map, thus directly manipulating the physical address. This means, when the data in the user space is stored in multiple continuous pages, it may no longer be mapped to continuous physical pages. Thus it would be very cumbersome to directly manipulate the data in the user space. Instead, we insist that all the system call handlers only manipulate in-kernel data structures. For example, when the user passes a pointer to some data for read, we create a local in-kernel buffer, and copy the data from the user into the buffer. If the user passes some pointer for write purpose, we let the handlers manipulate a in-kernel buffer instead, and at the end, we copy the content from the in-kernel buffer to the address the pointer points to. The functions that copies data from and to the user's memory are already implemented in kern/lib/pmap.c. They already take care of the multiple-page issue and copy data from and to multiple non-contiguous pages if necessary. As our kernel stack is restricted to one page, any in-kernel buffer that is large should better be defined at the global scope, to prevent the kernel stack overflow.

  • We should limit the length of the data that can be passed from the user. Currently, we restrict the length of the file name to 128 characters and the size of the file buffer to 10000 bytes. If the user passes in a lengh that exceeds its limit, we should reject it and raise appropriate error.

Exercise 12

In user/include/syscall.h, we have some system calls that takes a string constant as arguments but does not pass its length to the kernel. Those are, sys_link, sys_unlink, sys_open, sys_mkdir, and sys_chdir. Modify their implementations to pass in extra arguments for the lengths of corresponding string inputs. Note that you should not change the signatures of those functions, instead, pass in extra arguments to the int inline assembly command. Then modify corresponding system call handlers in kern/fs/sysfile.c so that they take the extra length arguments and copy only required data of given length instead of copying 128 characters every time. Don't forget the trailing '\0' in the end of the string.

Exercise 13

Implement the following system call handlers in kern/fs/sysfile.c related to the file system, considering all the issues explained above. Note that the system call handlers should never go wrong. Thus, you should consider all possible sources of errors and return appropriate error codes to the user. Check kern/thread/PTCBIntro/PTCBIntro.c to see how to manipulate the set of open files for each thread.

  • fdalloc (for sys_open)
  • sys_read
  • sys_write
  • sys_close
  • sys_fstat

Exercise 14

Read the rest of the code we have provided for you, and understand what they are doing.

Exercise 15

Though we only booted all the processes on one CPU in our test setup, our file system implemention should already support concurrency. However, those in-kernel buffers defined at the global level (e.g., the buffers for file read/write) can be simultaneously accessed by multiple processes running on different CPUs. Add appropriate spinlocks to protect the relevant critical sections by preventing others to access the buffers when you are manipulating them.


Part 4: Shell

Now we have implemented a file system support in the mCertiKOS. In this part, you are asked to implement a unix-like shell on top of your file system implementation.

Exercise 16

Implement a user process called shell, and spawn the new process in kern/init/init.c instead of the previous fstest. Feel free to reuse any of the existing user processes, or create a new one.

Exercise 17

Implement a unix-like shell in the new user process. You need to first implement necessary system calls to read appropriate input from the user. Feel free to reuse some of the existing functions in kern/dev/console.c. Then implement at least the following set of commands:

  • ls
  • pwd
  • cd
  • cp
  • mv
  • rm
  • mkdir
  • cat
  • touch (creating an empty file)
  • A command of your choice to write a string into a file, creating the file or overwritting the existing contents if necessary. An example could be "some content" > file.
  • A command of your choice to append a string into a file. It should fail if the file does not exist. An example could be "some content" >> file.
The commands should support both file and directory types, which means that you need to support the -r option forcp and rm. Otherwise, you do not need to implement the command line options or the permissions. Though not required, you are welcome to implement the set of options that you think is necessary. Clearly document what you have implemented, and include an interesting sample run in the README that covers all the commands and options.


Part 5 (Bonus): Concurrent File System Support

Recall that the global buffers we have allocated for the system calls in kern/fs/sysfile.c are not locked. Thus it suffers from the potential race conditions where multiple processes may try to write to or read from the global buffer. As a bonus (thus optional) exercise, you are asked to change the file system implementation to support full concurrency.

Bonus Exercise

We need to really understand the problem before we can solve it. In your README, clearly explain the potential issues that could occur under the current implementation. Note that the file system implementation itself is already properly locked. The issue comes from the unlocked kernel I/O buffers we defined in kern/fs/sysfile.c.

Change kern/fs/sysfile.c (and other appropriate files) to solve the problem. You should solve the issue by using locks in appropriate places. Simply duplicating the buffer for each thread is not acceptable. Note that our file system implementation is interrupt driven, where a particular call may get blocked, and the thread may be put into sleep until some particular interrupt is triggered. Simply wrapping the whole system call handlers with lock acquire and release statements would not work, as we should not allow any thread to hold a lock when it is put into sleep. Thus, we definitely need a more fine-grained solution here.

You should create a test setup to demonstrate the issues in the old implementation and show that your new implementation indeed solves all of them. Clearly document what you have done, your test setup, and a sample run in the README file. The amount of bonus point you receive for this exercise depends on your understanding of the issue, the correctness of your implementation, your test setup, and the quality of your implementation.


This completes the lab. Don't forget editing your README file. Add each of the files that you created to the git repository with the command git add /path/to/the/new/file. After this, you should not see any of the files you want to submit under the "Untracked files" section when you run git status. Commit your changes and type git push in the mcertikos directory to submit your code.