lec 6: file I/O

how’s hw1 going?

You wake up, segfaults

Post up, segfaults

Ride round in it, segfaults

Flossin on that, segfaults

– Stefan Dierauf

common error: lifetime

struct Pair {
    int x, y;
};

void foo(LinkedList list, struct Pair p) {
    list->payload = &p;
}

p is passed by value; its lifetime ends when foo exits

list->payload becomes a dangling pointer
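
One way to avoid the dangling pointer is to store a heap copy that outlives the call. The sketch below is only an illustration: the slides do not show how LinkedList is defined, so the struct Node typedef is an assumption, and foo_fixed is a hypothetical name.

```c
#include <stdbool.h>
#include <stdlib.h>

struct Pair {
    int x, y;
};

// Assumed shape of the list type; the real hw1 definition may differ.
typedef struct Node {
    void *payload;
    struct Node *next;
} *LinkedList;

// Store a heap copy of p, so the payload stays valid after we return.
// Returns false if malloc fails. The caller owns list->payload and
// must free() it eventually.
bool foo_fixed(LinkedList list, struct Pair p) {
    struct Pair *copy = malloc(sizeof(*copy));
    if (copy == NULL)
        return false;
    *copy = p;
    list->payload = copy;
    return true;
}
```

The cost is that somebody now has to free the payload, which is where valgrind helps again.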

valgrind is your friend

acknowledge those with whom you discuss problems

administrivia

  • hw1 due next Tuesday
  • this Friday: Bitcoin’s bounty
  • midterm in 2 weeks
  • paper readings: high-level systems ideas & concepts
    • MapReduce, valgrind, Rust, Plan 9, node, Eraser, belief, Mars
    • talk to us if you would like to lead a paper discussion

feedback

We never learned anything about how to use typedef, makefiles, or several other things that were then required as part of the exercises. I’ve had to do extensive google-ing for nearly every exercise just to learn how to do the things that were asked. If you expect us to know information, it should be covered in lecture/section, or there should at least be a comprehensive tutorial attached to the exercise.

  • you should know typedef/makefiles from 351 and sections
  • yes, you are encouraged & expected to do google-ing
    • programming job → google-ing every day
    • tutorial: see resources
  • talk to us about what you want to see specifically
  • lectures focus on ideas & concepts; sections focus on details

today’s plan

  • unix filesystem recap
  • file I/O interfaces
    • POSIX I/O syscalls: file descriptors
    • C stream I/O: FILE
    • C++ iostream (next week)

why filesystem

  • naming (week 7)
    • data organization
    • sharing
  • persistent storage across reboots
    • power outage, system crash, meteor strike (?)

Amazon Cloud Goes Down Friday Night, Taking Netflix, Instagram And Pinterest With It

Forbes, June 30, 2012 (see also Amazon’s summary)

file types

  • regular files
  • directories
  • special files
    • symbolic link, pipe, socket, …

filesystem hierarchy

/home/auser/ex0.c

  • a file does not belong to a particular directory
    • a file is linked (hardlinks) and can have multiple names
    • a file is deleted when the last link is removed
    • acyclic: no hard links to directories
  • current root directory, current working directory
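
The link-counting rules above can be observed with link, unlink, and stat. This is a minimal sketch; link_count and hardlink_demo are hypothetical helpers, and it assumes the filesystem supports hard links.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

// Return the number of names (hard links) the file has, or -1 on error.
long link_count(const char *path) {
    struct stat st;
    if (stat(path, &st) == -1)
        return -1;
    return (long)st.st_nlink;
}

// Create file a, give it a second name b, then remove name a.
// Returns 0 if the link count went 1 -> 2 -> 1 as expected, else -1.
// Both names are removed before returning.
int hardlink_demo(const char *a, const char *b) {
    int ok = -1;
    int fd = open(a, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (fd == -1)
        return -1;
    close(fd);
    if (link_count(a) == 1 && link(a, b) == 0) {
        // a and b now name the same file
        if (link_count(a) == 2 && unlink(a) == 0 && link_count(b) == 1)
            ok = 0;   // the file survives under its remaining name
        unlink(b);    // last link removed: the file is deleted
    }
    unlink(a);        // fails harmlessly if a is already gone
    return ok;
}
```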

permissions

example      octal   description
-r--r--r--   0444    anyone can read
-rwxrwxrwx   0777    anyone can read/write/execute
-rw-r--r--   0644    owner can read/write; anyone can read
--w-------   0200    owner can write

see also: setuid/setgid/sticky, access control lists
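
To connect the octal column to the rwx column, here is a small sketch that renders the nine permission bits as text. perm_string is a hypothetical helper; it omits the leading file-type character shown in the table and ignores setuid/setgid/sticky.

```c
#include <string.h>
#include <sys/types.h>

// Render the nine permission bits of mode as "rwxrwxrwx"-style text.
// out must have room for 10 bytes (9 characters plus the terminator).
void perm_string(mode_t mode, char out[10]) {
    static const char letters[] = "rwxrwxrwx";
    memcpy(out, "---------", 10);  // nine dashes plus the terminator
    for (int i = 0; i < 9; i++) {
        // bit 8 is owner-read, bit 0 is other-execute
        if (mode & ((mode_t)1 << (8 - i)))
            out[i] = letters[i];
    }
}
```

For example, perm_string(0644, s) leaves "rw-r--r--" in s.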

open / close

#include <err.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>
int main(void) {
    int fd = open("foo.txt", O_RDONLY);
    if (fd == -1)
        err(EXIT_FAILURE, "open");
    /* ... */
    close(fd);
}
  • open: pass in the filename and access mode
    • get back a file descriptor
    • need an extra permission parameter if creating a new file
  • close: pass in an opened file descriptor
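
A minimal sketch of the "extra permission parameter" case. create_new is a hypothetical helper, and 0644 is just an example mode.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

// Create path for writing; fail if it already exists (O_EXCL).
// 0644 is the requested mode; the kernel clears any bits set in the
// process umask, so the file may end up with fewer permissions.
// Returns a file descriptor, or -1 with errno set.
int create_new(const char *path) {
    return open(path, O_WRONLY | O_CREAT | O_EXCL, 0644);
}
```

Without O_CREAT the third argument is ignored; with O_CREAT and no third argument, the new file gets unpredictable permission bits.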

file descriptor

  • OS abstraction for accessing a file

  • an integer (type int) used in many syscalls

  • kernel maintains a per-process fd table
    • map fd to state: flags, offset, …
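
The offset part of that state can be made visible with lseek. In this sketch (offset_demo is a hypothetical helper), two separate open calls on the same file give two descriptors whose offsets move independently.

```c
#include <fcntl.h>
#include <unistd.h>

// Write "abcdef" to path, then open it twice for reading.
// Returns 0 if reading through fd1 advances only fd1's offset.
// The file is removed before returning.
int offset_demo(const char *path) {
    char buf[4];
    int ok = -1;
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd == -1)
        return -1;
    ssize_t n = write(fd, "abcdef", 6);
    close(fd);

    int fd1 = open(path, O_RDONLY);
    int fd2 = open(path, O_RDONLY);
    if (n == 6 && fd1 != -1 && fd2 != -1 &&
        read(fd1, buf, 4) == 4 &&          // advances fd1 only
        lseek(fd1, 0, SEEK_CUR) == 4 &&    // fd1 is now at offset 4
        lseek(fd2, 0, SEEK_CUR) == 0)      // fd2 is still at offset 0
        ok = 0;
    if (fd1 != -1)
        close(fd1);
    if (fd2 != -1)
        close(fd2);
    unlink(path);
    return ok;
}
```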

predefined file descriptors

name macro integer value
standard input STDIN_FILENO 0
standard output STDOUT_FILENO 1
standard error STDERR_FILENO 2

example:

write(STDOUT_FILENO, "hello world!\n", 13);

read / write

ssize_t read(int fd, void *buf, size_t count);
ssize_t write(int fd, const void *buf, size_t count);
  • read/write up to count bytes from/to fd
    • offset?
  • return value
    • success: number of bytes read/written (what about 0?)
    • error: -1 (check errno)
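
Because both calls may transfer fewer than count bytes, callers usually loop. A minimal sketch, with write_all and read_all as hypothetical helper names; it treats a return of 0 from read as end of file and retries on EINTR.

```c
#include <errno.h>
#include <unistd.h>

// Write exactly count bytes unless an error occurs.
// Returns 0 on success, -1 on error (errno is set by write).
int write_all(int fd, const void *buf, size_t count) {
    const char *p = buf;
    while (count > 0) {
        ssize_t n = write(fd, p, count);
        if (n == -1) {
            if (errno == EINTR)
                continue;        // interrupted by a signal: retry
            return -1;
        }
        p += n;                  // short write: advance and try again
        count -= (size_t)n;
    }
    return 0;
}

// Read until buf is full or end of file.
// Returns the number of bytes read, or -1 on error.
ssize_t read_all(int fd, void *buf, size_t count) {
    char *p = buf;
    size_t total = 0;
    while (total < count) {
        ssize_t n = read(fd, p + total, count - total);
        if (n == -1) {
            if (errno == EINTR)
                continue;        // interrupted by a signal: retry
            return -1;
        }
        if (n == 0)
            break;               // 0 from read means end of file
        total += (size_t)n;
    }
    return (ssize_t)total;
}
```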

q: how to safely update a file? see also Alice’s notes questions

  • write a new temporary file & rename (demo: strace sed -i)
  • make a backup & write (exercise: strace vi/emacs)
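
The first approach (temporary file & rename) might look like the sketch below. replace_file and the ".tmp" suffix are assumptions for illustration, not what sed does internally; the new file gets mode 0600 rather than the original's permissions, and a short write is treated as failure to keep the code small.

```c
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

// Replace the contents of path with data by writing a temporary file
// and renaming it over the original, so readers see either the old
// contents or the new ones. Returns 0 on success, -1 on error.
int replace_file(const char *path, const void *data, size_t len) {
    char tmp[4096];
    int n = snprintf(tmp, sizeof(tmp), "%s.tmp", path);
    if (n < 0 || (size_t)n >= sizeof(tmp))
        return -1;               // path too long for this sketch

    int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd == -1)
        return -1;
    // flush the new contents to the device before they become visible
    if (write(fd, data, len) != (ssize_t)len || fsync(fd) == -1) {
        close(fd);
        unlink(tmp);
        return -1;
    }
    if (close(fd) == -1 || rename(tmp, path) == -1) {
        unlink(tmp);
        return -1;
    }
    return 0;
}
```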

fsync

q: is the content on disk if the machine crashes after close?

int fd = open("foo.txt", O_WRONLY | O_CREAT, 0600);
write(fd, "hello world!\n", 13);
close(fd);

maybe, maybe not

you need to flush the cache to the storage device to ensure durability

int fsync(int fd);
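
A sketch of the earlier snippet with fsync added before close. write_durably is a hypothetical helper; it also adds O_TRUNC and error checks that the slide version leaves out, and treats a short write as failure.

```c
#include <fcntl.h>
#include <unistd.h>

// Write len bytes to path and ask the kernel to flush them to the
// storage device before returning. Returns 0 on success, -1 on error.
int write_durably(const char *path, const void *data, size_t len) {
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd == -1)
        return -1;
    if (write(fd, data, len) != (ssize_t)len ||  // short write = failure here
        fsync(fd) == -1) {                       // blocks until the flush is done
        close(fd);
        return -1;
    }
    return close(fd);
}
```

On some systems a newly created file's directory entry is only durable after an fsync on the containing directory as well; this sketch does not do that.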

other fd operations

see a list of fd operations

read glibc’s Low-Level Input/Output, the manpages, or google

FILE stream

defined in stdio.h: fopen, fread, fwrite, fprintf, fclose, …

implemented on top of POSIX I/O syscalls

FILE stream can be either text or binary

buffered by default: use fflush to flush

predefined streams: stdin, stdout, stderr

see glibc’s Input/Output on Streams, the manpages, or google
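
To see the buffering, compare the file size the kernel reports before and after fflush. A minimal sketch with hypothetical helpers file_size and buffering_demo; the "before" value is typically 0 on common C libraries, but the C standard does not promise that.

```c
#include <stdio.h>
#include <sys/stat.h>

// Size of path in bytes as the kernel sees it, or -1 on error.
static long file_size(const char *path) {
    struct stat st;
    return stat(path, &st) == -1 ? -1 : (long)st.st_size;
}

// Write 5 bytes through a FILE and record the file size before and
// after fflush. Removes the file afterwards.
// Returns 0 on success, -1 on error.
int buffering_demo(const char *path, long *before, long *after) {
    FILE *f = fopen(path, "wb");
    if (!f)
        return -1;
    int ok = -1;
    if (fwrite("hello", 1, 5, f) == 5) {
        *before = file_size(path);   // typically 0: bytes still in the buffer
        if (fflush(f) == 0) {
            *after = file_size(path);  // 5: fflush handed the bytes to write()
            ok = 0;
        }
    }
    fclose(f);
    remove(path);
    return ok;
}
```

Note that fflush only moves data from the FILE buffer into the kernel; it is not a substitute for fsync.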

FILE stream example

#include <stdio.h>
#include <stdlib.h>
#include <err.h>
int main(int argc, char **argv) {
    if (argc != 2) return EXIT_FAILURE;
    FILE *f = fopen(argv[1], "rb"); // read only, binary mode
    if (!f)
        err(EXIT_FAILURE, "fopen");
    for (;;) {                      // read from file & write to stdout
        char buf[256];
        size_t count = fread(buf, 1, sizeof(buf), f);
        if (count == 0)
            break;
        fwrite(buf, 1, count, stdout);
    }
    if (ferror(f))                  // fread returns 0 at EOF and on error
        err(EXIT_FAILURE, "fread");
    fclose(f);
    return EXIT_SUCCESS;
}

DIR stream

opendir, readdir, closedir, …

read glibc’s Accessing Directories

or read manpage section 3: man 3 opendir
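
A minimal sketch of the opendir/readdir/closedir loop. count_entries is a hypothetical helper; it does not distinguish a readdir error from end of directory (that would require clearing errno before each call and checking it afterwards).

```c
#include <dirent.h>
#include <string.h>

// Count the entries in a directory, skipping "." and "..".
// Returns the count, or -1 if the directory cannot be opened.
int count_entries(const char *path) {
    DIR *d = opendir(path);
    if (!d)
        return -1;
    int count = 0;
    struct dirent *e;
    while ((e = readdir(d)) != NULL) {   // NULL: end of directory (or error)
        if (strcmp(e->d_name, ".") == 0 || strcmp(e->d_name, "..") == 0)
            continue;
        count++;
    }
    closedir(d);
    return count;
}
```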

sum up

  • different levels of I/O abstractions
    • file descriptor, FILE, …
  • reliability: caching vs flushing
    • how to update a file
    • what if machine crashes in the middle

self-exercise

$ cat in.txt
1213
3231
000005
52
$ ./ex1 in.txt
5
52
1213
3231

input file: each line contains an integer

your program: parse input into uint32_t’s & sort & print

see soln

see you on Friday

bug case study: Bitcoin

alternative interface: mmap