CSE 374, Lecture 3: I/O and Intro to Scripts

I/O Streams

Each process needs a way to interact with the outside world - a program that is running on the computer but doesn't have a way to get input from you or to show you the result of its computation is not very useful!

We classify the modes of interaction into three categories:

Input. How does the process get data?
Output. How does the process show the results of its computation?
Error. How does the process communicate diagnostic information or things that went wrong?

We call these "I/O streams" for "input" and "output".

Each I/O stream has a default implementation.

Input: "stdin" - the keyboard
Output: "stdout" - the screen
Error: "stderr" - also the screen

Redirection

What if we don't want to use the default input and output streams? Sometimes we might not want to take input manually from the keyboard, or we might want to save output in a file instead of just printing it to the screen. We call the process of changing the input and output streams of a program "redirection".

You specify redirection of input and output with another set of special characters.

    >     # Output stream to a file   ex. ls foo/ > foo_ls.txt
    2>    # Error stream to a file    ex. ls doesnotexist 2> error.txt
    >>    # Output stream to a file, but append to the file, don't overwrite
    &>    # Output and error streams to the same file
    <     # Input stream from a file  ex. cat < test.txt
    |     # Output stream of the first command becomes the input stream of the second command
          #     ex. cat AnnaKarenina.txt | less
    (and more: 2>>, &>>, <<, <<<, 2>&1)

Redirection is done entirely in the shell - the programs that you are running know nothing about it. All they have is a way to get input and a place to write output.
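As a minimal sketch of the operators above, the following defines a hypothetical helper function, talk (not part of the lecture), that writes one line to each output stream and then redirects those streams inside a scratch directory. The file names are made up for illustration.

```shell
# Work in a scratch directory so the example files are easy to throw away.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical helper: one line on stdout, one line on stderr.
talk() {
  echo "result"
  echo "warning" >&2
}

talk > out.txt 2> err.txt      # overwrite: each stream gets its own file
talk >> out.txt 2>> err.txt    # append: each file now holds two lines
talk > both.txt 2>&1           # both streams in one file (what &> does in bash)

wc -l < out.txt                # input redirection: wc reads the file as stdin
```

Note that talk itself never changes between the three calls; only the shell's redirection around it does.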

Piping is a very powerful idea. It means we don't have to execute commands in isolation - now we can string them together! By combining small, simple commands, we can produce more useful output. For instance, ps aux produces a lot of output. We can navigate that output better by doing:

    $ ps aux | less

This takes the output of ps aux - which is a lot - and uses it as the input stream of the less command. This is a nicer way to view the output.
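The same idea works for any commands that read stdin and write stdout. Here is a small sketch with made-up input (the fruit names are purely illustrative) that chains several simple commands to count the distinct lines:

```shell
# printf produces four lines; sort orders them so duplicates are adjacent;
# uniq collapses adjacent duplicates; wc -l counts the lines that remain.
# $(...) captures the output of the whole pipeline in a variable.
distinct=$(printf 'pear\napple\npear\nfig\n' | sort | uniq | wc -l)

# Three distinct lines remain: apple, fig, pear.
echo "distinct lines: $distinct"
```

None of the individual commands knows about the others; each one just reads its input stream and writes its output stream.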

One particular file on every Linux system that is useful when doing redirection is the file /dev/null. What is this file? Read up on it on your own and figure it out.

Check out a few more ways to combine commands on your own: ";", "||", "&&", "cmd1 `cmd2`"

The Model

Let's take a step back and summarize the model of the shell that we've learned so far.

A computer consists of an operating system, a file system, one or more CPUs, one or more "users" (one or more of whom are "administrators" or superusers), and many "processes" (running programs).
The "shell" is a program that takes text that you write and executes corresponding programs.
The shell has state:
- Working directory
- User
- Aliases
- History
The shell's state can be changed by using the "source" command (ex. source .bashrc)
To run programs within the shell, you type a "command" followed by dash-prefixed "options" and then zero or more "arguments" (just like arguments to a function)
The shell modifies the commands, options, and arguments with substitution. The program never sees the originals - only the substitutions.
- metacharacters (*, ?, !)
- aliases
The shell modifies the input and output streams of the program based on any redirection commands that are provided. The program does not know what input and output streams are used, just that it has an input and output stream.
Running a program results in starting a new "process" to run the code in the program.
Processes can be run in the foreground (visible to you) or the background (running, but not visible to you).

In fact, we can think of the commands and actions we take in the shell as a kind of programming language. You can manipulate the state of the shell program and the state of the underlying file system by executing "lines of code" (commands). This isn't a super great programming language, but it is a language that can do some powerful things.

Scripting

We can actually write shell "programs" - we'll call them scripts. In fact, we've already seen one! The .bashrc file is a type of script. But we'll dive in more deeply and understand how to write these programs ourselves.

Let's start with something simple. Let's write a program "listhome" that lists the contents of the home directory. To do this in bash, we could execute the following two commands.

    $ cd
    $ ls

But how do we write a program that does this? Well, we create a file with those two lines!

    #!/bin/bash
    cd
    ls

The first line indicates that this program is a shell program - the system should use /bin/bash (which is the shell itself) to execute the commands. Note that the program named here could really be any interpreter - python, for example. When you run the file, the program you specified is started and given the file to execute.

Now we want to run the program. We can run a program like so:

     $ ./listhome
     -bash: ./listhome: Permission denied

Uh oh! What went wrong? Last lecture we discussed permissions: read, write, and execute. If we run ls -l, we see that our listhome program doesn't have "x" (execute) permission, which means we can't run it.

     $ ls -l
     -rw-r--r-- 1 mwinst 72623 Mar 29 12:00 listhome

We'll need to mark the file as executable.

     $ chmod +x listhome

Now we can run the file as before. (Aside: we ran "./listhome" and not just "listhome". Why? Investigate and figure it out. Hint: it has something to do with the PATH)
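To see the permission change end to end, here is a small sketch that builds a hypothetical one-line script named hello in a scratch directory (both names are illustrative, not from the lecture) and compares its permission string before and after chmod:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Create a tiny script; newly created files are not executable by default.
printf '#!/bin/bash\necho hello\n' > hello

before=$(ls -l hello | cut -c1-10)   # permission string, no x yet
chmod +x hello
after=$(ls -l hello | cut -c1-10)    # same string with x bits added

echo "before: $before"
echo "after:  $after"
./hello                              # now runs and prints "hello"
```

The exact permission strings depend on your system's defaults, but the "after" string should contain at least one "x" where the "before" string had "-".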

Now wait a minute! When we worked with our .bashrc files, we "ran" those too, but we did it in a different way:

     $ source .bashrc

What's the difference between source and "./"? We can use source on our listhome program too:

     $ source listhome

Notice the difference? When we run the "source" command, we execute the commands in the SAME process as the current shell. This modifies the shell's state - changing the working directory. But if we use the "./" syntax to run the program, the shell will spawn a NEW process of bash to execute the commands - in this case changing the state of the NEW process and not the original process. The ./ syntax provides isolation: the program will not affect the state of the original shell.
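The difference can be observed directly. The sketch below uses a hypothetical script, gointo, that changes into a subdirectory called inner (both names are made up so the example stays inside a scratch directory instead of touching your home directory), and records the working directory after each way of running it:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir inner

# Hypothetical script that changes directory, much like listhome does.
printf '#!/bin/bash\ncd inner\n' > gointo
chmod +x gointo

./gointo                          # runs in a NEW bash process
after_run=$(basename "$PWD")      # still the scratch directory

source ./gointo                   # runs in THIS shell
after_source=$(basename "$PWD")   # now "inner"

echo "after ./gointo:      $after_run"
echo "after source gointo: $after_source"
```

Only the second line of output reports inner: the cd performed by ./gointo happened in a child process that has already ended.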

Arguments

Now let's do something a little bit cooler. I want to write a script to take an image and produce a thumbnail (smaller version of the image). After a little investigation, I've discovered a couple of programs that might be able to help us:

    djpeg     # Decompresses a jpeg image
    cjpeg     # Recompresses a jpeg image
    pamscale  # Scales an image

How would we combine these to scale an image "Dog.jpg"? We can use piping to accomplish this task:

    djpeg < Dog.jpg | pamscale -xysize 100 150 | cjpeg > DogThumbnail.jpg

OK, but I want to write a program that I can use on any file. I want to take the names of the input file and output file as ARGUMENTS. If we were programming in Java, I'd like to write the function:

    public void makeThumbnail(String inputImg, String outputImg) {...}

We can do this in bash too! In a shell script we don't explicitly declare the variables like we would in Java, but we can refer to arguments by their position.

    #!/bin/bash
    djpeg < $1 | pamscale -xysize 100 150 | cjpeg > $2

Note that we are starting with 1-based indexing here, which is a little odd; don't all computer scientists start at 0? We do actually start at 0, but $0 is always the name of the program as it was run (./makethumbnail in this case).

Then we would execute the shell program with two arguments for the input and output files:

    $ ./makethumbnail Dog.jpg Dog2.jpg
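If djpeg and friends are not installed, the positional arguments can still be explored with a stand-in. This sketch writes a hypothetical script, showargs (not part of the lecture), that only reports what it was given, then runs it the same way as makethumbnail:

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# The quoted heredoc (<<'EOF') copies the lines into the file verbatim,
# so $0, $1 and $2 are expanded later by the script, not by this shell.
cat > showargs <<'EOF'
#!/bin/bash
echo "name of script: $0"
echo "first argument: $1"
echo "second argument: $2"
EOF
chmod +x showargs

./showargs Dog.jpg Dog2.jpg
```

This prints ./showargs for $0, Dog.jpg for $1, and Dog2.jpg for $2.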

What if we also wanted to take the size of the thumbnail as input?

    #!/bin/bash
    djpeg < $1 | pamscale -xysize $3 $4 | cjpeg > $2

Which could be called as:

    $ ./makethumbnail Dog.jpg Dog2.jpg 100 150

If statements

What if we want to make sure that a caller provides all 4 arguments to our makethumbnail program, and print out a helpful error message if they don't?

We need an if statement: if there are not four arguments, print an error message. It turns out that bash does have if statements, although they look a little different than in Java.

    if [ $# != 4 ]
    then
      echo "$0: need 4 arguments: source-jpeg destination-file new-x-size new-y-size"
      exit 1
    fi

The spaces around the brackets are required. We're also using $#, which gives the number of arguments (not including $0) that were provided to the script.

We also call exit. What does that do? Remember that when we run the program with "./", it creates a new bash process, so calling exit ends that process. "exit" also takes an optional "exit code" that indicates whether or not the program ran successfully. An exit code of "0" indicates no errors, while any other value indicates failure; "1" is conventionally a general error of some kind.
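The caller can inspect the exit code afterwards. The sketch below relies on the special variable $?, which is not covered in the text above: it holds the exit code of the most recently finished command. The script name checkargs is hypothetical, and it contains only the argument check, not the image pipeline.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical script holding just the argument check.
cat > checkargs <<'EOF'
#!/bin/bash
if [ $# != 4 ]
then
  echo "$0: need 4 arguments"
  exit 1
fi
echo "got all 4 arguments"
EOF
chmod +x checkargs

./checkargs Dog.jpg Dog2.jpg
too_few=$?                      # 1: the script called exit 1

./checkargs Dog.jpg Dog2.jpg 100 150
all_four=$?                     # 0: the script's last command succeeded

echo "with 2 arguments: exit code $too_few"
echo "with 4 arguments: exit code $all_four"
```

A script that reaches its end without calling exit finishes with the exit code of its last command, which is why the second run reports 0.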

What if we want to allow the caller to omit the size arguments and use a default if they aren't provided? We can introduce a new branch into our if statement:

    if [ $# == 2 ]
    then
      djpeg < $1 | pamscale -xysize 100 150 | cjpeg > $2
    elif [ $# == 4 ]
    then
      djpeg < $1 | pamscale -xysize $3 $4 | cjpeg > $2
    else
      echo "$0: need 2 or 4 arguments: source-jpeg destination-file [x-size y-size]"
      exit 1
    fi