CSE 374, Lecture 7: sed

Summary of grep

grep lets us find the lines that match a pattern. But what if we actually want to CHANGE the matching text, or ADD to it? We can use a program called "sed" to accomplish this task.

Intro to sed

The "sed" program gets its name from "stream editor". sed processes its input one line at a time and performs basic text transformations on each line. Multi-line transformations are possible but painful with sed, so we don't recommend them.

To run sed, give it any options you would like (see the man page for options), a command that tells sed which transformation to apply to the input, and a file name. If no file name is given, sed reads from stdin instead, for example from a pipe.

    $ sed [OPTIONS] [COMMAND] [FILE]

While sed can do a wide variety of interesting and powerful transformations (check out the man page, or search Google for a sed tutorial), we'll use it today to do substitutions: replacing one piece of text with another. The substitution command looks like 's/original/replacement/', where you can specify 'original' as a regular expression.

    $ echo "The original copy is the original" > test.txt
    $ sed 's/original/replacement/' test.txt
    The replacement copy is the original

    # Alternatively, you can redirect the input stream from echo instead of a file.
    $ echo "The original copy is the original" | sed 's/original/replacement/'
    The replacement copy is the original
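
The examples above use a literal word as the pattern, but the pattern side can be any regular expression. Here is a minimal sketch; the input sentence is made up for illustration, and the pattern [0-9][0-9]* matches a run of one or more digits:

```shell
# Replace the first run of digits on the line with the letter N.
# The result is stored in a variable so it can be inspected afterwards.
result=$(echo "Room 101 is next to room 102" | sed 's/[0-9][0-9]*/N/')
echo "$result"
# prints: Room N is next to room 102
```

Only "101" is replaced here, because a plain s/.../.../ command stops after the first match on each line.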

Note that only the first instance of "original" on the line was replaced. If you add "g" (which stands for "global") to the end of the command, sed will substitute ALL of the instances of the pattern on each line. The most common way you will use sed is with the 's/.../.../g' command.

    $ echo "The original copy is the original" | sed 's/original/replacement/g'
    The replacement copy is the replacement

This example only has a single line, but sed will run the substitution command on every line of the input and print out all lines (regardless of whether they matched the pattern) to the output.
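
To see the line-by-line behavior, here is a small sketch with three lines of made-up input, one of which does not contain the pattern at all:

```shell
# printf produces three input lines; the middle one has nothing to replace.
result=$(printf 'one original\nnothing to change\noriginal and original\n' \
  | sed 's/original/replacement/g')
echo "$result"
# prints:
# one replacement
# nothing to change
# replacement and replacement
```

The middle line passes through unchanged, and the last line has both instances replaced because of the "g".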

Also note that by default, sed writes its output to stdout. This means that the original file is NOT modified by the sed command. If you do want to replace the original file with the substituted version, you can use the "-i" option (stands for "in-place"). Be VERY careful with -i: just like the mv or rm commands, you can't undo it if you get it wrong.

    $ sed -i 's/original/replacement/g' test.txt
    $ cat test.txt
    The replacement copy is the replacement
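
One way to make -i less risky is to give it a backup suffix, so that sed saves the old contents before overwriting the file. The sketch below assumes a sed that accepts a suffix attached directly to -i (GNU sed and BSD/macOS sed both do); the scratch directory and the .bak suffix are just choices made for this example.

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
echo "The original copy is the original" > "$workdir/test.txt"

# -i.bak edits test.txt in place and saves the old contents as test.txt.bak.
sed -i.bak 's/original/replacement/g' "$workdir/test.txt"

edited=$(cat "$workdir/test.txt")
backup=$(cat "$workdir/test.txt.bak")
echo "$edited"   # The replacement copy is the replacement
echo "$backup"   # The original copy is the original

# Clean up the scratch directory.
rm -r "$workdir"
```

If the substitution turns out to be wrong, the .bak file still holds the original text.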

Exercise: phone numbers

In the last section, we learned to write a regular expression to match any format of phone number. What if we want to rewrite the file to put all phone numbers in a standard format?

Let's say phone numbers are stored in a file people.txt:

    M, Joe        4253921211
    P, Tina       (206) 123-4567
    V, Sue        310-459-1094
    J, Tom        206 772 7341
    A, Anne       206.858.0109

I want to put all numbers in the format (xxx) xxx-xxxx.

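First, we can just make sure we match all of the phone numbers by replacing each one with the word "test". (The \? used below, meaning "optional", is a GNU sed extension to basic regular expressions, so this command may not work with other versions of sed.)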

    $ sed 's/(\?[0-9]\{3\})\?[- .]\?[0-9]\{3\}[- .]\?[0-9]\{4\}/test/g' people.txt
    M, Joe        test
    P, Tina       test
    V, Sue        test
    J, Tom        test
    A, Anne       test

Next, we can use "capture groups", written \( ... \), to capture the string that makes up each group of digits. We can then refer to those capture groups on the "replacement" side of the command with the backreferences \1, \2, and \3:

                  first capture           2nd capture         3rd capture     replacement
                        v                      v                  v                v
    $ sed 's/(\?\([0-9]\{3\}\))\?[- .]\?\([0-9]\{3\}\)[- .]\?\([0-9]\{4\}\)/(\1) \2-\3/g' people.txt
    M, Joe        (425) 392-1211
    P, Tina       (206) 123-4567
    V, Sue        (310) 459-1094
    J, Tom        (206) 772-7341
    A, Anne       (206) 858-0109
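
All of those backslashes are hard to read. As a sketch of an alternative, assuming your sed supports the -E option for extended regular expressions (GNU sed and BSD/macOS sed do), the same substitution can be written with far fewer of them. The temporary file below just recreates people.txt for the example.

```shell
# Recreate the sample data in a scratch file.
people=$(mktemp)
cat > "$people" <<'EOF'
M, Joe        4253921211
P, Tina       (206) 123-4567
V, Sue        310-459-1094
J, Tom        206 772 7341
A, Anne       206.858.0109
EOF

# With -E, ( ) { } and ? are special without backslashes,
# so a LITERAL parenthesis must now be escaped as \( or \).
result=$(sed -E 's/\(?([0-9]{3})\)?[- .]?([0-9]{3})[- .]?([0-9]{4})/(\1) \2-\3/g' "$people")
echo "$result"

# Clean up the scratch file.
rm "$people"
```

The capture groups and the \1, \2, \3 backreferences work the same way as before; only the escaping rules are flipped.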

What if we want to remove the phone numbers? We can use an empty replacement:

    $ sed 's/(\?[0-9]\{3\})\?[- .]\?[0-9]\{3\}[- .]\?[0-9]\{4\}//g' people.txt
    M, Joe        
    P, Tina       
    V, Sue        
    J, Tom        
    A, Anne       
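
Notice that the spaces in front of each phone number are still there in the output. A possible variation, sketched here on a single made-up input line, also matches the run of spaces before the number and anchors the match to the end of the line with $. Like the commands above, it relies on the GNU sed \? extension.

```shell
# " *" matches the spaces before the number; "$" anchors the match to the end of the line.
result=$(echo "V, Sue        310-459-1094" \
  | sed 's/ *(\?[0-9]\{3\})\?[- .]\?[0-9]\{3\}[- .]\?[0-9]\{4\}$//')
echo "$result"
# prints: V, Sue
```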

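How about swapping the order of the two names at the beginning of each line? We can capture each name and use the backreferences in the opposite order (\2 and then \1). Because the pattern is anchored with ^, it can match at most once per line, so the "g" here is harmless but not needed.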

    $ sed 's/^\([A-Z][a-zA-Z]*\), \([A-Z][a-zA-Z]*\) /\2 \1 /g' people.txt
    Joe M        4253921211
    Tina P       (206) 123-4567
    Sue V        310-459-1094
    Tom J        206 772 7341
    Anne A       206.858.0109

More details