But what if we actually want to CHANGE the input, or ADD to it, based on the pattern? We can use a program called "sed" to do this.
The "sed" program gets its name from "stream editor": it reads its input one line at a time and applies basic text transformations to each line. Multi-line transformations are possible with sed but painful, so they are not recommended.
To use sed, give it any options you want (see the man page for the list), a command that tells sed what transformation to make on the input, and a file name. If no file name is given, sed reads from stdin instead, for example the output piped in from another command.
$ sed [OPTIONS] COMMAND [FILE]
While sed can do a wide variety of interesting and powerful transformations (check out the man page, or search Google for a sed tutorial), today we'll use it for substitution: replacing one piece of text with another. The substitution command looks like 's/original/replacement/', where 'original' is a regular expression and 'replacement' is the text that replaces whatever it matches.
$ echo "The original copy is the original" > test.txt
$ sed 's/original/replacement/' test.txt
The replacement copy is the original
# Alternatively, you can pipe the output of echo into sed instead of reading from a file.
$ echo "The original copy is the original" | sed 's/original/replacement/'
The replacement copy is the original
Note that only the first instance of "original" on the line was replaced. If you add "g" (for "global") to the end of the command, sed substitutes ALL instances of the pattern on each line. The 's/.../.../g' form is the one you will use most often.
$ echo "The original copy is the original" | sed 's/original/replacement/g'
The replacement copy is the replacement
This example only has a single line, but sed will run the substitution command on every line of the input and print out all lines (regardless of whether they matched the pattern) to the output.
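To see that sed prints every line, matched or not, it helps to feed it more than one line. This small sketch uses printf to produce three made-up input lines, only two of which contain "original":

```shell
# Three input lines; the middle one has no match but is still printed unchanged.
printf 'original one\nno match here\noriginal two\n' | sed 's/original/replacement/g'
# replacement one
# no match here
# replacement two
```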
Also note that by default, sed writes its output to stdout, so the original file is NOT modified by the sed command. If you do want to overwrite the original file with the substituted version, use the "-i" option (for "in-place"). Be VERY careful with -i: just like the mv or rm commands, you can't undo it if you get it wrong.
$ sed -i 's/original/replacement/g' test.txt
$ cat test.txt
The replacement copy is the replacement
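One way to soften the risk of -i is to give it a backup suffix. This sketch assumes GNU sed or BSD/macOS sed, both of which accept a suffix attached directly to the option (-i.bak); sed then saves the untouched file under that suffix before editing. (On macOS, a bare -i with no backup needs an explicit empty suffix, written -i ''.)

```shell
# Recreate the sample file from earlier in this section.
echo "The original copy is the original" > test.txt
# Edit test.txt in place, but first save the unmodified file as test.txt.bak.
sed -i.bak 's/original/replacement/g' test.txt
cat test.txt       # The replacement copy is the replacement
cat test.txt.bak   # The original copy is the original
```

Once you have checked the result, you can delete the .bak file yourself.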
In the last section, we learned to write a regular expression to match any format of phone number. What if we want to rewrite the file to put all phone numbers in a standard format?
Let's say phone numbers are stored in a file people.txt:
M, Joe 4253921211
P, Tina (206) 123-4567
V, Sue 310-459-1094
J, Tom 206 772 7341
A, Anne 206.858.0109
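We want to put all numbers in the format (xxx) xxx-xxxx.
First, let's make sure our pattern matches every phone number by replacing each match with the word "test". (In sed's default basic regular expression syntax, ? and {3} only act as operators when written \? and \{3\}, while a bare ( or ) matches a literal parenthesis. The \? operator is a GNU sed extension.)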
$ sed 's/(\?[0-9]\{3\})\?[- .]\?[0-9]\{3\}[- .]\?[0-9]\{4\}/test/g' people.txt
M, Joe test
P, Tina test
V, Sue test
J, Tom test
A, Anne test
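Then we can use "capture groups" to capture the strings that represent each group of digits. In sed's default syntax a capture group is written \( ... \), so the command below has three of them: the area code, the next three digits, and the last four digits. On the replacement side, the backreferences \1, \2, and \3 insert whatever the first, second, and third capture groups matched, so (\1) \2-\3 rebuilds each number in the standard format: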
$ sed 's/(\?\([0-9]\{3\}\))\?[- .]\?\([0-9]\{3\}\)[- .]\?\([0-9]\{4\}\)/(\1) \2-\3/g' people.txt
M, Joe (425) 392-1211
P, Tina (206) 123-4567
V, Sue (310) 459-1094
J, Tom (206) 772-7341
A, Anne (206) 858-0109
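For comparison, here is the same substitution written with -E, which switches sed to extended regular expressions (GNU sed and BSD/macOS sed both accept -E). In this syntax, grouping parentheses and the ?, {, } operators need no backslashes, while literal parentheses must be escaped as \( and \). The sketch recreates people.txt first so it runs on its own:

```shell
# Recreate people.txt from above.
printf '%s\n' 'M, Joe 4253921211' 'P, Tina (206) 123-4567' 'V, Sue 310-459-1094' \
  'J, Tom 206 772 7341' 'A, Anne 206.858.0109' > people.txt
# Same three capture groups, written in extended regular expression syntax.
sed -E 's/\(?([0-9]{3})\)?[- .]?([0-9]{3})[- .]?([0-9]{4})/(\1) \2-\3/g' people.txt
# M, Joe (425) 392-1211
# P, Tina (206) 123-4567
# V, Sue (310) 459-1094
# J, Tom (206) 772-7341
# A, Anne (206) 858-0109
```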
What if we want to remove the phone numbers? We can use an empty replacement:
$ sed 's/(\?[0-9]\{3\})\?[- .]\?[0-9]\{3\}[- .]\?[0-9]\{4\}//g' people.txt
M, Joe
P, Tina
V, Sue
J, Tom
A, Anne
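One subtlety: the lines above still end with the space that separated the name from the number, because that space is not part of the pattern. A small variation, sketched here under the same GNU sed assumption as the examples above, puts a space at the front of the pattern so it is removed as well:

```shell
# Recreate people.txt from above.
printf '%s\n' 'M, Joe 4253921211' 'P, Tina (206) 123-4567' 'V, Sue 310-459-1094' \
  'J, Tom 206 772 7341' 'A, Anne 206.858.0109' > people.txt
# The leading space in the pattern also consumes the space before each number.
sed 's/ (\?[0-9]\{3\})\?[- .]\?[0-9]\{3\}[- .]\?[0-9]\{4\}//g' people.txt
```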
How about swapping the order of the first name and last name at the beginning of each line? We can use backreferences but swap the order (\2 and then \1).
$ sed 's/^\([A-Z][a-zA-Z]*\), \([A-Z][a-zA-Z]*\) /\2 \1 /g' people.txt
Joe M 4253921211
Tina P (206) 123-4567
Sue V 310-459-1094
Tom J 206 772 7341
Anne A 206.858.0109
Besides substitution, sed has other commands, such as "p" (print) and "d" (delete). See the man page or a sed tutorial (Google for "sed tutorial") for the details.
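As a quick sketch of those two commands, a /pattern/ in front of a command restricts it to lines that match the pattern. Here d drops the lines of people.txt that contain "206", and p with the -n option prints only those lines:

```shell
# Recreate people.txt from above.
printf '%s\n' 'M, Joe 4253921211' 'P, Tina (206) 123-4567' 'V, Sue 310-459-1094' \
  'J, Tom 206 772 7341' 'A, Anne 206.858.0109' > people.txt

# d deletes every line that matches /206/; all other lines are printed as usual.
sed '/206/d' people.txt
# M, Joe 4253921211
# V, Sue 310-459-1094

# -n turns off sed's automatic printing, so only lines printed by p appear.
sed -n '/206/p' people.txt
# P, Tina (206) 123-4567
# J, Tom 206 772 7341
# A, Anne 206.858.0109
```

Without -n, sed '/206/p' would print every line once and the matching lines a second time.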
If you prefix each command with "-e", you can give sed multiple commands (for example, several substitutions) in a single invocation. The commands are applied to every line in the order given.
$ sed -e 's/orig1/replacement1/g' -e 's/orig2/replacement2/g' file.txt
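Because the commands run in order on each line, a later substitution sees the output of an earlier one. This made-up example shows that swapping the order of two -e commands changes the result:

```shell
# "cat" becomes "dog" first, then both "dog"s become "bird".
echo "cat and dog" | sed -e 's/cat/dog/g' -e 's/dog/bird/g'
# bird and bird

# Reversed: "dog" becomes "bird" first, then "cat" becomes "dog".
echo "cat and dog" | sed -e 's/dog/bird/g' -e 's/cat/dog/g'
# dog and bird
```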
You can use a different delimiter than the default slash "/". You might do this because a delimiter character that appears in your pattern or replacement has to be "escaped" with a backslash (written \/) so that sed treats it as a literal character. So if your pattern has a bunch of slashes in it, you might use something like an underscore as the delimiter instead.
$ sed 's_original_replacement_g' file.txt
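File paths are a typical case. In this illustrative example, both commands replace /usr/local with /opt, but the version with "_" as the delimiter is much easier to read:

```shell
# With "/" as the delimiter, every slash in the pattern and replacement is escaped.
echo "/usr/local/bin" | sed 's/\/usr\/local/\/opt/'
# /opt/bin

# With "_" as the delimiter, the slashes can be written as-is.
echo "/usr/local/bin" | sed 's_/usr/local_/opt_'
# /opt/bin
```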
Sed is really good at single-line replacements, but multi-line replacements are possible (although painful). You can look up sed's "hold buffer", which it uses to preserve information across lines. However, if you really want to do more complicated multi-line processing, look into the "awk" program, which is much better suited to it.
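To give a flavor of the hold buffer, here is a well-known one-liner (not part of these notes' required material) that prints its input lines in reverse order. It also shows why multi-line sed gets hard to read quickly:

```shell
# h    copies the current line (the "pattern space") into the hold buffer.
# 1!G  on every line except the first, appends the hold buffer to the current line.
# $p   on the last line, prints the accumulated text; -n suppresses all other output.
printf 'one\ntwo\nthree\n' | sed -n '1!G;h;$p'
# three
# two
# one
```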
Sed is very powerful, but we will only be using the basics in HW3 and on exams. A one-liner is plenty for our purposes.