1
|
- Richard C. Davis
UW CSE – 10/9/2006
- Lecture 6 – String Processing
|
2
|
- Assignment 2 is due in one week
- Office hours
- Mine are after class today
- I’ll have extra office hours later this week
- T/A office hours will be posted by next lecture
- Follow Turn-in instructions!
- username-hw2
- Plain text in readme.txt
|
3
|
- Replacing Strings in Emacs
- M-x query-replace
- Enter string to search for
- Enter replacement string
- Enter “y” or “n” for each match found
- M-x replace-string
- Same, but does not prompt for each match
|
5
|
- String Processing
- sed: single lines
- More Complex
- awk: more complex processing
- perl, python, ruby: general programming
- Shell Wrap-Up
|
6
|
- We’ve learned to automate simple tasks
- Move around files
- Start/Stop processes
- Change user environment/permissions
- But what about…
- Changing strings
- Repetitive edits to multiple files
- sed can help (used in HW2)
|
7
|
- sed
- Non-interactive editor
- Performs editing actions
- Actions defined in a “script”
- Stream-oriented
- Input from file or stdin
- Script processes each line
- Output goes to stdout
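A minimal sketch of the stream behaviour listed above, assuming a POSIX shell and a standard sed. The sample text and the variable name `from_stdin` are invented for illustration.

```shell
# With no file argument, sed reads stdin, edits each line, and writes to stdout.
from_stdin=$(printf 'hello world\nhello again\n' | sed 's/hello/goodbye/')
printf '%s\n' "$from_stdin"
```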
|
8
|
- Each line copied to “pattern space”
- All editing commands applied
- To data in pattern space
- Done in sequence
- Original input does not change
- Possible to restrict edits to subset of lines
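The points above can be sketched as follows, assuming a POSIX shell. The scratch file and the variable names `edited` and `original` are hypothetical. The second command operates on the result of the first, and the input file is left as it was.

```shell
# Hypothetical scratch file, created only for this demonstration.
tmp=$(mktemp)
printf 'cat\ndog\n' > "$tmp"

# Both commands run on every line, in order: "cat" becomes "cot", then "cut".
edited=$(sed -e 's/a/o/' -e 's/o/u/' "$tmp")
printf '%s\n' "$edited"

# The file itself is untouched; sed only wrote an edited copy to stdout.
original=$(cat "$tmp")
printf '%s\n' "$original"
rm -f "$tmp"
```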
|
9
|
- Method 1: One-line syntax
- sed [options] 'command' file(s)
- sed -e 'cmd1' -e 'cmd2' file(s)
- Method 2: Script file holds commands
- sed [options] -f script file(s)
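A short sketch comparing the two invocation methods, assuming a POSIX shell. The script file is a hypothetical temporary file, and the sample sentence is invented. Both methods should give the same result.

```shell
script=$(mktemp)   # hypothetical script file
# A script file holds one sed command per line.
printf '%s\n' 's/red/green/' 's/small/large/' > "$script"

# Method 1: commands on the command line, one -e per command.
inline=$(printf 'a small red box\n' | sed -e 's/red/green/' -e 's/small/large/')
# Method 2: the same commands read from the script file.
from_file=$(printf 'a small red box\n' | sed -f "$script")

printf '%s\n%s\n' "$inline" "$from_file"
rm -f "$script"
```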
|
10
|
- Most Common use
- sed 's/pattern/replacement/g' file
- Means “replace every (longest) substring that matches pattern with replacement”
- Common variations
- Omit g at end: replace only the first match on each line
- Put num at end: replace only the numth match on each line
- sed -n : suppress normal output
- Put p at end: print lines where a replacement was made
- sed -r : Use “extended” regular expressions
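The variations above can be tried on a single made-up line. This is a sketch assuming a POSIX shell; the variable names are illustrative only.

```shell
line='a-a-a-a'
all=$(printf '%s\n' "$line" | sed 's/a/b/g')        # every match
first=$(printf '%s\n' "$line" | sed 's/a/b/')       # first match only
third=$(printf '%s\n' "$line" | sed 's/a/b/3')      # third match only
greedy=$(printf '%s\n' "$line" | sed 's/a.*a/X/')   # .* takes the longest match
# -n suppresses the default output; the p flag prints only lines where a
# substitution happened, so the line "zzz" is dropped.
changed=$(printf 'zzz\n%s\n' "$line" | sed -n 's/a/b/p')
printf '%s\n' "$all" "$first" "$third" "$greedy" "$changed"
```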
|
11
|
- Can replace with all or part of a match
- Special characters in replacement
- & : Entire matched string
- \1 : String matched by the 1st set of parentheses
- \2 : String matched by the 2nd set of parentheses
- …
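A small sketch of both kinds of back-reference, assuming basic regular expressions (no -r), where grouping parentheses are written \( and \). The sample strings and variable names are invented.

```shell
# & stands for the whole matched text: wrap the number in brackets.
wrapped=$(printf 'order 66 shipped\n' | sed 's/[0-9][0-9]*/[&]/')
# \( \) capture pieces of the match; \1 and \2 refer back to them.
swapped=$(printf 'Davis, Richard\n' | sed 's/\(.*\), \(.*\)/\2 \1/')
printf '%s\n%s\n' "$wrapped" "$swapped"
```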
|
12
|
- Not so useful
- sed 's/a/b/g' ex1.txt
- sed 's/a/b/' ex1.txt
- sed 's/a/b/2' ex1.txt
- sed -n 's/a/b/2p' ex1.txt
- More useful
- sed 's/.*Linux \(.*\) .*/\1:/' ex2.txt
- sed 's/.*Linux.*/&:/' ex2.txt
- Newline Note
- The trailing \n is not part of the text matched against; it is re-added when the line is printed
|
13
|
- General syntax of sed commands
- [address[,address]][!]command[args]
- Address specifies range to look at
- Address types
- Line with a particular number e.g.: 3
- Lines matching pattern e.g.: /SAVE/
- Using two addresses specifies a range of lines
- Using ! means “use lines not specified in address”
- Other Commands
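A sketch of each address type, using the p (print) command with -n so that only the addressed lines appear. It assumes a POSIX shell; the five-line input and the variable names are made up.

```shell
input=$(printf 'one\ntwo\nthree\nfour\nfive')
by_number=$(printf '%s\n' "$input" | sed -n '3p')       # line number 3
by_pattern=$(printf '%s\n' "$input" | sed -n '/^t/p')   # lines starting with t
by_range=$(printf '%s\n' "$input" | sed -n '2,4p')      # lines 2 through 4
negated=$(printf '%s\n' "$input" | sed -n '2,4!p')      # everything except 2-4
printf '%s\n' "$by_number" "$by_pattern" "$by_range" "$negated"
```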
|
14
|
- Delete lines 3-5: sed '3,5 d' ex3.c
- Delete lines that don’t contain SAVE
- Delete lines that start with //
- Delete lines between /* and */
- sed '/\/\*/, /\*\// d' ex3.c
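One possible way to write the delete tasks listed above. This is a sketch, not necessarily the commands used in class. The temporary file stands in for ex3.c, and its contents are invented.

```shell
src=$(mktemp)   # stand-in for ex3.c; the contents are invented
cat > "$src" <<'EOF'
// header comment
int x; // SAVE
/* start of block
   still in block */
int y;
EOF

keep_save=$(sed '/SAVE/!d' "$src")        # delete lines that do not contain SAVE
no_slashes=$(sed '/^\/\//d' "$src")       # delete lines that start with //
no_block=$(sed '/\/\*/,/\*\//d' "$src")   # delete from a /* line through a */ line
printf '%s\n' "$keep_save" "$no_slashes" "$no_block"
rm -f "$src"
```

The range form looks for the closing */ starting on the line after the opening /*, so a comment that opens and closes on one line would extend the deletion to the next */ in the file.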
|
15
|
- Commands so far: substitute, print, delete
- Other commands (not used in class)
- Append, replace with block, insert, translate
- Branch to label
- Multi-line patterns
- The hold space for fancy editing
- E.g., copy and paste of lines
- Need these? Use more powerful language
|
16
|
- awk
- Processes text files
- File contains records
- Separated by newline (default)
- Records contain fields
- Separated by spaces (default)
- Why use awk?
- Generate reports from logs
- Process results of an experiment
- (Named after authors, Aho, Weinberger, and Kernighan)
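A sketch of records and fields, assuming a POSIX awk. The log lines are invented, and NR, NF, and -F are standard awk features not named on the slide.

```shell
log=$(printf 'alice 200 12\nbob 404 7\ncarol 200 30')
# Default: one record per line, fields split on whitespace.
names=$(printf '%s\n' "$log" | awk '{ print $1 }')
# NR is the current record number; NF is the number of fields in the record.
counts=$(printf '%s\n' "$log" | awk '{ print NR ": " NF " fields" }')
# -F changes the field separator, e.g. for colon-separated data.
colon_fields=$(printf 'root:x:0:0\n' | awk -F: '{ print $1, $3 }')
printf '%s\n' "$names" "$counts" "$colon_fields"
```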
|
17
|
- One-line syntax
- awk [options] 'script' file(s)
- Script file
- awk [options] -f scriptFile file(s)
|
18
|
- Script structure
- Records processed one at a time
- Pattern restricts to matching records
- Fields accessed with $1, …$n
- BEGIN and END patterns
- For procedures before/after processing file
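An awk script is a series of `pattern { action }` rules. The sketch below, assuming a POSIX awk and made-up log data, combines a BEGIN rule, a pattern that selects records, and an END rule.

```shell
report=$(printf 'alice 200 12\nbob 404 7\ncarol 200 30\n' | awk '
BEGIN     { print "user bytes" }          # runs once, before any input
$2 == 200 { print $1, $3; total += $3 }   # runs only for matching records
END       { print "total", total }        # runs once, after all input
')
printf '%s\n' "$report"
```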
|
19
|
- awk is a very powerful language
- Looping constructs
- Arrays
- Functions
- Fancy printing
- Powerful math functions
- Need these? Use Perl, Python, or Ruby
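A compact sketch touching several of the features listed above (a function, an associative array, a loop, and printf), assuming a POSIX awk. The data and the function name `pct` are invented for illustration.

```shell
summary=$(printf 'alice 12\nbob 7\nalice 30\nbob 3\n' | awk '
function pct(part, whole) { return 100 * part / whole }   # user-defined function
{ bytes[$1] += $2; total += $2 }                          # associative array
END {
    for (user in bytes)                                   # loop over array keys
        printf "%s %d %.1f%%\n", user, bytes[user], pct(bytes[user], total)
}' | sort)   # for-in order is unspecified, so sort for stable output
printf '%s\n' "$summary"
```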
|
20
|
- Perl, Python, and Ruby
- Interpreted
- Write scripts like bash
- Prefix script with #!<program path>
- Make executable with chmod
- Compiled to an internal form before running (fast!)
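The #! and chmod steps can be sketched as follows. To stay runnable anywhere, this example uses /bin/sh as the interpreter; for Perl, Python, or Ruby the first line would name that interpreter instead, at whatever path it has on the system. The directory and the script name `hello` are hypothetical.

```shell
dir=$(mktemp -d)                 # hypothetical scratch directory
cat > "$dir/hello" <<'EOF'
#!/bin/sh
echo "hello from $(basename "$0")"
EOF
chmod +x "$dir/hello"            # make the script executable
greeting=$("$dir/hello")         # the #! line selects the interpreter
printf '%s\n' "$greeting"
rm -rf "$dir"
```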
|
21
|
- Perl
- Practical Extraction and Report Language
- Or “Pathologically Eclectic Rubbish Lister”
- Language properties
- Excellent pattern matching
- “Kitchen Sink” syntax
- No objects in original version
|
22
|
- Python
- Fully Object Oriented
- Simpler Syntax
- Allows different styles
|
23
|
- Ruby
- Fully Object Oriented
- Syntax more similar to Smalltalk
- Many different ways to do the same things
|
24
|
- String Processing
- sed : quick mods to single lines
- awk : more complex record processing
- perl, python, ruby: learn one
- That’s all for the shell!
- Note: We don’t require you to know how to use any scripting tools
other than sed in this class, but we do require you to know when you
should consider learning to use one of these tools.
|