CSE 190M Web Programming
Lecture 10: Regular Expressions, Tables
Reading: 2.2.2
Except where otherwise noted, the contents of this document are
Copyright 2012 Marty Stepp, Jessica Miller, Victoria Kirst and Roy McElmurry IV.
All rights reserved.
Any redistribution, reproduction, transmission, or storage of part
or all of the contents in any form is prohibited without the author's
expressed written permission.
What is form validation?
- validation: ensuring that a form's values are correct
- some types of validation:
- preventing blank values (email address)
- ensuring the type of values
- integer, real number, currency, phone number, Social Security number, postal address, email address, date, credit card number, ...
- ensuring the format and range of values (ZIP code must be a 5-digit integer)
- ensuring that values fit together (user types email twice, and the two must match)
A real form that uses validation
Client vs. server-side validation
Validation can be performed:
- client-side (before the form is submitted)
- can lead to a better user experience, but not secure (why not?)
- server-side (in PHP code, after the form is submitted)
- needed for truly secure validation, but slower
- both
- best mix of convenience and security, but requires most effort to program
An example form to be validated
<form action="http://foo.com/foo.php" method="post">
<div>
City: <input name="city" /> <br />
State: <input name="state" size="2" maxlength="2" /> <br />
ZIP: <input name="zip" size="5" maxlength="5" /> <br />
<input type="submit" />
</div>
</form>
- Let's validate this form's data on the server...
Basic server-side validation code
$city = $_POST["city"];
$state = $_POST["state"];
$zip = $_POST["zip"];
if (!$city || strlen($state) != 2 || strlen($zip) != 5) {
  print "Error, invalid city/state/zip submitted.";
}
- basic idea: examine parameter values, and if they are bad, show an error message and abort. But:
- How do you test for integers vs. real numbers vs. strings?
- How do you test for a valid credit card number?
- How do you test that a person's name has a middle initial?
- (How do you test whether a given string matches a particular complex format?)
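The length checks above can be gathered into one function that reports every problem at once. This is only a sketch: the name validate_address and the error messages are made up for illustration, and the digit test relies on PHP's built-in ctype_digit rather than anything shown in the lecture.

```php
<?php
// Hypothetical helper (not from the lecture): returns a list of error
// messages for the city/state/zip parameters; an empty list means "valid".
function validate_address(array $params): array {
    $errors = [];
    $city  = isset($params["city"])  ? trim($params["city"]) : "";
    $state = isset($params["state"]) ? $params["state"] : "";
    $zip   = isset($params["zip"])   ? $params["zip"]   : "";

    if ($city === "") {
        $errors[] = "city is blank";
    }
    if (strlen($state) != 2) {
        $errors[] = "state must be 2 characters";
    }
    // ctype_digit is TRUE only for a non-empty string made entirely of digits
    if (strlen($zip) != 5 || !ctype_digit($zip)) {
        $errors[] = "zip must be 5 digits";
    }
    return $errors;
}

// In a real page the argument would be $_POST
$errors = validate_address(["city" => "Seattle", "state" => "WA", "zip" => "9819x"]);
foreach ($errors as $error) {
    print "Error: $error\n";
}
```

Simple length and digit checks like these cover the ZIP code, but they do not scale to formats such as phone numbers or names with a middle initial.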
The htmlspecialchars function
- text from files / user input / query params might contain <, >, &, etc.
- we could manually write code to strip out these characters
- better idea: allow them, but escape them
$text = "<p>hi 2 u & me</p>";
$text = htmlspecialchars($text);
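To make the effect of the call above visible, here is a small sketch that prints the escaped text. The second half, which passes the ENT_QUOTES flag, is an addition for illustration and is not part of the lecture example.

```php
<?php
$text = "<p>hi 2 u & me</p>";
$escaped = htmlspecialchars($text);
print $escaped . "\n";   // &lt;p&gt;hi 2 u &amp; me&lt;/p&gt;

// Quotes are escaped too, which matters inside attribute values.
// ENT_QUOTES asks for both double and single quotes to be escaped.
$attr = htmlspecialchars('say "hi"', ENT_QUOTES);
print $attr . "\n";      // say &quot;hi&quot;
```

A browser shows the escaped string as the original characters instead of interpreting them as tags.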
Exercise: Turnin HTML
- Update our save-homework.php page so that it can display HTML code back to the user
Regular expressions
/^[a-zA-Z_\-]+@(([a-zA-Z_\-])+\.)+[a-zA-Z]{2,4}$/
- regular expression ("regex"): a description of a pattern of text
- can test whether a string matches the expression's pattern
- can use a regex to search/replace characters in a string
- regular expressions are extremely powerful but tough to read
(the above regular expression matches many simple email addresses)
- regular expressions occur in many places:
- Java: Scanner, String's split method (CSE 143 sentence generator)
- supported by PHP, JavaScript, and other languages
- many text editors (TextPad) allow regexes in search/replace
Basic regular expressions
/abc/
- in PHP, regexes are strings that begin and end with /
- the simplest regexes simply match a particular substring
- the above regular expression matches any string containing "abc":
- YES: "abc", "abcdef", "defabc", ".=.abc.=.", ...
- NO: "fedcba", "ab c", "PHP", ...
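The YES and NO lists above can be checked directly with preg_match. This is a minimal sketch; the variable names are chosen for illustration.

```php
<?php
// Try /abc/ against the YES and NO examples from the slide.
$tests = ["abc", "abcdef", "defabc", ".=.abc.=.", "fedcba", "ab c", "PHP"];
$matched = [];
foreach ($tests as $s) {
    // preg_match returns 1 when the pattern is found anywhere in $s, else 0
    if (preg_match("/abc/", $s)) {
        $matched[] = $s;
    }
}
print implode(", ", $matched) . "\n";   // abc, abcdef, defabc, .=.abc.=.
```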
Wildcards: .
- A dot . matches any character except a \n line break
- /.oo.y/ matches "Doocy", "goofy", "LooNy", ...
- A trailing i at the end of a regex (after the closing /) signifies a case-insensitive match
- /mart/i matches "Marty Stepp", "smart fellow", "WALMART", ...
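A short sketch of the dot and the i flag in action, using the slide's patterns. The non-matching inputs are extra examples added for contrast.

```php
<?php
// The dot stands for exactly one character (any character except a line break).
$dot = [
    preg_match("/.oo.y/", "goofy"),    // 1
    preg_match("/.oo.y/", "LooNy"),    // 1
    preg_match("/.oo.y/", "ooy"),      // 0: nothing for the dots to match
    preg_match("/.oo.y/", "\noo\ny"),  // 0: a dot will not match "\n"
];

// The trailing i makes the match case-insensitive.
$ci = [
    preg_match("/mart/i", "WALMART"),  // 1
    preg_match("/mart/",  "WALMART"),  // 0 without the i flag
];
print implode(" ", $dot) . " | " . implode(" ", $ci) . "\n";  // 1 1 0 0 | 1 0
```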
Special characters: |, (), \
- | means OR
- /abc|def|g/ matches "abc", "def", or "g"
- There's no AND symbol. Why not?
- () are for grouping
- /(Homer|Marge) Simpson/ matches "Homer Simpson" or "Marge Simpson"
- \ starts an escape sequence
- many characters must be escaped to match them literally: / \ $ . [ ] ( ) ^ * + ?
- /<br \/>/ matches lines containing <br /> tags
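The sketch below shows how grouping changes the reach of |, and what escaping does. The call to preg_quote (a PHP built-in that escapes special characters in literal text) goes beyond the slide and is included as one possible convenience, not as the lecture's method.

```php
<?php
// | is OR, and () limits how far the OR reaches.
$grouped   = preg_match("/(Homer|Marge) Simpson/", "Homer Jay");  // 0: "Simpson" is required
$ungrouped = preg_match("/Homer|Marge Simpson/", "Homer Jay");    // 1: "Homer" alone is enough

// A literal / inside the pattern is escaped with \ (written \\ in a PHP string).
$br = preg_match("/<br \\/>/", "line one<br />line two");         // 1

// + is special, so /a+b/ means "one or more a, then b".
$special = preg_match("/a+b/", "a+b=c");                          // 0
// preg_quote escapes the special characters so the text is matched literally.
$literal = "/" . preg_quote("a+b", "/") . "/";                    // the string /a\+b/
$quoted  = preg_match($literal, "a+b=c");                         // 1
```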
Quantifiers: *, +, ?
- * means 0 or more occurrences
- /abc*/ matches "ab", "abc", "abcc", "abccc", ...
- /a(bc)*/ matches "a", "abc", "abcbc", "abcbcbc", ...
- /a.*a/ matches "aa", "aba", "a8qa", "a!?xyz__9a", ...
- + means 1 or more occurrences
- /a(bc)+/ matches "abc", "abcbc", "abcbcbc", ...
- /Goo+gle/ matches "Google", "Gooogle", "Goooogle", ...
- ? means 0 or 1 occurrences
- /a(bc)?/ matches "a" or "abc"
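To see exactly which part of a string a quantified pattern consumes, preg_match accepts an optional third argument that receives the matched text. The sketch below uses it with the slide's patterns; the input strings are made up for illustration.

```php
<?php
// The third argument of preg_match receives the matched text in element 0.
preg_match("/abc*/", "xxabcccyy", $m);
$star = $m[0];                         // "abccc": c repeated 3 times
preg_match("/a(bc)*/", "abcbcb", $m);
$group = $m[0];                        // "abcbc": the whole group (bc) repeats
preg_match("/a(bc)?/", "abcbc", $m);
$optional = $m[0];                     // "abc": ? allows at most one (bc)

$plus_ok  = preg_match("/Goo+gle/", "Goooogle");  // 1
$plus_bad = preg_match("/Goo+gle/", "Gogle");     // 0: o+ needs at least one more o

// * and + are greedy: they take as much text as they can.
preg_match("/a.*a/", "a8qa banana", $m);
$greedy = $m[0];                       // "a8qa banana"
```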
More quantifiers: {min,max}
- {min,max} means between min and max occurrences (inclusive)
- /a(bc){2,4}/ matches "abcbc", "abcbcbc", or "abcbcbcbc"
- max may be omitted to specify no upper limit
- {2,} means 2 or more
- {0,6} means up to 6 (older versions of PHP's regex engine treat {,6} as literal text, so write the 0 explicitly)
- {3} means exactly 3
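A sketch of the counted quantifiers, again capturing the matched text through preg_match's optional third argument. The input strings are illustrative only.

```php
<?php
preg_match("/a(bc){2,4}/", "abcbcbcbcbcbc", $m);
$capped = $m[0];                               // "abcbcbcbc": stops after 4 copies of bc
$too_few = preg_match("/a(bc){2,4}/", "abc");  // 0: only one bc

preg_match("/o{3}/", "Goooogle", $m);
$exact = $m[0];                                // "ooo"
preg_match("/x{2,}/", "axxxxxb", $m);
$open = $m[0];                                 // "xxxxx"
preg_match("/x{0,6}/", "xxxxxxxx", $m);
$upto = $m[0];                                 // "xxxxxx": at most 6 of the 8
```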
Anchors: ^ and $
- ^ represents the beginning of the string or line; $ represents the end
- /Jess/ matches all strings that contain Jess; /^Jess/ matches all strings that start with Jess; /Jess$/ matches all strings that end with Jess; /^Jess$/ matches the exact string "Jess" only
- /^Mart.*Stepp$/ matches "MartStepp", "Marty Stepp", "Martin D Stepp", ... but NOT "Marty Stepp stinks" or "I H8 Martin Stepp"
- (on the other slides, when we say, /PATTERN/ matches "text", we really mean that it matches any string that contains that text)
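The four Jess patterns can be compared side by side. The last two lines go slightly beyond the slide: in PHP's regex engine $ also matches just before a final line break, and the D modifier turns that off, which can matter when validating input.

```php
<?php
$names = ["Jess", "Jessica", "Hi Jess", "Hi Jessica!"];
$results = [];
foreach ($names as $s) {
    $results[$s] = [
        preg_match("/Jess/", $s),    // contains
        preg_match("/^Jess/", $s),   // starts with
        preg_match("/Jess$/", $s),   // ends with
        preg_match("/^Jess$/", $s),  // exactly
    ];
}
// $results["Hi Jess"] is [1, 0, 1, 0]

$loose  = preg_match("/^Jess$/",  "Jess\n");  // 1: $ tolerates one trailing "\n"
$strict = preg_match("/^Jess$/D", "Jess\n");  // 0: D makes $ mean the very end
```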
Character sets: []
- [] group characters into a character set; will match any single character from the set
- /[bcd]art/ matches strings containing "bart", "cart", or "dart"
- equivalent to /(b|c|d)art/ but shorter
- inside [], many of the special characters act as normal characters
- /what[!*?]*/ matches "what", "what!", "what?**!", "what??!", ...
- What regular expression matches DNA (strings of A, C, G, or T)?
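One possible answer to the DNA question, sketched under the assumption that only uppercase letters count and that the whole string must be DNA (hence the anchors):

```php
<?php
// One or more of A, C, G, T and nothing else.
$dna = "/^[ACGT]+$/";
$checks = [
    preg_match($dna, "GATTACA"),   // 1
    preg_match($dna, "GATTACA!"),  // 0: ! is not in the set
    preg_match($dna, "gattaca"),   // 0: lowercase is not in the set
    preg_match($dna, ""),          // 0: + needs at least one character
];

// Inside [], the characters ! * ? are ordinary characters.
$what = preg_match("/^what[!*?]*$/", "what?**!");  // 1
```

Adding the i flag would be one way to accept lowercase letters as well.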
Character ranges: [start-end]
- inside a character set, specify a range of characters with -
- /[a-z]/ matches any lowercase letter
- /[a-zA-Z0-9]/ matches any lower- or uppercase letter or digit
- an initial ^ inside a character set negates it
- /[^abcd]/ matches any character other than a, b, c, or d
- inside a character set, - must be escaped to be matched
- /[+\-]?[0-9]+/ matches an optional + or -, followed by at least one digit
- What regular expression matches letter grades such as A, B+, or D- ?
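One possible answer to the letter-grade question. It assumes the grades are A, B, C, D, and F, and that any of them may carry a + or -; a stricter grading scheme would need a different pattern.

```php
<?php
// A letter from A-D or F, then an optional + or - (the - is escaped inside []).
$grade = "/^[A-DF][+\\-]?$/";
$grades = [
    preg_match($grade, "A"),    // 1
    preg_match($grade, "B+"),   // 1
    preg_match($grade, "D-"),   // 1
    preg_match($grade, "E"),    // 0: E is outside A-D and is not F
    preg_match($grade, "A+-"),  // 0: at most one sign
];

// The slide's signed-integer pattern, anchored to cover the whole string.
$int = "/^[+\\-]?[0-9]+$/";
$ints = [
    preg_match($int, "-42"),    // 1
    preg_match($int, "+7"),     // 1
    preg_match($int, "4.2"),    // 0: . is not a digit
];
```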
Escape sequences
- special escape sequence character sets:
- \d matches any digit (same as [0-9]); \D any non-digit ([^0-9])
- \w matches any word character (same as [a-zA-Z_0-9]); \W any non-word char
- \s matches any whitespace character ( , \t, \n, etc.); \S any non-whitespace
- What regular expression matches dollar amounts of at least $100.00 ?
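One possible answer to the dollar-amount question, assuming amounts are written without commas and always with two decimal places. The patterns are single-quoted here so that PHP leaves the backslashes and the $ untouched.

```php
<?php
// $, a first digit 1-9, at least two more digits, a dot, exactly two digits.
$dollars = '/^\$[1-9]\d{2,}\.\d{2}$/';
$amounts = [
    preg_match($dollars, '$100.00'),   // 1
    preg_match($dollars, '$1234.56'),  // 1
    preg_match($dollars, '$99.99'),    // 0: fewer than three digits before the dot
    preg_match($dollars, '$100'),      // 0: cents are required
];

// \s and \D in the other regex functions:
$words  = preg_split('/\s+/', "to  be\tor\nnot");        // ["to", "be", "or", "not"]
$digits = preg_replace('/\D/', "", "(206) 555-1234");    // "2065551234"
```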
Regular expressions in PHP
- regex syntax: strings that begin and end with /, such as "/[AEIOU]+/"
function | description |
preg_match(regex, string) | returns 1 if string matches regex, 0 if it does not |
preg_replace(regex, replacement, string) | returns a new string with all substrings that match regex replaced by replacement |
preg_split(regex, string) | returns an array of strings from given string broken apart using given regex as delimiter (like explode but more powerful) |
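A minimal sketch calling each of the three functions once. The sample strings are made up for illustration.

```php
<?php
$has_vowels = preg_match("/[AEIOU]+/", "QUEUE");         // 1
$no_vowels  = preg_match("/[AEIOU]+/", "RHYTHM");        // 0
$masked     = preg_replace("/[AEIOU]+/", "-", "QUEUE");  // "Q-": the run UEUE is one match
$parts      = preg_split("/[ ,]+/", "a, b,c  d");        // ["a", "b", "c", "d"]

// preg_match's result works directly as an if condition.
if ($has_vowels) {
    print "QUEUE contains a vowel\n";
}
```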
PHP form validation w/ regexes
$state = $_POST["state"];
if (!preg_match("/^[A-Z]{2}$/", $state)) {
  print "Error, invalid state submitted.";
}
- preg_match and regexes help you to validate parameters
- sites often don't want to give a descriptive error message here (why?)
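The same idea extends to several parameters by pairing each one with a pattern. Everything named here (the $rules table, the invalid_fields function, and the city pattern) is a hypothetical illustration, not the lecture's code.

```php
<?php
// Hypothetical rule table: parameter name => regex the value must match.
$rules = [
    "city"  => "/^[A-Za-z][A-Za-z .'\\-]*$/",  // assumed format: letters, spaces, . ' -
    "state" => "/^[A-Z]{2}$/",
    "zip"   => "/^[0-9]{5}$/",
];

// Returns the names of the parameters that are missing or badly formatted.
function invalid_fields(array $params, array $rules): array {
    $bad = [];
    foreach ($rules as $name => $regex) {
        $value = isset($params[$name]) ? $params[$name] : "";
        if (!is_string($value) || !preg_match($regex, $value)) {
            $bad[] = $name;
        }
    }
    return $bad;
}

// In a real page the first argument would be $_POST
$bad = invalid_fields(["city" => "Seattle", "state" => "wa", "zip" => "9819"], $rules);
if ($bad) {
    print "Error, invalid parameters submitted.\n";
}
```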
Regular expression PHP example
$str = "the quick brown fox";
$str = preg_replace("/[aeiou]/", "*", $str);   # "th* q**ck br*wn f*x"
$words = preg_split("/[ ]+/", $str);           # array("th*", "q**ck", "br*wn", "f*x")
for ($i = 0; $i < count($words); $i++) {
  if (preg_match("/\\*{2,}/", $words[$i])) {
    $words[$i] = strtoupper($words[$i]);       # "q**ck" becomes "Q**CK"
  }
}
- notice how \ must be escaped to \\
Exercise: Turnin Validation
- Use regular expressions to verify that the needed parameters in our save-homework.php file have proper values.
A 2D table of rows and columns of data (block element)
<table>
<tr><td>1,1</td><td>1,2 okay</td></tr>
<tr><td>2,1 real wide</td><td>2,2</td></tr>
</table>
1,1 | 1,2 okay |
2,1 real wide | 2,2 |
- table defines the overall table, tr each row, and td each cell's data
- tables are useful for displaying large row/column data sets
- NOTE: tables are sometimes used by novices for web page layout, but this is not proper semantic HTML and should be avoided
Table headers, captions: <th>, <caption>
<table>
<caption>My important data</caption>
<tr><th>Column 1</th><th>Column 2</th></tr>
<tr><td>1,1</td><td>1,2 okay</td></tr>
<tr><td>2,1 real wide</td><td>2,2</td></tr>
</table>
My important data
Column 1 | Column 2 |
1,1 | 1,2 okay |
2,1 real wide | 2,2 |
- th cells in a row are considered headers; by default, they appear bold
- a caption at the start of the table labels its meaning
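Since pages in this course are produced by PHP, a table with a caption and header row can be built from an array. The sketch below assumes a hypothetical scores_table function and made-up data; it escapes each value with htmlspecialchars so that names containing < or & display correctly.

```php
<?php
// Hypothetical helper: builds an HTML table from name => score pairs.
function scores_table(array $scores, string $caption): string {
    $html  = "<table>\n";
    $html .= "<caption>" . htmlspecialchars($caption) . "</caption>\n";
    $html .= "<tr><th>Student</th><th>Score</th></tr>\n";
    foreach ($scores as $name => $score) {
        $html .= "<tr><td>" . htmlspecialchars((string) $name) . "</td><td>"
               . htmlspecialchars((string) $score) . "</td></tr>\n";
    }
    $html .= "</table>\n";
    return $html;
}

// Made-up data for illustration
$html = scores_table(["Ann" => 18, "Bo <TA>" => 20], "HW1 scores");
print $html;
```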
Exercise: Score Table
- Use a table to display homework scores in our scores.php file.
- Give the table column headers and a caption.
Styling tables
table { border: 2px solid black; caption-side: bottom; }
tr { font-style: italic; }
td { background-color: yellow; text-align: center; width: 30%; }
My important data
Column 1 | Column 2 |
1,1 | 1,2 okay |
2,1 real wide | 2,2 |
- all standard CSS styles can be applied to a table, row, or cell
- table-specific CSS properties include border-collapse, border-spacing, caption-side, empty-cells, and table-layout
table, td, th { border: 2px solid black; }
table { border-collapse: collapse; }
Without border-collapse
Column 1 | Column 2 |
1,1 | 1,2 |
2,1 | 2,2 |
With border-collapse
Column 1 | Column 2 |
1,1 | 1,2 |
2,1 | 2,2 |
- by default, the overall table has a separate border from each cell inside
- the border-collapse property merges these borders into one
The rowspan and colspan attributes
<table>
<tr><th>Column 1</th><th>Column 2</th><th>Column 3</th></tr>
<tr><td colspan="2">1,1-1,2</td>
<td rowspan="3">1,3-3,3</td></tr>
<tr><td>2,1</td><td>2,2</td></tr>
<tr><td>3,1</td><td>3,2</td></tr>
</table>
- colspan makes a cell occupy multiple columns; rowspan multiple rows
- text-align and vertical-align control where the text appears within a cell
Column styles: <col>, <colgroup>
<table>
<col class="urgent" />
<colgroup class="highlight" span="2"></colgroup>
<tr><th>Column 1</th><th>Column 2</th><th>Column 3</th></tr>
<tr><td>1,1</td><td>1,2</td><td>1,3</td></tr>
<tr><td>2,1</td><td>2,2</td><td>2,3</td></tr>
</table>
.urgent {
background-color: pink;
}
.highlight {
background-color: yellow;
}
- col tag can be used to define styles that apply to an entire column (self-closing)
- colgroup tag applies a style to a group of columns (NOT self-closing)
Exercise: Score Table Styles
- Add a collapsed table border style.
- Zebra stripe the rows.
- Make the first column a different color.
Don't use tables for layout!
- (borderless) tables appear to be an easy way to achieve grid-like page layouts
- many "newbie" web pages do this (including many UW CSE web pages...)
- but, a table has semantics; it should be used only to represent an actual table of data
- instead of tables, use divs, widths/margins, floats, etc. to perform layout
- tables should not be used for layout!
- Tables should not be used for layout!!
- TABLES SHOULD NOT BE USED FOR LAYOUT!!!
- TABLES SHOULD NOT BE USED FOR LAYOUT!!!!