Source: https://xkcd.com/1288/
Wrapping up Module 4 material
Returning to input validation: client-side vs. server-side validation
Which motivates... Regex!
Many websites offer features that allow users to interact with the page and request/submit data to servers. Unfortunately, not all users will behave as expected.
Validation can be performed client-side (with HTML5 attributes or JavaScript) or server-side (for example, in PHP).
Prioritizing validation is important as web developers so that the websites we build are:
If you're interested in learning more, MDN has a good quick introduction to web security, and OWASP is a fantastic resource for all things related to web security. You can also find a good article on how to write user-friendly form UIs here.
The takeaway? There are many ways to perform validation, and MDN and OWASP are both great resources to consult depending on the context of your website.
Most importantly, don't trust that users will provide correct/safe input!
Name: <input name="name" type="text" />
Email: <input name="email" type="email" />
Student Number: <input name="sid" type="number" />
<textarea name="question" rows=5></textarea>
HTML
Last week, we discussed some ideas about what we might want to require as valid input (e.g. 7-digit student numbers starting with 1).
We can validate this input in a few different ways: with HTML5 attributes, with JavaScript before the request is sent, or on the server in PHP.
We've already seen some ways to use HTML5 to require certain types of input by adding attributes to your <input> tags to help with validation
<input type="number">
HTML
We can limit the up and down arrows with min (and max if we choose)
<input type="number" min=0>
HTML
To insist that there is a value in the input field we can add required
<input type="number" required>
HTML
To reject values that don't match an expected format, we can add a regular expression to the pattern attribute. The browser checks the whole value against the pattern when the form is submitted.
<input type="text" pattern="[A-Z][a-z]+ [A-Z][a-z]+">
HTML
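One detail worth seeing in code: the pattern attribute has to match the whole input value, not just part of it. The sketch below approximates that behavior in JavaScript by wrapping the pattern in ^(?: ... )$. The name matchesPatternAttribute is a hypothetical helper, and real browsers also compile the pattern with Unicode-aware flags, which this sketch leaves out.

```javascript
// Hypothetical helper: approximates how a browser applies <input pattern="...">.
function matchesPatternAttribute(pattern, value) {
  // The browser anchors the pattern so it must cover the entire value.
  const anchored = new RegExp("^(?:" + pattern + ")$");
  return anchored.test(value);
}

const namePattern = "[A-Z][a-z]+ [A-Z][a-z]+";
console.log(matchesPatternAttribute(namePattern, "Good Student"));   // true
console.log(matchesPatternAttribute(namePattern, "Good Student!!")); // false
// Without the anchors, finding the pattern anywhere in the string is enough:
console.log(new RegExp(namePattern).test("Good Student!!"));         // true
```

This is why a pattern attribute does not need its own ^ and $, while a regex checked in JS or PHP usually does.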
<form id="input-form">
<div>
<label for="name-input">Name: </label>
<input id="name-input" name="student-name" type="text" pattern="[A-Z][a-z]+" required/>
</div>
<div>
<label for="email-input">E-mail (@uw.edu): </label>
<input id="email-input" name="email" type="email" required/>
</div>
<div>
<label for="sid-input">Student Number: </label>
<input id="sid-input" name="sid" type="number" min=1000000 max=1999999 />
</div>
<div id="minute-options">
<label>2-Minute Question <input type="radio" name="minutes" value=2 /></label>
<label>10-Minute Question <input type="radio" name="minutes" value=10 /></label>
</div>
<textarea name="question" minlength=20 rows=5 placeholder="Enter your question..."></textarea>
<button id="submit-btn" type="submit">Enter Queue!</button>
</form>
HTML
/**
 * Checks the form inputs just before sending the form data to the WPL web service.
 */
function checkInputs() {
  let email = qs("input[name='email']").value;
  let question = qs("textarea").value;
  if (!email.includes("uw.edu")) {
    handleError("Please provide a UW email");
  } else if (!question.includes(" ")) { // not a real question...
    handleError("Please provide a descriptive question");
  } else {
    // could add other checks before this
    submitWPLForm();
  }
}
wpl.js
While this is a bit more work, we don't always need to send all form elements in an API request, and it also gives us more control over validation checks in JS (for properties you can't check with HTML5 attributes) before the fetch.
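One way to keep checks like these easy to test is to separate the validation logic from the DOM. The sketch below is an assumption, not the actual wpl.js: validateQueueInput is a hypothetical helper that returns an error message (or null when the input looks fine), and it uses endsWith("@uw.edu") as one possible stricter alternative to includes("uw.edu").

```javascript
/**
 * Hypothetical helper (not part of wpl.js): returns an error message for
 * bad input, or null if both values pass the checks.
 */
function validateQueueInput(email, question) {
  // endsWith requires "@uw.edu" to be the actual end of the address,
  // so "someone@uw.edu.example.com" is rejected.
  if (!email.endsWith("@uw.edu")) {
    return "Please provide a UW email";
  }
  // Same heuristic as above: a question with no spaces is a single word.
  if (!question.includes(" ")) {
    return "Please provide a descriptive question";
  }
  return null;
}

console.log(validateQueueInput("student@uw.edu", "How do I center a div?"));             // null
console.log(validateQueueInput("student@uw.edu.example.com", "How do I center a div?")); // "Please provide a UW email"
console.log(validateQueueInput("student@uw.edu", "help"));                               // "Please provide a descriptive question"
```

A function like this could be called from checkInputs with the two field values, passing any non-null result to handleError.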
$city = $_POST["city"];
$state = $_POST["state"];
$zip = $_POST["zip"];
if (!$city || strlen($state) != 2 || strlen($zip) != 5) {
header("HTTP/1.1 400 Invalid Request");
echo "Error, invalid city/state/zip submitted.";
}
PHP
Basic idea: When receiving a GET/POST request, examine parameter values, and if they are bad, show an error message and abort. But there are some limitations of input validation given what we've learned so far in this course.
/^[a-zA-Z_\-]+@(([a-zA-Z_\-])+\.)+[a-zA-Z]{2,4}$/
Regular expression ("regex"): a description of a pattern of text
Regular expressions are extremely powerful but tough to read (the above regular expression matches email addresses)
Regular expressions occur in many places:
split method (CSE 143 random grammar generator)
/abc/
In PHP, regexes are strings that begin and end with /
The simplest regexes simply match a particular substring
The above regular expression matches any string containing "abc"
In PHP, we can use preg_match(regex, string), which returns 1 (truthy) if string matches regex and 0 if it does not
preg_match("/abc/", "abcdef"); # 1 (truthy)
PHP
A . matches any character except a \n line break
/.ow.l./
matches "Mowgli", "Powell", etc.
A trailing i at the end of a regex (after the closing /) signifies a case-insensitive match
/cal/i
matches "Pascal", "California", "GCal", etc.
| means OR
/abc|def|g/
matches "abc", "def", or "g"
() are for grouping
/iP(ad|hone)/
matches "iPad" or "iPhone"
\ starts an escape sequence
/<br \/>/
matches lines containing <br /> tags
* means 0 or more occurrences
/abc*/
matches "ab", "abc", "abcc", "abccc", ...
/a(bc)*/
matches "a", "abc", "abcbc", "abcbcbc", ...
/a.*a/
matches "aa", "aba", "a8qa", "a!?xyz__9a", ...
+ means 1 or more occurrences
/Hi!+ there/
matches "Hi! there", "Hi!!! there!", ...
/a(bc)+/
matches "abc", "abcbc", "abcbcbc", ...
? means 0 or 1 occurrences
/a(bc)?/
matches only "a" or "abc"
Write a regex for names in the format: "First Last" (Try it!)
Should pass:
Should not pass:
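The wildcard, alternation, grouping, and quantifier rules above can be checked quickly in JavaScript, whose regex literals use the same /.../ syntax for these features. This is only a sketch for experimenting: the sample strings and the checks array are made up for illustration, and test returns true when the pattern is found anywhere in the string.

```javascript
// Each entry: [regex, sample string, whether the regex is expected to match].
const checks = [
  [/.ow.l./, "Mowgli", true],                   // . matches any single character
  [/cal/i, "California", true],                 // i flag ignores case
  [/abc|def|g/, "xyz", false],                  // none of the alternatives appear
  [/iP(ad|hone)/, "iPhone", true],              // grouping limits the scope of |
  [/<br \/>/, "line one<br />line two", true],  // \/ matches a literal slash
  [/abc*/, "ab", true],                         // * allows zero c's
  [/a(bc)+/, "a", false],                       // + needs at least one "bc"
  [/Hi!+ there/, "Hi there", false],            // + needs at least one "!"
];

for (const [regex, text, expected] of checks) {
  const result = regex.test(text);
  console.log(String(regex), JSON.stringify(text), result === expected ? "as expected" : "SURPRISE");
}
```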
{min, max} means between min and max occurrences (inclusive)
/a(bc){2,4}/
matches "abcbc", "abcbcbc", or "abcbcbcbc"
min or max may be omitted to specify any number
When you search Google, it shows the number of pages of results as the number of "o"s in the word "Google".
What regex matches such words with an even number of 'o's ("Google", "Goooogle", "Goooooogle", ...)?
Your regex should not match strings with fewer than two o's and should be case-sensitive (only the first letter should be capitalized) (try it)
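Solution: G(oo)+gle
or
G(o{2})+gle
both work! (The parentheses are needed: in PHP, a + written directly after {2}, as in o{2}+, is a possessive quantifier rather than "one or more pairs", so that pattern matches exactly two o's.)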
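A quick way to convince yourself about the quantifiers is to run a few words through both forms of the pattern. The sketch below uses JavaScript, with the {2} form grouped as (o{2})+; the word list is made up for illustration.

```javascript
// Two equivalent ways to require an even, non-zero number of o's.
const pairsOfOs = /G(oo)+gle/;
const pairsWithBraces = /G(o{2})+gle/;

const words = ["Google", "Goooogle", "Gooogle", "Gogle", "google"];
for (const word of words) {
  console.log(word, pairsOfOs.test(word), pairsWithBraces.test(word));
}

// {min,max} applied to a group: between 2 and 4 copies of "bc" after "a".
const twoToFour = /a(bc){2,4}/;
console.log(twoToFour.test("abc"));   // false: only one "bc"
console.log(twoToFour.test("abcbc")); // true
```

Only "Google" and "Goooogle" pass here: three o's cannot be split into pairs, one o is too few, and the lowercase "google" has no capital G.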
^ represents the beginning of the string or line; $ represents the end
/Doggy/
matches all strings that contain Doggy
/^Doggy/
matches all strings that start with Doggy
/Doggy$/
matches all strings that end with Doggy
/^Doggy$/
matches the exact string "Doggy" only
/^Mo.*Doggy$/
matches "MoDoggy", "Mowgli Doggy", "Mowgli is my Doggy", ... but not "Doggy Mowgli is my Doggy", "Mowgli" or "my Doggy"
(on the other slides, when we say, /PATTERN/ matches "text", we really mean that it matches any string that contains the text)
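The difference between "contains" and "is exactly" is easy to see by filtering a list of strings. A minimal sketch in JavaScript, where ^ and $ behave the same way for these patterns; the samples array reuses the strings from the example above.

```javascript
const samples = [
  "MoDoggy",
  "Mowgli is my Doggy",
  "Doggy Mowgli is my Doggy",
  "Mowgli",
  "my Doggy",
];

// Must start with "Mo" and end with "Doggy"; .* allows anything in between.
const startsMoEndsDoggy = /^Mo.*Doggy$/;
const accepted = samples.filter((s) => startsMoEndsDoggy.test(s));
console.log(accepted); // ["MoDoggy", "Mowgli is my Doggy"]

// Without anchors, the pattern only has to appear somewhere in the string.
const containsDoggy = samples.filter((s) => /Doggy/.test(s));
console.log(containsDoggy.length); // 4 (every sample except "Mowgli")
```

For validation, the anchored form is usually the one you want: an unanchored pattern accepts any input that merely contains something valid.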
[] groups characters into a character set; will match any single character from the set
/[bcd]art/
matches strings containing "bart", "cart", or "dart"; equivalent to /(b|c|d)art/ but shorter
Inside [], many of the special characters act as normal characters
/what[!*?]*/
matches "what", "what!", "what?**!", "what??!", etc.
Practice: What regex matches strings containing a lowercase vowel? (try it!)
Practice: What regex matches strings containing consecutive vowels? (try it!)
Inside a character set, specify a range of characters with -
/[a-z]/
matches any lowercase letter
/[a-zA-Z0-9]/
matches any lowercase or uppercase letter or digit
An initial ^ inside a character set negates it
/[^abcd]/
matches any character other than a, b, c, or d
Inside a character set, - must be escaped to be matched
/[+\-]?[0-9]+/
matches an optional + or -, followed by at least one digit
Practice: What regular expression matches camelCasing? (match only strings with at least one capital letter; only alphabetical characters are allowed) (try it!)
Practice: What regular expression matches a sequence of only consonants (non-vowel letters) assuming that the string consists only of lowercase letters? (try it!)
Write a regex for student ID numbers that are exactly 7 digits and start with 1 (try it!)
Should pass:
Should not pass:
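Character sets, ranges, and anchors combine naturally for validation. The sketch below shows a few of the set rules in JavaScript, plus one possible pattern for the student number format described earlier (7 digits starting with 1). The isStudentId helper is hypothetical, and other patterns would work equally well.

```javascript
// A character set matches exactly one character from the set.
console.log(/[bcd]art/.test("cart")); // true
console.log(/[bcd]art/.test("part")); // false

// A leading ^ inside the set negates it.
console.log(/[^abcd]/.test("abba"));  // false: every character is in the set

// Escaped - inside a set, anchored so the whole string must be a number.
const signedInteger = /^[+\-]?[0-9]+$/;
console.log(signedInteger.test("-42")); // true
console.log(signedInteger.test("4-2")); // false

// One possible student number pattern: a 1 followed by exactly six digits.
const studentId = /^1[0-9]{6}$/;

// Hypothetical helper wrapping the pattern above.
function isStudentId(text) {
  return studentId.test(text);
}

console.log(isStudentId("1234567"));  // true
console.log(isStudentId("2234567"));  // false: does not start with 1
console.log(isStudentId("12345678")); // false: too long
```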
Special escape sequence character sets
Practice: What regular expression matches any string that contains a tab (\t) character?
Practice: What regular expression matches names in a "Last, First M." format, with any number of spaces?
Write a regex for UW emails in the format: "username@uw.edu" or "username@u.washington.edu" where "username" may only contain letters, numbers, and/or _ characters (and at least one) (Try it!)
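As a sketch of how the escape-sequence sets help here: in JavaScript (without Unicode flags), \w matches a letter, digit, or underscore, \d matches a digit, and \s matches whitespace. The pattern below is one possible answer to the UW email exercise, not the only one; the variable names and sample addresses are made up for illustration.

```javascript
// \w+ : one or more letters, digits, or underscores (the username)
// (uw|u\.washington) : either allowed domain; \. matches a literal dot
const uwEmail = /^\w+@(uw|u\.washington)\.edu$/;

console.log(uwEmail.test("mowgli_154@uw.edu"));         // true
console.log(uwEmail.test("mowgli@u.washington.edu"));   // true
console.log(uwEmail.test("first.last@uw.edu"));         // false: "." is not in \w
console.log(uwEmail.test("@uw.edu"));                   // false: empty username
console.log(uwEmail.test("mowgli@uw.edu.example.com")); // false: $ anchors the end

// A literal tab can be matched with \t.
const hasTab = /\t/;
console.log(hasTab.test("name\tvalue")); // true
```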
Regex syntax: strings that begin and end with /, such as "/[AEIOU]+/"
function | description |
---|---|
preg_match(regex, string) | returns 1 (truthy) if string matches regex, 0 otherwise |
preg_replace(regex, replacement, string) | returns a new string with all substrings that match regex replaced by replacement |
preg_split(regex, string) | returns an array of strings from given string broken apart using given regex as delimiter (like explode but more powerful) |
preg_match example
<?php
$pattern = "/th/i"; # words containing "th"
$file_name = "dictionary.txt";
find_words($pattern, $file_name);

function find_words($pattern, $file_name) {
  # $file_name is passed in because PHP functions cannot see global variables by default
  $lines = file($file_name, FILE_IGNORE_NEW_LINES);
  foreach ($lines as $line) {
    # Syntax: preg_match("/patt/", string)
    if (preg_match($pattern, $line)) {
      echo "{$line}\n";
    }
  }
}
?>
PHP
preg_match
function check_name($name) {
  # One possible pattern that matches all names in <First Last> format
  $name_pattern = "/^[A-Z][a-z]+ [A-Z][a-z]+$/";
  if (!preg_match($name_pattern, $name)) {
    echo "Invalid name submitted. Please enter in format: 'First Last'";
    return false;
  } else {
    echo "You successfully submitted your name!";
    return true;
  }
}
$check_good = check_name("Good Student"); # true
$check_bad = check_name("G00d Student4Lyfe"); # false
PHP
preg_match with regex is particularly useful to validate certain GET/POST parameters
# replace vowels with stars
$str = "the quick brown fox";
$str = preg_replace("/[aeiou]/", "*", $str);
# "th* q**ck br*wn f*x"
# break apart into words
$words = preg_split("/[ ]+/", $str);
# ("th*", "q**ck", "br*wn", "f*x")
# capitalize words that had 2+ consecutive vowels
for ($i = 0; $i < count($words); $i++) {
if (preg_match("/\*{2,}/", $words[$i])) {
$words[$i] = strtoupper($words[$i]);
}
}
# ("th*", "Q**CK", "br*wn", "f*x")
PHP
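For comparison, the same three steps can be sketched in JavaScript using String methods that accept regexes. The function name starVowelsAndShout is made up for illustration. One difference to watch for: preg_replace replaces every match, while JavaScript's replace only replaces the first match unless the regex has the g flag.

```javascript
// Hypothetical helper mirroring the PHP example above.
function starVowelsAndShout(text) {
  // g flag: replace every vowel, not just the first one
  const starred = text.replace(/[aeiou]/g, "*");
  // break apart into words on runs of one or more spaces
  const words = starred.split(/[ ]+/);
  // capitalize words that had 2+ consecutive vowels
  return words.map((word) => (/\*{2,}/.test(word) ? word.toUpperCase() : word));
}

console.log(starVowelsAndShout("the quick brown fox"));
// [ 'th*', 'Q**CK', 'br*wn', 'f*x' ]
```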
Create regular expressions like this: let pattern = /cse154/i or with the RegExp constructor: let pattern = new RegExp(/cse154/, 'i')
Some JavaScript string methods can take Regular Expressions, like search and replace (example search/replace code demo)
let pattern = new RegExp("spring", "i");
// or let pattern = /spring/i;
let str = "CSE154: Web Programming Spring 2018";
let newStr = str.replace(pattern, "Autumn");
JS
Note there are a variety of useful methods you may find for different things, but there are also a few nuances depending on whether you are using the RegExp or String type in JavaScript. Refer to this helpful page for more of an overview!
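To make the String-versus-RegExp distinction concrete, here is a small sketch using the same course title string: search and replace are called on the string and take the pattern as an argument, while test is called on the pattern and takes the string. The variable names are only for illustration.

```javascript
const pattern = /spring/i;
const str = "CSE154: Web Programming Spring 2018";

// String methods: the string is the receiver, the regex is the argument.
const index = str.search(pattern);             // index of the first match, or -1
const newStr = str.replace(pattern, "Autumn"); // returns a new string; str is unchanged

// RegExp method: the regex is the receiver, the string is the argument.
const found = pattern.test(str);               // true or false

console.log(index);  // 24
console.log(newStr); // "CSE154: Web Programming Autumn 2018"
console.log(found);  // true
```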
HTML Form Validation (MDN): A neat overview of the different features offered in HTML5 for client-side form validation!
RegexOne: A helpful interactive regex tutorial
Regex Crossword Game: A super fun way to practice regex for puzzle-lovers :)