CSE 154

Lecture 15: Regular Expressions

Agenda

Review POST Requests with Validation

Which motivates... regular expressions!

Friday's Case Study: A WPL Queue Tool

Validation Form Example

Solution code (try adding more validation methods on your own!):

Some Forms in the Wild

Importance of Validation in Web Development

As web developers, we should prioritize validation so that the websites we build are:

  1. User-friendly
  2. Secure*

If you're interested in learning more, MDN has a good quick introduction to web security, and OWASP is a fantastic resource for all things related to web security. You can also find a good article on how to write user-friendly form UIs here.

The takeaway? There are many ways to perform validation, and MDN and OWASP are both great resources to consult for the context of your website.

Most importantly, don't trust that users will provide correct/safe input!

HTML5 Input Validation

We've already seen some ways to require certain types of input by adding HTML5 validation attributes to your <input> tags.

<input type="number">

HTML

We can limit the allowed values (and what the up and down arrows can reach) with min (and max, if we choose)

<input type="number" min=0>

HTML

To require that the input field has a value, we can add the required attribute

<input type="number" required>

HTML

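To prevent a user from submitting values that don't match a particular format, we can add a regular expression as the value of the pattern attribute (pattern only applies to text-like inputs such as type="text", so it is ignored on type="number")

<input type="text" pattern="\d+" required>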

HTML

wpl.html Form (with Validation Attributes)

<form id="input-form">
  <div>
    <label for="name-input">Name: </label>
    <input id="name-input" name="student-name" type="text" pattern="[A-Z][a-z]+" required/>
  </div>
  <div>
    <label for="email-input">E-mail (@uw.edu): </label>
    <input id="email-input" name="email" type="email" required/>
  </div>
  <div>
    <label for="sid-input">Student Number: </label>
    <input id="sid-input" name="sid" type="number" min=1000000 max=1999999 />
  </div>
  <div id="minute-options">
    <label>2-Minute Question <input type="radio" name="minutes" value=2 /></label>
    <label>10-Minute Question <input type="radio" name="minutes" value=10 /></label>
  </div>
  <textarea name="question" minlength=20 rows=5 placeholder="Enter your question..."></textarea>
  <button id="submit-btn" type="submit">Enter Queue!</button>
</form>

HTML

Client Validation with JS

/**
 * Checks the form inputs just before sending form data to the WPL web service.
 * Assumes qs, handleError, and submitWPLForm are helpers defined elsewhere in wpl.js.
 */
function checkInputs() {
  let email = qs("input[name='email']").value;
  let question = qs("textarea").value;
  if (!email.includes("uw.edu")) {
    handleError("Please provide a UW email");
  } else if (!question.includes(" ")) { // no spaces: not a real question...
    handleError("Please provide a descriptive question");
  } else {
    // could add other checks before this
    // Build the 5 parameters for our POST request and send it
    submitWPLForm();
  }
}

wpl.js

While this is a bit more work than relying on HTML5 attributes alone, we don't always need to send every form element in an API request, and it gives us more control over validation in JS (for properties you can't check with HTML5 attributes) before calling fetch.

Limitations of HTML5 Attributes

There are some limitations of input validation given what we've learned so far.

  • How do you test for integers vs. real numbers vs. strings?
  • How do you test for a valid UW email?
  • How do you test that a person's name has a middle initial?
  • (How do you test whether a given string matches a particular complex format?)

Regular Expressions!

/^[a-zA-Z_\-]+@(([a-zA-Z_\-])+\.)+[a-zA-Z]{2,4}$/

Regular expression ("regex"): a description of a pattern of text

  • Can test whether a string matches the expression's pattern
  • Can use a regex to search/replace characters in a string

Regular expressions are extremely powerful but tough to read (the regular expression above matches simple email addresses such as "mowgli@uw.edu")

Regular expressions occur in many places:

  • Java: Scanner, String's split method (CSE 143 random grammar generator)
  • Supported by HTML5, JS, Java, Python, PHP, and other languages
  • Many text editors (TextPad, Sublime, Vim, etc.) allow regexes in search/replace
  • The site Rubular is useful for testing a regex

Basic Regular Expressions

/abc/

In JS, a regex literal is written between two / characters (it creates a RegExp object, not a string)

The simplest regexes match a particular substring

The above regular expression matches any string containing "abc"

  • Match: "abc", "abcdef", "defabc", ".=.abc.=.", ...
  • Don't Match: "fedcba", "ab c", "PHP", ...
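One way to check the examples above yourself is the standard regex.test(string) method (covered later in this lecture), which returns true if the string contains a match. A minimal sketch, runnable in the browser console or Node:

```javascript
// /abc/ matches any string that contains the substring "abc" somewhere
const abcPattern = /abc/;

const shouldMatch = ["abc", "abcdef", "defabc", ".=.abc.=."];
const shouldNotMatch = ["fedcba", "ab c", "PHP"];

for (const str of shouldMatch) {
  console.log(str, abcPattern.test(str)); // prints the string followed by true
}
for (const str of shouldNotMatch) {
  console.log(str, abcPattern.test(str)); // prints the string followed by false
}
```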

Useful Regex Quick Reference


Wildcards: .

A . matches any character except a \n line break

  • /.ow.l./ matches "Mowgli", "Powell", etc.

A trailing i at the end of a regex (after the closing /) signifies a case-insensitive match

  • /cal/i matches "Pascal", "California", "GCal", etc.
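A short sketch of the wildcard and the i flag using the slide's own examples; the "Mow\nli" string is an extra illustrative input showing that . does not match a line break:

```javascript
// . matches any single character except a line break
const wildcard = /.ow.l./;
console.log(wildcard.test("Mowgli"));  // true
console.log(wildcard.test("Powell"));  // true
console.log(wildcard.test("Mow\nli")); // false: . does not match the "\n"

// The trailing i flag makes the match case-insensitive
const calPattern = /cal/i;
console.log(["Pascal", "California", "GCal"].every((s) => calPattern.test(s))); // true
console.log(/cal/.test("California")); // false: without i, "Cal" does not match "cal"
```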

Special Characters: |, (), \

| means OR

  • /abc|def|g/ matches "abc", "def", or "g"

() are for grouping

  • /iP(ad|hone)/ matches "iPad" or "iPhone"

\ starts an escape sequence

  • Many characters must be escaped to match them literally: / \ $ . [ ] ( ) ^ * + ? | { }
  • /<br \/>/ matches lines containing <br /> tags
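A sketch of |, (), and \ in action. The "$5.00" price example is an illustrative addition, chosen because an unescaped . would also match "$5000":

```javascript
// | means OR: match any of the alternatives
const orPattern = /abc|def|g/;
console.log(orPattern.test("xxdefxx")); // true
console.log(orPattern.test("ab de"));   // false

// () groups the alternatives so "iP" applies to both
const applePattern = /iP(ad|hone)/;
console.log(applePattern.test("iPad"), applePattern.test("iPhone")); // true true
console.log(applePattern.test("iPod"));                                // false

// \ escapes special characters so they match literally
const dollarPattern = /\$5\.00/;
console.log(dollarPattern.test("Price: $5.00")); // true
console.log(dollarPattern.test("Price: $5000")); // false: \. only matches a literal "."

const brPattern = /<br \/>/;
console.log(brPattern.test("line one<br />line two")); // true
```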

Quantifiers: *, +, ?

* means 0 or more occurrences

  • /abc*/ matches "ab", "abc", "abcc", "abccc", ...
  • /a(bc)*/ matches "a", "abc", "abcbc", "abcbcbc", ...
  • /a.*a/ matches "aa", "aba", "a8qa", "a!?xyz__9a", ...

+ means 1 or more occurrences

  • /Hi!+ there/ matches "Hi! there", "Hi!!! there!", ...
  • /a(bc)+/ matches "abc", "abcbc", "abcbcbc", ...

? means 0 or 1 occurrences

  • /a(bc)?/ matches only "a" or "abc"
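A quick sketch of the three quantifiers. The last line also previews why anchors matter: without them, test() only asks whether the string contains a match somewhere:

```javascript
// * : zero or more of the single character just before it ("c")
console.log(/abc*/.test("ab"));      // true
console.log(/abc*/.test("abccc"));   // true

// + : one or more of the preceding item
console.log(/a(bc)+/.test("abcbc")); // true
console.log(/a(bc)+/.test("a"));     // false: needs at least one "bc"
console.log(/Hi!+ there/.test("Hi there")); // false: needs at least one "!"

// ? : zero or one of the preceding group
console.log(/a(bc)?/.test("a"));     // true
// Also true: without anchors, "abcbc" CONTAINS a match ("abc")
console.log(/a(bc)?/.test("abcbc")); // true
```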

Character Sets: []

[] groups characters into a character set; will match any single character from the set

  • /[bcd]art/ matches strings containing "bart", "cart", or "dart"
  • equivalent to /(b|c|d)art/ but shorter

Inside [], many of the special characters (such as *, +, ?, and .) act as normal characters

  • /what[!*?]*/ matches "what", "what!", "what?**!", "what??!", etc.
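A sketch of character sets. It uses string.match (covered later in this lecture) to show exactly which text was matched; "mart" is an extra illustrative non-match:

```javascript
// [bcd] matches exactly one character: b, c, or d
const artPattern = /[bcd]art/;
console.log(["bart", "cart", "dart"].every((s) => artPattern.test(s))); // true
console.log(artPattern.test("mart")); // false

// Inside [], * and ? are ordinary characters, so [!*?]* means
// "zero or more of the characters !, *, or ?"
const whatPattern = /what[!*?]*/;
console.log("what?**!".match(whatPattern)[0]); // what?**!
```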

Practice: What regex matches strings containing a lowercase vowel? (try it!)

Practice: What regex matches strings containing consecutive vowels? (try it!)

Character ranges: [start-end]

Inside a character set, specify a range of characters with -

  • /[a-z]/ matches any lowercase letter
  • /[a-zA-Z0-9]/ matches any lowercase or uppercase letter or digit

Inside a character set, - must be escaped to be matched literally (unless it is the first or last character in the set)

  • /[+\-]?[0-9]+/ matches an optional + or -, followed by at least one digit
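A sketch of ranges and the escaped hyphen; the sample strings are illustrative:

```javascript
// A range inside [] matches any one character from start to end
const lower = /[a-z]/;
console.log(lower.test("ABC"), lower.test("ABc")); // false true

const alnum = /[a-zA-Z0-9]/;
console.log(alnum.test("!?#"), alnum.test("!?#7")); // false true

// Optional sign, then one or more digits; \- is a literal hyphen
const signedInt = /[+\-]?[0-9]+/;
console.log("temp: -42".match(signedInt)[0]); // -42
console.log("temp: +7".match(signedInt)[0]);  // +7
```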

Practice: Write a regex for Student ID numbers that are exactly 7 digits and start with 1 (try it!)

Negations

An initial ^ inside a character set negates it

  • /[^abcd]/ matches any character other than a, b, c, or d
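A sketch of negated sets. Note that a negated set still has to match one character, so it is "some character outside the set", not "no characters from the set":

```javascript
// [^abcd] matches any single character that is NOT a, b, c, or d
const notAbcd = /[^abcd]/;
console.log(notAbcd.test("abcdcba")); // false: every character is in the set
console.log(notAbcd.test("abcde"));   // true: "e" is outside the set

// Negated ranges work too: [^0-9] matches any non-digit
console.log(/[^0-9]/.test("2019"), /[^0-9]/.test("20x9")); // false true
```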

Exercise: First and Last Names

Write a regex for names in the format: "First Last" (Try it!)

Should pass:

  • Good Student

Should not pass:

  • G00d Student4Lyfe
  • Not So Good Student
  • Student

When done, let's add this to our HTML using the pattern attribute.

Regular Expressions in HTML Forms

<form id="input-form">
  ... rest of form
  <input id="name-input" name="student-name" type="text" pattern="[A-Z][a-z]+ [A-Z][a-z]+"
     title="Required name format: 'First Last'" />
  ... rest of form
</form>

HTML


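HTML5 adds a new pattern attribute to input elements; its value is a regex written without the surrounding / characters, and it must match the entire input value

When an input is in a form with a submit button, clicking the button automatically checks the input against its validation attributes before submitting the form (a GET request by default, or a POST request if the form has method="post"). You can use the title attribute for more useful feedback, or the JS Constraint Validation API.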
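Because the pattern attribute must match the whole value, a browser behaves roughly as if the pattern were wrapped in ^(?: ... )$. The sketch below approximates that check in JS; matchesPatternAttribute is a hypothetical helper (not a browser API), and it omits the Unicode flag that browsers also apply when compiling the pattern:

```javascript
/**
 * Hypothetical helper: approximates how a browser checks an <input>'s
 * pattern attribute. The pattern has no surrounding / characters and
 * must match the WHOLE value, so it is wrapped in ^(?: ... )$.
 * (Browsers also compile it with a Unicode flag; that is omitted here.)
 */
function matchesPatternAttribute(pattern, value) {
  const wholeValue = new RegExp("^(?:" + pattern + ")$");
  return wholeValue.test(value);
}

const namePattern = "[A-Z][a-z]+ [A-Z][a-z]+";
console.log(matchesPatternAttribute(namePattern, "Good Student"));        // true
console.log(matchesPatternAttribute(namePattern, "Not So Good Student")); // false
console.log(matchesPatternAttribute(namePattern, "G00d Student4Lyfe"));   // false
console.log(matchesPatternAttribute(namePattern, "Student"));             // false

// Compare: the same regex WITHOUT anchors finds "Not So" inside the string
console.log(new RegExp(namePattern).test("Not So Good Student"));         // true
```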

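More Quantifiers: {min,max}

{min,max} means between min and max occurrences (inclusive); in JS there must be no spaces inside the braces

  • /a(bc){2,4}/ matches "abcbc", "abcbcbc", or "abcbcbcbc"

max may be omitted, and a single number means an exact count

  • {2,} means 2 or more
  • {0,6} means up to 6 (JS does not support omitting min: {,6} is treated as the literal text "{,6}")
  • {3} means exactly 3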
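A sketch of the brace quantifiers. It borrows ^ and $ (from the Anchors slide) so each test checks the whole string rather than a substring; the last two lines show how JS treats {,6} and a spaced {2, 4} as literal text instead of quantifiers:

```javascript
// {2,4}: between 2 and 4 repetitions of the group "bc"
// (^ and $ force the WHOLE string to match; see the Anchors slide)
const bcPattern = /^a(bc){2,4}$/;
console.log(["abc", "abcbc", "abcbcbcbc", "abcbcbcbcbc"].map((s) => bcPattern.test(s)));
// [ false, true, true, false ]

// {2,} means 2 or more; {3} means exactly 3
console.log(/^o{2,}$/.test("ooooo"), /^[0-9]{3}$/.test("154")); // true true

// In JS, {,6} is NOT a quantifier: it matches the literal text "{,6}"
console.log(/^a{,6}$/.test("aaa"));  // false
console.log(/^a{0,6}$/.test("aaa")); // true
// A space inside the braces also turns them into literal text
console.log(/^a{2, 4}$/.test("aaa")); // false
```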

Example

When you search Google, it shows the number of pages of results as the number of "o"s in the word "Google".

What regex matches such words with an even number of 'o's ("Google", "Goooogle", "Goooooogle", ...)?

Your regex should not match strings with fewer than two o's and should be case-sensitive (only the first letter should be capitalized) (try it)

Solution: /G(oo)+gle/ or /G(o{2})+gle/ both work! (Note that /Go{2}+gle/ is a syntax error in JS, since + cannot directly follow the {2} quantifier.)
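A sketch that checks both solutions against a few sample strings (the samples are illustrative) and confirms that the quantifier-after-quantifier version is rejected by JS:

```javascript
// "G", then one or more PAIRS of o's, then "gle"
const googlePattern = /G(oo)+gle/;
const altPattern = /G(o{2})+gle/;
const samples = ["Google", "Gooogle", "Goooogle", "Gogle", "google"];

console.log(samples.map((s) => googlePattern.test(s)));
// [ true, false, true, false, false ]
console.log(samples.every((s) => googlePattern.test(s) === altPattern.test(s))); // true

// A quantifier directly after another quantifier is invalid in JS
try {
  new RegExp("Go{2}+gle");
} catch (err) {
  console.log(err instanceof SyntaxError); // true
}
```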

Anchors: ^ and $

^ matches the beginning of the string (or of each line, with the m flag); $ matches the end

  • /Doggy/ matches all strings that contain Doggy
  • /^Doggy/ matches all strings that start with Doggy
  • /Doggy$/ matches all strings that end with Doggy
  • /^Doggy$/ matches the exact string "Doggy" only
  • /^Mo.*Doggy$/ matches "MoDoggy", "Mowgli Doggy", "Mowgli is my Doggy", ... but not "Doggy Mowgli is my Doggy", "Mowgli" or "my Doggy"

(On the other slides, when we say /PATTERN/ matches "text", we really mean that it matches any string that contains "text".)
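A sketch comparing the four Doggy patterns on three sample strings (chosen for illustration), plus the m flag, which lets ^ and $ match at line breaks:

```javascript
const doggySamples = ["Doggy", "Doggy Mowgli", "my Doggy"];
for (const re of [/Doggy/, /^Doggy/, /Doggy$/, /^Doggy$/]) {
  console.log(re, doggySamples.map((s) => re.test(s)));
}
// /Doggy/   -> true, true, true
// /^Doggy/  -> true, true, false
// /Doggy$/  -> true, false, true
// /^Doggy$/ -> true, false, false

const moDoggy = /^Mo.*Doggy$/;
console.log(moDoggy.test("Mowgli is my Doggy"));       // true
console.log(moDoggy.test("Doggy Mowgli is my Doggy")); // false

// With the m flag, ^ can also match right after a "\n"
console.log(/^Doggy/.test("my cat\nDoggy"), /^Doggy/m.test("my cat\nDoggy")); // false true
```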

Escape Sequences

Special escape sequences for common character sets

  • \d matches any digit (same as [0-9]); \D matches any non-digit ([^0-9])
  • \w matches any word character (alphanumeric or _ underscore, same as [a-zA-Z0-9_]); \W matches any non-word character
  • \s matches any whitespace character (space, \t, \n, etc.); \S matches any non-whitespace character
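A sketch of the shorthand classes. The student-number pattern mirrors the 7-digit format used elsewhere in this lecture; the other strings are illustrative:

```javascript
// \d: a digit. A 7-digit student number starting with 1:
const sid = /^1\d{6}$/;
console.log(sid.test("1234567"), sid.test("123456")); // true false

// \w: letters, digits, and underscore (but not "-")
console.log(/^\w+$/.test("snake_case_99")); // true
console.log(/^\w+$/.test("kebab-case"));    // false: "-" is not a word character

// \s+: a run of whitespace (spaces, tabs, newlines), handy with split
console.log("a  b\tc\nd".split(/\s+/)); // [ 'a', 'b', 'c', 'd' ]
```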

Regular Expressions in JavaScript!

Regex can be a very handy tool in JS as well, from validation to fun find/replace features. There are two common ways to create a regex:

  • With the RegExp constructor, passing either a regex literal or a string (use this when you don't know the pattern ahead of time, such as when it comes from user input)
  • With a literal (when you know exactly what the regex pattern is; the literal is compiled once, when the script is loaded)

let pattern1 = new RegExp(/cse154/, "i");
let pattern2 = new RegExp("cse154", "i");
let pattern3 = /cse154/;

JS

Note that we don't use "/" when passing the pattern as a string to the RegExp constructor (pattern2). This can be useful when we want to search for a pattern given as text input (e.g. a word replacer tool)! Because the pattern is a string, any backslash must itself be escaped: new RegExp("\\d+") creates /\d+/.

Practice in the console here!

Some Regex Functions in JS

RegExp objects have a few useful methods that take strings as arguments

regex.test(string) returns true if the string contains a match for the regex, and false otherwise

let namePattern = /[A-Z][a-z]+ [A-Z][a-z]+/;
namePattern.test("Mowgli Hovik"); // true

let sidPattern = new RegExp("1\\d{6}"); // "\\d" in a string becomes \d in the regex
sidPattern.test("1234567");       // true
sidPattern.test("-123");          // false

JS

Some String Functions in JS

Some JavaScript string methods can take regular expressions, like match, search, and replace

Without the g flag, string.match(regex) returns an array of information about the first match, including the index where it starts, or null if there is no match

"Hello world".match(/wo.l/); // [0: "worl", index: 6, input: "Hello world"]

JS

origStr.replace(regex, replStr) returns a new string in which the first match of regex in origStr (or every match, with the g flag) is replaced by replStr

let newStr = "My dog is a good dog".replace(/dog/, "pup");
// newStr === "My pup is a good dog"

JS
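The slide mentions search but doesn't show it, and it's easy to forget that match can return null. A sketch covering both (the "xyz" inputs are illustrative non-matches):

```javascript
// match returns null (not an empty array) when nothing matches,
// so check for null before reading result[0]
const noMatch = "Hello world".match(/xyz/);
console.log(noMatch); // null

// The match array's first element is the matched text; index is where it starts
const found = "Hello world".match(/wo.l/);
console.log(found[0], found.index); // worl 6

// search returns the index of the first match, or -1 if there is none
console.log("Hello world".search(/wo.l/)); // 6
console.log("Hello world".search(/xyz/));  // -1
```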

Regex Flags

Two common flags for patterns are "g" and "i"

"i" ignores letter-casing in the match, and "g" is a "global" search, meaning it won't stop on the first match.

let str = "My dog is a good dog";
let newStr = str.replace(/dog/g, "pup"); // My pup is a good pup

JS

let pattern = new RegExp("spring", "i");
// or let pattern = /spring/i;
let str = "CSE154: Web Programming Spring 2019";
let newStr = str.replace(pattern, "Summer"); // "CSE154: Web Programming Summer 2019"

JS
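Flags can be combined, and g also changes what string.match returns: with g it returns an array of every matched substring (without index information). A sketch on an illustrative string:

```javascript
const dogStr = "Dog days: my dog is a good DOG";

// "gi": every match, ignoring case
console.log(dogStr.match(/dog/gi)); // [ 'Dog', 'dog', 'DOG' ]

// "g" only: every match, but case still matters
console.log(dogStr.match(/dog/g));  // [ 'dog' ]

console.log(dogStr.replace(/dog/gi, "pup")); // pup days: my pup is a good pup
```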

More Regex Functions

There are many other useful methods for different tasks, but there are also a few nuances depending on whether you create a RegExp from a string or a literal, and on which flags you use. Refer to this helpful page for more of an overview!
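One nuance that often surprises people: a regex with the g flag keeps a lastIndex property recording where its last match ended, and test() continues searching from there. A sketch of the effect:

```javascript
// With g, the same regex object remembers where its last match ended
const globalPattern = /cse/g;
console.log(globalPattern.test("cse154"), globalPattern.lastIndex); // true 3
console.log(globalPattern.test("cse154"), globalPattern.lastIndex); // false 0 (search started at index 3)
console.log(globalPattern.test("cse154"), globalPattern.lastIndex); // true 3

// For simple yes/no checks, leaving off g avoids this surprise
const plainPattern = /cse/;
console.log(plainPattern.test("cse154"), plainPattern.test("cse154")); // true true
```

This is one reason validation checks usually use a regex without the g flag.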

A Word Replacer

Demo

function replace() {
  let search = id("find").value;
  // "ig" are optional flags:
  // "i" - ignore case in search
  // "g" - 'global' search to replace all occurrences (otherwise only one match is replaced)

  // Alternative to /pattern/ig
  let searchRegex = new RegExp(search, "ig");
  let replace = id("replace").value;
  let input = id("input-text").value;

  if (search && replace && input) {
    let output = input.replace(searchRegex, replace);
    id("output").textContent = output;
  }
}

JS
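The word replacer passes the user's search text straight into new RegExp, so characters like "." act as regex syntax, and a lone "(" throws a SyntaxError. If the tool should replace literal text instead, one common approach is to escape special characters first. escapeRegExp below is a hypothetical helper (the same technique appears in MDN's regular expressions guide), and the sample strings are illustrative:

```javascript
/**
 * Hypothetical helper: escapes every regex special character in text
 * so that new RegExp(...) matches it literally.
 * "$&" in the replacement string inserts the matched character itself.
 */
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

const userSearch = "3.5";
const inputText = "Version 3.5 replaced 305 and 3x5";

// Unescaped: "." matches any character, so "305" and "3x5" match too
const asRegex = new RegExp(userSearch, "ig");
console.log(inputText.replace(asRegex, "X"));   // Version X replaced X and X

// Escaped: only the literal text "3.5" matches
const asLiteral = new RegExp(escapeRegExp(userSearch), "ig");
console.log(inputText.replace(asLiteral, "X")); // Version X replaced 305 and 3x5

// An unescaped "(" is not a valid regex on its own
try {
  new RegExp("(", "ig");
} catch (err) {
  console.log(err instanceof SyntaxError); // true
}
```

Whether to escape depends on the tool: escaping makes it a plain-text replacer, while leaving the input as-is lets users type their own regex patterns.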

Additional Resources and Regex Fun

HTML Form Validation (MDN): A neat overview of the different features offered in HTML5 for client-side form validation!

RegexOne: A helpful interactive regex tutorial

Regex Crossword Game: A super fun way to practice regex for puzzle-lovers :)