CSE 154

Lecture 16: Returning JSON and Regular Expressions

Returning JSON from PHP

We use the PHP function json_encode(array) to output JSON

json_encode takes PHP arrays (including nested arrays) and generates JSON strings that can be printed

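NOTE: we can also use json_decode to convert JSON strings back into PHP values. Pass true as the second argument to get associative arrays; by default, JSON objects become PHP objects.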

Example PHP code

<?php
    header("Content-Type: application/json");
    
    $output = array();
    $output["name"] = "Kyle";
    $output["hobbies"] = array("reading", "frisbee");
    
    print(json_encode($output));
?>

PHP

Produces (line breaks added here for readability; json_encode prints everything on one line by default):

{
  "name":"Kyle",
  "hobbies":["reading","frisbee"]
}

JSON
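On the client side, a response like the one above can be turned back into an object. A minimal JavaScript sketch, assuming the response body has already been read into a string (the variable name responseText is hypothetical, not from the slides):

```javascript
// Assumed: the JSON text printed by the PHP script, already read into a string.
const responseText = '{"name":"Kyle","hobbies":["reading","frisbee"]}';

// JSON.parse turns JSON text into a JavaScript object.
const data = JSON.parse(responseText);
console.log(data.name);           // Kyle
console.log(data.hobbies.length); // 2

// JSON.stringify goes the other way, much like json_encode does in PHP.
const roundTrip = JSON.stringify(data);
console.log(roundTrip === responseText); // true
```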

What is Form Validation?

Validation: ensuring that a form's values are correct

Some types of validation:

  • Preventing blank values (e-mail address)
  • Ensuring the type of values (e.g. integer, real number, currency, phone number, Social Security Number, postal address, email address, date, credit card number, ...)
  • Ensuring the format and range of values (ZIP code must be a 5-digit integer)
  • Ensuring that values fit together (user types email twice, and the two must match)
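A few of the checks listed above can be sketched in JavaScript. The helper name findProblems and its error messages are hypothetical; this is one possible shape for such checks, not the course's implementation.

```javascript
// Hypothetical helper (not from the slides): collects problems with submitted values.
function findProblems(email, emailAgain, zip) {
  const problems = [];
  // Preventing blank values
  if (email.trim() === "") {
    problems.push("email is blank");
  }
  // Ensuring format: exactly 5 characters, each one a digit
  const allDigits = zip.split("").every((ch) => ch >= "0" && ch <= "9");
  if (zip.length !== 5 || !allDigits) {
    problems.push("ZIP must be 5 digits");
  }
  // Ensuring that values fit together
  if (email !== emailAgain) {
    problems.push("emails do not match");
  }
  return problems;
}

console.log(findProblems("kyle@example.edu", "kyle@example.edu", "98195").length); // 0
console.log(findProblems("", "kyle@example.edu", "9819").length);                  // 3
```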

A Real Form that Uses Validation

Validation Form Example

Client vs. Server-Side Validation

Validation can be performed:

  • Client-side (before the form is submitted)
    • Can lead to a better user experience, but not secure (why not?)
  • Server-side (in PHP code, after the form is submitted)
    • Needed for truly secure validation, but slower
  • Both
    • Best mix of convenience and security, but requires most effort to program

An Example Form to be Validated


[Form with three text inputs: City, State, and ZIP]

Let's validate this form's data on the server...

Basic Server-Side Validation


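$city = $_POST["city"] ?? "";     # ?? "" avoids a warning when a parameter is missing
$state = $_POST["state"] ?? "";
$zip = $_POST["zip"] ?? "";
# reject a blank city, a state that is not 2 characters, or a ZIP that is not 5 characters
if (!$city || strlen($state) != 2 || strlen($zip) != 5) {
  print "Error, invalid city/state/zip submitted.";
}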

PHP

Basic idea: Examine parameter values, and if they are bad, show an error message and abort. But:

  • How do you test for integers vs. real numbers vs. strings?
  • How do you test for a valid credit card number?
  • How do you test that a person's name has a middle initial?
  • (How do you test whether a given string matches a particular complex format?)

Regular Expressions


          /^[a-zA-Z_\-]+@(([a-zA-Z_\-])+\.)+[a-zA-Z]{2,4}$/
          

Regular expression ("regex"): a description of a pattern of text

  • Can test whether a string matches the expression's pattern
  • Can use a regex to search/replace characters in a string

Regular expressions are extremely powerful but tough to read (the regular expression above matches a simplified form of email address)
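To see what the email pattern above accepts and rejects, here is a small JavaScript sketch. The sample addresses are made up for illustration.

```javascript
// The email pattern from above, written as a JavaScript regex literal.
const emailPattern = /^[a-zA-Z_\-]+@(([a-zA-Z_\-])+\.)+[a-zA-Z]{2,4}$/;

console.log(emailPattern.test("kyle@cs.example.edu")); // true
console.log(emailPattern.test("kyle99@example.com"));  // false: digits are not allowed before the @
console.log(emailPattern.test("not an email"));        // false
```

The second result shows why this pattern is only an approximation: plenty of real addresses contain digits.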

Regular expressions occur in many places:

  • Java: Scanner, String's split method (CSE 143 sentence generator)
  • Supported by PHP, JavaScript, and other languages
  • Many text editors (TextPad, Sublime, Vim, etc.) allow regexes in search/replace
  • The site Rubular is useful for testing a regex

Regular Expressions are Hard to Use

Basic Regular Expressions

/abc/

In PHP, regexes are strings that begin and end with /

The simplest regexes simply match a particular substring

The above regular expression matches any string containing "abc"

  • Match: "abc", "abcdef", "defabc", ".=.abc.=.", ...
  • Don't Match: "fedcba", "ab c", "PHP", ...
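The match and non-match lists above can be checked with a short JavaScript sketch (JavaScript regex literals use the same /.../ notation):

```javascript
const abcPattern = /abc/;
const samples = ["abc", "abcdef", "defabc", ".=.abc.=.", "fedcba", "ab c", "PHP"];

// Keep only the strings that contain "abc" somewhere.
const matching = samples.filter((s) => abcPattern.test(s));
console.log(matching); // the first four strings
```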

Wildcards: .

A . matches any character except a \n line break

  • /.ow.l./ matches "Mowgli", "Powell", etc.

A trailing i at the end of a regex (after the closing /) signifies a case-insensitive match

  • /cal/i matches "Pascal", "California", "GCal", etc.
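A quick JavaScript sketch of the wildcard and the i flag, using the strings above plus two made-up counterexamples:

```javascript
const wildcard = /.ow.l./;
console.log(wildcard.test("Mowgli")); // true
console.log(wildcard.test("Powell")); // true
console.log(wildcard.test("owl"));    // false: nothing before "ow" for the first . to match

const anyCase = /cal/i;
console.log(anyCase.test("Pascal"));     // true
console.log(anyCase.test("California")); // true
console.log(/cal/.test("California"));   // false without the i flag
```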

Special Characters: |, (), \

| means OR

  • /abc|def|g/ matches "abc", "def", or "g"
  • There's no AND symbol - why not?

() are for grouping

  • /iP(ad|hone)/ matches "iPad" or "iPhone"

\ starts an escape sequence

  • Many characters must be escaped to match them literally: /\$.[]()^*+?
  • /<br \/>/ matches lines containing <br /> tags
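The three special characters can be tried out in JavaScript. The test strings below are made up for illustration.

```javascript
const either = /abc|def|g/;
console.log(either.test("xdefx")); // true: contains "def"
console.log(either.test("ab"));    // false: none of the alternatives appear

const device = /iP(ad|hone)/;
console.log(device.test("my iPhone")); // true
console.log(device.test("iPod"));      // false

// The backslash makes / and $ match literally.
const brTag = /<br \/>/;
console.log(brTag.test("line one<br />line two")); // true
const price = /\$5/;
console.log(price.test("costs $5")); // true
```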

Quantifiers: *, +, ?

* means 0 or more occurrences

  • /abc*/ matches "ab", "abc", "abcc", "abccc", ...
  • /a(bc)*/ matches "a", "abc", "abcbc", "abcbcbc", ...
  • /a.*a/ matches "aa", "aba", "a8qa", "a!?xyz__9a", ...

+ means 1 or more occurrences

  • /Hi!+ there/ matches "Hi! there", "Hi!!! there!", ...
  • /a(bc)+/ matches "abc", "abcbc", "abcbcbc", ...

? means 0 or 1 occurrences

  • /a(bc)?/ matches only "a" or "abc"
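A JavaScript sketch of the three quantifiers. The call to match is only used here to show which part of the string the pattern consumed.

```javascript
console.log(/abc*/.test("ab"));      // true: zero c's is allowed
console.log(/a(bc)+/.test("a"));     // false: + needs at least one "bc"
console.log(/a(bc)+/.test("abcbc")); // true

console.log(/Hi!+ there/.test("Hi!!! there!")); // true
console.log(/Hi!+ there/.test("Hi there"));     // false: no "!" at all

// Quantifiers take as many repetitions as they can.
const consumed = "xabcbcx".match(/a(bc)*/)[0];
console.log(consumed); // abcbc
```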

More Quantifiers: {min,max}

{min,max} means between min and max occurrences (inclusive); write it without spaces inside the braces

  • /a(bc){2,4}/ matches "abcbc", "abcbcbc", or "abcbcbcbc"

min or max may be omitted to specify any number

  • {2,} means 2 or more
  • {0,6} means up to 6 (the shorthand {,6} is not accepted by every regex engine, so write the 0 explicitly)
  • {3} means exactly 3
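The brace quantifiers can be sketched in JavaScript as follows. The sample strings are made up, and the upper bound is written as {0,6} so the example does not depend on engine-specific shorthand.

```javascript
console.log(/a(bc){2,4}/.test("abc"));   // false: only one "bc"
console.log(/a(bc){2,4}/.test("abcbc")); // true

// With six "bc" available, {2,4} stops after four of them.
const capped = "abcbcbcbcbcbc".match(/a(bc){2,4}/)[0];
console.log(capped); // abcbcbcbc

console.log("pizzza".match(/z{2,}/)[0]);     // zzz
console.log("aaaaaaaa".match(/a{0,6}/)[0]);  // aaaaaa
console.log(/x{3}/.test("xx"));              // false
```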

Practice Exercise

When you search Google, it shows the number of pages of results as the number of "o"s in the word "Google". What regex matches strings like "Google", "Gooogle", "Goooogle", ...?

Anchors: ^ and $

^ represents the beginning of the string or line; $ represents the end

  • /Doggy/ matches all strings that contain "Doggy"
  • /^Doggy/ matches all strings that start with "Doggy"
  • /Doggy$/ matches all strings that end with "Doggy"
  • /^Doggy$/ matches the exact string "Doggy" only
  • /^Pasc.*Doggy$/ matches "PascDoggy", "Pascal Doggy", "Pascal is my Doggy", ... but not "Doggy Pascal is my Doggy", "Pascal" or "my Doggy"

(on the other slides, when we say, /PATTERN/ matches "text", we really mean that it matches any string that contains the text)
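The effect of each anchor can be seen by filtering one list of strings four ways. A JavaScript sketch, with made-up test strings:

```javascript
const inputs = ["Doggy", "My Doggy", "Doggy treats", "My Doggy treats"];

const contains = inputs.filter((s) => /Doggy/.test(s));   // all four
const starts = inputs.filter((s) => /^Doggy/.test(s));    // "Doggy", "Doggy treats"
const ends = inputs.filter((s) => /Doggy$/.test(s));      // "Doggy", "My Doggy"
const exact = inputs.filter((s) => /^Doggy$/.test(s));    // "Doggy"
console.log(contains.length, starts.length, ends.length, exact.length); // 4 2 2 1

console.log(/^Pasc.*Doggy$/.test("Pascal is my Doggy"));       // true
console.log(/^Pasc.*Doggy$/.test("Doggy Pascal is my Doggy")); // false: does not start with "Pasc"
```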

Character Sets: []

[] group characters into a character set; will match any single character from the set

  • /[bcd]art/ matches strings containing "bart", "cart", or "dart"
  • equivalent to /(b|c|d)art/ but shorter

Inside [], many of the special characters act as normal characters

  • /what[!*?]*/ matches "what", "what!", "what?**!", "what??!", etc.
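A JavaScript sketch of character sets. The anchors are added here only so that stray characters after "what" cause a failure; the counterexamples are made up.

```javascript
console.log(/[bcd]art/.test("cart"));   // true
console.log(/[bcd]art/.test("part"));   // false: p is not in the set
console.log(/(b|c|d)art/.test("cart")); // true: same meaning, longer to write

// Inside [], the characters ! * ? are plain characters, not quantifiers.
const whatPattern = /^what[!*?]*$/;
console.log(whatPattern.test("what"));     // true
console.log(whatPattern.test("what?**!")); // true
console.log(whatPattern.test("what."));    // false: . is not in the set
```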

What regular expression matches DNA (non-empty strings of A, C, G, and T)?

Character ranges: [start-end]

Inside a character set, specify a range of characters with -

  • /[a-z]/ matches any lowercase letter
  • /[a-zA-Z0-9]/ matches any lower- or uppercase letter or digit

An initial ^ inside a character set negates it

  • /[^abcd]/ matches any character other than a, b, c, or d

Inside a character set, - must be escaped (or placed first or last in the set) to be matched literally

  • /[+\-]?[0-9]+/ matches an optional + or -, followed by at least one digit
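Ranges, negation, and the escaped - can be tried in JavaScript. The anchors make each pattern describe the whole string; the test strings are made up.

```javascript
console.log(/^[a-z]+$/.test("hello")); // true
console.log(/^[a-z]+$/.test("Hello")); // false: H is uppercase

console.log(/[^abcd]/.test("abba"));  // false: every character is a, b, c, or d
console.log(/[^abcd]/.test("abcde")); // true: e is outside the set

const signedInt = /^[+\-]?[0-9]+$/;
console.log(signedInt.test("-42")); // true
console.log(signedInt.test("+7"));  // true
console.log(signedInt.test("4-2")); // false: a sign is only allowed at the front
```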

Practice Exercises

What regular expression matches letter grades such as A, B+, or D-?

What regular expression matches UW Student ID numbers?

What regular expression matches a sequence of only consonants (non-vowel letters) assuming that the string consists only of lowercase letters?

Escape Sequences

Special escape sequences for character sets

  • \d matches any digit (same as [0-9]); \D any non-digit ([^0-9])
  • \w matches any word character (same as [a-zA-Z_0-9], so it includes the underscore); \W any non-word character
  • \s matches any whitespace character ( , \t, \n, etc.); \S any non-whitespace character
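A JavaScript sketch of these escape sequences, with made-up test strings. The g flag on the last pattern (replace every match, not just the first) is JavaScript-specific and not covered in these slides.

```javascript
console.log(/^\d{5}$/.test("98195")); // true: exactly five digits

console.log(/^\w+$/.test("cse_154")); // true: the underscore counts as a word character
console.log(/^\w+$/.test("cse 154")); // false: a space is not a word character

console.log(/\s/.test("no_spaces"));  // false
console.log(/\s/.test("tab\there"));  // true: \t inside the string is a tab

// \D matches non-digits, so removing them leaves only the digits.
const digitsOnly = "a1b22c".replace(/\D/g, "");
console.log(digitsOnly); // 122
```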

What regular expression matches names in a "Last, First M." format, with any number of spaces?

Regular expressions in PHP

regex syntax: strings that begin and end with /, such as "/[AEIOU]+/"

Functions and their descriptions:

  • preg_match(regex, string): returns 1 if string matches regex, 0 if it does not (and false if an error occurs)
  • preg_replace(regex, replacement, string): returns a new string with all substrings that match regex replaced by replacement
  • preg_split(regex, string): returns an array of strings from the given string, broken apart using the given regex as delimiter (like explode but more powerful)

PHP form validation w/ regexes

$state = $_POST["state"];
if (!preg_match("/^[A-Z]{2}$/", $state)) {
   print "Error, invalid state submitted.";
} 

preg_match and regexes help you to validate parameters

sites often don't want to give a descriptive error message here (why?)

Regular expression PHP example

# replace vowels with stars
$str = "the quick brown fox";
$str = preg_replace("/[aeiou]/", "*", $str);
                  # "th* q**ck br*wn f*x"
                  
# break apart into words
$words = preg_split("/[ ]+/", $str);
               # ("th*", "q**ck", "br*wn", "f*x")
               
# capitalize words that had 2+ consecutive vowels
for ($i = 0; $i < count($words); $i++) {
   if (preg_match("/\\*{2,}/", $words[$i])) {
      $words[$i] = strtoupper($words[$i]);
   }
}      # ("th*", "Q**CK", "br*wn", "f*x") 

Regular Expressions in JavaScript

Create regular expressions like this: let pattern = /cse154/i

Some JavaScript string methods can take Regular Expressions, like search and replace.

Additional Regular Expression methods include:

  • test (see if string matches regex):
    pattern.test("I like CSE154!");
  • exec (runs the regex on a string and lets you access matches and groups):
    pattern.exec("I like CSE154!");
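The methods above can be sketched together. The capturing group around the digits is an addition for illustration (the slide's pattern has no group); it lets exec show how groups are reported.

```javascript
// Like the slide's pattern, but with an assumed group to capture the course number.
const coursePattern = /cse(\d+)/i;
const sentence = "I like CSE154!";

console.log(coursePattern.test(sentence)); // true

const result = coursePattern.exec(sentence);
console.log(result[0]);    // CSE154 (the whole match)
console.log(result[1]);    // 154 (the first group)
console.log(result.index); // 7 (where the match starts)

// String methods that accept a regex
console.log(sentence.search(coursePattern)); // 7
const replaced = sentence.replace(coursePattern, "web programming");
console.log(replaced); // I like web programming!
```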

Regular expressions in HTML forms

How old are you?

HTML5 adds a new pattern attribute to input elements

When an input is in a form along with a button, clicking the button automatically checks the input against its pattern and, if it passes, submits the form (as a POST request when the form's method is post). To cancel the request, add an onsubmit event handler that returns false.
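A submit handler that cancels the request might look like the sketch below. The function name handleSubmit and the digits-only rule are assumptions, not taken from the slide, and a stand-in event object is used so the sketch runs outside a page.

```javascript
// Hypothetical submit handler; the digits-only rule is an assumed pattern.
function handleSubmit(event, ageText) {
  if (!/^[0-9]+$/.test(ageText)) {
    event.preventDefault(); // cancels the request
    return false;
  }
  return true;
}

// Stand-in for a browser event object.
const fakeEvent = {
  cancelled: false,
  preventDefault() { this.cancelled = true; }
};

console.log(handleSubmit(fakeEvent, "twenty")); // false
console.log(fakeEvent.cancelled);               // true
```

In a real page, the browser supplies the event object, and the age text would be read from the input element.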