Assignment 4: ML Project : Functional Grammatic Images

Due 20 February 2002 at beginning of class

Outline

Objectives
Overview
Background
Problems

Objectives

By the end of this homework assignment, you should be familiar/comfortable with the following concepts/constructs:

Context-free grammars
Code that makes heavy use of higher-order functions
Type synonyms, datatypes, and ML signatures/structures

Hopefully, you will also have had some fun manipulating images, and learned some "higher-order thinking" as well.

Warning: Do not start on this assignment the night before it is due! It will take you a fair amount of time to fully absorb this writeup and the code that we provide you.

Overview

A two-dimensional image can be naturally modeled as a function from the 2D plane to the colors:

(* Point: (x, y) *)
datatype Point = Pt of real * real;

(* Color: 3 color channels + transparency channel *)
datatype Color = RGB of real * real * real * real;

(* An image maps points onto colors *)
type Image = Point -> Color;

In other words, given any point in the real plane, an image can tell you what color it produces at that point. We call such a function a functional image.

A context-free grammar is a way of specifying a particular class of "languages". I put languages in quotes because you should not think of these as computer or human languages necessarily; a "language", in this formulation, is just a well-defined set of strings. A grammar for a language defines the set of strings (or "sentences") that are legal in that language. Don't worry if this is not entirely clear at the moment; we'll explain more about grammars below.

Grammars can be used to specify classes of images as well as strings. In this assignment, we'll define some data types for you to work with functional images and grammars. Your job will be to implement a few images and image generating functions, and ultimately to produce a function that generates random sentences in an image grammar.

What we provide

FuncImage: A SML structure containing type definitions and functions for manipulating functional images and some other geometric primitives.
BMPFile and BMP_FILE: A structure and signature for writing functional images into the common Windows bitmap (BMP) file format. I chose this file format because it was simple to code up, and because you will then be able to use these images as your desktop wallpaper under Windows (actually, Mac and Linux can use BMPs too---but those platforms can use almost any image file format for wallpaper).
RandomInt: A structure for generating random integers in a specified range.
ImageGrammar: A structure that contains partial definitions of the grammatic image types and functions.
SimpleTable and SIMPLE_TABLE: A structure and signature for simple lookup tables.
The following files, containing some example images and other data---i.e., some usage examples for the above libraries:

What you must provide

A few image functions, which we will specify.
A function genRandomSentence that generates a random sentence in the image grammar. We define these terms below.
A ML transcript showing the use of your image and sentence generation functions.

These will be specified more precisely in problems.

Background

Context-free grammars

What is a language?

In order to define a language, we must first define the notion of alphabet. An alphabet is simply a set of tokens over which you can form strings. For example, consider the alphabet {a, b} (the letters 'a' and 'b'). Here are some strings in this alphabet:

aba
bbbbb
a        ; Single-token string
         ; The empty string

If we view a and b as words, for example a = "ug!" and b = "oo!", then it's easy to see that our strings are also sentences:

ug! oo! ug!           ; = 'aba'
oo! oo! oo! oo! oo!   ; = 'bbbbb'
ug!                   ; = 'a'
                      ; silence

From now on, we'll use the terms "string" and "sentence" interchangeably. Now, given an alphabet, we might wish to talk about strings with particular forms. For example, we might wish to talk about:

"All the strings that contain only a", or
"All the strings containing a sequence of a's followed by an equal number of b's".

A language, then, is a subset of the strings that can be generated over an alphabet. How can we describe a language? One way is to list all the sentences that exist in a language. That's hard for languages with an infinite number of strings, like the two languages above. Another way is to use an English description, as above, but that's not suitable for machine manipulation, or formal study. So, we need a formalism.

It's easy to imagine that some languages are very complicated to describe, and others are very simple. And, as it turns out, humans have invented a variety of formal ways to describe languages, varying in power and expressiveness. One way is to use context-free grammars.

What is a context-free grammar?

Context-free grammars, or CFGs, are a way of describing a certain class of languages (called, surprise! context-free languages). They were discovered independently by the linguist Noam Chomsky and the computer scientists John Backus and Peter Naur. A CFG has three components:

A set of terminals, which correspond to the "alphabet" above.
A set of productions, which are rules for composing sentences.
The name of a root production.

A production rule is of the following format:

name ::= expansion1 | expansion2 | expansion3 | ... | expansionN

where name is the name of a production, and each expansion represents one "legal substitution" for that production. (Read the bar symbol "|" as "or".) Each of the expansions is a string, where each element in the string is either

A terminal, or
A production name (nonterminal).

Here is a simple context-free grammar:

  terminals:
      "Hi" "Bye" "Alice" "Bob"

  rules:
      word      ::= "Hi" | "Bye"
      name      ::= "Alice" | "Bob"
      statement ::= word name

  root rule:
     statement

There are only four possible sentences in this language, producible by starting at the root rule and expanding them until only terminals are left. When there is more than one expansion for a production, you can choose any of them. Here are the possible expansion paths, called "derivations":

  statement => word name => word "Bob"   => "Hi" "Bob"
  statement => word name => word "Bob"   => "Bye" "Bob"
  statement => word name => word "Alice" => "Hi" "Alice"
  statement => word name => word "Alice" => "Bye" "Alice"

Life gets more interesting when grammars have recursive productions, e.g.:

  terminals:
      "(" ")" "foo" "bar"

  rules:
      atom ::= "foo" "bar"
      S    ::= "(" L ")" | atom
      L    ::= L S | S

  root rule:
      L

This grammar describes a language that looks very similar to one that you know... can you figure out what it is? Try generating a few sentences in this grammar.

Context-free grammars and trees

So far, we have presented context-free grammars as generators of linear strings. However, viewed another way, you can think of a context-free grammar as a template for a set of trees. In this view, the terminals of a grammar can be viewed as leaf nodes in a tree, and productions can be viewed as internal nodes. Here is the tree corresponding to a production in the "Hi Alice" grammar we showed above:

       statement
        /    \
      word   name
      /        \
    "Hi"      "Bob"

Here's a tree in the second grammar:

        L
        |
        S
      / | \
   "("  L  ")"
       / \
      L   S
     / \   \
    L   S  atom
   /    |    \
  S   atom   "foo"
  |     |
atom  "bar"
  |
"foo"

This tree corresponds to the sentence "( foo bar foo )". The process of taking a string and turning it into a tree in some grammar is called parsing. The result of parsing is a parse tree, sometimes called the syntax tree. Parsing is a basic part of any computer language implementation; it is quite well-understood and there are tools called parser generators that can construct parsers automatically from a grammar specification.

In this assignment, we won't do any parsing, because we'll be going in the other direction. You'll be taking a grammar and producing a random tree in that grammar.

Functional images

A word about colors

We model colors using the following datatype:

(* Color: 3 color channels + transparency channel *)
datatype Color = RGB of real * real * real * real;

The first three reals in this datatype are the red, green, and blue components respectively. By convention, these values must be between 0.0 (no value for this color) and 1.0 (full value for this color). You probably know that by combining red, green, and blue light in correct proportions, you can generate the whole gamut of visible colors; for example, yellow can be generated by the color

  RGB(1.0, 1.0, 0.0, 1.0)   ; maximum red + maximum green

The fourth element in our color type is a little less well-known; it's the alpha channel. The alpha channel is a measure of the color's opacity: if alpha = 1.0, the color is fully opaque, and if alpha = 0.0 then the color is transparent. We'll use the alpha channel when blending images, or overlaying one image onto another. This will become clear in the examples. For convenience, we'll define the invisible color as follows:

  val INVISIBLE = RGB(0.0, 0.0, 0.0, 0.0);

Functions as images

As previous noted, a functional image is a function that maps from the 2D plane to the colors. Any function with this signature can be viewed as an image. Here's a very simple image:

  fun white(point) = RGB(1.0, 1.0, 1.0, 1.0);

This image returns the opaque white color for every point in the real plane. In other words, this is an infinite plane of whiteness. Here's another example:

  fun redCircle(Pt(x, y)) =
      if Math.sqrt(x*x + y*y) <= 2.0 then
          RGB(1.0, 0.0, 0.0, 1.0)
      else
          INVISIBLE

This function returns the opaque red pixel for all values that are within 2.0 units of the origin; it returns invisible for all others. So, in other words, this function is a red circle of radius 2, centered at the origin.

Rendering images

Functional images are interesting, but images that you can't view are pure abstractions. In order to render a functional image viewable, we must create a concrete embodiment of that image. We need to "render" the image as an array of actual color values, so that we can display it on the screen or write it to a file.

However, functional images are infinite, and arrays are finite. Therefore, we need to select a finite set of points in the plane to sample. To that end, we've provided a datatype RenderSpec:

  (* A rectangle is defined by two points (topLeft, lowerRight) *)
  datatype Rectangle = Rect of Point * Point;

  (* A rendering spec is a rectangular window, combined with an
     integer "resolution" for that window.

     The resolution is the number of pixels per real unit.  For
     example, if the resolution is 50, then we should draw 50 evenly
     spaced pixels between 0.0 and 1.0, 50 evenly spaced pixels
     between 1.0 and 2.0, etc *)
  datatype RenderSpec = RendSpec of Rectangle * int;

We've also provided a function render that will take an image and a rendering spec, and produce a RenderedImage. You don't have to know anything in particular about RenderedImage instances, except that you can pass them to the function writeBMP.

Finally, the writeBMP function accepts a RenderedImage and a string, and writes the rendered image to a BMP file named by the given string. You can then view the result in any common graphics program, like Windows Paint or the GIMP.

Here's a complete example of image rendering:

  val aSquare = FuncImage.Rect(FuncImage.Pt(~2.0, 2.0),
                               FuncImage.Pt(2.0, ~2.0));
  val myRenderSpec = FuncImage.RendSpec(aSquare, 50);
  val renderedRedCircle = FuncImage.render(redCircle, myRenderSpec);
  BMPFile.writeBMP(renderedRedCircle, "redcircle.bmp");

The file redcircle.bmp is the result of executing the above code.

Higher-order functions as image generators

If an image is a function, then how do we manipulate functions? How can we make image factories? For example, the red circle above is not very general. Perhaps we would like to write a function makes circles of various colors and sizes. To do this, we need a higher-order function:

  fun makeCircle(color, radius) =
      fn (Pt(x, y)) =>
          if Math.sqrt(x*x + y*y) <= radius then
              color
          else
              INVISIBLE;

The return value is an Image function. Applying makeCircle to various tuples (color, radius) will make circles (functions) of all colors and sizes.

What if we have one of these circles, and we want it on a non-invisible background? To do this, we need an overlay function, which takes two images ("over" and "under") and lays them one on top of another. Since this isn't a graphics course, we've provided this function for you in the FuncImage structure.

  structure FuncImage = struct ...
     fun alphaOverlay(below:Image, above:Image):Image = ...
  end;

  (* A blue cirle and a white background. *)
  val blueCircle = makeCircle(FuncImage.RGB(0.0, 0.0, 1.0, 1.0), 1.0);
  fun white(point) = FuncImage.RGB(1.0, 1.0, 1.0, 1.0);

  (* Blue circle laid on a white background. *)
  val circleOnWhite = FuncImage.alphaOverlay(white, blueCircle);

By composing simple functions, it is easy to build up layered images of surprising complexity and texture; here's a 1024x768-sized image (ZIP) I created using about 120 lines of ML code (most of which consists of utility functions that could be reused in many different contexts; the composition of the final product is about a dozen lines of ML).

Functional grammatic images

As previously noted, grammars can be viewed as specifications for the structure of trees. As it turns out, we can view images as trees---and therefore, grammars can be used to generate families of images.

A grammatic image is specified by a sentence (i.e., a tree):

datatype Sentence =
         Leaf of FuncImage.Image
       | PartitionNode of (FuncImage.Window * Sentence) list;

A leaf in the tree contains any image. This is like a "word" in an ordinary sentence. To draw a leaf, you just draw it in the area allocated to it.

Internal nodes in the tree are specified by PartitionNodes. A partition node divides up space into several rectangular windows, each of which may contain another sentence. The datatype for a window is as follows:

type PointTransform = Point -> Point;
type Window = Rectangle * PointTransform;

A window specifies that one rectangular area "stands for" another. A PointTransform maps one point in 2D space to another; for example, here's the "y-axis flip" point transform:

fun flipY(Pt(x, y)) = Pt(~x, y);

In the case of a Window value, the PointTransform member should map the points in the rectangular window to some other area. In particular, we will want to map the rectangular window to some other rectangular window. For your convenience, we have provided the makeRectTransform function, which generates this kind of "rectangle to rectangle" transform:

fun makeRectTransform(fromRect as Rect(p1, p2),
                      toRect   as Rect(q1, q2)):PointTransform = ..

sample-sentence.sml shows a sample sentence. The result of rendering this image is shown below:

[image of red circle, blue square, green square, and red circle]

We have built this image within the unit square, (Rect(Pt(~1.0,1.0),(1.0,~1.0))). The sentence contains three partitions, and so does the image. The first partition splits the image into two halves, one above the other. Each window contains both the rectangle and a function that maps the rectangle onto the entire unit square. To each subimage in each partition, the square that it has been assigned appears to be the entire square. The transform functions at each PartitionNode perform this mapping.

Grammars and trees

The tree corresponding to the above image is as follows (some information omitted):

         partition
         /       \
      leaf      partition
       /       /       \
redCircle   leaf   partition
             /       /    \
     blueSquare   leaf    leaf
                   /         \
             greenSquare    redCircle

It is easy to see that this tree, and many like it, could be generated by a grammar. The only really tricky part is constructing the appropriate rectangular partitions and scaling transformations.

Recall that grammars have three elements: terminals, productions, and a root production name. How do we map those into our new concept of grammatic images?

The terminals in our grammar are simple Image values. Since images are functions, and functions are not easily named, we'd like to have names for functions. We'll store the terminals in a table mapping from strings to images:

type TerminalTable = (string, FuncImage.Image) SimpleTable.table;

Next, we need a datatype to represent productions. A production is a name, plus a list of expansions. But what is an "expansion" in this context? An expansion cannot just be a list of images; partition nodes must divide up the space in the current image. Therefore, an expansion is either

a leaf, or
a combination of two parts:
- a rectangle partitioning function, and
- a list of productions to assign to each subrectangle.

(* A rectangle splitter is a function that splits one rectangle into
   several subrectangles. *)
type RectangleSplitter =
         FuncImage.Rectangle -> FuncImage.Rectangle list;

(* Any individual expansion can be either a leaf/terminal, or a
   rectangle splitter and a list of productions for each
   subrectangle.  At sentence generation time, we'll choose an
   expansion and apply the rectangle splitter to derive a list
   of subrectangles. *)
datatype Expansion =
         MakeLeaf of string
       | MakePartition of RectangleSplitter * (string list);

(* Productions have a name and a list of expansions for that name. *)
datatype Production = Prod of string * (Expansion list);

The partition node's RectangleSplitter must return a number of rectangles equal to the number of strings in the list for the MakePartition node. There's no good way to enforce this restriction using ML's type system; we leave it to the user.

Here is the datatype for a complete grammar:

datatype Grammar = Gram of TerminalTable * Production list * string;

The third component is the name of the root production. When we're generating sentences, we will look up the root production and begin there.

sample-grammar.sml contains an example of a simple grammar, one that could have generated the example image above. This grammar uses a very simple RectangleSplitter, one that splits a rectangle into two equal halves each time.

Random tree generation

You may have noticed something odd in our definition. Namely, we say a grammatic image may partition its own top-level area into subareas. However, our original definition functional image does not have any notion of top-level area: a functional image maps every point in the plane to a Color, and therefore extends from the origin out to infinity in every direction (even if, as in the case of our circle functions, most of that infinite plane is empty).

When generating a tree, therefore, we must provide both a grammar and an initial area which can be split up:

  (* A "sentence generator" is a function that takes a grammar and
     produces a random sentence in that grammar. *)
  type SentenceGenerator = Grammar * FuncImage.Rectangle -> Sentence;

The sentence generator has the following behavior:

A sentence generator takes a grammar and a rectangle, and generates a random sentence, beginning with its root production, in the given rectangle.
To process a production, choose one of its expansions at random.
When a sentence generator chooses a MakeLeaf expansion, it will construct a Leaf sentence, looking up appropriate image in the terminal table.
When a sentence generator encounters a MakePartition expansion, it will construct a PartitionNode. Recall that PartitionNodes contain both a Window and a subsentence; it creates these two pieces as follows:
- Apply the MakePartition's RectangleSplitter to the current rectangle parameter, yielding a list of subrectangles.
- Generate PointTransform functions that map from each subrectangle to the top-level rectangle. For example, in the sample image above, the PointTransform maps each little rectangle to the top-level unit square.
- For each subrectangle, recursively generate the subsentence for that production.

This is actually the entire algorithm, in general terms. The details are left as an exercise to the reader (see below).

Homework problems

Provide solutions for the following problems:

In your own words, using no more than 2 or 3 English sentences, summarize briefly what the FuncImage.render function does, and how it uses the image functions passed to it.
Use the Math.cos (cosine over reals) function and FuncImage.alphaOverlay to write an image function named blueWaves that, when rendered, results in something like the following image (ignore the border, that's an artifact of this web page):

(Hints: Vary the alpha, not the blue component. Also, if you can't get the gradients to look smooth, try scaling the image up or down a little bit.)
A "texture" is simply a functional image that paints something over the real plane. Any image can be viewed as a texture, but some are better suited than others. For example, here is a checkerboard texture of light red and dark red:
```
fun checker(FuncImage.Pt(x, y)) =
    let
        val i = Real.floor(x) + Real.floor(y)
    in
        if i mod 2 = 0 then
            FuncImage.RGB(1.0, 0.5, 0.5, 1.0)
        else
            FuncImage.RGB(0.6, 0.2, 0.2, 1.0)
    end;
```
For this question, write the function makeTexturedCircle. Instead of taking a simple color value as its first parameter, makeTexturedCircle takes a texture; in other words, its type is
```
Image * real -> Image;
```
Here's an example of the checkerboard texture above applied to a circle using makeTexturedCircle, then overlaid on a white background using alphaOverlay:
Implement the function genRandomSentence, which takes a grammar and a rectangular area and produces a random grammatic image, corresponding to this grammar, in the rectangular area. This function therefore has the type
```
  ImageGrammr.Grammar * FuncImage.Rectangle
                            -> ImageGrammar.Sentence;
```
Here are some hints for solving parts of this problem:
- One of the first things you should do, for debugging purposes, is write a function that prints out a textual representation of a Sentence tree. This will allow you to examine the sentences that your function is generating.
- The pseudocode for the function is roughly as follows:
  1. Find the root production.
  2. Choose a random expansion for that production.
  3. Recursively make subsentences, if necessary, for each part of that expansion. For leaves, there are no subsentences. For partitions, you need to split the current rectangle, generate a Window for each subrectangle, and recursively generate an appropriate subsentence for that rectangle.
  4. Make a sentence that combines the generated subsentences appropriately.
- Every subarea's transform function must map to the top-level rectangular area. This means that you will have to do most of the work in a helper function that carries the "top-level" rectangle, in addition to the "current" rectangle, through all the recursive steps. This function should have three parameters: the grammar, the "current" rectangle, and the "top-level" rectangle.
- A helper function to choose a random expansion for a given production will come in handy.
- After you randomly choose the current expansion, you will probably need another helper function that uses pattern-matching to generate the subsentences for the different cases (leaf or partition).
- Once you split the current rectangle, you will have to generate transform functions for each rectangle.
- The ImageGrammar.genImage function is properly implemented. Use it to render images, to test your function!
- You will not solve this problem with your fingers, but with your brain. ML does not lend itself to random hacking around. Think before you type.
- If you're confused as to what your program is doing, sprinkle TextIO.print statements at strategic points in your code.

cse341-webmaster@cs.washington.edu

Last modified: Tue Feb 19 11:34:35 PST 2002