University of Washington

CSE 373, Data Structures and Algorithms

Project #3: Spreadsheet

Due: Friday, December 7, 2001, 12:30pm

 

Autumn 2001                                                                                       November 21, 2001

Donald Chinn

 

LOOK OUT Microsoft, Corel, and any other software company!  CSE 373 is on its way to building a spreadsheet application that is better and more functional than anything on the market today.  OK, maybe not.  But we are going to build much of the basic functionality of a spreadsheet in just two weeks.

 

IMPORTANT: Read this handout completely and read it several times before you type in any code!  It is vital that you understand what the assignment is and plan your strategy for implementation before spending hours programming.  If you do not plan, many of those hours on the computer will be a wasted effort.

 

Project Objectives

 

Background

 

A spreadsheet is just a two-dimensional grid of cells.  Each cell contains a formula that specifies how to compute the value of the cell.  Every cell has a name consisting of a letter followed by a nonnegative integer (e.g., "A0" or "C14").  A formula can be as simple as an integer constant (e.g., "5"), or it can refer to other cells (e.g., "B1 + C3").

Or, it can be a combination of the two (e.g., "2 * D3").  Here's a sample 8x8 spreadsheet with the formulas corresponding to those cells.

 

 

A

B

C

D

E

F

G

H

0

 

 

23

 

B2+C2

 

 

 

1

4

 

 

B3+2

 

 

 

 

2

 

 

 

2*5

 

 

 

 

3

 

A1+B1

B3+A4

 

3+C0

 

 

 

4

A1-D4

 

 

-5

 

 

 

 

5

 

 

 

 

 

 

 

 

6

 

 

 

 

 

 

 

 

7

 

 

 

 

 

 

 

 

 

 

Note that every cell's formula consists of integer constants, cell references, and operators.  The operators we will use in our spreadsheet are just the integer operations +, –, *, and /.  Any cell that does not have a formula is assumed to have "0" as its formula.

 

Here is the same spreadsheet with the values computed for each cell based on their formulas above.  (All empty cells have value 0.)

 

 

A

B

C

D

E

F

G

H

0

 

 

23

 

0

 

 

 

1

4

 

 

6

 

 

 

 

2

 

 

 

10

 

 

 

 

3

 

4

13

 

26

 

 

 

4

9

 

 

-5

 

 

 

 

5

 

 

 

 

 

 

 

 

6

 

 

 

 

 

 

 

 

7

 

 

 

 

 

 

 

 

 

There are two main questions to ask.  How do you decide what order to evaluate the cells?  How do you evaluate a cell given its formula?

 

To determine what order to evaluate the cells, you first need to construct a dependency graph.  Call the dependency graph G = (V,E).  The vertices, V, in the graph are the cells of the spreadsheet.  (There are 64 cells in the above spreadsheet, and therefore 64 vertices in the graph.)  There is edge in E between vertices v and w if the formula of the cell corresponding to vertex w contains a reference to the cell corresponding to vertex v.  For example, since the formula for cell C3 contains a reference to B3, then there will be an edge in E that goes from the vertex for B3 to the vertex for C3.  Here is part of the dependency graph:

 

All other nodes in the graph are vertices with no edges going in or coming out of them.

 

Once we have the dependency graph, we can then perform a topological sort on it to determine an order to evaluate the formulas in the cells.  Note that if we evaluate the cells in order of their topological numbers, then we can be assured that when we evaluate a formula that has a reference to another cell, that other cell's formula will already have been evaluated.  Also note that this only works if there are no cycles in the dependency graph.  For example, if A0's formula were "B0+3" and B0's formula were "A0+1", then we would have a cycle in the dependency graph, and it would be impossible to evaluate the formulas.  In our spreadsheet, we will disallow the user from creating formulas that would result in a cycle in the dependency graph.  (What does Microsoft's Excel spreadsheet do when you try to create a cyclic dependency?)

 

Once we know what order to evaluate the formulas in the cells, we then have to determine how to evaluate a cell's formula.  This is fairly easy.  We compute its formula using the values in the cells it references, if there are any.

 

The Assignment

 

Your task in this project is to write a spreadsheet ADT and a main() program that allows the user to assign formulas to cells, view the formulas in the cells of the spreadsheet, and view the values computed for all the cells in the spreadsheet.  The user interface will be simple – it will be menu-driven (see the User Interface for more details).

 

You may work by yourself or in a team of two people.  I suggest you work in a team, as you will learn a lot more and you will see how other people think about designing and implementing programs.

 

The one-person assignment is essentially the same as the two-person assignment, except that in the one-person assignment, formulas in the cells are either empty (no formula, evaluates to 0) or have exactly one operator (+, -, *, /) and two operands (either an integer constant or a cell reference).

 

The two-person assignment has no restriction on the length of formulas.  They can consist of any number of operators and an appropriate number of operands.  In addition, the user may use parentheses to enforce some order of evaluation on the formula (e.g., "3 + 5 * B3 * (A0 + 5)").  As we shall see, we will need a fancier representation of these more general formulas than in the one-person assignment.

 

Data Structures

 

You will need to decide how to represent the spreadsheet.  You will likely want to implement it as a class, where the two-dimensional array of cells is a private data member.  The cells themselves will be a class, containing some representation of a formula.  The exact representation will depend on whether you are a one-person team or a two-person team.  We will need to use the concept of a token to represent the various parts of a formula.

We will need to define the following types/classes:

 

enum OperandType = {Cell, Literal};

 

typedef struct CellToken {

     int column;     // column A = 0, B = 1, etc.

     int row;

} CellToken;

 

typedef struct OperandToken {

     OperandType opType;

     int literalID;             // valid if opType == Literal

     CellToken cellID;          // valid if opType == Cell

} OperandToken;

 

enum OperatorToken {Plus, Minus, Mult, Div, LeftParen, Error};

 

enum TokenType {OperatorTokenType, CellTokenType, LiteralTokenType};

 

typedef struct ExpressionTreeToken {

     TokenType tokenType;

     CellToken cellID;    // valid if tokenType == CellTokenType

     OperatorToken nodeOperator;     //  if OperatorTokenType

     int literalID;  //  if tokenType == LiteralTokenType

} ExpressionTreeToken;

 

For the one-person assignment, a cell might be represented like this:

 

class Cell {

public:

void Evaluate (Spreadsheet & spreadsheet);

     void ReadFormula (iostream & ios );

private:

     int value;

     // the following represent the formula

     OperatorToken cellOperator;    

OperandToken leftOperand;

OperandToken rightOperand;

};

 

For the two-person assignment, a cell might be represented like this:

 

class Cell {

public:

void Evaluate (Spreadsheet & spreadsheet);

     void ReadFormula (iostream & ios );

private:

     int value;

// the expression tree below represents the formula

ExpressionTree expressionTree; 

};

 

The nodes of the ExpressionTree (which is a binary tree, NOT a binary search tree) will look like this:

 

class ExpressionTreeNode {

     ExpressionTreeToken token;

     ExpressionTreeNode *left;

     ExpressionTreeNode *right;

 

     friend class ExpressionTree;

};

 

And an ExpressionTree would look like this:

 

class ExpressionTree {

public:

     void makeEmpty();

     void printTree();

     int Evaluate(Spreadsheet & spreadsheet);

private:

     ExpressionTreeNode *root;

};

 

As an example of what the expression tree would look like, suppose we had a formula such as "5 + B3 * 8".  This would be represented by the following expression tree:

 

 

The spreadsheet will be a class with a (private) two-dimensional array of cells.  (At first, you will want it to be a small array so that it is easy to test.  Say, 4x4.  Later, you will want to make it bigger.  Use a #define to make it easy to change the number of rows and columns of the spreadsheet.)  You will also have a (private) graph data structure to perform the topological sort when you need to.

 

Algorithms

 

Both one-person and two-person teams will need to implement topological sort over the cells to determine what order to evaluate them.  Instead of numbering the cells (as presented in class and in the book), it is better to evaluate the cell's value as you do the topological sort.  If you discover a cycle in the graph, then you should stop the topological sort, print an error message, and set the formula of the cell that was changed back to what it was before the change.  (It turns out that the cells that were computed first in the new topological sort will have the same value that they had in the old topological sort (why?), so we don't have to go back and recompute them if a cycle is discovered.)

 

Handling the expressions is a little more complicated.  Code is provided that will take a string that represents an (infix) expression, such as “5 + B3”, and produce a stack of tokens of a specific form.  It will turn the infix expression into a postfix expression, following the algorithm described on pages 105-8 of Weiss.  Thus, in our example, the stack, from bottom to top, will look like this: 5 B3 +.  The code that does this conversion from infix to stack is in expressionTree.cpp.

 

The expression tree that results from the postfix expression on page 108 of Weiss should look like this:

 

 

 

For one-person teams, it is easy to set the cell’s formula to the right value.  Just pop the stack to get the operator.  Depending on what it is, set the cellOperator to the correct value.  Then pop the next item off the stack to get the right operand.  Then pop again to get the left operand.

 

For two-person teams, handling the expressions is more involved, but still pretty easy.  To get an expression tree from the postfix expression on the stack, pop an item off the stack.  If it is a literal or cell, then we create an ExpressionTreeNode from the token and stop.  If it is an operand, then we must generate a right ExpressionTree and a left ExpressionTree from what is left on the stack.  We continue doing this until the stack is empty.  Here is some code that expresses that algorithm:

 

 

// Build an expression tree from a stack of tokens

void ExpressionTree::BuildExpressionTree

                       (Stack<ExpressionTreeNode> & s)

{

     root = GetExpressionTree(s);

     assert(s.isEmpty());

}

 

ExpressionTreeNode * ExpressionTree::GetExpressionTree

                       (Stack <ExpressionTreeNode> & s)

{

     ExpressionTreeNode * returnTree;

     ExpressionTreeToken token;

 

     if (s.isEmpty())

           return NULL;

     token = s.topAndPop();

if ((token.tokenType == LiteralTokenType) ||

(token.tokenType == CellTokenType)

 

     {

           // Literals and Cells are leaves

           // in the expression tree

           returnTree = new ExpressionTreeNode

(token, NULL, NULL);

           return returnTree;

     }

else if (token.tokenType == OperatorTokenType)

     {

           // Continue finding tokens that will form the

           // right subtree and left subtree.

           ExpressionTreeNode *leftSubtree;

           ExpressionTreeNode *rightSubtree;

 

           rightSubtree = GetExpressionTree (s);

           leftSubtree  = GetExpressionTree (s);

           returnTree = new ExpressionTreeNode

(token, leftSubtree, rightSubtree);

           return returnTree;

}

}

 

If you are in a two-person team, be sure you understand how the postfix expression gets transformed into this expression tree using the algorithm described above.

 

To evaluate an expression tree, you will need to to a traversal of the tree.  For example, to evaluate the following tree, you would evaluate subtree A, then subtree B, then add the two results.  (What kind of traversal is this?)

 

User Interface

 

The user interface will be menu-driven.  If you decide to offer more choices to the user, you may add to the code.  The code, which is in the main.cpp file provided, is just a while loop that accepts commands from the user.  When the command is parsed, it is dispatched to the appropriate function.

 

Read main.cpp to see how the menus work.  You need to implement code that handles all the commands on the menu.

 

Reading from and Saving to Files

 

Code is provided to help you read a spreadsheet from a file.  Each line in a file is of the form <cell> <formula>.  The code provided will return a cellToken and a stack of tokens.  You will need to convert the stack into an expression tree (for two-person teams) as you did for the "change cell" command.

 

For bonus, you can write a function that writes a spreadsheet to a file.

 

Organizing Your Files

 

Most of the functionality of your spreadsheet will be in a single file.  A good name for it would be spreadsheet.h and spreadsheet.cpp.  You might need some functions that are not necessarily specific to spreadsheets.  If you need to write such functions, put them in files called util.h and util.cpp.

 

What Code is Provided

 

Weiss's string code has been augmented to convert a string (representing a formula) to a stack of tokens.  You will need to take these tokens and construct an expression tree, using the algorithm described above.  You may take the code and alter it, simplify it, or add to it as you see fit.  However, please document any changes you make (both in the code and in your report).

 

main.cpp has extensive code and pseudocode (function calls to functions that you will have to write) to help get you started on writing the menus and handling the file input/output.

 

You may use any of Weiss's code.  The queue code, binary search tree code, and possibly some of the graph code might be useful.  On the other hand, it might actually be easier to implement these data structures yourself.  They are not that hard to implement if you understand them.  Weiss's code contains a lot of functionality that we do not need for our purposes.

 

 

What Text Files are Provided

 

There are a few text files that represent some small sample spreadsheets.  You will want to use them and your own test cases to check that your code works.

 

What to Turn In

 

Turn in all code (.cpp and .h files) you wrote or used.  Also, create an executable (.exe) using whatever compiler you used.  Finally, create a README.txt textfile that tells who is in your group and what files you submitted.  Use the electronic turnin to submit these files.  If you are in a two-person team, submit using either person's name (but make sure the README.txt file has both team member's names).

 

You should turn in a report (hard copy) that will act as a users manual (how to use the program) and a technical manual (describing the data structures and algorithms you used).  IMPORTANT: Your users manual should also describe exactly how to compile your program by describing the exact steps you used (for example, in VC++, you should describe which files to add to the project, any special settings you used, etc.).

 

Software Engineering Tips

 

Start small.  Put stubs in for functions the menu-driven user interface calls.  Make sure you can read in a spreadsheet and print it out (both as values and as formulas).  Then work on changing the formula for one cell.    Then work on evaluating one cell that has no references to any other cells.  Then work on the topological sort.  Then you can calculate the entire spreadsheet.

 

Make your graph small (8x8).  If you have written your code correctly, it should be easy to make it bigger (by just changing one line of code).

 

Test, test, test!  Every time you add a significant feature to your code, test it before moving on to another part of the code.  Work out on paper interesting spreadsheets to use to test and make a file for it so that you can test your code without having to use the menu to create the test spreadsheet.

 

Document, document, document!  Put comments in your code before, during, and after you implement it.  Does your code actually do what your comments say they do?

 

How Your Project Will Be Graded

 

As usual, completeness, correctness, and style (readability, documentation) will be important criteria for your grade.  The clarity of your user manual and report will also factor heavily into your grade, since they will help us understand your code.

 

Unless there is some exceptional reason, both people in a two-person team will receive the same grade.

 

 

Bonus (up to 15 points)

 

Here are some bonus points ideas.  If you have some other ideas, talk to the instructor or the TAs to determine how much work is involved (which will correlate with how many bonus points it will be worth).  Do not attempt bonus work until you have the "regular" part of the project working, tested, and documented.

 

(10 points) If you are a one-person team, you can add functionality to make it work like the two-person team project.

(2 points) Improve the user interface in some significant way.  For example, change the code so that the user can type in an expression such as "5+A0" with no spaces between the tokens.  Another possibility is to print a useful error message any time the user enters a bad formula or bad cell reference.  (Currently, the code is inconsistent about how it handles errors.  However, if the user types in a legal cell reference or formula, then it will act correctly.)

(2 points) Add the "Save Spreadsheet to a file" command to the menu.

(5 points) Add exponentiation to the operations allowed in expressions.  Exponentiation uses the caret ("^") and has higher priority than * or /.  You will need to modify the expression parser, the code that generates the expression tree, and, of course, your expression tree evaluator.

(10 points) Implement other operators in a formula that can act on more than two operands.  For example, you might define "AVG (cell1, cell2, cell3)" to be the average of the values in cell1, cell2, and cell3.  You would have to change large parts of the code, including the expression parser.  This is hard.