University of
Washington
CSE 373, Data Structures and Algorithms
Project #3: Spreadsheet
Due: Friday, December 7, 2001, 12:30pm
Autumn 2001 November 21, 2001
Donald Chinn
IMPORTANT: Read this handout completely and read it several times before you type in any code! It is vital that you understand what the assignment is and plan your strategy for implementation before spending hours programming. If you do not plan, many of those hours on the computer will be a wasted effort.
Background
A spreadsheet
is just a two-dimensional grid of cells.
Each cell contains a formula
that specifies how to compute the value of the cell. Every cell has a name consisting of a letter followed by a
nonnegative integer (e.g., "A0" or "C14"). A formula can be as simple as an integer
constant (e.g., "5"), or it can refer to other cells (e.g., "B1
+ C3").
Or, it can be a combination of the two (e.g.,
"2 * D3"). Here's a sample
8x8 spreadsheet with the formulas corresponding to those cells.
|
A |
B |
C |
D |
E |
F |
G |
H |
0 |
|
|
23 |
|
B2+C2 |
|
|
|
1 |
4 |
|
|
B3+2 |
|
|
|
|
2 |
|
|
|
2*5 |
|
|
|
|
3 |
|
A1+B1 |
B3+A4 |
|
3+C0 |
|
|
|
4 |
A1-D4 |
|
|
-5 |
|
|
|
|
5 |
|
|
|
|
|
|
|
|
6 |
|
|
|
|
|
|
|
|
7 |
|
|
|
|
|
|
|
|
Note that every cell's formula consists of
integer constants, cell references, and operators. The operators we will use in our spreadsheet are just the integer
operations +, –, *, and /. Any cell
that does not have a formula is assumed to have "0" as its formula.
Here is the same spreadsheet with the values
computed for each cell based on their formulas above. (All empty cells have value 0.)
|
A |
B |
C |
D |
E |
F |
G |
H |
0 |
|
|
23 |
|
0 |
|
|
|
1 |
4 |
|
|
6 |
|
|
|
|
2 |
|
|
|
10 |
|
|
|
|
3 |
|
4 |
13 |
|
26 |
|
|
|
4 |
9 |
|
|
-5 |
|
|
|
|
5 |
|
|
|
|
|
|
|
|
6 |
|
|
|
|
|
|
|
|
7 |
|
|
|
|
|
|
|
|
There are two main questions to ask. How do you decide what order to evaluate the
cells? How do you evaluate a cell given
its formula?
To determine what order to evaluate the cells,
you first need to construct a dependency graph. Call the dependency graph G = (V,E). The vertices, V, in the graph are the cells
of the spreadsheet. (There are 64 cells
in the above spreadsheet, and therefore 64 vertices in the graph.) There is edge in E between vertices v and w if the formula of the cell corresponding to vertex w contains a reference to the cell
corresponding to vertex v. For example, since the formula for cell C3
contains a reference to B3, then there will be an edge in E that goes from the
vertex for B3 to the vertex for C3.
Here is part of the dependency graph:
All other nodes in the graph are vertices with
no edges going in or coming out of them.
Once we have the dependency graph, we can then
perform a topological sort on it to determine an order to evaluate the formulas
in the cells. Note that if we evaluate
the cells in order of their topological numbers, then we can be assured that
when we evaluate a formula that has a reference to another cell, that other
cell's formula will already have been evaluated. Also note that this only works if there are no cycles in the
dependency graph. For example, if A0's
formula were "B0+3" and B0's formula were "A0+1", then we
would have a cycle in the dependency graph, and it would be impossible to
evaluate the formulas. In our
spreadsheet, we will disallow the user from creating formulas that would result
in a cycle in the dependency graph. (What
does Microsoft's Excel spreadsheet do when you try to create a cyclic
dependency?)
Once we know what order to evaluate the formulas
in the cells, we then have to determine how to evaluate a cell's formula. This is fairly easy. We compute its formula using the values in
the cells it references, if there are any.
The Assignment
Your task in this
project is to write a spreadsheet ADT and a main() program that allows the user
to assign formulas to cells, view the formulas in the cells of the spreadsheet,
and view the values computed for all the cells in the spreadsheet. The user interface will be simple – it will
be menu-driven (see the User Interface for more details).
You may work by
yourself or in a team of two people. I
suggest you work in a team, as you will learn a lot more and you will see how
other people think about designing and implementing programs.
The one-person
assignment is essentially the same as the two-person assignment, except
that in the one-person assignment, formulas in the cells are either empty (no
formula, evaluates to 0) or have exactly one operator (+, -, *, /) and two operands (either an integer constant
or a cell reference).
The two-person
assignment has no restriction on the length of formulas. They can consist of any number of operators
and an appropriate number of operands.
In addition, the user may use parentheses to enforce some order of
evaluation on the formula (e.g., "3 + 5 * B3 * (A0 + 5)"). As we shall see, we will need a fancier
representation of these more general formulas than in the one-person
assignment.
Data Structures
You will need to decide how to represent the
spreadsheet. You will likely want to
implement it as a class, where the two-dimensional array of cells is a private
data member. The cells themselves will
be a class, containing some representation of a formula. The exact representation will depend on
whether you are a one-person team or a two-person team. We will need to use the concept of a token
to represent the various parts of a formula.
We
will need to define the following types/classes:
enum
OperandType = {Cell, Literal};
typedef
struct CellToken {
int column; //
column A = 0, B = 1, etc.
int row;
}
CellToken;
typedef
struct OperandToken {
OperandType opType;
int literalID; // valid if opType == Literal
CellToken cellID; // valid if opType == Cell
}
OperandToken;
enum
OperatorToken {Plus, Minus, Mult, Div, LeftParen, Error};
enum
TokenType {OperatorTokenType, CellTokenType, LiteralTokenType};
typedef
struct ExpressionTreeToken {
TokenType tokenType;
CellToken cellID; // valid if tokenType == CellTokenType
OperatorToken nodeOperator; //
if OperatorTokenType
int literalID; // if tokenType ==
LiteralTokenType
}
ExpressionTreeToken;
For the one-person
assignment, a cell might be represented like this:
class
Cell {
public:
void Evaluate (Spreadsheet &
spreadsheet);
void ReadFormula (iostream & ios );
private:
int value;
// the following represent the formula
OperatorToken cellOperator;
OperandToken leftOperand;
OperandToken rightOperand;
};
For the two-person
assignment, a cell might be represented like this:
class
Cell {
public:
void Evaluate (Spreadsheet &
spreadsheet);
void ReadFormula (iostream & ios );
private:
int value;
// the expression tree below represents the
formula
ExpressionTree expressionTree;
};
The
nodes of the ExpressionTree (which is a binary tree, NOT a binary search tree)
will look like this:
class
ExpressionTreeNode {
ExpressionTreeToken token;
ExpressionTreeNode *left;
ExpressionTreeNode *right;
friend class ExpressionTree;
};
And
an ExpressionTree would look like this:
class
ExpressionTree {
public:
void makeEmpty();
void printTree();
int Evaluate(Spreadsheet & spreadsheet);
private:
ExpressionTreeNode *root;
};
As
an example of what the expression tree would look like, suppose we had a
formula such as "5 + B3 * 8".
This would be represented by the following expression tree:
The
spreadsheet will be a class with a (private) two-dimensional array of
cells. (At first, you will want it to
be a small array so that it is easy to test.
Say, 4x4. Later, you will want
to make it bigger. Use a #define to make it easy to change the
number of rows and columns of the spreadsheet.) You will also have a (private) graph data structure to perform
the topological sort when you need to.
Algorithms
Both one-person and two-person teams will need
to implement topological sort over the cells to determine what order to
evaluate them. Instead of numbering the
cells (as presented in class and in the book), it is better to evaluate the
cell's value as you do the topological sort.
If you discover a cycle in the graph, then you should stop the
topological sort, print an error message, and set the formula of the cell that was changed back to what it was before
the change. (It turns out that the
cells that were computed first in the new topological sort will have the same
value that they had in the old topological sort (why?), so we don't have to go
back and recompute them if a cycle is discovered.)
Handling the expressions is a little more
complicated. Code is provided that will
take a string that represents an (infix) expression, such as “5 + B3”, and
produce a stack of tokens of a specific form.
It will turn the infix expression into a postfix expression, following
the algorithm described on pages 105-8 of Weiss. Thus, in our example, the stack, from bottom to top, will look
like this: 5 B3 +. The code that does
this conversion from infix to stack is in expressionTree.cpp.
The expression tree that results from the
postfix expression on page 108 of Weiss should look like this:
For one-person teams, it is easy to set
the cell’s formula to the right value.
Just pop the stack to get the operator.
Depending on what it is, set the cellOperator to the correct value. Then pop the next item off the stack to get
the right operand. Then pop again to
get the left operand.
For two-person teams, handling the
expressions is more involved, but still pretty easy. To get an expression tree from the postfix expression on the
stack, pop an item off the stack. If it
is a literal or cell, then we create an ExpressionTreeNode from the token and
stop. If it is an operand, then we must
generate a right ExpressionTree and a left ExpressionTree from what is left on
the stack. We continue doing this until
the stack is empty. Here is some code
that expresses that algorithm:
// Build an
expression tree from a stack of tokens
void
ExpressionTree::BuildExpressionTree
(Stack<ExpressionTreeNode> & s)
{
root = GetExpressionTree(s);
assert(s.isEmpty());
}
ExpressionTreeNode
* ExpressionTree::GetExpressionTree
(Stack
<ExpressionTreeNode> & s)
{
ExpressionTreeNode * returnTree;
ExpressionTreeToken token;
if (s.isEmpty())
return NULL;
token = s.topAndPop();
if
((token.tokenType == LiteralTokenType) ||
(token.tokenType == CellTokenType)
{
// Literals and Cells are leaves
// in the expression tree
returnTree = new ExpressionTreeNode
(token, NULL, NULL);
return returnTree;
}
else
if (token.tokenType == OperatorTokenType)
{
// Continue finding tokens that will
form the
// right subtree and left subtree.
ExpressionTreeNode *leftSubtree;
ExpressionTreeNode *rightSubtree;
rightSubtree = GetExpressionTree (s);
leftSubtree = GetExpressionTree (s);
returnTree = new ExpressionTreeNode
(token, leftSubtree, rightSubtree);
return returnTree;
}
}
If you are in a two-person team, be sure you
understand how the postfix expression gets transformed into this expression
tree using the algorithm described above.
To evaluate an expression tree, you will need to
to a traversal of the tree. For
example, to evaluate the following tree, you would evaluate subtree A, then
subtree B, then add the two results.
(What kind of traversal is this?)
User Interface
The user interface will be menu-driven. If you decide to offer more choices to the
user, you may add to the code. The
code, which is in the main.cpp file provided, is just a while loop that accepts
commands from the user. When the
command is parsed, it is dispatched to the appropriate function.
Read main.cpp to see how the menus work. You need to implement code that handles all
the commands on the menu.
Reading from and Saving to Files
Code is provided to help you read a spreadsheet
from a file. Each line in a file is of
the form <cell> <formula>.
The code provided will return a cellToken and a stack of tokens. You will need to convert the stack into an
expression tree (for two-person teams) as you did for the "change
cell" command.
For bonus, you can write a function that writes
a spreadsheet to a file.
Organizing Your Files
Most of the functionality of your spreadsheet
will be in a single file. A good name
for it would be spreadsheet.h and spreadsheet.cpp. You might need some functions that are not necessarily specific to
spreadsheets. If you need to write such
functions, put them in files called util.h and util.cpp.
What
Code is Provided
Weiss's string code has been augmented to
convert a string (representing a formula) to a stack of tokens. You will need to take these tokens and
construct an expression tree, using the algorithm described above. You may take the code and alter it, simplify
it, or add to it as you see fit.
However, please document any changes you make (both in the code and in
your report).
main.cpp has extensive code and pseudocode
(function calls to functions that you will have to write) to help get you
started on writing the menus and handling the file input/output.
You may use any of Weiss's code. The queue code, binary search tree code, and
possibly some of the graph code might be useful. On the other hand, it might actually be easier to implement these
data structures yourself. They are not
that hard to implement if you understand them.
Weiss's code contains a lot of functionality that we do not need for our
purposes.
What Text Files are Provided
There are a few text files that represent some
small sample spreadsheets. You will
want to use them and your own test cases to check that your code works.
What
to Turn In
Turn in all code (.cpp and .h files) you wrote
or used. Also, create an executable
(.exe) using whatever compiler you used.
Finally, create a README.txt textfile that tells who is in your group
and what files you submitted. Use the
electronic turnin to submit these files.
If you are in a two-person team, submit using either person's name (but
make sure the README.txt file has both team member's names).
You should turn in a report (hard copy) that
will act as a users manual (how to use the program) and a technical manual
(describing the data structures and algorithms you used). IMPORTANT:
Your users manual should also describe exactly how to compile your program by
describing the exact steps you used (for example, in VC++, you should describe
which files to add to the project, any special settings you used, etc.).
Software
Engineering Tips
Start small.
Put stubs in for functions the menu-driven user interface calls. Make sure you can read in a spreadsheet and
print it out (both as values and as formulas).
Then work on changing the formula for one cell. Then work on evaluating one cell that has
no references to any other cells. Then
work on the topological sort. Then you
can calculate the entire spreadsheet.
Make your graph small (8x8). If you have written your code correctly, it
should be easy to make it bigger (by just changing one line of code).
Test, test, test! Every time you add a significant feature to
your code, test it before moving on to another part of the code. Work out on paper interesting spreadsheets
to use to test and make a file for it so that you can test your code without having
to use the menu to create the test spreadsheet.
Document, document, document! Put comments in your code before, during,
and after you implement it. Does your
code actually do what your comments say they do?
How Your Project Will Be Graded
As usual,
completeness, correctness, and style (readability, documentation) will be
important criteria for your grade. The
clarity of your user manual and report will also factor heavily into your
grade, since they will help us understand your code.
Unless there is some
exceptional reason, both people in a two-person team will receive the same
grade.
Bonus
(up to 15 points)
Here are some bonus points ideas. If you have some other ideas, talk to the
instructor or the TAs to determine how much work is involved (which will
correlate with how many bonus points it will be worth). Do
not attempt bonus work until you have the "regular" part of the
project working, tested, and documented.
(10 points) If you are a one-person team, you
can add functionality to make it work like the two-person team project.
(2 points) Improve the user interface in some significant way. For example, change the code so that the
user can type in an expression such as "5+A0" with no spaces between
the tokens. Another possibility is to print
a useful error message any time the user enters a bad formula or bad cell
reference. (Currently, the code is
inconsistent about how it handles errors.
However, if the user types in a legal cell reference or formula, then it
will act correctly.)
(2 points) Add the "Save Spreadsheet to a
file" command to the menu.
(5 points) Add exponentiation to the operations
allowed in expressions. Exponentiation
uses the caret ("^") and has higher priority than * or /. You will need to modify the expression parser,
the code that generates the expression tree, and, of course, your expression
tree evaluator.
(10 points) Implement other operators in a
formula that can act on more than two operands. For example, you might define "AVG (cell1, cell2,
cell3)" to be the average of the values in cell1, cell2, and cell3. You would have to change large parts of the
code, including the expression parser. This is hard.