CSE 303, Autumn 2009
Homework 3: digits
Due Thursday, October 29, 2009, 11:30 PM
100 points total
This assignment is inspired by a similar assignment previously used in
CSE 303, which was itself adapted from one by Professor Steve Wolfman
of UBC.
This assignment has one objective: helping you get used to basic C
programming.
You will write a C program
digits.c to explore a phenomenon called Benford's Law, which describes
a surprising pattern in the frequency of
occurrence of the digits 1-9 as the first digits of natural data.
(For example, in the number 328905, the first digit is 3.) You might
expect each digit to occur with equal frequency in arbitrary data.
Indeed, in truly random data over appropriate ranges,
each digit does appear with equal frequency. However, a
substantial amount of data from diverse sources does not
exhibit a uniform distribution. If you are curious about the
distribution, Benford's Law, and the process by which it was
discovered, check out the Wikipedia page about it.
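For reference, the usual statement of Benford's Law gives the expected proportion of values whose leading digit is d as shown below. The terms telescope to 1, and the law predicts that 1 leads about 30.1% of the time while 9 leads only about 4.6% of the time. This is background only; your program reports the observed counts, not these expected values.

```latex
P(d) = \log_{10}\left(1 + \frac{1}{d}\right), \qquad d = 1, 2, \ldots, 9

\sum_{d=1}^{9} \log_{10}\frac{d+1}{d} = \log_{10}\frac{10}{1} = 1,
\qquad P(1) = \log_{10} 2 \approx 0.301,
\qquad P(9) = \log_{10}\frac{10}{9} \approx 0.046
```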
Your program digits.c examines integer data, counts the first digits of
those integers, and then outputs statistics. It must read its input
from the console (standard input) using the scanf function. Read
integers, one per line, until you see the value -1, at which point your
program should output its statistics and exit. For full credit, you
should use an array appropriately to count the first digits.
The following example file, enrollment.txt, shows the number of
students enrolled at various major universities (UW is the first data
point, with 28,570). Notice that the file ends with -1 to signify the
end of the input. No number starting with 0, including 0 itself, can
appear in the file.
28570
12176
5476
543
3490
24892
28619
2595
603
2527
1465
1858
-1
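One possible shape for the reading loop is sketched below, assuming every value fits in an int. The helper names first_digit and count_first_digits are hypothetical, not part of the assignment, and the loop also stops at end-of-file, a defensive choice the spec does not require. Only the counters are kept, never the input values themselves, so the number of inputs does not matter (apart from the counters eventually overflowing).

```c
#include <stdio.h>

/* Hypothetical helper: returns the leading digit of a positive
   integer n, e.g. 3 for 328905. */
int first_digit(int n) {
    while (n >= 10) {
        n /= 10;   /* drop the last digit until one digit remains */
    }
    return n;
}

/* Hypothetical helper: reads integers from standard input until -1
   (or end of input) and tallies leading digits into counts[1] through
   counts[9]. counts[0] stays 0 because no input may start with 0.
   Returns the total number of integers read. */
long count_first_digits(long counts[10]) {
    int n;
    long total = 0;
    int i;

    for (i = 0; i < 10; i++) {
        counts[i] = 0;
    }
    while (scanf("%d", &n) == 1 && n != -1) {
        counts[first_digit(n)]++;
        total++;
    }
    return total;
}
```

Indexing the array by the digit itself keeps the counting to one line and avoids a chain of if statements.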
If your program were compiled into an executable named digits and run
with the data above (placed in a file and redirected to standard
input), the output would be:
$ ./digits < enrollment.txt
Digit   Count   Percent   Histogram
    1       3    25.00%   ************
    2       5    41.67%   ********************
    3       1     8.33%   ****
    4       0     0.00%
    5       2    16.67%   ********
    6       1     8.33%   ****
    7       0     0.00%
    8       0     0.00%
    9       0     0.00%
TOTAL      12
You must match the above output format exactly (no, the horizontal
lines are not included in the output). The columns are separated by
exactly 3 spaces. The 'Digit' column is exactly 5 spaces wide. The
'Count' column is exactly 5 spaces wide, with each count value
right-aligned within the column. The 'Percent' column is exactly 7
spaces wide, with each percentage value right-aligned within the column
and displayed with exactly 2 digits after the decimal point. The
'Histogram' column should show exactly one * (star) character for every
full 2% for that row's digit. For example, since the digit 2 above has
41.67%, there are 20 stars. You may assume valid input: the input to
your program will consist entirely of positive integers followed by -1.
You may not make any assumption about the number of integers to read.
It could be very few, none at all, or a very large number -- yes, even
larger than fits in an int or a long or a ....
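The column rules above map directly onto printf field widths; the sketch below is one way to express them. The names star_count, print_row, and print_table are hypothetical. Two assumptions are worth checking against the spec: an empty input (total of 0) is printed as 0.00% with no stars, and the 3-space separator is printed even when a row has no stars. The star count uses integer arithmetic, since count * 50 / total equals the floor of percent / 2 and avoids floating-point rounding at exact multiples of 2%.

```c
#include <stdio.h>

/* Hypothetical helper: one '*' per full 2% of the total. */
long star_count(long count, long total) {
    if (total == 0) {
        return 0;   /* assumption: empty input prints no stars */
    }
    return count * 50 / total;
}

/* Hypothetical helper: prints one row with a 5-wide digit, a 5-wide
   count, and a 7-wide percent (6 characters for the number plus the
   '%' sign), each separated by exactly 3 spaces. */
void print_row(int digit, long count, long total) {
    double percent = 0.0;
    long stars = star_count(count, total);
    long i;

    if (total > 0) {
        percent = 100.0 * count / total;
    }
    printf("%5d   %5ld   %6.2f%%   ", digit, count, percent);
    for (i = 0; i < stars; i++) {
        printf("*");
    }
    printf("\n");
}

/* Hypothetical helper: prints the header, the rows for digits
   lo through 9, and the TOTAL line. */
void print_table(long counts[10], int lo, long total) {
    int d;

    printf("Digit   Count   Percent   Histogram\n");
    for (d = lo; d <= 9; d++) {
        print_row(d, counts[d], total);
    }
    printf("TOTAL   %5ld\n", total);
}
```

For digits, lo would be 1, because the first digit is never 0.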
After you complete this program, you will write a variation of it,
digits2, which takes one command-line argument indicating how many
digits of each integer in the input should be examined and analyzed;
it still reads its data from standard input. That is, digits2 1 should
be identical to digits, as the 1 says to analyze only the first digit.
digits2 3, however, would analyze the first, second, and third digits
of each input entry. For example, the output of digits2 4 on the above
input file would be:
1 Digit   Count   Percent   Histogram
    1       3    25.00%   ************
    2       5    41.67%   ********************
    3       1     8.33%   ****
    4       0     0.00%
    5       2    16.67%   ********
    6       1     8.33%   ****
    7       0     0.00%
    8       0     0.00%
    9       0     0.00%
TOTAL      12
2 Digit   Count   Percent   Histogram
    0       1     8.33%   ****
    1       0     0.00%
    2       1     8.33%   ****
    3       0     0.00%
    4       5    41.67%   ********************
    5       2    16.67%   ********
    6       0     0.00%
    7       0     0.00%
    8       3    25.00%   ************
    9       0     0.00%
TOTAL      12
3 Digit   Count   Percent   Histogram
    0       0     0.00%
    1       1     8.33%   ****
    2       1     8.33%   ****
    3       2    16.67%   ********
    4       0     0.00%
    5       2    16.67%   ********
    6       2    16.67%   ********
    7       1     8.33%   ****
    8       1     8.33%   ****
    9       2    16.67%   ********
TOTAL      12
4 Digit   Count   Percent   Histogram
    0       1    10.00%   *****
    1       1    10.00%   *****
    2       0     0.00%
    3       0     0.00%
    4       0     0.00%
    5       2    20.00%   **********
    6       1    10.00%   *****
    7       3    30.00%   ***************
    8       1    10.00%   *****
    9       1    10.00%   *****
TOTAL      10
Note a couple of things. First, the digit 0 can appear anywhere except
in the first position: the first table includes only digits 1-9, but
the remaining tables include 0-9. Second, if you are examining the nth
digit (where 1 <= n <= argument) and come across a number with fewer
than n digits, that number is not counted for that position. Because
the above input file has two 3-digit numbers in it, there are only 10
instances of a "4 Digit" in the final table. You can assume the
argument is an integer between 1 and 9.
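Extracting the nth digit from the left, and skipping numbers that are too short, can be sketched as follows. The names digit_count, nth_digit, and record_digits are hypothetical. The layout counts[position - 1][digit] is also an assumption; it relies on the argument being at most 9, so a 9-by-10 array is enough. Like the previous sketches, this uses only arrays, not explicit pointer variables.

```c
/* Hypothetical helper: number of decimal digits in a positive integer. */
int digit_count(int n) {
    int len = 1;

    while (n >= 10) {
        n /= 10;
        len++;
    }
    return len;
}

/* Hypothetical helper: returns the digit at 1-based position
   `position` counted from the left (position 1 is the leading digit),
   or -1 if n has fewer than `position` digits. */
int nth_digit(int n, int position) {
    int len = digit_count(n);
    int i;

    if (position > len) {
        return -1;
    }
    for (i = 0; i < len - position; i++) {
        n /= 10;   /* strip the digits to the right of the one we want */
    }
    return n % 10;
}

/* Hypothetical helper: for one input value, increments
   counts[p - 1][d] and totals[p - 1] for each position p from 1 to
   max_digits, skipping positions that the value does not have. */
void record_digits(long counts[][10], long totals[], int n, int max_digits) {
    int p;

    for (p = 1; p <= max_digits; p++) {
        int d = nth_digit(n, p);
        if (d >= 0) {
            counts[p - 1][d]++;
            totals[p - 1]++;
        }
    }
}
```

In main, the argument could be converted with atoi(argv[1]) from <stdlib.h> before the reading loop. Each table is then printed from its row of counts, starting at digit 1 for position 1 and at digit 0 for later positions.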
Your program should produce no errors or warnings from gcc -Wall. You
should not use pointers on this assignment. In terms of grading, most
of the points will come from the correctness of your programs, although
some points will also come from the style, design, and appropriate
simplicity of your code. You should not use a more complex command or
control structure when a simpler one would achieve the same result.
You should reduce redundancy when reasonable.
Turn in your work via the dropbox as a single file, hw3.tar.gz,
containing both digits.c and digits2.c.