CHAPTER 6: Some more on Strings, and Arrays of Strings


   Well, let's go back to strings for a bit.  In the following
all assignments are to be understood as being global, i.e. made
outside of any function, including main.

   We pointed out in an earlier chapter that we could write:

   char my_string[40] = "Ted";

which would allocate space for a 40 byte array and put the string
in the first 4 bytes (three for the characters in the quotes and
a 4th to handle the terminating '\0').

    Actually, if all we wanted to do was store the name "Ted" we
could write:

      char my_name[] = "Ted";

and the compiler would count the characters, leave room for the
nul character and store the total of four characters in memory,
the location of which would be given by the array name, in this
case my_name.
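
    As a quick check of the counting described above, here is a
minimal sketch that asks the compiler how much room it set aside.
The helper name name_array_size is made up for this illustration
and is not part of the original text.

```c
#include <stddef.h>

char my_name[] = "Ted";   /* compiler sizes the array: 'T' 'e' 'd' '\0' */

/* Hypothetical helper: reports how many bytes the compiler reserved. */
size_t name_array_size(void)
{
    return sizeof my_name;   /* size of the whole array, nul included */
}
```

    Called from main, name_array_size() should report 4, and
my_name[3] holds the terminating nul.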

    In some code, instead of the above, you might see:

     char *my_name = "Ted";

which is an alternate approach.  Is there a difference between
these?  The answer is yes.  Using the array notation, 4 bytes of
storage in the static memory block are taken up, one for each
character and one for the nul character.  But in the pointer
notation the same 4 bytes are required, _plus_ N bytes to store
the pointer variable my_name (where N depends on the system but
is usually a minimum of 2 bytes and can be 4 or more).

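    In the array notation, my_name is a constant (not a
variable): it always refers to the same block of storage and
cannot be assigned to.  In the pointer notation my_name is a
variable, which can later be made to point at some other string.
As to which is the _better_ method, that depends on what you are
going to do within the rest of the program.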
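
    The difference can be sketched in a few lines.  The names
name_array, name_ptr, retarget and lowercase_first below are
invented for this illustration.  The pointer is declared const
char * here (my choice, not the text's) because writing through a
pointer to a string literal is undefined behavior.

```c
#include <stddef.h>

char name_array[] = "Ted";      /* 4 bytes; the name cannot be assigned to */
const char *name_ptr = "Ted";   /* a pointer variable plus a 4 byte literal */

/* Hypothetical helper: the pointer can be aimed at another string. */
const char *retarget(const char *new_name)
{
    name_ptr = new_name;        /* legal: name_ptr is a variable */
    /* name_array = new_name;      would not compile: arrays are not assignable */
    return name_ptr;
}

/* The contents of the array, on the other hand, may be changed in place. */
void lowercase_first(void)
{
    name_array[0] = 't';
}

/* Storage taken by each object itself. */
size_t array_bytes(void)   { return sizeof name_array; }  /* the characters */
size_t pointer_bytes(void) { return sizeof name_ptr; }    /* the N bytes of the text */
```

    Here array_bytes() gives 4, while pointer_bytes() gives
whatever a pointer occupies on the machine at hand.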

    Let's now go one step further and consider what happens if
each of these definitions is made within a function as opposed
to globally outside the bounds of any function.

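void my_function_A(char *ptr)
{
  char a[] = "ABCDE";   /* automatic array, initialized on each call */
  /* ... */
}

void my_function_B(char *ptr)
{
  char *cp = "ABCDE";   /* automatic pointer to a string literal */
  /* ... */
}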

    Here we are dealing with automatic variables in both cases.
In my_function_A the automatic variable is the character array
a[]. In my_function_B it is the pointer cp.  While C is designed
in such a way that a stack is not required on those processors
which don't use them, my particular processor (80286) has a
stack.  I wrote a simple program incorporating functions similar
to those above and found that in my_function_A the 5 characters
in the string were all stored on the stack.  On the other hand,
in my_function_B, the 5 characters were stored in the data space
and the pointer was stored on the stack.

    By making a[] static I could force the compiler to place the
5 characters in the data space as opposed to the stack.  I did
this exercise to point out just one more difference between
dealing with arrays and dealing with pointers.  By the way, array
initialization of automatic variables as I have done in
my_function_A was illegal in the older K&R C and only "came of
age" in the newer ANSI C, a fact that may be important when one
is considering portability and backwards compatibility.
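
    One visible consequence of where the array lives can be
sketched without looking at the stack at all.  The function names
bump_automatic and bump_static are invented for this example, and
the expected letters assume an ASCII style character set in which
'B' directly follows 'A'.

```c
/* Automatic array: a fresh copy of "ABCDE" is made on every call. */
char bump_automatic(void)
{
    char a[] = "ABCDE";
    a[0]++;              /* changes this call's private copy only */
    return a[0];
}

/* Static array: a single copy, initialized once, kept between calls. */
char bump_static(void)
{
    static char a[] = "ABCDE";
    a[0]++;              /* the change persists into the next call */
    return a[0];
}
```

    Repeated calls to bump_automatic() keep returning the same
letter, while each call to bump_static() returns the next letter
after the one it returned before.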

    As long as we are discussing the relationship/differences
between pointers and arrays, let's move on to multi-dimensional
arrays.  Consider, for example, the array:

    char multi[5][10];

    Just what does this mean?   Well, let's consider it in the
following light.

        char multi[5][10];
        ^^^^^^^^^^^^^

    If we take the first, underlined, part above and consider it
to be a variable in its own right, we have an array of 10
characters with the "name"  multi[5].  But this name, in itself,
implies an array of 5 somethings.  In fact, it means an array of
five 10 character arrays.  Hence we have an array of arrays.  In
memory we might think of this as looking like:

      multi[0] = "0123456789"
      multi[1] = "abcdefghij"
      multi[2] = "ABCDEFGHIJ"
      multi[3] = "9876543210"
      multi[4] = "JIHGFEDCBA"

with individual elements being, for example:

      multi[0][3] = '3'
      multi[1][7] = 'h'
      multi[4][0] = 'J'

    Since arrays are stored contiguously, our actual memory block
for the above should look like:

    "0123456789abcdefghijABCDEFGHIJ9876543210JIHGFEDCBA"

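    The contiguous layout can be checked with a short sketch.
The helper rows_are_contiguous is a name made up for this
illustration.  It copies the five rows shown above into multi
(10 characters each, with no terminating nul) and then compares
the whole block with the single 50 character string.

```c
#include <string.h>

char multi[5][10];

/* Hypothetical helper: returns 1 if the five rows sit back to back. */
int rows_are_contiguous(void)
{
    static const char *rows[5] = {
        "0123456789", "abcdefghij", "ABCDEFGHIJ",
        "9876543210", "JIHGFEDCBA"
    };
    int i;

    for (i = 0; i < 5; i++)
        memcpy(multi[i], rows[i], 10);   /* 10 chars, nul not copied */

    /* sizeof multi is 5 * 10 = 50 bytes */
    return memcmp(multi,
                  "0123456789abcdefghijABCDEFGHIJ9876543210JIHGFEDCBA",
                  sizeof multi) == 0;
}
```

    After the call, multi[1][7] is 'h' and multi[4][0] is 'J',
matching the individual elements listed earlier.
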
    Now, the compiler knows how many columns are present in the
array so it can interpret multi + 1 as the address of the 'a' in
the 2nd row above.  That is, it adds 10, the number of columns,
to get this location.  If we were dealing with integers and an
array with the same dimensions the compiler would add
10*sizeof(int) which, on my machine, would be 20.  Thus, the
address of the '9' in the 4th row above would be &multi[3][0] or
*(multi + 3) in pointer notation.  To get to the content of the
2nd element in that row (multi[3][1]) we add 1 to this address
and dereference the result as in

    *(*(multi + 3) + 1)

    With a little thought we can see that:

    *(*(multi + row) + col)    and
    multi[row][col]            yield the same results.

    The following program illustrates this using integer arrays
instead of character arrays.

------------------- program 6.1 ----------------------
#include <stdio.h>

#define ROWS 5
#define COLS 10

int multi[ROWS][COLS];

int main(void)
{
  int row, col;
  for (row = 0; row < ROWS; row++)
    for(col = 0; col < COLS; col++)
      multi[row][col] = row*col;
  for (row = 0; row < ROWS; row++)
    for(col = 0; col < COLS; col++)
    {
      printf("\n%d  ",multi[row][col]);
      printf("%d ",*(*(multi + row) + col));
    }
  return 0;
}
----------------- end of program 6.1 ---------------------
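
    The row arithmetic described before program 6.1 can also be
checked directly.  The following is only a sketch; the helper
names row_stride_bytes and same_element are invented here.  The
first measures how far multi + 1 lies from multi, and the second
compares the address produced by each notation.

```c
#include <stddef.h>

#define ROWS 5
#define COLS 10

int multi[ROWS][COLS];

/* Bytes between the start of row 0 and the start of row 1. */
size_t row_stride_bytes(void)
{
    return (size_t)((char *)(multi + 1) - (char *)multi);
}

/* Hypothetical helper: 1 if both notations name the same element. */
int same_element(int row, int col)
{
    return &multi[row][col] == *(multi + row) + col;
}
```

    On any machine row_stride_bytes() should equal
COLS*sizeof(int), which is the 20 mentioned earlier when an int
occupies 2 bytes.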

    Because of the double de-referencing required in the pointer
version, the name of a 2 dimensional array is often loosely said
to be equivalent to a pointer to a pointer.  Strictly speaking it
is not one: in an expression, multi converts to a pointer to its
first row (a pointer to an array of COLS integers), and it is
dereferencing that row which yields a pointer to an integer.
With a three dimensional array we would be dealing with an array
of arrays of arrays and three levels of dereferencing.  Note,
however, that here we have initially set aside the block of
memory for the array by defining it using array notation.  Hence,
we are dealing with a constant, not a variable.  That is, we are
talking about a fixed pointer, not a variable pointer.  The
dereferencing expression used above permits us to access any
element in the array of arrays without the need of changing the
value of that pointer (the address of multi[0][0] as given by the
symbol "multi").
