CSE 374, Lecture 9: Intro to C, continued

Arrays

We briefly introduced pointers and arrays last lecture. An array is a contiguous chunk of memory holding elements that all have the same type. You can declare and access an array using the square bracket syntax like we used in Java.

    // Declaring an array of 3 ints.
    int arr[3];
    // Set 2nd element of the array to 123.
    arr[1] = 123;

                ---------------------------------
    memory ... | | | | | | ? | 123 | ? | | | | | | ...
                ---------------------------------
                           ^
                index in address space = 3840

                ------
           arr | 3840 |
                ------

What's actually going on in memory, and what does "arr" actually refer to? The array is a contiguous chunk of memory, and in most expressions "arr" evaluates to the ADDRESS of the START of that chunk of memory. This is important: when you use "arr", you are usually not working with a copy of the array itself, just with a pointer for where to FIND the beginning of the array.

Gotcha: arrays that are declared on the stack (like in the example here) must have a constant size so that the compiler knows how much space to reserve for them. If you need a dynamically-sized array or a very large array, you should store the array on the heap (more in a future lecture).

Since using an array's name gives you an address, which is also what a pointer stores, pointers can be used to refer to arrays. Assigning an array to a pointer like this is called "implicit conversion".

    int array[100];
    int *arrayPtr = array;
    arrayPtr[10] = 123;

With either syntax, you use the "[10]" bracket notation to indicate an offset from that starting address. People sloppily say that "arrays and pointers are the same thing in C" - this is not exactly true but you can use them interchangeably most of the time.
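To make the "offset from the starting address" idea concrete, here is a minimal sketch. The helper names get_with_pointer and get_with_brackets are made up for illustration; the point is only that "start[i]" and "*(start + i)" read the same element.

```c
#include <stddef.h>

// Hypothetical helper: reads element i using pointer arithmetic.
// "start + i" moves i ELEMENTS (not bytes) past the starting address.
int get_with_pointer(const int *start, size_t i) {
  return *(start + i);
}

// Hypothetical helper: reads the same element using bracket notation.
int get_with_brackets(const int *start, size_t i) {
  return start[i];
}
```

Both helpers accept either an array or a pointer to its first element, since the array is implicitly converted to a pointer when it is passed in.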

Gotcha: arrays are not initialized with 0's like they are in Java. The array will store whatever random bits were already present at that position in memory. If you want the array to be truly empty, you will have to store 0's in the array manually.
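Here is a rough sketch of two ways to store 0's in an existing array. The function names zero_with_loop and zero_with_memset are made up for this example; memset is a real function from string.h.

```c
#include <stddef.h>
#include <string.h>

// Hypothetical helper: stores 0 in each element, one at a time.
void zero_with_loop(int *arr, size_t len) {
  for (size_t i = 0; i < len; i++) {
    arr[i] = 0;
  }
}

// Hypothetical helper: sets every BYTE of the array to 0 with memset.
// The third argument is a size in bytes, so we multiply by sizeof(int).
void zero_with_memset(int *arr, size_t len) {
  memset(arr, 0, len * sizeof(int));
}
```

If you know you want zeros at the moment you declare the array, you can also write "int arr[3] = {0};", which sets every element to 0.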

Functions can take array types as parameters.

    void foo(int arr[], int len) {
      for (int i = 0; i < len; i++) {
        printf("Element %d is %d\n", i, arr[i]);
      }
    }

Gotcha: arrays in C don't have a built-in length like those in Java. You have to know how many elements there are so you don't "run off the end". A common idiom is to pass the length of an array along with the array itself to a function. What would happen if we tried to access arr[1000] if the array had a size of 100? Who knows, but it probably wouldn't be good! Your program would probably crash, but it isn't certain.
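As a small sketch of the "pass the length along" idiom, the function below (sum_ints is a made-up name) only ever touches indices 0 through len-1, so it cannot run off the end as long as the caller passes the right length.

```c
#include <stddef.h>

// Hypothetical helper: adds up the first len elements of arr.
// The function has no way to discover the length on its own, so the
// caller must supply it.
int sum_ints(const int arr[], size_t len) {
  int total = 0;
  for (size_t i = 0; i < len; i++) {
    total += arr[i];
  }
  return total;
}
```

In the function where the array is declared, "sizeof(nums) / sizeof(nums[0])" gives the number of elements in an array named nums. This does not work inside sum_ints, because there the parameter is just a pointer.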

Strings

There is no real "string" type in C. Strings are just arrays of characters. However, the final character in every string will be the null character (0 as an integer); this means that proper strings in C have a defined length and you don't need to pass around the length of the string in addition to the character array itself.

    [ 'h', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd', '!', '\0' ]
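To see why the null character gives a string a defined length, here is a sketch of how a length function can work. The standard library already provides strlen in string.h; my_strlen is a made-up name for an illustrative re-implementation, not the library's actual code.

```c
#include <stddef.h>

// Hypothetical helper: counts characters until it reaches the '\0'
// terminator. The terminator itself is not counted.
size_t my_strlen(const char *s) {
  size_t n = 0;
  while (s[n] != '\0') {
    n++;
  }
  return n;
}
```

Note that the array holding a string needs one more slot than the string's length, to make room for the terminator.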

To declare a new string, you have a couple of options:

    char str[] = "hello";  // array syntax
    char *str2 = "hello";  // pointer syntax

How would we declare an array of strings? The type of an array of strings can be either "char* arr[]" or "char** arr" - you can use either notation for an array.

    char *arrStr[] = {"ant", "bee"};  // array containing char*'s
    char **arrStrPtr = arrStr;        // pointer to the start of an array containing char*'s
    arrStr[0] = "cat";

This is in fact the argument that we saw to "main" in our hello world program! An array of strings. To test this, we can print out the arguments to our main program:

    int main(int argc, char **argv) {
      for (int i = 0; i < argc; i++) {
        printf("%s\n", argv[i]);
      }
    }

Control constructs

As we just saw, we can use for loops in C just like we did in Java. In fact, we can also use while loops, if statements, breaks, continues, and switches.

Printing

In our hello world program, we also used the C library function "printf". This prints the given string to stdout. The "f" in "printf" actually stands for "format": we can do fancier things with printf:

    int x = 5;
    char *str = "The secret number is";
    printf("%s %d\n", str, x);
    // Prints:
    // The secret number is 5
    printf("%25s %d\n", str, x);
    // Prints:
    //      The secret number is 5

You can read more about printf on the cplusplus.com website: http://www.cplusplus.com/reference/cstdio/printf/

There are many "format specifiers" that you can provide within the string that you pass to printf. In this case we see that if we have "%s" in the format, then we can provide printf with an additional argument that holds a string, and printf will insert that string into what it prints. Similarly, "%d" represents an integer that we can provide as an argument. Finally, format specifiers can take extra options, such as the width in "%25s", which pads the output with spaces on the left so that it takes up at least 25 characters. Look at the reference document to learn more about format specifiers for printf.
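One way to experiment with format specifiers is snprintf, a standard function that takes the same format string as printf but writes into a character array instead of stdout. The sketch below wraps it in a made-up helper, format_padded, so that the padding can be inspected character by character.

```c
#include <stdio.h>

// Hypothetical helper: formats a label and a number into buf using the
// same "%25s %d" format described above. Returns what snprintf returns:
// the number of characters the full output needs (not counting '\0').
int format_padded(char *buf, size_t size, const char *label, int value) {
  return snprintf(buf, size, "%25s %d", label, value);
}
```

With the 20-character label "The secret number is" and the value 5, the result is 5 spaces of padding, the label, one space, and the digit: 27 characters in total.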

Gotcha: what happens if printf has more format specifiers than additional arguments? Who knows, but it would probably be bad. Always make sure that the number of arguments to printf matches the number of format specifiers.

Scanning

On the other side of printf is scanf, which is a way for C programs to get data from the standard input stream. Similarly to printf, you use format specifiers to determine how to parse the input.

    int num;
    char str[21];
    int numInputsReadProperly = scanf("%20s %d", str, &num);

In this example, scanf will halt the program and wait for the user to type and then press enter. It will then parse the values (separated by spaces) and put the first one into the string pointed to by the char* "str" pointer and put the second one into the integer pointed to by the "&num" pointer. It will return the number of inputs that it read successfully (for instance, if the user does not enter an integer, then it will return 1 instead of 2).
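The return value is easiest to explore with sscanf, a standard function that works like scanf but parses a string you give it instead of waiting on stdin. This is only a sketch; the helper name parse_word_and_number is made up.

```c
#include <stdio.h>

// Hypothetical helper: parses "word number" out of input using the same
// format as the scanf example. Returns how many of the two values were
// successfully filled in.
int parse_word_and_number(const char *input, char word[21], int *num) {
  return sscanf(input, "%20s %d", word, num);
}
```

Given "apples 42" this returns 2. Given "apples many" it returns 1, because the word was read but "many" is not an integer. Always check the return value before using the variables, since any value that was not read is left untouched.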

Why does the string "str" hold 21 characters while scanf only allows 20? We need to be able to hold the \0-terminator to the string in addition to 20 characters!

Gotcha: scanf has many problems: you have to be very careful with the size of the string you are reading into, and "%s" can't read in strings that contain spaces. There are other functions that you can use to read in input, such as fgets.
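As a rough sketch of the fgets alternative, the helper below (read_line is a made-up name) reads a whole line, spaces included, and never writes more than the buffer can hold. fgets and strcspn are real standard library functions; everything else is an assumption for illustration.

```c
#include <stdio.h>
#include <string.h>

// Hypothetical helper: reads one line from "in" into buf, which has room
// for "size" bytes. Returns 1 on success, 0 on end-of-input or error.
int read_line(FILE *in, char *buf, int size) {
  // fgets reads at most size-1 characters and always adds the '\0'.
  if (fgets(buf, size, in) == NULL) {
    return 0;
  }
  // fgets keeps the newline if the whole line fit; cut the string there.
  buf[strcspn(buf, "\n")] = '\0';
  return 1;
}
```

To read from the keyboard, you would pass stdin as the first argument, as in "read_line(stdin, buf, sizeof(buf))".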

Preprocessor + define

Last lecture, we saw that lines starting with "#" are directions to the "preprocessor" (a part of the compiler - more later). The "#include" directive, for instance, finds a file and "pastes" it into the program. It does this RECURSIVELY for every file that the included file depends on - so if, in this example, stdio.h itself included a file called stdother.h, then stdother.h would also be pasted in.

    #include <stdio.h>
    #include "foo.h"

Notice that we have two formats of includes: <file.h> and "file.h". What's the difference? We use angle brackets for files in the C standard libraries. We use quotes for our own files, which are typically searched for starting in the same directory as the file that includes them.

Another thing that the preprocessor can do is define and substitute constants. We call this a "macro" and it can do more than just numeric constants, but for now we'll use it for constants. In this example, the preprocessor will replace ALL instances of "FOO" in the file with "17".

    #define FOO 17

    void f() {
      for (int i = 0; i < FOO; i++) {
        printf("Hello, world!\n");
      }
    }

Be careful with this, however: since it will replace ALL instances of the name, if you accidentally name a variable with the same name, things won't compile. We typically use all-caps for constants/macros for this reason:

    #define foo 17
    void f() {
      int foo = 5+foo+foo;  // won't compile: int 17 = 5+17+17;
    }

Booleans

Although C has many familiar types from Java, it does NOT have a built-in "boolean" true/false type like Java's. We have to make pseudo-booleans. If the type of a variable is a number or character, then 0 will be "false" and anything else will be "true". If the type of a variable is a pointer, then NULL will be "false" and any other pointer will be "true". NULL is a special value for pointers that means "no pointer". (C99 added a "bool" type in the header stdbool.h, but it is still just a small integer type where false is 0 and true is 1.)
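A minimal sketch of these rules in action: the made-up helper both_truthy uses a pointer and an int directly as conditions, with no comparison operators.

```c
#include <stddef.h>

// Hypothetical helper: returns 1 ("true") if ptr is not NULL and n is
// not 0, otherwise returns 0 ("false").
int both_truthy(const int *ptr, int n) {
  if (ptr && n) {
    return 1;
  }
  return 0;
}
```

Note that any non-zero number counts as "true", including negative numbers, while comparison operators like ">" always produce exactly 1 or 0.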

Declaration vs definition

In C, there is a difference between DEFINING a function or variable and DECLARING a function or variable. We're used to defining functions:

    void foo(int x) {
      printf("%d\n", x);
    }

A "declaration" on the other hand introduces a name and describes its properties (such as return type and arguments) but does not actually create it.

    void foo(int x);

A function can be DECLARED as many times as you like (although usually only once per file). However, when a program is compiled, the DEFINITION must be present once and only once.

Another requirement of C is that functions and variables must be declared before they are used. That declaration can be either a full definition or just a declaration of the type.

    // THIS IS LEGAL - foo is defined before it is used in main.
    void foo(int x) {
      printf("%d\n", x);
    }
    int main(int argc, char **argv) {
      foo(5);
    }

    // --------------------------------------------------------
    // THIS IS LEGAL - foo is declared before it is used in main
    //                 and the definition is made after main.
    //                 We call this a "forward declaration".
    void foo(int x);
    int main(int argc, char **argv) {
      foo(5);
    }
    void foo(int x) {
      printf("%d\n", x);
    }

    // --------------------------------------------------------
    // THIS IS ILLEGAL - foo is used in main before it is declared or defined.
    int main(int argc, char **argv) {
      foo(5);
    }
    void foo(int x) {
      printf("%d\n", x);
    }

Summary of declaration vs definition:

A declaration introduces a name and describes its type; declarations of the same name can happen as many times as you want (although usually only once per file).
A definition is the actual creation and implementation; it must happen exactly once in the entire program.
Declarations of shared things are usually put in header files (such as stdio.h).
Declaration of a name MUST happen before the thing is used. If you'd like to place the implementation AFTER the first use of the name, then you can use a "forward declaration" and declare the type of the name before the first use and define the name afterwards.
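Sometimes a forward declaration is not just a matter of style but the only option. In this sketch (the function names is_even and is_odd are made up), each function calls the other, so no ordering of the two definitions alone would put both names before their first use.

```c
// Forward declaration: is_odd is used inside is_even below, before the
// definition of is_odd appears.
int is_odd(unsigned int n);

// Returns 1 if n is even, 0 otherwise.
int is_even(unsigned int n) {
  if (n == 0) {
    return 1;
  }
  return is_odd(n - 1);
}

// The single definition of is_odd. Returns 1 if n is odd, 0 otherwise.
int is_odd(unsigned int n) {
  if (n == 0) {
    return 0;
  }
  return is_even(n - 1);
}
```

This is a toy way to test parity (the "%" operator is the practical choice), but it shows the declaration and the definition of is_odd living in two different places in the same file.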