Variables, Data Types and Arrays

Document: Software Engineering 1: Course Notes

The Dreaded Array Indexing Bugs

Week 5-7: Chapter 2 pp. 11-19

The newline Character - and other vagaries!

A "simple" variable in C can store or record a single "value". What kind of value depends on the data type. A variable of type int can hold positive or negative integers, up to a limit that depends on the compiler and machine (the C standard guarantees at least 32767); type long variables can hold integers up to a larger limit (at least 2147483647); float variables can hold positive or negative rational numbers, with a precision of about 6 or 7 significant digits on most machines, and a range that is again machine-dependent (at least about 1e37). The type char is a little stranger: its values can be regarded either as small integers (typically with a range from 0 to 255, or from -128 to 127, depending on the compiler) or as ASCII characters, represented in single quotes, e.g., 'a', 'Z', '9', '+', '(' and so on.
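If you want to know the actual limits on your own machine, the standard headers limits.h and float.h record them. The following is only a minimal sketch, assuming a standard C compiler; the function name print_type_limits is made up for illustration and is not part of any library:

```c
#include <float.h>
#include <limits.h>
#include <stdio.h>

/* Print the limits of the basic types as defined by this compiler.
   print_type_limits is a hypothetical name, chosen for this sketch. */
void print_type_limits(void)
{
    printf("int:   %i to %i\n", INT_MIN, INT_MAX);
    printf("long:  %li to %li\n", LONG_MIN, LONG_MAX);
    printf("float: up to about %e, with %i significant digits\n",
           FLT_MAX, FLT_DIG);
    printf("char:  %i to %i\n", CHAR_MIN, CHAR_MAX);
}
```

The numbers printed will differ from one machine to another, which is exactly why the figures quoted for these types are always "about" so much.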

But the basic point remains that one variable can store or record just one value.

Now, if we want to store a whole lot of values (and this is a very common requirement) we could simply declare as many separate, simple, variables as are needed. This is a perfectly satisfactory approach for many purposes.

BUT: in many cases, we don't just want a whole lot of separate values - we want our program to be able to automatically scan or iterate or repeat some operation over all these values. If the values are all stored in separate variables this will be very clumsy, if not impossible, to code. This is so because each repetition or iteration would need to refer to a different variable - with a different name. That means that the C code for each iteration is necessarily different. If I want to do the same thing to three variables called x, y and z, then I will have to program (at least) three separate statements. Granted, the statements will be almost identical - but not quite exactly identical, because the relevant variable name will have to be changed from x to y to z. This means I can't possibly achieve the required repetition with, say, a for statement: a for controls just one substatement, and that substatement would have to name some particular one of the three variables x, y or z.

This may not be too much of a problem if I just want to repeat something over two or three different variables - it will not be too much of a problem simply to copy the relevant C source statements, and edit the separate copies to deal with the different variables as appropriate. But what if the repetition is to be over 50, or 100, or even 1000 different values? Then duplicating, and varying, the C code for each different variable is going to be very laborious, and very error prone.

Well, the C language provides a special mechanism for dealing with this kind of situation. If you want a set of data items, all of the same type, then instead of declaring a separate variable for each one, you can declare a single array variable:

    int foobar[100];

The square brackets on the declaration signal that foobar is some kind of array; the number inside the square brackets says how many elements are in the array; and the type - int in this case - gives the type of the elements. So this declaration creates a single "array" variable, called foobar; but foobar, in turn, is actually made up of a whole lot (100 to be precise) of individual, simple, int variables.

Note that arrays already have an advantage over separate variables, in that they are much more concise to declare. In the example above, if we did not have the array mechanism, we would have had to declare 100 separate variables - something like this:

    int foobar1, foobar2, foobar3, foobar4, foobar5;
    int foobar6, foobar7, foobar8, foobar9, foobar10;

and so on!

Once an array is declared, the individual elements can be accessed by indexing the array name. That is, we can use statements like:

    foobar[10] = 42;
    foobar[24] = foobar[5] * foobar[4] * 163;
    printf("This is element 9 of foobar: %i\n", foobar[9]);

Array indices in C always start at zero. So if an array has 100 elements, the valid indices run from 0 to 99.
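It is worth knowing that C does not check indices for you: using foobar[100] on a 100 element array will compile, but its behaviour is undefined. A minimal sketch of the rule, assuming a 100 element array (the names FOOBAR_SIZE and is_valid_index are invented for this illustration):

```c
#define FOOBAR_SIZE 100   /* hypothetical name for the number of elements */

/* Returns 1 if index may safely be used with an array of
   FOOBAR_SIZE elements, and 0 otherwise. */
int is_valid_index(int index)
{
    return index >= 0 && index < FOOBAR_SIZE;
}
```

So is_valid_index(0) and is_valid_index(99) both yield 1, while is_valid_index(100) and is_valid_index(-1) yield 0.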

Note carefully at this stage that an array is a quite different kind of object from its elements. The kinds of things you can typically do to a complete array are quite different from the kinds of things you can do to its elements. Thus, in the case of foobar above, it is perfectly reasonable to write:

    foobar[0] = foobar[0] + 10;
    printf("%i", foobar[0]);

which has the effect of increasing the value of the zero'th element of foobar by 10. But it would be silly to write:

    foobar = foobar + 10;
    printf("%i", foobar);

foobar itself - the whole array, as opposed to one of its elements - is not a number (even though all its elements are). Adding 10 to a complete array is simply not a sensible or meaningful thing to try to do; in fact, the compiler will reject the assignment to foobar outright, because C does not allow a whole array to be assigned to. Similarly, if we give printf() a format specification, "%i", which signals to it to expect a further argument of type int, then it would be totally confusing to give it a whole array of int values instead.

Are there any operations which it makes sense to carry out or use on a complete array, as opposed to individual elements of it? Well, there are, but they are relatively few. The only one we will have immediate use for is in passing a complete array to a function - somewhat as was attempted above with printf(). But whereas printf() has not been designed to be capable of accepting a whole array as a single argument, it is perfectly possible, and useful, to write your own functions which can accept such array arguments.
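As a rough sketch of what such a function might look like (the name swap_first_and_last and its parameters are purely illustrative), note two things: the function is not told how big the array is, so the number of elements is passed as a separate argument; and the array is not copied, so changes made inside the function are seen by the caller.

```c
/* Swap the first and last elements of an array of count ints.
   The array itself is not copied: the function works directly on
   the caller's array, so the caller sees the change. */
void swap_first_and_last(int values[], int count)
{
    int saved;

    if (count < 2)
        return;               /* nothing to swap */
    saved = values[0];
    values[0] = values[count - 1];
    values[count - 1] = saved;
}
```

With the declaration of foobar given earlier, a call would look like swap_first_and_last(foobar, 100); notice that the array name is written without any index.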

In the example given above, the elements of foobar are simply of type int. But the elements of an array can, themselves, be arrays - and so on. This allows the creation of data objects which can be thought of as multi-dimensional arrays:

    int two_dim_foobar[20][30];

This makes two_dim_foobar an array with 20 elements, where each of these elements is, in turn, an array with 30 elements, and each of these is a simple variable of type int. The elements can then be accessed as you would expect, but now needing two indices to identify a particular, simple, int element:

    two_dim_foobar[10][5] = foobar[6] / foobar[7];

Multidimensional arrays turn out to be useful in many engineering applications. For example, if I am writing a program to plot points on a graph, I could conveniently set up a two-dimensional array to record which points are marked, and which are left blank. Similarly, Alcock's MATMUL program provides a business calculation example where two-dimensional arrays are useful.
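The graph plotting idea might be sketched along the following lines. This is only an illustration under my own assumptions: the 20 by 30 size is borrowed from the declaration above, and the names marked, on_graph, mark_point and is_marked are all invented for the purpose.

```c
#define GRAPH_ROWS 20
#define GRAPH_COLS 30

/* 1 means "point marked", 0 means "blank". Because it is declared
   outside any function, the array starts with every element set to 0. */
static char marked[GRAPH_ROWS][GRAPH_COLS];

/* Returns 1 if (row, col) lies inside the graph, 0 otherwise. */
static int on_graph(int row, int col)
{
    return row >= 0 && row < GRAPH_ROWS && col >= 0 && col < GRAPH_COLS;
}

/* Mark one point; requests outside the graph are silently ignored. */
void mark_point(int row, int col)
{
    if (on_graph(row, col))
        marked[row][col] = 1;
}

/* Returns 1 if the point has been marked, 0 if it is blank or off the graph. */
int is_marked(int row, int col)
{
    return on_graph(row, col) && marked[row][col];
}
```

After mark_point(10, 5), is_marked(10, 5) yields 1, while is_marked(5, 10) still yields 0: the order of the two indices matters.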

Anyway: so far we have just looked at declaring array variables, and accessing particular elements. While this simplifies the declarations somewhat, it is not yet clear that it addresses the original problem - our desire to be able to easily and conveniently repeat a single operation over a whole set of different variables, without having to code a separate statement for each one. Thus, we can now refer to array elements foobar[0] through foobar[99], instead of, say, to the completely separate variables named foobar0, foobar1 and so on down to foobar99. But how does this make it easier to repeat operations over these elements?

The key notion here is that the index of an array is not limited to being just a particular, literal, number, such as in foobar[7]. The index is technically allowed to be an arbitrary expression. Thus we could refer to foobar[2+20], or foobar[(10+3)*2]. This is still not terribly useful. But, in an expression, we can put variables. Thus we can have a program fragment like:

    int foobar[100];
    int index;
    .
    .
    .
    index = 6;
    .
    .
    .
    printf("%i", foobar[index]);

This will have the effect of printing element 6 of foobar. What advantage does this have over simply writing printf("%i", foobar[6])? Not much yet, but now consider this code fragment:

    int foobar[100];
    int index;
    .
    .
    .
    index = 6;
    printf("%i\n", foobar[index]);
    index = index + 1;
    printf("%i\n", foobar[index]);

This has the effect of printing elements 6 and 7 of foobar. Still, so what? The so what is that, if you examine this fragment carefully, you will see that the two printf() statements, even though they print out two different elements of foobar, are actually textually identical statements. Not just "almost" the same, but strictly identical. Their different effects, when they are executed, arise because the effect depends on the value of the variable index at the time of execution - and that actually changes between the two statements in this example.

What's so significant about the two printf() statements being absolutely identical? Well, because of this absolute identity, it is not necessary to actually repeat them in the C source file at all: the two separate invocations of this statement can be collapsed down as the substatement of a for statement as follows:

    for (index = 6; index <= 7; index++)
      printf("%i\n", foobar[index]);

Granted, it is not terribly exciting just to replace a single duplication of a statement. But it becomes terribly useful if we have a large number of duplications. For example, suppose we want to print out all the elements of foobar rather than just elements 6 and 7? Instead of writing something like:

    printf("%i\n", foobar[0]);
    printf("%i\n", foobar[1]);
    .
    .
    .
    printf("%i\n", foobar[99]);

with 100 separate statements, each differing only ever so slightly from the others, we can have a single for statement:

    for (index = 0; index <= 99; index++)
      printf("%i\n", foobar[index]);

Now that's a real benefit.

Make sure you understand that this kind of collapsing down, using an iteration statement like for, only works if there is absolutely no variation in the text of the statement being repeated - although, of course, there can be a large variation in the effect of that text each time it is executed. Go back over the example above, and satisfy yourself that you cannot achieve the same effect - collapsing down to a single for statement - if the program were using 100 separate variables with separate names instead of an array.

Note also that, although I have built up this example using the specific idea that the thing we want to iterate or repeat is the printing out of an element of the array, the concept and argument do not rely in any way on just what it is we want to repeat. Exactly similar considerations would have applied if we wanted to increment each element, or take its square root, or add it into a cumulative total, or whatever.

Alcock's two example programs MATMUL and BUBBLE provide, of course, more comprehensive examples of the same ideas. It is a good idea to at least attempt to see how you might try implementing either of these programs if you did not have the possibility of using arrays open to you, but were forced, instead, to simply declare as many separate, simple, variables as are necessary. You should find that, in that case, you cannot use for statements to repeat the required operations over different variables, and you would therefore be forced to "manually" duplicate blocks of C code many times over, and then edit each copy to deal with slightly different variable(s). The whole thing becomes incredibly cumbersome!
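To make the point about other kinds of repeated operation concrete, here is a minimal sketch of two of them: incrementing each element, and adding each element into a cumulative total. The function names increment_all and total_of are made up for this illustration; the loops have exactly the same shape as the printing loop above, and only the substatement differs.

```c
/* Add amount to every element of values[0] .. values[count - 1]. */
void increment_all(int values[], int count, int amount)
{
    int index;

    for (index = 0; index < count; index++)
        values[index] = values[index] + amount;
}

/* Return the sum of values[0] .. values[count - 1]. */
long total_of(const int values[], int count)
{
    int index;
    long total = 0;

    for (index = 0; index < count; index++)
        total = total + values[index];
    return total;
}
```

For the array foobar, the calls would be increment_all(foobar, 100, 10) and total_of(foobar, 100). Try writing either function for 100 separately named variables and the advantage of the array becomes obvious.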

McMullin@ugmail.eeng.dcu.ie
Wed Mar 15 10:20:49 GMT 1995