Session 12: Week 24/25: Exploring Pointers

Document: Software Engineering 1: Lab Exercises

Software Engineering 1: Laboratory Exercises

Exercise 2: Enhanced Simulation Program (40%)

Note that in this exercise you are not required to generate any new programs of your own - though you may wish to modify the programs you are given.

This exercise is concerned with exploring the use of pointers in C.

Specifically, you will examine programs where pointer variables are created using pointer data types. Thus, for example:

    int i;

declares a variable, i, of type int; whereas the declaration:

    int *i;

declares that i is a variable of type "pointer to int" or (int *). That is, i itself does not hold an int value, but the thing it points at does.

Note that you can happily declare variables which point at other pointer variables. Thus:

    int **i;

declares that i is a variable of type "pointer to pointer to int". The value of i is a pointer; the thing it points at is now another pointer variable; but the thing this points at is an int. And so on!

A pointer value can be produced using the "address-of" operator, denoted with the ampersand character &. Thus, the following code fragment might make sense:

    int a;
    int *ptr;

    ptr = &a;

The net effect is that ptr now points at a.

You can display a pointer value using the %p format specification with printf():

    printf("Address of variable a is: %p\n", (void *)&a);

With our particular computer platform, addresses can be thought of as binary numbers with 32 bits. By convention, they are displayed by printf() in hexadecimal, or base-16, notation (where the digits 0 to 9 are used as normal, but the letters A through F are used to represent the base-16 digits for the numbers 10 to 15). So a typical result from executing the printf() above might be something like:

    Address of variable a is: 0xbbbfc893

The 0x prefix is a standard C notation for showing that a number is in hex; this is then followed by 8 hex digits (since one hex digit represents a number between 0 and 15, it is equivalent to 4 binary digits; so a 32 bit number is equivalent to 8 hex digits). In any case, your only use for looking at pointer values will be to assess whether or not different pointer values are equal - and you can do that without knowing any of the details of interpreting hex notation.

A pointer value is dereferenced with the "dereference" or "indirection" operator, denoted with the asterisk character, *. A dereferenced pointer may be used anywhere a variable name can be used.

Thus, normally, a variable name evaluates as the value of a variable; so a dereferenced pointer normally evaluates as the value of the thing pointed at. But a variable name can also be used in an assignment, in which case its value is altered (overwritten); similarly, a dereferenced pointer can be used in an assignment, and the value of the thing pointed at will be changed.

Note very carefully that, whenever you are dealing with a pointer variable, you must be very clear about whether it is its own value that is being referred to, or the value of the thing pointed at; these are generally very different things (they are usually even of different types, never mind values).

A simple use of dereferencing is this:

    int i, j;
    int *ptr;

    ptr = &i;
    i = 25;
    j = 2 + *ptr;
    *ptr = j + 2;

In this simple fragment, there are two variables of type int, and one of type pointer to int. The variable ptr is given the value &i, so it points at i, and i is given the value 25. When j is given the value 2 + *ptr, this means "2 plus the value of the thing ptr is pointing at", which evaluates as 27. Finally, *ptr, which is to say i, is given the value j + 2, which is to say 29.

Take a copy of the file ptr0.c. This is a basic test program for investigating the idea of pointers. Note that this program (and all the other programs introduced in this session) is totally contrived. It does not do anything useful whatsoever. It is intended purely as a vehicle for you to experiment with C pointer variables and pointer operations.

First analyse the program by hand, and predict what you think the outcome will be. Record this analysis in your report. Then execute it and test your predictions. Add extra printf() statements as you deem necessary in order to check the values of variables at different stages during execution. If the results differ from your predictions then test more carefully until you can explain the discrepancies. Note your results in your report.

Repeat the procedure described above for the file ptr1.c.

Now consider the file ptr_net.c.

This is again a contrived program. It does not do anything useful. It merely serves to illustrate some further ideas on the use of pointers.

To understand this example you will first need some elementary understanding of C structures. Only the basic ideas will be introduced here. For more detailed information consult a textbook.

The idea of a "structure" in C is that you can group a number of distinct data items, of distinct types, together into one unit. That collection of data items can then (for some purposes at least) be manipulated as a unit. In particular, having once described the "format" or "shape" of the unit, one can declare, in a single step, "structured variables" which have separate slots for all the constituent parts or members. Look at this simple example:

    struct
    {
      int age;
      double IQ;
    } barry, john, mary, henry, susan;

struct is a keyword which introduces the declaration of structured variables. This is followed by braces which enclose a detailed description of the members of each structured variable. Finally come the names of the actual variables - five of them in this case. The effect of this declaration is to create five variables, each of which has two distinct members, one called age the other called IQ. To access these members we use the "dot" operator, thus:

    barry.age = 29;
    john.age = 40;
    mary.age = barry.age;
    john.IQ = 150.0;

and so on. But we can also, to a limited extent, manipulate a structured variable as a whole - for example, by assignment:

    henry = john;

This would assign the values of the members of john to each of the corresponding members of henry in turn. Thus, it would be exactly equivalent to:

    henry.age = john.age;
    henry.IQ = john.IQ;

So far, so good. It should be clear that structures might be useful whenever you have information that naturally divides into groups or units with a certain similarity.

The next point is that, if you are using structures at all, it is often handy to be able to declare structured variables (or function parameters for that matter), having a given shape (i.e. a given set of members) in several different places in your program. Now you can certainly do this in the "obvious" way by simply repeating the detailed declaration of all the members in each different place. But it would be nicer if the compiler could let you declare the shape once, and then refer back to it later. And, fortunately, it does - actually in a couple of different ways. But I will just show one:

    struct person
    {
      int age;
      double IQ;
    };

This time I have put an identifier (person) immediately after the struct keyword, and before the left brace. This does not create a structured variable called person. Instead, person acts as a tag or identifier for this particular shape of structured variable. Once I have given this declaration, then, at any subsequent point in the program, I can use simply struct person as a shorthand for the full declaration. So I could write, for example:

    struct person barry, john, mary;

This would then create variables barry, john and mary with the same shape as before. Then somewhere later on (or inside some other function) I might write:

    struct person henry, susan;

Be clear on the advantage of adding an identifier, such as person, to a particular shape of structure: it just saves having to re-enter the detailed description of the shape in several different parts of a program.

One slight difficulty might be evident here: in reading a program one can easily become confused between identifiers such as henry, which refer to actual variables, and person which simply identifies an abstract shape for variables which might get declared at some stage. It would be ridiculous, for example, to try to refer to person.age because there is no such variable as person. Rather, person simply provides a name for a template from which variables might be created. Once you get used to using structures this should not cause any problem. Once you see the keyword struct you will expect to see it followed either by a left brace or a name for a particular shape of structure. But when you are just starting off, it may be a good idea to adopt some kind of systematic identifier convention - i.e. to choose your identifiers so as to clearly distinguish between those that refer to actual variables and those that simply refer to an abstract shape. One simple convention that I sometimes use is to append _s (for "structure") to the end of any identifier that refers to a structure shape. So I would actually use the identifier person_s. But note carefully that this is only a convention: it has no significance for the compiler at all - it is purely a device to remind a reader of what's going on.

OK, now we can declare a structure shape, and give it an identifier; we can declare variables or parameters of that shape; and we can access the members of such variables. We now join these ideas up with the idea of a pointer.

Roughly speaking, we can embed, within a structured variable, a pointer member that points at something else. In particular, we can embed a member that can point to another structured variable of the same type or shape. And once we can do that, we can create indefinitely large networks of structured variables that are linked together by pointers. This turns out to be an immensely powerful programming idea - which, regretfully, we shall not be able to pursue properly in this introductory course. For now, you simply have to take it on faith that this idea is potentially useful. The question for us is simply what language features we need to use to do it.

At this stage, look again at the example program ptr_net.c. This first introduces the declaration of the structure shape called bucket_s:

    struct bucket_s
    {
      struct bucket_s *black, *white;
      int quantity;
    };

This says, simply, that any structure of this shape will have three members, called black, white and quantity. The quantity member is simply of type int; but black and white are pointers to (other?) structures of the same shape. Of course, they do not automatically point at any particular other structures: but they potentially can, if suitable assignments are made.

Four actual variables of this structured type or shape are then declared:

    struct bucket_s n0, n1, n2, n3;

Each of these will have the three separate members already described. They have been declared outside of any function so they are visible and accessible to all functions.

The rest of the program then involves setting up particular linkages between these structures (via the black and white members) and using these linkages to indirectly access various members in different ways. I repeat that none of this achieves any useful purpose: the only purpose of the exercise is for you to test whether you have a sufficient grasp of what's going on to be able to accurately predict what members will be accessed in each case. There is one final facility of C you need to understand in order to do this: the so-called arrow operator. Consider this fragmentary program:

    struct person_s
    {
      int age;
      double IQ;
    };

    int main(void)
    {
      struct person_s barry;
      struct person_s *p;

      p = &barry;        /* p now points at barry */

      (*p).age = 44;     /* sets barry.age via p */

      return 0;
    }

In this program there is a structured variable called barry, and a pointer variable called p. For the purposes of the example, p is made to point at barry, and then the age member of barry is indirectly accessed via p. This is done with the construction (*p).age. That is, p is first dereferenced using the * operator; this yields the thing p points at (namely barry), and the dot operator is then used to access the age member of this structure.

The parentheses around *p are necessary here because the dot operator takes precedence over dereferencing by default. So if I simply write *p.age, the compiler thinks I want to first take the age member of p and then dereference that. Now, since p isn't even a structured variable, and therefore has no members at all (never mind a particular one called age) this cannot make sense, and the compiler would simply generate an error. But in more complicated cases the compiler might not even diagnose the error effectively.

Anyway, the point is that if you are starting with a pointer, and you want to get at a member of the object being pointed at by that pointer, then you must use something like (*p).age. This is moderately convoluted; worse still, it turns out that this kind of operation or access is actually needed very commonly in C programs that operate on complicated data structures. The C language therefore provides a single operator, the arrow operator, which combines the effects of dereferencing and accessing a member of the structure thus accessed. It is written like this:

      p->age = 44;

But try to get clear that p->age is simply shorthand for (*p).age: if you ever find the arrow operator confusing, then try replacing it by the "longhand" version, with an explicit dereferencing and explicit dot operator.

Now, with all that in mind (!) let's return to the final part of the exercise, and the program ptr_net.c.

First try to draw out on paper the relationships between the structures n0, n1, n2 and n3 immediately after execution of the statement at line 77 (the invocation of the function red_connections()). Do this by analysing the program manually, not by executing it. The diagram should show graphically what each pointer member of each structure is pointing at.

Next attempt to predict the values of the quantity members of each structure immediately after execution of the statement at line 78 (the first invocation of the function massage()). Check your predictions by executing the program (you'll have to add some printf() statements to print out the values you are interested in).

Explain any discrepancies.

Note that I really do mean that - do not go any further unless and until you can explain all discrepancies between your predictions and the actual results.

Now draw a new diagram showing the relationships between the structures immediately after execution of the statement at line 79 (the invocation of the function blue_connections()). Repeat this diagram one last time for the situation immediately after execution of the statement at line 80 (the invocation of the function yellow_connections()).

Finally, predict the values of the quantity members of each structure immediately after execution of the statement at line 81 (the second invocation of the function massage()). As before, check your predictions by executing the program, and explain any discrepancies.

McMullin@eeng.dcu.ie
Tue Apr 30 14:15:37 GMT 1996