html article
C
at_work.gif (Under Construction...)
1 1
Software Engineering 1: Course Notes
Barry McMullin
This is the online hypermedia documentation for the course
Software Engineering 1 .
This is a required course for all students enrolled on Stage 1 of
the
B.Eng. (Electronic Engineering)
http://www.eeng.dcu.ie/eenghome/ee_info/ee_info.html or
B.Eng. (Telecommunications Engineering) programmes
offered by the
School of Electronic Engineering
http://www.eeng.dcu.ie/eenghome/eenghome.html of
Dublin City University
http://www.compapp.dcu.ie/DCU_home.html .
at_work.gif These online notes are under construction, and
will be incrementally extended and corrected as the year progresses.
Thus, early in the year you will find many sections to be empty, but
these will be filled out as we reach the relevant stages in the course.
Introduction
The course provides an introduction to Software
Engineering - which is to say the skill of developing
software or programs for digital computers.
The vast majority of modern technological products - from
dishwashers to communications satellites, CD players to
industrial robots - include at least one, and often many more,
embedded microprocessors . A microprocessor is the central
processing component of a general purpose computer, implemented as
a single integrated circuit (IC). The specific behaviours or
functionalities of the completed product are then realised by
providing these microprocessors with appropriate sequences of
instructions, or software . Software is thus a crucial
foundation for virtually all modern technological products - and
software development is an essential and pervasive skill for
every engineering discipline. This is reflected in the design of
the engineering programmes in DCU, where Software Engineering
takes a prominent role all stages - beginning with this
very important introductory course.
The objective of the course is that, on completion, you
should be able to design, code, test and debug small, but
complete, programs. The course introduces a modern high level
procedural language (ANSI C), supported on a general purpose
computing platform (ISA hardware, running MS-DOS and MS-Windows).
The course is preparatory in nature, providing necessary
foundations for a variety of courses in subsequent stages of the
relevant programs.
Syllabus
(Subject to revision.)
Background: Introduction to stored program digital
computers. Historical perspectives. The nature of software
engineering: issues of cost, reliability and efficiency. The
von Neumann architecture, and the idea of a sequential machine.
Evolution of computers and programming languages.
Characters and Text: Computerised text. Characters and
character sets. The 7-bit ASCII character set. Text files. Text
editing. Directories. Text files as computer programs: the
concepts of translation, execution, and debugging. The use of
an Integrated Development Environment (IDE).
Identifiers: Concept of identifiers . File and
directory names as identifiers. Identifier uniqueness. Syntax of
identifiers in programs. Reserved identifiers
in (keywords). Conventions and guidelines on identifier usage.
Data: Concepts of data type, constants, and variables.
Type int . int constants and range. Declaring
variables.
Operators:
Arithmetic operators:
+ (addition),
-} (subtraction),
* (multiplication),
/ (quotient),
(remainder).
Negative operands: direction of quotient truncation and sign of
remainder. Exceptions: overflow and divide-by-zero. int
values corresponding to "truth" and "falsehood".
Equality operators:
== (equal to),
!= (not equal to).
Relational operators:
> (greater than),
>= (greater than or equal to),
< (less than),
<= (less than or equal to).
Logical operators:
& & (AND),
|| (OR),
! (negation).
Expressions: Forming expressions. Concept of expression
evaluation. Sequential nature of expression evaluation. Order of
evaluation: operator precedence and associativity; forcing
associativity with parentheses.
Assignment: The concept of assignment. Operator side
effects. The assignment operator ( = ): evaluation and side
effects. Increment ( ++ ) and decrement ( }) operators.
Type unsigned int ; type conversion.
Programs: Skeleton structure of a program: the
main function. Declarations and statements. Comments. The
concept of flow of control. Types of statements: expression,
if , if-else and switch statements. Compound
statements. The idea of iteration and the while statement.
Symbolic constants and define . Input and output by direct
modification and examination of variables within an Integrated
Development Environment. Monitoring execution: tracing and
breakpoints.
Development Cycle: Specification. Algorithm design.
Coding. Compilation and linkage. Testing: selection of test
cases. Exception handling. Defensive design. Debugging.
Instructors
The lecturers for the course are
Dr. Barry McMullin
http://www.eeng.dcu.ie/ 7Emcmullin/home.html ,
and Ms. Fiona Lodge .
The laboratory demonstrator for the Tuesday sessions is
Fiona Murray , and for the Thursday sessions, Feargal Costello .
Textbook
The course is designed around the textbook:
Illustrating C (ANSI/ISO Version)
Donald Alcock
Cambridge University Press, 1992.
ISBN: 0-521-46821-3
The cost is approximately IR
This is a required course text. Every student enrolled
on the course must acquire a copy of this book.
Lectures
Chapter and page numbers relate to the assigned textbook.
General Comments on Lectures
I find the subject of software engineering endlessly
fascinating, and get a great kick out of successfully solving a
programming problem. The down side is that this subject can be
almost equally endlessly annoying and frustrating, particularly
when you are trying to come to grips with it for the very first
time. This course has been evolving over the past several years,
but one basic principle has remained constant: I see my role not
as a teacher or instructor, but as a facilitator .
The reason is that software engineering is a subject which is
dominated by doing . It cannot be primarily learned by
listening or reading, though both of these can play a subsidiary
role; it must be learned by taking control, and doing . The
frustration of the subject lies in the fact that, particularly at
first, this doing is necessarily uninformed and feels very much
like thrashing around in the dark - and will frequently end in
failure; but the joy lies in the fact that the failures are
always ultimately understandable and correctable - and
this understanding and correction delivers tremendous satisfaction.
Given this emphasis on doing, I consider the time you will spend
in the laboratory (both scheduled lab sessions, and private study
time) as being the most central and important component of the
course. Nonetheless, a small lecture component is still useful to
complement this practical work, and to provide overall structure
and pacing to the course.
There are two scheduled lectures and one tutorial each week. In
practise, I expect all three sessions to be tutorial in nature. This
reflects an increasing emphasis on active, personal, study time. The
lectures will closely follow the textbook. I work on the assumption that
you will have studied the assigned material in the textbook, and the
related notes provided here, before each lecture. In this way
the lecture sessions can be interactive in nature.
Note that attendance at lectures for this course is not monitored. It is
entirely under your own control and discretion.
Delivering the lectures on this basis has several significant
advantages:
I am relieved of the tedium of presenting extensive,
repetitive, and usually rather boring, notes in the lectures.
Perhaps more importantly, you are freed of the tedium (not
to mention writer's cramp) of transcribing them.
The lectures can be used to allow discussion,
clarification (and sometimes even correction!) of the textbook
material. Further, should you choose to play an active role in
the lectures, they can be used to explore those topics which you
find particularly interesting or difficult in more detail.
Since all the written materials for the course are
available to you independently of attendance at lectures, you can
decide for yourself whether or when it is worth your while to
attend the lectures. For example, if you happen to already be
proficient in certain aspects of software engineering, and are
satisfied that you completely understand the assigned materials,
then you may well decide to miss certain lectures and allocate
the time for some other, more pressing, studies. However, I recommend
that you adopt this position only with extreme caution!
Conversely, should you miss a lecture for some reason
outside your control, or otherwise fall behind in the course,
then because the written materials are all available to you, and
the schedule is explicitly laid out in advance, you will have the
flexibility to schedule in extra study time to make up.
In summary, my intention in designing this course is to
empower you, the student. That is, I hope that you will be
able to enjoy and benefit from this course precisely in
proportion to the energy and effort you are willing to invest in
it. No more, no less.
Term 1: Weeks 1 10
Weeks 1 2: Background
The course is introduced with background discussion of computers,
computer programming, and some demonstration of particular computer
tools which will be used during the course.
Week 3 4: Chapter 1 pp. 1 10
Overview
Chapter 1 is short, but very important. If you can successfully
master it, and complete its one associated exercise (p. 10)
then you will have laid down an excellent foundation for tackling
the substantive topics which will follow - and they will follow
very quickly indeed! Conversely, if you experience any difficulty
with anything in this chapter, this is your
opportunity to get your questions answered quickly.
In brief, chapter 1 introduces a problem (the calculation of
repayments on a loan) which can be solved "by hand" (i.e. by
manual calculation), but which can more conveniently be solved by
programming a computer, in a suitable language, "once and for
all". The bulk of the chapter is then concerned with presenting
and dissecting such a program, written in
First Step: Understand the problem!
I think that there is only one way to science - or
to philosophy, for that matter: to meet a problem, to see its
beauty and fall in love with it; to get married to it, and to
live with it happily, till death do ye part - unless you should
meet another and even more fascinating problem, or unless,
indeed, you should obtain a solution. But even if you do obtain
a solution, you may then discover, to your delight, the existence
of a whole family of enchanting though perhaps difficult problem
children for whose welfare you may work, with a purpose, to the
end of your days.
Realism and the Aim of Science
Preface 1956, p. 8
Karl Popper
Your first step in tackling any software engineering project is
to ensure that you fully understand the statement of the problem.
Indeed, many senior software engineers (classified as
Systems Analysts ) specialise in nothing but clearly
stating the problems to be solved by software systems!
Alcock presents a formula for calculating loan repayments. Ask
yourself do you believe this formula before going any
further. If you do believe it, try to say exactly
why . In any case, try to test the formula in any way
you can dream up. You may not be able to prove that it is
definitely right , but you should be able to think of at
least some partial tests which it should pass. For
example, does it give the correct answer for the limiting case of
the principle being 0? Can you think of any other
comparable tests?
Next, actually carry out the sequence of instructions which
Alcock lists on p. 2, for at least a few different cases. Carry
out these instructions as exactly and literally as you possibly
can (you will need to get a friend to act as the "client", and
then alternate roles). If you find anything in the instructions
which you think is obscure or ambiguous - which you cannot
immediately see how to carry out - then make a note of it.
What's a "high-level" computer language?
The next sections of the chapter present a program to make a
computer carry out, in effect, the same sequence of actions to
calculate loan repayments. Note that essentially the
same program will work on practically any computer. This is a
key advantage of using a so-called "high-level" language - it
can be automatically translated or "compiled" into a form which
can be executed on practically any computer. However, the
precise procedures for entering the program, and arranging for it
to be translated and executed will generally vary from computer
to computer. For this reason, Alcock cannot and does not give
detailed instructions for how to do this; however, you are
given the appropriate detailed instructions, for the computers
which you will be using, in the notes for
Laboratory Session 2 lab2 .
Alcock next dissects this simple program in great detail, to
explain, at least in a superficial sense, what its components
are, and how they are interpreted by the computer. Ask
yourself at this point if you can explain why a special
language such as is necessary at all - why can the computer not
simply understand the original instructions written in English on
p. 1?
Watch for the new technical terms!
Note that Alcock introduces a variety of technical terms in this
chapter - such as directive , block , statement
etc. He generally italicises such technical terms on the first
occasion on which they are used. This specialised technical
language is very important in this subject - as, indeed, it is in
most other engineering subjects; if you do not realise which
terms are being used with specialised meanings, and what those
meanings are, from the very start, then you will quickly find
that you are becoming very confused.
So, I suggest you acquire a notebook to use exclusively as a glossary
for the duration of this course. Organise it alphabetically. Each time
you come across a word which seems to be used in a special technical
sense, write it in, together with a definition in your own words, and an
example or two. You may well have to go back over certain definitions
at various times, and revise or correct them - but that is perfectly OK.
If there are words that seem to have a technical meaning, but you cannot
quite see what the meaning is, then e-mail feedback a question,
or ask in the lecture.
Start constructing your glossary now, by going through the
whole of Chapter 1 carefully.
Caution: On Program Style
In dissecting the initial example program Alcock comments that:
"You have complete freedom of layout. Statements may be typed
several to a line, one per line, one to several lines."
His point here is that as far as the computer - or more
precisely, the compiler - is concerned, many details of
layout or formatting of the program are irrelevant. This is true.
However, you should notice that programs have effectively two
parallel, but quite distinct, purposes:
To represent a suitable sequence of instructions which
(after compilation) the computer can execute to accomplish some
task. This is the role which Alcock explicitly recognises and
discusses.
To document that set of instructions so that some
person can understand how the computer is accomplishing
the task. This becomes vitally important whenever it is necessary to
modify or extend the behaviour of a program - or to correct
previously undetected malfunctions. Alcock does not explicitly
refer to this purpose of the program at all.
It should be clear that, in its role as documentation for
another human being, the details of layout and formatting of a
program are very important. A poorly formatted program
may be perfectly "correct" - in the sense of generating the
correct answers to the problem at hand - but may be virtually
impossible for a human being (even the original author) to
decipher if it becomes necessary to enhance it (or even to decide
whether it may malfunction in certain circumstances).
Another aspects of program design which is similarly constrained
is the choice of identifiers - names for variables and
other objects which the program uses. The computer is largely
insensitive to particular identifiers, as long as, in each
context, they unambiguously identify the "correct" object. In
particular, the computer does not care whether an identifier is
"meaningful" in human terms. The names P , Principal
and bZ34ywT are all equally valid to the
machine, as along as they are used consistently (unambiguously).
However, the net effect if entirely different from the point of
view of someone trying to understand how the program works. So
try to get into the habit, from the very start, of taking care
with the formatting and layout of your programs, and with your
choice of identifiers. These things may not directly affect the
functioning of your program, but they will seriously affect how
long it actually takes you to get the program working
satisfactorily!
Week 5 7: Chapter 2 pp. 11 19
What's all this about "Boolean" variables?
Alcock is at pains to point out that does not support
"Boolean" variables. What does this mean?
Well, certain other programming languages (notably Pascal and
its descendants) support a special kind of variable which can only
hold or be assigned one of two distinct values - usually
referred to as TRUE and FALSE . These are
called "Boolean" variables in reference to the notable 19th
century mathematician George Boole . Boole's greatest work
was entitled An Investigation of the Laws of Thought on
Which Are Founded the Mathematical Theories of Logic and
Probabilities , published in 1854. Boole reduced logic (the
analysis of the truth or falsity of complex propositions) to a
simple algebra, thereby incorporating logic into mathematics.
Boole's two-valued, or binary, algebra is the simplest form of
his more general boolean algebra .
Now Boolean variables would be quite different in kind from the
numeric variables we have seen so far in Admittedly, as we
shall see in more detail later, the total number of distinct values
that can be accommodated in any type of variable
(such as type int or type float ) is always finite ,
but it is still much bigger than two!
So why are we worried about all this anyway?
Well, it turns out that, in many programs, it is desirable for
the program to have a "choice" of things to do, depending on a
variety of factors that only become known when the program is
actually executing. In effect this means that we want the
program to be able to evaluate some proposition or condition and
classify it as being "true" or "false".
To allow this, a programming language normally provides operators
to test or compare various conditions - such as whether certain
numbers are equal, or one is greater than the other etc. If the
language supports Boolean variables as such, then the result of
applying or evaluating such an operator would be one of the
Boolean values normally denoted TRUE or FALSE . In
however, there is no special Boolean data type, so, instead, the
so-called logical operators simply generate
numeric results, with the number 0 being
interpreted as signalling "false" and the number 1
signalling "true". Technically, the result of applying
a logical operator is of the particular numeric data
type called int .
The logical operators would never generate any value
other than 0 or 1 . However, where, in effect, a
Boolean value is examined or used for some purpose in a
program, any int value may be introduced, with the
convention that all non-zero values (including 1 of course)
are classified or interpreted as meaning "true".
Making a statement!
Now that we have some logical operators available to us, how can
we use their results to control the flow of execution in a
program?
The answer is that there are special kinds of
statement where the exact effect of the statement can depend
on a value generated at execution time (usually by a logical
operator). But what is a "statement" anyway? Let's back
up and review the concept a little.
We have seen examples of "statements" already, in the
introductory chapter; they were things like:
R = Rpct / 100;
or:
printf(" Principal, Rate
These are all what we might term "simple" statements:
they are the components of a program that are carried
out, in sequence, when the program is executed. Technically,
these particular statements are all examples of what are called
expression statements: that is, as we shall see later, we
can always think of the execution of these statements as
involving, in effect, the evaluation of an "expression".
In any case, the "statement" notion is much broader than is
accommodated by just expression statements. We shall see other
more complex kinds of statement shortly. But first note that
statements in general should be distinguished from the other
major "kinds" of program component. As well as statements,
other program components we have seen are comments ,
preprocessor directives , and declarations .
Comments are simple pieces of text delimited by the
special tokens /* and */ which are ignored by the
compiler, but serve to provide some additional information to a
person reading the program:
/* WOTCOST; Computes the cost of a loan. */
Directives are the components of a program signalled by the
control the way the rest of the program is translated or
compiled; we say that they play a role only at "compile time".
The only example we have seen so far is the
include
Its effect is to tell the compiler to also compile an extra
separate file (in this case, one called stdio.h ) as an
integral part of the compilation of our own file.
Declarations , roughly speaking, "prepare the ground" for
statements to be subsequently executed. So far, the only
declarations we have seen have been concerned with creating
variables with given names and types, e.g. as follows:
float P, Rpct, R, M;
Recall that declarations and statements can act as
sub-\/ components: in particular, they may be combined to form a
block , by enclosing them within braces; and a block, in
turn, can form part of a function definition . This notion of
hierarchical structure , with components within components
within components, is a very important and pervasive one in the
design of all complex engineering systems.
For now, note that technically a block is, itself, regarded as a
special kind of statement . In fact, it is also
alternatively referred to as a compound statement . The
significance of this is that anywhere in a program where it is
"grammatically" (or "syntactically") legal to place a
"simple" statement then it will also be legal to put in a whole
block, comprising an indefinitely large group of declarations and
statements. It follows, logically, that a block may itself be a
component of a larger more encompassing block and so on! Granted,
there is no obvious reason why this might be useful yet, but
we'll get there in due course!
Branching and Looping
So far we have two kinds of statement: "expression" statements,
and "block" or "compound" statements. Expression statements
are just executed. The (simple) statements within a block are
executed in simple sequence. As yet we have no way of making the
execution of a program depend on, or vary in sympathy with, a
logical value determined at execution time.
There are two principle kinds of variation we typically might
want: branching (or, as Alcock says, selection ) and
looping. In branching we have alternative things we
might want our program to do; in looping we want to do a
particular thing repeatedly - over and over again.
provides several different ways of achieving both of these
effects; but for the time being, it is sufficient to have at
least one way of doing each in our armoury, and that is what
Alcock now introduces. He presents the if - else kind of
statement to achieve branching, and the for statement to
achieve looping or "iteration".
Examine the formats which Alcock specifies for the if-else
and for statements very carefully. Notice that, in each
case, the statement format contains one or more other statements
as components nested within it. In the case of an
if statement, the nested statement is to be executed if and
only if the expression value fed into the if is
"true" (non-zero); in the case of if-else exactly one or
the other of the nested statements will be executed depending on
whether the expression is classified as "true" or
"false"; and in the case of the for , the nested statement
is what gets executed repeatedly.
My point here is to draw your
attention to the fact that the nested statement is grammatically,
or "syntactically", allowed to be any kind of statement at
all.
So, as Alcock shows in his example, the statement nested within
an if can, itself, be another if statement (complete
with a further, nested, statement); equally, the nested statement
could be a for statement or, indeed, a compound
statement. The latter means that, with one if statement, we
can control execution of a whole block of statements - simply by
enclosing them in braces to make a single compound statement.
Furthermore, the nesting of statements within statements can, in
theory, be made as deep as we like! However, in practice,
programs get rather difficult for human beings to understand if
statements get nested too deeply within each other - so
don't get carried away with this!
A Shady Character?
A computer clearly has to be able to deal with "text" as
well as "numbers", if we are going to use it for general
purpose information processing. We need to be able to print out
headings, or textual prompts at the very least. More generally,
we want to be able to carry out "processing" on textual
information: sorting, searching, formatting etc.
Text is ultimately made up of basic components we call
characters .
But what exactly counts as a "character"?
In the English speaking world we tend to be happy with upper and
lower case alphabetics, some digits, and a selection of
miscellaneous brackets, punctuation, and other "special"
characters. Even that still leaves room for tremendous variety in
style and size .
But to handle text in most languages of the European
Union we need to add a large variety of accented
characters ( a , u ) etc.; and if we move further afield,
things get even more complicated: in the middle East we need
arabic and hebrew alphabets, in the CIS we need (at least) the
cyrillic character set, and in the far East we need even more
elaborate character sets to handle Japanese, the various kinds of
Chinese, and so on.
Further, if we want to deal with mathematical notation, we
need to be able to deal with things like sub- and super-scripts
( x^ y_1 ), and so forth.
So what should count as a "character", or even how many
"distinct" characters we might want to deal with, is a fairly
difficult question to deal with.
However, we have to start somewhere!
In the case of computers, the start was historically based on
good old fashioned typewriters, or, more strictly,
teletypewriters . Teletypewriters, or teletypes, were automatic
electric typewriters which could be remotely hooked together, so
that whatever was typed on one was automatically printed on the
other also. In the early days of computers, teletype machines
were already fairly freely available, and were a convenient
mechanism for both keying input information into a computer, and
for printing output information from the computer.
However, it was in the nature of teletype technology that they
could only support a rather limited range of characters:
typically just the upper and lower case roman alphabet, digits 0
to 9, and a variety of "special" characters. Furthermore, if
these devices were to work properly with each other it was
essential that they all adhere to some single
standard - both for the characters to be supported, and the
way they should be encoded (something akin to the "dots" and
"dashes" of the earlier Morse code).
A set of just 96 such characters, plus 32 non-printable
"control" characters, together with a standardised
way of encoding these into electrical impulses (as strings of
"highs" and "lows", or "ones" and "zeros" - i.e. binary
numbers) was therefore devised, and became a de facto
standard, known as the American Standard Code for
Information Interchange , or ASCII for short.
Because it only allows 96 characters, based on the roman
alphabet, ASCII is very limited. It doesn't include any
accented characters, or even the sign. It doesn't allow
for variations in size, or style. It can't deal properly with
mathematical notation. And it does not address non-roman
alphabets at all.
Nonetheless, ASCII has become a sort of "lowest common
denominator" or lingua franca of computers. They can
virtually all deal with ASCII encoded text.
In particular, the text that makes up computer programs is
normally limited to the ASCII character set (or some even
more restrictive character set), to ensure that the programs can
be potentially processed on the widest possible variety of
computers.
Now, the ASCII code is certainly not the only way
of representing textual material in computers, and the standard
for the language does not absolutely require either that the
characters making up a program, or the characters making up
textual material being processed by the program, must be encoded
in ASCII . Instead, the standard simply stipulates certain
restrictions on the encoding, which are summarised by
Alcock. Alcock rather misleadingly talks of certain
characters being stored in a certain "order" and/or
"contiguously"; what he means is that characters are
encoded in a certain order or contiguously.
ASCII text satisfies these restrictions, and also has some
additional properties which Alcock notes.
As it happens, in the implementation of the language which
you will be using, characters are, indeed, encoded in ASCII
and these properties do hold. However, the point to be made here
is that in general, in your programs, you should not rely
on these latter properties (specific to the ASCII coding),
since they are not guaranteed to hold on all computers
implementing the language - except, of course, where your
program is specifically intended only to work with
ASCII encoded text.
The newline Character - and other
vagaries!
One special problem in dealing with characters is how to represent the
idea of a "line break" or a "new line". In the ASCII character set
there are actually two separate numeric codes associated with the idea
of line breaking - one called "carriage return" the other called "line
feed". The terminology goes back to mechanical typewriters and
teletypes, where moving on to a new line required two separate actions:
the "carriage" carrying the roller and paper had to be moved back to the
extreme right (so that the left hand edge of the paper was relocated
under the printing position) and the paper had to be "fed"
through the roller by the correct amount for a new line. Of course,
these quaint historical origins have rather little to do with the
operation of modern computer displays; but the distinction between
moving the print position (the cursor) down a line, and moving it
back to the left hand margin still exists, and so the distinct
functions of the two ASCII control codes still apply.
Nonetheless, it is conventional in programming to have a single
control code which is denoted by the sequence n and which
has the effect of both moving down a line, and moving back
to the left hand margin. This special character is called the
newline character.
Why cannot a newline simply be represented in the file by,
literally, inserting a new line? Why do we need this special sequence
n ? The basic answer is that line breaks are used in a
program to try to lay it out in a legible manner; and that if we put in
line breaks in the file where what we really want is to have the
program emit a line break at run time, then this will disrupt and
interfere with the formatting of the program. In fact, within
string constants, which is where line breaks are usually desired, it is
illegal to insert a literal line break, and a compiler error would
be generated. So it is necessary to invent some kind of more or less
artificial way of indicating the idea of a line break, or the "newline"
character within a string, and n is the way it is
done. Note that when you write a statement like:
printf(" );
then the string which is actually stored internally in the computer at
run time (and passed as an argument to printf() ) contains just
one character - the ASCII newline character, and not
the two separate ASCII characters backslash and the letter
n . In turn, what the printf() function actually "sends" to
the screen is not the two separate characters and
n , but rather the single ASCII newline
character. Its actually even more complicated than this, but I
will spare you any further details!
The sequence n is called an "escape" sequence: the idea is
that the character n is "escaping" from its normal interpretation,
and is, instead, being given a special interpretation. Its a "sequence"
because the n must be preceded by the special "escape" character,
which, in this case, is the backslash character, . The use of an
escape sequence to represent a special ASCII character (such as
newline) introduces a difficulty of its own: what if we want to actually
put the backslash character itself into a string? How can we
prevent it being interpreted as an "escape" character? Well, the
answer is obvious enough: if we want to incorporate, literally, the
backslash character into a string, we simply escape this
escape character itself: i.e., we write . Thus to print the
backslash character using printf() we would write:
printf(" );
As well as the newline character, denoted n there are a
variety of other special ASCII characters which can be represented
in by escape sequences. Try out these sequences for yourself, and
try to establish their effect: r , b ,
t .
Variables, Data Types and Arrays
A "simple" variable in can store or record a single "value".
What kind of value depends on the data type : variable of
type int can hold positive or negative integers, up to about
32000 ; type long variables can hold integers up to about
2 10^ 9 ; float variables can hold positive or negative
rational numbers, with a precision of about 9 significant digits, and a
range up to about 10^ 38 . The type char is a little
stranger: its values can be regarded either as integers (with a
range from 0 to 255) or as ASCII characters, represented in
single quotes, e.g., 'a' , 'Z' , '9' , '+' ,
'(' and so on.
But the basic point remains that one variable can store or record just
one value.
Now, if we want to store a whole lot of values (and this is a very
common requirement) we could simply declare as many separate, simple,
variables as are needed. This is a perfectly satisfactory approach for
many purposes.
BUT: in many cases, we don't just want a whole lot of separate values -
we want our program to be able to automatically scan or iterate or
repeat some operation over all these values. If the values are all
stored in separate variables this will be very clumsy, if not impossible
to code. This is so because each repetition or iteration would need
to refer to a different variable - with a different name . That
means that the code for each iteration is necessarily different. If
I want to do the same thing to three variables called x , y
and z , then I will have to program (at least) three separate
statements. Granted, the statements will be almost identical - but not
quite exactly identical, because the relevant variable name will
have to be changed from x to y to z . This means I
can't possibly achieve the required repetition with, say, a for
statement: because there is just one substatement controlled as
the substatement of a for , and that substatement would have to
name some particular one of the three variables x , y or
z .
This may not be too much of a problem if I just want to repeat something
over two or three different variables - it will not be too much of a
problem simply to copy the relevant source statements, and edit the
separate copies to deal with the different variables
as appropriate. But what if the repetition is to be over 50, or 100, or
even 1000 different values? Then duplicating, and varying, the code
for each different variable is going to be very laborious, and very
error prone.
Well, the language provides a special mechanism for dealing with
this kind of situation. If you want a set of data items, all of the
same type , then instead of declaring a separate variable for each one,
you can declare a single array variable:
int foobar 100 ;
The square brackets on the declaration signal that foobar
is some kind of array; the number inside the square brackets says how
many elements are in the array; and the type - int in the case -
gives the type of the elements . So this declaration creates a
single "array" variable, called foobar ; but foobar in turn,
is actually made up of a whole lot (100 to be precise) of individual,
simple, int variables.
Note that arrays already have an advantage over separate variables, in
that they are much more concise to declare. In the example above, if we
did not have the array mechanism, we would have had to declare 100
separate variables - something like this:
int foobar1, foobar2, foobar3, foobar4, foobar5;
int foobar6, foobar7, foobar8, foobar9, foobar10;
and so on!
Once an array is declared, the individual elements can be accessed by
indexing the array name. That is, we can use statements like:
foobar 10 = 42;
foobar 24 = foobar 5 * foobar 4 * 163;
printf("This is element 9 of foobar:
Array indices in always start at zero. So if an array has 100
elements, the valid indices run from 0 to 99.
Note carefully at this stage that an array is a quite different
kind of object from its elements. The kinds of things you can
typically do to a complete array are quite different from the kinds of
things you can do to its elements. Thus, in the case of foobar
above, it is perfectly reasonable to write:
foobar 0 = foobar 0 + 10;
printf("
which has the effect of increasing the value of the zero'th element of
foobar by 10. But it would be silly to write:
foobar = foobar + 10;
printf("
foobar itself - the whole array, as opposed to one of its elements
- is not a number (even though all its elements are ). Adding 10
to an array is simply not a sensible or meaningful kind of thing to try
to do to a complete array. Similarly, if we give printf() a format
specification, " i" , which signals to it to expect a further
argument of type int , then it would be totally confusing to give
it a whole array of int values instead.
Are there any operations which it makes sense to carry out or use
on a complete array, as opposed to individual elements of it? Well, there
are, but they are relatively few. The only one we will have immediate
use for is in passing a complete array to a function - somewhat as was
attempted above with printf() . But whereas printf() has not
been designed to be capable of accepting a whole array as a single
argument, it is perfectly possible, and useful, to write your own
functions which can accept such array arguments. This
possibility is not pursued in detail by Alcock at this stage, but is
raised implicitly in Exercise 4 of the chapter.
In the example given above, the elements of foobar are simply of
type int . But the elements of an array can, themselves, be arrays
- and so on. This allows the creation of data objects which can be
thought of as multi-dimensional arrays:
int two_dim_foobar 20 30 ;
This makes foobar an array with 20 elements, where each of these
elements is, in turn, an array with 30 elements, and each of these is a
simple variable of type int . The elements can then be accessed as
you would expect, but now needing two indices to identify a particular,
simple, int element:
two_dim_foobar 10 5 = foobar 6 / foobar 7 ;
Multidimensional arrays turn out to be useful in many engineering
applications. For example, if I am writing a program to plot points on
a graph, I could conveniently set up a two dimensional array to record
which points are marked, and which left blank. Similarly, Alcock's
MATMUL matmul.c
program provides a business calculation example where two
dimensional arrays are useful.
Anyway: so far we have just looked at declaring array variables, and
accessing particular elements. While this simplifies the declarations
somewhat, it is not yet clear that it addresses the original problem -
our desire to be able to easily and conveniently repeat a single
operation over a whole set of different variables, without having to
code a separate statement for each one. Thus, we can now refer to
array elements foobar 0 through foobar 99 , instead of, say,
to the completely separate variables named foobar0 , foobar1
and so on down to foobar99 . But how does this make it easier to
repeat operations over these elements?
The key notion here is that the index of an array is not limited to
being just a particular, literal, number, such as in foobar 7 .
The index is technically allowed to be an arbitrary expression .
Thus we could refer to foobar 2+20 , or foobar (10+3)*2 .
This is still not terribly useful. But, in an expression, we can put
variables . Thus we can have a program fragment like:
int foobar 100 ;
int index;
.
.
.
index = 6;
.
.
.
printf("
This will have the effect of printing element 6 of foobar . What
advantage does this have over simply writing printf(" i",
foobar 6 ) ? Not much yet, but now consider this code fragment:
int foobar 100 ;
int index;
.
.
.
index = 6;
printf("
index = index + 1;
printf("
This has the effect of printing elements 6 and 7 of
foobar . Still, so what? The so what is that, if you examine this
fragment carefully, you will see that the two printf() statements,
even though they print out two different elements of foobar ,
are actually textually identical statements. Not just "almost"
the same, but strictly identical. Their different effects , when
they are executed, arise because the effect depends on the value of the
variable index at the time of execution - and that actually
changes between the two statements in this example.
What's so significant about the two printf() statements being
absolutely identical? Well, because of this absolute identity, it
is not necessary to actually repeat them in the source file at all:
the two separate invocations of this statement can be collapsed down as
the substatement of a for statement as follows:
for (index = 6; index <= 7; index++)
printf("
Granted, it is not terrible exciting just to replace a single
duplication of a statement. But it becomes terribly useful if we have a
large number of duplications. For example, suppose we want to print out
all the elements of foobar rather than just elements 6
and 7 ? Instead of writing something like:
printf("
printf("
.
.
.
printf("
with 100 separate statements, each differing only ever so slightly from
the others, we can have a single for statement:
for (index = 0; index <= 99; index++)
printf("
Now that's a real benefit.
Make sure you understand that this kind of collapsing down, using an
iteration statement like for , only works if there is absolutely no
variation in the text of the statement being repeated - although,
of course, there can be a large variation in the effect of that
text each time it is executed. Go back over the example above, and satisfy
yourself that you cannot achieve the same effect - collapsing down to a
single for statement - if the program were using 100 separate
variables with separate names instead of an array. Note that the
names of separate variables must be textually different, even if
only "slightly" so.
Note also that, although I have built up this example using the specific
idea that the thing we want to iterate or repeat is the printing out of
an element of the array, the concept and argument does not rely in any
way on just what it is we want to repeat. Exactly similar
considerations would have applied if we wanted to increment each
element, or take its square root, or add it into a cumulative total, or
whatever. Alcock's two example programs
MATMUL matmul.c and
BUBBLE bubble.c provide, of course, more
comprehensive examples of the same ideas. It is a good idea to at least
attempt to see how you might try implementing either of these programs
if you did not have the possibility of using arrays open to you, but
were forced, instead, to simply declare as many separate, simple,
variables as are necessary. You should find that, in that case, you
cannot use for statements to repeat the required operations over
different variables, and you would therefore be forced to "manually"
duplicate blocks of code many times over, and then edit each copy to
deal with slightly different variable(s). The whole thing becomes
incredibly cumbersome!
The Dreaded Array Indexing Bugs
It is vitally important in programming to ensure that you never use
an array index which is out of range - for example, if foobar is
an array declared as having 100 elements, then referring to
foobar 100 , or foobar -6 would be indexing errors. For
somewhat obscure and technical reasons, the compiler cannot detect
such errors, nor, in general, can they be automatically detected at run
time. But the effects of such errors tend to be highly variable, very
strange, and potentially very confusing. The nett result is that array
indexing errors are among the most difficult to find and correct in
programming. It is very important, therefore, to get into the habit
very early on of being very careful indeed about array indices.
This is straightforward enough with simple numeric indices, such as
foobar 10 and so on. Although, for beginning programmers it is
still very common to make errors "at the margin" - i.e. to forget that
the valid indices start at zero instead of 1, or make some related
mistake. So even with a simple indexing expression such as
foobar j it is important to carefully examine how the value of
j is being generated or manipulated to make sure that it cannot go
out of range.
But what about more complicated index expressions, such as in:
foobar (x + y) * z = w + 1;
If you put something like this in a program, you need to very carefully
examine what the values of x , y and z can possibly be,
and try to guarantee that there are absolutely no circumstances in
which the index expression might yield a value which is outside the
bounds of the particular array. In practice this kind of analysis is
difficult and error prone. It is better to adopt a style of
defensive programming where you deliberately try to make the program
itself warn you if something is going wrong. Thus, the statement above
might be replaced with the following:
index = (x + y) * z;
if( (index < 0) || (index >= 100)
printf(" indexing bug - Controlled Crash!!! );
exit();
foobar index = w + 1;
The operator || is called logical OR and will evaluate as
TRUE (non-zero) if either of its operands is TRUE . The
function exit() is a library function which causes the program to
immediately and unconditionally terminate. The prototype for
the exit() function is in the header file stdlib.h , so you
should include this header at the top of your file if you
intend to use the exit() function.
So the effect of this code is that the index is first calculated and
stored in the variable index . Then this value is compared with the
valid limits. If it turns out, for whatever reason, to be invalid, a
warning message is printed and the program is terminated. Of course, if
the index is valid, then execution simply continues around the if
statement, no message is printed, the program is not terminated, and the
index is used to actually index into the array.
At first sight this defensive programming may seem like a lot of
additional effort. But to repeat: array indexing bugs do happen;
and if you have not adopted this kind of defensive programming strategy,
they are extremely difficult to track down and isolate. Thus the
small additional effort of the defensive checks is usually far
outweighed by the reduction in time (not to mention sweat and tears)
required to debug the program. Of course, some judgement will always be
required as to whether to add extra checking code - but if in doubt, it
is always safer to add it. There is also a question over what to do if
an indexing error is detected: the example above shows just about
the crudest possible reaction. There are much more sophisticated
possible approaches to reacting to unexpected events in a program (so
called "exception handling"). By all means, feel free to experiment
with formulating your own approaches to this if you think you can
improve on the very crude mechanism shown above. In particular,
you might have a look at the standard library facilities for "process
control" described in pagese 171 176 of Illustrating C
- though this is not for the faint hearted!
Week 8-10: Chapter 2 pp. 20 26
What's a function anyway?
In a function is a unit of code that can be used, or called,
or invoked, from some other place in the program. Better names for this
idea, used in other languages, are "subprogram", "subroutine", or
"procedure". But, for reasons which, no doubt, seemed sensible at the
time, the designers of plumped for the word "function" and I am
afraid we are stuck with it!
Functions are handy because they allow you to break a program down into
manageable chunks, where each chunk is quite small (typically 10 20
lines) and does just one or two definite things. Thus each chunk is
pretty readable or understandable in its own right. If a single
function starts getting too complicated, then we chop some coherent
piece of it out and wrap it up in a function of its own, which now is
merely "called" by the original function.
Functions are also handy because they support "re-usability" in a
program. Thus, if there is a particular generic kind of manipulation
which has to be done in two or three, or 20 different situations in a
program, we can write a function to do this manipulation; then each time
we need that manipulation, we call the function, instead of repeating
all the detailed code.
In fact, we have been using functions all along already, without really
noticing it. The block of code in our programs with the following
outline:
int main(void)
.
.
.
is technically the definition of a function whose name is
main() . In general when I refer to the name of a function
I include a pair of parentheses after the name, as in
main() . I adopt this convention so that, whenever I use a name,
you can easily tell whether it is intended as the name of a function
or of a variable. In fact, the compiler works much the same way: when
it sees a name in an expression, it looks to see whether the name is
followed by a left parenthesis, and, if so, it knows that the name is
supposed to refer to some function.
All functions have names, just like variables. When you are
defining your own functions, you get to pick the names, again just as
with variables.
But the function name main() is special in this regard.
Every program is required to have a function called main() .
Normally functions get invoked or executed by some other function
calling them, or transferring control to them. But if this were true of
all functions then we would have an infinite regress, with no way
for any function to get started in the first place. What is
special about the function called main() is that it does not have
to be called or started up by any prior function - it is started up
automatically as the first function to be executed when the whole
program is started executing. In fact, it will normally cause very
bad things to happen if main() is actually called by any
other function...
OK, so, in a sort of a way we have already seen what a function
definition looks like, in the form of the definition of the
main() function. We have also seen what function calls or
invocations look like, because we have used some of the so-called
standard library functions , such as printf() ,
scanf() , sin() , cos() , and so on. So calling a
function basically involves just putting its name into a statement
(or, more precisely, into an expression ). When (or if) the
statement gets executed, then that will cause the function to be
invoked, and control will pass, temporarily, into the called function;
that function will do its thing, and then control will pass back to
where the function was called (the so-called "calling site").
Thus, consider this simple program:
include
int main(void)
printf("Help: I'm trapped in the computer!!! );
return(0);
When the program is started, execution will start at the main()
function. This is why a function called main() must be present:
otherwise the computer has no way of knowing where in our program
execution should start. This may not seem like much of a problem
at the moment: after all, our program only has one function, so that is
"obviously" where execution should start - regardless of what the
function is called. But very soon we will see programs where the
program consists of more than one function and some ambiguity would then
necessarily arise. Even still you might think the computer could simply
start execution at the first function that is defined (or the last for
that matter) rather that looking for a function of a particular name.
Indeed, with many other programming languages that is the approach
adopted. But, again for reasons best known to the original language
designers, they decided to take the approach of insisting that a
function with the specific name main() must always be present, and
that that is where execution will start...
Anyway: in the example above execution starts at the function
main() . The very first statement of this function is then a call,
or invocation, of the standard library function called printf() .
So execution is transferred to that function. In the meantime, the
main() function is essentially put into suspended animation. So
printf() is executed, which, in this case, means that the string
"Help: I'm trapped in the computer!!! n"
will be printed out on the
screen. Once printf() has done that, it "terminates", or, more
technically, executes a return statement, which returns control to
the calling site - in this case, the function main() . So now,
main() becomes re-animated, and continues execution with its next
statement. This is a return statement which causes main()
itself to terminate. Now, as just explained, when a function executes a
return this normally causes control to return to wherever the
function was called from ; but this is a bit tricky with
main() because, technically, main() was not called from
anywhere (or not, at least, from any other function ). In
practice, when the main() function does a return then the
whole program is terminated, and control is returned to the "external
environment" - in our case the Turbo-C++ IDE. This makes a sort of
sense since main() was the first, or "top-level" function to be
invoked, and in this contrived way, its "calling site" simply is
the "external environment"...
So: be sure you have the basic idea of a function as a unit or chunk of
code that can be "invoked". With this facility at our disposal, a
complete program need no longer consist of just one function, called
main() (together with whatever standard library functions
main() calls). Instead, once main() starts to get a bit
long or complicated, we break down the work that main() is doing,
and the statements that constitute it,
into some set of more or less coherent sub-blocks. Each of these is
then hived off into a "function" of its own. We will give each such
function some name; this should be as meaningfull as possible, but we
have essentially complete freedom in this (as we have in naming
variables). Whereas the name of the top-level function is constrained
to be precisely main() , the functions called by main() can
have any names we like. These additional functions which we name and
write are called user-defined functions to distinguish them from
standard library functions.
In the simplest case then, the outline of a program now looks
something like this:
void function1(void)
/* Statements making up function1() appear
here... */
.
.
.
return;
void function2(void)
/* Statements making up function2() appear
here... */
.
.
.
return;
void function3(void)
/* Statements making up function3() appear
here... */
.
.
.
return;
int main(void)
function1();
function2();
function3();
return 0;
This program consists of the definitions of four functions called
(rather unimaginatively!) function1() , function2() ,
function3() and main() . When the program is started up,
execution will begin at main() (even though main() is
actually the last function to be defined!); execution immediately
transfers to function1() , and main() is suspended; when
control is return 'ed from function1() to main() , then
main() goes on to call function2() and is again suspended
while function2 is executed; similarly, when function2()
executes its return statement control comes back to main() ,
and then function3() is invoked; finally, function3()
executes its return , control comes back to main() , and
main() itself executes a return , causing the complete
program to terminate, and control comes back to the IDE.
Be clear that you understand how this function call/return mechanism
allows main() to be broken down into smaller, hopefully simpler,
chunks. In this, admittedly contrived, example, the operation of
main() itself has actually become trivial - it just calls the
functions function1() , function2() and function3() in
sequence. All the complicated stuff that might have been in main()
has been split up, and moved into the "sub-" functions.
Of course, this process of breaking things down can be repeated
indefinitely. Thus, in the example above, if the function
function1() turned out to be still too complicated it might be
broken down into smaller pieces still:
void function1a(void)
/* Statements making up function1a() appear
here... */
.
.
.
return;
void function1b(void)
/* Statements making up function1b() appear
here... */
.
.
.
return;
void function1(void)
function1a();
function1b();
return;
And, of course, it might turn out that any of these functions can
conveniently be used (or called) multiple times, possibly from multiple
other functions. In the outline above, it might turn out that
function1b() , say, could usefully be called both by
function1() and by function3() , for example. Indeed, if we
do a good job in designing the program - in deciding just exactly how
"best" to break it down - we may be able to achieve quite a high level
of such re-usability!
Note that in laying out the outline example program above, I have chosen
to place the function definitions is essentially "reverse" order - if
function1() would call function1a() at run-time, then I have
placed the definition of function1a() before the definition
of function1() in the source file. This ordering is not
absolutely required, but it is an excellent rule of thumb - and you
should stick to it unless and until you have some very good reason for
diverging from it. But be clear on this point: the order in which the
function definitions appear in the source file has no effect whatsoever
on the order in which the functions will actually be invoked. The
function called main() will always be the first function
invoked, regardless of whether it is positioned at the top, the middle,
or the bottom, of the source file! And the next function to be
invoked will depend exclusively on what statement in main() causes
a function to be invoked; and whatever function that actually is, it
will be invoked next regardless of whether it is located in the source
file before or after the definition of the function main() . The
ordering of the functions in the source file affects how the
compiler translates the file - and it is to facilitate the compiler
that I give you the rule of thumb of ordering the definitions in reverse
order to the order in which they are invoked. Ordering them in this way
ensures that the compiler never has to attempt to translate a call or
invocation of a function without already having translated the
definition of the function - and it turns out that this makes the
compiler's job easier, and can avoid a variety of compile time problems
and errors. But regardless of whether this ordering is adopted or not, it
will not affect the order of function execution at run-time!
Moving Data Around
All right: so far the concern here has been with the concept of a
function, and an outline of how they can be defined and used. Now
we need to get down to some more nitty gritty specifics!
In general, if we want to decompose a program into a set of functions
then, for the functions to co-operate or interact properly they will
need to, in some sense "share" data between them. By "data" here I
essentially mean variables. How can the required data sharing be
achieved?
supports two quite different models for sharing or exchanging data
between functions. We will consider them in turn.
Thinking Global...
In this model the data (the variables) are made
"global": global variables are declared before, and outside of, the
first function definition in the source file. Global variables are
visible and accessible to all functions. Thus, if one function
wants to call another sub-function I use the term
"sub-function" loosely in this kind of context to distinguish between
one "calling" function and a second function which is "called".
However, "sub-function" is not a technical term of the language:
to all functions are "equal" - there is no kind of
precise distinction between "functions" and "sub-functions".
and have that sub-function carry out some
operation on some particular data values, the calling function can first
store the data values in some global variables, then call the
sub-function; the sub-function can then access the variables and do the
processing; if some "result" is generated, it can be made available to
the calling function by storing it in another global variable.
Have a look at the file TRI-GLBL.C tri-glbl.c .
This is a somewhat contrived example program, showing functional
decomposition and the use of global variables. Examine the program and
try to understand how it is supposed to work. Note that, given my rule
of thumb of defining functions in reverse order, the sensible way to
read a source file is to start at the bottom and work
backwards. That is, you should generally start by looking at the
main() function. Try to get some understanding of it in isolation
first. Then start looking at the functions which main() calls to
get some extra detail on what is going on. Then, if necessary (it does
not arise in this particular program) you can have a look at the
functions which are called by that function in turn, and so on,
effectively working your way back toward the top of the source file.
Play with TRI-GLBL.C tri-glbl.c .
In particular, experiment with single
stepping through it. Try to predict where execution is going to go
next at each stage, and then check whether you are right. Use the
Watch window to monitor the variables. Again, try to predict when
changes are going to happen, and see if you get it right. Notice how
execution transfers to a sub-function, then returns to the calling site.
Notice how the global variables are allowing information to be accessed
or shared between different functions. Try varying the program
somewhat: e.g. to calculate the area of a rectangle, or square; or to
repeat the calculation for a number of times; or to initially print out
some kind of "welcome" message before doing any calculations; or extend
the program to be able to calculate areas of several different shapes -
so that it first has to ask for the kind of shape to be identified (e.g.
saying "Enter 1 for triangle, 2 for square:" or something like that),
and then read the appropriate data for the particular shape, and do the
appropriate calculation. Note that the function for printing
out the result will not have to vary or be modified...
Note that in TRI none of the (sub-)functions actually have
a return statement as such. A feature of functions is that if
execution reaches the end of the function definition - i.e. the brace
which closes the definition - then the function will be terminated
anyway and control will automatically return to the calling site, just as
if a return statement had been executed. Thus, if you have a
simple return statement as the very last statement in a function
definition it can actually be omitted - and usually is. By a "simple"
return statement I mean one without an operand - unlike the
return statement in the main() function which has an operand
(namely 0 ). The significance of this operand will be explained
shortly: for now the point to note is that, if a return statement
does have an operand, then, even if it is the last statement of a
function definition, this return statement cannot be omitted.
Thinking Local...
OK, so much for using global variable for co-ordinating or sharing data
access between functions in a function. What is the alternative?
The alternative mechanism arises if we choose to make the data -
the variables - local to individual functions. Local
variables are declared immediately after the opening brace of a
function definition. Local variables are visible and accessible
only within the function in which they are defined. Indeed,
the storage for the variables is only allocated at all when the
function is invoked and is deallocated again when the function
terminates - so the variables only actually "exist" as long as
the function is active.
Now it follows directly from the "privacy" of local variables that they
cannot be used to share data between functions. Therefore, if our
functions are using local variables, and if we need to share this data,
we need some mechanism for "passing" the data from one function to
another. This generally arises just precisely when one function is
calling another (call the latter the sub-function). Actually there are
then two sub-issues: getting data from the function to the sub-function,
and getting data (results) back to the calling site from the
sub-function. provides separate mechanisms, with rather different
characteristics to deal with these two directions of data flow - from
calling site to called function, and from called function back to
calling site.
Note that in any particular case of one function calling
another we may need to transfer data in only one of the two directions,
so they really are quite separate. Indeed, as we have already seen with
the use of global variables above, we can have functional decomposition
of a program without using either of these data transfer
mechanisms. In designing a program - in breaking it up into functions -
each case has to be assessed individually.
A (Short) Digression
We will look at the separate data transfer mechanisms in turn,
below; but first: a digression. A common confusion among
students is between the notion of moving data around inside
a program - between the functions making up a program - and
moving data between the program and the outside world (screen,
keyboard, diskette files etc.). Granted these two things are
both involved with moving data around. And granted, it turns out
that to do the latter (move data into or out of the program) we
also have to do the former (move data around within the
program - specifically between our own user-defined functions and
those standard library functions, such as printf() and
scanf() , which actual do the direct exchange of data with
the outside world). But still, the two concepts are quite
different and should be kept distinct. We very frequently do the
former without the latter (i.e. move data around within
the program without exchanging data with the outside world). I
will use (and reserve) the terms "input" and "output", and also
"reading" and "writing", with the technical connotation of moving
data between the program and the outside world. When dealing
with data movement within the program I will use an
alternative set of terms, to be introduced below. I encourage
you to adopt this more precise vocabulary also.
Getting data into a function
All right: let's consider moving data from a function to a sub-function -
from a calling site to a called function. We have already seen examples
of this in using the standard library functions: in the function call we
simply list the data to be passed in, separated by commas if necessary,
and within brackets, thus:
printf("Hi there...");
Here the function printf() is being called, and the string
"Hi there..." is being passed in. Similarly, with:
x = sin(y);
In this case the function sin() is being called, and the value of
the variable y is being passed in to it. Again, with:
printf("The answer is
The function printf() is being called, but now two separate data
items are being passed in - the string "The answer is i" , and
the value of the variable result . These two are separated by a
comma.
These values passed in to a function are called arguments to the
function. They are, if you like, the values which the function is
supposed to operate on or process in some way. And, as already noted, a
function may not need any arguments - if, for that particular function,
what it is supposed to do does not need or rely on any data values to be
processed.
In general, function arguments are actually expressions . That is,
any argument can involve a more or less complicated expression. In that
case, the expression is evaluated , to yield a resulting value, and
that value is what is actually passed into the function as the argument
value. Here is an example:
printf("
In this case the second argument is actually the expression
((x * x) + (y * y))/(z + 3.141) . When execution reaches
this invocation of printf() , the expression is evaluated
first, and the resulting value is what actually gets passed in to
printf() . Note that printf() itself neither knows
nor cares how this value was generated - all it sees is the
resulting value. This is what is described as "passing by
value"; it is discussed by Alcock on page 21, and is specifically
described by him as "fundamental to the language". Be clear
about this particular point: since only allows values to be
passed as arguments, nothing that a called function does with an
argument given to it can affect or change the original source of
the argument. Consider an example. Suppose there is a function
called foobar() which expects to be given an argument of
type int . Then a fragment of code where foobar() is
called might look like this:
int main(void)
int i, j, k;
.
.
.
foobar((k + i) * 15);
foobar(j);
.
.
.
In both calls to foobar() the argument is technically an
expression and is evaluated before foobar() is actually
started up. This is clear enough with the argument (k + i)
* 15 , but it is true also even in the trivial case of the argument
being simply j . In the latter case what happens is
not that the variable j gets passed to
foobar() , but only that the current value of j
is passed in. The significance is that foobar() absolutely
cannot alter or modify the variable j ; in fact,
foobar() doesn't even know of the existence of such a
variable! All foobar() knows is the value it is given
- it does not know whether that value resulted simply from taking
the value of a single variable, or from evaluating some
arbitrarily more complicated expression.
Some languages do support a mechanism whereby whole
variables can , in effect, be passed as arguments. In that
case, the called function can be given access to a variable
actually belonging to the calling site (not just a copy of its
current value); and the called function can, if it wishes,
change the value of that variable, and that change will be
visible back at the calling site when control returns there.
Such a facility is a called "passing by reference" - because, in
effect, what is passed is a reference or handle whereby the
called function can get at the original variable, rather than
just a copy of its value. BUT: the language does not
support passing by reference - it only supports passing by
value. A technical exception here is in the case of
array arguments. has some special rules for what happens
when an array name is used in an expression (without any index)
- such as when an array is specified as an argument. The nett
effect is that arrays "sort of" get passed by reference: but
the details of this are beyond the scope of the current
discussion.
Fortunately, it turns out that this is not a significant
limitation because the effect of passing by reference can be
"simulated" using passing by value, in combination with certain
other features of the language. However (as should be clear)
this issue only becomes important in the context of trying to get
data back from a called function to the calling site - so
it will be considered again below.
As noted with printf() above, a function can accept a number of
arguments. In general, functions are designed so that they expect some
definite, unique, number of arguments, be it 0, 1, 10, or whatever. In
that case the function must be called with just that number of
arguments - no more, no less. Both printf() and
scanf() are exceptions to this rule - they can deal with
varying numbers of arguments. But this is highly exceptional:
most of the library functions we deal with will only accept
fixed numbers of arguments; and all of the user defined
functions covered in this course will only accept fixed numbers
of arguments.
Function calls are actually even more restrictive than this
in general. Not only must the call provide precisely the correct
number of arguments, but they must be of the correct data
types . Remember: the data type defines the "kind" of
information represented by a data value or a variable - such as
int , double etc. So if a function has been designed
to accept two arguments, the first of type int , the second
of type float , then every call or invocation of that
function must include precisely two arguments - no more, no less
- and they must be of precisely those types. Again,
printf() and scanf() are exceptions to this general
rule; but also again, it is beyond the scope of this course to
consider this very exceptional behaviour in any further
detail!
The compiler will usually try to enforce these rules - i.e. it
will try to check every invocation of a function to see that the
arguments match the design or specification of the function in
both number and type(s). But the compiler can obviously only do
this correctly if it knows how many arguments, of what
types, are expected. In the case of user-defined functions it
will know these things if it has processed the function
definition before it sees any invocation of it. This
condition will be satisfied if the program file is laid out as
suggested earlier - in "reverse" order - which, of course, was
precisely the reason for suggesting that ordering.
The situation with library functions is a little
more complicated. The whole idea of library functions is that
they have been pre-written and compiled ahead of time, so, by
definition, their definitions will not be available to the
compiler. But although the full definitions of the library
functions are not available, short "interface specifications"
are provided. These interface specifications tell the
compiler, for each function, exactly what arguments are expected,
and of what types. Such a specification, for one function, is
called a function prototype . Function prototypes for the
standard library functions are provided in the standard
header files - files like stdio.h and so on. So: if you
want to use or invoke any standard library function in your
program, you must ensure that an appropriate header file is first
scanned by the compiler, so that the compiler will then be able
to check that the invocation is legal. This is done with the
include directive. If you are ever in doubt as to which
header file to include to provide the prototype for a particular
library function, look up the function in the Turbo-C++ on-line
help system - the header file will always be listed
there. In some cases, the prototype for a particular
function may be included in several different header files; in
such a case you just have to insure that at least one of them
is include 'd in your file.
Having said all that, be warned: the ability of the compiler to
detect mismatches between the arguments expected by a function
and the arguments actually provided , is still less than
perfect. You should still take some care yourself in coding any
function call, and not rely completely on the compiler to check
these things for you!
So the format of a function call or function invocation is
always function followed by a left bracket - ( -
followed by zero, one, or more argument expressions, separated by
commas, followed by a right bracket - ) . The arguments must
match what is expected by the function in both number and types.
The compiler will try to check for this if - and only if - it has
earlier processed either a full definition of the function, or at
least a prototype, typically found in a include 'd header
file.
So much for the invocation of functions with arguments - but how
does this get represented in the function definition ?
In all the function definitions given so far (including
definitions of main() ), the functions did not accept any
arguments. This was denoted by the keyword void , placed
between the brackets after the function name:
void foo(void)
/* ^^^^ This denotes that foo()
will not accept *any*
arguments! */
/* Statements making up foo() go here... */
.
.
.
Note carefully that, in this example, the keyword void
appears before the function name also - but in that
position it has quite a different significance, nothing to do
with the arguments which foo() accepts. We'll
consider this again later.
OK: that was the special case of a function which does not accept
any arguments. Where a function does accept arguments,
this is denoted or coded by replacing the keyword void with
a list of names and types for the arguments: these are
technically called the parameters rather than the
arguments. The difference is this: the arguments are the
values which get generated at the calling site; the
parameters are effectively a special kind of variable, belonging
to the called function, into which the argument values are copied
when the function is invoked. If you like, all the calling site
can see are the arguments (or argument expressions) - it cannot
see the parameters as such; and all the called function can see
are the parameters (and their values) - it cannot see the
original expressions, or arguments, from which they were derived.
In this sense, arguments and parameters are like two views of the
same thing: but the calling site only has the argument view, and
the called function only has the parameter view.
The format for a parameter list, in a function definition is
something like this:
void foobar(int max, int min, double ecstasy)
This heading part of the function definition says that
foobar() will accept exactly three arguments.
foobar() will refer to these as max , min and
ecstasy . The first two must be of type int , the
third of type double . Whenever foobar() is invoked,
the arguments at the calling site will be evaluated; parameters
called max , min and ecstasy will be created;
and the argument values will be copied into them; then execution
will proceed with the statements in the body of foobar .
The parameters are then effectively a special kind of local
variable belonging to foobar . Within foobar() they
can be used just like any normal variable. In particular, their
values can be referenced, as one would normally reference the
value of a variable - by using its name in an expression.
Thus, a fragment of foobar() might be the following:
void foobar(int max, int min, double ecstasy)
.
.
.
if (ecstasy < 0.0) printf(" dear...");
else
if (ecstasy > 100.0) printf(" don't believe it!");
.
.
.
Since parameters are a form of local variable you can, of course,
change their values: but by doing so you overwrite the
original value - the argument - passed in from the calling site.
So this should only be done with care! Note also that, when the
function terminates, the parameters, just like all other local
variables, are disposed of, and their values lost. Changes to a
parameter within a called function, just like changes to a local
variable, have absolutely no effect on anything at the calling
site.
Getting data back out of a function
All right: that's passing data into a function, both from the
point of view of the calling site (argument expressions) and
from the point of view of defining the called function (parameter
variables). Now we turn to the complementary question of getting
information back from the called function to the calling
site.
The simplest, and arguably most convenient, mechanism provided
for this in is the use of so-called "return" values.
Essentially, a function can terminate using a form of the
return statement which specifies a value (it actually has a
complete expression in general: but this will be evaluated to
yield some particular, single, value, of some particular type).
If a function terminates in this way, the specified value is
retained and made available back at the calling site. In effect,
wherever the function name appeared at the calling site, its
name gets replaced by the returned value, and execution
then continues at the calling site.
Consider this simple example, using the library function
sin() :
x = sin(0.5);
In this case the function sin() is invoked (incidentally
being passed one argument, the double value 0.5 ).
sin() is executed, and when it terminates, it does so using
a return statement. That return statement specifies
a return value. Back at the calling site, the name of the
function is effectively replaced by this value, and execution
continues, so that this value is then assigned to the variable
called x . When I say that the name of the function
is "replaced" by the returned value, I do not, of course, mean
literally that the text of your program is altered. Rather I
mean that the way to understand what happens next at execution
time is to imagine the return value appearing in the place
occupied by the function name.
That was a simple example. Here is a somewhat more complicated
one:
printf("Answer:
When this is executed, what happens? Well, printf() cannot
be started before its arguments have been evaluated. To evaluate
the second argument, sqrt() (a function for extracting
square roots) must be invoked - but again, that cannot happen
until its argument has been evaluated. So the calls to the
sin() and cos() functions have to happen first. It
does not matter which is done first - lets say it is sin() .
Its argument is evaluated - simple, it is just the value of the
variable x . This is passed in and sin() gets
executed. In due course it terminates with a return
statement, specifying a return value. This is retained in
temporary storage at the calling site. The argument for
cos() is then evaluated - again simple, just the value of
y . When cos() terminates, again with a return
statement, its return value comes back to the calling site and
gets added to the value returned earlier from sin() . This
is now the argument value for sqrt() so sqrt() is
started up, with the value passed in to it. Again, in due
course, sqrt() terminates with a return statement,
and specifies a return value. This value is multiplied by
2 , and the resulting value is the second argument for
printf() , so printf() can now finally be invoked. It
will do its thing (printing out the text Answer: followed
by the value of its second argument, regarded - correctly - as a
double ), and will then terminate.
printf() actually also ends with a return statement
which specifies a return value, and this will indeed be passed
back to the calling site (this return value signals whether
printf() executed without detecting any difficulties). In
this particular case, the return value is not used at the calling
site - it is not, for example, stored in a variable, or used in
some further expression evaluation. So it is simply discarded,
and execution proceeds to the next statement.
Phew!
That covers how return values are used at the calling site, and
we have also given the gist of how the return value is
implemented in the function definition - namely by using a
return statement with an appropriate operand expression.
But let's just detail this with another example:
double silly_sum(double apples, double oranges)
return (apples + oranges);
Note that the void that we were previously placing in front
of the function name, in the heading, has now been replaced by
double . This is a general rule. The word that comes
before the function name in the heading specifies the type
of return value (if any) which the function will yield. If the
function does not yield any value, then the return type must be
specified here as void , as indeed, we have generally been
doing up to now. But if the function does return a value,
then the type of that value must be advertised or notified ahead of
time, by specifying it here in the heading. So, in our case, just
from the heading, we know that the function silly ()
must terminate using a return statement, with an operand
which is of type double . If there is a subsequent
inconsistency in the definition of silly () - if it
does not contain any return statement, or it has a
return statement with no operand, or it has a return
statement with an operand which is not of type double -
then the compiler will complain bitterly.
As an aside, note that we have consistently been defining the
function main() as if it returns a value of type int ,
and then actually ending it with a return statement
specifying a return value of 0 :
int main(void)
.
.
.
return 0;
This is, to say the least, a trifle odd. main() is not
being called by any other function - it is automatically started
as the very start of program execution. We already mentioned
that, for that very reason, it would not make sense for the
parameter list of main() to be anything other than
void . Yet, notwithstanding that, we are defining
main() as if it makes sense for it to be passing a return
value back to some calling site. Why is this?
Well, this is yet another of those funny historical oddities of
The essential idea is that the return value from
main() is passed back to the enclosing "environment" -
wherever the program was started up from. In our case, this is
the Turbo-C++ IDE. The general idea is that this return value
could be used to signal whether program execution was
"successful" or not - with 0 signalling success, and
anything else indicating some kind of failure. However, this
idea of signalling success or failure to the enclosing
environment doesn't actually serve any useful purpose in the case
of running programs under the Turbo-C++ IDE. Using it under
other environments (where it might be useful) is beyond the scope
of this course. In fact, for our purposes, it would be legal, and
preferable, to define main() as not returning any value at
all, and omit any return statement from it:
void main(void)
.
.
.
Unfortunately Alcock has insisted in showing main() as
yielding a return value of type int , so, in the interests
of consistency with his presentation, I have stuck with that...
OK: time for another substantial example. Let's revisit the
triangle area calculating program, but now avoiding the use of
global variables, and moving data around with arguments and
return statements instead. Have a look at
TRI-LCL.C tri-lcl.c . Compare it with
TRI-GLBL.C tri-glbl.c . Again experiment
with single
stepping through it. Try to predict where execution is going to go
next at each stage, and then check whether you are right. Use the
Watch window to monitor the variables (including the
various function parameters). Note that because
the variables are now all local , they only exist at all when the
relevant function is active. Furthermore, it is a characteristic
of the Watch window that only variables of the function
which currently contains the execution highlight bar are
visible. Other functions may be "active" -
main() is active all the time, for example - but
be temporarily suspended while a sub-function is executing. Their
variables still exist, and will become visible again in the
Watch window as execution returns to them...
Again, try to predict when changes are going to happen, and see
if you get it right. Notice how the parameters and return values
are allowing information to be moved around between the different
functions. Again, try varying the program somewhat: e.g. to
calculate the area of a rectangle, or square etc.
Now notice one particular difference between
TRI-GLBL.C tri-glbl.c and
TRI-LCL.C tri-lcl.c .
In TRI-GLBL.C tri-glbl.c I used a single function
( read () ) to prompt for and read in both the
dimensions of the triangle; whereas, in
TRI-LCL.C tri-lcl.c I have broken this down into
two separate functions, read () and read () .
Stop for a moment, and see if you can figure out why I made this
change...
Well, if I had retained the original structure of
TRI-GLBL.C tri-glbl.c ,
but without using global variables, then the
function read () would have to somehow return
two distinct pieces of information. But that cannot be
done with the return mechanism in OK: this is
a little white lie. It can be done, but only using the
mechanisms of struct data types. This is beyond the scope
of the current discussion, but have a browse in Chapter 8 of
Illustrating C if you want to explore this further...
The return statement can only have one operand, and thus
only a single value can be passed back to the calling site.
Indeed, if you think about the way the returned value is handled
at the calling site (being inserted wherever the function name
appeared) it should be clear why this must be so: if more than
one value was returned then there would be no unambiguous way of
deciding what to do at the calling site.
So: I elected to re-organise the program to remove the need for a
function which would return more than one value - by replacing it
with a pair of functions, each of which individually only return
one value.
This works fine. But it also gives me an excuse for mentioning
an alternative way for getting information back from a called
function to the calling site - a way which does not suffer the
limitation of the return mechanism, which can only handle a
single value.
First recall the earlier discussion of "passing by value" and
"passing by reference". Clearly, if we had a mechanism for
passing by reference we could use this to get as many distinct
items of data as we like back from a function: we just pass ("by
reference") that number of variables into the function; the
function then stashes the relevant results in these variables;
and when the function terminates, and control returns to the
calling site, those results can be accessed - as many of them as
you like.
But, of course, does not support passing by reference. So
what can we do?
The answer is that we can achieve the effect of passing by
reference, in a somewhat roundabout way: we can pass, as values,
and using the normal passing by value mechanism, the
addresses of some variables. Using these addresses, the
function can then, indirectly, get at the variables, and deposit
results in them - exactly as if the variables had been
passed by reference.
In fact, we have already been implicitly using this mechanism,
though without drawing too much attention to it. It is the way
we have been getting information back from the standard library
function scanf() :
scanf("
In this case the additional arguments to scanf() are
addresses or pointers to variables. We generate the
address of a variable by preceding its name with the "address-of"
operator, denoted with the ampersand character, & .
Of course, this mechanism is used by scanf() precisely
because scanf() is supposed to be able to get more
than one item of information back to the calling site. In fact,
a single call to scanf() can be used to pass an
indefinitely large number of separate items back, just according
to how the format specification string is laid out.
Can you use this mechanism in your own user-defined functions?
Yes - but it is quite messy. To define a function of your own
which uses this mechanism you need to know two new things: how to
specify that the type of a parameter is an address or pointer;
and given an address or pointer, how to "de-reference" it -
access the thing pointed at. The program
TRI-LCL1.C tri-lcl1.c gives an example, if
you want to have a look - but we will not consider this in detail
here.
Preferences?
We have seen two radically different styles of getting
information into and out of functions: the use of global
variables and the use of parameters and return values. Which is
best?
Unfortunately there is no simple answer. Both kinds of mechanism
are provided in precisely because, in different circumstances
they each have their own advantages.
For people just starting out to learn to program the global
variable mechanism is conceptually easier to deal with, and
actually syntactically easier to use in programs. Therefore I
recommend that you start out by using this mechanism, unless you
have some specific reason for doing otherwise.
However, quite quickly you will probably discover some of the
drawbacks of using global variables. At that point it is worth
getting familiar with both parameter and return value mechanisms
and trying them out.
Once you are technically happy that you can use both kinds of
style then, with experience, you will get a feel for the
particular circumstances in which each is appropriate. But:
until you are comfortable with judging this I recommend that you
actually try to avoid or minimise your use of global variables.
You will find that this requires you to think rather harder about
how to design your programs; and will also make the design and
coding somewhat more cumbersome at times. However, it is a fact
that the use of global variables makes programs prone to a
variety of subtle problems and bugs - due to non-obvious and
un-intended interactions between functions, via changes in global
variables. Such bugs are particularly difficult to isolate and
correct. Therefore you will find that avoiding the use of global
variables - except where they have some overwhelming advantage to
recommend them - will save you a lot of program development time
in the long run.
Recursion
Alcock chooses to introduce recursion toward the end of
Chapter 2. I'm not sure I agree with the need for this.
Recursion - functions which invoke themselves - is an interesting
programming concept, and it does permit very elegant solutions to
certain programming problems. On the other hand, there is never
a situation where the same effect as recursion cannot be achieved
using conventional iteration mechanisms, such as for
statements.
In any case, all I will add to Alcock's discussion are the
following brief comments:
When any function is invoked, storage is allocated for its
local variables (including parameters, if any).
This storage will be deallocated when the function
terminates.
Recursive functions are a special kind of function. This
may seem trite, but it follows that you need to understand the
general concept of a function reasonably well before you
try to understand the idea of a recursive function...
If a function is invoked again while it is still already
active - i.e. if a recursive invocation is made - then
additional storage is allocated to hold a new set of local
variables for this new invocation. Each invocation thus has an
entirely separate set of variables and parameters - even though,
within each invocation, these are referred to by the same names.
There is no confusion because the computer keeps careful track of
which invocation is being executed, and therefore which copies of
variables or parameters to refer to.
At first sight it seems that recursive function invocation
must be fatal: surely the computer will be caught in an infinite
regression, invoking ever more copies of the same function, until
something horrible happens (like it runs out of storage to hold
any more variables) and the whole thing crashes? This certainly
can happen; but recursion can also be made benign. It is
made benign by ensuring that, at each nested invocation,
something is changing (typically a parameter value); and at
some, finite, point, this will result in a copy of the function
which does not make a further, nested, recursive call to
itself, but instead terminates. That then allows its caller to
terminate, and its caller in turn, so that the whole nest of
recursive calls can finally unwind, and return control to
wherever the very first instance of the function was called from.
But recursion will only be benign and controlled in this
way if you, the software engineer, make it so. Recursion is
intrinsically dangerous stuff - use it with extreme caution!
Term 2: Weeks 11 20
Week 11 12: Chapter 3 pp. 27 38
Week 13 14: Chapter 3 pp. 39 49
Laboratory
General Comments on Lab Work
The CAE laboratory itself should be used both for the scheduled
laboratory exercises, and also for "private study": the primary
way of studying software engineering is to do it! The laboratory
exercises are crucial to this course: you should learn more from
the time you spend in the laboratory than from lectures and
tutorials combined.
Each student attends one three-hour scheduled lab session every
two weeks. You should aim to spend at least three
further hours of private study time in the lab over the same
period. The scheduled lab sessions are deliberately structured
so that each student has a separate PC; however, for private
study you may find it easier to get at a machine, and also more
effective, to work together with one or two others - provided
that you spend time discussing what is happening, and taking
turns at actually operating the PC.
For the first lab session you are required to bring along a new,
unused, copybook or notebook to be used as you lab logbook.
However, in contrast to most other laboratory subject, this
logbook will not be used to hold formal lab reports.
Instead, you will maintain the logbook just with your own
personal notes; but these personal notes will serve as the basis
for writing separate formal reports, which will actually
be submitted electronically for marking. This is a new initiative
in the academic year 1994/95, and will be explained in more detail
during the first lab session.
You should consult the
Laboratory Report Guidelines
http://www.eeng.dcu.ie/ 7Emcmullin/swe1/rptfmt/rptfmt.html
for advice on how to prepare lab reports, and the criteria which will be
used in marking them.
Term 1: Weeks 1 10
Session 1: Week 1/2
This first session is primarily concerned with introducing the
hardware and software tools which will be used in the remainder
of the course. Given that you are reading these notes, you have
successfully completed the first phase of the exercise!
Stop at this point and read back over what you have written in
your logbook. Make sure you have a reasonably complete record
of what you have done so far (so that, if necessary, you could
redo it without further guidance).
Your next task is the following:
Start up the File Manager program. Browse through its
on-line help to get a general idea of what the program is and how it can
be used.
Using File Manager , create a new directory at the topmost
level on disk drive D: with any name you wish (for
example, your own forename or surname etc.). If you manage this
successfully, you should be able to see the new directory listed by
File Manager - for example as D: MYDIR or
whatever.
Try to establish what, if any restrictions apply to naming
a directory - either by experiment or by reading the help.
Use the Notepad program to create a text file
containing a short description of your experiences during
Orientation week - what you enjoyed, disliked etc. Save this with
the filename ORIENT.TXT in the directory you have just
created. Ensure that it is no more than 2,000 bytes long. If
necessary, go back and make it shorter.
Save a copy of this file on both your diskettes, in a
top level directory called LAB . Delete the original file and its
directory from drive D: . You will have to
"format" the diskettes first. Disk formatting and file
copying and deleting operations are all carried out with the
File Manager : again, consult the online help for details!
Using the Terminal facility of the Wnqvtwsk program,
consult the DCU library On-line Public Access Catalog (OPAC) to find any
books written by Donald Alcock . Login as "opac".
Make a note of any you locate.
Still using the Terminal facility of Wnqvtwsk change your
password on the computer ugmail.eeng.dcu.ie .
Use the Mail facility of Wnqvtwsk to exchange at least 2
e-mail messages with other student(s) in the class. Arrange for copies
to be sent to your own mailbox also. Save these messages.
Compose, using Notepad , a brief formal report
on the lab session. Include copies of the file ORIENT.TXT ,
and the e-mail messages you exchanged. The complete report must
be no more that 5,000 bytes long. Mail it to your
demonstrator. This will be DemoA , DemoB ,
DemoC or DemoD , where the last letter is the letter of
your section in the class.
Arrange to save a copy of the mailed message (including the time
and date stamp) on both your diskettes in the directory
LAB . Note that you can specify multiple recipients for a
mail message by separating them by commas, and you can send mail to
yourself, though it will not be visible until you exit Mail and
re-enter it again.
Shut down any programs you still have running - and go for
a well deserved lunch!
Session 2: Week 3/4
lab2
Carry out Exercise 1 of Chapter 1 of Illustrating C .
Exercise 1
Implement the loans program. This is an exercise in using the tools of
your particular C environment. It can take a surprisingly long time to
master a new editor and get to grips with the commands of an unfamiliar
interface. If all else fails, try reading the manual.
Preparation
In advance of the lab, prepare a file, called WOTCOST.C ,
containing the program as listed on page 3 of the textbook. If
you did not do this in advance - shame on you. Do it now!
Now, using File Manager , create a directory called
SWE1 in the root directory of drive
D: . If this directory already exists you can leave
it there, but delete all files and/or subdirectories which may be in
it.
Copy the file WOTCOST.C into this directory.
The Turbo-C++ IDE
tc_ide
You will be using the Turbo C++ "Integrated Development
Environment", or IDE, to create, compile, test, and debug (i.e.
correct) your C programs. The IDE application is started up using the
Turbo C++ icon in Program Manager .
Note that the IDE was developed before Microsoft Windows became
standardised; while the IDE uses menus, windows, dialog boxes etc. in
much the same way as Microsoft Windows, the details vary significantly.
Experiment, and use the on-line help as necessary, to get a better feel
for how things behave within the IDE.
The IDE provides various facilities for developing programs.
It allows editing of program files. This is an alternative to
Notepad . You will probably still find it convenient to
use Notepad for creating and editing any text files
other than program files; but it will be handier to use the
editor built into the IDE for program files.
Next the IDE allows programs to be compiled . Compiling means
translating the relatively high level instructions of the program
into sequences of the very much simpler and basic instructions that the
microprocessor in the PC can actually execute or carry out. During
compilation, certain kinds of "warning" and "error" messages may be
issued. Error messages mean there is something so badly wrong with the
program that compilation cannot even be completed. Warning messages
indicate that the compilation could be completed, but the compiler is
suspicious that the program still does not make "sense".
The IDE also supports linking . Linking means combining the
instructions generated from your particular program with additional
"standard" sets of instructions, for doing "standard" things, like
arithmetic, or displaying information on the screen, or reading
information from disk files etc. These additional instructions are held
in "libraries". Your program cannot actually be executed until it has
been linked with any required libraries. Linking can also yield further
warning and/or error messages.
Note that, even if your program compiles and links without any warning
or error messages, this does not at all mean that all errors have been
eliminated. It simply means that you have a program which is complete
or valid enough to be actually executed: but it may not do
anything like what you intended. Indeed, it is virtually unknown for a
program of any significance to work properly first time.
So: the IDE also provides various facilities for testing and debugging
the program. The program can be simply executed; but it can also be
executed one step at a time; or the values of variables can be examined
or modified, and so on.
Hints
This section contains hints on using the Turbo-C++ IDE. It would be
preferable if you could refrain from reading it!!! Try
experimenting first. If the results of this seem totally
incomprehensible, try browsing through the on-line help. Come
back to look here only in the last resort (short of asking the
demonstrator).
IDE Edit Windows
Before you can do anything else (such as compile ) a
program with the IDE you must load the file into an IDE
edit window. These are very like Notepad : they
provide basic text editing facilities.
To open an edit window, and load WOTCOST.C into it,
drop down the File menu and select the Open menu item. This
will bring up a file selection dialog box, quite similar to (though also
slightly different from) a conventional Windows file selection dialog.
You should be able to use this to select WOTCOST.C . An edit window
will be opened. You should be able to see the text of WOTCOST.C .
If you cannot do this, then ask for help from the demonstrator.
Once you have an IDE edit window successfully opened, experiment
with editing the text, saving it, closing the edit window, and
reopening it.
It is possible to have several edit windows open
simultaneously. Each normally accesses a separate
file. I strongly discourage the use of multiple edit
windows simultaneously accessing the same file: this
usually ends in tears! In fact, at this stage in your programming
education, you would do well to restrict yourself to a maximum of 3
simultaneous edit windows (even accessing different files), at any
given time.
The name of the file being accessed or edited is shown in the
title bar of each edit window.
Compiling
Once the program text is loaded in an edit window you can
compile it: that is, translate the "high level" statements of
the language into sequences of the much simpler, "low
level" instructions which the microprocessor in the PC can
directly carry out. This so-called object code will
normally be stored in a file with the same name as the source
file, but the extension .OBJ . In our case then,
the object file will be called WOTCOST.OBJ .
Note that you will not need to copy object file(s) onto your
diskettes for backup. They can always be recreated simply
by recompiling again.
To compile you must first make the edit window (containing the
program text) active . The active window in the IDE is
identified by having a double line border. Initially this
probably will not be a problem: the edit window will be the only
window you have opened in the IDE. But later on you will have
several windows open in the IDE; then it is important to remember
that compiling will only work with an edit window selected.
To compile, drop down the Compile menu, and select
the Compile to OBJ menu item. The compilation will then
be attempted. It is quite likely that the compiler will issue
several warning and/or error messages. These will be displayed
in a separate Message window.
The Message Window
When the Message window is active, you can scroll
up and down through it highlighting one message at a time. As
each message is highlighted, a highlight bar will also be moved
about in the corresponding edit window. The highlight in the edit
window gives a rough indication of where the compiler
"thinks" the problem is located. Be wary of this: the compiler
(like most computer programs) is not at all smart. The most you
can rely on is that the problem, whatever it is, is either in the
line highlighted by the compiler, or somewhere before it .
Be wary also of the actual text of the message. The compiler will
do its best to diagnose the problem. It will try to give a
meaningful description of it. But, it will often fail miserably
at this! So treat the actual text of the message with
some scepticism. A message always indicates that
something is amiss with your program. But the actual problem
will often be in a different place, and of a different
kind, to that suggested by the compiler.
If you press the Enter key while the Message window is
active, the IDE will automatically make the (relevant) edit window
active, and position the cursor as near as possible to where it
"thinks" the problem is. The text of the message will also be
temporarily redisplayed in the bottom of the edit window. This text will
disappear as soon as you move the cursor around in the edit window. You
can make the Message window active again by clicking
anywhere over it.
"Error" messages are more serious than "warning" messages.
But at this stage in your education, you should consider all
messages issued by the compiler as indicating problems .
Do not go past the compile stage until you have managed to get a
completely "clean" compilation - with absolutely no error or
warning messages. If you find that you simply cannot make any
sense of some compiler message (after making a sincere effort!)
then call the demonstrator for help.
Linking
Once you have got a clean compilation, you can go on to the
linkage phase. This is where, if necessary, certain standard
"library" procedures or routines are added to the code produced
by compiling your program. These standard routines deal with
things like reading data in from from the keyboard, or writing it
out to a disk file; or calculating mathematical functions such as
sin() and log() etc. There are a very large number of
such standard routines, or "building blocks", available. This
saves individual software engineers from having to develop these
standard procedures themselves from scratch.
You will not be able to actually run or execute your program
unless it can be successfully linked. The output of the linkage
phase is called the executable file. This will normally be
stored in a file with the same name as the source file, but the
extension .EXE . In our case then, the executable
file will be called WOTCOST.EXE .
As with object file(s) you will not need to copy executable
file(s) onto your diskettes for backup. They can always be
recreated whenever necessary..
To carry out the linkage, first ensure that the (relevant) edit
window is active. Then drop down the Compile
menu. Select the Link EXE file menu item.
As with compilation, problems may be detected during the
linkage phase, and messages may be displayed in the edit window.
In this case, however, the IDE will not be able to even
guess where the problem is located in the original source file.
This is because, at linkage time, the IDE is really only dealing
with the .OBJ file: it is no longer dealing with
the source file. It cannot relate problems now arising with the
.OBJ file back to some particular location in the
source file: the relationship between these two files is too
complex to allow this. But: you can rest assured that, if the
linkage phase results in messages in the Message
window, there really is something wrong with your source
program.
Do not go past the linkage stage until you have managed to get a
completely "clean" link - with absolutely no error or warning
messages. If you find that you simply cannot make any sense of
some linkage message (after making a sincere effort!) then call
the demonstrator for help.
Running the Program
Once you have got a clean compilation, followed by a clean
linkage, you can finally try "running" or "executing" your
program.
Remember: just because you have got to this stage does not
guarantee that the program is "correct". It simply means that
the remaining problems are of such a kind that the program is
still "legal" - it is still executable - but it just doesn't do
what you intended. There almost certainly will be remaining
problems of this kind!
You can run the program by dropping down the Run
menu and selecting the Run menu item.
The IDE window should be automatically (though temporarily)
cleared so that it can be used and controlled by the program.
More technically, the window is being switched to display the
so-called User Screen .
If the program is working correctly, it should display the
message prompting for input of the relevant data. Enter suitable
test numbers (separated by spaces). Press the Enter key. The
program should then print out the results - and terminate! There is a
slight problem here. As soon as the program exits or terminates, the IDE
will automatically switch away from the User Screen , back to the
normal IDE display. This happens very quickly - so that you will
not have time to see whether the program has printed any result at all
(never mind the correct result).
However, you can redisplay the User Screen , even
when the program is no longer running. Simply drop down the
Window menu. Select the User Screen
menu item. The User Screen will then be
temporarily redisplayed. This allows you to check the program
output. Pressing any key (or clicking the mouse button) will then
cause the display to revert to the normal IDE display. A
slightly more sophisticated, and convenient, alternative, is to display
the Output window (select the Output menu item on the
Window menu). This is a normal Turbo-C++ IDE window which can be
moved around and resized etc., but which "mirrors" whatever output
appears on the User Screen . Thus, it can allow you to still see
the output even while the User Screen itself is not being
displayed.
Test and Debug
Test your program severely and thoroughly. If possible,
co-operate on this with another person in the class, if that
person is at the same stage: temporarily swap machines, and each
test the other's program. Most people have a natural
disinclination to be severe in testing their own work - but
relish the thought of demonstrating the flaws in the work of
somebody else. Sad but true - and, in this case, rather
useful.
If this is not possible, just try to be as severe in your own
testing as you can.
More likely than not, this testing will reveal continuing
problems with the program. Try to understand what is causing
these problems, and fix them. If all else fails, ask the
demonstrator for help. Do not go past this stage unless the
program has behaved correctly under a wide variety of tests.
STOP!
If you are satisfied that you have got the WOTCOST.C
program working correctly and satisfactorily, then you can go on to
read the rest of the instructions below; but, if the program is still
misbehaving, then stick at it - and ask for help from the demonstrator.
Some Questions
If you are completely satisfied that you have got the
WOTCOST.C program working correctly and
satisfactorily, then you can go on to consider the following
questions.
Are there any restrictions on the numbers that this program can
deal with? Try to find out using very big and very small numbers -
and anything else you can think of.
What happens if you feed in negative numbers to this
program? What should happen (in your opinion)?
What happens if you feed in too many numbers? Too few? What
should happen?
What happens if you feed in non-numbers? What should happen?
If you change things in the program it will generally cause
compiler and/or linker errors. Try to find what is the simplest, most
innocuous, change you can make which gives rise to the greatest number
of weird and incomprehensible error and warning messages
at compile time.
Variations
If you succeed in getting the basic WOTCOST.C program
working, and you have tested it thoroughly, then you can go on to try
the following:
WOTCOST1.C : Behaves exactly like the given program,
except that it first displays an additional message like "Welcome to
the WOTCOST program".
WOTCOST2.C : Behaves like the given program, except
that it expects the interest rate to be entered already as an absolute
value such as 0.15,
rather than as a percentage.
WOTCOST3.C : Behaves exactly like the given program
- but the variable names are changed to more meaningful ones (like
"principal", "years" etc.).
WOTCOST4.C : Behaves exactly like the given program
except that it prompts separately for each of the input values to
be entered.
WOTTIME.C : Similar in concept to the given program;
but it is given the principle, interest rate, and monthly
repayment ; it has to calculate how long the loan will take to pay off.
WOTCOST5.C : Behaves exactly like the given program,
except that it rejects any attempt to enter a negative value for
any of the inputs.
Conclusion
That completes the exercise. As previously explained, the formal lab
report should be prepared electronically, and submitted (to your
demonstrator) by electronic mail. The report should be concise, and not
repeat material in these on-line instructions. It should focus on the
particular problems you experienced, and how you solved them. It should
make clear what you have learned from the exercise.
You will be allowed approximately 20 minutes at the end of the lab
session for preparing and submitting the formal report. If you cannot
complete it in this time, complete it on your own time afterwards.
However, you should not spend more that one hour, in total,
preparing and submitting the formal lab report. The report must be
submitted within one week of the day on which your lab session is
scheduled.
Session 3: Week 5/6
Carry out Exercise 1 and Exercise 2 of Chapter 2 of
Illustrating C .
For your convenience these exercises are reproduced in the following
sections of the on-line notes. Exercise 1 has also been slightly
modified to conform with some limitations of the Turbo-C++ application.
Exercise 1: Matrix Multiplication
Program MATMUL multiplies matrices of fixed size (3 rows, 4
columns; 4 rows, 2 columns). Make the program deal with any specified
sizes up to an arbitrary 50 by 50. The maximum size stipulated
in the textbook is 100 by 100; but with the Turbo-C++ application,
as currently configured, this is too large for it to handle. Hence the
reduction to 50 by 50.
Read three sizes: the number of rows of A , the number of columns
of A
(implying also the number of rows of B ), the number of columns of
B .
Read A and B by rows, then print C by rows.
For this exercise you have
to change each simple reading loop to a nested pair of loops. Similarly
the printing loop.
Exercise 2: Bubble Sort
Alter the Hooke's Law anagram program to read and sort a list of numbers
(type double ) into numerical order, then display the sorted list. Make
the program request the length of list, then ask for the numbers one by
one.
Preparation
You must have completed all parts of the previous lab exercise in
advance of this lab session. If you are having difficulty with some
parts of it, look for help! But do this before you arrive for this
session.
Prepare files (called MATMUL.C matmul.c and
BUBBLE.C bubble.c respectively) containing
the basic programs introduced on pages 17 and 19 of the textbook. Again,
this must be done in advance of the lab session. You
should have found that, if you click on the hot-linked file names, you
can get a display of either of these files, within Mosaic. You can,
of course, retype each program from this display into the
Turbo-C++ IDE.
However, you may also save the files directly from Mosaic, using the
Save As menu item on the File menu. Which saves a bit of
typing.
Note the comment on the listing of MATMUL (page 17 of the text
book) that, when using Turbo-C, 1992, an extra "dummy" statement must
be added in to correct for a bug in the Turbo-C application itself. The
header file for the mathematical functions must also be included - i.e.
a include directive must be added at the top of the
file. Both of these corrections are already included in the online
version of MATMUL.C matmul.c .
Instructions
As in the previous exercise, ensure that a directory called
SWE1 exists on drive D: . Create it if necessary. If it
already exists, and if there are any existing files or subdirectories in
this directory, delete them. Then copy your files into this directory.
Do all this before starting up Turbo-C++ .
Each exercise requires you to modify one of the given programs.
But first, ensure that the original programs behave as expected. That
is, in each case, get the program to compile and link successfully (zero
error and warning messages), and then subject it to a range of tests.
Make the tests as difficult and severe as you can imagine.
You may find it useful to refer back to the information on using the
Turbo-C++ IDE tc_ide from the previous lab session.
Do not proceed with trying to develop either of the modified
programs until both original programs pass such tests. If in
doubt, consult the demonstrator for advice.
This should take no more than the first 50 minutes of the lab session,
i.e. you should be finished this by 9:50.
Now you can attempt to modify the programs. In each case, plan the
modifications first; then carry them out; then eliminate any compile
and/or link problems; and finally, carry out a range of tests, and
correct any run-time problems which are uncovered. Schedule at most 55
minutes for each of the two exercises: that is, even if you are not
finished with Exercise 1 after 55 minutes (10:45), put it aside and
start on Exercise 2; similarly, even if you are not finished with
Exercise 2 after a further 55 minutes (11:40), put it aside and proceed
to preparing your formal report.
You are allowed approximately 20 minutes at the end of the lab
session for preparing and submitting the formal report. If you cannot
complete it in this time, complete it on your own time afterwards.
However, you should not spend more that one hour, in total,
preparing and submitting the formal lab report. The report must be
submitted within one week of the day on which your lab session is
scheduled.
If you do not get the exercises completed during the lab session,
make sure that you come back and complete them on your own time, before
the next session. You should not include any description of this
further work in your formal report.
If you get stuck, look for help, either via the e-mail
feedback feedback mechanism, or by coming to the
tutorial tutorial !
Hints
The hints given below provide some further ideas on how to go about
completing the exercises. But try to do the exercises under you own
steam first. If and when you get stuck, you can come back and try here
for further inspiration.
General
As you modify either of the given programs, keep testing them (and
saving them!). Do this as you go along .
That is, each time you have made some kind of single, self-contained,
modification, save the file, and try it out if possible. For example, if
you have added a new statement, or a declaration of a new variable, then
this should, as a minimum, not have introduced any new compile or link
time problems. So check that this is the case before going any further.
In many cases, you will be able to do even better than this. If you have
just introduced a new printf() statement, for example, then you
should immediately be able to check that it actually works - that it
does print out what it should! And so on.
In this way the scale of the "current" problems with your program is
always kept small and manageable. Whereas, if you make all your
modifications before trying any of them, then you are likely to
be overwhelmed by the problems, and not know where to even start in
trying to solve them.
Whenever your program is supposed to read data in
it is a good idea to have the program immediately print it back out
again, before reading anything further. In this way you can separate
problems to do with reading the data from problems to do with
processing it.
Sometimes when you have a program running under the Turbo-C++
IDE you may
decide, for whatever reason, that you want to terminate it
immediately rather than going through whatever procedure is required
to get it to finish "naturally". For example, if the program needs to
read in 50 numbers, but it already clear that the first has been
mis-read, you don't want to bother typing in the remaining 49. Or (as
can easily happen) you program may get into an "infinite" loop, such
that it will never terminate "naturally". In such
circumstances you can usually abort the program by typing
Ctrl-Break . That is, press down and hold the Ctrl key
(located to the left of the space bar), and then press and release the
Break key (also labelled Pause : the Break legend is on
the front face of the key; it is located at the right end of the very top
row of keys). You may have to do this twice before it will take effect.
The program should then be terminated, and control returns to the normal
Turbo-C++ IDE environment. If this still doesn't kill the
program, then you will probably have to use the Reset button on
the computer system unit, in order to reset the whole computer. But
the reset procedure is quite slow, and, of course, you will lose any
modifications to files which have not been saved to disk again.
So this is a very last resort!
Exercise 1: MATMUL
You are asked to modify the program to handle arbitrarily sized
matrices (up to some maximum). You might think therefore that the
arrays should not be declared until the program has read in the
appropriate sizes. However, this is not possible in Arrays must have
their sizes fixed at compile time. You can get around this by
sizing the arrays big enough to accommodate the maximum you really
want to deal with. Then, at run time, when the "actual" sizes are read
in, you can arrange for the program to use only as much or as little of
the available array "space" as necessary - any left over can be simply
ignored. This strategy might run into difficulty if we were
concerned, for whatever reason, to only use as much memory as absolutely
necessary. Fortunately, this does not apply to the current exercise.
The original program has four main parts: first the contents of
array A are read in; then array B is read in; then the
calculations are done ( C is calculated); and, finally, C is
printed out. The new program can reasonably retain this overall
structure, so you should try to identify for yourself, exactly how the
original program is broken down into these parts.
In the reading of array A in the original program, a single
scanf() call is used to read one entire row (four values). This
is repeated three times to read the entire array. In the new program
the width of the array is not fixed; so you cannot use a single
scanf() call to read a whole row anymore. Instead you will have
to use a single scanf() call to read just one value. This
can then be embedded in a (new) for loop which iterates it for the
correct number of columns so that one complete row is read. This whole
loop can then be controlled by the "original" for loop,
iterating over the rows (though with some further, minor, modification).
Of course, similar considerations apply also to reading in the B
array, and printing the C array.
Note that multiple statements can be controlled, or
iterated, by a single for statement. The set of statements to be
iterated is simply enclosed in "curly brackets" or
braces . This is actually a very general point in
anywhere a single statement can be legally used, then a whole set of
statements, enclosed in braces, can be used instead.
The program MATMUL introduces a new operator, written
+= . This is a combination of addition and assignment. The
statement:
x += y;
is exactly equivalent to:
x = (x + y);
There are corresponding composite operators for all four basic arithmetic
operations (addition, subtraction, multiplication, division). If you
ever in doubt as to their effect, just write out the long form
instead.
Exercise 2: Bubble Sorting
You are asked to read in, and sort, a list of numbers which are
said to be of type double . double is yet another data
type - like int , float and char . double is most
similar to type float : it is also used to approximately represent
"real" numbers. It differs from type float is that it normally
has greater precision (more significant digits) and greater range (can
handle larger and smaller numbers - can have a bigger absolute value in
the "exponent" part in "scientific" notation). Type double is
actually the most common "preferred" type for general purpose
calculations. By that I mean that, if you need to represent numbers,
your first choice should normally be to use type double ; and you
should use something else only where there is some definite reason for
doing so.
Having said that, type double does introduce a few new problems of
its own. Therefore I suggest that you initially ignore the
reference to type double . That is, first try to get the program
to read and sort a list of numbers of type float (or even of type
int ). When you have got this working successfully then go back
and try to modify it for type double . Be warned: when
using scanf() to read values into a variable of type
double , the correct format specification is lf rather
than f ; but, paradoxical as it seems, you should still use
f as the format specification for printing values of
type double with printf() .
The length of the list of numbers is not fixed in advance. But
just as with MATMUL , you need to fix the size of array for holding
the list to some definite value, which will then be the maximum
length the program can deal with. The exercise does not explicitly state
a required maximum. I suggest that you get it to work for lists at least
up to 100 numbers long.
The original BUBBLE.C program does not read in the array to
be sorted. Instead the list is explicitly initialised in the program.
However, in the modified program, the list must be read in at
run time. You have already encountered code for doing that general sort
of job in program MATMUL.C , so look at that for examples or
inspiration.
Session 4: Week 7/8
Enter, compile, and test the program WOTRATE.C
wotrate.c , shown on pages 22 23 of Illustrating C . Then carry
out Exercise 3 of Chapter 2. For your convenience the exercise is
reproduced below.
Exercise 3: User Written Functions
For the math library functions sin(x) and cos(x) , the value
of x must be expressed in radians. Write functions Sine(a)
and Cosine(a) , for which the argument, a , must be expressed
in degrees of arc. All types are double .
Preparation
Prepare a file (called WOTRATE.C wotrate.c )
containing the program listed on pages 22 23 of the textbook. This
must be done in advance of the lab session. You should have
found that, if you click on the hot-linked file name, you can get a
display of the file, within Mosaic. You can, of course, retype the
program from this display into the Turbo-C++ IDE.
However, you may also save the
file directly from Mosaic, using the Save As menu item on the
File menu.
Hints
In testing the WOTRATE.C wotrate.c
program, it is essential that you prepare a number of test cases for
which you know "with certainty" what the correct answers are - so that
you can decide whether the program is giving these correct answers or not.
"Certainty" is hard to come by, of course. But in this particular
case, you may be able to use some of the work you
have done in previous lab(s) to generate "reliable" test cases.
In Exercise 3 you are asked to write two new functions.
Implicitly, of course, you are also expected to check whether your
functions Sine() and Cosine() work correctly or not. But
note that a function does not do anything unless and until it is called.
So to properly complete this exercise, there must be a main()
function which allows you to exercise the Sine() and
Cosine() functions.
For Exercise 3, your essential problem is to convert from degrees
to radians - since, if you could do that, you could the use the
existing library functions sin(x) and cos(x) to complete
the required calculations. So: as a first step, try to write a program
which just has a main() function which does this conversion from
degrees to radians. Then modify this program so as to encapsulate the
actual conversion proper in a function of its own. Then add the function
Sine() , and finally the function Cosine . At each stage make
sure that everything is working correctly: do not try to do too much at
any one step.
Session 5: Week 9/10: Introducing Recursion
lab5
This exercise is divided into 4 parts. For each part
your report should have separate sections labelled Plan ,
Coding and Testing . There should be a single overall
Conclusion section.
The first two parts represent revision of material already encountered
in previous exercises: arrays, user defined functions, the for
statement, the printf() function etc. The third and fourth parts
introduce the notion of recursion - a function which calls or
invokes itself !
Develop (i.e., design, enter, compile, test and debug) a function
called print (array, size) . The parameter array is an
array of integers (each of type int ) and size is the number
of entries in array . The function print () should have
the effect of displaying the contents of array . Each element of
array should be displayed on a separate line. Use a for
statement to achieve the required repetition.
Develop a function called reverse (array,
size) . The parameter array is an array of integers (each of type
int ) and size is the number of entries in array . The
function reverse () should have the effect of
displaying the contents of array , but in reverse order (i.e. the
last element of array should be displayed first etc.).
Each element of array should be displayed on a
separate line. Use a for statement to achieve the required
repetition.
Study the definition and discussion of the factorial
function, on page 25 of Illustrating C . Enter, compile, and test
the program FACTRIAL.C factrial.c . Note that this
program cannot work correctly for indefinitely large integers.
Attempt to identify the biggest integer for which the function gives the
correct answer.
Develop a function called summer(n) . The parameter n
is an integer (type int ). summer(n) should have the effect of
calculating the sum of all the integers up to n , inclusive.
It should achieve the required repetition using recursion . Attempt
to identify the biggest integer for which the function gives the correct
answer.
Term 2: Weeks 11 20
Session 6: Week 11/12: More Recursion
This exercise is divided into 2 parts. For each part
your report should have separate sections labelled Plan ,
Coding and Testing . There should be a single overall
Conclusion section.
Enter the file HCF.C hcf.c ,
shown on page 24 of
Illustrating C . Add a definition of a suitable main() function.
Then compile, test, and debug the completed program.
Modify the function reverse () , developed in
step 2 of Laboratory Session 5 lab5 ,
so that it uses recursion rather than a for
statement to achieve the required repetition. Compile, test, and debug
the completed program.
Hints
This exercise is concerned exclusively with the idea of
recursive functions. It is essential that, before attempting
it, you are sure you understand the concept of a function .
So, before coming in for this session ensure that you have
thoroughly studied Chapter 2 of Illustrating C , and the
accompanying sections of the on-line notes.
Before even bothering to test the HCF() function, make sure
that you understand what it is trying to do (calculate the
"highest common factor" of two numbers) and how it is trying to do
it. Please include in your report a short explanation (two or three
sentences), in your own words, of why Euclid's method of
calculating the highest common factor works.
Note carefully the introduction of a new operator in the HCF()
function - the remainder operator, denoted in with the symbol
. Thus, in dealing with integer arithmetic, we have
essentially two different kinds of division operator: the quotient
operator (denoted by / ) and the remainder operator denoted
by . This distinction does not arise in floating point
arithmetic, where only the quotient operator is required (why?).
To test the function HCF() you must first provide a
main() function in order to form a complete program (remember:
every complete C program file must contain the definition of
a function called main() ). In this particular case, the
main() function needs to do just whatever is required to allow you
to decide whether the HCF() function is working correctly or not.
Minimally this means that it must call the HCF() function
with some particular arguments, and should then print out the result
which is return 'ed by HCF() . But, of course, to properly
test HCF() you will need to test it with a range of different
arguments. Think carefully about the selection of test cases.
For example do you expect there to be some biggest numbers for which
the HCF() function will work? Smallest numbers? Should it work
if either of the arguments is zero? Should it work for negative
arguments? Does it matter which argument is given first - the bigger
or the smaller? And so on...
One option is to simply edit the main() function, and
rebuild (compile and link) the program before running each test. This
is simple and perfectly satisfactory. But if you have the time, and
wish to be a little more adventurous, you might try to write a
main() function which would prompt the user to type in test
arguments, and then feed them into HCF() : this would allow a whole
range of tests to be done simply by re-running the program (i.e.
without having to rebuild each time).
In the second part of the exercise you are required to develop a
variant on a function you were originally supposed to have developed in
the previous lab session. It is therefore vital that you actually
complete that previous exercise before coming in for this
lab session!
You are required to use recursion to implement the
repetition required in reverse () . The essential idea
of recursion is that a function actually calls or invokes (a copy of)
itself.
The way that can be used in this particular exercise is as
follows: suppose that a single call to reverse ()
just prints the last element of the array it is given (assuming the
array is not of zero length), and then calls itself (i.e. another copy
of reverse () ) to print the rest of the array (i.e.
passing on the same array it was given, except "minus" the last
element). And if reverse () is called with a zero
array length argument, it should simply return immediately - that
allows the recursion to "unwind" with all the copies of the function
then exiting in turn, until we finally get back to the main()
function. Of course, the idea of passing on the array "minus" the last
element can be achieved simply by passing on a array length argument
which is one less that the argument which has been passed in - the array
"itself" does not actually have to be modified.
Session 7: Week 13/14: Lab Test 1
The notes for this test are provided in a separate document entitled
Laboratory Test 1
../test1/test1.html .
Session 8: Week 15/16: Encryption
"Encryption" is the process of disguising a message or text so that its
meaning will not be apparent to anyone who may intercept it,
accidentally or otherwise.
In this exercise you will work with a very simple encryption procedure.
This is a called a substitution cipher, in which each letter of the
original message (the "plaintext") is replaced by a different letter to
generate the coded message (the "ciphertext").
To simplify matters further, we will restrict our plaintext to use only
the upper case letters A to Z ,
and the space and newline characters;
furthermore, space and newline characters will be left unchanged by the
encryption.
In each case, the replacement letter will be a fixed number of letters
later in the alphabet compared to the original letter (and "wrapping
around" back to A again after Z as necessary). The number of
letters to count forward is called the key .
Thus, with a key of 3, we would replace A by D , B by
E and so on, with, finally, W being replaced by Z ,
X by A , Y by B and Z by C .
This simple kind of cipher is sometimes called a "Caesar Cipher" after
Julius Caesar, who is said to have used it for secure battlefield
messages.
To implement a Caesar cipher with a program, note that, when
characters are read in with the getchar() function, what you
receive (in int form) is the ASCII numeric code. The upper case
letters A to Z have consecutive numeric codes from 65 to 90.
Thus, given a particular key value (a number from 1 to 25) simple
addition will encypher a letter as required - except that if the sum is
greater than 90 you must "wrap around" to the start of the alphabet
again by subtracting 26.
You are required to develop a program which will input a stream of
characters, using getchar() , and encode them with a Caesar cipher;
the ciphertext should be output with putchar() ; the program should
terminate when getchar() signals that the end of the input stream
has been reached. The key for the encoding should be represented in
your program with a symbolic constant. That is, using something
like define KEY 5 , or const int key = 5 . See page 34 of
Illustrating C .
You may find it handy to be able to run your program on an input
file (and producing an output file ) rather than working only
with the keyboard and screen. There are various ways of doing this, but one
relatively simple method is described in the hints hints8 section
below.
Test your program carefully. Record in your report log book what tests
you carry out, what results you expect, and what results actually occur.
Now think about the problem of cryptanalysis: that is, suppose you
are "the enemy" and have intercepted a transmission encyphered with a
Caesar cipher. How would you set about trying to decipher it? You may
wish to challenge a colleague in the class to see who can decipher the
other's ciphertext message most quickly. Record your conclusions in your
report.
vigenere
Finally, if you have time, try to develop a program to implement the
somewhat more sophisticated Vigenere cipher, invented in the 16th
century by Blaise de Vigenere. This has the same basic idea as the
Caesar cipher, but the key no longer stays the same throughout the
message: instead it follows some periodic cycle. A Vigenere key is thus
not a single number but a sequence of some fixed length - say 10, 3, 25,
which would mean the first letter should be shifted by 10, the second by
3, the third by 25, and then back to 10 again etc.
Hints
hints8
Perhaps the simplest way of arranging for your program to take its input
from a file, and/or route its output to a file, is to use
so-called command line redirection .
To do this
your program must already have been compiled successfully, resulting in
the creation of an executable version of it. If your source file were
called foobar.c , the executable version would be in a file called
foobar.exe . The executable version must be run from the MS-DOS
command line interpreter application
(and not from within the Turbo-C++ IDE ) in order to use
redirection. So you should start up an MS-DOS
Window by double clicking on the relevant icon in Program Manager.
Within this window you will first need to set the "current" drive and
directory
to where you have stored the executable program. This will normally
be the directory swe1 on drive D: , so issue the
following commands:
C: d:
D: cd
D:
Now execute your program, with a command something like this:
D: foobar out.txt
The < serves as the "input redirection" character in an MS-DOS command
line. The effect is that the input for foobar will no
longer be taken from the keyboard, but from this file instead.
Similarly, the > character in an MS-DOS command provides for output
redirection, to direct output to a file instead of the screen.
With any given command invocation you can use input redirection only, or
output redirection only, or both together.
Note that if you change your program you must rebuild it it again, from
within the IDE, before trying to rerun it in the MS-DOS application.
To close down the MS-DOS application, give the command exit thus:
D: exit
Session 9: Week 17/18: Projectiles
The purpose of this exercise is to calculate the theoretical trajectory
of a projectile in a constant gravitational field. In other words, we
want to calculate the path followed by, say, a ball thrown in the air.
We are going to make some simplifications. We will work in just two
dimensions (i.e. as if we are looking exactly side on at the path of the
ball). We will also neglect air resistance, and will treat the
projectile as a point mass. This allows a simple analysis of the motion
of the projectile, using newtonian theory.
In our two dimensional ( x and y ) coordinate system, x will denote
horizontal position, and y will denote vertical position.
The initial value of x will be taken as zero.
A zero y value will correspond to the projectile being on the ground.
The initial value of y will be some positive value (in effect, the
height of the person throwing the ball).
V_x will denote the horizontal component of velocity, and V_y the
vertical component. We will assume that both V_x and V_y are
initially positive. Similarly, A_x and A_y will be the components of
acceleration.
The "initial" information provided for a particular run of the program
(by assigning explicit values to suitable variables) will be the initial
height ( y ) and initial velocity components ( V_x and V_y ) of the
projectile. These values will be assigned to suitable variables. You can
either code values directly into the program (in which case you will
have to change the program, and rebuild, every time you want to try
different values) or you can have the program prompt for and read in
values when it starts running.
The output from the program will be a print out on the screen of values
of x and y at successive, "small", intervals of time. The program
should terminate when y becomes zero again (i.e. the projectile falls
back to earth). This print out may be captured into a text file
(see the previous discussion of redirection hints8 )
and a graph plotted of y versus x (one way of doing this
is described in detail in
the Hints hints9 section below).
Informally, the motion of the projectile is pretty simple: it moves
steadily to the right (positive x direction), simultaneously going up
for a while (positive y direction), and then coming down again to
land. Technically, under the various idealizing assumptions we
have made, it can be shown that the trajectory will be
parabolic .
Formally, at each time step, the new y co-ordinate can be calculated
as:
y(t+1) = y(t) + y(t)
where (approximately):
y V_y t
In words, if the velocity in the y direction is V_y , then, in a time
t , the projectile will move a distance of V_y multipled by
t ; and the new y position is the old y position plus this
movement. This calculation would be exact if V_y were
constant ; if V_y is changing (as it will be in our case) then
the calculation will still be approximately correct, as long as as
t is "small" - namely small enough that the percentage change
in V_y in that time is "negligable". Of course, you can only
estimate this once you have some idea of how quickly V_y will be
changing...
The new x co-ordinate may be calculated similarly.
Now, in principle, both V_x and V_y could also be
simultaneously changing, and have to be recalculated, at each time step.
But since we are neglecting air resistance, and since gravity works
straight down, there will be zero acceleration in the x direction
( A_x is zero), so that V_x will actually remain constant , equal to
its initial value. There will be acceleration in the y direction
however, of value -g , where g is the acceleration due to gravity at
the surface of the earth (about 9.81 meters per second per second).
This is negative because our y co-ordinate is positive in the upward
direction and gravity works downward (last time I checked anyway...).
This acceleration in the y direction will be constant (since g is
not changing). Thus, the new V_y at each time step can be calculated
as follows:
V_y(t+1) = V_y(t) + V_y(t)
where:
V_y = A_y t
That is, the change in the (vertical) component of speed is simply the
(vertical) acceleration multiplied by the time it acts over; and the new
speed is simply the old speed plus this change. This change
will, of course, be negative, because A_y is negative; to put that
another way, what goes up must come down!
You should test your program carefully. Explain, in your report, what
tests you carry out, how the program behaves, and whether it has passed
such tests.
In the first place, the repetition or iteration in your program should
be achieved using a while statement. However, if you succeed in
getting this working satisfactorily, you should then experiment with
using a for statement instead.
Hints
hints9
Sometimes you may find it useful to view the output of a program in the
form of a graph. There are a variety of ways of doing this. One simple
way is using a utility program, provided on the PC's in the CAE Lab,
called graph . This program should be run as a full-screen
MS-DOS application. To do this, first open an MS-DOS window by double
clicking on the relevant icon in Program Manager. Then make this window
take over the full screen by pressing Alt-Enter
(i.e. press and hold the Alt key while you press the Enter
key also).
If your program's output has been stored in a file
called out.txt , you can then see a graph of this simply by giving the
following command:
D: graph out.txt
Note that graph expects each line of its input file to contain
just two numbers. The first is taken as the "x" co-ordinate, and the
second as the "y" co-ordinate, of a single point to be plotted.
graph will automatically scale the axes in order to fit in all the
data points.
You can exit from the graph program by pressing the Esc key.
You can make the MS-DOS application revert from full-screen to a window by
pressing Alt-Enter again.
Session 10: Week 19/20: Letter Frequencies
In a world where there are code makers there will, inevitably,
also be code breakers . One of the simplest tools in the armoury
of a code breaker is a program to count the relative frequency of each
letter appearing in the cyphertext. For any given language there will be
a characteristic distribution of letter frequencies in the uncoded
message (the "plaintext"). The most commonly used letter in English is
e , by a wide margin; t is in second place, with a and
o nearly tied for third; i , n and r are also
very commonly used.
If we know that a coded message uses a simple substitution cypher (such
as the Caesar cypher we saw previously) then a simple count of the
relative frequencies will allow us to make a fairly good guess as to the
letters which have substituted for the most common english letters.
Often this would be enough to allow the remaining substitutions to be
easily guessed.
Of course, real cyphers are much more sophisticated and harder to
break than this (how would you tackle a Vigenere vigenere
cypher for example?).
But letter frequency counts still form an essential tool for code
breaking, albeit in conjunction with many other techniques.
Part 1: Counting
Your first task in this exercise is to write a program which will read a
file (using getchar() as usual), and count how many times each
alphabetic character occurs. It should ignore distinctions between
upper and lower case, and should ignore all non-alphabetic
characters. That is: both a and A should contribute
to the count for the letter "a"; and non-alphabetics should not
contribute to any count.
When the file has been completely read, the program should output a
table showing the count for each of the letters A to Z .
As usual, you should test your program carefully, and document any
problems etc. in your report.
Part 2: Statistics
Next you are required to enhance the basic program (rename it at this
stage, to avoid overwriting your first program!). This enhanced program
should calculate the total number of all alphabetic characters read,
and output, for each character, both the actual number of times it
occurred, and the percentage this represents of the total number of
alphabetic characters encountered.
Again, test and document the program carefully.
Part 3: Graphing
Finally: develop a further version of the program, which, instead of
outputing the relative frequencies as numbers , outputs a
histogram . This will be made up of lines of varying width (made up of
asterisk characters) one line for each letter, from A to Z .
The letter with the most occurrences should be scaled to give a line
with, say, 60 *'s. Then the lines for all the others should be shorter
in proportion. The display might look somewhat as follows:
A *************
B ******
C **
D ***
E ********************
F ******
G *
- and so on.
Note that histograms would normally be plotted with the independent
variable oriented vertically, whereas I have described an horizontal
orientation here. If you have time, try to alter your program again, to
yield a histogram with a vertical orientation.
Tutorials
tutorial
A voluntary, "walk-in", tutorial will be offered
each Tuesday, from 10.00 11.00 pm, in the CAE lab. At least one of the
instructors and/or demonstrators will be present at this time. This is
an excellent opportunity to look for help in completing any lab
exercises which you have outstanding, particularly if you are stuck with
some problem you simply can't seem to overcome.
You can, of course, submit queries by e-mail at any time (see
Feedback feedback ); but this may be impractical for various
reasons (e.g., if your problem is that you can't get the mail system to
work!). In such cases, the tutorial is your opportunity to get the
problem sorted.
Assessment
The course will be assessed by way
of two supervised Laboratory Tests . The first of these will take
place at the start of Term 2, the second at the end of Term 3. The
overall subject mark will be a weighted sum of the marks achieved in
these two tests: 25 from the first, 75 from the second.
The format of the tests will
be very similar to the format of the normal lab exercises. Each student
will work alone. The tests will be of three hours duration.
Instructions will be accessed via the World Wide Web .
The work to be done will be derived from work done during the normal lab
sessions - though it will, of course, differ in detail. A report will
be required, submitted electronically. This report will form the basis for
marking the test.
Detailed information on the format of the report, and the marking
scheme to be used, are available in the
Laboratory Report Guidelines
http://www.eeng.dcu.ie/ 7Emcmullin/swe1/rptfmt/rptfmt.html .
Students who are repeating this course in 1994/95 will be
offered the option of being assessed on the previous model (three hour
written examination) instead. All students who intend to avail of this
alternative should confirm that in writing to me as soon as
possible.
Feedback
feedback
You can give me feedback on any aspect of the course, or ask questions
etc., by sending e-mail to mcmullin@ugmail.eeng.dcu.ie . I normally
check this mailbox at least once every day, and will respond as quickly
as I can. This is far and away the most reliable and effective way of
getting in touch with me.
Other On-Line Resources
This complete document is available in
LaTex swe1root.tex LaTeX is a typesetting
mark-up language, particularly suitable for hierarchically organised
documents, and documents using mathematical notation extensively. It
is a de-facto standard in academic publishing. A LaTeX source file is
still a form of ASCII, but with typesetting commands interspersed with
the "raw" text. It can be read with any tool which can display ASCII
files.
and plain ASCII swe1root.txt forms. Thus, you can
download the document and browse it offline. However, note that it is
rather large (c. 200KBytes) - you would not be able to load it with, for
example, the MS-Windows NotePad application. But you could view it
with the built-in editor of the Turbo-C++ IDE. Alternatively,
you might want to install a copy of the
Programmer's File
Editor ftp://ftp.hea.ie:/pub/windows/programr/pfe0507.zip . This is a
powerful (and free!) professional text editor for Windows. It does the
same job as NotePad, but much better!
There is a separate detailed document dealing with the
C Standard Library
http://www.eeng.dcu.ie/ 7Emcmullin/swe1/stdlib/stdlib.html .
Again, it is also available for downloading, and offline browsing, in
LaTex stdlib.tex
and plain ASCII stdlib.txt forms.
There is also a very wide range of resources available elsewhere
on the Internet, on
the subjects of and C++ programming. Pointers to many of these
have been collected on the Web Page called
Learn C/C++ Today http://vinny.csd.mu.edu/learn.html .
If you are looking out for further elaboration or alternative approaches
to the C language I strongly recommend that you have a browse through
the materials identified by this site.
Copyright
This Hypermedia Document is copyrighted, 1994, 1995, by
Barry McMullin
http://www.eeng.dcu.ie/ 7Emcmullin/home.html .
Permission is hereby granted to access, copy, or store this work, in
whole or in part, for purposes of individual private study only. The
work may not be accessed or copied, in whole or in part, for
commercial purposes, except with the prior written permission of the
author.