html article C at_work.gif (Under Construction...) 1 1 Software Engineering 1: Course Notes Barry McMullin This is the online hypermedia documentation for the course Software Engineering 1 . This is a required course for all students enrolled on Stage 1 of the B.Eng. (Electronic Engineering) http://www.eeng.dcu.ie/eenghome/ee_info/ee_info.html or B.Eng. (Telecommunications Engineering) programmes offered by the School of Electronic Engineering http://www.eeng.dcu.ie/eenghome/eenghome.html of Dublin City University http://www.compapp.dcu.ie/DCU_home.html . at_work.gif These online notes are under construction, and will be incrementally extended and corrected as the year progresses. Thus, early in the year you will find many sections to be empty, but these will be filled out as we reach the relevant stages in the course. Introduction The course provides an introduction to Software Engineering - which is to say the skill of developing software or programs for digital computers. The vast majority of modern technological products - from dishwashers to communications satellites, CD players to industrial robots - include at least one, and often many more, embedded microprocessors . A microprocessor is the central processing component of a general purpose computer, implemented as a single integrated circuit (IC). The specific behaviours or functionalities of the completed product are then realised by providing these microprocessors with appropriate sequences of instructions, or software . Software is thus a crucial foundation for virtually all modern technological products - and software development is an essential and pervasive skill for every engineering discipline. This is reflected in the design of the engineering programmes in DCU, where Software Engineering takes a prominent role all stages - beginning with this very important introductory course. The objective of the course is that, on completion, you should be able to design, code, test and debug small, but complete, programs. The course introduces a modern high level procedural language (ANSI C), supported on a general purpose computing platform (ISA hardware, running MS-DOS and MS-Windows). The course is preparatory in nature, providing necessary foundations for a variety of courses in subsequent stages of the relevant programs. Syllabus (Subject to revision.) Background: Introduction to stored program digital computers. Historical perspectives. The nature of software engineering: issues of cost, reliability and efficiency. The von Neumann architecture, and the idea of a sequential machine. Evolution of computers and programming languages. Characters and Text: Computerised text. Characters and character sets. The 7-bit ASCII character set. Text files. Text editing. Directories. Text files as computer programs: the concepts of translation, execution, and debugging. The use of an Integrated Development Environment (IDE). Identifiers: Concept of identifiers . File and directory names as identifiers. Identifier uniqueness. Syntax of identifiers in programs. Reserved identifiers in (keywords). Conventions and guidelines on identifier usage. Data: Concepts of data type, constants, and variables. Type int . int constants and range. Declaring variables. Operators: Arithmetic operators: + (addition), -} (subtraction), * (multiplication), / (quotient), (remainder). Negative operands: direction of quotient truncation and sign of remainder. Exceptions: overflow and divide-by-zero. int values corresponding to "truth" and "falsehood". Equality operators: == (equal to), != (not equal to). Relational operators: > (greater than), >= (greater than or equal to), < (less than), <= (less than or equal to). Logical operators: & & (AND), || (OR), ! (negation). Expressions: Forming expressions. Concept of expression evaluation. Sequential nature of expression evaluation. Order of evaluation: operator precedence and associativity; forcing associativity with parentheses. Assignment: The concept of assignment. Operator side effects. The assignment operator ( = ): evaluation and side effects. Increment ( ++ ) and decrement ( }) operators. Type unsigned int ; type conversion. Programs: Skeleton structure of a program: the main function. Declarations and statements. Comments. The concept of flow of control. Types of statements: expression, if , if-else and switch statements. Compound statements. The idea of iteration and the while statement. Symbolic constants and define . Input and output by direct modification and examination of variables within an Integrated Development Environment. Monitoring execution: tracing and breakpoints. Development Cycle: Specification. Algorithm design. Coding. Compilation and linkage. Testing: selection of test cases. Exception handling. Defensive design. Debugging. Instructors The lecturers for the course are Dr. Barry McMullin http://www.eeng.dcu.ie/ 7Emcmullin/home.html , and Ms. Fiona Lodge . The laboratory demonstrator for the Tuesday sessions is Fiona Murray , and for the Thursday sessions, Feargal Costello . Textbook The course is designed around the textbook: Illustrating C (ANSI/ISO Version) Donald Alcock Cambridge University Press, 1992. ISBN: 0-521-46821-3 The cost is approximately IR This is a required course text. Every student enrolled on the course must acquire a copy of this book. Lectures Chapter and page numbers relate to the assigned textbook. General Comments on Lectures I find the subject of software engineering endlessly fascinating, and get a great kick out of successfully solving a programming problem. The down side is that this subject can be almost equally endlessly annoying and frustrating, particularly when you are trying to come to grips with it for the very first time. This course has been evolving over the past several years, but one basic principle has remained constant: I see my role not as a teacher or instructor, but as a facilitator . The reason is that software engineering is a subject which is dominated by doing . It cannot be primarily learned by listening or reading, though both of these can play a subsidiary role; it must be learned by taking control, and doing . The frustration of the subject lies in the fact that, particularly at first, this doing is necessarily uninformed and feels very much like thrashing around in the dark - and will frequently end in failure; but the joy lies in the fact that the failures are always ultimately understandable and correctable - and this understanding and correction delivers tremendous satisfaction. Given this emphasis on doing, I consider the time you will spend in the laboratory (both scheduled lab sessions, and private study time) as being the most central and important component of the course. Nonetheless, a small lecture component is still useful to complement this practical work, and to provide overall structure and pacing to the course. There are two scheduled lectures and one tutorial each week. In practise, I expect all three sessions to be tutorial in nature. This reflects an increasing emphasis on active, personal, study time. The lectures will closely follow the textbook. I work on the assumption that you will have studied the assigned material in the textbook, and the related notes provided here, before each lecture. In this way the lecture sessions can be interactive in nature. Note that attendance at lectures for this course is not monitored. It is entirely under your own control and discretion. Delivering the lectures on this basis has several significant advantages: I am relieved of the tedium of presenting extensive, repetitive, and usually rather boring, notes in the lectures. Perhaps more importantly, you are freed of the tedium (not to mention writer's cramp) of transcribing them. The lectures can be used to allow discussion, clarification (and sometimes even correction!) of the textbook material. Further, should you choose to play an active role in the lectures, they can be used to explore those topics which you find particularly interesting or difficult in more detail. Since all the written materials for the course are available to you independently of attendance at lectures, you can decide for yourself whether or when it is worth your while to attend the lectures. For example, if you happen to already be proficient in certain aspects of software engineering, and are satisfied that you completely understand the assigned materials, then you may well decide to miss certain lectures and allocate the time for some other, more pressing, studies. However, I recommend that you adopt this position only with extreme caution! Conversely, should you miss a lecture for some reason outside your control, or otherwise fall behind in the course, then because the written materials are all available to you, and the schedule is explicitly laid out in advance, you will have the flexibility to schedule in extra study time to make up. In summary, my intention in designing this course is to empower you, the student. That is, I hope that you will be able to enjoy and benefit from this course precisely in proportion to the energy and effort you are willing to invest in it. No more, no less. Term 1: Weeks 1 10 Weeks 1 2: Background The course is introduced with background discussion of computers, computer programming, and some demonstration of particular computer tools which will be used during the course. Week 3 4: Chapter 1 pp. 1 10 Overview Chapter 1 is short, but very important. If you can successfully master it, and complete its one associated exercise (p. 10) then you will have laid down an excellent foundation for tackling the substantive topics which will follow - and they will follow very quickly indeed! Conversely, if you experience any difficulty with anything in this chapter, this is your opportunity to get your questions answered quickly. In brief, chapter 1 introduces a problem (the calculation of repayments on a loan) which can be solved "by hand" (i.e. by manual calculation), but which can more conveniently be solved by programming a computer, in a suitable language, "once and for all". The bulk of the chapter is then concerned with presenting and dissecting such a program, written in First Step: Understand the problem! I think that there is only one way to science - or to philosophy, for that matter: to meet a problem, to see its beauty and fall in love with it; to get married to it, and to live with it happily, till death do ye part - unless you should meet another and even more fascinating problem, or unless, indeed, you should obtain a solution. But even if you do obtain a solution, you may then discover, to your delight, the existence of a whole family of enchanting though perhaps difficult problem children for whose welfare you may work, with a purpose, to the end of your days. Realism and the Aim of Science Preface 1956, p. 8 Karl Popper Your first step in tackling any software engineering project is to ensure that you fully understand the statement of the problem. Indeed, many senior software engineers (classified as Systems Analysts ) specialise in nothing but clearly stating the problems to be solved by software systems! Alcock presents a formula for calculating loan repayments. Ask yourself do you believe this formula before going any further. If you do believe it, try to say exactly why . In any case, try to test the formula in any way you can dream up. You may not be able to prove that it is definitely right , but you should be able to think of at least some partial tests which it should pass. For example, does it give the correct answer for the limiting case of the principle being 0? Can you think of any other comparable tests? Next, actually carry out the sequence of instructions which Alcock lists on p. 2, for at least a few different cases. Carry out these instructions as exactly and literally as you possibly can (you will need to get a friend to act as the "client", and then alternate roles). If you find anything in the instructions which you think is obscure or ambiguous - which you cannot immediately see how to carry out - then make a note of it. What's a "high-level" computer language? The next sections of the chapter present a program to make a computer carry out, in effect, the same sequence of actions to calculate loan repayments. Note that essentially the same program will work on practically any computer. This is a key advantage of using a so-called "high-level" language - it can be automatically translated or "compiled" into a form which can be executed on practically any computer. However, the precise procedures for entering the program, and arranging for it to be translated and executed will generally vary from computer to computer. For this reason, Alcock cannot and does not give detailed instructions for how to do this; however, you are given the appropriate detailed instructions, for the computers which you will be using, in the notes for Laboratory Session 2 lab2 . Alcock next dissects this simple program in great detail, to explain, at least in a superficial sense, what its components are, and how they are interpreted by the computer. Ask yourself at this point if you can explain why a special language such as is necessary at all - why can the computer not simply understand the original instructions written in English on p. 1? Watch for the new technical terms! Note that Alcock introduces a variety of technical terms in this chapter - such as directive , block , statement etc. He generally italicises such technical terms on the first occasion on which they are used. This specialised technical language is very important in this subject - as, indeed, it is in most other engineering subjects; if you do not realise which terms are being used with specialised meanings, and what those meanings are, from the very start, then you will quickly find that you are becoming very confused. So, I suggest you acquire a notebook to use exclusively as a glossary for the duration of this course. Organise it alphabetically. Each time you come across a word which seems to be used in a special technical sense, write it in, together with a definition in your own words, and an example or two. You may well have to go back over certain definitions at various times, and revise or correct them - but that is perfectly OK. If there are words that seem to have a technical meaning, but you cannot quite see what the meaning is, then e-mail feedback a question, or ask in the lecture. Start constructing your glossary now, by going through the whole of Chapter 1 carefully. Caution: On Program Style In dissecting the initial example program Alcock comments that: "You have complete freedom of layout. Statements may be typed several to a line, one per line, one to several lines." His point here is that as far as the computer - or more precisely, the compiler - is concerned, many details of layout or formatting of the program are irrelevant. This is true. However, you should notice that programs have effectively two parallel, but quite distinct, purposes: To represent a suitable sequence of instructions which (after compilation) the computer can execute to accomplish some task. This is the role which Alcock explicitly recognises and discusses. To document that set of instructions so that some person can understand how the computer is accomplishing the task. This becomes vitally important whenever it is necessary to modify or extend the behaviour of a program - or to correct previously undetected malfunctions. Alcock does not explicitly refer to this purpose of the program at all. It should be clear that, in its role as documentation for another human being, the details of layout and formatting of a program are very important. A poorly formatted program may be perfectly "correct" - in the sense of generating the correct answers to the problem at hand - but may be virtually impossible for a human being (even the original author) to decipher if it becomes necessary to enhance it (or even to decide whether it may malfunction in certain circumstances). Another aspects of program design which is similarly constrained is the choice of identifiers - names for variables and other objects which the program uses. The computer is largely insensitive to particular identifiers, as long as, in each context, they unambiguously identify the "correct" object. In particular, the computer does not care whether an identifier is "meaningful" in human terms. The names P , Principal and bZ34ywT are all equally valid to the machine, as along as they are used consistently (unambiguously). However, the net effect if entirely different from the point of view of someone trying to understand how the program works. So try to get into the habit, from the very start, of taking care with the formatting and layout of your programs, and with your choice of identifiers. These things may not directly affect the functioning of your program, but they will seriously affect how long it actually takes you to get the program working satisfactorily! Week 5 7: Chapter 2 pp. 11 19 What's all this about "Boolean" variables? Alcock is at pains to point out that does not support "Boolean" variables. What does this mean? Well, certain other programming languages (notably Pascal and its descendants) support a special kind of variable which can only hold or be assigned one of two distinct values - usually referred to as TRUE and FALSE . These are called "Boolean" variables in reference to the notable 19th century mathematician George Boole . Boole's greatest work was entitled An Investigation of the Laws of Thought on Which Are Founded the Mathematical Theories of Logic and Probabilities , published in 1854. Boole reduced logic (the analysis of the truth or falsity of complex propositions) to a simple algebra, thereby incorporating logic into mathematics. Boole's two-valued, or binary, algebra is the simplest form of his more general boolean algebra . Now Boolean variables would be quite different in kind from the numeric variables we have seen so far in Admittedly, as we shall see in more detail later, the total number of distinct values that can be accommodated in any type of variable (such as type int or type float ) is always finite , but it is still much bigger than two! So why are we worried about all this anyway? Well, it turns out that, in many programs, it is desirable for the program to have a "choice" of things to do, depending on a variety of factors that only become known when the program is actually executing. In effect this means that we want the program to be able to evaluate some proposition or condition and classify it as being "true" or "false". To allow this, a programming language normally provides operators to test or compare various conditions - such as whether certain numbers are equal, or one is greater than the other etc. If the language supports Boolean variables as such, then the result of applying or evaluating such an operator would be one of the Boolean values normally denoted TRUE or FALSE . In however, there is no special Boolean data type, so, instead, the so-called logical operators simply generate numeric results, with the number 0 being interpreted as signalling "false" and the number 1 signalling "true". Technically, the result of applying a logical operator is of the particular numeric data type called int . The logical operators would never generate any value other than 0 or 1 . However, where, in effect, a Boolean value is examined or used for some purpose in a program, any int value may be introduced, with the convention that all non-zero values (including 1 of course) are classified or interpreted as meaning "true". Making a statement! Now that we have some logical operators available to us, how can we use their results to control the flow of execution in a program? The answer is that there are special kinds of statement where the exact effect of the statement can depend on a value generated at execution time (usually by a logical operator). But what is a "statement" anyway? Let's back up and review the concept a little. We have seen examples of "statements" already, in the introductory chapter; they were things like: R = Rpct / 100; or: printf(" Principal, Rate These are all what we might term "simple" statements: they are the components of a program that are carried out, in sequence, when the program is executed. Technically, these particular statements are all examples of what are called expression statements: that is, as we shall see later, we can always think of the execution of these statements as involving, in effect, the evaluation of an "expression". In any case, the "statement" notion is much broader than is accommodated by just expression statements. We shall see other more complex kinds of statement shortly. But first note that statements in general should be distinguished from the other major "kinds" of program component. As well as statements, other program components we have seen are comments , preprocessor directives , and declarations . Comments are simple pieces of text delimited by the special tokens /* and */ which are ignored by the compiler, but serve to provide some additional information to a person reading the program: /* WOTCOST; Computes the cost of a loan. */ Directives are the components of a program signalled by the control the way the rest of the program is translated or compiled; we say that they play a role only at "compile time". The only example we have seen so far is the include Its effect is to tell the compiler to also compile an extra separate file (in this case, one called stdio.h ) as an integral part of the compilation of our own file. Declarations , roughly speaking, "prepare the ground" for statements to be subsequently executed. So far, the only declarations we have seen have been concerned with creating variables with given names and types, e.g. as follows: float P, Rpct, R, M; Recall that declarations and statements can act as sub-\/ components: in particular, they may be combined to form a block , by enclosing them within braces; and a block, in turn, can form part of a function definition . This notion of hierarchical structure , with components within components within components, is a very important and pervasive one in the design of all complex engineering systems. For now, note that technically a block is, itself, regarded as a special kind of statement . In fact, it is also alternatively referred to as a compound statement . The significance of this is that anywhere in a program where it is "grammatically" (or "syntactically") legal to place a "simple" statement then it will also be legal to put in a whole block, comprising an indefinitely large group of declarations and statements. It follows, logically, that a block may itself be a component of a larger more encompassing block and so on! Granted, there is no obvious reason why this might be useful yet, but we'll get there in due course! Branching and Looping So far we have two kinds of statement: "expression" statements, and "block" or "compound" statements. Expression statements are just executed. The (simple) statements within a block are executed in simple sequence. As yet we have no way of making the execution of a program depend on, or vary in sympathy with, a logical value determined at execution time. There are two principle kinds of variation we typically might want: branching (or, as Alcock says, selection ) and looping. In branching we have alternative things we might want our program to do; in looping we want to do a particular thing repeatedly - over and over again. provides several different ways of achieving both of these effects; but for the time being, it is sufficient to have at least one way of doing each in our armoury, and that is what Alcock now introduces. He presents the if - else kind of statement to achieve branching, and the for statement to achieve looping or "iteration". Examine the formats which Alcock specifies for the if-else and for statements very carefully. Notice that, in each case, the statement format contains one or more other statements as components nested within it. In the case of an if statement, the nested statement is to be executed if and only if the expression value fed into the if is "true" (non-zero); in the case of if-else exactly one or the other of the nested statements will be executed depending on whether the expression is classified as "true" or "false"; and in the case of the for , the nested statement is what gets executed repeatedly. My point here is to draw your attention to the fact that the nested statement is grammatically, or "syntactically", allowed to be any kind of statement at all. So, as Alcock shows in his example, the statement nested within an if can, itself, be another if statement (complete with a further, nested, statement); equally, the nested statement could be a for statement or, indeed, a compound statement. The latter means that, with one if statement, we can control execution of a whole block of statements - simply by enclosing them in braces to make a single compound statement. Furthermore, the nesting of statements within statements can, in theory, be made as deep as we like! However, in practice, programs get rather difficult for human beings to understand if statements get nested too deeply within each other - so don't get carried away with this! A Shady Character? A computer clearly has to be able to deal with "text" as well as "numbers", if we are going to use it for general purpose information processing. We need to be able to print out headings, or textual prompts at the very least. More generally, we want to be able to carry out "processing" on textual information: sorting, searching, formatting etc. Text is ultimately made up of basic components we call characters . But what exactly counts as a "character"? In the English speaking world we tend to be happy with upper and lower case alphabetics, some digits, and a selection of miscellaneous brackets, punctuation, and other "special" characters. Even that still leaves room for tremendous variety in style and size . But to handle text in most languages of the European Union we need to add a large variety of accented characters ( a , u ) etc.; and if we move further afield, things get even more complicated: in the middle East we need arabic and hebrew alphabets, in the CIS we need (at least) the cyrillic character set, and in the far East we need even more elaborate character sets to handle Japanese, the various kinds of Chinese, and so on. Further, if we want to deal with mathematical notation, we need to be able to deal with things like sub- and super-scripts ( x^ y_1 ), and so forth. So what should count as a "character", or even how many "distinct" characters we might want to deal with, is a fairly difficult question to deal with. However, we have to start somewhere! In the case of computers, the start was historically based on good old fashioned typewriters, or, more strictly, teletypewriters . Teletypewriters, or teletypes, were automatic electric typewriters which could be remotely hooked together, so that whatever was typed on one was automatically printed on the other also. In the early days of computers, teletype machines were already fairly freely available, and were a convenient mechanism for both keying input information into a computer, and for printing output information from the computer. However, it was in the nature of teletype technology that they could only support a rather limited range of characters: typically just the upper and lower case roman alphabet, digits 0 to 9, and a variety of "special" characters. Furthermore, if these devices were to work properly with each other it was essential that they all adhere to some single standard - both for the characters to be supported, and the way they should be encoded (something akin to the "dots" and "dashes" of the earlier Morse code). A set of just 96 such characters, plus 32 non-printable "control" characters, together with a standardised way of encoding these into electrical impulses (as strings of "highs" and "lows", or "ones" and "zeros" - i.e. binary numbers) was therefore devised, and became a de facto standard, known as the American Standard Code for Information Interchange , or ASCII for short. Because it only allows 96 characters, based on the roman alphabet, ASCII is very limited. It doesn't include any accented characters, or even the sign. It doesn't allow for variations in size, or style. It can't deal properly with mathematical notation. And it does not address non-roman alphabets at all. Nonetheless, ASCII has become a sort of "lowest common denominator" or lingua franca of computers. They can virtually all deal with ASCII encoded text. In particular, the text that makes up computer programs is normally limited to the ASCII character set (or some even more restrictive character set), to ensure that the programs can be potentially processed on the widest possible variety of computers. Now, the ASCII code is certainly not the only way of representing textual material in computers, and the standard for the language does not absolutely require either that the characters making up a program, or the characters making up textual material being processed by the program, must be encoded in ASCII . Instead, the standard simply stipulates certain restrictions on the encoding, which are summarised by Alcock. Alcock rather misleadingly talks of certain characters being stored in a certain "order" and/or "contiguously"; what he means is that characters are encoded in a certain order or contiguously. ASCII text satisfies these restrictions, and also has some additional properties which Alcock notes. As it happens, in the implementation of the language which you will be using, characters are, indeed, encoded in ASCII and these properties do hold. However, the point to be made here is that in general, in your programs, you should not rely on these latter properties (specific to the ASCII coding), since they are not guaranteed to hold on all computers implementing the language - except, of course, where your program is specifically intended only to work with ASCII encoded text. The newline Character - and other vagaries! One special problem in dealing with characters is how to represent the idea of a "line break" or a "new line". In the ASCII character set there are actually two separate numeric codes associated with the idea of line breaking - one called "carriage return" the other called "line feed". The terminology goes back to mechanical typewriters and teletypes, where moving on to a new line required two separate actions: the "carriage" carrying the roller and paper had to be moved back to the extreme right (so that the left hand edge of the paper was relocated under the printing position) and the paper had to be "fed" through the roller by the correct amount for a new line. Of course, these quaint historical origins have rather little to do with the operation of modern computer displays; but the distinction between moving the print position (the cursor) down a line, and moving it back to the left hand margin still exists, and so the distinct functions of the two ASCII control codes still apply. Nonetheless, it is conventional in programming to have a single control code which is denoted by the sequence n and which has the effect of both moving down a line, and moving back to the left hand margin. This special character is called the newline character. Why cannot a newline simply be represented in the file by, literally, inserting a new line? Why do we need this special sequence n ? The basic answer is that line breaks are used in a program to try to lay it out in a legible manner; and that if we put in line breaks in the file where what we really want is to have the program emit a line break at run time, then this will disrupt and interfere with the formatting of the program. In fact, within string constants, which is where line breaks are usually desired, it is illegal to insert a literal line break, and a compiler error would be generated. So it is necessary to invent some kind of more or less artificial way of indicating the idea of a line break, or the "newline" character within a string, and n is the way it is done. Note that when you write a statement like: printf(" ); then the string which is actually stored internally in the computer at run time (and passed as an argument to printf() ) contains just one character - the ASCII newline character, and not the two separate ASCII characters backslash and the letter n . In turn, what the printf() function actually "sends" to the screen is not the two separate characters and n , but rather the single ASCII newline character. Its actually even more complicated than this, but I will spare you any further details! The sequence n is called an "escape" sequence: the idea is that the character n is "escaping" from its normal interpretation, and is, instead, being given a special interpretation. Its a "sequence" because the n must be preceded by the special "escape" character, which, in this case, is the backslash character, . The use of an escape sequence to represent a special ASCII character (such as newline) introduces a difficulty of its own: what if we want to actually put the backslash character itself into a string? How can we prevent it being interpreted as an "escape" character? Well, the answer is obvious enough: if we want to incorporate, literally, the backslash character into a string, we simply escape this escape character itself: i.e., we write . Thus to print the backslash character using printf() we would write: printf(" ); As well as the newline character, denoted n there are a variety of other special ASCII characters which can be represented in by escape sequences. Try out these sequences for yourself, and try to establish their effect: r , b , t . Variables, Data Types and Arrays A "simple" variable in can store or record a single "value". What kind of value depends on the data type : variable of type int can hold positive or negative integers, up to about 32000 ; type long variables can hold integers up to about 2 10^ 9 ; float variables can hold positive or negative rational numbers, with a precision of about 9 significant digits, and a range up to about 10^ 38 . The type char is a little stranger: its values can be regarded either as integers (with a range from 0 to 255) or as ASCII characters, represented in single quotes, e.g., 'a' , 'Z' , '9' , '+' , '(' and so on. But the basic point remains that one variable can store or record just one value. Now, if we want to store a whole lot of values (and this is a very common requirement) we could simply declare as many separate, simple, variables as are needed. This is a perfectly satisfactory approach for many purposes. BUT: in many cases, we don't just want a whole lot of separate values - we want our program to be able to automatically scan or iterate or repeat some operation over all these values. If the values are all stored in separate variables this will be very clumsy, if not impossible to code. This is so because each repetition or iteration would need to refer to a different variable - with a different name . That means that the code for each iteration is necessarily different. If I want to do the same thing to three variables called x , y and z , then I will have to program (at least) three separate statements. Granted, the statements will be almost identical - but not quite exactly identical, because the relevant variable name will have to be changed from x to y to z . This means I can't possibly achieve the required repetition with, say, a for statement: because there is just one substatement controlled as the substatement of a for , and that substatement would have to name some particular one of the three variables x , y or z . This may not be too much of a problem if I just want to repeat something over two or three different variables - it will not be too much of a problem simply to copy the relevant source statements, and edit the separate copies to deal with the different variables as appropriate. But what if the repetition is to be over 50, or 100, or even 1000 different values? Then duplicating, and varying, the code for each different variable is going to be very laborious, and very error prone. Well, the language provides a special mechanism for dealing with this kind of situation. If you want a set of data items, all of the same type , then instead of declaring a separate variable for each one, you can declare a single array variable: int foobar 100 ; The square brackets on the declaration signal that foobar is some kind of array; the number inside the square brackets says how many elements are in the array; and the type - int in the case - gives the type of the elements . So this declaration creates a single "array" variable, called foobar ; but foobar in turn, is actually made up of a whole lot (100 to be precise) of individual, simple, int variables. Note that arrays already have an advantage over separate variables, in that they are much more concise to declare. In the example above, if we did not have the array mechanism, we would have had to declare 100 separate variables - something like this: int foobar1, foobar2, foobar3, foobar4, foobar5; int foobar6, foobar7, foobar8, foobar9, foobar10; and so on! Once an array is declared, the individual elements can be accessed by indexing the array name. That is, we can use statements like: foobar 10 = 42; foobar 24 = foobar 5 * foobar 4 * 163; printf("This is element 9 of foobar: Array indices in always start at zero. So if an array has 100 elements, the valid indices run from 0 to 99. Note carefully at this stage that an array is a quite different kind of object from its elements. The kinds of things you can typically do to a complete array are quite different from the kinds of things you can do to its elements. Thus, in the case of foobar above, it is perfectly reasonable to write: foobar 0 = foobar 0 + 10; printf(" which has the effect of increasing the value of the zero'th element of foobar by 10. But it would be silly to write: foobar = foobar + 10; printf(" foobar itself - the whole array, as opposed to one of its elements - is not a number (even though all its elements are ). Adding 10 to an array is simply not a sensible or meaningful kind of thing to try to do to a complete array. Similarly, if we give printf() a format specification, " i" , which signals to it to expect a further argument of type int , then it would be totally confusing to give it a whole array of int values instead. Are there any operations which it makes sense to carry out or use on a complete array, as opposed to individual elements of it? Well, there are, but they are relatively few. The only one we will have immediate use for is in passing a complete array to a function - somewhat as was attempted above with printf() . But whereas printf() has not been designed to be capable of accepting a whole array as a single argument, it is perfectly possible, and useful, to write your own functions which can accept such array arguments. This possibility is not pursued in detail by Alcock at this stage, but is raised implicitly in Exercise 4 of the chapter. In the example given above, the elements of foobar are simply of type int . But the elements of an array can, themselves, be arrays - and so on. This allows the creation of data objects which can be thought of as multi-dimensional arrays: int two_dim_foobar 20 30 ; This makes foobar an array with 20 elements, where each of these elements is, in turn, an array with 30 elements, and each of these is a simple variable of type int . The elements can then be accessed as you would expect, but now needing two indices to identify a particular, simple, int element: two_dim_foobar 10 5 = foobar 6 / foobar 7 ; Multidimensional arrays turn out to be useful in many engineering applications. For example, if I am writing a program to plot points on a graph, I could conveniently set up a two dimensional array to record which points are marked, and which left blank. Similarly, Alcock's MATMUL matmul.c program provides a business calculation example where two dimensional arrays are useful. Anyway: so far we have just looked at declaring array variables, and accessing particular elements. While this simplifies the declarations somewhat, it is not yet clear that it addresses the original problem - our desire to be able to easily and conveniently repeat a single operation over a whole set of different variables, without having to code a separate statement for each one. Thus, we can now refer to array elements foobar 0 through foobar 99 , instead of, say, to the completely separate variables named foobar0 , foobar1 and so on down to foobar99 . But how does this make it easier to repeat operations over these elements? The key notion here is that the index of an array is not limited to being just a particular, literal, number, such as in foobar 7 . The index is technically allowed to be an arbitrary expression . Thus we could refer to foobar 2+20 , or foobar (10+3)*2 . This is still not terribly useful. But, in an expression, we can put variables . Thus we can have a program fragment like: int foobar 100 ; int index; . . . index = 6; . . . printf(" This will have the effect of printing element 6 of foobar . What advantage does this have over simply writing printf(" i", foobar 6 ) ? Not much yet, but now consider this code fragment: int foobar 100 ; int index; . . . index = 6; printf(" index = index + 1; printf(" This has the effect of printing elements 6 and 7 of foobar . Still, so what? The so what is that, if you examine this fragment carefully, you will see that the two printf() statements, even though they print out two different elements of foobar , are actually textually identical statements. Not just "almost" the same, but strictly identical. Their different effects , when they are executed, arise because the effect depends on the value of the variable index at the time of execution - and that actually changes between the two statements in this example. What's so significant about the two printf() statements being absolutely identical? Well, because of this absolute identity, it is not necessary to actually repeat them in the source file at all: the two separate invocations of this statement can be collapsed down as the substatement of a for statement as follows: for (index = 6; index <= 7; index++) printf(" Granted, it is not terrible exciting just to replace a single duplication of a statement. But it becomes terribly useful if we have a large number of duplications. For example, suppose we want to print out all the elements of foobar rather than just elements 6 and 7 ? Instead of writing something like: printf(" printf(" . . . printf(" with 100 separate statements, each differing only ever so slightly from the others, we can have a single for statement: for (index = 0; index <= 99; index++) printf(" Now that's a real benefit. Make sure you understand that this kind of collapsing down, using an iteration statement like for , only works if there is absolutely no variation in the text of the statement being repeated - although, of course, there can be a large variation in the effect of that text each time it is executed. Go back over the example above, and satisfy yourself that you cannot achieve the same effect - collapsing down to a single for statement - if the program were using 100 separate variables with separate names instead of an array. Note that the names of separate variables must be textually different, even if only "slightly" so. Note also that, although I have built up this example using the specific idea that the thing we want to iterate or repeat is the printing out of an element of the array, the concept and argument does not rely in any way on just what it is we want to repeat. Exactly similar considerations would have applied if we wanted to increment each element, or take its square root, or add it into a cumulative total, or whatever. Alcock's two example programs MATMUL matmul.c and BUBBLE bubble.c provide, of course, more comprehensive examples of the same ideas. It is a good idea to at least attempt to see how you might try implementing either of these programs if you did not have the possibility of using arrays open to you, but were forced, instead, to simply declare as many separate, simple, variables as are necessary. You should find that, in that case, you cannot use for statements to repeat the required operations over different variables, and you would therefore be forced to "manually" duplicate blocks of code many times over, and then edit each copy to deal with slightly different variable(s). The whole thing becomes incredibly cumbersome! The Dreaded Array Indexing Bugs It is vitally important in programming to ensure that you never use an array index which is out of range - for example, if foobar is an array declared as having 100 elements, then referring to foobar 100 , or foobar -6 would be indexing errors. For somewhat obscure and technical reasons, the compiler cannot detect such errors, nor, in general, can they be automatically detected at run time. But the effects of such errors tend to be highly variable, very strange, and potentially very confusing. The nett result is that array indexing errors are among the most difficult to find and correct in programming. It is very important, therefore, to get into the habit very early on of being very careful indeed about array indices. This is straightforward enough with simple numeric indices, such as foobar 10 and so on. Although, for beginning programmers it is still very common to make errors "at the margin" - i.e. to forget that the valid indices start at zero instead of 1, or make some related mistake. So even with a simple indexing expression such as foobar j it is important to carefully examine how the value of j is being generated or manipulated to make sure that it cannot go out of range. But what about more complicated index expressions, such as in: foobar (x + y) * z = w + 1; If you put something like this in a program, you need to very carefully examine what the values of x , y and z can possibly be, and try to guarantee that there are absolutely no circumstances in which the index expression might yield a value which is outside the bounds of the particular array. In practice this kind of analysis is difficult and error prone. It is better to adopt a style of defensive programming where you deliberately try to make the program itself warn you if something is going wrong. Thus, the statement above might be replaced with the following: index = (x + y) * z; if( (index < 0) || (index >= 100) printf(" indexing bug - Controlled Crash!!! ); exit(); foobar index = w + 1; The operator || is called logical OR and will evaluate as TRUE (non-zero) if either of its operands is TRUE . The function exit() is a library function which causes the program to immediately and unconditionally terminate. The prototype for the exit() function is in the header file stdlib.h , so you should include this header at the top of your file if you intend to use the exit() function. So the effect of this code is that the index is first calculated and stored in the variable index . Then this value is compared with the valid limits. If it turns out, for whatever reason, to be invalid, a warning message is printed and the program is terminated. Of course, if the index is valid, then execution simply continues around the if statement, no message is printed, the program is not terminated, and the index is used to actually index into the array. At first sight this defensive programming may seem like a lot of additional effort. But to repeat: array indexing bugs do happen; and if you have not adopted this kind of defensive programming strategy, they are extremely difficult to track down and isolate. Thus the small additional effort of the defensive checks is usually far outweighed by the reduction in time (not to mention sweat and tears) required to debug the program. Of course, some judgement will always be required as to whether to add extra checking code - but if in doubt, it is always safer to add it. There is also a question over what to do if an indexing error is detected: the example above shows just about the crudest possible reaction. There are much more sophisticated possible approaches to reacting to unexpected events in a program (so called "exception handling"). By all means, feel free to experiment with formulating your own approaches to this if you think you can improve on the very crude mechanism shown above. In particular, you might have a look at the standard library facilities for "process control" described in pagese 171 176 of Illustrating C - though this is not for the faint hearted! Week 8-10: Chapter 2 pp. 20 26 What's a function anyway? In a function is a unit of code that can be used, or called, or invoked, from some other place in the program. Better names for this idea, used in other languages, are "subprogram", "subroutine", or "procedure". But, for reasons which, no doubt, seemed sensible at the time, the designers of plumped for the word "function" and I am afraid we are stuck with it! Functions are handy because they allow you to break a program down into manageable chunks, where each chunk is quite small (typically 10 20 lines) and does just one or two definite things. Thus each chunk is pretty readable or understandable in its own right. If a single function starts getting too complicated, then we chop some coherent piece of it out and wrap it up in a function of its own, which now is merely "called" by the original function. Functions are also handy because they support "re-usability" in a program. Thus, if there is a particular generic kind of manipulation which has to be done in two or three, or 20 different situations in a program, we can write a function to do this manipulation; then each time we need that manipulation, we call the function, instead of repeating all the detailed code. In fact, we have been using functions all along already, without really noticing it. The block of code in our programs with the following outline: int main(void) . . . is technically the definition of a function whose name is main() . In general when I refer to the name of a function I include a pair of parentheses after the name, as in main() . I adopt this convention so that, whenever I use a name, you can easily tell whether it is intended as the name of a function or of a variable. In fact, the compiler works much the same way: when it sees a name in an expression, it looks to see whether the name is followed by a left parenthesis, and, if so, it knows that the name is supposed to refer to some function. All functions have names, just like variables. When you are defining your own functions, you get to pick the names, again just as with variables. But the function name main() is special in this regard. Every program is required to have a function called main() . Normally functions get invoked or executed by some other function calling them, or transferring control to them. But if this were true of all functions then we would have an infinite regress, with no way for any function to get started in the first place. What is special about the function called main() is that it does not have to be called or started up by any prior function - it is started up automatically as the first function to be executed when the whole program is started executing. In fact, it will normally cause very bad things to happen if main() is actually called by any other function... OK, so, in a sort of a way we have already seen what a function definition looks like, in the form of the definition of the main() function. We have also seen what function calls or invocations look like, because we have used some of the so-called standard library functions , such as printf() , scanf() , sin() , cos() , and so on. So calling a function basically involves just putting its name into a statement (or, more precisely, into an expression ). When (or if) the statement gets executed, then that will cause the function to be invoked, and control will pass, temporarily, into the called function; that function will do its thing, and then control will pass back to where the function was called (the so-called "calling site"). Thus, consider this simple program: include int main(void) printf("Help: I'm trapped in the computer!!! ); return(0); When the program is started, execution will start at the main() function. This is why a function called main() must be present: otherwise the computer has no way of knowing where in our program execution should start. This may not seem like much of a problem at the moment: after all, our program only has one function, so that is "obviously" where execution should start - regardless of what the function is called. But very soon we will see programs where the program consists of more than one function and some ambiguity would then necessarily arise. Even still you might think the computer could simply start execution at the first function that is defined (or the last for that matter) rather that looking for a function of a particular name. Indeed, with many other programming languages that is the approach adopted. But, again for reasons best known to the original language designers, they decided to take the approach of insisting that a function with the specific name main() must always be present, and that that is where execution will start... Anyway: in the example above execution starts at the function main() . The very first statement of this function is then a call, or invocation, of the standard library function called printf() . So execution is transferred to that function. In the meantime, the main() function is essentially put into suspended animation. So printf() is executed, which, in this case, means that the string "Help: I'm trapped in the computer!!! n" will be printed out on the screen. Once printf() has done that, it "terminates", or, more technically, executes a return statement, which returns control to the calling site - in this case, the function main() . So now, main() becomes re-animated, and continues execution with its next statement. This is a return statement which causes main() itself to terminate. Now, as just explained, when a function executes a return this normally causes control to return to wherever the function was called from ; but this is a bit tricky with main() because, technically, main() was not called from anywhere (or not, at least, from any other function ). In practice, when the main() function does a return then the whole program is terminated, and control is returned to the "external environment" - in our case the Turbo-C++ IDE. This makes a sort of sense since main() was the first, or "top-level" function to be invoked, and in this contrived way, its "calling site" simply is the "external environment"... So: be sure you have the basic idea of a function as a unit or chunk of code that can be "invoked". With this facility at our disposal, a complete program need no longer consist of just one function, called main() (together with whatever standard library functions main() calls). Instead, once main() starts to get a bit long or complicated, we break down the work that main() is doing, and the statements that constitute it, into some set of more or less coherent sub-blocks. Each of these is then hived off into a "function" of its own. We will give each such function some name; this should be as meaningfull as possible, but we have essentially complete freedom in this (as we have in naming variables). Whereas the name of the top-level function is constrained to be precisely main() , the functions called by main() can have any names we like. These additional functions which we name and write are called user-defined functions to distinguish them from standard library functions. In the simplest case then, the outline of a program now looks something like this: void function1(void) /* Statements making up function1() appear here... */ . . . return; void function2(void) /* Statements making up function2() appear here... */ . . . return; void function3(void) /* Statements making up function3() appear here... */ . . . return; int main(void) function1(); function2(); function3(); return 0; This program consists of the definitions of four functions called (rather unimaginatively!) function1() , function2() , function3() and main() . When the program is started up, execution will begin at main() (even though main() is actually the last function to be defined!); execution immediately transfers to function1() , and main() is suspended; when control is return 'ed from function1() to main() , then main() goes on to call function2() and is again suspended while function2 is executed; similarly, when function2() executes its return statement control comes back to main() , and then function3() is invoked; finally, function3() executes its return , control comes back to main() , and main() itself executes a return , causing the complete program to terminate, and control comes back to the IDE. Be clear that you understand how this function call/return mechanism allows main() to be broken down into smaller, hopefully simpler, chunks. In this, admittedly contrived, example, the operation of main() itself has actually become trivial - it just calls the functions function1() , function2() and function3() in sequence. All the complicated stuff that might have been in main() has been split up, and moved into the "sub-" functions. Of course, this process of breaking things down can be repeated indefinitely. Thus, in the example above, if the function function1() turned out to be still too complicated it might be broken down into smaller pieces still: void function1a(void) /* Statements making up function1a() appear here... */ . . . return; void function1b(void) /* Statements making up function1b() appear here... */ . . . return; void function1(void) function1a(); function1b(); return; And, of course, it might turn out that any of these functions can conveniently be used (or called) multiple times, possibly from multiple other functions. In the outline above, it might turn out that function1b() , say, could usefully be called both by function1() and by function3() , for example. Indeed, if we do a good job in designing the program - in deciding just exactly how "best" to break it down - we may be able to achieve quite a high level of such re-usability! Note that in laying out the outline example program above, I have chosen to place the function definitions is essentially "reverse" order - if function1() would call function1a() at run-time, then I have placed the definition of function1a() before the definition of function1() in the source file. This ordering is not absolutely required, but it is an excellent rule of thumb - and you should stick to it unless and until you have some very good reason for diverging from it. But be clear on this point: the order in which the function definitions appear in the source file has no effect whatsoever on the order in which the functions will actually be invoked. The function called main() will always be the first function invoked, regardless of whether it is positioned at the top, the middle, or the bottom, of the source file! And the next function to be invoked will depend exclusively on what statement in main() causes a function to be invoked; and whatever function that actually is, it will be invoked next regardless of whether it is located in the source file before or after the definition of the function main() . The ordering of the functions in the source file affects how the compiler translates the file - and it is to facilitate the compiler that I give you the rule of thumb of ordering the definitions in reverse order to the order in which they are invoked. Ordering them in this way ensures that the compiler never has to attempt to translate a call or invocation of a function without already having translated the definition of the function - and it turns out that this makes the compiler's job easier, and can avoid a variety of compile time problems and errors. But regardless of whether this ordering is adopted or not, it will not affect the order of function execution at run-time! Moving Data Around All right: so far the concern here has been with the concept of a function, and an outline of how they can be defined and used. Now we need to get down to some more nitty gritty specifics! In general, if we want to decompose a program into a set of functions then, for the functions to co-operate or interact properly they will need to, in some sense "share" data between them. By "data" here I essentially mean variables. How can the required data sharing be achieved? supports two quite different models for sharing or exchanging data between functions. We will consider them in turn. Thinking Global... In this model the data (the variables) are made "global": global variables are declared before, and outside of, the first function definition in the source file. Global variables are visible and accessible to all functions. Thus, if one function wants to call another sub-function I use the term "sub-function" loosely in this kind of context to distinguish between one "calling" function and a second function which is "called". However, "sub-function" is not a technical term of the language: to all functions are "equal" - there is no kind of precise distinction between "functions" and "sub-functions". and have that sub-function carry out some operation on some particular data values, the calling function can first store the data values in some global variables, then call the sub-function; the sub-function can then access the variables and do the processing; if some "result" is generated, it can be made available to the calling function by storing it in another global variable. Have a look at the file TRI-GLBL.C tri-glbl.c . This is a somewhat contrived example program, showing functional decomposition and the use of global variables. Examine the program and try to understand how it is supposed to work. Note that, given my rule of thumb of defining functions in reverse order, the sensible way to read a source file is to start at the bottom and work backwards. That is, you should generally start by looking at the main() function. Try to get some understanding of it in isolation first. Then start looking at the functions which main() calls to get some extra detail on what is going on. Then, if necessary (it does not arise in this particular program) you can have a look at the functions which are called by that function in turn, and so on, effectively working your way back toward the top of the source file. Play with TRI-GLBL.C tri-glbl.c . In particular, experiment with single stepping through it. Try to predict where execution is going to go next at each stage, and then check whether you are right. Use the Watch window to monitor the variables. Again, try to predict when changes are going to happen, and see if you get it right. Notice how execution transfers to a sub-function, then returns to the calling site. Notice how the global variables are allowing information to be accessed or shared between different functions. Try varying the program somewhat: e.g. to calculate the area of a rectangle, or square; or to repeat the calculation for a number of times; or to initially print out some kind of "welcome" message before doing any calculations; or extend the program to be able to calculate areas of several different shapes - so that it first has to ask for the kind of shape to be identified (e.g. saying "Enter 1 for triangle, 2 for square:" or something like that), and then read the appropriate data for the particular shape, and do the appropriate calculation. Note that the function for printing out the result will not have to vary or be modified... Note that in TRI none of the (sub-)functions actually have a return statement as such. A feature of functions is that if execution reaches the end of the function definition - i.e. the brace which closes the definition - then the function will be terminated anyway and control will automatically return to the calling site, just as if a return statement had been executed. Thus, if you have a simple return statement as the very last statement in a function definition it can actually be omitted - and usually is. By a "simple" return statement I mean one without an operand - unlike the return statement in the main() function which has an operand (namely 0 ). The significance of this operand will be explained shortly: for now the point to note is that, if a return statement does have an operand, then, even if it is the last statement of a function definition, this return statement cannot be omitted. Thinking Local... OK, so much for using global variable for co-ordinating or sharing data access between functions in a function. What is the alternative? The alternative mechanism arises if we choose to make the data - the variables - local to individual functions. Local variables are declared immediately after the opening brace of a function definition. Local variables are visible and accessible only within the function in which they are defined. Indeed, the storage for the variables is only allocated at all when the function is invoked and is deallocated again when the function terminates - so the variables only actually "exist" as long as the function is active. Now it follows directly from the "privacy" of local variables that they cannot be used to share data between functions. Therefore, if our functions are using local variables, and if we need to share this data, we need some mechanism for "passing" the data from one function to another. This generally arises just precisely when one function is calling another (call the latter the sub-function). Actually there are then two sub-issues: getting data from the function to the sub-function, and getting data (results) back to the calling site from the sub-function. provides separate mechanisms, with rather different characteristics to deal with these two directions of data flow - from calling site to called function, and from called function back to calling site. Note that in any particular case of one function calling another we may need to transfer data in only one of the two directions, so they really are quite separate. Indeed, as we have already seen with the use of global variables above, we can have functional decomposition of a program without using either of these data transfer mechanisms. In designing a program - in breaking it up into functions - each case has to be assessed individually. A (Short) Digression We will look at the separate data transfer mechanisms in turn, below; but first: a digression. A common confusion among students is between the notion of moving data around inside a program - between the functions making up a program - and moving data between the program and the outside world (screen, keyboard, diskette files etc.). Granted these two things are both involved with moving data around. And granted, it turns out that to do the latter (move data into or out of the program) we also have to do the former (move data around within the program - specifically between our own user-defined functions and those standard library functions, such as printf() and scanf() , which actual do the direct exchange of data with the outside world). But still, the two concepts are quite different and should be kept distinct. We very frequently do the former without the latter (i.e. move data around within the program without exchanging data with the outside world). I will use (and reserve) the terms "input" and "output", and also "reading" and "writing", with the technical connotation of moving data between the program and the outside world. When dealing with data movement within the program I will use an alternative set of terms, to be introduced below. I encourage you to adopt this more precise vocabulary also. Getting data into a function All right: let's consider moving data from a function to a sub-function - from a calling site to a called function. We have already seen examples of this in using the standard library functions: in the function call we simply list the data to be passed in, separated by commas if necessary, and within brackets, thus: printf("Hi there..."); Here the function printf() is being called, and the string "Hi there..." is being passed in. Similarly, with: x = sin(y); In this case the function sin() is being called, and the value of the variable y is being passed in to it. Again, with: printf("The answer is The function printf() is being called, but now two separate data items are being passed in - the string "The answer is i" , and the value of the variable result . These two are separated by a comma. These values passed in to a function are called arguments to the function. They are, if you like, the values which the function is supposed to operate on or process in some way. And, as already noted, a function may not need any arguments - if, for that particular function, what it is supposed to do does not need or rely on any data values to be processed. In general, function arguments are actually expressions . That is, any argument can involve a more or less complicated expression. In that case, the expression is evaluated , to yield a resulting value, and that value is what is actually passed into the function as the argument value. Here is an example: printf(" In this case the second argument is actually the expression ((x * x) + (y * y))/(z + 3.141) . When execution reaches this invocation of printf() , the expression is evaluated first, and the resulting value is what actually gets passed in to printf() . Note that printf() itself neither knows nor cares how this value was generated - all it sees is the resulting value. This is what is described as "passing by value"; it is discussed by Alcock on page 21, and is specifically described by him as "fundamental to the language". Be clear about this particular point: since only allows values to be passed as arguments, nothing that a called function does with an argument given to it can affect or change the original source of the argument. Consider an example. Suppose there is a function called foobar() which expects to be given an argument of type int . Then a fragment of code where foobar() is called might look like this: int main(void) int i, j, k; . . . foobar((k + i) * 15); foobar(j); . . . In both calls to foobar() the argument is technically an expression and is evaluated before foobar() is actually started up. This is clear enough with the argument (k + i) * 15 , but it is true also even in the trivial case of the argument being simply j . In the latter case what happens is not that the variable j gets passed to foobar() , but only that the current value of j is passed in. The significance is that foobar() absolutely cannot alter or modify the variable j ; in fact, foobar() doesn't even know of the existence of such a variable! All foobar() knows is the value it is given - it does not know whether that value resulted simply from taking the value of a single variable, or from evaluating some arbitrarily more complicated expression. Some languages do support a mechanism whereby whole variables can , in effect, be passed as arguments. In that case, the called function can be given access to a variable actually belonging to the calling site (not just a copy of its current value); and the called function can, if it wishes, change the value of that variable, and that change will be visible back at the calling site when control returns there. Such a facility is a called "passing by reference" - because, in effect, what is passed is a reference or handle whereby the called function can get at the original variable, rather than just a copy of its value. BUT: the language does not support passing by reference - it only supports passing by value. A technical exception here is in the case of array arguments. has some special rules for what happens when an array name is used in an expression (without any index) - such as when an array is specified as an argument. The nett effect is that arrays "sort of" get passed by reference: but the details of this are beyond the scope of the current discussion. Fortunately, it turns out that this is not a significant limitation because the effect of passing by reference can be "simulated" using passing by value, in combination with certain other features of the language. However (as should be clear) this issue only becomes important in the context of trying to get data back from a called function to the calling site - so it will be considered again below. As noted with printf() above, a function can accept a number of arguments. In general, functions are designed so that they expect some definite, unique, number of arguments, be it 0, 1, 10, or whatever. In that case the function must be called with just that number of arguments - no more, no less. Both printf() and scanf() are exceptions to this rule - they can deal with varying numbers of arguments. But this is highly exceptional: most of the library functions we deal with will only accept fixed numbers of arguments; and all of the user defined functions covered in this course will only accept fixed numbers of arguments. Function calls are actually even more restrictive than this in general. Not only must the call provide precisely the correct number of arguments, but they must be of the correct data types . Remember: the data type defines the "kind" of information represented by a data value or a variable - such as int , double etc. So if a function has been designed to accept two arguments, the first of type int , the second of type float , then every call or invocation of that function must include precisely two arguments - no more, no less - and they must be of precisely those types. Again, printf() and scanf() are exceptions to this general rule; but also again, it is beyond the scope of this course to consider this very exceptional behaviour in any further detail! The compiler will usually try to enforce these rules - i.e. it will try to check every invocation of a function to see that the arguments match the design or specification of the function in both number and type(s). But the compiler can obviously only do this correctly if it knows how many arguments, of what types, are expected. In the case of user-defined functions it will know these things if it has processed the function definition before it sees any invocation of it. This condition will be satisfied if the program file is laid out as suggested earlier - in "reverse" order - which, of course, was precisely the reason for suggesting that ordering. The situation with library functions is a little more complicated. The whole idea of library functions is that they have been pre-written and compiled ahead of time, so, by definition, their definitions will not be available to the compiler. But although the full definitions of the library functions are not available, short "interface specifications" are provided. These interface specifications tell the compiler, for each function, exactly what arguments are expected, and of what types. Such a specification, for one function, is called a function prototype . Function prototypes for the standard library functions are provided in the standard header files - files like stdio.h and so on. So: if you want to use or invoke any standard library function in your program, you must ensure that an appropriate header file is first scanned by the compiler, so that the compiler will then be able to check that the invocation is legal. This is done with the include directive. If you are ever in doubt as to which header file to include to provide the prototype for a particular library function, look up the function in the Turbo-C++ on-line help system - the header file will always be listed there. In some cases, the prototype for a particular function may be included in several different header files; in such a case you just have to insure that at least one of them is include 'd in your file. Having said all that, be warned: the ability of the compiler to detect mismatches between the arguments expected by a function and the arguments actually provided , is still less than perfect. You should still take some care yourself in coding any function call, and not rely completely on the compiler to check these things for you! So the format of a function call or function invocation is always function followed by a left bracket - ( - followed by zero, one, or more argument expressions, separated by commas, followed by a right bracket - ) . The arguments must match what is expected by the function in both number and types. The compiler will try to check for this if - and only if - it has earlier processed either a full definition of the function, or at least a prototype, typically found in a include 'd header file. So much for the invocation of functions with arguments - but how does this get represented in the function definition ? In all the function definitions given so far (including definitions of main() ), the functions did not accept any arguments. This was denoted by the keyword void , placed between the brackets after the function name: void foo(void) /* ^^^^ This denotes that foo() will not accept *any* arguments! */ /* Statements making up foo() go here... */ . . . Note carefully that, in this example, the keyword void appears before the function name also - but in that position it has quite a different significance, nothing to do with the arguments which foo() accepts. We'll consider this again later. OK: that was the special case of a function which does not accept any arguments. Where a function does accept arguments, this is denoted or coded by replacing the keyword void with a list of names and types for the arguments: these are technically called the parameters rather than the arguments. The difference is this: the arguments are the values which get generated at the calling site; the parameters are effectively a special kind of variable, belonging to the called function, into which the argument values are copied when the function is invoked. If you like, all the calling site can see are the arguments (or argument expressions) - it cannot see the parameters as such; and all the called function can see are the parameters (and their values) - it cannot see the original expressions, or arguments, from which they were derived. In this sense, arguments and parameters are like two views of the same thing: but the calling site only has the argument view, and the called function only has the parameter view. The format for a parameter list, in a function definition is something like this: void foobar(int max, int min, double ecstasy) This heading part of the function definition says that foobar() will accept exactly three arguments. foobar() will refer to these as max , min and ecstasy . The first two must be of type int , the third of type double . Whenever foobar() is invoked, the arguments at the calling site will be evaluated; parameters called max , min and ecstasy will be created; and the argument values will be copied into them; then execution will proceed with the statements in the body of foobar . The parameters are then effectively a special kind of local variable belonging to foobar . Within foobar() they can be used just like any normal variable. In particular, their values can be referenced, as one would normally reference the value of a variable - by using its name in an expression. Thus, a fragment of foobar() might be the following: void foobar(int max, int min, double ecstasy) . . . if (ecstasy < 0.0) printf(" dear..."); else if (ecstasy > 100.0) printf(" don't believe it!"); . . . Since parameters are a form of local variable you can, of course, change their values: but by doing so you overwrite the original value - the argument - passed in from the calling site. So this should only be done with care! Note also that, when the function terminates, the parameters, just like all other local variables, are disposed of, and their values lost. Changes to a parameter within a called function, just like changes to a local variable, have absolutely no effect on anything at the calling site. Getting data back out of a function All right: that's passing data into a function, both from the point of view of the calling site (argument expressions) and from the point of view of defining the called function (parameter variables). Now we turn to the complementary question of getting information back from the called function to the calling site. The simplest, and arguably most convenient, mechanism provided for this in is the use of so-called "return" values. Essentially, a function can terminate using a form of the return statement which specifies a value (it actually has a complete expression in general: but this will be evaluated to yield some particular, single, value, of some particular type). If a function terminates in this way, the specified value is retained and made available back at the calling site. In effect, wherever the function name appeared at the calling site, its name gets replaced by the returned value, and execution then continues at the calling site. Consider this simple example, using the library function sin() : x = sin(0.5); In this case the function sin() is invoked (incidentally being passed one argument, the double value 0.5 ). sin() is executed, and when it terminates, it does so using a return statement. That return statement specifies a return value. Back at the calling site, the name of the function is effectively replaced by this value, and execution continues, so that this value is then assigned to the variable called x . When I say that the name of the function is "replaced" by the returned value, I do not, of course, mean literally that the text of your program is altered. Rather I mean that the way to understand what happens next at execution time is to imagine the return value appearing in the place occupied by the function name. That was a simple example. Here is a somewhat more complicated one: printf("Answer: When this is executed, what happens? Well, printf() cannot be started before its arguments have been evaluated. To evaluate the second argument, sqrt() (a function for extracting square roots) must be invoked - but again, that cannot happen until its argument has been evaluated. So the calls to the sin() and cos() functions have to happen first. It does not matter which is done first - lets say it is sin() . Its argument is evaluated - simple, it is just the value of the variable x . This is passed in and sin() gets executed. In due course it terminates with a return statement, specifying a return value. This is retained in temporary storage at the calling site. The argument for cos() is then evaluated - again simple, just the value of y . When cos() terminates, again with a return statement, its return value comes back to the calling site and gets added to the value returned earlier from sin() . This is now the argument value for sqrt() so sqrt() is started up, with the value passed in to it. Again, in due course, sqrt() terminates with a return statement, and specifies a return value. This value is multiplied by 2 , and the resulting value is the second argument for printf() , so printf() can now finally be invoked. It will do its thing (printing out the text Answer: followed by the value of its second argument, regarded - correctly - as a double ), and will then terminate. printf() actually also ends with a return statement which specifies a return value, and this will indeed be passed back to the calling site (this return value signals whether printf() executed without detecting any difficulties). In this particular case, the return value is not used at the calling site - it is not, for example, stored in a variable, or used in some further expression evaluation. So it is simply discarded, and execution proceeds to the next statement. Phew! That covers how return values are used at the calling site, and we have also given the gist of how the return value is implemented in the function definition - namely by using a return statement with an appropriate operand expression. But let's just detail this with another example: double silly_sum(double apples, double oranges) return (apples + oranges); Note that the void that we were previously placing in front of the function name, in the heading, has now been replaced by double . This is a general rule. The word that comes before the function name in the heading specifies the type of return value (if any) which the function will yield. If the function does not yield any value, then the return type must be specified here as void , as indeed, we have generally been doing up to now. But if the function does return a value, then the type of that value must be advertised or notified ahead of time, by specifying it here in the heading. So, in our case, just from the heading, we know that the function silly () must terminate using a return statement, with an operand which is of type double . If there is a subsequent inconsistency in the definition of silly () - if it does not contain any return statement, or it has a return statement with no operand, or it has a return statement with an operand which is not of type double - then the compiler will complain bitterly. As an aside, note that we have consistently been defining the function main() as if it returns a value of type int , and then actually ending it with a return statement specifying a return value of 0 : int main(void) . . . return 0; This is, to say the least, a trifle odd. main() is not being called by any other function - it is automatically started as the very start of program execution. We already mentioned that, for that very reason, it would not make sense for the parameter list of main() to be anything other than void . Yet, notwithstanding that, we are defining main() as if it makes sense for it to be passing a return value back to some calling site. Why is this? Well, this is yet another of those funny historical oddities of The essential idea is that the return value from main() is passed back to the enclosing "environment" - wherever the program was started up from. In our case, this is the Turbo-C++ IDE. The general idea is that this return value could be used to signal whether program execution was "successful" or not - with 0 signalling success, and anything else indicating some kind of failure. However, this idea of signalling success or failure to the enclosing environment doesn't actually serve any useful purpose in the case of running programs under the Turbo-C++ IDE. Using it under other environments (where it might be useful) is beyond the scope of this course. In fact, for our purposes, it would be legal, and preferable, to define main() as not returning any value at all, and omit any return statement from it: void main(void) . . . Unfortunately Alcock has insisted in showing main() as yielding a return value of type int , so, in the interests of consistency with his presentation, I have stuck with that... OK: time for another substantial example. Let's revisit the triangle area calculating program, but now avoiding the use of global variables, and moving data around with arguments and return statements instead. Have a look at TRI-LCL.C tri-lcl.c . Compare it with TRI-GLBL.C tri-glbl.c . Again experiment with single stepping through it. Try to predict where execution is going to go next at each stage, and then check whether you are right. Use the Watch window to monitor the variables (including the various function parameters). Note that because the variables are now all local , they only exist at all when the relevant function is active. Furthermore, it is a characteristic of the Watch window that only variables of the function which currently contains the execution highlight bar are visible. Other functions may be "active" - main() is active all the time, for example - but be temporarily suspended while a sub-function is executing. Their variables still exist, and will become visible again in the Watch window as execution returns to them... Again, try to predict when changes are going to happen, and see if you get it right. Notice how the parameters and return values are allowing information to be moved around between the different functions. Again, try varying the program somewhat: e.g. to calculate the area of a rectangle, or square etc. Now notice one particular difference between TRI-GLBL.C tri-glbl.c and TRI-LCL.C tri-lcl.c . In TRI-GLBL.C tri-glbl.c I used a single function ( read () ) to prompt for and read in both the dimensions of the triangle; whereas, in TRI-LCL.C tri-lcl.c I have broken this down into two separate functions, read () and read () . Stop for a moment, and see if you can figure out why I made this change... Well, if I had retained the original structure of TRI-GLBL.C tri-glbl.c , but without using global variables, then the function read () would have to somehow return two distinct pieces of information. But that cannot be done with the return mechanism in OK: this is a little white lie. It can be done, but only using the mechanisms of struct data types. This is beyond the scope of the current discussion, but have a browse in Chapter 8 of Illustrating C if you want to explore this further... The return statement can only have one operand, and thus only a single value can be passed back to the calling site. Indeed, if you think about the way the returned value is handled at the calling site (being inserted wherever the function name appeared) it should be clear why this must be so: if more than one value was returned then there would be no unambiguous way of deciding what to do at the calling site. So: I elected to re-organise the program to remove the need for a function which would return more than one value - by replacing it with a pair of functions, each of which individually only return one value. This works fine. But it also gives me an excuse for mentioning an alternative way for getting information back from a called function to the calling site - a way which does not suffer the limitation of the return mechanism, which can only handle a single value. First recall the earlier discussion of "passing by value" and "passing by reference". Clearly, if we had a mechanism for passing by reference we could use this to get as many distinct items of data as we like back from a function: we just pass ("by reference") that number of variables into the function; the function then stashes the relevant results in these variables; and when the function terminates, and control returns to the calling site, those results can be accessed - as many of them as you like. But, of course, does not support passing by reference. So what can we do? The answer is that we can achieve the effect of passing by reference, in a somewhat roundabout way: we can pass, as values, and using the normal passing by value mechanism, the addresses of some variables. Using these addresses, the function can then, indirectly, get at the variables, and deposit results in them - exactly as if the variables had been passed by reference. In fact, we have already been implicitly using this mechanism, though without drawing too much attention to it. It is the way we have been getting information back from the standard library function scanf() : scanf(" In this case the additional arguments to scanf() are addresses or pointers to variables. We generate the address of a variable by preceding its name with the "address-of" operator, denoted with the ampersand character, & . Of course, this mechanism is used by scanf() precisely because scanf() is supposed to be able to get more than one item of information back to the calling site. In fact, a single call to scanf() can be used to pass an indefinitely large number of separate items back, just according to how the format specification string is laid out. Can you use this mechanism in your own user-defined functions? Yes - but it is quite messy. To define a function of your own which uses this mechanism you need to know two new things: how to specify that the type of a parameter is an address or pointer; and given an address or pointer, how to "de-reference" it - access the thing pointed at. The program TRI-LCL1.C tri-lcl1.c gives an example, if you want to have a look - but we will not consider this in detail here. Preferences? We have seen two radically different styles of getting information into and out of functions: the use of global variables and the use of parameters and return values. Which is best? Unfortunately there is no simple answer. Both kinds of mechanism are provided in precisely because, in different circumstances they each have their own advantages. For people just starting out to learn to program the global variable mechanism is conceptually easier to deal with, and actually syntactically easier to use in programs. Therefore I recommend that you start out by using this mechanism, unless you have some specific reason for doing otherwise. However, quite quickly you will probably discover some of the drawbacks of using global variables. At that point it is worth getting familiar with both parameter and return value mechanisms and trying them out. Once you are technically happy that you can use both kinds of style then, with experience, you will get a feel for the particular circumstances in which each is appropriate. But: until you are comfortable with judging this I recommend that you actually try to avoid or minimise your use of global variables. You will find that this requires you to think rather harder about how to design your programs; and will also make the design and coding somewhat more cumbersome at times. However, it is a fact that the use of global variables makes programs prone to a variety of subtle problems and bugs - due to non-obvious and un-intended interactions between functions, via changes in global variables. Such bugs are particularly difficult to isolate and correct. Therefore you will find that avoiding the use of global variables - except where they have some overwhelming advantage to recommend them - will save you a lot of program development time in the long run. Recursion Alcock chooses to introduce recursion toward the end of Chapter 2. I'm not sure I agree with the need for this. Recursion - functions which invoke themselves - is an interesting programming concept, and it does permit very elegant solutions to certain programming problems. On the other hand, there is never a situation where the same effect as recursion cannot be achieved using conventional iteration mechanisms, such as for statements. In any case, all I will add to Alcock's discussion are the following brief comments: When any function is invoked, storage is allocated for its local variables (including parameters, if any). This storage will be deallocated when the function terminates. Recursive functions are a special kind of function. This may seem trite, but it follows that you need to understand the general concept of a function reasonably well before you try to understand the idea of a recursive function... If a function is invoked again while it is still already active - i.e. if a recursive invocation is made - then additional storage is allocated to hold a new set of local variables for this new invocation. Each invocation thus has an entirely separate set of variables and parameters - even though, within each invocation, these are referred to by the same names. There is no confusion because the computer keeps careful track of which invocation is being executed, and therefore which copies of variables or parameters to refer to. At first sight it seems that recursive function invocation must be fatal: surely the computer will be caught in an infinite regression, invoking ever more copies of the same function, until something horrible happens (like it runs out of storage to hold any more variables) and the whole thing crashes? This certainly can happen; but recursion can also be made benign. It is made benign by ensuring that, at each nested invocation, something is changing (typically a parameter value); and at some, finite, point, this will result in a copy of the function which does not make a further, nested, recursive call to itself, but instead terminates. That then allows its caller to terminate, and its caller in turn, so that the whole nest of recursive calls can finally unwind, and return control to wherever the very first instance of the function was called from. But recursion will only be benign and controlled in this way if you, the software engineer, make it so. Recursion is intrinsically dangerous stuff - use it with extreme caution! Term 2: Weeks 11 20 Week 11 12: Chapter 3 pp. 27 38 Week 13 14: Chapter 3 pp. 39 49 Laboratory General Comments on Lab Work The CAE laboratory itself should be used both for the scheduled laboratory exercises, and also for "private study": the primary way of studying software engineering is to do it! The laboratory exercises are crucial to this course: you should learn more from the time you spend in the laboratory than from lectures and tutorials combined. Each student attends one three-hour scheduled lab session every two weeks. You should aim to spend at least three further hours of private study time in the lab over the same period. The scheduled lab sessions are deliberately structured so that each student has a separate PC; however, for private study you may find it easier to get at a machine, and also more effective, to work together with one or two others - provided that you spend time discussing what is happening, and taking turns at actually operating the PC. For the first lab session you are required to bring along a new, unused, copybook or notebook to be used as you lab logbook. However, in contrast to most other laboratory subject, this logbook will not be used to hold formal lab reports. Instead, you will maintain the logbook just with your own personal notes; but these personal notes will serve as the basis for writing separate formal reports, which will actually be submitted electronically for marking. This is a new initiative in the academic year 1994/95, and will be explained in more detail during the first lab session. You should consult the Laboratory Report Guidelines http://www.eeng.dcu.ie/ 7Emcmullin/swe1/rptfmt/rptfmt.html for advice on how to prepare lab reports, and the criteria which will be used in marking them. Term 1: Weeks 1 10 Session 1: Week 1/2 This first session is primarily concerned with introducing the hardware and software tools which will be used in the remainder of the course. Given that you are reading these notes, you have successfully completed the first phase of the exercise! Stop at this point and read back over what you have written in your logbook. Make sure you have a reasonably complete record of what you have done so far (so that, if necessary, you could redo it without further guidance). Your next task is the following: Start up the File Manager program. Browse through its on-line help to get a general idea of what the program is and how it can be used. Using File Manager , create a new directory at the topmost level on disk drive D: with any name you wish (for example, your own forename or surname etc.). If you manage this successfully, you should be able to see the new directory listed by File Manager - for example as D: MYDIR or whatever. Try to establish what, if any restrictions apply to naming a directory - either by experiment or by reading the help. Use the Notepad program to create a text file containing a short description of your experiences during Orientation week - what you enjoyed, disliked etc. Save this with the filename ORIENT.TXT in the directory you have just created. Ensure that it is no more than 2,000 bytes long. If necessary, go back and make it shorter. Save a copy of this file on both your diskettes, in a top level directory called LAB . Delete the original file and its directory from drive D: . You will have to "format" the diskettes first. Disk formatting and file copying and deleting operations are all carried out with the File Manager : again, consult the online help for details! Using the Terminal facility of the Wnqvtwsk program, consult the DCU library On-line Public Access Catalog (OPAC) to find any books written by Donald Alcock . Login as "opac". Make a note of any you locate. Still using the Terminal facility of Wnqvtwsk change your password on the computer ugmail.eeng.dcu.ie . Use the Mail facility of Wnqvtwsk to exchange at least 2 e-mail messages with other student(s) in the class. Arrange for copies to be sent to your own mailbox also. Save these messages. Compose, using Notepad , a brief formal report on the lab session. Include copies of the file ORIENT.TXT , and the e-mail messages you exchanged. The complete report must be no more that 5,000 bytes long. Mail it to your demonstrator. This will be DemoA , DemoB , DemoC or DemoD , where the last letter is the letter of your section in the class. Arrange to save a copy of the mailed message (including the time and date stamp) on both your diskettes in the directory LAB . Note that you can specify multiple recipients for a mail message by separating them by commas, and you can send mail to yourself, though it will not be visible until you exit Mail and re-enter it again. Shut down any programs you still have running - and go for a well deserved lunch! Session 2: Week 3/4 lab2 Carry out Exercise 1 of Chapter 1 of Illustrating C . Exercise 1 Implement the loans program. This is an exercise in using the tools of your particular C environment. It can take a surprisingly long time to master a new editor and get to grips with the commands of an unfamiliar interface. If all else fails, try reading the manual. Preparation In advance of the lab, prepare a file, called WOTCOST.C , containing the program as listed on page 3 of the textbook. If you did not do this in advance - shame on you. Do it now! Now, using File Manager , create a directory called SWE1 in the root directory of drive D: . If this directory already exists you can leave it there, but delete all files and/or subdirectories which may be in it. Copy the file WOTCOST.C into this directory. The Turbo-C++ IDE tc_ide You will be using the Turbo C++ "Integrated Development Environment", or IDE, to create, compile, test, and debug (i.e. correct) your C programs. The IDE application is started up using the Turbo C++ icon in Program Manager . Note that the IDE was developed before Microsoft Windows became standardised; while the IDE uses menus, windows, dialog boxes etc. in much the same way as Microsoft Windows, the details vary significantly. Experiment, and use the on-line help as necessary, to get a better feel for how things behave within the IDE. The IDE provides various facilities for developing programs. It allows editing of program files. This is an alternative to Notepad . You will probably still find it convenient to use Notepad for creating and editing any text files other than program files; but it will be handier to use the editor built into the IDE for program files. Next the IDE allows programs to be compiled . Compiling means translating the relatively high level instructions of the program into sequences of the very much simpler and basic instructions that the microprocessor in the PC can actually execute or carry out. During compilation, certain kinds of "warning" and "error" messages may be issued. Error messages mean there is something so badly wrong with the program that compilation cannot even be completed. Warning messages indicate that the compilation could be completed, but the compiler is suspicious that the program still does not make "sense". The IDE also supports linking . Linking means combining the instructions generated from your particular program with additional "standard" sets of instructions, for doing "standard" things, like arithmetic, or displaying information on the screen, or reading information from disk files etc. These additional instructions are held in "libraries". Your program cannot actually be executed until it has been linked with any required libraries. Linking can also yield further warning and/or error messages. Note that, even if your program compiles and links without any warning or error messages, this does not at all mean that all errors have been eliminated. It simply means that you have a program which is complete or valid enough to be actually executed: but it may not do anything like what you intended. Indeed, it is virtually unknown for a program of any significance to work properly first time. So: the IDE also provides various facilities for testing and debugging the program. The program can be simply executed; but it can also be executed one step at a time; or the values of variables can be examined or modified, and so on. Hints This section contains hints on using the Turbo-C++ IDE. It would be preferable if you could refrain from reading it!!! Try experimenting first. If the results of this seem totally incomprehensible, try browsing through the on-line help. Come back to look here only in the last resort (short of asking the demonstrator). IDE Edit Windows Before you can do anything else (such as compile ) a program with the IDE you must load the file into an IDE edit window. These are very like Notepad : they provide basic text editing facilities. To open an edit window, and load WOTCOST.C into it, drop down the File menu and select the Open menu item. This will bring up a file selection dialog box, quite similar to (though also slightly different from) a conventional Windows file selection dialog. You should be able to use this to select WOTCOST.C . An edit window will be opened. You should be able to see the text of WOTCOST.C . If you cannot do this, then ask for help from the demonstrator. Once you have an IDE edit window successfully opened, experiment with editing the text, saving it, closing the edit window, and reopening it. It is possible to have several edit windows open simultaneously. Each normally accesses a separate file. I strongly discourage the use of multiple edit windows simultaneously accessing the same file: this usually ends in tears! In fact, at this stage in your programming education, you would do well to restrict yourself to a maximum of 3 simultaneous edit windows (even accessing different files), at any given time. The name of the file being accessed or edited is shown in the title bar of each edit window. Compiling Once the program text is loaded in an edit window you can compile it: that is, translate the "high level" statements of the language into sequences of the much simpler, "low level" instructions which the microprocessor in the PC can directly carry out. This so-called object code will normally be stored in a file with the same name as the source file, but the extension .OBJ . In our case then, the object file will be called WOTCOST.OBJ . Note that you will not need to copy object file(s) onto your diskettes for backup. They can always be recreated simply by recompiling again. To compile you must first make the edit window (containing the program text) active . The active window in the IDE is identified by having a double line border. Initially this probably will not be a problem: the edit window will be the only window you have opened in the IDE. But later on you will have several windows open in the IDE; then it is important to remember that compiling will only work with an edit window selected. To compile, drop down the Compile menu, and select the Compile to OBJ menu item. The compilation will then be attempted. It is quite likely that the compiler will issue several warning and/or error messages. These will be displayed in a separate Message window. The Message Window When the Message window is active, you can scroll up and down through it highlighting one message at a time. As each message is highlighted, a highlight bar will also be moved about in the corresponding edit window. The highlight in the edit window gives a rough indication of where the compiler "thinks" the problem is located. Be wary of this: the compiler (like most computer programs) is not at all smart. The most you can rely on is that the problem, whatever it is, is either in the line highlighted by the compiler, or somewhere before it . Be wary also of the actual text of the message. The compiler will do its best to diagnose the problem. It will try to give a meaningful description of it. But, it will often fail miserably at this! So treat the actual text of the message with some scepticism. A message always indicates that something is amiss with your program. But the actual problem will often be in a different place, and of a different kind, to that suggested by the compiler. If you press the Enter key while the Message window is active, the IDE will automatically make the (relevant) edit window active, and position the cursor as near as possible to where it "thinks" the problem is. The text of the message will also be temporarily redisplayed in the bottom of the edit window. This text will disappear as soon as you move the cursor around in the edit window. You can make the Message window active again by clicking anywhere over it. "Error" messages are more serious than "warning" messages. But at this stage in your education, you should consider all messages issued by the compiler as indicating problems . Do not go past the compile stage until you have managed to get a completely "clean" compilation - with absolutely no error or warning messages. If you find that you simply cannot make any sense of some compiler message (after making a sincere effort!) then call the demonstrator for help. Linking Once you have got a clean compilation, you can go on to the linkage phase. This is where, if necessary, certain standard "library" procedures or routines are added to the code produced by compiling your program. These standard routines deal with things like reading data in from from the keyboard, or writing it out to a disk file; or calculating mathematical functions such as sin() and log() etc. There are a very large number of such standard routines, or "building blocks", available. This saves individual software engineers from having to develop these standard procedures themselves from scratch. You will not be able to actually run or execute your program unless it can be successfully linked. The output of the linkage phase is called the executable file. This will normally be stored in a file with the same name as the source file, but the extension .EXE . In our case then, the executable file will be called WOTCOST.EXE . As with object file(s) you will not need to copy executable file(s) onto your diskettes for backup. They can always be recreated whenever necessary.. To carry out the linkage, first ensure that the (relevant) edit window is active. Then drop down the Compile menu. Select the Link EXE file menu item. As with compilation, problems may be detected during the linkage phase, and messages may be displayed in the edit window. In this case, however, the IDE will not be able to even guess where the problem is located in the original source file. This is because, at linkage time, the IDE is really only dealing with the .OBJ file: it is no longer dealing with the source file. It cannot relate problems now arising with the .OBJ file back to some particular location in the source file: the relationship between these two files is too complex to allow this. But: you can rest assured that, if the linkage phase results in messages in the Message window, there really is something wrong with your source program. Do not go past the linkage stage until you have managed to get a completely "clean" link - with absolutely no error or warning messages. If you find that you simply cannot make any sense of some linkage message (after making a sincere effort!) then call the demonstrator for help. Running the Program Once you have got a clean compilation, followed by a clean linkage, you can finally try "running" or "executing" your program. Remember: just because you have got to this stage does not guarantee that the program is "correct". It simply means that the remaining problems are of such a kind that the program is still "legal" - it is still executable - but it just doesn't do what you intended. There almost certainly will be remaining problems of this kind! You can run the program by dropping down the Run menu and selecting the Run menu item. The IDE window should be automatically (though temporarily) cleared so that it can be used and controlled by the program. More technically, the window is being switched to display the so-called User Screen . If the program is working correctly, it should display the message prompting for input of the relevant data. Enter suitable test numbers (separated by spaces). Press the Enter key. The program should then print out the results - and terminate! There is a slight problem here. As soon as the program exits or terminates, the IDE will automatically switch away from the User Screen , back to the normal IDE display. This happens very quickly - so that you will not have time to see whether the program has printed any result at all (never mind the correct result). However, you can redisplay the User Screen , even when the program is no longer running. Simply drop down the Window menu. Select the User Screen menu item. The User Screen will then be temporarily redisplayed. This allows you to check the program output. Pressing any key (or clicking the mouse button) will then cause the display to revert to the normal IDE display. A slightly more sophisticated, and convenient, alternative, is to display the Output window (select the Output menu item on the Window menu). This is a normal Turbo-C++ IDE window which can be moved around and resized etc., but which "mirrors" whatever output appears on the User Screen . Thus, it can allow you to still see the output even while the User Screen itself is not being displayed. Test and Debug Test your program severely and thoroughly. If possible, co-operate on this with another person in the class, if that person is at the same stage: temporarily swap machines, and each test the other's program. Most people have a natural disinclination to be severe in testing their own work - but relish the thought of demonstrating the flaws in the work of somebody else. Sad but true - and, in this case, rather useful. If this is not possible, just try to be as severe in your own testing as you can. More likely than not, this testing will reveal continuing problems with the program. Try to understand what is causing these problems, and fix them. If all else fails, ask the demonstrator for help. Do not go past this stage unless the program has behaved correctly under a wide variety of tests. STOP! If you are satisfied that you have got the WOTCOST.C program working correctly and satisfactorily, then you can go on to read the rest of the instructions below; but, if the program is still misbehaving, then stick at it - and ask for help from the demonstrator. Some Questions If you are completely satisfied that you have got the WOTCOST.C program working correctly and satisfactorily, then you can go on to consider the following questions. Are there any restrictions on the numbers that this program can deal with? Try to find out using very big and very small numbers - and anything else you can think of. What happens if you feed in negative numbers to this program? What should happen (in your opinion)? What happens if you feed in too many numbers? Too few? What should happen? What happens if you feed in non-numbers? What should happen? If you change things in the program it will generally cause compiler and/or linker errors. Try to find what is the simplest, most innocuous, change you can make which gives rise to the greatest number of weird and incomprehensible error and warning messages at compile time. Variations If you succeed in getting the basic WOTCOST.C program working, and you have tested it thoroughly, then you can go on to try the following: WOTCOST1.C : Behaves exactly like the given program, except that it first displays an additional message like "Welcome to the WOTCOST program". WOTCOST2.C : Behaves like the given program, except that it expects the interest rate to be entered already as an absolute value such as 0.15, rather than as a percentage. WOTCOST3.C : Behaves exactly like the given program - but the variable names are changed to more meaningful ones (like "principal", "years" etc.). WOTCOST4.C : Behaves exactly like the given program except that it prompts separately for each of the input values to be entered. WOTTIME.C : Similar in concept to the given program; but it is given the principle, interest rate, and monthly repayment ; it has to calculate how long the loan will take to pay off. WOTCOST5.C : Behaves exactly like the given program, except that it rejects any attempt to enter a negative value for any of the inputs. Conclusion That completes the exercise. As previously explained, the formal lab report should be prepared electronically, and submitted (to your demonstrator) by electronic mail. The report should be concise, and not repeat material in these on-line instructions. It should focus on the particular problems you experienced, and how you solved them. It should make clear what you have learned from the exercise. You will be allowed approximately 20 minutes at the end of the lab session for preparing and submitting the formal report. If you cannot complete it in this time, complete it on your own time afterwards. However, you should not spend more that one hour, in total, preparing and submitting the formal lab report. The report must be submitted within one week of the day on which your lab session is scheduled. Session 3: Week 5/6 Carry out Exercise 1 and Exercise 2 of Chapter 2 of Illustrating C . For your convenience these exercises are reproduced in the following sections of the on-line notes. Exercise 1 has also been slightly modified to conform with some limitations of the Turbo-C++ application. Exercise 1: Matrix Multiplication Program MATMUL multiplies matrices of fixed size (3 rows, 4 columns; 4 rows, 2 columns). Make the program deal with any specified sizes up to an arbitrary 50 by 50. The maximum size stipulated in the textbook is 100 by 100; but with the Turbo-C++ application, as currently configured, this is too large for it to handle. Hence the reduction to 50 by 50. Read three sizes: the number of rows of A , the number of columns of A (implying also the number of rows of B ), the number of columns of B . Read A and B by rows, then print C by rows. For this exercise you have to change each simple reading loop to a nested pair of loops. Similarly the printing loop. Exercise 2: Bubble Sort Alter the Hooke's Law anagram program to read and sort a list of numbers (type double ) into numerical order, then display the sorted list. Make the program request the length of list, then ask for the numbers one by one. Preparation You must have completed all parts of the previous lab exercise in advance of this lab session. If you are having difficulty with some parts of it, look for help! But do this before you arrive for this session. Prepare files (called MATMUL.C matmul.c and BUBBLE.C bubble.c respectively) containing the basic programs introduced on pages 17 and 19 of the textbook. Again, this must be done in advance of the lab session. You should have found that, if you click on the hot-linked file names, you can get a display of either of these files, within Mosaic. You can, of course, retype each program from this display into the Turbo-C++ IDE. However, you may also save the files directly from Mosaic, using the Save As menu item on the File menu. Which saves a bit of typing. Note the comment on the listing of MATMUL (page 17 of the text book) that, when using Turbo-C, 1992, an extra "dummy" statement must be added in to correct for a bug in the Turbo-C application itself. The header file for the mathematical functions must also be included - i.e. a include directive must be added at the top of the file. Both of these corrections are already included in the online version of MATMUL.C matmul.c . Instructions As in the previous exercise, ensure that a directory called SWE1 exists on drive D: . Create it if necessary. If it already exists, and if there are any existing files or subdirectories in this directory, delete them. Then copy your files into this directory. Do all this before starting up Turbo-C++ . Each exercise requires you to modify one of the given programs. But first, ensure that the original programs behave as expected. That is, in each case, get the program to compile and link successfully (zero error and warning messages), and then subject it to a range of tests. Make the tests as difficult and severe as you can imagine. You may find it useful to refer back to the information on using the Turbo-C++ IDE tc_ide from the previous lab session. Do not proceed with trying to develop either of the modified programs until both original programs pass such tests. If in doubt, consult the demonstrator for advice. This should take no more than the first 50 minutes of the lab session, i.e. you should be finished this by 9:50. Now you can attempt to modify the programs. In each case, plan the modifications first; then carry them out; then eliminate any compile and/or link problems; and finally, carry out a range of tests, and correct any run-time problems which are uncovered. Schedule at most 55 minutes for each of the two exercises: that is, even if you are not finished with Exercise 1 after 55 minutes (10:45), put it aside and start on Exercise 2; similarly, even if you are not finished with Exercise 2 after a further 55 minutes (11:40), put it aside and proceed to preparing your formal report. You are allowed approximately 20 minutes at the end of the lab session for preparing and submitting the formal report. If you cannot complete it in this time, complete it on your own time afterwards. However, you should not spend more that one hour, in total, preparing and submitting the formal lab report. The report must be submitted within one week of the day on which your lab session is scheduled. If you do not get the exercises completed during the lab session, make sure that you come back and complete them on your own time, before the next session. You should not include any description of this further work in your formal report. If you get stuck, look for help, either via the e-mail feedback feedback mechanism, or by coming to the tutorial tutorial ! Hints The hints given below provide some further ideas on how to go about completing the exercises. But try to do the exercises under you own steam first. If and when you get stuck, you can come back and try here for further inspiration. General As you modify either of the given programs, keep testing them (and saving them!). Do this as you go along . That is, each time you have made some kind of single, self-contained, modification, save the file, and try it out if possible. For example, if you have added a new statement, or a declaration of a new variable, then this should, as a minimum, not have introduced any new compile or link time problems. So check that this is the case before going any further. In many cases, you will be able to do even better than this. If you have just introduced a new printf() statement, for example, then you should immediately be able to check that it actually works - that it does print out what it should! And so on. In this way the scale of the "current" problems with your program is always kept small and manageable. Whereas, if you make all your modifications before trying any of them, then you are likely to be overwhelmed by the problems, and not know where to even start in trying to solve them. Whenever your program is supposed to read data in it is a good idea to have the program immediately print it back out again, before reading anything further. In this way you can separate problems to do with reading the data from problems to do with processing it. Sometimes when you have a program running under the Turbo-C++ IDE you may decide, for whatever reason, that you want to terminate it immediately rather than going through whatever procedure is required to get it to finish "naturally". For example, if the program needs to read in 50 numbers, but it already clear that the first has been mis-read, you don't want to bother typing in the remaining 49. Or (as can easily happen) you program may get into an "infinite" loop, such that it will never terminate "naturally". In such circumstances you can usually abort the program by typing Ctrl-Break . That is, press down and hold the Ctrl key (located to the left of the space bar), and then press and release the Break key (also labelled Pause : the Break legend is on the front face of the key; it is located at the right end of the very top row of keys). You may have to do this twice before it will take effect. The program should then be terminated, and control returns to the normal Turbo-C++ IDE environment. If this still doesn't kill the program, then you will probably have to use the Reset button on the computer system unit, in order to reset the whole computer. But the reset procedure is quite slow, and, of course, you will lose any modifications to files which have not been saved to disk again. So this is a very last resort! Exercise 1: MATMUL You are asked to modify the program to handle arbitrarily sized matrices (up to some maximum). You might think therefore that the arrays should not be declared until the program has read in the appropriate sizes. However, this is not possible in Arrays must have their sizes fixed at compile time. You can get around this by sizing the arrays big enough to accommodate the maximum you really want to deal with. Then, at run time, when the "actual" sizes are read in, you can arrange for the program to use only as much or as little of the available array "space" as necessary - any left over can be simply ignored. This strategy might run into difficulty if we were concerned, for whatever reason, to only use as much memory as absolutely necessary. Fortunately, this does not apply to the current exercise. The original program has four main parts: first the contents of array A are read in; then array B is read in; then the calculations are done ( C is calculated); and, finally, C is printed out. The new program can reasonably retain this overall structure, so you should try to identify for yourself, exactly how the original program is broken down into these parts. In the reading of array A in the original program, a single scanf() call is used to read one entire row (four values). This is repeated three times to read the entire array. In the new program the width of the array is not fixed; so you cannot use a single scanf() call to read a whole row anymore. Instead you will have to use a single scanf() call to read just one value. This can then be embedded in a (new) for loop which iterates it for the correct number of columns so that one complete row is read. This whole loop can then be controlled by the "original" for loop, iterating over the rows (though with some further, minor, modification). Of course, similar considerations apply also to reading in the B array, and printing the C array. Note that multiple statements can be controlled, or iterated, by a single for statement. The set of statements to be iterated is simply enclosed in "curly brackets" or braces . This is actually a very general point in anywhere a single statement can be legally used, then a whole set of statements, enclosed in braces, can be used instead. The program MATMUL introduces a new operator, written += . This is a combination of addition and assignment. The statement: x += y; is exactly equivalent to: x = (x + y); There are corresponding composite operators for all four basic arithmetic operations (addition, subtraction, multiplication, division). If you ever in doubt as to their effect, just write out the long form instead. Exercise 2: Bubble Sorting You are asked to read in, and sort, a list of numbers which are said to be of type double . double is yet another data type - like int , float and char . double is most similar to type float : it is also used to approximately represent "real" numbers. It differs from type float is that it normally has greater precision (more significant digits) and greater range (can handle larger and smaller numbers - can have a bigger absolute value in the "exponent" part in "scientific" notation). Type double is actually the most common "preferred" type for general purpose calculations. By that I mean that, if you need to represent numbers, your first choice should normally be to use type double ; and you should use something else only where there is some definite reason for doing so. Having said that, type double does introduce a few new problems of its own. Therefore I suggest that you initially ignore the reference to type double . That is, first try to get the program to read and sort a list of numbers of type float (or even of type int ). When you have got this working successfully then go back and try to modify it for type double . Be warned: when using scanf() to read values into a variable of type double , the correct format specification is lf rather than f ; but, paradoxical as it seems, you should still use f as the format specification for printing values of type double with printf() . The length of the list of numbers is not fixed in advance. But just as with MATMUL , you need to fix the size of array for holding the list to some definite value, which will then be the maximum length the program can deal with. The exercise does not explicitly state a required maximum. I suggest that you get it to work for lists at least up to 100 numbers long. The original BUBBLE.C program does not read in the array to be sorted. Instead the list is explicitly initialised in the program. However, in the modified program, the list must be read in at run time. You have already encountered code for doing that general sort of job in program MATMUL.C , so look at that for examples or inspiration. Session 4: Week 7/8 Enter, compile, and test the program WOTRATE.C wotrate.c , shown on pages 22 23 of Illustrating C . Then carry out Exercise 3 of Chapter 2. For your convenience the exercise is reproduced below. Exercise 3: User Written Functions For the math library functions sin(x) and cos(x) , the value of x must be expressed in radians. Write functions Sine(a) and Cosine(a) , for which the argument, a , must be expressed in degrees of arc. All types are double . Preparation Prepare a file (called WOTRATE.C wotrate.c ) containing the program listed on pages 22 23 of the textbook. This must be done in advance of the lab session. You should have found that, if you click on the hot-linked file name, you can get a display of the file, within Mosaic. You can, of course, retype the program from this display into the Turbo-C++ IDE. However, you may also save the file directly from Mosaic, using the Save As menu item on the File menu. Hints In testing the WOTRATE.C wotrate.c program, it is essential that you prepare a number of test cases for which you know "with certainty" what the correct answers are - so that you can decide whether the program is giving these correct answers or not. "Certainty" is hard to come by, of course. But in this particular case, you may be able to use some of the work you have done in previous lab(s) to generate "reliable" test cases. In Exercise 3 you are asked to write two new functions. Implicitly, of course, you are also expected to check whether your functions Sine() and Cosine() work correctly or not. But note that a function does not do anything unless and until it is called. So to properly complete this exercise, there must be a main() function which allows you to exercise the Sine() and Cosine() functions. For Exercise 3, your essential problem is to convert from degrees to radians - since, if you could do that, you could the use the existing library functions sin(x) and cos(x) to complete the required calculations. So: as a first step, try to write a program which just has a main() function which does this conversion from degrees to radians. Then modify this program so as to encapsulate the actual conversion proper in a function of its own. Then add the function Sine() , and finally the function Cosine . At each stage make sure that everything is working correctly: do not try to do too much at any one step. Session 5: Week 9/10: Introducing Recursion lab5 This exercise is divided into 4 parts. For each part your report should have separate sections labelled Plan , Coding and Testing . There should be a single overall Conclusion section. The first two parts represent revision of material already encountered in previous exercises: arrays, user defined functions, the for statement, the printf() function etc. The third and fourth parts introduce the notion of recursion - a function which calls or invokes itself ! Develop (i.e., design, enter, compile, test and debug) a function called print (array, size) . The parameter array is an array of integers (each of type int ) and size is the number of entries in array . The function print () should have the effect of displaying the contents of array . Each element of array should be displayed on a separate line. Use a for statement to achieve the required repetition. Develop a function called reverse (array, size) . The parameter array is an array of integers (each of type int ) and size is the number of entries in array . The function reverse () should have the effect of displaying the contents of array , but in reverse order (i.e. the last element of array should be displayed first etc.). Each element of array should be displayed on a separate line. Use a for statement to achieve the required repetition. Study the definition and discussion of the factorial function, on page 25 of Illustrating C . Enter, compile, and test the program FACTRIAL.C factrial.c . Note that this program cannot work correctly for indefinitely large integers. Attempt to identify the biggest integer for which the function gives the correct answer. Develop a function called summer(n) . The parameter n is an integer (type int ). summer(n) should have the effect of calculating the sum of all the integers up to n , inclusive. It should achieve the required repetition using recursion . Attempt to identify the biggest integer for which the function gives the correct answer. Term 2: Weeks 11 20 Session 6: Week 11/12: More Recursion This exercise is divided into 2 parts. For each part your report should have separate sections labelled Plan , Coding and Testing . There should be a single overall Conclusion section. Enter the file HCF.C hcf.c , shown on page 24 of Illustrating C . Add a definition of a suitable main() function. Then compile, test, and debug the completed program. Modify the function reverse () , developed in step 2 of Laboratory Session 5 lab5 , so that it uses recursion rather than a for statement to achieve the required repetition. Compile, test, and debug the completed program. Hints This exercise is concerned exclusively with the idea of recursive functions. It is essential that, before attempting it, you are sure you understand the concept of a function . So, before coming in for this session ensure that you have thoroughly studied Chapter 2 of Illustrating C , and the accompanying sections of the on-line notes. Before even bothering to test the HCF() function, make sure that you understand what it is trying to do (calculate the "highest common factor" of two numbers) and how it is trying to do it. Please include in your report a short explanation (two or three sentences), in your own words, of why Euclid's method of calculating the highest common factor works. Note carefully the introduction of a new operator in the HCF() function - the remainder operator, denoted in with the symbol . Thus, in dealing with integer arithmetic, we have essentially two different kinds of division operator: the quotient operator (denoted by / ) and the remainder operator denoted by . This distinction does not arise in floating point arithmetic, where only the quotient operator is required (why?). To test the function HCF() you must first provide a main() function in order to form a complete program (remember: every complete C program file must contain the definition of a function called main() ). In this particular case, the main() function needs to do just whatever is required to allow you to decide whether the HCF() function is working correctly or not. Minimally this means that it must call the HCF() function with some particular arguments, and should then print out the result which is return 'ed by HCF() . But, of course, to properly test HCF() you will need to test it with a range of different arguments. Think carefully about the selection of test cases. For example do you expect there to be some biggest numbers for which the HCF() function will work? Smallest numbers? Should it work if either of the arguments is zero? Should it work for negative arguments? Does it matter which argument is given first - the bigger or the smaller? And so on... One option is to simply edit the main() function, and rebuild (compile and link) the program before running each test. This is simple and perfectly satisfactory. But if you have the time, and wish to be a little more adventurous, you might try to write a main() function which would prompt the user to type in test arguments, and then feed them into HCF() : this would allow a whole range of tests to be done simply by re-running the program (i.e. without having to rebuild each time). In the second part of the exercise you are required to develop a variant on a function you were originally supposed to have developed in the previous lab session. It is therefore vital that you actually complete that previous exercise before coming in for this lab session! You are required to use recursion to implement the repetition required in reverse () . The essential idea of recursion is that a function actually calls or invokes (a copy of) itself. The way that can be used in this particular exercise is as follows: suppose that a single call to reverse () just prints the last element of the array it is given (assuming the array is not of zero length), and then calls itself (i.e. another copy of reverse () ) to print the rest of the array (i.e. passing on the same array it was given, except "minus" the last element). And if reverse () is called with a zero array length argument, it should simply return immediately - that allows the recursion to "unwind" with all the copies of the function then exiting in turn, until we finally get back to the main() function. Of course, the idea of passing on the array "minus" the last element can be achieved simply by passing on a array length argument which is one less that the argument which has been passed in - the array "itself" does not actually have to be modified. Session 7: Week 13/14: Lab Test 1 The notes for this test are provided in a separate document entitled Laboratory Test 1 ../test1/test1.html . Session 8: Week 15/16: Encryption "Encryption" is the process of disguising a message or text so that its meaning will not be apparent to anyone who may intercept it, accidentally or otherwise. In this exercise you will work with a very simple encryption procedure. This is a called a substitution cipher, in which each letter of the original message (the "plaintext") is replaced by a different letter to generate the coded message (the "ciphertext"). To simplify matters further, we will restrict our plaintext to use only the upper case letters A to Z , and the space and newline characters; furthermore, space and newline characters will be left unchanged by the encryption. In each case, the replacement letter will be a fixed number of letters later in the alphabet compared to the original letter (and "wrapping around" back to A again after Z as necessary). The number of letters to count forward is called the key . Thus, with a key of 3, we would replace A by D , B by E and so on, with, finally, W being replaced by Z , X by A , Y by B and Z by C . This simple kind of cipher is sometimes called a "Caesar Cipher" after Julius Caesar, who is said to have used it for secure battlefield messages. To implement a Caesar cipher with a program, note that, when characters are read in with the getchar() function, what you receive (in int form) is the ASCII numeric code. The upper case letters A to Z have consecutive numeric codes from 65 to 90. Thus, given a particular key value (a number from 1 to 25) simple addition will encypher a letter as required - except that if the sum is greater than 90 you must "wrap around" to the start of the alphabet again by subtracting 26. You are required to develop a program which will input a stream of characters, using getchar() , and encode them with a Caesar cipher; the ciphertext should be output with putchar() ; the program should terminate when getchar() signals that the end of the input stream has been reached. The key for the encoding should be represented in your program with a symbolic constant. That is, using something like define KEY 5 , or const int key = 5 . See page 34 of Illustrating C . You may find it handy to be able to run your program on an input file (and producing an output file ) rather than working only with the keyboard and screen. There are various ways of doing this, but one relatively simple method is described in the hints hints8 section below. Test your program carefully. Record in your report log book what tests you carry out, what results you expect, and what results actually occur. Now think about the problem of cryptanalysis: that is, suppose you are "the enemy" and have intercepted a transmission encyphered with a Caesar cipher. How would you set about trying to decipher it? You may wish to challenge a colleague in the class to see who can decipher the other's ciphertext message most quickly. Record your conclusions in your report. vigenere Finally, if you have time, try to develop a program to implement the somewhat more sophisticated Vigenere cipher, invented in the 16th century by Blaise de Vigenere. This has the same basic idea as the Caesar cipher, but the key no longer stays the same throughout the message: instead it follows some periodic cycle. A Vigenere key is thus not a single number but a sequence of some fixed length - say 10, 3, 25, which would mean the first letter should be shifted by 10, the second by 3, the third by 25, and then back to 10 again etc. Hints hints8 Perhaps the simplest way of arranging for your program to take its input from a file, and/or route its output to a file, is to use so-called command line redirection . To do this your program must already have been compiled successfully, resulting in the creation of an executable version of it. If your source file were called foobar.c , the executable version would be in a file called foobar.exe . The executable version must be run from the MS-DOS command line interpreter application (and not from within the Turbo-C++ IDE ) in order to use redirection. So you should start up an MS-DOS Window by double clicking on the relevant icon in Program Manager. Within this window you will first need to set the "current" drive and directory to where you have stored the executable program. This will normally be the directory swe1 on drive D: , so issue the following commands: C: d: D: cd D: Now execute your program, with a command something like this: D: foobar out.txt The < serves as the "input redirection" character in an MS-DOS command line. The effect is that the input for foobar will no longer be taken from the keyboard, but from this file instead. Similarly, the > character in an MS-DOS command provides for output redirection, to direct output to a file instead of the screen. With any given command invocation you can use input redirection only, or output redirection only, or both together. Note that if you change your program you must rebuild it it again, from within the IDE, before trying to rerun it in the MS-DOS application. To close down the MS-DOS application, give the command exit thus: D: exit Session 9: Week 17/18: Projectiles The purpose of this exercise is to calculate the theoretical trajectory of a projectile in a constant gravitational field. In other words, we want to calculate the path followed by, say, a ball thrown in the air. We are going to make some simplifications. We will work in just two dimensions (i.e. as if we are looking exactly side on at the path of the ball). We will also neglect air resistance, and will treat the projectile as a point mass. This allows a simple analysis of the motion of the projectile, using newtonian theory. In our two dimensional ( x and y ) coordinate system, x will denote horizontal position, and y will denote vertical position. The initial value of x will be taken as zero. A zero y value will correspond to the projectile being on the ground. The initial value of y will be some positive value (in effect, the height of the person throwing the ball). V_x will denote the horizontal component of velocity, and V_y the vertical component. We will assume that both V_x and V_y are initially positive. Similarly, A_x and A_y will be the components of acceleration. The "initial" information provided for a particular run of the program (by assigning explicit values to suitable variables) will be the initial height ( y ) and initial velocity components ( V_x and V_y ) of the projectile. These values will be assigned to suitable variables. You can either code values directly into the program (in which case you will have to change the program, and rebuild, every time you want to try different values) or you can have the program prompt for and read in values when it starts running. The output from the program will be a print out on the screen of values of x and y at successive, "small", intervals of time. The program should terminate when y becomes zero again (i.e. the projectile falls back to earth). This print out may be captured into a text file (see the previous discussion of redirection hints8 ) and a graph plotted of y versus x (one way of doing this is described in detail in the Hints hints9 section below). Informally, the motion of the projectile is pretty simple: it moves steadily to the right (positive x direction), simultaneously going up for a while (positive y direction), and then coming down again to land. Technically, under the various idealizing assumptions we have made, it can be shown that the trajectory will be parabolic . Formally, at each time step, the new y co-ordinate can be calculated as: y(t+1) = y(t) + y(t) where (approximately): y V_y t In words, if the velocity in the y direction is V_y , then, in a time t , the projectile will move a distance of V_y multipled by t ; and the new y position is the old y position plus this movement. This calculation would be exact if V_y were constant ; if V_y is changing (as it will be in our case) then the calculation will still be approximately correct, as long as as t is "small" - namely small enough that the percentage change in V_y in that time is "negligable". Of course, you can only estimate this once you have some idea of how quickly V_y will be changing... The new x co-ordinate may be calculated similarly. Now, in principle, both V_x and V_y could also be simultaneously changing, and have to be recalculated, at each time step. But since we are neglecting air resistance, and since gravity works straight down, there will be zero acceleration in the x direction ( A_x is zero), so that V_x will actually remain constant , equal to its initial value. There will be acceleration in the y direction however, of value -g , where g is the acceleration due to gravity at the surface of the earth (about 9.81 meters per second per second). This is negative because our y co-ordinate is positive in the upward direction and gravity works downward (last time I checked anyway...). This acceleration in the y direction will be constant (since g is not changing). Thus, the new V_y at each time step can be calculated as follows: V_y(t+1) = V_y(t) + V_y(t) where: V_y = A_y t That is, the change in the (vertical) component of speed is simply the (vertical) acceleration multiplied by the time it acts over; and the new speed is simply the old speed plus this change. This change will, of course, be negative, because A_y is negative; to put that another way, what goes up must come down! You should test your program carefully. Explain, in your report, what tests you carry out, how the program behaves, and whether it has passed such tests. In the first place, the repetition or iteration in your program should be achieved using a while statement. However, if you succeed in getting this working satisfactorily, you should then experiment with using a for statement instead. Hints hints9 Sometimes you may find it useful to view the output of a program in the form of a graph. There are a variety of ways of doing this. One simple way is using a utility program, provided on the PC's in the CAE Lab, called graph . This program should be run as a full-screen MS-DOS application. To do this, first open an MS-DOS window by double clicking on the relevant icon in Program Manager. Then make this window take over the full screen by pressing Alt-Enter (i.e. press and hold the Alt key while you press the Enter key also). If your program's output has been stored in a file called out.txt , you can then see a graph of this simply by giving the following command: D: graph out.txt Note that graph expects each line of its input file to contain just two numbers. The first is taken as the "x" co-ordinate, and the second as the "y" co-ordinate, of a single point to be plotted. graph will automatically scale the axes in order to fit in all the data points. You can exit from the graph program by pressing the Esc key. You can make the MS-DOS application revert from full-screen to a window by pressing Alt-Enter again. Session 10: Week 19/20: Letter Frequencies In a world where there are code makers there will, inevitably, also be code breakers . One of the simplest tools in the armoury of a code breaker is a program to count the relative frequency of each letter appearing in the cyphertext. For any given language there will be a characteristic distribution of letter frequencies in the uncoded message (the "plaintext"). The most commonly used letter in English is e , by a wide margin; t is in second place, with a and o nearly tied for third; i , n and r are also very commonly used. If we know that a coded message uses a simple substitution cypher (such as the Caesar cypher we saw previously) then a simple count of the relative frequencies will allow us to make a fairly good guess as to the letters which have substituted for the most common english letters. Often this would be enough to allow the remaining substitutions to be easily guessed. Of course, real cyphers are much more sophisticated and harder to break than this (how would you tackle a Vigenere vigenere cypher for example?). But letter frequency counts still form an essential tool for code breaking, albeit in conjunction with many other techniques. Part 1: Counting Your first task in this exercise is to write a program which will read a file (using getchar() as usual), and count how many times each alphabetic character occurs. It should ignore distinctions between upper and lower case, and should ignore all non-alphabetic characters. That is: both a and A should contribute to the count for the letter "a"; and non-alphabetics should not contribute to any count. When the file has been completely read, the program should output a table showing the count for each of the letters A to Z . As usual, you should test your program carefully, and document any problems etc. in your report. Part 2: Statistics Next you are required to enhance the basic program (rename it at this stage, to avoid overwriting your first program!). This enhanced program should calculate the total number of all alphabetic characters read, and output, for each character, both the actual number of times it occurred, and the percentage this represents of the total number of alphabetic characters encountered. Again, test and document the program carefully. Part 3: Graphing Finally: develop a further version of the program, which, instead of outputing the relative frequencies as numbers , outputs a histogram . This will be made up of lines of varying width (made up of asterisk characters) one line for each letter, from A to Z . The letter with the most occurrences should be scaled to give a line with, say, 60 *'s. Then the lines for all the others should be shorter in proportion. The display might look somewhat as follows: A ************* B ****** C ** D *** E ******************** F ****** G * - and so on. Note that histograms would normally be plotted with the independent variable oriented vertically, whereas I have described an horizontal orientation here. If you have time, try to alter your program again, to yield a histogram with a vertical orientation. Tutorials tutorial A voluntary, "walk-in", tutorial will be offered each Tuesday, from 10.00 11.00 pm, in the CAE lab. At least one of the instructors and/or demonstrators will be present at this time. This is an excellent opportunity to look for help in completing any lab exercises which you have outstanding, particularly if you are stuck with some problem you simply can't seem to overcome. You can, of course, submit queries by e-mail at any time (see Feedback feedback ); but this may be impractical for various reasons (e.g., if your problem is that you can't get the mail system to work!). In such cases, the tutorial is your opportunity to get the problem sorted. Assessment The course will be assessed by way of two supervised Laboratory Tests . The first of these will take place at the start of Term 2, the second at the end of Term 3. The overall subject mark will be a weighted sum of the marks achieved in these two tests: 25 from the first, 75 from the second. The format of the tests will be very similar to the format of the normal lab exercises. Each student will work alone. The tests will be of three hours duration. Instructions will be accessed via the World Wide Web . The work to be done will be derived from work done during the normal lab sessions - though it will, of course, differ in detail. A report will be required, submitted electronically. This report will form the basis for marking the test. Detailed information on the format of the report, and the marking scheme to be used, are available in the Laboratory Report Guidelines http://www.eeng.dcu.ie/ 7Emcmullin/swe1/rptfmt/rptfmt.html . Students who are repeating this course in 1994/95 will be offered the option of being assessed on the previous model (three hour written examination) instead. All students who intend to avail of this alternative should confirm that in writing to me as soon as possible. Feedback feedback You can give me feedback on any aspect of the course, or ask questions etc., by sending e-mail to mcmullin@ugmail.eeng.dcu.ie . I normally check this mailbox at least once every day, and will respond as quickly as I can. This is far and away the most reliable and effective way of getting in touch with me. Other On-Line Resources This complete document is available in LaTex swe1root.tex LaTeX is a typesetting mark-up language, particularly suitable for hierarchically organised documents, and documents using mathematical notation extensively. It is a de-facto standard in academic publishing. A LaTeX source file is still a form of ASCII, but with typesetting commands interspersed with the "raw" text. It can be read with any tool which can display ASCII files. and plain ASCII swe1root.txt forms. Thus, you can download the document and browse it offline. However, note that it is rather large (c. 200KBytes) - you would not be able to load it with, for example, the MS-Windows NotePad application. But you could view it with the built-in editor of the Turbo-C++ IDE. Alternatively, you might want to install a copy of the Programmer's File Editor ftp://ftp.hea.ie:/pub/windows/programr/pfe0507.zip . This is a powerful (and free!) professional text editor for Windows. It does the same job as NotePad, but much better! There is a separate detailed document dealing with the C Standard Library http://www.eeng.dcu.ie/ 7Emcmullin/swe1/stdlib/stdlib.html . Again, it is also available for downloading, and offline browsing, in LaTex stdlib.tex and plain ASCII stdlib.txt forms. There is also a very wide range of resources available elsewhere on the Internet, on the subjects of and C++ programming. Pointers to many of these have been collected on the Web Page called Learn C/C++ Today http://vinny.csd.mu.edu/learn.html . If you are looking out for further elaboration or alternative approaches to the C language I strongly recommend that you have a browse through the materials identified by this site. Copyright This Hypermedia Document is copyrighted, 1994, 1995, by Barry McMullin http://www.eeng.dcu.ie/ 7Emcmullin/home.html . Permission is hereby granted to access, copy, or store this work, in whole or in part, for purposes of individual private study only. The work may not be accessed or copied, in whole or in part, for commercial purposes, except with the prior written permission of the author.