Input and Output: stdio.h

stdio.h ("standard input/output") provides basic functions for accessing data "external" to a program. This naturally includes data in files on disk, but also covers data coming from the keyboard, or displayed on the screen, or data routed via any of the other input/output "ports" of the computer. All of these sources or destinations for data may be generically referred to as streams, or, more simply, files.

Files are classified into two kinds: text and binary. A text file is divided into "lines", where each line has zero or more characters, and is terminated by a newline character '\n'. A binary file is simply a sequence of unprocessed or uninterpreted bytes, with no line organisation superimposed upon it.

Files are accessed through data structures called file pointers. Technically, a file pointer is the address of an object of a special type, denoted FILE. This type is defined by the stdio.h header file, and will not be recognised by the compiler unless the header file has been #include'd.

A file pointer must be created and associated with each particular external disk file, or input/output port, before any data in that file can be accessed ("read" or "written"). This process of creating a file pointer and associating it with a disk file or input/output port is called opening a file, and is performed by the fopen() function described in more detail below. The return value from fopen() is the value of the file pointer, and must be stored in a suitable variable to allow the file to be accessed subsequently.
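
As a minimal sketch of opening a file and keeping the file pointer for later use, consider the following. The helper name open_for_reading is hypothetical (it is not part of stdio.h), and the sketch assumes the usual behaviour that fopen() yields a null pointer when the file cannot be opened.

```c
#include <stdio.h>

/* Hypothetical helper: open the named file in read mode.
   Returns the file pointer, or NULL if the file could not be opened. */
FILE *open_for_reading(const char *filename)
{
    FILE *fp = fopen(filename, "r");    /* "r" requests read mode */

    if (fp == NULL) {
        /* fopen() failed: report the problem on the standard error file */
        fprintf(stderr, "cannot open %s\n", filename);
    }
    return fp;    /* the caller must store this to access the file later */
}
```

The caller would store the result in a variable of type FILE * and check it against NULL before attempting any access.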

Files are opened for access in a particular mode: reading, writing, or (occasionally) both. The file is treated as a sequence of characters (or, more generally, bytes). When the file is first opened it is positioned at the very start; the contents can then be read or written in sequence until the end of the file. With certain kinds of file (namely those stored on disk) it may be possible to "reposition" the file in an arbitrary way - go back to the start, or directly to the end, etc. If a file is opened for writing then its previous contents, if any, will normally be lost.
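
The loss of previous contents in write mode can be illustrated with a small sketch. The helper names write_text and count_chars are hypothetical, and the file name passed in is whatever the caller chooses; only fopen(), fputs(), fgetc() and fclose() come from stdio.h.

```c
#include <stdio.h>

/* Hypothetical helper: open the named file in write mode ("w") and store
   the given text in it. Any previous contents of the file are discarded.
   Returns 0 on success, -1 on failure. */
int write_text(const char *filename, const char *text)
{
    FILE *fp = fopen(filename, "w");

    if (fp == NULL)
        return -1;
    if (fputs(text, fp) == EOF) {
        fclose(fp);
        return -1;
    }
    return (fclose(fp) == 0) ? 0 : -1;
}

/* Hypothetical helper: open the named file in read mode ("r") and count
   its characters by reading in sequence from the start to the end.
   Returns the count, or -1 if the file could not be opened. */
long count_chars(const char *filename)
{
    FILE *fp = fopen(filename, "r");
    long n = 0;

    if (fp == NULL)
        return -1;
    while (fgetc(fp) != EOF)
        n++;
    fclose(fp);
    return n;
}
```

If write_text() is called twice on the same file, first with a long string and then with a short one, count_chars() afterwards reports only the length of the short string: the second open in write mode has discarded what was there before.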

A program may have many files open at any given time, each one associated with its own distinct file pointer.

File contents can be read, one character at a time, in sequence, with the fgetc() function (provided the file is in read mode); or written with the fputc() function (in write mode). There are also a variety of more sophisticated reading and writing functions such as fprintf() and fscanf(), which typically read or write many characters in one go, and automatically translate between external "text" representations, and internal "binary" representations, for the various native data types (int, long etc.).
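
The translation between external text and internal binary representations might be sketched as follows. The helper name round_trip_int is hypothetical. The sketch also assumes the use of tmpfile(), another stdio.h function not otherwise discussed here, which creates a temporary file opened for both writing and reading.

```c
#include <stdio.h>

/* Hypothetical helper: write an int to a temporary file as decimal text,
   then read the text back and convert it to an int again.
   Returns 1 if the value was read back successfully (stored in *out),
   or 0 on failure. */
int round_trip_int(int value, int *out)
{
    FILE *fp = tmpfile();    /* temporary file, open for update */
    int ok;

    if (fp == NULL)
        return 0;

    fprintf(fp, "%d\n", value);    /* internal int -> external text */

    /* A repositioning call is needed between writing and reading
       the same file; here we go back to the start. */
    rewind(fp);

    ok = (fscanf(fp, "%d", out) == 1);    /* external text -> internal int */

    fclose(fp);
    return ok;
}
```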

When reading a file it is necessary to be able to recognise when the end of the file is encountered. This is signalled by fgetc() yielding a special, reserved, return value, with the symbolic name EOF (this value is also - somewhat misleadingly - the return value if fgetc() encounters any kind of error or exception condition).

Now this raises a problem: if a special value is reserved to denote the end of file condition, does this mean that this special value can never actually be stored within a file (since, once it is read, it would be mistakenly taken as signalling that the file is ended)? If so, this would be a serious restriction.

In practice, this problem is solved in a rather ingenious, but also subtle and confusing manner. A file is treated as a sequence of values of type char; but the return value from fgetc() is actually made of type int. Normally this is just an "encoded" or "numerical equivalent" of the char value read from the file. But since int supports more distinct values than char does, it is possible to encode EOF as an int value that has no "equivalent" in the char data type. In this way, a file can contain any arbitrary char values, without restriction. But, in turn, this means that it is very important that the return value from fgetc() is tested to see whether it is EOF before it is transformed back into a char value (e.g. by assigning it to a variable of type char).
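
The usual way of respecting this rule is sketched below. The function name copy_stream is hypothetical; the essential point is only that the variable receiving the result of fgetc() is declared as int rather than char, and is compared with EOF before being used as a character.

```c
#include <stdio.h>

/* Hypothetical helper: copy every character from one open file to another.
   Returns the number of characters copied. */
long copy_stream(FILE *in, FILE *out)
{
    int c;        /* int, not char: it must be able to hold EOF as well */
    long n = 0;

    /* Test for EOF first; only then treat the value as a character. */
    while ((c = fgetc(in)) != EOF) {
        fputc(c, out);
        n++;
    }
    return n;
}
```

Had c been declared as char, then on many systems a byte with the value 255 in the file could compare equal to EOF and end the loop prematurely, while on others the comparison might never succeed at all. Which of these happens depends on the compiler, so the int declaration is the safe choice.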

For consistency with this behaviour of fgetc() many other functions in the Standard Library (such as fputc() for example) also use the int data type to effectively deal with char values, but allowing also for the special EOF value.

As I said, this mechanism is quite subtle; in my opinion it may be just a little bit too subtle! It involves the programmer in relying, willy nilly, on automatic conversions between types - a practice of which I am generally severely critical. Unfortunately, this is now a de facto standardised way of doing things with the Standard Library, and cannot be helped at this stage. Nonetheless, it certainly helps if you at least understand what is going on.

Once the program is finished processing a particular file, the file should be closed with the fclose() function. This essentially involves discarding the data structure of type FILE which was associated with the file: it is therefore very important that, once a file is closed, the file pointer which was associated with it is not used again (since it no longer points at anything meaningful!). Files are automatically closed when a program terminates in a "controlled" fashion (either by reaching the end of the main() function, or by an explicit invocation of the exit() function).
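
One possible way of guarding against accidental reuse of a file pointer after closing is sketched below. The helper close_and_clear is hypothetical, and is just one convention among several: it closes the file and then overwrites the caller's variable with NULL.

```c
#include <stdio.h>

/* Hypothetical helper: close the file and set the caller's file pointer
   to NULL, so that it cannot be mistaken for a usable file afterwards.
   Returns 0 on success, or EOF if fclose() reported a failure. */
int close_and_clear(FILE **fpp)
{
    int result = 0;

    if (*fpp != NULL) {
        result = fclose(*fpp);    /* the FILE object is discarded here */
        *fpp = NULL;              /* the old value no longer points at anything meaningful */
    }
    return result;
}
```

It would be called as close_and_clear(&fp), where fp is the variable in which the return value of fopen() was stored.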

There are three files which are opened by default, and automatically, for every program. File pointers associated with them are defined in stdio.h, as follows:

: stdin : The standard input file. By default this is a read mode file, associated with the computer keyboard. However, when the program is invoked from the DOS command line, this file may be "redirected" to be associated with, say, a disk file, using the DOS input redirection operator < .
: stdout : The standard output file. By default this is a write mode file, associated with the computer display screen. Again, however, when the program is invoked from the DOS command line, this file may be "redirected" to be associated with, say, a disk file, using the DOS output redirection operator > .
: stderr : The standard error file. This is a write mode file, associated with the computer display screen. It differs from stdout in that it cannot be redirected in any simple way under MS-DOS.

A program can do input and output on these three files immediately, without having to call fopen() first. The three files have certain conventional usages, as the names imply. Thus many programs take one input file, or stream of data, and transform it in some way into one output stream of data. If this is the case, the input data would normally be read from stdin, and the output data would be written on stdout. stderr is reserved for signalling "errors" or exception conditions, separate from the "normal" output. Programs written in this way can then be conveniently connected together in "pipelines", with the output from one being automatically routed as the input to another; in DOS pipelines are set up with the pipe operator | .
Thus, for example, suppose we have two programs part1 and part2 which perform two separate transformations on a data stream. The simple (DOS) command:
D:\ part1
would cause part1 to be started up, reading its input from the keyboard, and writing its output to the screen.
By contrast, the command:
D:\ part1 >out.dat <in.dat
would cause part1 to read its input from the disk file in.dat and write its output to the disk file out.dat. Note that, despite the redirection of stdout, any error or exception messages, written to stderr, would still appear on the screen.
Finally, the command:
D:\ part1 <in.dat | part2 >out.dat
would cause the output of part1 to be automatically routed as the input to part2, whose output would then finally be routed to out.dat.
Writing programs to take input from stdin and write output to stdout (and errors etc. to stderr) is therefore a very powerful idea: such programs can subsequently be mixed and matched in a very flexible way to achieve a variety of different effects. But even if you have no intention of linking programs in this way, it will still often be convenient to use stdin as a source of keyboard input, and stdout (and/or stderr) as a route for output to the screen. There are specialised versions of several stdio.h functions which access one or the other of these files by default (e.g. printf() is a variant of fprintf() which automatically writes to stdout).
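
A program such as part1 above might be built around a function like the following sketch. The name upcase_filter and the particular transformation (converting letters to upper case) are purely illustrative assumptions; the point is only the division of labour between the input file, the output file and the error file.

```c
#include <ctype.h>
#include <stdio.h>

/* Hypothetical filter: copy characters from in to out, converting
   lower case letters to upper case. Problems are reported on err,
   separately from the normal output.
   Returns 0 on success, 1 on failure. */
int upcase_filter(FILE *in, FILE *out, FILE *err)
{
    int c;    /* int, so that EOF can be recognised */

    while ((c = fgetc(in)) != EOF) {
        if (fputc(toupper(c), out) == EOF) {
            fprintf(err, "upcase: write error\n");
            return 1;
        }
    }
    if (ferror(in)) {
        /* EOF was returned because of an error, not end of file */
        fprintf(err, "upcase: read error\n");
        return 1;
    }
    return 0;
}
```

The main() function of such a program would then need to do little more than call upcase_filter(stdin, stdout, stderr), leaving it to the command line to decide, through redirection or pipelines, where the data actually comes from and goes to.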
A small subset of the functions declared in stdio.h will now be described in more detail.

Document: The C Standard Library

McMullin@eeng.dcu.ie
Fri Mar 29 14:35:38 GMT 1996