A Style Guidebook for

The "C" Programming Language

in The UNIX Software Environment

D. M. Sunday, J. C. Noble

The Johns Hopkins University
Applied Physics Laboratory

Copyright 1995, JHU/APL

C-LANGUAGE CODING STYLE

The C-language coding style guidelines cover the comment header block format for source code files, the layout for a C-language main() function, the usage of include files, the modularity of source code structure (function and global variable usage), the internal conventions used for C-language statements (comments, naming conventions, and the details associated with what appears where on a page), and the portability of C source code.

The guidelines presented represent characteristics of a "good" programming style; that is, a programming style that is at once easy and natural to use as well as promoting correctness and clarity. In some cases, however, the conventional stylistic practices of the C programming community have been adhered to instead of recommending a (possibly superior) style guideline. This needs no apology, as it provides for consistency with the published literature on the C language (both in textbooks and existing code). Additionally, the conventional stylistic practices of the C community have been successful in that they have produced a great deal of usable, understandable, and maintainable code.

A.1 C-Language Program Heading

All C-language source code files must contain a documentation comment block at the beginning of the file. This comment block should have the same format for all source files produced and written in support of the project. The block's contents must identify the project and programmer, and describe the functions coded in the source file. A minimal prototype documentation header for a C-language source code file is as follows (text in italics needs to be filled in or is optional):

static char rcsid[] = "$Header$";
/*
 *	Project:	project name
 *
 *	Program:	program name - and UNIX-like summary line
 *	File Name:	the name of this file
 *	Purpose:	state what the functions in this file do
 *
 *	Synopsis (Usage and Parameters):
 *
 *		Functions that are defined in this file should be listed here
 *		along with the parameters (and their types) that they are invoked with.
 *
 *	Description:
 *		Describe what the functions in this file do, and what they have
 *		in common. Describe any common data structures they operate on.
 *
 *	Files:
 *		List data files that are used by functions in this file.
 *
 *	See Also:
 *		List other functions that the ones in this file are related to
 *		in some meaningful (complementary) manner.
 *
 *	Bugs:
 *		Admit them up front, and avoid surprising everyone.
 *
 *	Programmer:	Joe Hacker
 *	Organization:	JHU Part-Time Programs in Engineering and Applied Science
 *	Host System:	UNIX 4.3 BSD or Sun OS 4.x
 *	Language:	C
 *	Date Created:	mm/dd/yy
 *	Modifications:
 *		$Log$
 */

If the C code in the file is to be compiled as part of a function library, instead of an executable program, this should be shown by replacing the Program: line of the comment block with:

* Library: library name - and summary line

All non-optional sections of the header block must be filled in and kept up-to-date. This is especially important for the mandatory Synopsis: and Description: sections. Additional optional sections are viewed as amplifying the Description: section. Optional header block sections are shown for Files:, See Also:, and Bugs:. These sections, if present, are to be filled in according to the conventions established for UNIX manual pages. Other optional sections may be added if deemed necessary. For example, if a routine had a long list of error conditions, then a new section called Errors: could be added.

A blank copy of this documentation header block should be kept in a file in each programmer's home or project directory to be used as a template whenever a new source file is created. This template can be partially filled in with the project name, programmer name, and other constant information. For consistency, a recommended name for this template file is .stdheader. On some systems, there is a software tool, mkheader, that will create a standard comment header for you.

The layout of the documentation header purposely parallels that of a standard UNIX manual page. Using this format simplifies the generation of conventional manual pages when they are required for documenting the code developed. On some systems, there is a software tool, mkman, that will automatically generate a UNIX-style manual page from the documentation comment block. With this in mind, the comment header for the main function file of an executable program must read like a manual page describing the usage of that program. However, if the functions in the file are to be compiled and archived as part of a function library (instead of a program), the comment header must then read like a manual page for the usage of those functions by another programmer who might use the library.

If a source file contains several related functions, the documentation header should describe the common and related actions they perform, and any data structures they share that are contained only in this file. In particular, a detailed description of the shared data should be given; otherwise, since the data is not part of any one function, it will not be described adequately in any one of them. The detailed description may require reading comments associated with the actual declarations to be complete.

A.2 Main Function File Layout

The main() function of a C-language program should read like a top-level description of the program. The layout of a C language main function file should be as follows:

1. Whenever the source code is to be maintained under a control system, such as RCS or SCCS, the first line of the file must be a control system id line. For example, when using RCS, one should have the following declaration:

static char rcsid[] = "$Header$";

2. A filled-in header comment block must appear at the start of the file. All the blanks should be filled in. The information in a main function file comment header should read exactly as a UNIX manual page, since it may be used to generate the manual page describing the resulting executable program.

3. The following items should follow the comment header and precede the main() function definition. They should appear in the given order.

a. #include <stdio.h>

b. other #include statements

c. any #define statements used only in this file.

d. the declaration and initialization of program global variables. Note that an alternate accepted practice is to put all globals in a "globals.c" file (especially if large structures need to be explicitly initialized).

e. extern declarations for global variables used only in this file.

4. The main() function definition should then be given. It should be preceded by at least 2 blank lines to set it apart. A comment describing it is not needed since this should already be done in the comment header.

5. In the main() function, command line argument handling should be done with the getopt() function that is now available and standard on all new UNIX systems. If getopt() is not available, an equivalent should be specified by the lead programmer of the project.

6. The main() function should terminate by calling exit(val) explicitly, where val is the termination status of the program. All programs must return an exit status of zero on success and nonzero on error or failure. Further, ideally, all exit() calls should be made only from the main() function, so that all program exit points are apparent at the top level of the code. Exceptions to this rule should be justified, approved by the lead programmer, and clearly documented. However, functions that make up a library to be used by other programmers must never call exit(), but must return an error code instead.

The recommended layout for a C-language main() function is typified by the prototype example in Section A.7. Note that the example shown is a template that you could start with, but that you should then edit to suit your own requirements. This template can be kept in a file in each programmer's home or project directory to be used whenever a new main function file is created. This template can be partially filled in with the project name, programmer name, and other constant information. For consistency, a recommended name for this template file is .stdmain. On some systems, there is a software tool, mkmain, that will create a prototype main function file for you.

In general, main() functions should be short. They are usually only a few pages long, rarely exceeding 4 pages in length, and often only one page of C code.

A.3 Include Files

Include files should contain the following items when these items are necessary for a given program and shared among two or more of its files:

#includes

#defines

typedefs

structure definitions

extern declarations

function prototypes

Declarations of structures and extern declarations of global variables used for communications between functions in a program and their callers should always be placed in an appropriately named ".h" file. An alternate accepted practice is to declare externs explicitly in the files where the corresponding globals are referenced. Which practice is adopted should depend on the extent of the scope of the variables concerned. In particular, if they are used often in many functions and diverse source files, then the extern declarations should be made in a ".h" file that is included in all other files.

If any #defines of the same symbol occur in more than one file, then they must be placed in a common include file. Additionally, any configuration items, such as pathnames for directories or data files, must be specified in a #define statement and kept in a ".h" file; for example:

#define DBDIR "/usr/local/lib/datadir/"

should appear once in a ".h" file. If there are many configuration items, a separate header file "conf.h" should be used to contain only them.

Include files should rarely contain nested #include directives. The exception is when one include file needs the defined constants or structures from another; nesting then avoids defining the same thing in multiple places, and results in more manageable code. Also, if one is developing a program or library with many files, it is better to have all the system #includes localized in one ".h" header file that is then included in each source file. Again, the purpose is to avoid redundant, difficult-to-manage code.

To prevent multiple inclusions of the same header file, it has become a standard practice to enclose the contents of any ".h" include file in the following construct:

#ifndef _INCLUDENAME
#define _INCLUDENAME
Contents of include file
#endif /* _INCLUDENAME */
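
As a rough illustration of the construct above, the sketch below shows a small header guarded against repeated inclusion. The file name mydefs.h, the guard name MYDEFS_H, and every declaration inside it are hypothetical. The guard name omits the leading underscore only because ISO C reserves identifiers that begin with an underscore and a capital letter; the construct is otherwise the same. The guard is repeated a second time to stand in for a second #include of the same file.

```c
/* ---- first inclusion of the hypothetical "mydefs.h" ---- */
#ifndef MYDEFS_H
#define MYDEFS_H

#define DBDIR		"/usr/local/lib/datadir/"	/* configuration item */
#define MAXNAME		32				/* longest symbol name */

typedef struct symbol {
	char	name[MAXNAME];	/* symbol text */
	int	value;		/* value bound to the symbol */
} SYMBOL;

extern int Debug;			/* defined once, in the main file */

int lookup_symbol(const char *name);	/* function prototype */

#endif /* MYDEFS_H */

/* ---- second inclusion: skipped, because MYDEFS_H is now defined ---- */
#ifndef MYDEFS_H
#define MYDEFS_H
#error "header contents were processed twice"	/* never reached */
#endif /* MYDEFS_H */
```

Without the guard, the second pass over the structure definition would be rejected by the compiler as a redefinition.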

A.4 Program Modularity

A.4.1 Functions

A C function should do only one thing, and should be kept short. In general, a function processes one or more inputs and generates one or more outputs, where each of the inputs and outputs can be concisely described, and there should only be a few of them in number. A function (or its calling sequence) may be too complex or too long, and may need to be decomposed (or redesigned):

a. if its length is greater than two pages,

b. if it requires too many parameters (more than 4) when called,

c. if heavy use is made of internal variables (whose scope is less than the entire function),

d. if its internal logical nesting is too deep (say > 4 levels),

e. if it cannot be described with a single English verb-object pair without an "and".

A function should be designed with a natural, easily-remembered calling sequence. Functions with more than four arguments are not recommended. Calls by reference, where an argument is the address of a referenced variable (except for strings), should be minimized. The intent of this constraint is to minimize the number of variables whose values can be changed invisibly inside a function. Whenever possible, variable values should be changed by assigning the return value of the function to them. Also, functions should not have opcode arguments, where one argument determines the interpretation of the others.
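
A small sketch of the preference for return values over reference arguments follows. The function name clamp_value and its parameters are hypothetical, and ANSI-style prototypes are assumed.

```c
/*
 * Discouraged: the caller's variable changes invisibly inside the call.
 *
 *	void clamp_by_reference(int *value, int low, int high);
 *	clamp_by_reference(&volume, 0, 10);
 */

/* clamp_value: return value limited to the range low..high */
int clamp_value(int value, int low, int high)
{
	if (value < low)
		return low;
	if (value > high)
		return high;

	return value;
}
```

A caller would then write volume = clamp_value(volume, 0, 10); so that the change to volume is visible at the call site.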

Collections of functions that form a package can be kept in the same file. Declarations of variables that are to be shared among several related functions in a file should be placed at the beginning of the file. Any functions that are only used within the scope of a specific file should be declared "static" in that file to hide them from functions in other files.

When appropriate, functions should return a meaningful value. If not, a function should explicitly be declared to return the void data type. Except for numeric functions, a negative return value (usually (-1)) can be used to show that an error has occurred. When a function is called, its return value should be checked whenever an error is possible.

A function should never contain an exit() call to handle an error condition, unless there is a clear justification for doing this (for example, in an interrupt handler intended to process a termination signal). When a function is to be included as part of a function library used by other programmers, the exit() call must never be used under any condition.
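
The sketch below shows one way a library-style function might report failure through its return value instead of calling exit(), with a file-local helper declared static. The names parse_port, is_in_range, and PORT_MAX are hypothetical, and ANSI-style prototypes are assumed.

```c
#include <stdlib.h>

#define PORT_MAX	65535L		/* largest legal port number */

/* is_in_range: report whether value is a legal port number */
static int is_in_range(long value)
{
	return value >= 1 && value <= PORT_MAX;
}

/* parse_port: convert text to a port number; return -1 on any error */
int parse_port(const char *text)
{
	char *end;
	long value;

	if (text == NULL || *text == '\0')
		return -1;

	value = strtol(text, &end, 10);
	if (*end != '\0' || !is_in_range(value))
		return -1;

	return (int) value;
}
```

The caller checks the result, for example if (parse_port(arg) < 0), and decides for itself whether the error is fatal.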

A.4.2 Global Variables

Global variables, if used at all, should be used carefully. Use of globals should be minimized or eliminated by judicious use of parameter passing. Using global variables results in couplings between parts of the program that may not be obvious to the people maintaining the code. A global variable should only be declared and used when it makes the code significantly easier to understand. All globals should have high visibility and be tightly controlled. When globals are used only to reduce the number of arguments in function calls, redesign and/or the use of structures to pass arguments with a single name should be considered first. Further, whenever possible, global information hiding should be practiced by declaring static globals within a file, and accessing them with interface functions.

All global variables (even statics) should have names that begin with one upper case character, followed by lower case characters, and are logically capitalized thereafter (e.g., SymbolTable, or Symbol_table).

The definition and optional initialization of each global variable should occur in only one file for each program. There are several acceptable methods of organizing global variable declarations:

1. Declare all global variables at the beginning of the main function file of an executable program.

2. If a global is clearly associated with a specific file, it may be defined in that file. In such cases, the global should generally be declared static with its scope restricted to that file. It can still be accessed or modified from outside the file with interface functions to "set" or "get" its value.

3. If there are many global variables (a dozen or more) in a program, they may be declared in a separate file called "globals.c". However, the usage of many global variables is often a symptom of poor program design. If this is the case, you should probably redesign your program. Nevertheless, a globals.c file is often useful when initializations of large structures need to be done.

4. Extern reference declarations of global variables can be made in several specific places:

a. within the function blocks in which they are used.

b. at the beginning of any files in which they are used.

c. in a header file (sometimes called "globals.h") that is included in any file in which the global is accessed.

The choice of method used should depend on the scope of the global variables involved.
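
As a minimal sketch of a static global hidden behind interface functions, consider the following. The names Debug_level, get_debug_level, and set_debug_level are hypothetical, and ANSI-style prototypes are assumed.

```c
static int Debug_level = 0;	/* current debug verbosity; 0 means off */

/* get_debug_level: return the current debug verbosity */
int get_debug_level(void)
{
	return Debug_level;
}

/* set_debug_level: store a new verbosity; return -1 if level is negative */
int set_debug_level(int level)
{
	if (level < 0)
		return -1;

	Debug_level = level;
	return 0;
}
```

Code in other files can read or change the value only through these two functions, so every access is easy to find and to validate.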

A.5 Details of C-Language Coding

Write code that is clear and safe, rather than code that you think is clever. It is easy to write totally incomprehensible code in C. It isn't too much harder to write brilliantly clear code, and the code is worth a lot more later. Remember that overly clever and complex C constructs usually compile down to the same machine instructions as more straightforward code.

A.5.1 Comments Within a Program

Source code should contain comments to make it more comprehensible. Programmers must include comments whenever it would be difficult to understand the code without them. However, one should avoid overcommenting, as this only hides the real code and makes it more difficult to debug, modify, and maintain. The following practices regarding commenting are recommended.

Full English sentences are recommended for internal comments.

It is often clearer and more readable to make small blocks of comments as "paragraph headers" to a block of code than to comment every line or so, especially if the code is straightforward and untricky (which it should be).

A function definition must be preceded with a comment explaining its usage, unless this has already been done in the header comment block of the file. This comment may either be a block-style comment preceding the function name; or, if clear enough, a one-line comment on the same line as, and following, the function name.

A comment should follow the declaration of a variable whenever it clarifies its usage. A global variable declaration should always have a comment explaining its usage.

Anything non-obvious should be well-commented or avoided.

Anything obvious should not be commented, as this only serves to obscure relevant comments, the code itself, and the natural flow of the logic. For example:

	x += 1;		/* add one to x */
is a useless, unnecessary comment.

On-line comments within a segment of code should be aligned (by tabs) to the same physical starting location on the line, as well as possible.

No comment should be followed by program code on the same line.

Always leave a space between the text of a comment and the delimiters /* and */.

A.5.2 Naming Conventions

Names should be chosen for their mnemonic nature. Names should be at least 4 readable (pronounceable if possible) characters unless they are obvious conventional counter variables (like i, j, k, etc.). It is usually desirable to make names distinct in their first seven (7) characters as an aid to comprehension and portability.

Avoid non-meaningful names like flag, counter, state, etc.

It is a useful C convention to use upper-case for #define, typedef, and enum names. One exception to the above is for parameterized #defines. Traditionally, these have been in lower-case, but some programmers now prefer upper case. Either practice is acceptable as long as one is consistent.

All global variables (including static globals within a file) should begin with a capital letter, continue with lower case characters (to avoid confusion with #defines), and be logically capitalized thereafter (e.g., SymbolTable). Underscores are often useful in making a name more readable (e.g., Symbol_table).

All local variables should begin with a lower-case letter. Fields within structures should also be treated in this manner.

Two distinct variables must have two distinct spellings, and not merely be distinguished by case. In particular, a global and a local variable may never share the same variable name with the global one capitalized and the local one in lower case.

All function names should begin with a lower-case letter, and are usually completely lower-case with no embedded capitals. Underscores are often used to make function names more readable.

It is a recommended practice to use nouns for variable names, and verbs for function names. This has the effect of making the code much easier to read.

In large programs, it can sometimes be useful to use a data naming scheme that establishes an initial prefix (often only one letter) for certain categories of data. For example, "G" or "G_" could prefix a global variable name, "user_" or "u_" could prefix a "user" data structure, and so on.
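
The naming conventions above might be applied as in the following sketch. Every name in it (MAX_SYMBOLS, SYMBOL, Symbol_table, Symbol_count, find_symbol, add_symbol) is hypothetical, and ANSI-style prototypes are assumed.

```c
#include <string.h>

#define MAX_SYMBOLS	64		/* upper case: a #define */

typedef struct symbol {
	const char	*name;		/* lower case: structure fields */
	int		value;
} SYMBOL;				/* upper case: a typedef */

static SYMBOL	Symbol_table[MAX_SYMBOLS];	/* capitalized: a global */
static int	Symbol_count = 0;		/* number of entries in use */

/* find_symbol: return the table index of name, or -1 if absent */
int find_symbol(const char *name)
{
	int index;

	for (index = 0; index < Symbol_count; index++)
		if (strcmp(Symbol_table[index].name, name) == 0)
			return index;

	return -1;
}

/*
 * add_symbol: append an entry; return its index, or -1 if the table
 * is full.  Only the pointer is stored, so the caller must keep the
 * name string alive.
 */
int add_symbol(const char *name, int value)
{
	if (Symbol_count >= MAX_SYMBOLS)
		return -1;

	Symbol_table[Symbol_count].name = name;
	Symbol_table[Symbol_count].value = value;
	return Symbol_count++;
}
```

The variables are nouns and the functions are verb-object pairs, so a call such as find_symbol(name) reads much like the sentence that describes it.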

A.5.3 Indentation, Nesting and Other Visual Criteria

For internal formatting and indenting style, the following guidelines should be adhered to:

All C code must be indented, one (1) tab stop for each indentation level. Statements at the same logical nesting level should be at the same physical indentation level; and statements at deeper logical nesting levels should be indented more. For this guidebook, the keyword else is regarded as having the same nesting level as the if to which it corresponds.

For example, the following indentation practices should be used:

if (condition)
	statement;
else
	statement;

if (condition)
	for (init; test; step)
		statement;

Note that the following indentation practices are NOT permitted:

if (condition)
statement;

if (condition) statement; else statement;

if (condition) for (init; test; step) statement;

There should be only one statement per line.

Blank lines should be used between blocks of code that are functionally distinct (e.g. to separate large control blocks).

Several blank lines (2 or 3) should separate individual functions.

A blank line should separate the declarations and the code in a function.

Comments must be lined up systematically with the code. There are several accepted styles for lining up comments.

a. a short comment can be on the same line as the code, at the end of the line, and tabbed over so that it appears on the right side of the printout page.

b. standalone comments and comment blocks must precede the code that is being commented.

c. standalone comments and comment blocks can be indented at the same level as the code in which they are contained. This is the prevalent "conventional" C commenting style. This clearly associates comments with their logical nesting level.

d. for emphasis, standalone comments and comment blocks can sometimes be indented either a little more or a little less (1 or 2 spaces) than the code in which they are contained.

It is important to be consistent in one's commenting style.

Statement blocks should be less than one (1) page long.

Braces delineating blocks must be in one of the following styles:

	(A)	if (exp)
		{
			code
		}
or
	(B)	if (exp) {
			code
		}

Parentheses and braces should be used whenever they make an expression or statement easier to read.

Blank spaces should be used anywhere where spacing improves readability.

Commas and semicolons must always be followed by a blank space except at the end of a line.

In expressions, operators should be surrounded by blanks (but not operators that compose primaries, specifically ".", "->"; and not after unary operators).

A space must be used between a reserved keyword and its opening parenthesis; for example, use "if (condition)" rather than "if(condition)".

No spaces should be used between a function name and its opening parenthesis. In a sense, the parentheses belong to the function, and so are bound to it. This is not true of keywords, where the parentheses belong to the expression following the keyword.

No deep nesting (greater than 4 levels deep) should occur.

Nesting the ternary conditional operator ( ? : ) is not permitted.

The format of the switch statement should observe the following guidelines:

a. a switch statement and every case in it (including default) should be preceded by a blank line when this improves readability (small switches may ignore this rule). Multiple case labels on a single block of code should be on separate lines, but they should not be separated by blank lines.

b. the case statements in a switch should either be at the same indentation level as the switch, or indented 2 spaces. Using 4 space indentation should be avoided as this creates readability problems when tabs are expanded to only 4 spaces in some printouts.

c. The C code after a case must be indented exactly one (1) tab stop from the switch statement.
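
One possible reading of the switch guidelines is sketched below: case labels sit at the same indentation level as the switch, the code under each case is one tab stop in, and multiple labels for one block are on separate lines with no blank lines between them. The function classify_char and the KIND_ constants are hypothetical, and ANSI-style prototypes are assumed.

```c
#define KIND_SPACE	0	/* blank, tab, or newline */
#define KIND_SIGN	1	/* plus or minus sign */
#define KIND_OTHER	2	/* anything else */

/* classify_char: return the KIND_ code that describes character c */
int classify_char(int c)
{
	int kind;

	switch (c) {

	case ' ':
	case '\t':
	case '\n':
		kind = KIND_SPACE;
		break;

	case '+':
	case '-':
		kind = KIND_SIGN;
		break;

	default:
		kind = KIND_OTHER;
		break;
	}

	return kind;
}
```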

A.5.4 Special Statement Structures

Other suggestions for improving the readability of one's C code are:

Fields of #defines and #includes should line up. For example:

#define	THIS		1
#define	WHATEVER	2

and not:

#define	THIS 1
#define	WHATEVER 2

Side effects within expressions should be used sparingly. It is recommended that no more than one operator with a side effect (=, op=, ++, --) appear within an expression. Function calls with side effects count as operators for this purpose.
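
The following sketch contrasts a compact expression that carries three side effects with a form that has at most one per expression. The function copy_string is hypothetical; like strcpy(), it assumes that dest is large enough to hold the result. ANSI-style prototypes are assumed.

```c
/*
 * Discouraged: two increments and an assignment in one expression.
 *
 *	while ((*dest++ = *source++) != '\0')
 *		;
 */

/* copy_string: copy source, including its terminator, into dest */
char *copy_string(char *dest, const char *source)
{
	int i;

	for (i = 0; source[i] != '\0'; i++)
		dest[i] = source[i];
	dest[i] = '\0';

	return dest;
}
```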

In an if-else chain, the form:

if (condition)
	statement;
else if (condition)
	statement;
else if (condition)
	statement;

should only be used when the conditions are all the same basic type (such as, testing the same variable against different values), and the conditions involved are mutually exclusive. If the conditions are qualitatively different, the additional if statements should start on new lines, indented, as in:

if (cond1)
	statement;
else {
	if (cond2)
		statement;
	else
		statement;
}

The switch statement should be used instead of if ... else if ... chains whenever tests for equality of scalar expressions are being made. This is more readable and, when compiled, often produces more efficient code.

In the condition portion of an if, for, while, etc., side effects that extend beyond the guarded statement block should be minimized. That is, in a statement like:

if ((c = getchar()) != EOF)
{
	guarded-stmts
}
other-stmts

it is natural to think of the variable "c" being bound to a value only within the guarded-stmts. It should not be used in the other-stmts. Use of a variable set or modified in a condition, outside the range of the statements guarded by the condition, is distracting.

The use of || and && with right hand operands having side effects is discouraged. Also, whenever || and && are mixed in the same expression, parentheses should be used for clarity.

A.6 Portability of Programs

Use lint to check the portability of your code.

Adherence to data type compatibility should be practiced where reasonable. This can be simplified by liberal use of C's typedef facility and explicit casting of types.

The following violation of strict adherence is permitted: a function that returns a pointer to a structure whose format need not be known outside that function may return a "generic" pointer, of type (char *). Note that (char *) is specifically chosen because the C language guarantees that any data pointer may be converted to a (char *) and back again without harm. Where an ANSI C compiler is available, the type (void *) serves the same purpose.

Liberal use of #defines should eliminate magic numbers, whether machine dependent or implementation dependent or arbitrary/random.

One should use #defines for configuration parameters such as pathnames of directories or data files.

The sizeof operator should be used to determine machine dependencies whenever they might affect the resulting code.

If one's code depends on using integer (int) variables with at least 32 bits of precision, then they should explicitly be declared as long.
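
The portability suggestions above could be combined as in this sketch, which uses a #define in place of a magic number, a typedef based on long for a value that needs 32 bits, and sizeof rather than an assumed byte count. The names MAX_SAMPLES, COUNT, make_samples, and total_samples are hypothetical, and ANSI-style prototypes are assumed.

```c
#include <stdlib.h>

#define MAX_SAMPLES	1024		/* capacity of one sample buffer */

typedef long COUNT;			/* needs 32 bits, so long and not int */

/* make_samples: allocate a zero-filled buffer of MAX_SAMPLES counts */
COUNT *make_samples(void)
{
	return (COUNT *) calloc(MAX_SAMPLES, sizeof(COUNT));
}

/* total_samples: add up the first length entries of samples */
COUNT total_samples(const COUNT *samples, int length)
{
	COUNT total = 0;
	int i;

	for (i = 0; i < length; i++)
		total += samples[i];

	return total;
}
```

If the representation of COUNT ever has to change, only the typedef line needs editing; the allocation size follows automatically through sizeof.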

A.7 Example Main Function File

static char rcsid[] = "$Header$";
/*
 *	Project:	project name
 *
 *	Program:	eg - example program
 *	File:		eg.c - main function file
 *	Purpose:	example main function
 *
 *	Synopsis (Usage and Parameters):
 *
 *		eg [-dv] [-o outfile] infile ...
 *
 *	Description:
 *		Place text here to describe what this program does.
 *
 *	Programmer:	Joe Hacker
 *	Organization:	JHU/APL
 *	Host System:	UNIX 4.2 BSD
 *	Language:	C
 *	Date Created:	mm/dd/yy
 *	Modifications:
 *		$Log$
 */
# include <stdio.h> 
# include <other.h> 
# include <local/ourcode.h>
# include "mydefs.h"

/* Some Constants (could put in "mydefs.h") */
# define ON	1 
# define OFF	0

/* Program Global Variables */ 
int Debug	= OFF; 		/* debug mode ON/OFF flag */ 
int Verbose	= OFF; 		/* verbose mode ON/OFF flag */ 
char *Progname = NULL; 	/* program name */

/* System Extern Global Variables */ 
extern int errno;

main(argc, argv) 	/* this is an example main function */ 
int argc; 
char **argv; 
{
	/* local parameter declarations */ 
	int	c; 
	char	*ofile = NULL;		/* output file name */ 
	extern int optind, opterr;	/* used by getopt() */
	extern char *optarg; 		/* used by getopt() */ 
	char	*rindex();

	/* get program name from command line */
	if ((Progname = rindex(*argv, '/')) == NULL)
		Progname = *argv;
	else
		Progname++;

	/* command line option argument processing */
	opterr = 0;	/* suppress getopt() generated messages */
	while ((c = getopt(argc, argv, "do:v")) != EOF)
		switch (c) {
		case 'd':	/* turn on debug mode */
			Debug = ON;
			break;
		case 'o':	/* next arg is output file */
			ofile = optarg;
			break;
		case 'v':	/* turn on verbose mode */
			Verbose = ON;
			break;
		default:	/* there is an invalid option */
			usage();	/* so: print usage message */
			exit(1);	/* and exit */
		}

	/* open output file if there is one */
	if (ofile && (freopen(ofile, "w", stdout) == NULL)) {
		perror( ofile );
		exit(errno);
	}

	/* process input files */
	for ( ; optind < argc; optind++)
		process_input( argv[optind] );

	exit(0);
}


usage()		/* print a "usage" error message */
{
	fprintf(stderr, "Usage: %s [-dv] [-o outfile] infile ...\n", Progname);
}