The central problem of documenting software is keeping the documentation up-to-date. It is accepted by most software engineers that documentation is an unfortunate necessity. It is probable that this general attitude is engendered by enormous documents which are separate from the code they describe. With this attitude, some way of closely relating documentation to the software it describes is very desirable. A literate programming system provides a means of interleaving documentation with source code so that the reader is presented with small amounts of documentation for small amounts of code - the ancient Roman maxim of "divide and conquer" seems peculiarly relevant here.
The Web 68 literate programming system not only provides a means of keeping system documentation up-to-date, but also makes it easier to develop large programs. One of the bugbears of modern compilation systems is the tendency to write large programs as a collection of small files. This causes considerable complexities in the code, never mind in the documentation of such code, which, sadly, is rarely provided. Writers of large programs written in a high-level programming language tend to write large files. Keeping track of identifiers and similar syntactic constructs can be facilitated by a literate programming system. Web 68 provides extensive cross-referencing of identifiers, mode indicants and operator indicants, which helps the reader to keep her bearings. Coupled with a powerful macro facility, Web 68 makes it easier to write programs in sections where each section is accompanied by its own documentation.
Ideally, one would like to use HTML to produce the documentation and Algol 68 to write the program. Web 68 provides a means of combining both HTML and Algol 68 as a single document which can be processed by two programs, tang and weav, to produce Algol 68 and HTML code respectively: the former as source code ready for compilation, the latter ready for reading with a Web browser.
Web 68 is firmly based on the Web System of Structured Documentation created by Donald E. Knuth for writing his TeX and Metafont typesetting systems. In accordance with Algol 68 practice, the original programs tangle and weave have been abbreviated to tang and weav respectively. However, you should note that both tang and weav have been rewritten from scratch in Algol 68 so, although they have similar functionality to the original programs, they provide that functionality using different algorithms.
This document contains the specification of the Web 68 literate programming system.
A Web 68 file is a long string of text which has been divided, more or less arbitrarily, into individual lines. The exact line boundaries are not critical, so that you can chop up the input file in any way you wish. The end of a line is regarded by weav as a blank space. However, Algol 68 program source code is very flexible, so weav will try to respect your layout.
Web 68 contains its own command language which is described in the following sections. Subsection commands gives the definitive list of commands (there are no undocumented commands). Thus, to write a program using Web 68, you must be familiar with three languages: HTML, Algol 68 and the Web 68 command language.
In practice, error messages from the Web 68 programs and the Algol 68 compiler or interpreter can occur. Thus this system is not intended for the beginner and never will be. Consult the documentation for your Algol 68 compiler or interpreter for the meaning of the error messages. See section errors for the location of error messages from tang and weav.
As you write a program using Web 68, you should be aware both of the documentation and of the program you are writing: that is, you should be aware of the different actions that tang and weav will perform on your document. See sections tang and weav for an overview of what each program does.
A Web 68 document consists of a number of sections, each of which starts with a section command and can contain an HTML part, a definition part or an Algol 68 part. Both the definition part and the Algol 68 part are optional; indeed, a section command can be followed immediately by another section command.
A Web 68 file is expected to have a file extension of .w68. Included Web 68 files may have any extension or even no extension. However, I recommend you keep to some kind of naming system so that files intended for inclusion can be recognised as such without glancing at their contents.
The output files from weav will have a file extension of .html and the output file from tang will have a file extension of .a68. The latter is an intermediate file which should be deleted as soon as it has been processed successfully by the Algol 68 compiler. The Web 68 file contains the definitive source code of the program.
This section describes how a Web 68 document can be converted into both structured documentation and an Algol 68 program.
The next two sections give an overview of the actions of tang and weav together with details of their arguments.
The simplest call of the program is
tang filename
tang presumes that the file extension is .w68.
Options are:-
tang works roughly as follows:-
Yet to be written.
Algol 68 source code can occur in three different places: in snippets within the HTML part of a section, in macro bodies and in module bodies.
Wherever the Algol 68 code appears, you should remember to double all @ symbols. For example:-
s:=t[pos+1:@@pos+1]
This also applies to string denotations, viz:-
STRING @!title = "Epsilon @@ Lugaru";
Notice that the @ symbol just before title has not been doubled because it is part of a Web 68 command.
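The doubling rule can be illustrated with a small Python sketch. It is not part of Web 68: the function names are hypothetical, and the helpers apply only to pure Algol 68 text that does not yet contain any Web 68 commands.

```python
def escape_at(algol_source: str) -> str:
    """Double every @ so that it is not read as a Web 68 command introducer."""
    return algol_source.replace("@", "@@")


def unescape_at(web_source: str) -> str:
    """Undo the doubling; valid only for text that holds no Web 68 commands."""
    return web_source.replace("@@", "@")


print(escape_at("s:=t[pos+1:@pos+1]"))
print(unescape_at('"Epsilon @@ Lugaru"'))
```

The first call yields the slice as it should be keyed in the Web 68 file; the second shows the string denotation as it would presumably reach the Algol 68 compiler.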
Because identifiers in Algol 68 can contain white-space, it is possible for you to key an identifier differently on two separate occasions in the Web 68 file. This is not a problem for tang, because it deletes all white-space from the identifier before storing it. However, this could lead to problems with weav. The policy has been adopted of outputting the identifier in the way it was first keyed. Thus the two identifiers
fl bgn form and flbgn form
will appear in the index as fl bgn form. However, the identifiers will be compared as flbgnform (all white-space deleted).
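The policy just described can be sketched in Python. This is only an illustration of the idea, not the code used by tang or weav; the class and method names are hypothetical.

```python
class IdentifierIndex:
    """Compare identifiers without white-space, display them as first keyed."""

    def __init__(self):
        # maps the white-space-free form to the first spelling seen
        self._first_spelling = {}

    @staticmethod
    def key(identifier: str) -> str:
        """Delete all white-space, giving the form used for comparison."""
        return "".join(identifier.split())

    def record(self, identifier: str) -> str:
        """Register a spelling and return the spelling used in the index."""
        return self._first_spelling.setdefault(self.key(identifier), identifier.strip())


index = IdentifierIndex()
print(index.record("fl bgn form"))
print(index.record("flbgn form"))
print(IdentifierIndex.key("flbgn form"))
```

Both calls of record return the first spelling, fl bgn form, while the comparison key is flbgnform.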
In the HTML part of each section, scraps of Algol 68 code should be enclosed in exclamation marks (bangs). In this manual, I refer to such scraps, excluding the bangs, as snippets. For example, you might refer to a mode as !REF INT!. The part between the bangs will be formatted as Algol 68 code. In this case, the upper case characters will be made lower case and font switching HTML tags issued to ensure that the mode is printed in bold-face. Any Algol 68 code, and any amount of code, can be placed between the bangs, but you should ensure that each such snippet can fit on a single line of text because the normal formatting involving newlines will not be used.
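The treatment of snippets can be pictured with the following Python sketch. It is a simplification under stated assumptions: the bold-face tag is assumed to be <b>, every run of upper case letters is treated as a bold word, and every bang is taken to be a snippet delimiter. The real output of weav may well differ.

```python
import html
import re

BANG_SNIPPET = re.compile(r"!([^!]*)!")      # text between two bangs
BOLD_WORD = re.compile(r"[A-Z][A-Z0-9]*")    # upper case words such as REF or INT


def format_snippet(snippet: str) -> str:
    """Lower-case each upper case word and wrap it in (assumed) <b> tags."""
    escaped = html.escape(snippet, quote=False)
    return BOLD_WORD.sub(lambda m: "<b>" + m.group(0).lower() + "</b>", escaped)


def format_html_part(text: str) -> str:
    """Replace every !snippet! by its formatted form; other text is unchanged."""
    return BANG_SNIPPET.sub(lambda m: format_snippet(m.group(1)), text)


print(format_html_part("you might refer to a mode as !REF INT!."))
```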
All the text following the = symbol at the end of a macro header should be Algol 68 code. You can insert comments wherever you want in the macro body. They will be ignored by tang and formatted by weav as HTML code.
Most of the Algol 68 code will be in module bodies. Because Algol 68 is such a flexible language, it is impossible to specify a formatting style which will suit all programmers. Accordingly, a number of Web 68 commands are provided to help weav format the Algol 68 code. You should not use these until you have seen what weav does with your source code without them. Generally speaking, weav will format your Algol 68 code as you specify it.
Comments can be put into Algol 68 code in module bodies, just as in macro bodies. They will not appear in the output from tang.
Any text preceding the first section of a Web 68 document is said to be in limbo. Allowable text consists of
Any HTML commands which would normally appear in the preamble of an HTML input file should be placed here.
As an example, here is the limbo for an Algol 68 script to be elaborated by the Algol 68 interpreter a68g:
@=#!/usr/bin/a68g@>@\
The tang program processes this line by outputting #!/usr/bin/a68g immediately followed by a newline.
The weav program outputs the line as HTML containing formatting commands. However, the code between the Web 68 commands @= and @> is not subjected to the cross-referencing system. Text in limbo should be restricted to data extraneous to the Web 68 system.
HTML provides a set of heading commands for dividing a document into a hierarchy: each section at the next lower level is numbered from 1. Thus, the number 2.4.1 refers to the first subsubsection of the fourth subsection of the second section.
The numbering of sections in Web 68 is handled automatically by both tang and weav. All you need to remember is that each section is either a level section or a plain section.
A level section command specifies the section header for HTML as well as the section level. The command should be followed by the section heading followed by a full-stop. The full-stop will be omitted from the heading when it is formatted by weav. The heading will appear both in the table of contents and in each page, so you should be careful about what to put into such a header.
The following table shows the Web 68 code in the first column and the code output by weav in the second column:-
Web 68 code | Output HTML code |
@1Heading 1. | <h1>Heading 1</h1> |
@2Heading 2. | <h2>Heading 2</h2> |
@3Heading 3. | <h3>Heading 3</h3> |
The text for the section should follow on the next line, not preceded by any spaces.
For example:-
@1Introduction.
This program ...
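The mapping shown in the table can be mimicked by a few lines of Python. This is a sketch only: it assumes the heading stands on a line of its own, accepts just the levels 1 to 3 shown in the table, and uses a hypothetical function name.

```python
import re

# "@", a level digit, the heading text, and the closing full-stop
LEVEL_COMMAND = re.compile(r"@([1-3])(.+)\.$")


def heading_to_html(line: str) -> str:
    """Turn a level section command into the corresponding HTML heading."""
    match = LEVEL_COMMAND.match(line.strip())
    if match is None:
        raise ValueError("not a level section command: " + line)
    level, heading = match.groups()
    # the full-stop is omitted from the formatted heading
    return f"<h{level}>{heading.strip()}</h{level}>"


print(heading_to_html("@1Heading 1."))
print(heading_to_html("@1Introduction."))
```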
Level sections introduce major parts of the program. Level 1 sections are disallowed in included files to ensure the integrity of the indexes output by weav.
When tang and weav meet a level sectioning command, they both output the current section to the console in the form
[section.subsection.subsubsection.paragraph]
unless the current file is an included file, in which case the current section will appear as
[Ifile number.subsection.subsubsection.paragraph]
So the second subsection of the third section will appear on the console as [3.2.1.1].
A plain section is preceded by the "@ " command (an @-symbol followed by a space). The section is numbered at the paragraph level, but has no header to otherwise identify it so its HTML part can directly follow the section command.
For example:-
@ This procedure ...
Plain sections describe successively the details of the major parts. Neither tang nor weav outputs anything to the console when plain sections are being processed (unless there are errors or warnings, of course).
This part starts immediately after the section command and contains HTML code intermingled with references to parts of the Algol 68 code (see section snippets). The latter are delimited by the exclamation mark (also referred to as bang). The text will be passed unchanged to the output file by weav (apart from the snippets which will be formatted as Algol 68 source code) and completely ignored by tang. Here is an example from the uregex.w prelude:-
@ The library is initialised by !rx init! which takes as its only parameter a row of pairs of mode !STRUCT(INT ind,UCS val)!. A null row specified as !()! may also be given if the default meta-characters are acceptable. The routine yields !TRUE! if the parameters are satisfactory.
The HTML part can contain any HTML code including mathematics. This example has been taken from the charbag.w prelude:-
@ (2) bbi ces the beginning of bi lies within cs. There are two cases:-
This part contains macro definitions and is introduced by the Web 68 commands @d or @m.
Web 68 provides two kinds of macros: ordinary macros, introduced by @m, and declarative macros, introduced by @d.
Both kinds of macro may have any number of parameters (including none).
The definition of a macro consists of four or five parts:
The text
@m identifier = Algol 68 text
defines a simple macro, where the identifier will be replaced by the Algol 68 text when tang produces its output.
The text
@m identifier (p1,...,pn) = Algol 68 text
defines a parametric macro, where the identifier plus the actual parameters (in a parameter pack) will be replaced by the Algol 68 text and occurrences of the formal parameters pi in that Algol 68 text will be replaced by the corresponding actual parameters.
The formal parameter pack looks just like the parameter pack of an Algol 68 routine. The formal parameters themselves can take the form of Algol 68 identifiers or mode or operator indicants and should be separated by commas as in an Algol 68 routine call. The macro body can contain any Algol 68 code, including calls of previously defined macros, but parentheses should be balanced. Note that "previously" in this case refers to previously in the Web 68 file. Any occurrence of the formal parameters in the text, as a lexical unit, will be replaced by the corresponding actual parameter when the macro is expanded. These two examples show a simple macro and a parametrised macro, both taken from tang:-
@m help status = 1
@m err print(msg)=(print nl; print out(msg); error(errors))
and here is a declarative macro taken from the forms.w prelude:-
@d macro x raise window =
PROC(DISPLAY,WINDOW)INT x raise window =
   ALIEN "XRAISEWINDOW"
   "#define XRAISEWINDOW(dpy,win) \"
   " XRaiseWindow((void *)dpy,win)";
Because declarative macros can only be called once, they usually do not have parameters.
It should be noted that macros have to be recognised during tang's first phase, so every macro must be defined before it is used. You should not call a macro recursively. Macros are expanded by tang in its second phase.
Macros are only operational for tang. weav simply regards them as Algol 68 code. However, the macro identifier will appear in the cross-reference index.
Macros are used by calling them, just like Algol 68 procedures. The only restriction on the actual parameters of a macro is that any parentheses should be balanced. Furthermore, if you want to include a comma in an actual parameter, it should be surrounded by parentheses because the actual parameters, if there are more than one, are themselves separated by commas. The actual parameters can include or consist of calls to other macros. Any macros in the body of the macro must have been declared before calling them. This is because weav actions the macros in its first pass. tang, on the other hand, reads all the macros in the input files into memory before actioning them.
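The calling rules can be made concrete with a Python sketch of macro expansion over a list of lexical units. It is an illustration under assumptions, not tang's algorithm: the input is taken to be already split into tokens, identifiers are written without white-space (as tang stores them), and the names Macro, split_actuals, expand and the token exit are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Macro:
    params: list  # formal parameters; empty for a simple macro
    body: list    # macro body as a list of lexical units


def split_actuals(name, tokens, start):
    """Collect the actual parameters of a call whose "(" should be at tokens[start]."""
    if start >= len(tokens) or tokens[start] != "(":
        raise SyntaxError(f'call of {name} not followed by "("')
    actuals, current, depth = [], [], 0
    for i in range(start, len(tokens)):
        tok = tokens[i]
        if tok == "(":
            depth += 1
            if depth == 1:
                continue              # opening parenthesis of the pack itself
        elif tok == ")":
            depth -= 1
            if depth == 0:
                actuals.append(current)
                return actuals, i + 1
        elif tok == "," and depth == 1:
            actuals.append(current)   # only a top-level comma separates parameters
            current = []
            continue
        current.append(tok)
    raise SyntaxError(f"unbalanced parentheses in call of {name}")


def expand(tokens, macros):
    """Replace every macro call in tokens by its body, parameters substituted."""
    out, i = [], 0
    while i < len(tokens):
        tok = tokens[i]
        i += 1
        macro = macros.get(tok)
        if macro is None:
            out.append(tok)
            continue
        bindings = {}
        if macro.params:
            actuals, i = split_actuals(tok, tokens, i)
            if len(actuals) != len(macro.params):
                raise SyntaxError(f"wrong number of parameters in call of {tok}")
            bindings = dict(zip(macro.params, actuals))
        body = []
        for unit in macro.body:
            body.extend(bindings.get(unit, [unit]))
        out.extend(expand(body, macros))  # the result may contain further calls
    return out


macros = {
    "helpstatus": Macro([], ["1"]),
    "errprint": Macro(["msg"], "( printnl ; printout ( msg ) ; error ( errors ) )".split()),
}
call = ["errprint", "(", "text", ")", ";", "exit", "(", "helpstatus", ")"]
print(" ".join(expand(call, macros)))
```

Because a comma splits the parameter pack only at the outermost level, an actual parameter such as (x,y) stays intact, which is why a comma inside an actual parameter has to be parenthesised. A recursive macro would make this sketch loop for ever.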
If a declarative macro is called more than once, the second and subsequent calls are replaced by SKIP instead of the macro body.
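The once-only behaviour of a declarative macro can be pictured as follows. This Python sketch is merely illustrative; the class name and the macro body are hypothetical.

```python
class DeclarativeMacro:
    """A macro whose body is produced by the first call only."""

    def __init__(self, body: str):
        self.body = body
        self.called = False

    def call(self) -> str:
        if self.called:
            return "SKIP"        # second and subsequent calls
        self.called = True
        return self.body         # first call yields the declaration


macro = DeclarativeMacro("INT buffer size = 512")
print(macro.call())
print(macro.call())
```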
Because each token is considered in turn, the following definitions and call will produce an error message:-
@m arg = (p)
@m identity(p1) = p1
@a identity ## arg
The ## between identity and arg serves to separate the two words into two lexical entities because otherwise they would be regarded as part of a single identifier. The error is caused by identity being expanded first: identity requires a parameter pack, but the next token is not "(", but arg which has not yet been expanded. Only when the macro call has been expanded will arg be expanded. This will produce the error message
! Error: call of identity not followed by "(".
This brings us to a useful wrinkle associated with macro definitions and calls.
In the next macro, the body of the macro contains two identifiers which are split into two lexical tokens by ## enabling the actual parameters, provided that they are just identifiers, to be used to construct two identifiers.
@m mcheck col(col1,col2) =@/
IF NOT q##col1 OR NOT q##col2
THEN@/
   []CHAR s="fl set object color";
   put(outf,
       (" ",s,"(",obj ident,",",
        CA mov(gen##col1),",",CA mov(gen##col2),");",
        newline));
   add mac(s)
FI
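One way of reading the effect of ## is sketched below in Python. The sketch assumes that the macro body is already a list of lexical units, that each actual parameter is a single identifier, and that the two units either side of ## are simply joined once the parameters have been substituted. Since white-space inside an Algol 68 identifier is not significant, joining them gives the same identifier as writing them side by side. The function names and the actual parameters red and blue are hypothetical.

```python
def substitute(body, bindings):
    """Replace each formal parameter in the body by its actual parameter."""
    return [bindings.get(unit, unit) for unit in body]


def paste(tokens):
    """Join the lexical units on either side of every ## into one identifier."""
    out, i = [], 0
    while i < len(tokens):
        if tokens[i] == "##" and out and i + 1 < len(tokens):
            out[-1] = out[-1] + tokens[i + 1]
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


body = ["IF", "NOT", "q", "##", "col1", "OR", "NOT", "q", "##", "col2"]
actuals = {"col1": "red", "col2": "blue"}
print(" ".join(paste(substitute(body, actuals))))
```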
The Algol 68 program is written as modules. The outermost module is the untagged module and is introduced by the Web 68 command @a (a for Algol 68).
Tagged modules are preceded by a module tag, consisting of HTML code which appears between @< and @>.
Module tags should be a good description of the contents of the module. If you are tempted to write a comment inside Algol 68 which will explain what the following lines of code do, consider writing the comment as a module tag and putting the lines of code in their own section.
Multiple spaces and tabs in module tags are reduced to a single space or tab and preceding and following spaces are removed. So the module tag
@<Print input buffer error location@>
can be matched by
@< Print input buffer error location @>
However, inputting such long tags is error prone, so Web 68 allows you to abbreviate them whereby you provide a unique prefix to identify the tag. Thus, the above tag could be abbreviated to
@<Print input buffer error...@>
provided that no other module starts with those characters.
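The matching of module tags can be sketched in Python as follows. The sketch assumes that each run of spaces and tabs collapses to one space and that an abbreviation is any tag ending in three dots; the function names and the second tag in the example are hypothetical.

```python
import re


def normalise_tag(tag: str) -> str:
    """Collapse runs of spaces and tabs and trim both ends of the tag."""
    return re.sub(r"[ \t]+", " ", tag).strip()


def resolve_tag(reference: str, known_tags: list) -> str:
    """Return the full tag for a reference, which may be abbreviated with '...'."""
    tag = normalise_tag(reference)
    if not tag.endswith("..."):
        return tag
    prefix = tag[:-3]
    matches = [known for known in known_tags if known.startswith(prefix)]
    if len(matches) != 1:
        raise LookupError(f"{len(matches)} module tags start with {prefix!r}")
    return matches[0]


tags = ["Print input buffer error location", "Check if included file has ended"]
print(resolve_tag("  Print input buffer   error location ", tags))
print(resolve_tag("Print input buffer error...", tags))
```

An abbreviation that matches no tag, or more than one, raises an error in this sketch, which corresponds to the requirement that the prefix be unique.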
When you want to add actual Algol 68 source code to a module, you should append an "=" symbol to the command @>, viz:-
@<Compiler pre...@>=
PROGRAM tang CONTEXT VOID
USE @<Library preludes@> standard
You can add Algol 68 source code to any module as many times as you want.
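The way in which code accumulates in a module can be pictured with a Python sketch. It assumes that successive additions are appended in the order in which they occur, as in Knuth's original Web system, and that a reference to a module is replaced by all the code added to it. The class name and the prelude names usefulprocs and regexlib are hypothetical.

```python
import re

MODULE_REF = re.compile(r"@<(.*?)@>")   # a reference such as @<Library preludes@>


class Modules:
    def __init__(self):
        self.bodies = {}                # module tag -> list of code pieces

    def add(self, tag: str, code: str) -> None:
        """Append another piece of Algol 68 code to the module with this tag."""
        self.bodies.setdefault(tag, []).append(code)

    def expand(self, tag: str, active=()) -> str:
        """Return the code of a module with all module references replaced."""
        if tag in active:
            raise ValueError("module refers to itself: " + tag)
        text = "\n".join(self.bodies[tag])
        return MODULE_REF.sub(
            lambda m: self.expand(m.group(1).strip(), active + (tag,)), text)


modules = Modules()
modules.add("Compiler prelude", "PROGRAM tang CONTEXT VOID\nUSE @<Library preludes@> standard")
modules.add("Library preludes", "usefulprocs,")
modules.add("Library preludes", "regexlib,")
print(modules.expand("Compiler prelude"))
```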
You can place comments in the Algol 68 source code in a module. tang will ignore them and weav will format them, the text between the # or CO or COMMENT pairs being regarded as HTML code. Thus you can use text between bangs just as in the HTML part.
Modules are particularly useful when you have a long procedure: you can use a module tag instead of comments. For example, here is a procedure from tang which has the same piece of code occurring in two places. The actual code has already been specified, so both references are abbreviated. Notice how the module tags materially add to your understanding of the procedure:-
@<Input...@>=
PROC next char = CHAR:
IF input ended
THEN @<Check if included...@>;
   blank
ELIF loc OF web >= UPB b OF web
THEN
   WHILE input ln(web) & UPB b OF web = 0
   DO SKIP OD;
   IF input ended
   THEN @<Check if included...@>
   FI;
   blank
ELSE (b OF web)[loc OF web+:=1]
FI;
In a sense, modules are like Algol 68 procedures with mode VOID, except that they do not have to be units.
Every Web 68 command consists of the command introducer @ followed by a defining character. Some commands are followed by other text which is delimited by the concluding command @>. See section limbo for an example of the @= command. Such text is called "control text" and, like a string denotation, must end on the same line of the Web 68 file as it began. Furthermore, no Web 68 commands are allowed in a control text, not even @@. (Remember that Algol 68 allows AT instead of @.)
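The two rules for control text can be checked mechanically, as this Python sketch suggests. It is not taken from tang or weav; the function name and the wording of the error messages are hypothetical.

```python
def control_text(line: str, start: int) -> str:
    """Return the control text of the two-character command at line[start]."""
    end = line.find("@>", start + 2)
    if end == -1:
        # the concluding @> has to be on the same line as the command
        raise SyntaxError("control text must end on the line where it began")
    text = line[start + 2:end]
    if "@" in text:
        # not even @@ is allowed inside a control text
        raise SyntaxError("no Web 68 commands are allowed in a control text")
    return text


print(control_text("@=#!/usr/bin/a68g@>@\\", 0))
```

The call shown extracts #!/usr/bin/a68g from the limbo example.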
The letters following each code indicate in which sections that code can be found:-
A tilde preceding one of these letters means that the control code ends the present part of the Web 68 file; for example, ~A means that this control code ends the Algol 68 part of a section.
The last two commands have no effect on the Algol 68 program output by tang; they merely help to improve the readability of the HTML-formatted Algol 68 that is output by weav. Although Web 68 allows you to override the automatic formatting provided by weav, your best strategy is not to worry about such things until you have seen what weav produces automatically, since you will probably need to make only a few corrections when you are fine-tuning your documentation.
Because of the rules by which every section is broken into three parts, the commands @a, @d and @m are not allowed to occur once the Algol 68 part of a section has begun.
This section provides some additional features and warnings.
All the error messages emitted by the two programs can be found in the indexes of tags and indicants of their respective programs.