Web 68

The central problem of documenting software is keeping the documentation up-to-date. It is accepted by most software engineers that documentation is an unfortunate necessity. It is probable that this general attitude is engendered by enormous documents which are separate from the code they describe. With this attitude, some way of closely relating documentation to the software it describes is very desirable. A literate programming system provides a means of interleaving documentation with source code so that the reader is presented with small amounts of documentation for small amounts of code - the ancient Roman maxim of "divide and conquer" seems peculiarly relevant here.

The Web 68 literate programming system not only provides a means of keeping system documentation up-to-date, but also makes it easier to develop large programs. One of the bugbears of modern compilation systems is the tendency to write large programs as a collection of small files. This causes considerable complexities in the code, never mind in the documentation of such code, which, sadly, is rarely provided. Writers of large programs written in a high-level programming language tend to write large files. Keeping track of identifiers and similar syntactic constructs can be facilitated by a literate programming system. Web 68 provides extensive cross-referencing of identifiers, mode indicants and operator indicants which help the reader to keep her bearings. Coupled with a powerful macro facility, Web 68 makes it easier to write programs in sections where each section is accompanied by its own documentation.

Ideally, one would like to use HTML to produce the documentation and Algol 68 to write the program. Web 68 provides a means of combining both HTML and Algol 68 as a single document which can be processed by two programs, tang and weav, to produce Algol 68 and HTML code respectively: the former as source code ready for compilation, the latter ready for reading with a Web browser.

Web 68 is firmly based on the Web System of Structured Documentation created by Donald E. Knuth for writing his TeX and Metafont typesetting systems. In accordance with Algol 68 practice, the original programs tangle and weave have been abbreviated to tang and weav respectively. However, you should note that both tang and weav have been rewritten from scratch in Algol 68 so, although they have similar functionality to the original programs, they provide that functionality using different algorithms.

2. Structure of this document

This document contains the specification of the Web 68 literate programming system.

3. Web 68

A Web 68 file is a long string of text which has been divided, more or less arbitrarily, into individual lines. The exact line boundaries are not critical, so that you can chop up the input file in any way you wish. The end of a line is regarded by weav as a blank space. However, Algol 68 program source code is very flexible, so weav will try to respect your layout.

Web 68 contains its own command language which is described in the following sections. Section 3.9 (Web 68 Commands) gives the definitive list of commands (there are no undocumented commands). Thus, to write a program using Web 68, you must be familiar with three languages:-

HTML for the documentation itself
Algol 68 for the source code
Web 68 for weaving the whole together

In practice, you will meet error messages from the Web 68 programs as well as from the Algol 68 compiler or interpreter. Thus this system is not intended for the beginner and never will be. Consult the documentation for your Algol 68 compiler or interpreter for the meaning of its error messages. See section 3.11 (Errors) for the location of error messages from tang and weav.

As you write a program using Web 68, you should be aware both of the documentation and of the program you are writing: that is, you should be aware of the different actions that tang and weav will perform on your document. See sections 3.2.1 and 3.2.2 for an overview of what each program does.

A Web 68 document consists of a number of sections, each of which starts with a section command and can contain an HTML part, a definition part and an Algol 68 part. The definition part and the Algol 68 part are both optional: a section command can be followed immediately by another section command.

3.1 Filenames

A Web 68 file is expected to have a file extension of .w68. Included Web 68 files may have any extension or even no extension. However, I recommend you keep to some kind of naming system so that files intended for inclusion can be recognised as such without glancing at their contents.

The output files from weav will have a file extension of .html and the output file from tang will have a file extension of .a68. The latter is an intermediate file which should be deleted as soon as it has been processed successfully by the Algol 68 compiler. The Web 68 file contains the definitive source code of the program.

3.2 Program Usage

This section describes how a Web 68 document can be converted into both structured documentation and an Algol 68 program.

Use a plain text editor to prepare the Web 68 document.
Production of the formatted text:
   1. Use weav to produce an HTML output file.
   2. Use a web browser to view the document on your screen.
Production of the Algol 68 program:
   1. Use tang to produce the Algol 68 program.
   2. Use your Algol 68 compiler or interpreter to produce the Algol 68 binary program.

The next two sections give an overview of the actions of tang and weav together with details of their arguments.

3.2.1 Overview of tang

The simplest call of the program is

tang filename

tang presumes that the file extension is .w68.

Options are:-

-d: Output debugging information to tang.dbg.
-h: Display a usage message only.
-v: Display the version of the program only.
-w: Specify a directory to search for Web 68 include files. The argument following the "-w" should be a directory name. There should be no spaces between the "-w" and the directory name.

See section input for details of input and output filenames.

tang works roughly as follows:-

It reads the Web 68 source file, skipping the HTML part of each section, but tokenising the macro definitions and all the Algol 68 code in the modules. Wherever Algol 68 code is specified as being in an already existing module, whether tagged or untagged, it is simply tacked onto the end of that module, together with the section number in which the code appears.
The untagged module is then output, token by token.
Wherever a token is a module tag, the tag is replaced by the text in that module.
Wherever a token is a macro, the macro is replaced by the body of the macro. If the macro has parameters, the actual parameters are scanned and used to replace the formal parameters in the macro body.
The process of expanding macro and module bodies continues until the output text consists of plain tokens.
The output lines are filled to 80 characters as much as possible, although string denotations are not split across line boundaries.
The output text contains comments which specify the section in which each piece of code occurs:-

Starting comment

#section.subsection.subsubsection.paragraph:#

Finishing comment

#:section.subsection.subsubsection.paragraph#

Finishing and starting comment

Instead of outputting a finishing comment followed by a starting comment, the two comments are concatenated to form a single comment of the form #:1.1.2.1 3.1.4.2:#.
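
The three comment forms above can be sketched as follows. This is a minimal illustration in Python, not the code used by tang; the function names are hypothetical, and the four counters are simply joined with full-stops as shown in the forms above.

```python
def section_id(section, subsection, subsubsection, paragraph):
    """Join the four counters into the dotted form used in the comments."""
    return f"{section}.{subsection}.{subsubsection}.{paragraph}"


def starting_comment(sec):
    """Comment placed before the code taken from section sec."""
    return f"#{sec}:#"


def finishing_comment(sec):
    """Comment placed after the code taken from section sec."""
    return f"#:{sec}#"


def finishing_and_starting_comment(finished, started):
    """One comment that closes one piece of code and opens the next."""
    return f"#:{finished} {started}:#"


print(starting_comment(section_id(1, 1, 2, 1)))
print(finishing_and_starting_comment(section_id(1, 1, 2, 1),
                                     section_id(3, 1, 4, 2)))
```

The second call reproduces the concatenated comment quoted above.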

3.2.2 Overview of weav

Yet to be written.

3.3 Algol 68 code

Algol 68 source code can occur in three different places:-

Between bangs in the HTML part
In macro bodies in the definition part
After a module tag in the Algol 68 part

Wherever the Algol 68 code appears, you should remember to double all @ symbols. For example:-

s:=t[pos+1:@@pos+1]

This also applies to string denotations, viz:-

STRING @!title = "Epsilon @@ Lugaru";

Notice that the @ symbol just before title has not been doubled because it is part of a Web 68 command.
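
If you generate Algol 68 text mechanically before pasting it into a Web 68 file, the doubling rule is easy to apply with a small helper. The following Python sketch is an illustration only; escape_at is a hypothetical name and is not part of Web 68.

```python
def escape_at(algol68_text):
    """Double every @ so that Web 68 reads it as a literal @ symbol."""
    return algol68_text.replace("@", "@@")


# Plain Algol 68 code: the trimmer's @ is doubled.
print(escape_at("s:=t[pos+1:@pos+1]"))

# Web 68 commands such as @! are added after escaping, so they stay single.
print("STRING @!title = " + escape_at('"Epsilon @ Lugaru";'))
```

Escaping first and adding Web 68 commands afterwards keeps the two kinds of @ symbol apart.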

3.3.1 Identifiers

Because identifiers in Algol 68 can contain white-space, it is possible for you to key an identifier differently on two separate occasions in the Web 68 file. This is not a problem for tang, because it deletes all white-space from the identifier before storing it. However, this could lead to problems with weav. The policy has been adopted of outputting the identifier in the way it was first keyed. Thus the two identifiers

fl bgn form and flbgn form

will appear in the index as fl bgn form. However, the identifiers will be compared as flbgnform (all white-space deleted).
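
The policy can be sketched in a few lines of Python. This is an illustration of the rule just described, not the data structure used by weav; the names identifier_key and IdentifierIndex are hypothetical.

```python
def identifier_key(identifier):
    """Comparison key: the identifier with all white-space deleted."""
    return "".join(identifier.split())


class IdentifierIndex:
    """Remembers the first spelling keyed for each identifier."""

    def __init__(self):
        self._first_spelling = {}

    def record(self, identifier):
        """Record one occurrence and return the spelling used in the index."""
        key = identifier_key(identifier)
        # setdefault keeps the earliest spelling and ignores later ones.
        return self._first_spelling.setdefault(key, identifier)


index = IdentifierIndex()
print(index.record("fl bgn form"))
print(index.record("flbgn form"))
```

Both calls print fl bgn form, because the two spellings share the key flbgnform.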

3.3.2 In the HTML section

In the HTML part of each section, scraps of Algol 68 code should be enclosed in exclamation marks (bangs). In this manual, I refer to such scraps, excluding the bangs, as snippets. For example, you might refer to a mode as !REF INT!. The part between the bangs will be formatted as Algol 68 code. In this case, the upper case characters will be made lower case and font switching HTML tags issued to ensure that the mode is printed in bold-face. Any Algol 68 code, and any amount of code, can be placed between the bangs, but you should ensure that each such snippet can fit on a single line of text because the normal formatting involving newlines will not be used.

3.3.3 In macro bodies

All the text following the = symbol at the end of a macro header should be Algol 68 code. You can insert comments wherever you want in the macro body. They will be ignored by tang and formatted by weav as HTML code.

3.3.4 In module bodies

Most of the Algol 68 code will be in module bodies. Because Algol 68 is such a flexible language, it is impossible to specify a formatting style which will suit all programmers. Accordingly, a number of Web 68 commands are provided to help weav format the Algol 68 code. You should not use these until you have seen what weav does with your source code without them. Generally speaking, weav will format your Algol 68 code as you specify it.

Comments can be put into Algol 68 code in module bodies, just as in macro bodies. They will not appear in the output from tang.

3.4 Limbo

Any text preceding the first section of a Web 68 document is said to be in limbo. Allowable text consists of

HTML commands or comments.
Algol 68 source code, to be output directly, placed between @= and @>.
The Web 68 command "@\" which instructs tang to output a newline in the output code.

Any HTML commands which would normally appear in the preamble of an HTML input file should be placed here.

As an example, here is the limbo for an Algol 68 script to be elaborated by the Algol 68 interpreter a68g:

@=#!/usr/bin/a68g@>@\

The tang program processes this line by outputting #!/usr/bin/a68g immediately followed by a newline.

The weav program outputs the line as HTML containing formatting commands. However, the code between the Web 68 commands @= and @> is not subjected to the cross-referencing system. Text in limbo should be restricted to data extraneous to the Web 68 system.

3.5 Sections

HTML provides a set of heading commands for dividing a document into a hierarchy: each section at the next lower level is numbered from 1. Thus, the number 2.4.1 refers to the first subsubsection of the fourth subsection of the second section.

The numbering of sections in Web 68 is handled automatically by both tang and weav. All you need to remember is that each section is either a level section or a plain section.

3.5.1 Level Sections

A level section command specifies the section header for HTML as well as the section level. The command should be followed by the section heading followed by a full-stop. The full-stop will be omitted from the heading when it is formatted by weav. The heading will appear both in the table of contents and in each page, so you should be careful about what to put into such a header.

The following table shows the Web 68 code in the first column and the code output by weav in the second column:-

Web 68 code	Output HTML code
@1Heading 1.	<h1>Heading 1</h1>
@2Heading 2.	<h2>Heading 2</h2>
@3Heading 3.	<h3>Heading 3</h3>

The text for the section should follow on the next line, not preceded by any spaces.

For example:-

@1Introduction.
This program ...

Level sections introduce major parts of the program. Level 1 sections are disallowed in included files to ensure the integrity of the indexes output by weav.

When tang and weav meet a level sectioning command, they both output the current section to the console in the form

[section.subsection.subsubsection.paragraph]

unless the current file is an included file, in which case the current section will appear as

[Ifile number.subsection.subsubsection.paragraph]

So the second subsection of the third section will appear on the console as [3.2.1.1].

3.5.2 Plain Sections

A plain section is preceded by the "@ " command (an @-symbol followed by a space). The section is numbered at the paragraph level, but has no header to otherwise identify it so its HTML part can directly follow the section command.

For example:-

@ This procedure ...

Plain sections describe successively the details of the major parts. Neither tang nor weav outputs anything to the console when plain sections are being processed (unless there are errors or warnings, of course).

3.6 The HTML part

This part starts immediately after the section command and contains HTML code intermingled with references to parts of the Algol 68 code (see section 3.3.2). The latter are delimited by the exclamation mark (also referred to as bang). The text will be passed unchanged to the output file by weav (apart from the snippets which will be formatted as Algol 68 source code) and completely ignored by tang. Here is an example from the uregex.w prelude:-

@ The library is initialised by !rx init! which takes as its only
parameter a row of pairs of mode !STRUCT(INT ind,UCS val)!. A null row
specified as !()! may also be given if the default meta-characters are
acceptable. The routine yields !TRUE! if the parameters are
satisfactory.

The HTML part can contain any HTML code including mathematics. This example has been taken from the charbag.w prelude:-

@ (2) b^b_i c^e_s the beginning of b_i lies within c_s. There are two cases:-

(2.1): b^e_i c^e_f The beginning ...

3.7 The definition part

This part contains macro definitions and is introduced by the Web 68 commands @d or @m.

3.7.1 Defining Macros

Web 68 provides two kinds of macros:

@m: Multi-macros which may be called many times.
@d: One-off macros which may only be called once (a declarative macro).

Both macros may have any number of parameters (including none).

The definition of a macro consists of four or five parts:

the Web 68 macro definition command ("@d" or "@m") followed by at least one space
the macro identifier (which looks like an Algol 68 identifier)
an optional formal parameter pack
an equals symbol ("=")
the macro body

The text

@m identifier = Algol 68 text

defines a simple macro, where the identifier will be replaced by the Algol 68 text when tang produces its output.

The text

@m identifier (p_1,...,p_n) = Algol 68 text

defines a parametric macro, where the identifier plus the actual parameters (in a parameter pack) will be replaced by the Algol 68 text and occurrences of the formal parameters p_i in that Algol 68 text will be replaced by the corresponding actual parameters.

The formal parameter pack looks just like the parameter pack of an Algol 68 routine. The formal parameters themselves can take the form of Algol 68 identifiers or mode or operator indicants and should be separated by commas as in an Algol 68 routine call. The macro body can contain any Algol 68 code, including calls of previously defined macros, but parentheses should be balanced. Note that "previously" in this case refers to previously in the Web 68 file. Any occurrence of the formal parameters in the text, as a lexical unit, will be replaced by the corresponding actual parameter when the macro is expanded. These two examples show a simple macro and a parametrised macro, both taken from tang:-

@m help status = 1
@m err print(msg)=(print nl; print out(msg); error(errors))

and here is a declarative macro taken from the forms.w prelude:-

@d macro x raise window =
PROC(DISPLAY,WINDOW)INT x raise window =
   ALIEN "XRAISEWINDOW"
   "#define XRAISEWINDOW(dpy,win) \"
   " XRaiseWindow((void *)dpy,win)";

Because declarative macros can only be called once, they usually do not have parameters.

It should be noted that macros have to be recognised during tang's first phase, so every macro must be defined before it is used. You should not call a macro recursively. Macros are expanded by tang in its second phase.

Macros are only operational for tang. weav simply regards them as Algol 68 code. However, the macro identifier will appear in the cross-reference index.

3.7.2 Using Macros

Macros are used by calling them, just like Algol 68 procedures. The only restriction on the actual parameters of a macro is that any parentheses should be balanced. Furthermore, if you want to include a comma in an actual parameter, it should be surrounded by parentheses because the actual parameters, if there is more than one, are themselves separated by commas. The actual parameters can include or consist of calls to other macros. Any macros in the body of the macro must have been declared before calling them. This is because weav actions the macros in its first pass. tang, on the other hand, reads all the macros in the input files into memory before actioning them.
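
The rule about commas and balanced parentheses can be illustrated with a short Python sketch. It is not taken from tang: split_actual_parameters is a hypothetical name, and the sketch is simplified in that it does not treat string denotations or square brackets specially.

```python
def split_actual_parameters(pack):
    """Split a parameter pack such as "(a,(b,c))" into its actual parameters.

    Only commas outside any inner parentheses separate parameters.
    """
    if not (pack.startswith("(") and pack.endswith(")")):
        raise ValueError("parameter pack must be enclosed in parentheses")
    params, current, depth = [], [], 0
    for ch in pack[1:-1]:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced parentheses")
        if ch == "," and depth == 0:
            # A top-level comma ends the current actual parameter.
            params.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    if depth != 0:
        raise ValueError("unbalanced parentheses")
    params.append("".join(current).strip())
    return params


print(split_actual_parameters("(msg,(a,b),f(x,y))"))
```

The comma inside (a,b) and the one inside f(x,y) are protected by their parentheses, so the pack yields three actual parameters.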

If a declarative macro is called more than once, the second and subsequent calls are replaced by SKIP instead of the macro body.

Because each token is considered in turn, the following definitions and call will produce an error message:-

@m arg = (p)
@m identity(p1) = p1
@a identity ## arg

The ## between identity and arg serves to separate the two words into two lexical entities because otherwise they would be regarded as part of a single identifier. The error is caused by identity being expanded first: identity requires a parameter pack, but the next token is not "(", but arg which has not yet been expanded. Only when the macro call has been expanded will arg be expanded. This will produce the error message

! Error: call of identity not followed by "(".

This brings us to a useful wrinkle associated with macro definitions and calls.

In the next macro, the body of the macro contains two identifiers which are split into two lexical tokens by ## enabling the actual parameters, provided that they are just identifiers, to be used to construct two identifiers.

@m mcheck col(col1,col2) =@/
  IF NOT q##col1 OR NOT q##col2
  THEN@/
    []CHAR s="fl set object color";
    put(outf,
        ("  ",s,"(",obj ident,",",
         CA mov(gen##col1),",",CA mov(gen##col2),");",
         newline));
    add mac(s)
  FI
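
The effect of ## in a macro body can be sketched in Python as follows. This is an illustration only and makes two assumptions that are not stated above: that the ## separator is dropped from the output once the formal parameters have been replaced, so that the neighbouring tokens join into one identifier, and that identifiers contain no internal white-space. The name expand_body is hypothetical.

```python
import re


def expand_body(body, formals, actuals):
    """Replace formal parameters that occur as whole words, then drop ##."""
    if len(formals) != len(actuals):
        raise ValueError("wrong number of actual parameters")
    mapping = dict(zip(formals, actuals))

    def substitute(match):
        word = match.group(0)
        # Words that are not formal parameters are left unchanged.
        return mapping.get(word, word)

    replaced = re.sub(r"[A-Za-z][A-Za-z0-9]*", substitute, body)
    # Assumption: ## only separates tokens and does not reach the output.
    return replaced.replace("##", "")


print(expand_body("IF NOT q##col1 OR NOT q##col2", ["col1", "col2"], ["fg", "bg"]))
print(expand_body("qcol1", ["col1"], ["fg"]))
```

Without the ##, qcol1 is a single word, so the formal parameter col1 inside it is not recognised and the text is left unchanged.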

3.8 The Algol 68 part

The Algol 68 program is written as modules. The outermost module is the untagged module and is introduced by the Web 68 command @a (a for Algol 68).

Tagged modules are preceded by a module tag, consisting of HTML code which appears between @< and @>.

Module tags should be a good description of the contents of the module. If you are tempted to write a comment inside Algol 68 which will explain what the following lines of code do, consider writing the comment as a module tag and putting the lines of code in their own section.

Multiple spaces and tabs in module tags are reduced to a single space or tab and preceding and following spaces are removed. So the module tag

@<Print input buffer error location@>

can be matched by

@< Print input buffer  error location @>

However, inputting such long tags is error prone, so Web 68 allows you to abbreviate them whereby you provide a unique prefix to identify the tag. Thus, the above tag could be abbreviated to

@<Print input buffer error...@>

provided that no other module starts with those characters.
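
The matching rules for module tags can be sketched in Python. This is an illustration, not the algorithm used by tang or weav; normalise_tag and resolve_tag are hypothetical names, and the sketch assumes that a run of spaces and tabs is always reduced to a single space.

```python
import re


def normalise_tag(tag):
    """Collapse runs of spaces and tabs and remove them from both ends."""
    return re.sub(r"[ \t]+", " ", tag).strip()


def resolve_tag(tag, known_tags):
    """Return the full tag for a possibly abbreviated tag.

    known_tags holds the normalised tags met so far in the Web 68 file.
    """
    tag = normalise_tag(tag)
    if not tag.endswith("..."):
        return tag
    prefix = tag[:-3]
    matches = [known for known in known_tags if known.startswith(prefix)]
    if len(matches) != 1:
        raise ValueError(f"abbreviation {tag!r} matches {len(matches)} module tags")
    return matches[0]


known = ["Print input buffer error location", "Check if included file has ended"]
print(normalise_tag(" Print input buffer  error location "))
print(resolve_tag("Print input buffer...", known))
```

An abbreviation that matches no tag, or more than one, is rejected, which is why the prefix has to be unique.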

When you want to add actual Algol 68 source code to a module, you should append an "=" symbol to the command @>, viz:-

@<Compiler pre...@>=
PROGRAM tang CONTEXT VOID
USE @<Library preludes@> standard

You can add Algol 68 source code to any module as many times as you want.

You can place comments in the Algol 68 source code in a module. tang will ignore them and weav will format them, the text between the # or CO or COMMENT pairs being regarded as HTML code. Thus you can use text between bangs just as in the HTML part.

Modules are particularly useful when you have a long procedure: you can use a module tag instead of comments. For example, here is a procedure from tang which has the same piece of code occurring in two places. The actual code has already been specified, so both references are abbreviated. Notice how the module tags materially add to your understanding of the procedure:-

@<Input...@>=
PROC next char = CHAR:
IF input ended
THEN @<Check if included...@>; blank
ELIF loc OF web >= UPB b OF web
THEN
  WHILE input ln(web) & UPB b OF web = 0 DO SKIP OD;
  IF input ended
  THEN @<Check if included...@>
  FI;
  blank
ELSE (b OF web)[loc OF web+:=1]
FI;

In a sense, modules are like Algol 68 procedures with mode VOID, except that they do not have to be units.
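
The way code accumulates in modules and is later substituted for module tags can be sketched in Python. This is a simplified illustration, not tang's implementation: it ignores abbreviated tags, macros and the section comments, it joins the pieces of a module with newlines, and the names Modules, TAG and UNTAGGED are hypothetical.

```python
import re

# Matches a module tag such as @<Say hello@>; abbreviations are not handled.
TAG = re.compile(r"@<(.*?)@>")
UNTAGGED = ""  # hypothetical key standing for the untagged module (@a)


class Modules:
    """Collects the pieces of each module and expands module tags."""

    def __init__(self):
        self.pieces = {}

    def add(self, tag, code):
        """Tack code onto the end of the module, creating it if necessary."""
        self.pieces.setdefault(tag, []).append(code)

    def expand(self, tag=UNTAGGED, active=()):
        """Return the module text with every module tag replaced by its text."""
        if tag in active:
            raise ValueError(f"module {tag!r} refers to itself")
        if tag not in self.pieces:
            raise KeyError(f"module {tag!r} has no code")
        body = "\n".join(self.pieces[tag])
        return TAG.sub(lambda m: self.expand(m.group(1), active + (tag,)), body)


modules = Modules()
modules.add(UNTAGGED, "BEGIN @<Say hello@> END")
modules.add("Say hello", 'print("hello");')
modules.add("Say hello", "print(newline)")
print(modules.expand())
```

The two pieces added to the Say hello module are concatenated in the order in which they were supplied, and the result replaces the tag in the untagged module.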

3.9 Web 68 Commands

Every Web 68 command consists of the command introducer @ followed by a defining character. Some commands are followed by other text which is delimited by the concluding command @>. See section 3.4 (Limbo) for an example of the @= command. Such text is called "control text" and, like a string denotation, must end on the same line of the Web 68 file as it began. Furthermore, no Web 68 commands are allowed in a control text, not even @@. (Remember that Algol 68 allows AT instead of @.)

The letters following each code indicate in which sections that code can be found:-

L: In limbo.
T: In the HTML part of a section.
M: In the definition part of a section.
A: In the Algol 68 part of a section.
C: In a comment.
S: In a string.

A tilde preceding one of these letters means that the control code ends the present part of the Web 68 file; for example, ~A means that this control code ends the Algol 68 part of a section.

@@ [A,C,L,M,S,T]: A double @ symbol denotes a single @ symbol. This is the only Web 68 command which is allowed in limbo, in comments and in strings.
@ [~L,~T,~A]: This denotes the start of a plain section. A tab character or the end of the line is equivalent to a space when it follows an @ symbol.
@1,@2,@3 [~L,~A,~T]: This denotes the start of a level section, that is, a section which begins a new major part of the Web 68 file. The title of the new part should appear after the @n (n being from 1 to 3) followed by a full-stop and a newline. HTML control sequences should be avoided in titles unless they are quite simple, such as font switching commands. When tang and weav meet a level section, they print the number of that section on the console. The very first section should be an @1 section unless it is in an included file.
@a [~M,~T]: This denotes the start of a part of the untagged module. weav will format the Algol 68 code without a module tag at its start. tang will concatenate all the source code preceded by @a.
@d [~M,~T]: This denotes the start of a declarative macro and, therefore, the end of a previous macro.
@h [A]: The control text that follows, up to the next @>, will be output by weav, but is ignored by tang.
@i [~L,~A,~M,~T]: The control text up to the next @> will be regarded by both weav and tang as the filename of a file to be included at that point. The file will be looked for in directories previously specified to the program (see sections 3.2.1 and 3.2.2). Such a file is presumed to be in Web 68 format and should contain sections as described above. It will be read completely before both programs return to the current point in the text. Whether weav includes the include file in the current output is determined by an option when calling the program. The @i command may also occur in an included file. However, circular calls will not be honoured because both tang and weav keep a check on which files have been included.
@m [~M,~T]: This denotes the start of a multi-call macro and, therefore, the end of a previous macro.
@< [A,~T]: A module tag begins with this command followed by HTML text followed by the concluding command @>; the HTML text should not contain any Web 68 commands other than @@, unless these commands appear in snippets (see section 3.3.2). The module tag may be abbreviated after its first appearance in the Web 68 file, by giving any unique prefix followed by ... where the three dots immediately precede the command @>. No module tag should be the prefix of another. Module tags may not appear in snippets, nor may they appear in the definition part of a section (because the appearance of a module tag signals the end of the definition part and the beginning of the Algol 68 part).
@^ [A,T]: The "control text" that follows, up to the next @>, will be inserted into the index together with the identifiers and indicants of the Algol 68 program. The text will appear in normal type. For example, to put "system dependencies" into the index, you can key @^system dependencies@> in each section that you want to index as system dependent.
@. [A,T]: The "control text" that follows, will be inserted into the index in typewriter type; see the rules for @^ which is analogous.
@! [A,T]: The section number in an index entry will be in bold if this command immediately precedes the identifier or control text being indexed. This convention is used to distinguish the sections where an identifier is defined, or where it is explained in some special way, from the sections where it is used. A reserved bold tag or an identifier of length one will not be indexed except for bold face entries. An implicit @! is inserted by weav after @d and @m; but you should insert your own @! before the declarations of modes, denotations, names, parameters and field selectors of STRUCTs and operators that are not covered by this implicit convention, if you want to improve the quality of the index that you get.
@= [A]: The "control text" that follows, up to the next @>, will be passed verbatim to the Algol 68 program.
@\ [A]: This command forces tang to break the line in the output Algol 68 source program here.
@/ [A]: This command forces weav to break the line at this point in the output HTML source file. Line breaks are chosen automatically by HTML depending on the size of a web browser's window, but sometimes you want to force a line break so that the program is formatted according to logical rather than visual criteria. You should only use this command after an Algol 68 unit. It is ignored by tang.
@, [A]: This command inserts a non-breakable space into weav's output; it is ignored by tang.

The last two commands have no effect on the Algol 68 program output by tang; they merely help to improve the readability of the HTML-formatted Algol 68 that is output by weav. Although Web 68 allows you to override the automatic formatting provided by weav, your best strategy is not to worry about such things until you have seen what weav produces automatically, since you will probably need to make only a few corrections when you are fine-tuning your documentation.

Because of the rules by which every section is broken into three parts, the commands @a, @d and @m are not allowed to occur once the Algol 68 part of a section has begun.

3.10 Notes

This section provides some additional features and warnings.

Lines containing HTML comments.
The HTML output by weav is broken into lines containing not more than 80 characters each. Any HTML comment is put on a new line, but its embedded newlines will be observed.
The weav output contains embedded codes that cause HTML to indent and break lines as necessary, depending on the fonts used and the page width of the reader's browser. For best results it is wise to adhere to the following restrictions:-
1. Comments in Algol 68 text should appear only after units; that is, after semicolons, after bold tags like THEN or DO, or before bold tags like END or OD.
2. Don't enclose long Algol 68 texts in snippets, since the indentation and line breaking codes are omitted when the snippet is translated from Algol 68 to HTML. Stick to simple expressions or units.
Comments and module tags are not permitted in snippets. After a ! signals the change from HTML text to Algol 68 text, the next ! that is not part of a string or control text ends the snippet.
Because an Algol 68 comment must begin and end with the same symbol, it is quite possible to nest a comment delimited by # within a comment delimited by CO, or vice versa. At present, tang and weav handle comments in different ways and it is necessary to satisfy both conventions: tang ignores ! characters entirely, while weav uses them to switch between HTML text and Algol 68 text.
Algol 68 bold tags must appear entirely in uppercase letters in the Web 68 file; otherwise their special nature will not be recognised by weav. You could, for example, have an end identifier or macro and it will not be confused with Algol 68's END.
Sometimes it is desirable to insert spacing into Algol 68 code that is more general than the non-breaking space provided by @,. The @h command can be used for this purpose; for example, @h @> will leave two non-breaking spaces.
weav and tang are both designed to work with a single input file, called a Web 68 file. Any changes required in the Web 68 file should be made with a version control system. There are no "change files" corresponding to the original Web system created by Donald Knuth.

3.11 Errors

All the error messages emitted by the two programs can be found in the indexes of tags and indicants of their respective programs.
