(Man Kam Kwong, kwong@mcs.anl.gov)

For an application see htxp.

1. Introduction

Word processors record the formatting instructions of a document as markups, which are stored along with the document text in a marked-up file. Most WYSIWYG word processors (a notable exception is Ami Pro) use extended-ascii characters as markups, which are not meant to be edited by the user. Non-WYSIWYG text formatting languages, such as TeX and HTML, use special ascii character sequences as markups. A user prepares a file with an ordinary ascii editor, and a program must then process the file to produce the final manuscript. The reasons for choosing a non-WYSIWYG approach vary from case to case. TeX, for instance, uses it to achieve machine independence and to give the user precise and complete control over the formatting. HTML, on the other hand, gives the user control only over the logical attributes of portions of the document and relies on the browser to interpret those attributes and determine the final appearance of the manuscript. We shall be dealing exclusively with text-based marked-up documents.

Typically, markup commands are embedded in the regular document text and are distinguished from it by the convention that they start with one of several designated characters. They are instructions telling the formatter what to do with the text. The text itself is free-flowing: most implicit and conventional formatting, such as line breaks, blank lines, and extra spaces, is ignored. Because of this, explicit instructions are needed to specify much of the formatting detail (in HTML, for example, every new paragraph must be marked with <p>, since a blank line alone does not start one), and the result is a repetitive use of markups.

In this document we propose a generic, all-purpose preprocessor, "magxp", that lets a user save time by preparing a raw file with abbreviations and command shorthands; the raw file is then expanded into a regular marked-up document of the specified type. All abbreviations and almost all "magxp" commands are user-modifiable. We will call them collectively "magxp" macros, although they are not full-fledged programmable macros like those in TeX. A special subset of the command shorthands is designated as built-in; these should be designed specifically to suit the particular application in mind while being general enough for all users.
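To make the idea of expansion concrete, here is a minimal C sketch of the abbreviation half of such a preprocessor. It assumes that an abbreviation is a whitespace-delimited word and that the table is compiled in rather than read from an initialization file; the names "expand_line" and "lookup" and the two sample entries are hypothetical and are not taken from the magxp prototype.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical abbreviation table; in magxp the entries would come
   from an initialization file rather than being compiled in. */
struct abbrev {
    const char *key;
    const char *expansion;
};

static const struct abbrev table[] = {
    { "anl",  "Argonne National Laboratory" },
    { "mcsd", "Mathematics and Computer Science Division" },
};

/* Return the expansion of the len-character word at w, or NULL. */
static const char *lookup(const char *w, size_t len) {
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++)
        if (strlen(table[i].key) == len && strncmp(table[i].key, w, len) == 0)
            return table[i].expansion;
    return NULL;
}

/* Copy src to dst, replacing each whitespace-delimited word that is in
   the table by its expansion; all other text, including whitespace, is
   copied unchanged. Returns 0 on success, -1 if dst (cap bytes) is
   too small. */
int expand_line(const char *src, char *dst, size_t cap) {
    assert(src != NULL && dst != NULL);
    size_t out = 0;
    while (*src) {
        const char *start = src;
        const char *text = start;
        size_t len = 1;
        if (isspace((unsigned char)*src)) {
            src++;                        /* copy whitespace as is */
        } else {
            while (*src && !isspace((unsigned char)*src))
                src++;                    /* find the end of the word */
            len = (size_t)(src - start);
            const char *e = lookup(start, len);
            if (e != NULL) {              /* defined: emit the expansion */
                text = e;
                len = strlen(e);
            }
        }
        if (out + len >= cap)
            return -1;
        memcpy(dst + out, text, len);
        out += len;
    }
    if (out >= cap)
        return -1;
    dst[out] = '\0';
    return 0;
}
```

In this sketch, punctuation attached to a word prevents a match (so "anl." is left alone); a real expander would need an explicit rule for word boundaries.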

The generality of "magxp" means that a user can use the same program to preprocess different kinds of documents including, in particular, plain ascii text files (such as emails, notes, and program code). Different document types are processed by pointing to different initialization files. We refer to the processing of one particular document type by "magxp" as an application. One can develop a core collection of abbreviations to be used for all applications, and augment it with a more specialized set for each document type.
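One way to combine a core collection with an application-specific set is a two-level lookup. The sketch below assumes that the specialized set is consulted first, so that it may override core entries; the tables, their entries, and the function "lookup_layered" are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct abbrev {
    const char *key;
    const char *expansion;
};

/* Hypothetical core set shared by every application. */
static const struct abbrev core[] = {
    { "thx", "thanks" },
    { "pls", "please" },
};

/* Hypothetical HTML-specific set; here it overrides "pls". */
static const struct abbrev html[] = {
    { "pls", "<em>please</em>" },
    { "br",  "<br>" },
};

static const char *find(const struct abbrev *set, size_t n, const char *key) {
    for (size_t i = 0; i < n; i++)
        if (strcmp(set[i].key, key) == 0)
            return set[i].expansion;
    return NULL;
}

/* Look in the application set first, then fall back to the core set.
   Returns NULL if the key is defined in neither. */
const char *lookup_layered(const char *key) {
    assert(key != NULL);
    const char *e = find(html, sizeof html / sizeof html[0], key);
    if (e == NULL)
        e = find(core, sizeof core / sizeof core[0], key);
    return e;
}
```

Switching to another document type would then mean swapping only the specialized table, while the core table stays the same.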

I have a prototype C version of "magxp" that implements most of the features discussed in this article, but what I am proposing here is more a framework targeting an ideal program.

In the following, I outline the various ingredients of the preprocessor, and the rationale for the specifications.

2. Objectives

Being cryptic and other issues

3. Initialization Files

"magxp" commands can be viewed as markups of yet another markup language. There is, however, one difference between "magxp" markups and conventional markups. Even though "magxp" macros always start with some special characters, it is not true that any string that starts with those characters must be a macro. If a string of characters has not been defined as a "magxp" macro, either by default as a built-in command or by the user, the string is treated as ordinary text.
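The rule that an undefined string is ordinary text, even when it begins with a special character, can be sketched as a lookup with a literal fallback. The sketch assumes "." and "?" as the designated characters (suggested by the dot commands and ? abbreviations of Section 5); the definitions ".p" and ".br" and the function "resolve" are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical set of designated lead-in characters. */
static const char LEAD_CHARS[] = ".?";

struct macro {
    const char *name;
    const char *expansion;
};

/* Hypothetical definitions, for illustration only. */
static const struct macro macros[] = {
    { ".p",  "<p>" },
    { ".br", "<br>" },
};

/* Return what should be emitted for token: its expansion if it is a
   defined macro, otherwise the token itself, unchanged. */
const char *resolve(const char *token) {
    assert(token != NULL);
    /* Check for the empty string first: strchr would match the
       terminating NUL of LEAD_CHARS. */
    if (token[0] == '\0' || strchr(LEAD_CHARS, token[0]) == NULL)
        return token;            /* no lead-in character: plain text */
    for (size_t i = 0; i < sizeof macros / sizeof macros[0]; i++)
        if (strcmp(macros[i].name, token) == 0)
            return macros[i].expansion;
    return token;                /* looks like a macro but is undefined */
}
```

With this rule a decimal such as ".5" or a literal "??" in the text passes through untouched, so the user never has to escape a special character that is not followed by a defined name.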

Different document types are handled by pointing to different initialization files. We use the HTML document type as a concrete example of one such application.
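If each document type keeps its initialization files under a common name with different suffixes, as the sample HTML files "html.dcm", "html.ab1", and "html.rc" in Section 7 suggest, then selecting an application reduces to building file names from the type. Treating those suffixes as a general convention is an assumption, and the function "init_file_path" and the "txt" type used below are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Suffixes of the three sample HTML initialization files; using them
   as a naming pattern for every document type is an assumption. */
static const char *const INIT_SUFFIXES[] = { ".dcm", ".ab1", ".rc" };
#define N_INIT (sizeof INIT_SUFFIXES / sizeof INIT_SUFFIXES[0])

/* Write the path of the i-th initialization file for document type
   `type` under directory `dir` into buf (cap bytes). Returns 0 on
   success, -1 for a bad index or if the path does not fit. */
int init_file_path(const char *dir, const char *type, size_t i,
                   char *buf, size_t cap) {
    assert(dir != NULL && type != NULL && buf != NULL);
    if (i >= N_INIT)
        return -1;
    int n = snprintf(buf, cap, "%s/%s%s", dir, type, INIT_SUFFIXES[i]);
    return (n < 0 || (size_t)n >= cap) ? -1 : 0;
}
```

Under this convention, adding a new application would only require a new set of files in the directory, with no change to the program.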

4. Abbreviations

Some of the abbreviation features listed below cannot be achieved with the abbreviation facilities of "vi" or "emacs".

Abbreviation suppression

5. Built-in dot commands and ? abbreviations

Additional built-in 1-character abbreviations provide shorthands for commonly used markup attributes; for example, "?l" expands to "align="left"" and "?a" to "alt=""". These are also defined in "html.dcm".
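A sketch of ? abbreviation expansion using the two mappings quoted above. The table is compiled in because the format of "html.dcm" is not described here; the sketch also assumes that a ? abbreviation is recognized wherever it occurs and that an undefined "?c" is copied unchanged. The function name "expand_q" is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The two mappings quoted in the text; a real table would be read
   from "html.dcm". */
static const struct { char key; const char *expansion; } qabbrev[] = {
    { 'l', "align=\"left\"" },
    { 'a', "alt=\"\"" },
};

static const char *qlookup(char c) {
    for (size_t i = 0; i < sizeof qabbrev / sizeof qabbrev[0]; i++)
        if (qabbrev[i].key == c)
            return qabbrev[i].expansion;
    return NULL;
}

/* Expand every "?c" whose c is defined; anything else is copied as is.
   Returns 0 on success, -1 if dst (cap bytes) is too small. */
int expand_q(const char *src, char *dst, size_t cap) {
    assert(src != NULL && dst != NULL);
    size_t out = 0;
    while (*src) {
        const char *text = src;
        size_t len = 1;
        /* src[1] is at worst the terminating NUL, which has no entry. */
        const char *e = (src[0] == '?') ? qlookup(src[1]) : NULL;
        if (e != NULL) {
            text = e;
            len = strlen(e);
            src += 2;             /* consume "?" and the key letter */
        } else {
            src += 1;
        }
        if (out + len >= cap)
            return -1;
        memcpy(dst + out, text, len);
        out += len;
    }
    if (out >= cap)
        return -1;
    dst[out] = '\0';
    return 0;
}
```

For example, the raw text <img src="x.gif" ?a> becomes <img src="x.gif" alt="">, while a question mark at the end of a sentence is left alone.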

6. User commands

7. Implementation

The C code of my implementation is contained in the same source used to produce the command "magicxp", which is used to construct "htxp". You also need the header file "magicxp.h". If the macro MAGXP is defined at the beginning of the code, then "magxp" is produced instead of "magicxp". The easiest way is to create a file "magxp.c" that contains only the following two lines:

Then compile with:

You need some initialization files before the program can work. Retrieve the following sample initialization files: "html.dcm", "html.ab1", and "html.rc". Put these files in a directory called DIR, and make the alias:

Expanding a raw file "file" is done with the command:

To expand only the abbreviations in "file", use the command:
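The single-source arrangement relies on ordinary C conditional compilation. The wrapper "magxp.c" described above presumably contains just "#define MAGXP" followed by "#include "magicxp.c""; that is an assumption, since the listing is not shown. The self-contained sketch below illustrates only the mechanism, with a hypothetical function "program_name" whose result depends on whether MAGXP is defined.

```c
#include <assert.h>
#include <string.h>

/* One source, two programs. A wrapper file would presumably read
       #define MAGXP
       #include "magicxp.c"
   so that the branch below selects the generic preprocessor.
   The names here are illustrative only. */
#ifdef MAGXP
#define PROGRAM_NAME "magxp"
#else
#define PROGRAM_NAME "magicxp"
#endif

/* Report which program this translation unit was built as. */
const char *program_name(void) {
    const char *name = PROGRAM_NAME;
    assert(strlen(name) > 0);
    return name;
}
```

Compiled on its own, the sketch reports "magicxp"; compiled with -DMAGXP on the command line, or through a wrapper that defines the macro before including the source, it reports "magxp".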

Suggestions for improvements and enhancements are always welcome.



Mathematics and Computer Science Division
Argonne National Laboratory
9700 S Cass Ave
Argonne, IL 60439