Scott S. Lawton...Automated Web Publishing...

No-Tags Markup

February 2005: This document was written in 1995 and modified only slightly since then. Meanwhile, a few people have adopted No-Tags Markup, and several similar systems have been developed. (I've added some links below.) Many are used to enter relatively small amounts of text into a blog editor, wiki, or content management system, which converts the text into HTML or XML (and perhaps from there to PDF, RTF, etc.). The opposite path is also useful: create an enhanced plain-text version from an original stored as XML or HTML.

Similar systems that appear to have an active base of users:

Plus a few that may not have caught on much beyond the original developer:


Creating formatted text without a markup language

Although the desktop publishing revolution began 10 years ago, we still have not left plain text behind. It has a few compelling advantages that are not likely to go away soon.

And yet, formatted text and integrated graphics generally make a document much easier to read. Lists can be set off, as above. Words can be emphasized, headlines made to stand out, and literal text marked appropriately. Plain text also cannot adequately capture the structure of a document, e.g. the section break that appears a few paragraphs below.

The rapid growth of the World Wide Web brings the problem into sharp focus. A Web browser shows nicely formatted text & graphics, but the underlying document is a text file that contains special HTML markup commands. (Graphics are stored as separate files, and placed by the browser.) The Web is a great vehicle for delivering new information rapidly, and providing access to broad & deep historical data.

How will this information, old and new, flow onto the Web? What new tools have to be purchased and learned? How can that information flow automatically from or to other places?

One answer is text. Writers can write, using any software on any computer. Text databases can be a source or result. E-mail can be captured, or text compilations can be e-mailed. But plain text is not a good answer; it is simply inadequate. Having writers learn HTML markup is also not a good answer. We need something new.




Note: Ian Feldman's setext addresses some of these issues. However, as discussed below, I think it is not an adequate solution.




No-Tags Markup

The No-Tags Markup (tm) system provides a simple way to "mark up" text documents to indicate format and structure, without using formal "tags" from a traditional markup language such as HTML. Instead, a few unobtrusive characters are used to indicate format, e.g. _ (underscore) for italics and * (asterisk) for bold. Document structure is conveyed with a similar approach, e.g. a - (dash) for list items, or a line containing only "---" for a section break. These elements do not intrude on the text. In fact, the basic ones described so far are quite common in e-mail and other online communication.
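As an illustration only, the basic indicators described above could be translated to HTML roughly as follows. This is a minimal Python sketch, not the actual No-Tags Markup parser: the function names, the regular expressions, the rule that a marker must touch the text it wraps, and the choice of <b>, <i>, <li> and <hr> as output are all assumptions made for the example.

```python
import html
import re

# Assumed rule (not from the draft spec): a marker must touch the text it
# wraps, so "2 * 3 * 4" is left alone while "*bold*" is converted.
BOLD = re.compile(r"\*(\S(?:.*?\S)?)\*")
ITALIC = re.compile(r"_(\S(?:.*?\S)?)_")


def inline_to_html(text):
    """Convert *bold* and _italic_ spans in one piece of text to HTML."""
    escaped = html.escape(text, quote=False)  # protect &, < and > first
    escaped = BOLD.sub(r"<b>\1</b>", escaped)
    return ITALIC.sub(r"<i>\1</i>", escaped)


def line_to_html(line):
    """Handle the two structural indicators mentioned in the text."""
    if line.rstrip() == "---":        # a line containing only "---"
        return "<hr>"
    if line.startswith("- "):         # a dash introduces a list item
        return "<li>" + inline_to_html(line[2:]) + "</li>"
    return inline_to_html(line)


if __name__ == "__main__":
    for sample in ["This is *very* _important_ text",
                   "- a *bold* item",
                   "---",
                   "2 * 3 * 4"]:
        print(line_to_html(sample))
```

A real converter would also have to wrap consecutive list items in a list element and decide how markers nest; the sketch leaves both out.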

The second aspect of No-Tags Markup is a simple way to handle line breaks. There are two competing goals: to let the reader's software wrap paragraphs to any desired width, and to enable the writer to indicate text that should not be wrapped (and also not formatted as a standard paragraph). As with traditional markup, No-Tags Markup uses two "new line" characters (carriage return, line feed or both depending on computer) to indicate a paragraph break. Unlike traditional markup, it treats one or more spaces or tabs at the beginning of a line as an indicator that the line should not be wrapped. (The "For more information" section at the end of this document is a real-world example.)
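The line-break rule can be sketched in a few lines of Python. This is a hedged illustration under stated assumptions, not the existing parser: the names split_blocks and render are hypothetical, and the sketch assumes that consecutive indented lines form one "no wrap" block and that a line holding only spaces or tabs counts as blank.

```python
def split_blocks(text):
    """Split text into ("paragraph", lines) and ("nowrap", lines) blocks."""
    # Accept carriage return, line feed, or both as the "new line".
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    blocks = []
    kind, lines = None, []

    def flush():
        nonlocal kind, lines
        if lines:
            blocks.append((kind, lines))
        kind, lines = None, []

    for line in text.split("\n"):
        if not line.strip():              # blank line: paragraph break
            flush()
            continue
        # Leading space or tab means "do not wrap this line".
        line_kind = "nowrap" if line[0] in " \t" else "paragraph"
        if line_kind != kind:
            flush()
            kind = line_kind
        lines.append(line)
    flush()
    return blocks


def render(blocks):
    """Join paragraph lines for rewrapping; keep no-wrap lines as typed."""
    out = []
    for kind, lines in blocks:
        if kind == "paragraph":
            out.append(" ".join(part.strip() for part in lines))
        else:
            out.append("\n".join(lines))
    return out


if __name__ == "__main__":
    sample = ("First line\r\nsecond line\r\n\r\n"
              "For more information:\r\n  Internet: someone@example.com\r\n")
    for block in render(split_blocks(sample)):
        print(repr(block))
```

The reader's software is then free to rewrap each joined paragraph to any width, while the indented lines keep the breaks the writer typed.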

Finally, there are a few important types of format and structure that have no obvious or commonly-accepted indicators, including different levels of headers, blocked quotations (which should wrap, but appropriately) and blocks of text in a fixed-width typeface (e.g. for code or simple tables). _Because there are no existing ad hoc standards, this area of the draft specification is the most subject to change. Let me know what you think!_

The current No-Tags Markup specification is a draft, submitted to the online community for feedback. This entire, unmodified document may be freely distributed.

A parser for No-Tags Markup exists today. The "source" for many documents on this web site (including this document) is a "No-Tags" text file.




Draft Specification

July 10, 1995

One brief aside: I dislike the extra newline that Netscape puts before a list, but I have not discovered a workaround.

Design goals

Character format

Lists

"No wrap" text

Special blocks

Structure

Hypertext Links

Images, Files

Notes on translation to HTML




Planned Features

Optional "setext" extensions under consideration

Features under consideration




Contrast to "setext"

No-Tags Markup shares with Ian Feldman's setext (structure-enhanced text) the important goal of marked-up text that is easy to read, indeed where the markup either enhances the plain text itself or is almost invisible. However, I see two main problems with setext:
  1. the tags are not as straightforward as they could be
  2. many simple and useful elements (such as this numbered list!) are not included

Although I can see some reasons for setext's choice of tags, I disagree with the tradeoffs. Why invent ** for bold? A single * is simpler, more consistent, and more commonly used ad hoc in e-mail and other online communication. The current No-Tags Markup parser can differentiate the most common uses of an asterisk as a bullet vs. as bold, including bold at the beginning of the line. No absolute tradeoff is required. Limiting italic to single words makes an editorial decision that should be left to authors. Using "===" for title and "---" for subhead may be fine for newsletters, but it ignores their ad hoc use as section breaks before or independent of any heading. The requirement that the heading and "tag" have the same number of characters also increases the possibility of mistakes in creating a setext document (and incidentally complicates the parser). The extra potential for error may be acceptable for the newsletter editors that were (I believe) setext's original target, but it is not satisfactory for more ad hoc authorship. The tag for hypertext link is simply strange: it has no intrinsic meaning and is difficult to distinguish from underlined text. It also fails to specify the link destination (or did I miss that?).
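One plausible way to tell an asterisk bullet from bold at the start of a line is to look at the character that follows the asterisk. The Python sketch below is only a guess at such a heuristic; it is not taken from the No-Tags Markup parser, and the name classify_asterisk_line and the "asterisk followed by whitespace means bullet" rule are assumptions.

```python
import re

# Assumed heuristic: "* item" (asterisk, then whitespace) is a bullet;
# "*Word* ..." (asterisk touching the text) is bold at the start of a line.
BULLET = re.compile(r"^\*[ \t]+(\S.*)$")


def classify_asterisk_line(line):
    """Return ("bullet", item_text) or ("text", line)."""
    match = BULLET.match(line)
    if match:
        return ("bullet", match.group(1))
    return ("text", line)


if __name__ == "__main__":
    print(classify_asterisk_line("* first item"))
    print(classify_asterisk_line("*Warning* read this first"))
```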

No-Tags Markup adds support for many common document elements that are missing from setext, including mono-spaced text (in context), plain and numbered lists, block quotes (that are not mono-spaced), and a block definition for mono-spaced lines. It also follows the typical ad hoc rule that indented text should not be wrapped.

The setext format is used by the popular TidBITS online newsletter, EFF's newsletter, MacWeek articles on ZiffNet, and probably elsewhere. For details on setext, check out:




Other text formats

I've found a few other interesting approaches to this problem:

[February 2005: newer solutions are listed at the top.]




Challenges

The underscore (_) and tilde (~) characters often appear in FTP "addresses" (URLs). When they appear in pairs within the same paragraph, either the parser must be sophisticated enough to understand the URL context, or the human author/editor must "escape" them (by doubling). I don't like either solution, but am also not ready to toss these characters.
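The "escape by doubling" option can be sketched as follows. This is an illustrative Python example under stated assumptions, not the behavior of the existing parser: the function names are hypothetical, the italic pattern is a simplification, and the sketch only treats a doubled tilde as a literal tilde without assuming what a single tilde formats.

```python
import re

ITALIC = re.compile(r"_(\S(?:.*?\S)?)_")

# Private-use characters stand in for doubled markers while formatting
# runs; they are assumed never to occur in real text.
HIDDEN = {"__": "\ue000", "~~": "\ue001"}


def escape_for_ntm(url):
    """Author-side helper: double every underscore and tilde in a URL."""
    return url.replace("_", "__").replace("~", "~~")


def italic_with_escapes(text):
    """Apply _italic_ formatting, treating doubled markers as literals."""
    for doubled, placeholder in HIDDEN.items():
        text = text.replace(doubled, placeholder)
    text = ITALIC.sub(r"<i>\1</i>", text)
    for doubled, placeholder in HIDDEN.items():
        text = text.replace(placeholder, doubled[0])
    return text


if __name__ == "__main__":
    url = "ftp://example.com/~user/my_files/read_me.txt"
    # Without escaping, the pair of underscores is mistaken for italics.
    print(ITALIC.sub(r"<i>\1</i>", url))
    # With doubling, the URL survives and real italics still work.
    print(italic_with_escapes("See _this_ file: " + escape_for_ntm(url)))
```

The first call shows the problem (part of the path turns italic); the second shows the cost of the workaround, which is that the author has to remember to double the characters.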

For more information:

I use No-Tags Markup to maintain documents (such as this one) that can be e-mailed directly or automatically processed into Web pages using my StageThree system. Although this spec is a draft, I encourage you to use it where it fits your needs. If you have questions or comments, or scripts to share, please let me know.
Internet: ssl@prefab.com

The No-Tags Markup specification is Copyright 1995 by Scott S. Lawton. All Rights Reserved. No-Tags Markup is a trademark owned by Scott S. Lawton.


Updated on Dec 9, 2005 by Scott S. Lawton (ssl@prefab.com)

Copyright 1995-2005, Scott S. Lawton. All Rights Reserved.

This site (mostly) built and maintained using Stage Three, a set of custom Frontier scripts.