Proposal: [154] XML Forth Standard - migration from LaTeX to DocBook

Retired

GeraldWodni, 2020-09-01 21:16:26

Author:

Peter Knaggs

Problem & Solution:

The author, who is also the editor of the Forth Standard, is considering migrating from LaTeX to XML. The idea is that XML is easier for machines to parse while remaining editable by humans. Please read the proposed PDF. More material is available, including DTD, TEX, HTML and the XML example below.

Tools:

I have been thinking of using either XML Notepad or XXE (XMLmind XML Editor) as the editor environment and moving the whole standard into DocBook. That way I get PDF, XHTML and EPUB with very little work.

Feedback:

At this stage the author is asking for feedback:

  • How do you like the XML definition for words?
  • Would your system/documentation also output these XML definitions for its own words?
  • Any other related feedback?

Example Code:

<wordlist>
<worddef name="DOES&gt;" id="core:DOES" number="1250" wordlist="CORE" english="does">
    <description>
        <interpret>
            Interpretation semantics for this word are undefined.
        </interpret>

        <compile>
            <stack type="C">
                <pre>colon-sys_1</pre>
                <post>colon-sys_2</post>
            </stack>
            <para>
                Append the run-time semantics below to the current
                definition.
                Whether or not the current definition is rendered
                findable in the dictionary by the compilation of
                <word word="core:DOES" /> is implementation defined.
                Consume <param>colon-sys_1</param> and produce
                <param>colon-sys_2</param>. Append the initiation
                semantics given below to the current definition.
            </para>
        </compile>
        <runtime>
            <stack></stack>
            <stack type="R"><pre>nest-sys_1</pre></stack>
            <para>
                Replace the execution semantics of the most recent
                definition, referred to as <param>name</param>, with
                the <param>name</param> execution semantics given
                below. Return control to the calling definition
                specified by <param>nest-sys_1</param>. An ambiguous
                condition exists if <param>name</param> was not
                defined with <word word="core:CREATE" /> or a
                user-defined word that calls <word word="core:CREATE"/>.
            </para>
        </runtime>
        <init>
            <stack>
                <pre>i*x</pre>
                <post>i*x a-addr</post>
            </stack>
            <stack type="R">
                <post>nest-sys_2</post>
            </stack>
            <para>
                Save implementation-dependent information
                <param>nest-sys_2</param> about the calling definition.
                Place <param>name</param>'s data field address on the
                stack. The stack effects <param>i*x</param> represent
                arguments to <param>name</param>.
            </para>
        </init>
        <execute type="name">
            <stack>
                <pre>i*x</pre>
                <post>j*x</post>
            </stack>
            <para>
                Execute the portion of the definition that begins with
                the initiation semantics appended by the
                <word word="core:DOES" /> which modified
                <param>name</param>. The stack effects <param>i*x</param>
                and <param>j*x</param> represent arguments to and
                results from <param>name</param>, respectively.
            </para>
        </execute>
        <see>
            <wref word="core:CREATE" />
        </see>
    </description>
    <rationale>
        <para>
            Typical use:
            <c>: X ... DOES&gt; ... ;</c>
        </para><para>
            Following <word word="core:DOES" />, a Standard Program
            may not make any assumptions regarding the ability to find
            either the name of the definition containing the
            <word word="core:DOES"/> or any previous definition whose
            name may be concealed by it. <word word="core:DOES" />
            effectively ends one definition and begins another as far
            as local variables and control-flow structures are
            concerned.
            The compilation behavior makes it clear that the user is
            not entitled to place <word word="core:DOES"/> inside any
            control-flow structures.
        </para>
    </rationale>
    <testing>
        <test><pre>: DOES1 DOES&gt; @ 1 + ;</pre><post></post></test>
        <test><pre>: DOES2 DOES&gt; @ 2 + ;</pre><post></post></test>
        <test><pre>CREATE CR1</pre><post> </post></test>
        <test><pre>CR1  </pre><post>HERE</post></test>
        <test><pre>1 ,  </pre><post> </post></test>
        <test><pre>CR1 @</pre><post>1</post></test>
        <test><pre>DOES1</pre><post> </post></test>
        <test><pre>CR1  </pre><post>2</post></test>
        <test><pre>DOES2</pre><post> </post></test>
        <test><pre>CR1  </pre><post>3</post></test>

        <test><pre>: WEIRD: CREATE DOES&gt; 1 + DOES&gt; 2 + ;</pre><post></post></test>
        <test><pre>WEIRD: W1</pre><post></post></test>
        <test><pre>' W1 &gt;BODY</pre><post>HERE   </post></test>
        <test><pre>W1       </pre><post>HERE 1 +</post></test>
        <test><pre>W1       </pre><post>HERE 2 +</post></test>
    </testing>
</worddef>
</wordlist>
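
To illustrate the "easier to parse for machines" point, here is a minimal Python sketch that pulls the stack effects out of a word definition in the proposed format. It uses only the standard library; the `stack_effects` helper and the "data" label for a `<stack>` without a `type` attribute are assumptions made for this illustration, not part of the proposal.

```python
import xml.etree.ElementTree as ET

# A cut-down word definition in the proposed format (abridged from the example).
SAMPLE = """
<wordlist>
<worddef name="DOES&gt;" id="core:DOES" number="1250" wordlist="CORE" english="does">
    <description>
        <compile>
            <stack type="C">
                <pre>colon-sys_1</pre>
                <post>colon-sys_2</post>
            </stack>
        </compile>
        <init>
            <stack>
                <pre>i*x</pre>
                <post>i*x a-addr</post>
            </stack>
        </init>
    </description>
</worddef>
</wordlist>
"""

def stack_effects(xml_text):
    """Return {word name: [(section, stack type, before, after), ...]}."""
    root = ET.fromstring(xml_text)
    result = {}
    for worddef in root.iter("worddef"):
        effects = []
        for section in worddef.find("description"):
            for stack in section.findall("stack"):
                # Assumption: a missing type attribute means the data stack.
                kind = stack.get("type", "data")
                before = (stack.findtext("pre") or "").strip()
                after = (stack.findtext("post") or "").strip()
                effects.append((section.tag, kind, before, after))
        result[worddef.get("name")] = effects
    return result

effects = stack_effects(SAMPLE)
for name, rows in effects.items():
    for section, kind, before, after in rows:
        print(f"{name} {section} [{kind}] ( {before} -- {after} )")
```

The same walk could feed a glossary generator or a consistency checker, which is the kind of tooling that is awkward to build on top of LaTeX sources.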

AntonErtl

How do you like the XML definition for words?

I like the tags used, very fitting for the task. What I am wondering about: If something new comes up, like the "TO name semantics" in Forth-2012, how easy is it to add that?

What I don't like are general problems of XML: extreme verbosity, and the need to escape > and < (probably also &); this is especially noticeable in the testing section of the example. But you are the editor, so it's your decision, and if the next editor wants something better, the XML format allows automatic conversion to the next format (but it should also happen if the reverse direction is also possible, i.e., no information is lost).

Would your system/documentation also output these XML definitions for its own words?

Probably possible, but for now I don't see a benefit.

Any other related feedback?

How does this format cope with showing changes?

ruv

extreme verbosity,

At first glance, verbosity can be slightly reduced. E.g., the fragment:

<compile>
            <stack type="C">
                <pre>colon-sys_1</pre>
                <post>colon-sys_2</post>
            </stack>
            <para>
                Append the run-time semantics below to the current
                definition.
                Whether or not the current definition is rendered
                findable in the dictionary by the compilation of
                <word word="core:DOES" /> is implementation defined.
                Consume <param>colon-sys_1</param> and produce
                <param>colon-sys_2</param>. Append the initiation
                semantics given below to the current definition.
            </para>

Can be expressed as:

<compiling cs="colon-sys_1 -- colon-sys_2">
  <p>
    Append the run-time semantics below to the current definition.
    Whether or not the current definition is rendered
    findable in the dictionary by the compilation of
    <w id="COREto"/> is implementation defined.
    Consume <d>colon-sys_1</d> and produce
    <d>colon-sys_2</d>. Append the initiation
    semantics given below to the current definition.
  </p>

The idea: use shorter names for frequent elements, and use attributes.

The attributes ds, rs, cs, and fs stand for the data stack, return stack, control-flow stack, and floating-point stack, respectively. p is the same as in HTML, for a paragraph. The <w id="DOESto"/> element can also be written as <w>DOES&gt;</w>.

I'm not sure whether the DocBook format supports such XML attributes, but in any case this form can easily be transformed into the required one.
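
As a rough illustration of that transformation, the following Python sketch expands the attribute form back into the `<stack>`/`<pre>`/`<post>` form of the original example. The `expand` function, the `STACK_TYPES` table, and the "F" type for the floating-point stack are assumptions for this sketch; an XSLT stylesheet would be the more conventional tool in an XML toolchain.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from the short attribute names to the type attribute
# of <stack>; None means the data stack, which carries no type attribute.
# The "F" value for the floating-point stack is an assumption.
STACK_TYPES = {"ds": None, "rs": "R", "cs": "C", "fs": "F"}

def expand(compact_xml):
    """Rewrite <compiling cs="a -- b"> into <compile><stack type="C">...</stack>."""
    src = ET.fromstring(compact_xml)
    dst = ET.Element("compile")
    for attr, stack_type in STACK_TYPES.items():
        effect = src.get(attr)
        if effect is None:
            continue
        before, _, after = effect.partition("--")
        stack = ET.SubElement(dst, "stack")
        if stack_type:
            stack.set("type", stack_type)
        ET.SubElement(stack, "pre").text = before.strip()
        ET.SubElement(stack, "post").text = after.strip()
    for p in src.findall("p"):
        para = ET.SubElement(dst, "para")
        para.text = p.text
        # Inline children (<d>, <w>) are carried over unchanged for brevity;
        # a fuller version would rename them as well.
        para.extend(list(p))
    return dst

compact = ('<compiling cs="colon-sys_1 -- colon-sys_2">'
           '<p>Consume <d>colon-sys_1</d>.</p></compiling>')
verbose = ET.tostring(expand(compact), encoding="unicode")
print(verbose)
```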

the need to escape > and < (probably also &); this is especially noticeable in the testing section of the example.

Testing can be expressed less verbosely too, e.g.:

<testing>
  <test>
    : DOES1 DOES&gt; @ 1 + ;
    CREATE CR1
  </test>
  <test>CR1 <result>HERE</result></test>
</testing>

But I don't see much sense in using XML markup for the T{ ... -> ... }T construct when we don't use XML markup for other Forth constructs. What is the rationale?

I would say: if we use classic Forth code for colon definitions, let's use classic code for test cases too. For plain-text XML nodes we can use CDATA sections to avoid escaping the special characters:

<testing><![CDATA[
    T{ : DOES1 DOES> @ 1 + ;  ->  }T
    T{ CREATE CR1  ->   }T
    T{ CR1  ->  HERE }T
]]></testing>
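
A tool that needs the individual test cases can still recover them from the CDATA text. The Python sketch below shows one way to do it, under the assumption that each case sits between `T{` and `}T` and that `->` appears as a whitespace-delimited token; `parse_test_cases` is a name invented for this illustration.

```python
import re
import xml.etree.ElementTree as ET

SAMPLE = """<testing><![CDATA[
    T{ : DOES1 DOES> @ 1 + ;  ->  }T
    T{ CREATE CR1  ->   }T
    T{ CR1  ->  HERE }T
]]></testing>"""

# One match per T{ ... -> ... }T; "->" must be surrounded by whitespace,
# so the ">" inside DOES> is not mistaken for the separator.
TEST_CASE = re.compile(r"T\{\s+(.*?)\s+->\s+(.*?)\s*\}T", re.DOTALL)

def parse_test_cases(xml_text):
    """Return (code, expected) pairs from a <testing> element."""
    text = ET.fromstring(xml_text).text or ""
    return [(code.strip(), expected.strip())
            for code, expected in TEST_CASE.findall(text)]

for code, expected in parse_test_cases(SAMPLE):
    print(f"{code!r} -> {expected!r}")
```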

How does this format cope with showing changes?

If you mean showing a diff between versions, there are several approaches:

  • git diff, which shows changes in the source code (as plain text);
  • something like xmldiff, which takes the XML format and structure into account (NB: git can use an external diff utility);
  • something like html-differ, which compares HTML files (the results of rendering);
  • a custom tool (e.g., XSLT transformations) that takes special cases into account and renders the result in XHTML.
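
As a small illustration of the structure-aware idea, here is a Python sketch (standard library only, used here in place of XSLT) that flattens each document into one line per element and then runs an ordinary text diff, so that re-indenting the source produces no differences. The `flatten` and `structural_diff` helpers are hypothetical, and the sketch deliberately ignores attributes and text that follows a child element.

```python
import difflib
import xml.etree.ElementTree as ET

def flatten(xml_text):
    """List one 'path: text' line per element, ignoring indentation."""
    lines = []

    def walk(elem, path):
        here = f"{path}/{elem.tag}"
        # Collapse runs of whitespace so that reflowed text compares equal.
        text = " ".join((elem.text or "").split())
        lines.append(f"{here}: {text}" if text else here)
        for child in elem:
            walk(child, here)

    walk(ET.fromstring(xml_text), "")
    return lines

def structural_diff(old_xml, new_xml):
    """Return unified-diff lines between two documents."""
    return list(difflib.unified_diff(flatten(old_xml), flatten(new_xml),
                                     "old", "new", lineterm=""))

old = "<stack><pre>i*x</pre><post>j*x</post></stack>"
new = """<stack>
    <pre>i*x</pre>
    <post>i*x a-addr</post>
</stack>"""

for line in structural_diff(old, new):
    print(line)
```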

MarcosCruz

After trying other options, I use almost only Asciidoctor for any type of documentation, including books and manuals. It's an improved implementation and converter of the AsciiDoc markup, whose main goal is to represent all features of DocBook using a light and readable ASCII markup.

Currently an Asciidoctor document can be converted directly into DocBook, HTML, PDF, EPUB, manpage... Other conversions are possible from DocBook, for example using Pandoc.

I've been using Asciidoctor for years and I find it much easier and more productive than working directly with XML, and much more powerful than any other markup I have tried.

ruv

the need to escape > and < (probably also &)

Usually only the ampersand and the left angle bracket need to be escaped in XML; the right angle bracket needs escaping only in the rare case when it follows ]] (see also the XML specification):

"&" -> "&amp;" 
"<" -> "&lt;"
"]]>" -> "]]&gt;"

So >R, R>, DOES>, etc. can be used without escaping.
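
These rules can be checked with a short Python sketch. `escape_minimal` is a hypothetical helper that applies only the three replacements listed above to element content (attribute values would also need quote handling); the loop verifies that a standard XML parser returns the original Forth text.

```python
import xml.etree.ElementTree as ET

def escape_minimal(text):
    """Escape only what XML character data requires: &, <, and the > of ]]>."""
    text = text.replace("&", "&amp;")   # must come first
    text = text.replace("<", "&lt;")
    return text.replace("]]>", "]]&gt;")

for forth in [">R", "R>", "DOES>", "<#", "a ]]> b", "1 2 < AND &"]:
    xml = f"<c>{escape_minimal(forth)}</c>"
    # The parser must hand back exactly the original Forth text.
    assert ET.fromstring(xml).text == forth
    print(xml)
```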


I've been using Asciidoctor for years and I find it ... much more powerful than any other markup I have tried.

How can the "compiling" fragment above be expressed in AsciiDoc?

MarcosCruz

I think the code below would be equivalent. The blocks, the paragraphs and the inline markup are marked with roles. The -- markup delimits a generic "open" block, but other specific blocks exist.

[.compile]
--

[.stack]
colon-sys_1 -- colon-sys_2

Append the run-time semantics below to the current definition. Whether or not
the current definition is rendered findable in the dictionary by the
compilation of [.word]`DOES>` is implementation defined. Consume
[.par]__colon-sys_1__ and produce [.par]__colon-sys_2__. Append the initiation
semantics given below to the current definition.

--

The resulting DocBook is the following:

<para role="compile">
<simpara role="stack">colon-sys_1&#8201;&#8212;&#8201;colon-sys_2</simpara>
<simpara>Append the run-time semantics below to the current definition. Whether or not
the current definition is rendered findable in the dictionary by the
compilation of <literal role="word">DOES&gt;</literal> is implementation defined. Consume
<emphasis><phrase role="par">colon-sys_1</phrase></emphasis> and produce <emphasis><phrase role="par">colon-sys_2</phrase></emphasis>. Append the initiation
semantics given below to the current definition.</simpara>
</para>

ruv

Marcos, thank you for your example.

[.par]__colon-sys_1__

I guess that several space-separated lexemes like ( c-addr<sup>1</sup> u<sup>1</sup> ) are written as [.par]__c-addr_1 u_1__, which in XML can be written as <d>c-addr_1 u_1</d>.

One problem is how to render colon-sys_1 (or better colon-sys.1) as colon-sys<sup>1</sup> (which is displayed as colon-sys¹). In the case of XML it is easily solved in the XSLT step that transforms the sources. In the case of AsciiDoc, I guess a post-processing XSLT step on the generated DocBook could be used.

Other questions are: nesting of different blocks, using indentation in the sources, and support for folding in text editors. It seems some of these features are supported with AsciiDoc, but with XML they seem to be supported better.

To me, XML for this purpose looks better than AsciiDoc. But my view is perhaps biased, since I use XML a lot and never AsciiDoc (except sometimes Markdown, which is similar to AsciiDoc).

MarcosCruz

how to render (...) (which is displayed as colon-sys¹).

Asciidoctor source:

[.par]_colon-sys^1^_

By the way, the double (called unconstrained) underscores I used in my previous message were unnecessary. I used them because the parameter had an inner underscore, but the parsing works fine with ordinary single underscores to mark the emphasis.

Result in DocBook:

<emphasis><phrase role="par">colon-sys<superscript>1</superscript></phrase></emphasis>

In fact you can omit the [.par] role, which is added only to identify the parameters in order to change their style.

nesting of different blocks,

You can have nested blocks by adjusting the length of their delimiters by a pair of extra characters. Example:

====
This is an example block.
====

****
This is a sidebar block.
****

====
This is an example block with a nested...

======
...example block.
======

====

Result in DocBook:

<title>Nested blocks</title>
<informalexample>
<simpara>This is an example block.</simpara>
</informalexample>
<sidebar>
<simpara>This is a sidebar block.</simpara>
</sidebar>
<informalexample>
<simpara>This is an example block with a nested&#8230;&#8203;</simpara>
<informalexample>
<simpara>&#8230;&#8203;example block.</simpara>
</informalexample>
</informalexample>

using indentation in the sources

Not sure what you mean, but indentation is parsed as a type of block:

----
This is a source code or keyboard input block.
----

....
This is an output text block.
....


  This is an output text block as well.

Result in DocBook:

<title>Indentation</title>
<screen>This is a source code or keyboard input block.</screen>
<literallayout class="monospaced">This is an output text block.</literallayout>
<literallayout class="monospaced">This is an output text block as well.</literallayout>

I don't use indented blocks. I prefer explicit markup for clarity.

support of folding in text editors

I use Neovim and Vim, and I fold the Asciidoctor headings using the default folding marks of the editor. Unfortunately Asciidoctor doesn't allow comments at the end of a line, so I have to add a line comment above each heading, repeating the title in order to see it when the section is folded:

// My second-level heading {{{1
== My second-level heading

// My third-level heading {{{2
=== My third-level heading

In theory I could configure the editor to fold by other criteria, including the indentation of the source itself, but I don't need it and didn't try.

An XML editor can fold at any tag, I suppose. The nature of the markup makes it easy.

To me, XML for this purpose looks better than AsciiDoc. But my view is perhaps biased, since I use XML a lot and never AsciiDoc (except sometimes Markdown, which is similar to AsciiDoc).

Of course. XML provides powerful transformation capabilities and other advanced features, using specialized programs. With AsciiDoc you can obtain an equivalent result, with some limitations, but with a much simpler toolchain, and you can write the documents easily with any text editor. I mentioned this alternative just in case. If you already use XML a lot and know it well, it seems the way to go.

PeterKnaggs

Actually the _1 is supposed to be a subscript; I got it wrong when I wrote the LaTeX to HTML translator:

  • _1 should come out as <sub>1</sub>
  • _10 would translate as <sub>1</sub>0
  • _{10} would translate as <sub>10</sub>

In LaTeX, ^ is for superscripts.
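
A translator step for this could look like the following Python sketch. It assumes the usual LaTeX convention that a bare underscore subscripts only the next character while `_{...}` subscripts the whole group; `subscripts_to_html` is a hypothetical helper, not the actual LaTeX to HTML translator.

```python
import re

# _{...} subscripts the whole group; a bare _ subscripts only the next
# letter or digit, following the LaTeX convention.
SUBSCRIPT = re.compile(r"_(?:\{([^{}]*)\}|([0-9A-Za-z]))")

def subscripts_to_html(text):
    """Convert LaTeX-style subscripts in a parameter name to <sub> markup."""
    def replace(match):
        group, single = match.group(1), match.group(2)
        content = group if group is not None else single
        return f"<sub>{content}</sub>"
    return SUBSCRIPT.sub(replace, text)

for name in ["colon-sys_1", "x_10", "x_{10}"]:
    print(name, "=>", subscripts_to_html(name))
```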

GeraldWodni

This proposal will be retired as no immediate action is required.

It should however serve as a template for a future editor who wants to migrate to XML, so they do not need to start from scratch.
