Proposal: XML Forth Standard - migration from LaTeX to DocBook

Informal

GeraldWodni (Proposal, 2020-09-01 21:16:26)

Author:

Peter Knaggs

Problem & Solution:

The author, who is also the editor of the Forth Standard, is considering migrating from LaTeX to XML. The idea is that XML is easier for machines to parse while remaining editable by humans. Please read the proposed PDF. More material is available, including DTD, TEX, HTML, and the XML example below.

Tools:

I have been thinking of using either XML Notepad or XXE (XMLmind XML Editor) as the editor environment and moving the whole standard into DocBook. That way I get PDF, XHTML and EPUB with very little work.

Feedback:

At this stage the author is asking for feedback:

  • How do you like the XML definition for words?
  • Would your system/documentation also output these XML definitions for its own words?
  • Any other related feedback?

Example Code:

<wordlist>
<worddef name="DOES&gt;" id="core:DOES" number="1250" wordlist="CORE" english="does">
    <description>
        <interpret>
            Interpretation semantics for this word are undefined.
        </interpret>

        <compile>
            <stack type="C">
                <pre>colon-sys_1</pre>
                <post>colon-sys_2</post>
            </stack>
            <para>
                Append the run-time semantics below to the current
                definition.
                Whether or not the current definition is rendered
                findable in the dictionary by the compilation of
                <word word="core:DOES" /> is implementation defined.
                Consume <param>colon-sys_1</param> and produce
                <param>colon-sys_2</param>. Append the initiation
                semantics given below to the current definition.
            </para>
        </compile>
        <runtime>
            <stack></stack>
            <stack type="R"><pre>nest-sys_1</pre></stack>
            <para>
                Replace the execution semantics of the most recent
                definition, referred to as <param>name</param>, with
                the <param>name</param> execution semantics given
                below. Return control to the calling definition
                specified by <param>nest-sys_1</param>. An ambiguous
                condition exists if <param>name</param> was not
                defined with <word word="core:CREATE" /> or a
                user-defined word that calls <word word="core:CREATE"/>.
            </para>
        </runtime>
        <init>
            <stack>
                <pre>i*x</pre>
                <post>i*x a-addr</post>
            </stack>
            <stack type="R">
                <post>nest-sys_2</post>
            </stack>
            <para>
                Save implementation-dependent information
                <param>nest-sys_2</param> about the calling definition.
                Place <param>name</param>'s data field address on the
                stack. The stack effects <param>i*x</param> represent
                arguments to <param>name</param>.
            </para>
        </init>
        <execute type="name">
            <stack>
                <pre>i*x</pre>
                <post>j*x</post>
            </stack>
            <para>
                Execute the portion of the definition that begins with
                the initiation semantics appended by the
                <word word="core:DOES" /> which modified
                <param>name</param>. The stack effects <param>i*x</param>
                and <param>j*x</param> represent arguments to and
                results from <param>name</param>, respectively.
            </para>
        </execute>
        <see>
            <wref word="core:CREATE" />
        </see>
    </description>
    <rationale>
        <para>
            Typical use:
            <c>: X ... DOES&gt; ... ;</c>
        </para><para>
            Following <word word="core:DOES" />, a Standard Program
            may not make any assumptions regarding the ability to find
            either the name of the definition containing the
            <word word="core:DOES"/> or any previous definition whose
            name may be concealed by it. <word word="core:DOES" />
            effectively ends one definition and begins another as far
            as local variables and control-flow structures are
            concerned.
            The compilation behavior makes it clear that the user is
            not entitled to place <word word="core:DOES"/> inside any
            control-flow structures.
        </para>
    </rationale>
    <testing>
        <test><pre>: DOES1 DOES&gt; @ 1 + ;</pre><post></post></test>
        <test><pre>: DOES2 DOES&gt; @ 2 + ;</pre><post></post></test>
        <test><pre>CREATE CR1</pre><post> </post></test>
        <test><pre>CR1  </pre><post>HERE</post></test>
        <test><pre>1 ,  </pre><post> </post></test>
        <test><pre>CR1 @</pre><post>1</post></test>
        <test><pre>DOES1</pre><post> </post></test>
        <test><pre>CR1  </pre><post>2</post></test>
        <test><pre>DOES2</pre><post> </post></test>
        <test><pre>CR1  </pre><post>3</post></test>

        <test><pre>: WEIRD: CREATE DOES&gt; 1 + DOES&gt; 2 + ;</pre><post></post></test>
        <test><pre>WEIRD: W1</pre><post></post></test>
        <test><pre>' W1 &gt;BODY</pre><post>HERE   </post></test>
        <test><pre>W1       </pre><post>HERE 1 +</post></test>
        <test><pre>W1       </pre><post>HERE 2 +</post></test>
    </testing>
</worddef>
</wordlist>
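
To illustrate the claim that this format is easy for machines to parse, here is a minimal Python sketch that reads an abbreviated copy of the example above and lists the stack effects per section. The helper name stack_effects, the shortened SAMPLE string, and the label "D" for a stack element without a type attribute are assumptions made for this illustration; they are not part of the proposal.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the DOES> entry from the example above.
SAMPLE = """\
<wordlist>
<worddef name="DOES&gt;" id="core:DOES" number="1250" wordlist="CORE" english="does">
    <description>
        <compile>
            <stack type="C">
                <pre>colon-sys_1</pre>
                <post>colon-sys_2</post>
            </stack>
        </compile>
        <init>
            <stack>
                <pre>i*x</pre>
                <post>i*x a-addr</post>
            </stack>
        </init>
    </description>
</worddef>
</wordlist>
"""

def stack_effects(worddef):
    """Yield (section, stack type, before, after) for each non-empty <stack>."""
    for section in worddef.find("description"):
        for stack in section.findall("stack"):
            pre = (stack.findtext("pre") or "").strip()
            post = (stack.findtext("post") or "").strip()
            if pre or post:
                # Assumption: a missing type attribute means the data stack.
                yield section.tag, stack.get("type", "D"), pre, post

root = ET.fromstring(SAMPLE)
for worddef in root.iter("worddef"):
    # The parser already turns DOES&gt; back into DOES> here.
    print(worddef.get("name"), worddef.get("wordlist"), worddef.get("number"))
    for section, kind, pre, post in stack_effects(worddef):
        print(f"  {section}: ( {kind}: {pre} -- {post} )")
```

A Forth system that wanted to emit or consume these definitions would need roughly this amount of code on the reading side.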

AntonErtl

How do you like the XML definition for words?

I like the tags used, very fitting for the task. What I am wondering about: If something new comes up, like the "TO name semantics" in Forth-2012, how easy is it to add that?

What I don't like are the general problems of XML: extreme verbosity, and the need to escape > and < (probably also &); this is especially noticeable in the testing section of the example. But you are the editor, so it's your decision, and if the next editor wants something better, the XML format allows automatic conversion to the next format (but such a conversion should only happen if the reverse direction is also possible, i.e., no information is lost).
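
As a small illustration of the escaping point, the following Python sketch escapes two Forth lines from the testing section and checks that unescaping restores them exactly. It uses only the standard-library helpers escape and unescape; the choice of lines is arbitrary.

```python
from xml.sax.saxutils import escape, unescape

# Forth source lines taken from the testing section of the example.
lines = [": DOES1 DOES> @ 1 + ;", "' W1 >BODY"]

for line in lines:
    escaped = escape(line)            # replaces &, < and > with entity references
    assert unescape(escaped) == line  # the round trip loses nothing
    print(escaped)
```

The escaped form is harder to read, but the conversion is mechanical in both directions.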

Would your system/documentation also output these XML definitions for its own words?

Probably possible, but for now I don't see a benefit.

Any other related feedback?

How does this format cope with showing changes?

ruv

extreme verbosity,

At first glance, the verbosity can be slightly reduced. E.g., the fragment:

<compile>
    <stack type="C">
        <pre>colon-sys_1</pre>
        <post>colon-sys_2</post>
    </stack>
    <para>
        Append the run-time semantics below to the current
        definition.
        Whether or not the current definition is rendered
        findable in the dictionary by the compilation of
        <word word="core:DOES" /> is implementation defined.
        Consume <param>colon-sys_1</param> and produce
        <param>colon-sys_2</param>. Append the initiation
        semantics given below to the current definition.
    </para>
</compile>

Can be expressed as:

<compiling cs="colon-sys_1 -- colon-sys_2">
  <p>
    Append the run-time semantics below to the current definition.
    Whether or not the current definition is rendered
    findable in the dictionary by the compilation of
    <w id="DOESto"/> is implementation defined.
    Consume <d>colon-sys_1</d> and produce
    <d>colon-sys_2</d>. Append the initiation
    semantics given below to the current definition.
  </p>
</compiling>

The idea: use shorter names for frequent elements, and use attributes.

The attributes ds, rs, cs, and fs stand for the data stack, return stack, control-flow stack, and floating-point stack, respectively. The p element is the same as the HTML paragraph element. The <w id="DOESto"/> element can also be written as <w>DOES&gt;</w>.

I am not sure whether the DocBook format supports such XML attributes, but in any case they can be easily transformed into the required form.
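
A minimal Python sketch of such a transformation: it moves the ds/rs/cs/fs attributes into leading stack child elements of the verbose form. The function name expand_stacks and the type value "F" for the floating-point stack are assumptions for illustration; only "C" and "R" appear in the original example. Renaming elements (for instance compiling to compile, or p to para) is left out.

```python
import xml.etree.ElementTree as ET

# Attribute name -> value of the verbose form's type attribute.
# "C" and "R" appear in the original example; "F" is an assumption,
# and None means the untyped (data) stack.
STACK_TYPES = {"ds": None, "rs": "R", "cs": "C", "fs": "F"}

def expand_stacks(element):
    """Replace ds/rs/cs/fs attributes with leading <stack> children, in place."""
    stacks = []
    for attr, kind in STACK_TYPES.items():
        effect = element.attrib.pop(attr, None)
        if effect is None:
            continue
        # Split "before -- after" at the first "--".
        pre, _, post = effect.partition("--")
        stack = ET.Element("stack", {} if kind is None else {"type": kind})
        ET.SubElement(stack, "pre").text = pre.strip()
        ET.SubElement(stack, "post").text = post.strip()
        stacks.append(stack)
    for index, stack in enumerate(stacks):
        element.insert(index, stack)
    return element

compact = ET.fromstring(
    '<compiling cs="colon-sys_1 -- colon-sys_2">'
    "<p>Append the run-time semantics below to the current definition.</p>"
    "</compiling>"
)
print(ET.tostring(expand_stacks(compact), encoding="unicode"))
```

The same mapping could equally be written as an XSLT stylesheet; Python is used here only to keep the sketch short.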

the need to escape > and < (probably also &); this is especially noticeable in the testing section of the example.

Testing can be expressed less verbosely too, e.g.:

<testing>
  <test>
    : DOES1 DOES&gt; @ 1 + ;
    CREATE CR1
  </test>
  <test>CR1 <result>HERE</result></test>
</testing>

But I don't see much sense in using XML markup for the T{ ... -> ... }T construct when we don't use XML markup for other Forth constructs. What is the rationale?

I would say: if we use classic Forth code for colon definitions, let's use classic code for test cases too. For plain-text XML nodes we can use CDATA sections to avoid escaping the special characters:

<testing><![CDATA[
    T{ : DOES1 DOES> @ 1 + ;  ->  }T
    T{ CREATE CR1  ->   }T
    T{ CR1  ->  HERE }T
]]></testing>
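
The CDATA form stays machine-readable. As a rough Python sketch, the test cases can be recovered with a regular expression; the name extract_cases and the one-test-per-line layout are assumptions, and a test whose code itself contains "->" would need a real Forth tokenizer instead.

```python
import re
import xml.etree.ElementTree as ET

TESTING = """<testing><![CDATA[
    T{ : DOES1 DOES> @ 1 + ;  ->  }T
    T{ CREATE CR1  ->   }T
    T{ CR1  ->  HERE }T
]]></testing>"""

# One test case per line: T{ <code> -> <expected> }T
TEST_LINE = re.compile(r"T\{\s+(.*?)\s+->\s*(.*?)\s*\}T")

def extract_cases(testing_xml):
    """Return (code, expected) pairs from a CDATA-based <testing> element."""
    # The parser delivers CDATA content as ordinary text, with > unescaped.
    text = ET.fromstring(testing_xml).text or ""
    return [match.groups() for match in TEST_LINE.finditer(text)]

for code, expected in extract_cases(TESTING):
    print(f"{code!r} -> {expected!r}")
```

From these pairs, the verbose test/pre/post markup could be regenerated if a tool needs it.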

How does this format cope with showing changes?

If you mean showing the diff between versions, there are several approaches:

  • git diff that shows changes in the source code (as plain text);
  • something like xmldiff that takes into account XML format and structure (NB: git can use an external diff utility);
  • something like html-differ that compares HTML files (the results of rendering);
  • a custom tool (e.g., XSLT transformations) that takes the standard's specific needs into account and renders the result in XHTML.
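
As a rough illustration of the structure-aware approach, the Python sketch below canonicalizes both versions before running a plain text diff, so that attribute order and indentation do not show up as changes. The function names normalized_lines and xml_diff are made up for this example, and ET.canonicalize requires Python 3.8 or later. It is a stand-in for a dedicated tool such as xmldiff, not a replacement.

```python
import difflib
import xml.etree.ElementTree as ET

def normalized_lines(xml_text):
    """Canonicalize XML (C14N 2.0) and put each tag boundary on its own line."""
    canonical = ET.canonicalize(xml_text, strip_text=True)
    return canonical.replace("><", ">\n<").splitlines()

def xml_diff(old, new):
    """Unified diff of two XML documents, ignoring attribute order and indentation."""
    return list(difflib.unified_diff(
        normalized_lines(old), normalized_lines(new),
        fromfile="old", tofile="new", lineterm="",
    ))

old = '<test><pre>CR1</pre><post>2</post></test>'
new = '<test>\n  <pre>CR1</pre>\n  <post>3</post>\n</test>'
print("\n".join(xml_diff(old, new)))
```

Only the changed post element is reported; the re-indentation of the second version is ignored.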