Proposal: Multi-Tasking
This proposal has been moved into this section. Its former address was: /standard/facility/MS
AndrewHaley  Multi-Tasking ProposalProposal2018-09-06 17:19:38
Multiprogramming is a common facility in Forth systems. The wordset which supports this is known as the multi-tasking wordset, and the entity executing a program sequence is known as a task.
The earliest Forth systems developed by Charles Moore supported multiprogramming, in that the computer could execute multiple concurrent program sequences. Many of today's Forths support multiprogramming, but it has never been part of any standard. Forth has been multi-tasking for almost 50 years. It's time to standardize it.
The most complex part of this proposal is not the wordset itself but the way in which it interacts with the rest of the system. It is necessary to go through every line of the standard to determine which parts will be affected and then to decide what to do.
Round-robin versus pre-emptive task scheduling
Originally, Forth used a co-operative (sometimes called "round robin") task scheduler to share time between tasks on a single processor. Such a scheduler only switches tasks voluntarily, i.e. when a task relinquishes control. This is done either by executing PAUSE or an I/O word such as TYPE. More recent systems sometimes use a "pre-emptive" scheduler which may switch tasks at any time, and some systems have multiple processors sharing the same memory. This wordset doesn't address such differences except to note that you may have to execute PAUSE in a long-running computation to allow other tasks to run. From the point of view of this standard it does not matter: as long as it follows the rules of the Memory Model, a Standard Forth program will execute correctly on either kind of system.
The Standard Forth Memory Model
Multiprocessor systems vary in the degree to which processors operate asynchronously. In order to allow systems to operate as quickly as possible while minimizing power consumption, memory operations performed by one task may appear to other tasks in a different order, stores may appear to be delayed, and in some cases incomplete updates to cells in memory may be visible. On some systems values may be read that have never been written by any task.
A Data Race occurs when two or more tasks access the same memory location concurrently, and at least one of the accesses is some kind of non-atomic store, and the tasks are not using any MUTEXes to guarantee exclusive access to that memory location. This standard does not specify what values result from data races, either when those values are read from memory or when they are written.
ATOMIC accesses and MUTEX operations are sequentially consistent. Sequential consistency means that there appears to be a total order of all atomic operations, i.e. every task observes atomic operations in the same order. More formally, the result of any execution is the same as if
the operations of all the tasks had been executed in some sequential order, and the operations of each processor appear in this total ordering in the same order as was specified in its program.
Therefore, a well-defined program may store values into a buffer using ordinary (non-atomic) stores then publish the address of that buffer to other tasks using either a region protected by a MUTEX or by using an atomic store. Another task may use a MUTEX or an atomic fetch to read the address of the buffer, then use ordinary (non-atomic) fetches to load the values from it.
[ As long as a program has no data races, the entire program is sequentially consistent, even though it uses plain fetches and stores for most of its operations. This is an important property that greatly aids intuitive understanding of programs. ]
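As a concrete illustration of the publish pattern just described, here is a minimal sketch using this proposal's ATOMIC! and ATOMIC@ ; the names mailbox, buf, producer and consumer are invented for this example, and mailbox is assumed to start at zero:

```forth
VARIABLE mailbox   0 mailbox !     \ shared cell, initially 0
CREATE buf 2 CELLS ALLOT

: producer ( -- )
   42 buf !   43 buf CELL+ !       \ plain (non-atomic) stores fill the buffer
   buf mailbox ATOMIC! ;           \ atomic store publishes its address

: consumer ( -- a-addr )
   BEGIN PAUSE mailbox ATOMIC@ ?DUP UNTIL   \ spin until published
   DUP @ SWAP CELL+ @ 2DROP ;      \ plain fetches now reliably see 42 and 43
```

Because the buffer's address travels through a sequentially consistent store/fetch pair, the consumer's plain fetches are not part of a data race.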
TASK ( "name" -- )
Skip leading space delimiters. Parse name delimited by a space. Create a definition for name with the execution semantics defined below. Reserve /TASK bytes of data space at a suitably aligned address. name is referred to as a "task".
name Execution: ( -- a-addr )
a-addr is the address of the task. The task may not be used until it has been CONSTRUCTed.
/TASK ( -- n ) "per-task"
n is the size of a task. [This word allows arrays of tasks to be created without having to name each one.]
CONSTRUCT ( a-addr -- )
Instantiate the task at a-addr. a-addr must be a memory region of at least /TASK bytes in size. CONSTRUCT shall be executed at most once on a task. Once CONSTRUCT has been executed, the task's USER variables may be accessed.
START-TASK ( xt a-addr -- )
Start the task at a-addr asynchronously executing the word whose execution token is xt. If the word terminates by executing EXIT, reaching the end of the word, or throwing an exception, the task terminates. The task may be restarted after this by executing START-TASK again. The task shall not be restarted until it has terminated.
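A hypothetical usage sequence for TASK, CONSTRUCT and START-TASK; blinker, led and blink are names invented for this sketch:

```forth
VARIABLE led   0 led !

TASK blinker
blinker CONSTRUCT

: blink ( -- )                     \ runs asynchronously in the blinker task
   BEGIN  led @ 0= led !  PAUSE  AGAIN ;

' blink blinker START-TASK         \ blinker now toggles led forever
```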
+USER ( n "name" -- )
Skip leading space delimiters. Parse name delimited by a space. Create a definition for name with the execution semantics defined below. Reserve n bytes of data space, suitably aligned, in every task. name is referred to as a "user variable".
[ I can't find any common practice for defining user variables. Traditional Forth defines n USER to have a fixed offset in the user area, but that is of no use unless one can discover the highest already-defined user variable. Suggestions welcome. ]
name Execution: ( -- a-addr )
a-addr is the address of the variable in the currently-running task.
[ I think we need to say that a child of MARKER deallocates all user variables defined after it. ]
HIS ( addr1 addr2 -- addr3 )
Given a task at address addr1 and a user variable address in the current task, return the address of the corresponding user variable in that task. [ Usage: <task-name> <user-variable-name> HIS e.g. TYPIST BASE HIS @ ]
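A sketch tying +USER and HIS together, in the manner of the TYPIST BASE HIS example above; uptime, tick and typist are names invented here:

```forth
1 CELLS +USER uptime               \ one cell of per-task data

: tick ( -- )  1 uptime +! ;       \ bumps the current task's copy only

TASK typist   typist CONSTRUCT
typist uptime HIS @ .              \ read the typist task's copy of uptime
```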
Starting, stopping, and pausing tasks
STOP ( -- )
Block the currently running task until AWAKEN has been issued for it; if AWAKEN has already been issued, continue without blocking. Calls to AWAKEN are not counted, so multiple AWAKENs before a STOP can unblock only a single STOP.
AWAKEN ( a-addr -- )
a-addr is the address of a task. If that task is blocked, unblock it so that it resumes execution. If that task is not blocked, the next time it executes STOP it will continue rather than blocking.
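A sketch of a task that blocks in STOP until another task AWAKENs it; the atomic words keep the handoff of the shared cell data-race-free (printer, message, print-loop and notify are invented names):

```forth
VARIABLE message
TASK printer   printer CONSTRUCT

: print-loop ( -- )
   BEGIN  STOP  message ATOMIC@ .  AGAIN ;

' print-loop printer START-TASK

: notify ( n -- )                  \ run from another task
   message ATOMIC!  printer AWAKEN ;
```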
PAUSE ( -- )
May cause the executing task temporarily to relinquish the CPU.
[ In a Forth system which uses round-robin scheduling this can be used to allow other tasks to run. However, this isn't usually needed because any I/O causes a task to block. All words which perform I/O (TYPE, ACCEPT, READ-FILE, etc.) should (unless they are extremely high priority?) execute PAUSE . ]
Atomic memory words
ATOMIC@ ( a-addr -- x ) "atomic-fetch"
x is the value stored at a-addr. ATOMIC@ is sequentially consistent.
See: Sequential consistency
ATOMIC! ( x a-addr -- ) "atomic-store"
Store x at a-addr. ATOMIC! is sequentially consistent.
See: Sequential consistency
ATOMIC-XCHG ( x1 a-addr -- x2 ) "atomic-exchange"
Atomically replace the value at a-addr with x1. x2 is the value previously at a-addr. This is an atomic read-modify-write operation. ATOMIC-XCHG has the memory effect of an ATOMIC@ followed by an ATOMIC! .
See: Sequential consistency
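As an illustration, ATOMIC-XCHG suffices to build a test-and-set spinlock. This is a sketch only (lock, acquire and release are invented names); a real program would normally use the mutex words below:

```forth
VARIABLE lock   0 lock !           \ 0 = free, 1 = taken

: acquire ( -- )
   BEGIN  1 lock ATOMIC-XCHG  WHILE  PAUSE  REPEAT ;

: release ( -- )  0 lock ATOMIC! ;
```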
ATOMIC-CAS ( x-expected x-desired a-addr -- x-prev )
Atomically compare the value at a-addr with x-expected and, if equal, replace the value at a-addr with x-desired. x-prev is the value at a-addr immediately before this operation. This is an atomic read-modify-write operation. ATOMIC-CAS has the memory effect of an ATOMIC@ followed by (if the store happens) an ATOMIC! .
See: Sequential consistency
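A sketch of a lock-free add built from ATOMIC-CAS with the usual retry loop; atomic+! is a name invented for this example:

```forth
: atomic+! ( n a-addr -- )
   BEGIN
      DUP ATOMIC@          ( n a-addr old )
      DUP DUP 4 PICK +     ( n a-addr old old new )
      3 PICK ATOMIC-CAS    ( n a-addr old x-prev )
      =                    \ succeeded if no other task intervened
   UNTIL
   2DROP ;
```

If another task changed the cell between the ATOMIC@ and the ATOMIC-CAS, x-prev differs from old and the loop retries.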
/MUTEX ( -- n ) "per-mutex"
n is the number of bytes in a mutex.
MUTEXINIT ( a-addr -- )
Initialize a mutex. Set its state to released.
GET ( a-addr -- )
Obtain control of the mutex at a-addr. If the mutex is owned by another task, the task executing GET will block until the mutex is available. This operation is part of the total order, and happens after another task releases the mutex.
[ In a round-robin scheduler, this word executes PAUSE before attempting to acquire the mutex. ]
RELEASE ( a-addr -- )
Release the mutex at a-addr. This operation is part of the total order, and happens before another task can GET the mutex.
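A hypothetical critical section protecting a shared counter with the mutex words above; counter-lock, counter and bump are invented names:

```forth
CREATE counter-lock /MUTEX ALLOT
counter-lock MUTEXINIT
VARIABLE counter   0 counter !

: bump ( -- )
   counter-lock GET
      1 counter +!                 \ at most one task at a time runs this
   counter-lock RELEASE ;
```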
MitraArdron Query re status; and minor commentsComment2020-11-29 06:32:26
I'm wondering whether this proposal is going somewhere, or was it just Andrew's idea? I don't see serious problems with it, but I'd prefer to implement only one version of multitasking in webForth, and I'm not seeing any movement for two years.
Minor issues with the proposal ....
I think "INIT-TASK" makes more sense than "CONSTRUCT" for consistency.
ATOMIC! - I'm not quite sure what the point of this is? I'm presuming you mean that the store can't be interrupted between starting to write and ending (for example if it's writing a cell to multiple bytes). But in that case, I would presume that every ! in the core Forth words should be an ATOMIC! . If that is not the meaning, then maybe it needs some explanation. (ATOMIC@ has the same issue; ATOMIC-XCHG and ATOMIC-CAS seem fine to me)
MUTEX - I'd suggest MUTEXGET MUTEXRELEASE because GET is too common a word not to cause clashes?
USER variables seem to be a problem for me - I'm starting with eForth so there are about 40 already. Of those, roughly 6 are directly needed to switch tasks (0..3 and SP0, RP0). Almost all the rest are only relevant if the task has access to I/O or is reading and interpreting/compiling text (?KEY 'EMIT 'EXPECT 'TAP 'ECHO 'PROMPT temp SPAN >IN #TIB 'EVAL 'NUMBER HLD CONTEXT+ 8 CURRENT CP NP LAST BASE). So there is a lot of memory overhead: 27 * cellsize bytes per task. If we are going to standardise, wouldn't it be good to be able to do lightweight (no I/O) tasks as well?
kc5tja Round-robin vs PreemptiveExample2022-06-25 15:26:37
Round-robin is correctly identified as a scheduling algorithm. However, the choice between cooperative and preemptive multitasking has nothing to do with scheduling. For example, the Commodore-Amiga operating system Kickstart uses round-robin scheduling with tasks of equal priority, even though it is also preemptive. The Tripos operating system uses a pure priority-driven scheduling algorithm, with no round-robin behavior at all, despite being cooperative. Thus, it is best to use the term "cooperative" instead of round-robin, and avoid the whole "sometimes called" stuff. It is distracting and factually wrong. Commentary about how people frequently mis-use the term "round-robin," and/or a more general discussion of the different scheduling algorithms that exist, is best left for an appendix entry. I don't think it has a proper place in the body of a standard.
Here is how I would clean up that section. Very little needs to change, but it is, in my estimation, a huge improvement.
Cooperative Versus Preemptive Task Scheduling
Originally, Forth used a cooperative task scheduler to share time between tasks on a single processor. Such a scheduler only switches tasks voluntarily, i.e. when a task relinquishes control. This is done either by executing PAUSE or an I/O word such as TYPE. More recent systems sometimes use a "preemptive" scheduler which may switch tasks at any time, and some systems have multiple processors sharing the same memory. This wordset doesn't address such differences except to note that you may have to execute PAUSE in a long-running computation to allow other tasks to run. From the point of view of this standard, it does not matter: as long as it follows the rules of the Memory Model, a Standard Forth program will execute correctly on either kind of system.
kc5tja Stack Sizes?Example2022-06-25 15:38:51
The proposal looks good, at least for a system which utilizes an MMU to detect stack overflows and expand the stacks automatically. However, not all Forth systems will be equipped with such an MMU. On such systems, how does one ensure a data or return stack large enough to allow the task to run?
kc5tja Interactions with MARKER and KILL-TASKExample2022-06-25 15:54:21
While the standard allows MARKER to focus only on the state of the dictionary, and thus an implementation doesn't have to extend marker words to do anything related to tasks, I'm wondering what are the general thoughts about a marker word being extended to kill currently running threads defined after a marker word? Without such an extension, you would need to write code like this:
```forth
MARKER end-program
TASK fooer   fooer CONSTRUCT
: end-program
   fooer KILL-TASK  ( STOP takes no parameters, so we'll need to introduce this new word. )
   end-program ;
```
Which reminds me about KILL-TASK. Exact name doesn't matter to me, but I think we'd want this. I imagine creating an interactive program, launching one background task per asynchronously-managed I/O device. When the program is terminated by the user, it would be necessary to bring those tasks down cleanly (even if rudely) before reverting HERE to somewhere ahead of their task records.
ruv Better API for multitaskingComment2022-07-18 00:03:00
Task object problem
The proposed interface (partially) is:
- TASK ( "name" -- ), name Execution: ( -- a-addr.task )
- /TASK ( -- u ) — This word allows arrays of tasks to be created without having to name each one.
- CONSTRUCT ( a-addr.task -- ) — Instantiate the task at a-addr.task.
- START-TASK ( xt a-addr.task -- ) — "The task shall not be restarted [by means of START-TASK] until it has terminated."
It's a significant condition that I overlooked before.
How can the parent thread know whether a child thread has terminated or not? How can the memory allocated for a "task" be correctly freed?
Actually, it's a problem — since the user has to manage memory for "tasks" at a very low level, and there are not enough means for that. But even given such means, this resource management should be hidden under the hood.
Requirements for multitasking
I implemented a variant of cooperative round-robin multitasking in Forth for MS-DOS many years ago, and I use multi-threading in SP-Forth/4 on Windows (and Linux) a lot. In both of these implementations the data object required to start a new thread was superfluous (and could be eliminated).
In practice, I need a convenient way to:
- launch an xt execution in a new separate thread (i.e., this xt is executed asynchronously, in parallel, using separate stacks),
- pass at least one argument to the new thread for this xt: xt ( x -- ),
- prematurely terminate execution of one's own thread (like exit for a word, or bye for a process),
- forcefully terminate execution of another thread,
- wait for a thread's termination,
- get the throw code of a terminated thread.
I'm not sure concerning the following:
- suspend execution of one's own or another thread (if it has not terminated),
- resume execution of another thread (if it was suspended; otherwise do nothing).
I don't include synchronization primitives in the list since they are a slightly different topic.
Another API variant
According to requirements, I can suggest the following API to manage tasks (threads):
launch-task ( x xt -- tid ior )
close-task-id ( tid -- ior )
clone-task-id ( tid1 -- tid2 ior ) — a method to obtain another independent identifier.
task-id-own ( -- tid.own )
terminate ( -- )
kill-task ( tid -- ior )
wait-task ( tid -- ior ) — wait for task termination
wait-task-for ( u.timeout tid -- flag ior ) — wait for termination for a given timeout
task-status ( tid -- n.status-code ior )
launch-task-suspended ( x xt -- tid ior )
suspend-task ( tid -- ior )
resume-task ( tid -- ior )
The method to suspend a task is not quite useful, since it cannot be used for synchronization.
Any tid is valid until it is closed (freed). The own tid is closed automatically when the thread terminates, so it should not be passed to another thread. When all tids are closed, the resources are freed automatically. NB: these identifiers are not unique, since many identifiers may identify the same task/thread. Maybe the term "handle" suits better here than "identifier".
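For illustration, this is how the suggested words might fit together. All of these words (launch-task, wait-task, close-task-id) are this comment's proposals, not standard ones, and worker is an invented name:

```forth
: worker ( x -- )  . ;             \ any word taking one argument

: demo ( -- )
   42 ['] worker launch-task THROW ( tid )
   DUP wait-task THROW             \ block until the worker terminates
   close-task-id THROW ;           \ then free the handle
```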
Shared resources considerations
The API should specify how to work with dynamic memory: may a memory block allocated in one thread be freed in another thread, or not? For example, in SP-Forth/4 memory is allocated/freed in per-thread heaps, and all memory blocks allocated in a thread are freed when the thread terminates; the global heap is also available.
Also, the API should specify whether here returns different values in different threads, or the same. If here can be local to threads, may these threads perform included independently in parallel, or not? If yes, should the compilation word list be different in different threads, or not?
High level inter-thread communication
A very convenient way for communication is channels (that is, a message queue under the hood). So, each running task has an associated channel for inbound messages. Usually a message is a cell-sized number. Maybe strings should also be considered.
A sketch of methods:
- send-message ( c-addr u tid -- ior ) — sends a copy of the given string.
- wait-message ( -- c-addr u ) — the returned string is valid until the next call.
- wait-message-for ( u.timeout -- c-addr u flag )
The meaning of the content of string ( c-addr u ) is defined by the program (it may be binary data as well as textual).
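A sketch of a logging task built on the channel words above; everything here is hypothetical, following the suggested signatures (logger-tid, log-loop and log are invented names):

```forth
VARIABLE logger-tid

: log-loop ( x -- )
   DROP  BEGIN  wait-message TYPE CR  AGAIN ;

0 ' log-loop launch-task THROW logger-tid !

: log ( c-addr u -- )
   logger-tid @ send-message THROW ;
```

Any task may then say S" hello" log without further synchronization, since the channel copies the string.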