Defining Tasks
Tasks are declared with the TASK macro.
Place the macro in a header or at the top of a source
file; it generates the task descriptor and all associated functions. Then
provide the task body as a regular C function named NAME_CALL.
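// Task with no arguments, returning int
TASK(int, my_task)
int my_task_CALL(lace_worker* lw) { ... }
// Task with one argument, returning int
TASK(int, fibonacci, int, n)
int fibonacci_CALL(lace_worker* lw, int n) { ... }
// Task with one argument and no return value
TASK(void, my_void_task, int, n)
void my_void_task_CALL(lace_worker* lw, int n) { ... }
// Task with two arguments and no return value
TASK(void, process, int*, data, int, size)
void process_CALL(lace_worker* lw, int* data, int size) { ... }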
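To make the naming scheme concrete, the following is a minimal sketch of how a single macro invocation can produce functions with derived names through token pasting. Everything here is hypothetical: TOY_TASK_1, toy_worker, and the wrapper body are illustrative stand-ins and far simpler than whatever Lace's TASK macro actually expands to. Only the NAME / NAME_CALL convention is taken from the text above.

```c
/* Hypothetical stand-in for lace_worker; not Lace's real type. */
typedef struct toy_worker { int id; } toy_worker;

/* Toy one-argument task macro (an assumption, not Lace's TASK):
 * it declares NAME_CALL, which the user implements, and defines a
 * NAME wrapper that supplies a worker and calls the body. */
#define TOY_TASK_1(RTYPE, NAME, ATYPE, ARG)           \
    RTYPE NAME##_CALL(toy_worker* lw, ATYPE ARG);     \
    static inline RTYPE NAME(ATYPE ARG)               \
    {                                                 \
        toy_worker w = { 0 };                         \
        return NAME##_CALL(&w, ARG);                  \
    }

TOY_TASK_1(int, square, int, n)

/* The task body, written as a regular C function. */
int square_CALL(toy_worker* lw, int n)
{
    (void)lw; /* unused in this toy */
    return n * n;
}
```

With this in place, square(7) goes through the generated wrapper, while square_CALL(&w, 7) invokes the body directly with a caller-supplied worker.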
Generated functions
Each TASK(RTYPE, NAME, ...) macro generates the following:
| Function | Description |
|---|---|
| NAME_CALL(lw, ...) | Your task body; implement this. |
| NAME(...) | Run the task, blocking until done. Works from inside or outside a Lace worker. |
| NAME_SPAWN(lw, ...) | Fork: push a task onto the deque so it can be stolen. Returns a pointer to the task. |
| NAME_SYNC(lw) | Join: retrieve the result of the last spawned task (LIFO order). |
| NAME_DROP(lw) | Drop: cancel the last spawned task if not yet stolen, or discard its result if already stolen. |
| NAME_NEWFRAME(...) | Interrupt all workers and run this task. |
| NAME_TOGETHER(...) | Interrupt all workers and run a copy on each worker. |
The lace_worker* pointer passed to _CALL must not be modified. It is
required by SPAWN, SYNC, and other Lace operations.
SPAWN and SYNC
SPAWN and SYNC must be matched and used in LIFO order: if you
spawn A then B, you must sync B before A. Syncing out of order is undefined
behaviour.
Each SPAWN pushes a task onto the deque where it can be stolen by another
worker. SYNC retrieves the result of the last spawned task, waiting for
it if stolen, or executing it directly if not.
int fibonacci_CALL(lace_worker* lw, int n)
{
    if (n < 2) return n;
    fibonacci_SPAWN(lw, n-1);         // push onto deque (may be stolen)
    int a = fibonacci_CALL(lw, n-2);  // execute directly
    int b = fibonacci_SYNC(lw);       // retrieve spawned result
    return a + b;
}
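The LIFO rule is easiest to see with two outstanding spawns. The sketch below is a single-threaded toy model, not Lace code: toy_worker, toy_fib_spawn, and toy_fib_sync are hypothetical names, nothing is ever stolen, and SYNC therefore always takes the "execute it directly" path. It only illustrates that the most recently spawned task is the first one synced.

```c
#include <assert.h>

#define TOY_DEQUE_CAP 64

/* Hypothetical single-threaded worker: a stack of pending arguments. */
typedef struct {
    int pending[TOY_DEQUE_CAP];
    int top; /* number of spawned-but-not-yet-synced tasks */
} toy_worker;

static int toy_fib_call(toy_worker* w, int n);

/* SPAWN: push the task where a thief could take it (nothing steals here). */
static void toy_fib_spawn(toy_worker* w, int n)
{
    assert(w->top < TOY_DEQUE_CAP);
    w->pending[w->top++] = n;
}

/* SYNC: pop the most recently spawned task and, because it was not
 * stolen, execute it directly. */
static int toy_fib_sync(toy_worker* w)
{
    assert(w->top > 0);
    int n = w->pending[--w->top];
    return toy_fib_call(w, n);
}

static int toy_fib_call(toy_worker* w, int n)
{
    if (n < 2) return n;
    toy_fib_spawn(w, n - 1);  /* spawned first  */
    toy_fib_spawn(w, n - 2);  /* spawned second */
    int b = toy_fib_sync(w);  /* syncs the second spawn (n-2) */
    int a = toy_fib_sync(w);  /* syncs the first spawn (n-1)  */
    return a + b;
}
```

Swapping the two sync lines would still compile, but a would then hold the result for n-2 and b the result for n-1. In this toy the sum is unaffected; in code where the two results are used differently, the mismatch becomes a logic error.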
Calling NAME() from any context
NAME(...) can be called from both inside and outside a Lace worker thread.
If called from inside a worker, it detects this automatically and calls
NAME_CALL directly, skipping task submission entirely. This means you can
write library code that calls NAME() without knowing whether it is currently
running inside a Lace worker.
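One common way to implement this kind of context detection is a thread-local "current worker" pointer. The sketch below assumes that approach purely for illustration; the text above does not say how Lace detects the calling context. All names (toy_current, toy_run, toy_submit_and_wait, toy_submissions) are hypothetical, and the "submission" is simulated on the calling thread.

```c
#include <stddef.h>

typedef struct toy_worker { int id; } toy_worker;
typedef int (*toy_body)(toy_worker*, int);

/* Non-NULL only while the current thread is executing as a worker. */
static _Thread_local toy_worker* toy_current = NULL;

/* Counts how many calls took the external (submission) path. */
static int toy_submissions = 0;

/* Stand-in for handing the task to the pool and blocking until done. */
static int toy_submit_and_wait(toy_body body, int n)
{
    toy_worker w = { 0 };
    toy_submissions++;
    toy_current = &w;   /* from here on we are "inside" a worker */
    int r = body(&w, n);
    toy_current = NULL;
    return r;
}

/* Shared dispatch used by every NAME() wrapper in this toy. */
static int toy_run(toy_body body, int n)
{
    if (toy_current != NULL) return body(toy_current, n); /* direct call */
    return toy_submit_and_wait(body, n);                  /* external call */
}

static int twice_CALL(toy_worker* lw, int n) { (void)lw; return 2 * n; }
static int twice(int n) { return toy_run(twice_CALL, n); }

/* A task body that calls twice() without caring about its context. */
static int twice_plus_one_CALL(toy_worker* lw, int n)
{
    (void)lw;
    return twice(n) + 1;
}
static int twice_plus_one(int n) { return toy_run(twice_plus_one_CALL, n); }
```

Calling twice_plus_one(4) from outside records exactly one submission: the nested twice(4) sees a non-NULL toy_current and calls twice_CALL directly.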
Dropping a spawned task
Instead of SYNC, use DROP to abandon the last spawned task.
If the task has not yet been stolen, it is cancelled and never executed. If it
has already been stolen, the thief will still complete it but the result is
discarded. Like SYNC, DROP must follow LIFO order relative to other
SPAWN/SYNC/DROP calls.
my_task_SPAWN(lw, arg);
// ... decide we don't need the result
my_task_DROP(lw);
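The two outcomes of a drop can be modelled with a small single-threaded toy. This is a sketch under stated assumptions, not Lace's implementation: toy_worker, toy_spawn, toy_steal_last, and toy_drop are hypothetical, and the "thief" is simulated by a function call. A real work-stealing deque typically lets thieves take the oldest task; the toy steals the newest one only so that a single entry suffices.

```c
#include <assert.h>

#define TOY_DEQUE_CAP 16

typedef struct {
    int arg;
    int stolen; /* set once a thief has taken the task */
} toy_entry;

typedef struct {
    toy_entry deque[TOY_DEQUE_CAP];
    int top;      /* number of outstanding spawned tasks */
    int executed; /* how many task bodies actually ran */
} toy_worker;

/* SPAWN: push a task that is eligible for stealing. */
static void toy_spawn(toy_worker* w, int arg)
{
    assert(w->top < TOY_DEQUE_CAP);
    w->deque[w->top].arg = arg;
    w->deque[w->top].stolen = 0;
    w->top++;
}

/* Simulate a thief stealing and completing the last spawned task. */
static void toy_steal_last(toy_worker* w)
{
    assert(w->top > 0);
    w->deque[w->top - 1].stolen = 1;
    w->executed++;
}

/* DROP: pop the last spawned task without using its result.
 * Returns 1 if the task was cancelled before it ran, 0 if a thief
 * had already taken it (its result is simply ignored). */
static int toy_drop(toy_worker* w)
{
    assert(w->top > 0);
    toy_entry* e = &w->deque[--w->top];
    return e->stolen ? 0 : 1;
}
```

In both cases the deque entry is removed, which is why a drop counts as the matching operation for its spawn in the LIFO ordering.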
Interrupting workers
Two special run modes interrupt currently executing tasks at the next steal
point (i.e. at SYNC or when idle):
NAME_NEWFRAME(...): halts all workers and runs the given task on the
worker pool. The current task frame is suspended and resumed after the new
task completes. Typical use: stop-the-world garbage collection.
NAME_TOGETHER(...): halts all workers and runs a copy of the given task
on every worker simultaneously. All workers start together and all complete
together (barrier semantics). Typical use: per-worker initialisation of
thread-local state.
Long-running tasks should call lace_check_yield() periodically to
cooperate with interruptions.
Per-task scratch storage
Some tasks need a small amount of temporary memory whose size depends on the input — a typical example is a buffer holding the children of an MTBDD node during a recursive operation. Three options are available:
- alloca or C99 variable-length arrays. Fast, but the allocation sits on the thread stack; a deep recursion with input-dependent alloca can blow past the guard page and crash. MSVC also does not support VLAs, so this is not portable.
- malloc/free. Portable and safe, but each call goes through the global allocator. For fine-grained tasks this allocator contention is often the dominant cost.
- The Lace scratch arena described below. Per-worker, lock-free, and about as fast as alloca while remaining portable.
Each worker owns a private bump arena backed by a large virtual-memory reservation. The default is 1 GiB per worker on 64-bit systems, where address space is essentially free, and 16 MiB per worker on 32-bit systems, where user-mode address space is limited. Physical memory is committed lazily on first use; idle workers release pages back to the OS during deep backoff. The arena is private to one worker — no synchronisation, no atomics.
Tasks that need scratch use a save/alloc/restore pattern:
BDD my_op_CALL(lace_worker* lw, BDD a, BDD b)
{
    // n: number of children, derived from the inputs (computation elided)
    void* mark = lace_scratch_mark(lw);                             // save
    BDD* children = (BDD*)lace_scratch_alloc(lw, n * sizeof(BDD));  // alloc
    // ... use children, possibly SPAWN/CALL/SYNC sub-tasks ...
    BDD result = mtbdd_makenode(...);
    lace_scratch_reset(lw, mark);                                   // restore
    return result;
}
The discipline is the same as alloca: a scratch pointer is valid for
the duration of the allocating frame and any descendants it transitively
spawns or calls. It must not be stored anywhere that could outlive the
frame.
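The save/alloc/restore discipline can be illustrated with a small stand-alone bump allocator. This is a minimal sketch, not the Lace arena: toy_arena and the toy_scratch_* functions are hypothetical, the arena is a fixed 4 KiB array rather than a lazily committed virtual-memory reservation, and the alignment policy is an assumption.

```c
#include <stddef.h>

#define TOY_ARENA_SIZE 4096

/* Hypothetical fixed-size arena; top is the current bump offset. */
typedef struct {
    _Alignas(max_align_t) unsigned char base[TOY_ARENA_SIZE];
    size_t top;
} toy_arena;

/* Save: remember the current top of the arena. */
static void* toy_scratch_mark(toy_arena* a)
{
    return a->base + a->top;
}

/* Alloc: bump the top by the rounded-up size; NULL if it does not fit. */
static void* toy_scratch_alloc(toy_arena* a, size_t size)
{
    size_t align = _Alignof(max_align_t);
    size_t rounded = (size + align - 1) & ~(align - 1);
    if (rounded < size || rounded > TOY_ARENA_SIZE - a->top) return NULL;
    void* p = a->base + a->top;
    a->top += rounded;
    return p;
}

/* Restore: release everything allocated since the mark. */
static void toy_scratch_reset(toy_arena* a, void* mark)
{
    a->top = (size_t)((unsigned char*)mark - a->base);
}

/* Recursive example: each frame fills a buffer of n ints, recurses on
 * n/2 while the buffer is still live, then sums its own buffer.
 * Returns -1 if the arena is exhausted. */
static long toy_sum(toy_arena* a, int n)
{
    if (n <= 0) return 0;
    void* mark = toy_scratch_mark(a);
    long total = -1;
    int* buf = (int*)toy_scratch_alloc(a, (size_t)n * sizeof(int));
    if (buf != NULL) {
        for (int i = 0; i < n; i++) buf[i] = i;
        long child = toy_sum(a, n / 2); /* descendant allocates above us */
        if (child >= 0) {
            total = child;
            for (int i = 0; i < n; i++) total += buf[i];
        }
    }
    toy_scratch_reset(a, mark); /* single exit: always restore */
    return total;
}
```

The parent's buffer stays valid across the recursive call because the child only allocates above the parent's region and restores its own mark before returning.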
Multiple return paths each need a matching lace_scratch_reset
before they return. If a task forgets to reset, Lace detects this when
the owning worker next enters deep idle: it prints a one-line warning
and resets the arena automatically, so the program does not accumulate
leaks over a long run.
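The following sketch contrasts a task that misses a reset on an early return with one that funnels every return path through a single reset, and models the idle-time safety net. It is an assumption-laden toy: toy_arena tracks offsets only, the function names are hypothetical, and the warning text is invented for illustration, since the text above does not give Lace's actual message.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical arena that tracks only the bump offset. */
typedef struct { size_t top; size_t cap; } toy_arena;

static size_t toy_mark(toy_arena* a) { return a->top; }
static void toy_reset(toy_arena* a, size_t mark) { a->top = mark; }

/* Returns 1 on success, 0 if the request does not fit. */
static int toy_alloc(toy_arena* a, size_t size)
{
    if (size > a->cap - a->top) return 0;
    a->top += size;
    return 1;
}

/* Buggy: the early return skips the reset and leaks 64 bytes. */
static int toy_leaky(toy_arena* a, int fail_early)
{
    size_t mark = toy_mark(a);
    if (!toy_alloc(a, 64)) return -1;
    if (fail_early) return -1; /* BUG: missing toy_reset(a, mark) */
    toy_reset(a, mark);
    return 0;
}

/* Fixed: every path leaves through one exit that resets. */
static int toy_fixed(toy_arena* a, int fail_early)
{
    size_t mark = toy_mark(a);
    int rc = -1;
    if (!toy_alloc(a, 64)) goto out;
    if (fail_early) goto out;
    rc = 0;
out:
    toy_reset(a, mark);
    return rc;
}

/* Idle-time check: with no task frame live, top should be zero.
 * Returns 1 if a leak was found (and repaired), 0 otherwise. */
static int toy_idle_check(toy_arena* a)
{
    if (a->top == 0) return 0;
    fprintf(stderr, "toy arena: %zu bytes not reset; resetting\n", a->top);
    a->top = 0;
    return 1;
}
```

The single-exit pattern in toy_fixed is one way to keep resets matched; calling the reset explicitly before each return works equally well.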
Tasks that do not allocate scratch pay nothing for this feature: the virtual address space reservation costs nothing until a page is touched, and no instrumentation runs in the task entry or exit paths.
The reservation size and the committed band (the hysteresis margin
above the current top) can be configured before lace_start:
lace_set_scratch_size(4ULL * 1024 * 1024 * 1024); // 4 GiB per worker
lace_set_scratch_band(4 * 1024 * 1024); // 4 MiB band
lace_start(0, 0, 0);
Pass 0 to lace_set_scratch_size to disable the arena entirely.