Please support this book: buy it (PDF, EPUB, MOBI) or donate
6. Shared memory and atomics
The ECMAScript 2017 feature “Shared memory and atomics” was designed by Lars T. Hansen. It introduces a new constructor SharedArrayBuffer
and a namespace object Atomics
with helper functions. This chapter explains the details.
6.1 Parallelism vs. concurrency
Before we begin, let’s clarify two terms that are similar, yet distinct: “parallelism” and “concurrency”. Many definitions for them exist; I’m using them as follows:
- Parallelism (parallel vs. serial): execute multiple tasks simultaneously
Concurrency (concurrent vs. sequential): execute several tasks during overlapping periods of time (and not one after another).
Both are closely related, but not the same:Parallelism without concurrency: single instruction, multiple data (SIMD). Multiple computations happen in parallel, but only a single task (instruction) is executed at any given moment.
- Concurrency without parallelism: multitasking via time-sharing on a single-core CPU.
However, it is difficult to use these terms precisely, which is why interchanging them is usually not a problem.
6.1.1 Models of parallelism
Two models of parallelism are:
- Data parallelism: The same piece of code is executed several times in parallel. The instances operate on different elements of the same dataset. For example: MapReduce is a data-parallel programming model.
- Task parallelism: Different pieces of code are executed in parallel. Examples: web workers and the Unix model of spawning processes.
6.2 A history of JS parallelism
- JavaScript started as being executed in a single thread. Some tasks could be performed asynchronously: browsers usually ran those tasks in separate threads and later fed their results back into the single thread, via callbacks.
- Web workers brought task parallelism to JavaScript: They are relatively heavweight processes. Each worker has its own global environment. By default, nothing is shared. Communication between workers (or between workers and the main thread) evolved:
- At first, you could only send and receive strings.
- Then, structured cloning was introduced: copies of data could be sent and received. Structured cloning works for most data (JSON data, Typed Arrays, regular expressions,
Blob
objects,ImageData
objects, etc.). It can even handle cyclic references between objects correctly. However, error objects, function objects and DOM nodes cannot be cloned. - Transferables move data between workers: the sending party loses access as the receiving party gains access to data.
- Computing on GPUs (which tend to do data parallelism well) via WebGL: It’s a bit of a hack and works as follows.
- Input: your data, converted into an image (pixel by pixel).
- Processing: OpenGL pixel shaders can perform arbitrary computations on GPUs. Your pixel shader transforms the input image.
- Output: again an image that you can convert back to your kind of data.
- SIMD (low-level data parallelism): is supported via the ECMAScript proposal SIMD.js. It allows you to perform operations (such as addition and square root) on several integers or floats at the same time.
- PJS (codenamed River Trail): the plan of this ultimately abandoned project was to bring high-level data parallelism (think map-reduce via pure functions) to JavaScript. However, there was not enough interest from developers and engine implementers. Without implementations, one could not experiment with this API, because it can’t be polyfilled. On 2015-01-05, Lars T. Hansen announced that an experimental implementation was going to be removed from Firefox.
6.2.1 The next step: SharedArrayBuffer
What’s next? For low-level parallelism, the direction is quite clear: support SIMD and GPUs as well as possible. However, for high-level parallelism, things are much less clear, especially after the failure of PJS.
What is needed is a way to try out many approaches, to find out how to best bring high-level parallelism to JavaScript. Following the principles of the extensible web manifesto, the proposal “shared memory and atomics” (a.k.a. “Shared Array Buffers”) does so by providing low-level primitives that can be used to implement higher-level constructs.
6.3 Shared Array Buffers
Shared Array Buffers are a primitive building block for higher-level concurrency abstractions. They allow you to share the bytes of a SharedArrayBuffer
object between multiple workers and the main thread (the buffer is shared, to access the bytes, wrap it in a Typed Array). This kind of sharing has two benefits:
- You can share data between workers more quickly.
- Coordination between workers becomes simpler and faster (compared to
postMessage()
).
6.3.1 Creating and sending a Shared Array Buffer
// main.js
const
worker
=
new
Worker
(
'worker.js'
);
// To be shared
const
sharedBuffer
=
new
SharedArrayBuffer
(
// (A)
10
*
Int32Array
.
BYTES_PER_ELEMENT
);
// 10 elements
// Share sharedBuffer with the worker
worker
.
postMessage
({
sharedBuffer
});
// clone
// Local only
const
sharedArray
=
new
Int32Array
(
sharedBuffer
);
// (B)
You create a Shared Array Buffer the same way you create a normal Array Buffer: by invoking the constructor and specifying the size of the buffer in bytes (line A). What you share with workers is the buffer. For your own, local, use, you normally wrap Shared Array Buffers in Typed Arrays (line B).
Warning: Cloning a Shared Array Buffer is the correct way of sharing it, but some engines still implement an older version of the API and require you to transfer it:
worker
.
postMessage
({
sharedBuffer
},
[
sharedBuffer
]);
// transfer (deprecated)
In the final version of the API, transferring a Shared Array Buffer means that you lose access to it.
6.3.2 Receiving a Shared Array Buffer
The implementation of the worker looks as follows.
// worker.js
self
.
addEventListener
(
'message'
,
function
(
event
)
{
const
{
sharedBuffer
}
=
event
.
data
;
const
sharedArray
=
new
Int32Array
(
sharedBuffer
);
// (A)
// ···
});
We first extract the Shared Array Buffer that was sent to us and then wrap it in a Typed Array (line A), so that we can use it locally.
6.4 Atomics: safely accessing shared data
6.4.1 Problem: Optimizations make code unpredictable across workers
In single threads, compilers can make optimizations that break multi-threaded code.
Take, for example the following code:
while
(
sharedArray
[
0
]
===
123
)
;
In a single thread, the value of sharedArray[0]
never changes while the loop runs (if sharedArray
is an Array or Typed Array that wasn’t patched in some manner). Therefore, the code can be optimized as follows:
const
tmp
=
sharedArray
[
0
];
while
(
tmp
===
123
)
;
However, in a multi-threaded setting, this optimization prevents us from using this pattern to wait for changes made in another thread.
Another example is the following code:
// main.js
sharedArray
[
1
]
=
11
;
sharedArray
[
2
]
=
22
;
In a single thread, you can rearrange these write operations, because nothing is read in-between. For multiple threads, you get into trouble whenever you expect the writes to be done in a specific order:
// worker.js
while
(
sharedArray
[
2
]
!==
22
)
;
console
.
log
(
sharedArray
[
1
]);
// 0 or 11
These kinds of optimizations make it virtually impossible to synchronize the activity of multiple workers operating on the same Shared Array Buffer.
6.4.2 Solution: atomics
The proposal provides the global variable Atomics
whose methods have three main use cases.
6.4.2.1 Use case: synchronization
Atomics
methods can be used to synchronize with other workers. For example, the following two operations let you read and write data and are never rearranged by compilers:
Atomics.load(ta : TypedArray<T>, index) : T
Atomics.store(ta : TypedArray<T>, index, value : T) : T
The idea is to use normal operations to read and write most data, whileAtomics
operations (load
,store
and others) ensure that the reading and writing is done safely. Often, you’ll use custom synchronization mechanisms, such as locks, whose implementations are based onAtomics
.
This is a very simple example that always works, thanks to Atomics
(I’ve omitted setting up sharedArray
):
// main.js
console
.
log
(
'notifying...'
);
Atomics
.
store
(
sharedArray
,
0
,
123
);
// worker.js
while
(
Atomics
.
load
(
sharedArray
,
0
)
!==
123
)
;
console
.
log
(
'notified'
);
6.4.2.2 Use case: waiting to be notified
Using a while
loop to wait for a notification is not very efficient, which is why Atomics
has operations that help:
Atomics.wait(ta: Int32Array, index, value, timeout)
waits for a notification atta[index]
, but only ifta[index]
isvalue
.Atomics.wake(ta : Int32Array, index, count)
wakes upcount
workers that are waiting atta[index]
.
6.4.2.3 Use case: atomic operations
Several Atomics
operations perform arithmetic and can’t be interrupted while doing so, which helps with synchronization. For example:
Atomics.add(ta : TypedArray<T>, index, value) : T
Roughly, this operation performs:
ta
[
index
]
+=
value
;
6.4.3 Problem: torn values
Another problematic effect with shared memory is torn values (garbage): when reading, you may see an intermediate value – neither the value before a new value was written to memory nor the new value.
Sect “Tear-Free Reads” in the spec states that there is no tear if and only if:
- Both reading and writing happens via Typed Arrays (not DataViews).
- Both Typed Arrays are aligned with their Shared Array Buffers:
sharedArray
.
byteOffset
%
sharedArray
.
BYTES_PER_ELEMENT
===
0
Both Typed Arrays have the same number of bytes per element.
In other words, torn values are an issue whenever the same Shared Array Buffer is accessed via:One or more DateViews
- One or more unaligned Typed Arrays
- Typed Arrays with different element sizes
To avoid torn values in these cases, useAtomics
or synchronize.
6.5 Shared Array Buffers in use
6.5.1 Shared Array Buffers and the run-to-completion semantics of JavaScript
JavaScript has so-called run-to-completion semantics: every function can rely on not being interrupted by another thread until it is finished. Functions become transactions and can perform complete algorithms without anyone seeing the data they operate on in an intermediate state.
Shared Array Buffers break run to completion (RTC): data a function is working on can be changed by another thread during the runtime of the function. However, code has complete control over whether or not this violation of RTC happens: if it doesn’t use Shared Array Buffers, it is safe.
This is loosely similar to how async functions violate RTC. There, you opt into a blocking operation via the keyword await
.
6.5.2 Shared Array Buffers and asm.js and WebAssembly
Shared Array Buffers enable emscripten to compile pthreads to asm.js. Quoting an emscripten documentation page:
[Shared Array Buffers allow] Emscripten applications to share the main memory heap between web workers. This along with primitives for low level atomics and futex support enables Emscripten to implement support for the Pthreads (POSIX threads) API.
That is, you can compile multithreaded C and C++ code to asm.js.
Discussion on how to best bring multi-threading to WebAssembly is ongoing. Given that web workers are relatively heavyweight, it is possible that WebAssembly will introduce lightweight threads. You can also see that threads are on the roadmap for WebAssembly’s future.
6.5.3 Sharing data other than integers
At the moment, only Arrays of integers (up to 32 bits long) can be shared. That means that the only way of sharing other kinds of data is by encoding them as integers. Tools that may help include:
TextEncoder
andTextDecoder
: The former converts strings to instances ofUint8Array
. The latter does the opposite.- stringview.js: a library that handles strings as arrays of characters. Uses Array Buffers.
- FlatJS: enhances JavaScript with ways of storing complex data structures (structs, classes and arrays) in flat memory (
ArrayBuffer
andSharedArrayBuffer
). JavaScript+FlatJS is compiled to plain JavaScript. JavaScript dialects (TypeScript etc.) are supported. - TurboScript: is a JavaScript dialect for fast parallel programming. It compiles to asm.js and WebAssembly.
Eventually, there will probably be additional – higher-level – mechanisms for sharing data. And experiments will continue to figure out what these mechanisms should look like.
6.5.4 How much faster is code that uses Shared Array Buffers?
Lars T. Hansen has written two implementations of the Mandelbrot algorithm (as documented in his article “A Taste of JavaScript’s New Parallel Primitives” where you can try them out online): A serial version and a parallel version that uses multiple web workers. For up to 4 web workers (and therefore processor cores), speed-up improves almost linearly, from 6.9 frames per seconds (1 web worker) to 25.4 frames per seconds (4 web workers). More web workers bring additional performance improvements, but more modest ones.
Hansen notes that the speed-ups are impressive, but going parallel comes at the cost of the code being more complex.
6.6 Example
Let’s look at a more comprehensive example. Its code is available on GitHub, in the repository shared-array-buffer-demo
. And you can run it online.
6.6.1 Using a shared lock
In the main thread, we set up shared memory so that it encodes a closed lock and send it to a worker (line A). Once the user clicks, we open the lock (line B).
// main.js
// Set up the shared memory
const
sharedBuffer
=
new
SharedArrayBuffer
(
1
*
Int32Array
.
BYTES_PER_ELEMENT
);
const
sharedArray
=
new
Int32Array
(
sharedBuffer
);
// Set up the lock
Lock
.
initialize
(
sharedArray
,
0
);
const
lock
=
new
Lock
(
sharedArray
,
0
);
lock
.
lock
();
// writes to sharedBuffer
worker
.
postMessage
({
sharedBuffer
});
// (A)
document
.
getElementById
(
'unlock'
).
addEventListener
(
'click'
,
event
=>
{
event
.
preventDefault
();
lock
.
unlock
();
// (B)
});
In the worker, we set up a local version of the lock (whose state is shared with the main thread via a Shared Array Buffer). In line B, we wait until the lock is unlocked. In lines A and C, we send text to the main thread, which displays it on the page for us (how it does that is not shown in the previous code fragment). That is, we are using self.postMessage()
much like console.log()
in these two lines.
// worker.js
self
.
addEventListener
(
'message'
,
function
(
event
)
{
const
{
sharedBuffer
}
=
event
.
data
;
const
lock
=
new
Lock
(
new
Int32Array
(
sharedBuffer
),
0
);
self
.
postMessage
(
'Waiting for lock...'
);
// (A)
lock
.
lock
();
// (B) blocks!
self
.
postMessage
(
'Unlocked'
);
// (C)
});
It is noteworthy that waiting for the lock in line B stops the complete worker. That is real blocking, which hasn’t existed in JavaScript until now (await
in async functions is an approximation).
6.6.2 Implementing a shared lock
Next, we’ll look at an ES6-ified version of a Lock
implementation by Lars T. Hansen that is based on SharedArrayBuffer
.
In this section, we’ll need (among others) the following Atomics
function:
Atomics.compareExchange(ta : TypedArray<T>, index, expectedValue, replacementValue) : T
If the current element ofta
atindex
isexpectedValue
, replace it withreplacementValue
. Return the previous (or unchanged) element atindex
.
The implementation starts with a few constants and the constructor:
const
UNLOCKED
=
0
;
const
LOCKED_NO_WAITERS
=
1
;
const
LOCKED_POSSIBLE_WAITERS
=
2
;
// Number of shared Int32 locations needed by the lock.
const
NUMINTS
=
1
;
class
Lock
{
/**
* @param iab an Int32Array wrapping a SharedArrayBuffer
* @param ibase an index inside iab, leaving enough room for NUMINTS
*/
constructor
(
iab
,
ibase
)
{
// OMITTED: check parameters
this
.
iab
=
iab
;
this
.
ibase
=
ibase
;
}
The constructor mainly stores its parameters in instance properties.
The method for locking looks as follows.
/**
* Acquire the lock, or block until we can. Locking is not recursive:
* you must not hold the lock when calling this.
*/
lock
()
{
const
iab
=
this
.
iab
;
const
stateIdx
=
this
.
ibase
;
var
c
;
if
((
c
=
Atomics
.
compareExchange
(
iab
,
stateIdx
,
// (A)
UNLOCKED
,
LOCKED_NO_WAITERS
))
!==
UNLOCKED
)
{
do
{
if
(
c
===
LOCKED_POSSIBLE_WAITERS
// (B)
||
Atomics
.
compareExchange
(
iab
,
stateIdx
,
LOCKED_NO_WAITERS
,
LOCKED_POSSIBLE_WAITERS
)
!==
UNLOCKED
)
{
Atomics
.
wait
(
iab
,
stateIdx
,
// (C)
LOCKED_POSSIBLE_WAITERS
,
Number
.
POSITIVE_INFINITY
);
}
}
while
((
c
=
Atomics
.
compareExchange
(
iab
,
stateIdx
,
UNLOCKED
,
LOCKED_POSSIBLE_WAITERS
))
!==
UNLOCKED
);
}
}
In line A, we change the lock to LOCKED_NO_WAITERS
if its current value is UNLOCKED
. We only enter the then-block if the lock is already locked (in which case compareExchange()
did not change anything).
In line B (inside a do-while
loop), we check if the lock is locked with waiters or not unlocked. Given that we are about to wait, the compareExchange()
also switches to LOCKED_POSSIBLE_WAITERS
if the current value is LOCKED_NO_WAITERS
.
In line C, we wait if the lock value is LOCKED_POSSIBLE_WAITERS
. The last parameter, Number.POSITIVE_INFINITY
, means that waiting never times out.
After waking up, we continue the loop if we are not unlocked. compareExchange()
also switches to LOCKED_POSSIBLE_WAITERS
if the lock is UNLOCKED
. We use LOCKED_POSSIBLE_WAITERS
and not LOCKED_NO_WAITERS
, because we need to restore this value after unlock()
temporarily set it to UNLOCKED
and woke us up.
The method for unlocking looks as follows.
/**
* Unlock a lock that is held. Anyone can unlock a lock that
* is held; nobody can unlock a lock that is not held.
*/
unlock
()
{
const
iab
=
this
.
iab
;
const
stateIdx
=
this
.
ibase
;
var
v0
=
Atomics
.
sub
(
iab
,
stateIdx
,
1
);
// A
// Wake up a waiter if there are any
if
(
v0
!==
LOCKED_NO_WAITERS
)
{
Atomics
.
store
(
iab
,
stateIdx
,
UNLOCKED
);
Atomics
.
wake
(
iab
,
stateIdx
,
1
);
}
}
// ···
}
In line A, v0
gets the value that iab[stateIdx]
had before 1 was subtracted from it. The subtraction means that we go (e.g.) from LOCKED_NO_WAITERS
to UNLOCKED
and from LOCKED_POSSIBLE_WAITERS
to LOCKED
.
If the value was previously LOCKED_NO_WAITERS
then it is now UNLOCKED
and everything is fine (there is no one to wake up).
Otherwise, the value was either LOCKED_POSSIBLE_WAITERS
or UNLOCKED
. In the former case, we are now unlocked and must wake up someone (who will usually lock again). In the latter case, we must fix the illegal value created by subtraction and the wake()
simply does nothing.
6.6.3 Conclusion for the example
This gives you a rough idea how locks based on SharedArrayBuffer
work. Keep in mind that multithreaded code is notoriously difficult to write, because things can change at any time. Case in point: lock.js
is based on a paper documenting a futex implementation for the Linux kernel. And the title of that paper is “Futexes are tricky” (PDF).
If you want to go deeper into parallel programming with Shared Array Buffers, take a look at synchronic.js
and the document it is based on (PDF).
6.7 The API for shared memory and atomics
6.7.1 SharedArrayBuffer
Constructor:
new SharedArrayBuffer(length)
Create a buffer forlength
bytes.
Static property:get SharedArrayBuffer[Symbol.species]
Returnsthis
by default. Override to control whatslice()
returns.
Instance properties:get SharedArrayBuffer.prototype.byteLength()
Returns the length of the buffer in bytes.SharedArrayBuffer.prototype.slice(start, end)
Create a new instance ofthis.constructor[Symbol.species]
and fill it with the bytes at the indices from (including)start
to (excluding)end
.
6.7.2 Atomics
The main operand of Atomics
functions must be an instance of Int8Array
, Uint8Array
, Int16Array
, Uint16Array
, Int32Array
or Uint32Array
. It must wrap a SharedArrayBuffer
.
All functions perform their operations atomically. The ordering of store operations is fixed and can’t be reordered by compilers or CPUs.
6.7.2.1 Loading and storing
Atomics.load(ta : TypedArray<T>, index) : T
Read and return the element ofta
atindex
.Atomics.store(ta : TypedArray<T>, index, value : T) : T
Writevalue
tota
atindex
and returnvalue
.Atomics.exchange(ta : TypedArray<T>, index, value : T) : T
Set the element ofta
atindex
tovalue
and return the previous value at that index.Atomics.compareExchange(ta : TypedArray<T>, index, expectedValue, replacementValue) : T
If the current element ofta
atindex
isexpectedValue
, replace it withreplacementValue
. Return the previous (or unchanged) element atindex
.
6.7.2.2 Simple modification of Typed Array elements
Each of the following functions changes a Typed Array element at a given index: It applies an operator to the element and a parameter and writes the result back to the element. It returns the original value of the element.
Atomics.add(ta : TypedArray<T>, index, value) : T
Performta[index] += value
and return the original value ofta[index]
.Atomics.sub(ta : TypedArray<T>, index, value) : T
Performta[index] -= value
and return the original value ofta[index]
.Atomics.and(ta : TypedArray<T>, index, value) : T
Performta[index] &= value
and return the original value ofta[index]
.Atomics.or(ta : TypedArray<T>, index, value) : T
Performta[index] |= value
and return the original value ofta[index]
.Atomics.xor(ta : TypedArray<T>, index, value) : T
Performta[index] ^= value
and return the original value ofta[index]
.
6.7.2.3 Waiting and waking
Waiting and waking requires the parameter ta
to be an instance of Int32Array
.
Atomics.wait(ta: Int32Array, index, value, timeout=Number.POSITIVE_INFINITY) : ('not-equal' | 'ok' | 'timed-out')
If the current value atta[index]
is notvalue
, return'not-equal'
. Otherwise go to sleep until we are woken up viaAtomics.wake()
or until sleeping times out. In the former case, return'ok'
. In the latter case, return'timed-out'
.timeout
is specified in milliseconds. Mnemonic for what this function does: “wait ifta[index]
isvalue
”.Atomics.wake(ta : Int32Array, index, count)
Wake upcount
workers that are waiting atta[index]
.
6.7.2.4 Miscellaneous
Atomics.isLockFree(size)
This function lets you ask the JavaScript engine if operands with the givensize
(in bytes) can be manipulated without locking. That can inform algorithms whether they want to rely on built-in primitives (compareExchange()
etc.) or use their own locking.Atomics.isLockFree(4)
always returnstrue
, because that’s what all currently relevant supports.
6.8 FAQ
6.8.1 What browsers support Shared Array Buffers?
At the moment, I’m aware of:
- Firefox (50.1.0+): go to
about:config
and setjavascript.options.shared_memory
totrue
- Safari Technology Preview (Release 21+): enabled by default.
- Chrome Canary (58.0+): There are two ways to switch it on.
- Via
chrome://flags/
(“Experimental enabled SharedArrayBuffer support in JavaScript”) —js-flags=—harmony-sharedarraybuffer —enable-blink-feature=SharedArrayBuffer
- Via
6.9 Further reading
More information on Shared Array Buffers and supporting technologies:
- “Shared memory – a brief tutorial” by Lars T. Hansen
- “A Taste of JavaScript’s New Parallel Primitives” by Lars T. Hansen [a good intro to Shared Array Buffers]
- “SharedArrayBuffer and Atomics Stage 2.95 to Stage 3” (PDF), slides by Shu-yu Guo and Lars T. Hansen (2016-11-30) [slides accompanying the ES proposal]
“The Basics of Web Workers” by Eric Bidelman [an introduction to web workers]
Other JavaScript technologies related to parallelism:“The Path to Parallel JavaScript” by Dave Herman [a general overview of where JavaScript is heading after the abandonment of PJS]
“Write massively-parallel GPU code for the browser with WebGL” by Steve Sanderson [fascinating talk that explains how to get WebGL to do computations for you on the GPU]
Background on parallelism:“Concurrency is not parallelism” by Rob Pike [Pike uses the terms “concurrency” and “parallelism” slightly differently than I do in this chapter, providing an interesting complementary view]
Acknowledgement: I’m very grateful to Lars T. Hansen for reviewing this chapter and for answering mySharedArrayBuffer
-related questions.