Asterius is a Haskell-to-WebAssembly compiler based on GHC. It compiles simple Haskell source files or Cabal executable targets to WebAssembly+JavaScript code which can be run in Node.js or browsers. It features seamless JavaScript interop (lightweight async FFI with Promise support) and small output code (~600KB hello.wasm for a Hello World). Many common Haskell packages like lens are already supported. The project is actively maintained by Tweag I/O.
Have questions? Need help? Tweet at @tweagio.
Overview
Asterius compiles Haskell code to WebAssembly (Wasm). Its frontend is based on GHC. The Asterius pipeline provides everything needed to create a Wasm instance which exports the foreign exported functions (e.g. main) that can be called from JavaScript to execute the main Haskell program.
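As a minimal illustration of the kind of input Asterius accepts (the file name hello.hs is just an example), the following program can be compiled with ahc-link --input-hs hello.hs:
-- hello.hs: a minimal program; ahc-link turns it into hello.wasm plus JavaScript glue.
main :: IO ()
main = putStrLn "Hello from Asterius!"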
We host prebuilt container images on Docker Hub under the terrorjack/asterius repository. The images work with podman or docker.
Whenever the master branch gets a new commit, we trigger an image build on our infrastructure. After the build completes, we push to the terrorjack/asterius:latest tag. When trying asterius locally, it's recommended to use terrorjack/asterius:latest, since it follows master closely.
The images are built with the gitrev label to indicate the exact asterius repository revision. Use docker inspect terrorjack/asterius | grep "gitrev" to find out the revision info.
You may want to stick with a specific version of the prebuilt image for some time, for more reproducibility in e.g. CI builds. In that case, browse the tags page and use an image with a specific tag, e.g. terrorjack/asterius:200520. We always push a versioned tag before updating the latest tag.
We recommend podman for running containers from our prebuilt images. The following commands are compatible with docker as well; simply change podman to docker.
The images can be used interactively. Navigate to the project directory and use the following command to start an interactive bash session, mounting the current directory to /workspace. In the bash session we can use tools like ahc-cabal, ahc-dist or ahc-link to compile the Haskell sources.
terrorjack@hostname:/project$ podman run -it --rm -v $(pwd):/workspace -w /workspace terrorjack/asterius
root@hostname:/workspace#
It's also possible to use the images in a non-interactive manner:
terrorjack@hostname:/project$ podman run --rm -v $(pwd):/workspace -w /workspace terrorjack/asterius ahc-link --input-hs example.hs
Check the reference of the docker run command for details. podman run accepts most arguments of docker run and has its own extensions.
podman-specific tips
When using the prebuilt image with podman, things should work out of the box with the default configuration. Check the official installation guide on how to install podman in your environment. If you'd like to use podman as a non-root user, make sure to check the official tutorial for non-root users before usage.
docker-specific tips
When using the prebuilt image with docker, there's a file permission problem with the default configuration: the default user in the container is root, and the processes run as the host root user as well. So programs like ahc-link will create output files owned by root in the host file system, which is a source of annoyance. Things still work fine as long as you don't mind manually calling chown to fix the permissions.
The proper solution is to remap the root user inside the container to the current non-root user. See the official docker userns-remap guide and this blog post for further explanation.
Building asterius locally
Asterius is organized as a stack project at the moment. The reason is mainly historical: stack has built-in support for managing different sandboxed GHC installations, and we used to require a custom GHC fork to build, so using stack has been more convenient.
In principle, building with cabal should also work, but this hasn't been tested on CI yet. Some additional work is needed (checking in generated .cabal files, setting up a cabal project, etc.), and PRs are welcome.
In addition to regular GHC dependencies, these dependencies are needed in the local environment:
git
binaryen (at least version_98)
automake, autoconf (required by ahc-boot)
cabal (at least v3.0.0.0)
node, npm (at least v12)
python3
stack
wasi-sdk (the WASI_SDK_PREFIX environment variable must point to the installation)
After checking out, one needs to run a script to generate the in-tree private GHC API packages required by Asterius.
$ mkdir lib
$ pushd lib
$ ../utils/make-packages.py
$ rm -rf ghc
$ popd
The make-packages.py script will check out our custom GHC fork, run hadrian to generate some autogen files, and generate several Haskell packages in lib. A run takes ~5min on CI. This script only needs to be run once; after that, Asterius can be built using vanilla GHC.
If it's inconvenient to run make-packages.py, it's also possible to download the generated packages from the CI artifacts. Check the CI log of a recent commit; one of the artifacts is named lib. Download and unzip it in the project root directory.
Building asterius
After checking out and running make-packages.py, simply run stack build asterius to build it.
After the asterius package is built, run stack exec ahc-boot to perform booting. This will compile the standard libraries to WebAssembly and populate the asterius global package database. Some packages are compiled using ahc-cabal in the boot process, so internet access is required at least for the first boot.
Using asterius
After the booting process completes, it's possible to use stack exec to call the executables of asterius, e.g. ahc-link or ahc-cabal. Although it's possible to use stack install asterius to install the executables to somewhere in PATH and call them directly later, this is not recommended, since the asterius executables rely on certain components in the PATH set up by stack exec.
If direnv is enabled, the shell session can automatically set up the correct PATH when navigating into the asterius project directory. Thus it's possible to directly call ahc-boot for booting, ahc-link for compiling, etc.
For trying small examples, it's convenient to put them in the test directory under the project root directory, since it's a .gitignore item, so they won't be tracked by git.
Using asterius with Docker
Using the prebuilt Docker image
The recommended way of trying asterius is using our prebuilt Docker image on Docker Hub. The image is updated regularly upon new master branch commits, and also ships ~2k prebuilt packages from a recent Stackage snapshot, so it's convenient to test simple examples which use common dependencies without needing to set up a cabal project.
To use the image, mount the working directory containing the Haskell source code as a Docker shared volume, then use the ahc-link program:
username@hostname:~/project$ docker run --rm -it -v $(pwd):/project -w /project terrorjack/asterius
asterius@hostname:/project$ ahc-link --input-hs main.hs
Check the official reference of docker run to learn more about the command given in the example above. The example opens an interactive bash session for exploration, but it's also possible to use docker run to invoke the Asterius compiler on local Haskell source files non-interactively. Note that podman can be used instead of docker here.
The prebuilt Docker image can be reproduced by building from the in-tree Dockerfiles.
base.Dockerfile can be used to build the base image. The base image contains an out-of-the-box installation of asterius, but doesn't come with the additional Stackage packages. There's very aggressive trimming logic in base.Dockerfile to make the image slimmer, so the resulting base image doesn't contain a complete stack project directory for asterius, and it's not possible to modify the Haskell logic of asterius and partially rebuild/reboot it given a base image.
stackage.Dockerfile can be used to build the image containing additional Stackage packages on top of the base image. Modify lts.sh to add/remove packages to be built into the final image, and ghc-toolkit/boot-libs/cabal.config to modify the package version constraints. All the Stackage packages are installed into the asterius global package database, so they can be directly used by ahc-link, but this shouldn't affect ahc-cabal installing other versions of those packages elsewhere.
dev.Dockerfile is used to build terrorjack/asterius:dev, which is the image for VSCode remote containers.
Asterius now has preliminary Cabal support. By substituting toolchain executables like ghc/ghc-pkg and supplying some other configure options, Cabal can build static libraries and "executables" using Asterius. The "executables" can be quickly converted to node/web artifacts using ahc-dist.
We also provide ahc-cabal, which is a wrapper for cabal. ahc-cabal works with typical nix-style commands like new-update/new-build, etc. The legacy commands with the v1 prefix may also work.
ahc-link/ahc-dist
ahc-link is the frontend program of Asterius. It takes a Haskell Main module and optionally an ES6 "entry" module as input, then emits a .wasm WebAssembly binary module and companion JavaScript files, which can then be run in environments like Node.js or browsers.
ahc-dist works similarly, except that it takes the pseudo-executable file generated by ahc-cabal as input. All command-line arguments are the same as for ahc-link, except that ahc-link takes --input-hs, while ahc-dist takes --input-exe.
Compiling a Haskell file and immediately running the result with node: ahc-link --input-hs hello.hs --run
Compiling for browsers and bundling the JavaScript modules into a single script: ahc-link --input-hs hello.hs --browser --bundle
Compiling a Cabal executable target: ahc-cabal new-install --installdir . hello && ahc-dist --input-exe hello --run
--input-hs ARG
The Haskell Main module's file path. This option is mandatory; all others are optional. Works only for ahc-link.
The Main module may reference other local modules, as well as packages in the asterius global package database.
--input-exe ARG
The pseudo-executable file path. A pseudo-executable is produced by using ahc-cabal to compile a Cabal executable target. This works only for ahc-dist, and is also mandatory.
--input-mjs ARG
The ES6 "entry" module's file path. If not specified, a default entry module will be generated, e.g. xxx.hs's entry script will be xxx.mjs. The entry module can either be run by node or included in a <script> tag, depending on the target supplied at link time.
It's possible to override the default behavior by specifying your own entry module. The easiest way to write a custom entry module is to modify the default one:
import * as rts from "./rts.mjs";
import module from "./xxx.wasm.mjs";
import req from "./xxx.req.mjs";
module
.then(m => rts.newAsteriusInstance(Object.assign(req, { module: m })))
.then(i => {
i.exports.main();
});
xxx.wasm.mjs and xxx.req.mjs are generated at link time. xxx.wasm.mjs exports a default value, which is a Promise resolving to a WebAssembly.Module value. xxx.req.mjs exports the "request object" containing app-specific data required to initialize an instance. After adding the module field to the request object, the result can be used as the input to newAsteriusInstance exported by rts.mjs.
newAsteriusInstance will eventually resolve to an Asterius instance object. Using the instance object, one can call the exported Haskell functions.
--output-directory ARG
Specifies the output directory. Defaults to the same directory as --input-hs.
--output-prefix ARG
Specifies the prefix of the output files. Defaults to the base filename of --input-hs, so for xxx.hs we generate xxx.wasm, xxx.req.mjs, etc.
--verbose-err
This flag enables more verbose runtime error messages. By default, the data segments related to runtime messages and the function name section are stripped from the output WebAssembly module for a smaller binary size.
When reporting a runtime error in the asterius issue tracker, it is recommended to compile and run the example with --verbose-err so there's more info available.
--no-main
This is useful for compiling and linking a non-Main module. It passes -no-hs-main to GHC when linking, and the usual i.exports.main() main function won't be available.
Note that the default entry script won't work for such modules, since there isn't an exported main function, but it's still possible to export other Haskell functions and call them from JavaScript; do not forget to use --export-function=.. to specify those functions.
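For illustration only (the module name, function and export name below are made up), a non-Main module compiled with --no-main could look like this; at link time it would need something like --export-function=fib_hs so the export survives dead code elimination and is callable from JavaScript:
-- Fib.hs: a hypothetical non-Main module with a single JSFFI static export.
module Fib where

fib :: Int -> Int
fib n = go n 0 1
  where
    go :: Int -> Int -> Int -> Int
    go 0 a _ = a
    go k a b = go (k - 1) b (a + b)

foreign export javascript "fib_hs" fib :: Int -> Int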
--browser
Indicates the output code is targeting the browser environment. By default, the target is Node.js.
Since the runtime contains platform-specific modules, the compiled WebAssembly/JavaScript code only works on a single specific platform. The pseudo-executable generated by ahc or ahc-cabal is platform-independent though; it's possible to compile Haskell to a pseudo-executable, and later use ahc-dist to generate code for different platforms.
--bundle
Instead of generating a bunch of ES6 modules in the target directory, generate a self-contained xxx.js script; running xxx.js has the same effect as running the entry module. Only works for the browser target for now.
--bundle is backed by webpack under the hood and performs minification on the bundled JavaScript file. It's likely beneficial, since it reduces the total size of the scripts and doesn't require multiple requests for fetching them.
--tail-calls
Enable the WebAssembly tail call opcodes. This requires Node.js/Chromium to be called with the --experimental-wasm-return-call flag.
See the "Using experimental WebAssembly features" section for more details.
--optimize-level=N
Set the optimize level of binaryen. Valid values are 0 to 4. The default value is 4.
Check the relevant source code in binaryen for the passes enabled for different optimize/shrink levels here.
--shrink-level=N
Set the shrink level of binaryen. Valid values are 0 to 2. The default value is 2.
--ghc-option ARG
Specify additional GHC options. The {-# OPTIONS_GHC #-} pragma also works.
--run
Runs the output code using node. Ignored for browser targets.
--debug
Switch on the debug mode. The memory trap will be enabled, which replaces all load/store instructions in WebAssembly with load/store functions in JavaScript, performing aggressive validity checks on the addresses.
--yolo
Switch on the yolo mode. Garbage collection will never occur, instead the storage manager will simply allocate more memory upon heap overflows. This is mainly used for debugging potential gc-related runtime errors.
--gc-threshold=N
Set the gc threshold value to N MBs. The default value is 64. The storage manager won't perform actual garbage collection if the size of the active heap region is below the threshold.
--no-gc-sections
Do not run dead code elimination.
--export-function ARG
For each foreign export javascript function f that will be called, a --export-function=f link-time flag is mandatory.
--extra-root-symbol ARG
Specify a symbol to be added to the "root symbol set". Root symbols and their transitive dependencies will survive dead code elimination.
--output-ir
Output the Wasm IRs of the compiled Haskell modules and the resulting module. The IRs aren't intended to be consumed by external tools like binaryen/wabt.
--console-history
The stdout/stderr of the runtime will preserve the already-written content. The UTF-8-decoded history content can be fetched via i.stdio.stdout()/i.stdio.stderr(). These functions will also clear the history when called.
This flag can be useful when writing headless Node.js or browser tests where the stdout/stderr contents need to be compared against a file.
Asterius implements JSFFI, which enables importing sync/async JavaScript code and exporting static/dynamic Haskell functions. The JSFFI syntax and semantics are inspired by JSFFI in GHCJS, but they differ in certain ways.
Marshaling data between Haskell and JavaScript
Directly marshalable value types
There are mainly 3 kinds of marshalable value types which can be directly used as function arguments and return values in either JSFFI imports or exports:
Int, Ptr, StablePtr, etc. When the MagicHash and UnliftedFFITypes extensions are enabled, some unboxed types like Int# are also supported.
The JSVal type and its newtypes.
The Any type.
The JSVal type is exported by Asterius.Types. It represents an opaque JavaScript value in the Haskell world; one can use JSFFI imports to obtain JSVal values, pass them between Haskell and JavaScript, and store them in Haskell data structures like ordinary Haskell values. JSVals are garbage collected, but it's also possible to call freeJSVal to explicitly free them in the runtime.
The Any type in GHC.Exts represents a boxed Haskell value, which is a managed pointer into the heap. This is only intended to be used by power users.
Just like regular ccall imports/exports, the result type of javascript imports/exports can be wrapped in IO or not.
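As a small sketch (jsMax and jsLog are illustrative imports, not part of the standard library): a JavaScript expression whose result depends only on its arguments can be imported at a pure type, while an effectful one should return in IO:
import Asterius.Types

-- Pure import: the result only depends on the arguments.
foreign import javascript "Math.max($1, $2)" jsMax :: Int -> Int -> Int

-- Effectful import: logging is a side effect, so the result lives in IO.
foreign import javascript "console.log($1)" jsLog :: JSString -> IO ()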
JSVal family of types
Other than JSVal, Asterius.Types additionally exports these types:
JSArray
JSFunction
JSObject
JSString
JSUint8Array
They are newtypes of JSVal and can be directly used as argument or result types as well. The runtime doesn't perform type-checking on the JavaScript side, e.g. it won't check that typeof $1 === "string" when $1 is declared as a JSString. It's up to the users to guarantee the runtime invariants of such JSVal wrapper types.
User-defined newtypes of JSVal can also be used as marshalable value types, as long as the newtype constructor is available in scope.
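A hedged sketch of such a user-defined wrapper (JSDate and both imports below are made up for illustration):
import Asterius.Types

-- A newtype of JSVal; the constructor is in scope, so it can appear in JSFFI types.
newtype JSDate = JSDate JSVal

foreign import javascript "new Date()" newDate :: IO JSDate
foreign import javascript "$1.getFullYear()" getFullYear :: JSDate -> IO Int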
Given the ability to pass simple value types, one can implement their own utilities for passing a piece of structured data either from JavaScript to Haskell, or vice versa.
To build a Haskell data structure from a JavaScript value, usually we write a builder function which recursively traverses the substructure of the JavaScript value (sequence, tree, etc) and build up the Haskell structure, passing one cell at a time. Similarly, to pass a Haskell data structure to JavaScript, we traverse the Haskell data structure and build up the JavaScript value.
The Asterius standard library provides functions for common marshaling purposes:
import Asterius.Aeson
import Asterius.ByteString
import Asterius.Text
import Asterius.Types
fromJSArray :: JSArray -> [JSVal]
toJSArray :: [JSVal] -> JSArray
fromJSString :: JSString -> String
toJSString :: String -> JSString
byteStringFromJSUint8Array :: JSUint8Array -> ByteString
byteStringToJSUint8Array :: ByteString -> JSUint8Array
textFromJSString :: JSString -> Text
textToJSString :: Text -> JSString
jsonToJSVal :: ToJSON a => a -> JSVal
jsonFromJSVal :: FromJSON a => JSVal -> Either String a
jsonFromJSVal' :: FromJSON a => JSVal -> a
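Building on the helpers above, a small sketch (js_length and stringLength are made up for the example) that measures a Haskell String on the JavaScript side:
import Asterius.Types

foreign import javascript "$1.length" js_length :: JSString -> IO Int

-- Convert the String to a JSString first, then ask JavaScript for its length.
stringLength :: String -> IO Int
stringLength = js_length . toJSString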
The 64-bit integer precision problem
Keep in mind that when passing 64-bit integers via Int, Word, etc., precision can be lost, since they're represented by numbers on the JavaScript side. In the future, we may consider using bigints instead of numbers as the JavaScript representations of 64-bit integers to solve this issue.
import Asterius.Types
foreign import javascript unsafe "new Date()" current_time :: IO JSVal
foreign import javascript interruptible "fetch($1)" fetch :: JSString -> IO JSVal
The source text of a foreign import javascript declaration should be a single valid JavaScript expression, using $n to refer to the n-th argument (starting from 1). It's possible to use an IIFE (Immediately Invoked Function Expression) as the source text, so more advanced JavaScript constructs can be used.
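For example (sumTo is an illustrative import, not from the library), an IIFE lets the source text use statements and loops while still being a single expression:
-- The whole IIFE is one JavaScript expression, so it's a valid import source text.
foreign import javascript
  "(() => { let s = 0; for (let i = 1; i <= $1; ++i) s += i; return s; })()"
  sumTo :: Int -> IO Int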
The safety level in a foreign import javascript declaration indicates whether the JavaScript logic is asynchronous. When omitted, the default is unsafe, which means the JavaScript code returns the result synchronously. When calling an unsafe import, the whole runtime blocks until the result is returned from JavaScript.
The safe and interruptible levels mean the JavaScript code should return a Promise which later resolves with the result. The current thread is suspended when such an import function is called, and resumed when the Promise resolves or rejects. Other threads may continue execution while a thread is blocked by a call to an async import.
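A minimal sketch of an async import (delayMs is made up for the example; setTimeout is assumed to exist in the target environment):
-- The JavaScript expression returns a Promise; the calling Haskell thread is
-- suspended until it resolves, while other threads keep running.
foreign import javascript interruptible
  "new Promise(resolve => setTimeout(resolve, $1))"
  delayMs :: Int -> IO ()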
When calling a JSFFI import function, the JavaScript code may synchronously throw exceptions or reject the Promise with errors. These are wrapped as JSExceptions and thrown in the calling thread, and the JSExceptions can be handled like regular synchronous exceptions in Haskell. JSException is also exported by Asterius.Types; it contains both a JSVal reference to the original JavaScript exception/rejection value, and a String representation of the error, possibly including a JavaScript stack trace.
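Assuming JSException carries the usual Exception instance (as the text above implies), a hedged sketch of catching a rejected fetch (tryFetch is made up for the example):
import Asterius.Types
import Control.Exception (try)

foreign import javascript interruptible "fetch($1)" fetch :: JSString -> IO JSVal

-- A rejected Promise surfaces as a JSException in the calling thread.
tryFetch :: JSString -> IO (Either JSException JSVal)
tryFetch url = try (fetch url)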
In the source text of a foreign import javascript declaration, one can access everything in the global scope as well as the function arguments. Additionally, there is an __asterius_jsffi binding which represents the Asterius instance object. __asterius_jsffi exposes certain interfaces for power users, e.g. __asterius_jsffi.exposeMemory(), which exposes a memory region as a JavaScript typed array. These interfaces are largely undocumented and not likely to be useful to regular users.
There is one usage of __asterius_jsffi which may be useful to regular users though. Say we'd like the JSFFI import code to call some 3rd-party library code, but we don't want to pollute the global scope: we can assign the library functions as additional fields of the Asterius instance object after it's returned by newAsteriusInstance(), then access them via __asterius_jsffi in the JSFFI import code.
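For instance (myLib and greet are hypothetical fields you would attach to the instance object yourself after newAsteriusInstance() returns):
import Asterius.Types

-- Reaches a library function stored on the Asterius instance object instead of
-- the global scope.
foreign import javascript "__asterius_jsffi.myLib.greet($1)"
  greet :: JSString -> IO ()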
foreign export javascript "mult_hs" (*) :: Int -> Int -> Int
The foreign export javascript syntax can be used to export a static top-level Haskell function to JavaScript. The source text is the export function name, which must be globally unique. The supported export function types are the same as for JSFFI imports.
For each exported function we need to call from JavaScript, an additional --export-function flag needs to be passed to ahc-link/ahc-dist at link time, e.g. --export-function=mult_hs.
In JavaScript, after newAsteriusInstance() returns the Asterius instance object, one can access the exported functions in the exports field:
const r = await i.exports.mult_hs(6, 7);
Note that all exported Haskell functions are async JavaScript functions. The returned Promise resolves with the result when the thread successfully returns; otherwise it may reject with a JavaScript string, which is the serialized form of the Haskell exception, if present.
It's safe to call a JSFFI export function multiple times, or to call another JSFFI export function before a previous call resolves/rejects. The export functions can be passed around as first-class JavaScript values, called as ordinary JavaScript functions or indirectly as JavaScript callbacks. They can even be imported back into Haskell as JSVals and called in Haskell.
import Asterius.Types
foreign import javascript "wrapper" makeCallback :: (JSVal -> IO ()) -> IO JSFunction
foreign import javascript "wrapper oneshot" makeOneshotCallback :: (JSVal -> IO ()) -> IO JSFunction
freeHaskellCallback :: JSFunction -> IO ()
The foreign import javascript "wrapper" syntax can be used to dynamically export a Haskell function closure as a JavaScript function. The type signature must be of the form Fun -> IO JSVal, where Fun is a marshalable JSFFI function type as in JSFFI imports or static exports, and the result can be JSVal or one of its newtypes.
After declaring the "wrapper" function, one can pass a Haskell function closure to it and obtain a JSVal reference to the exported JavaScript function. The exported function can be used in the same way as a JSFFI static export.
When a JSFFI dynamic export is no longer useful, call freeHaskellCallback to free it. The JSVal reference of the JavaScript callback as well as the StablePtr of the Haskell closure will be freed.
Sometimes we expect a JSFFI dynamic export to be one-shot, i.e. called only once. For such one-shot exports, use foreign import javascript "wrapper oneshot". The runtime will automatically free the resources once the exported JavaScript function is invoked, so there's no need to manually call freeHaskellCallback for one-shot exports.
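Putting the pieces together, a hedged sketch of handing a Haskell closure to a JavaScript API as a callback (jsSetInterval and startTicker are made up for the example; setInterval is assumed to exist in the target environment):
import Asterius.Types

foreign import javascript "wrapper" makeCallback :: (JSVal -> IO ()) -> IO JSFunction

foreign import javascript "setInterval($1, $2)"
  jsSetInterval :: JSFunction -> Int -> IO JSVal

-- Wraps a Haskell closure as a JS function and registers it as a periodic callback.
startTicker :: IO JSFunction
startTicker = do
  cb <- makeCallback (\_ -> putStrLn "tick")
  _ <- jsSetInterval cb 1000
  pure cb -- keep this reference around; free it later with freeHaskellCallback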
We added hooks to these iserv-related functions:
startIServ
stopIServ
iservCall
readIServ
writeIServ
The hook of hscCompileCoreExpr is also used. The implementations of the hooks are in Asterius.GHCi.Internals.
Normally, startIServ and stopIServ start/stop the current iserv process. We don't use the normal iserv library though; we use inline-js-core to start a node process. inline-js-core has its own mechanism of message passing between the host and node, which is used for sending JavaScript code to node for execution and getting the results back. In the case of TH, the linked JavaScript and WebAssembly code is sent. Additionally, we create POSIX pipes and pass the file descriptors as environment variables to the sent code, so most TH messages are still passed via the pipes, as with normal iserv processes.
The iservCall function is used for sending a Message to iserv and synchronously getting the result. The sent messages are related to linking, like loading archives and objects. Normally, linking is handled by the iserv process, since it's linked with GHC's own runtime linker. In our case, porting GHC's runtime linker to WebAssembly would be a huge project, so we still perform TH linking in the host ahc process. The linking messages aren't sent to node at all; using the hooked iservCall, we maintain our own in-memory linker state which records information like the loaded archives and objects.
When splices are executed, GHC first emits a RunTH message, then repeatedly queries the response message from iserv; if it's a RunTHDone, the dust settles and GHC reads the execution result. The response message may also be a query to GHC, in which case GHC sends back the query result and repeats the loop. In our case, we don't send the RunTH message itself to node; RunTH indicates execution has begun, so we perform linking and use inline-js-core to load the linked JavaScript and WebAssembly code, then create and initialize the Asterius instance object. The splice's closure address is known at link time, so we can apply the TH runner's function closure to the splice closure and kick off evaluation from there. The TH runner function creates a fresh IORef QState and a Pipe from the passed-in pipe file descriptors, and uses the ghci library's own runTH function to run the splice. During execution, the Quasi class methods may be called; on the node side they are turned into THMessages sent back to the host via the Pipe, and the responses are then fetched.
Our function signatures of readIServ and writeIServ are modified. Normal GHC simply uses Get and Put from the binary library for reading/writing via the Pipe, but we read/write a polymorphic type variable a with Binary and Typeable constraints. The Binary constraint allows fetching the needed get and put functions, and Typeable allows us to inspect the message before serialization. This is important, since we need to catch RunTH or RunModFinalizer messages. As mentioned before, these messages aren't sent to node, and we have special logic to handle them.
As for hscCompileCoreExpr: it's used for compiling the CoreExpr of a splice and getting the resulting RemoteHValue. We don't support GHC bytecode, so we overload it and go through the regular pipeline, compiling down to Cmm, then WebAssembly, and finally performing linking, using the closures of the TH runner function and the splice as "root symbols". The resulting RemoteHValue is not "remote" though; it's simply the static address of the splice's closure, and the TH runner function needs to encapsulate it as a RemoteRef before feeding it to runTH.
TH WIP branch: asterius-TH
GitHub Project with relevant issues
Invoking RTS API in JavaScript
For the brave souls who prefer to play with raw pointers instead of syntactic sugar, it's possible to invoke the RTS API directly in JavaScript.
Here is a simple example. Suppose we have a Main.fact function:
fact :: Int -> Int
fact 0 = 1
fact n = n * fact (n - 1)
The first step is ensuring that fact is actually contained in the final WebAssembly binary produced by ahc-link. ahc-link performs aggressive dead-code elimination (or more precisely, live-code discovery) by starting from a set of "root symbols" (usually Main_main_closure, which corresponds to Main.main), repeatedly traversing ASTs and including any discovered symbols. So if Main.main does not have a transitive dependency on fact, fact won't be included in the binary. In order to include fact, either use it in some way in main, or supply the --extra-root-symbol=Main_fact_closure flag to ahc-link when compiling.
The next step is locating the pointer of fact. The "Asterius instance" type we mentioned before contains two "symbol map" fields: staticsSymbolMap maps static data symbols to absolute addresses in the linear memory, and functionSymbolMap maps function symbols to WebAssembly function table indices. In this case, we can use i.staticsSymbolMap.Main_fact_closure as the pointer value of Main_fact_closure. For a Haskell top-level function, there are also pointers to the info table and the entry function, but we don't need those two in this example.
Since we'd like to call fact, we need to apply it to an argument, build a thunk representing the result, then evaluate the thunk to WHNF and retrieve the result. Assuming we're passing --asterius-instance-callback=i=>{ ... } to ahc-link, in the callback body we can use the RTS API like this:
const argument = i.exports.rts_mkInt(5);
const thunk = i.exports.rts_apply(i.staticsSymbolMap.Main_fact_closure, argument);
const tid = i.exports.rts_eval(thunk);
console.log(i.exports.rts_getInt(i.exports.getTSOret(tid)));
A line-by-line explanation follows:
Assuming we'd like to calculate fact 5, we need to build an Int object whose value is 5. We can't directly pass the JavaScript 5; instead we call rts_mkInt, which properly allocates a heap object and sets up the info pointer of an Int value. When we need to pass a value of a basic type (e.g. Int, StablePtr, etc.), we should always call the corresponding rts_mk* function and use the returned pointer to the allocated heap object.
Then we apply fact to 5 using rts_apply. It builds a thunk without triggering evaluation. If we are dealing with a curried multi-argument function, we chain rts_apply calls until we get a thunk representing the final result.
Finally, we call rts_eval, which enters the runtime and performs all the evaluation for us. There are different types of evaluation functions:
rts_eval evaluates a thunk of type a to WHNF.
rts_evalIO evaluates the result of an IO a to WHNF.
rts_evalLazyIO evaluates an IO a without forcing the result to WHNF. It is also the default evaluator used by the runtime to run Main.main.
All rts_eval* functions initiate a new Haskell thread for evaluation, and they return a thread ID. The thread ID is useful for inspecting whether the evaluation succeeded and what the result is.
If we need to retrieve the result back in JavaScript, we must pick an evaluator function which forces the result to WHNF. The rts_get* functions assume the objects are already evaluated and won't trigger evaluation.
Assuming we stored the thread ID in tid, we can use getTSOret(tid) to retrieve the result. The result is always a pointer into the Haskell heap, so we additionally need to use rts_getInt to retrieve the unboxed Int content in JavaScript.
Most users probably don't need to use the RTS API manually, since the foreign import/export syntactic sugar and the makeHaskellCallback interface should be sufficient for typical use cases of Haskell/JavaScript interaction. Still, it doesn't hurt to know what is hidden beneath the syntactic sugar: foreign import/export is implemented by automatically generating stub WebAssembly functions which call the RTS API for you.
This section explains the various IR types in Asterius, and hopefully presents a clear picture of how information flows from Haskell to WebAssembly. (There's a similar section in jsffi.md which explains the implementation details of JSFFI.)
Everything starts from Cmm, or more specifically "raw" Cmm, which satisfies:
All calls are tail calls; parameters are passed via global registers like R1 or on the stack.
All info tables are converted to binary data segments.
Check the Cmm module in the ghc package to get started on Cmm.
Asterius obtains in-memory raw Cmm via the cmmToRawCmmHook in our custom GHC fork. This allows us to lay our fingers on the Cmm generated by either compiling Haskell modules or .cmm files (which are in rts).
There is some abstraction in ghc-toolkit: the compiler logic is actually in the Compiler datatype as a set of callbacks, and ghc-toolkit converts them to hooks, frontend plugins and ghc executable wrappers.
There is one minor annoyance with the Cmm types in GHC (or any other GHC IR type): it's very hard to serialize/deserialize them without setting up complicated contexts related to package databases, etc. To experiment with new backends, it's reasonable to marshal to a custom serializable IR first.
Pre-linking expression IR
We then marshal raw Cmm to an expression IR defined in Asterius.Types. Each compilation unit (Haskell module or .cmm file) maps to one AsteriusModule, and each AsteriusModule is serialized to a .asterius_o object file which is deserialized at link time. Since we serialize/deserialize a structured expression IR faithfully, it's possible to perform aggressive LTO by traversing/rewriting the IR at link time, and that's what we're doing right now.
The expression IR is mostly a Haskell modeling of a subset of binaryen's expression IR, with some additions:
Unresolved-related variants, which allow us to use a symbol as an expression. At link time, the symbols are rewritten to absolute addresses.
Unresolved locals/globals. At link time, unresolved locals are laid out as Wasm locals, and unresolved globals (which are really just Cmm global registers) become fields in the global Capability's StgRegTable.
EmitErrorMessage, a placeholder for emitting a string error message and then trapping. At link time, such error messages are collected into an "error message pool", and the Wasm code just calls an error message reporting function with an array index.
Null. We're civilized, educated functional programmers and should really be using Maybe Expression in some fields instead of adding a Null constructor, but this is just handy. Blame me.
It's possible to encounter things we can't handle in Cmm (unsupported primops, etc.), so an AsteriusModule also contains compile-time error messages for the unsupported bits. The errors are not reported immediately; instead they are deferred to runtime error messages. (Ideally they'd be link-time errors, but that turns out to be hard.)
The symbols are simply converted to Z-encoded strings that also contain module prefixes, and they are assumed to be unique across different compilation units.
The store
There's an AsteriusStore type in Asterius.Types. It's an immutable data structure that maps symbols to the underlying entities in the expression IR for every single module, and it is a critical component of the linker.
Modeling the store as a self-contained data structure makes it pleasant to write linker logic, at the cost of exploding RAM usage. So we implemented a poor man's KV store in Asterius.Store which performs lazy loading of modules: when initializing the store, we only load the symbols, not the actual modules; only when a module is "requested" for the first time do we deserialize it.
AsteriusStore supports merging. It's a handy operation, since we can first initialize a "global" store that represents the standard libraries, then make another store from compiling the user input, merge the two, and start linking from the merged store.
At link time, we take the AsteriusStore which contains everything (standard libraries and user input code), then perform live-code discovery: starting from a "root symbol set" (something like Main_main_closure), we iteratively fetch entities from the store, traverse the ASTs and collect new symbols. When we reach a fixpoint, that fixpoint is the outcome of the dependency analysis, representing a self-contained Wasm module.
We then do some rewriting work on the self-contained module: making symbol tables, rewriting symbols to absolute addresses, using our own relooper to convert control-flow graphs to structured control flow, etc. Most of the logic is in Asterius.Resolve.
The output of the linker is a Module. It differs from AsteriusModule: although it shares quite a few datatypes with AsteriusModule (for example, Expression), it guarantees that some variants will not appear (for example, Unresolved*). A Module is ready to be fed to a backend which emits real Wasm binary code.
There are some useful linker byproducts. For example, there's LinkReport, which contains mappings from symbols to addresses; these are lost in the Wasm binary code but are still useful for debugging.
Once we have a Module (which is essentially just a Haskell modeling of the binaryen C API), we can invoke binaryen to validate it and generate Wasm binary code. The low-level bindings are maintained in the binaryen package, and Asterius.Marshal contains the logic to call the imported functions to do the actual work.
We can also convert a Module to the IR types of wasm-toolkit, which is our native Haskell Wasm engine. It's now the default backend of ahc-link, but the binaryen backend can still be chosen via ahc-link --binaryen.
To make it actually run in Node.js/Chrome, we need two pieces of JavaScript code:
A common runtime which can be reused across different Asterius-compiled modules. It's in asterius/rts/rts.js.
Stub code which contains module-specific information like error messages, etc.
The linker generates the stub script along with the Wasm binary code, and concatenates the runtime and the stub script into a self-contained JavaScript file which can be run or embedded. It's possible to target either Node.js or Chrome via ahc-link flags.
There is a runtime debugging mode which can be enabled by the --debug flag of ahc-link. When enabled, the compiler inserts "tracing" instructions in the following places:
SetLocal when the local type is I64
I64
The tracing messages are quite helpful in observing control flow transfers and memory operations. Remember to also use the --output-link-report flag to dump the linking report, which contains the mapping from data/function symbols to addresses.
The runtime debugging mode also enables a "memory trap" which intercepts every memory load/store instruction and checks whether the address is a null pointer or lies in another uninitialized region of the linear memory. The program immediately aborts if an invalid address is encountered. (When debugging mode is switched off, the program continues execution and the rest of the control flow is all undefined behavior!)
Virtual address spaces
Remember that we're compiling to wasm32, which has a 32-bit address space, but the host GHC is actually 64-bit, so all pointers in Asterius are 64-bit; upon load/store/call_indirect we truncate the 64-bit pointer and use only the lower 32 bits for indexing.
The higher 32 bits of pointers are idle tag bits at our disposal, so we implemented simple virtual address spaces. The linker/runtime is aware of the distinction between:
The physical address, which is either an i32 index into the linear memory for data, or an i32 index into the table for functions.
The logical address, which is the i64 pointer value we're passing around.
All access to the memory/table goes through logical addresses. The access operations are accompanied by a mapping operation which translates a logical address to a physical one. Currently it's just a truncation, but in the future we may get a more feature-complete mmap/munmap implementation, and some additional computation may occur when the address translation is done.
We chose two magic numbers (in Asterius.Internals.MagicNumber) as the tag bits for data/function pointers. The numbers are chosen so that, when applied, the logical address does not exceed JavaScript's safe integer limit.
When we emit debug log entries, we may encounter various i64 values. We examine the higher 32 bits, and if they match the pointer tag bits, we do a lookup in the data/function symbol table; if there's a hit, we output the symbol alongside the value. This spares us the pain of keeping a lot of symbol/address mappings in our working memory when examining the debug logs. Some false positives may exist in theory (e.g. some random intermediate i64 value in a Haskell computation accidentally colliding with a logical address), but the probability should be very low.
Note that for consistency between vanilla/debug mode, the virtual address spaces are in effect even in vanilla mode. This doesn't add extra overhead, since the truncation instruction for 64-bit addresses has been present since the beginning.
Complete list of emitted debugging log entries
Assertions: some hand-written WebAssembly functions in Asterius.Builtins contain assertions which are only active in debugging mode. Failure of an assertion causes a string error message to be printed and the whole execution flow to abort.
Memory traps: in Asterius.MemoryTrap, we implement a rewriting pass which rewrites all load/store instructions into invocations of load/store wrapper functions. The wrapper functions are defined in Asterius.Builtins; they check the address and trap if it's invalid (null pointer, uninitialized region, etc.).
Control flow: in Asterius.Tracing, we implement a rewriting pass on functions (invoked later at link time in Asterius.Resolve), which emits messages at control-flow transfer points.
There are multiple ways to dump IRs:
Via GHC flags: GHC flags like -ddump-to-file -ddump-cmm-raw dump pretty-printed GHC IRs to files.
Via an environment variable: set the ASTERIUS_DEBUG environment variable; then during booting, a number of IRs (mainly raw Cmm in its AST form, instead of the pretty-printed form) will be dumped.
Via an ahc-link flag: use ahc-link --output-ir to dump IRs when compiling user code.
The asterius project is hosted on GitHub. The monorepo contains several packages:
asterius. This is the central package of the asterius compiler.
binaryen. It contains the latest source code of the C++ library binaryen in tree, and provides complete raw bindings to its C API.
ghc-toolkit. It provides a framework for implementing Haskell-to-X compilers by retrieving ghc's various types of in-memory intermediate representations. It also contains the latest source code of ghc-prim/integer-gmp/integer-simple/base in tree.
wasm-toolkit. It implements the WebAssembly AST and a binary encoder/decoder in Haskell, and is now the default backend for generating WebAssembly binary code.
The asterius package provides an ahc executable which is a drop-in replacement for ghc, to be used with Setup configure. ahc redirects all arguments to the real ghc most of the time, but when it's invoked with the --make major mode, it invokes ghc with its frontend plugin. This is inspired by Edward Yang's How to integrate GHC API programs with Cabal.
Based on ghc-toolkit, asterius implements a ghc frontend plugin which translates Cmm to binaryen IR. The serialized binaryen IR can then be loaded and linked into a WebAssembly binary (not implemented yet). The normal compilation pipeline which generates native machine code is not affected.
In order for asterius to support non-trivial Haskell programs (that is, at least most things in Prelude), it needs to run the compilation process for base and its dependent packages. This process is known as "booting".
The asterius package provides an ahc-boot test suite which tests booting by compiling the wired-in packages provided by ghc-toolkit, using ahc to replace ghc when configuring. This is inspired by Joachim Breitner's veggies.
In Asterius.Builtins, there are WebAssembly shims which serve as our runtime. We choose to write WebAssembly code in Haskell, using Haskell as our familiar meta-language.
As of now, there are two ways of writing WebAssembly code in Haskell. The first is directly manipulating the AST types specified in Asterius.Types. Those types are pretty bare-metal and map closely to binaryen IR. Simply write some code to generate an AsteriusFunction, and ensure the function and its symbol are present in the store when linking starts; it will eventually be bundled into the output WebAssembly binary file.
Directly using Asterius.Types is not a pleasant experience; it's basically a DDoS on one's working memory, since the developer needs to keep a lot of things in mind: parameter/local ids, block/loop labels, etc. Also, the resulting Haskell code is pretty verbose, littered with syntactic noise (e.g. tons of list concatenations when constructing a block).
We now provide an EDSL in Asterius.EDSL for constructing an AsteriusFunction. Its core type is EDSL a, which can be composed via a Monad or Monoid interface. Most builtin functions in Asterius.Builtins have already been refactored to use this EDSL. Typical usages:
"Allocate" a parameter/local. Use param
or local to obtain an immutable Expression which corresponds to the value of a new parameter/local. There are also mutable variants.
An opaque LVal type is provided to uniformly deal with local reads/assignments and memory loads/stores. Once an LVal is instantiated, it can be used to read an Expression in the pure world, or to set an Expression in the EDSL monad.
Several side-effecting instructions can simply be composed with the monadic/monoidal interface, without the need to explicitly construct an anonymous block.
When we need named blocks/loops with branching instructions inside, use the block/loop combinators, which have the type (Label -> EDSL ()) -> EDSL (). Inside the passed-in continuation, we can use break' to perform branching. The Label type is also opaque and cannot be inspected; the only thing we know is that it's scope-checked just like any ordinary Haskell value, so it's impossible to accidentally branch to an "inner" label.
The EDSL only checks for scope safety, so we don't mix up different locals or jump to non-existent labels. Type safety is not guaranteed (the binaryen validator checks for that anyway). Underneath, it's just a shallowly embedded DSL implemented with a plain old state monad. Some people call this the "remote monad design pattern".
WebAssembly as a Haskell compilation target
There are a few issues to address when compiling Cmm to WebAssembly.
Implementing the Haskell stack/heap
The Haskell runtime maintains a TSO (Thread State Object) for each Haskell thread, and each TSO contains a separate stack for the STG machine. The WebAssembly platform has its own "stack" concept though: the execution of WebAssembly is based on a stack machine model, where instructions consume operands on the stack and push new values onto it.
We use the linear memory to simulate the Haskell stack/heap. Popping/pushing the Haskell stack only involves loading/storing on the linear memory. Heap allocation only involves bumping the heap pointer. Running out of space triggers a WebAssembly trap, instead of doing GC.
All discussions in the documentation use the term "stack" for the Haskell stack, unless explicitly stated otherwise.
Implementing STG machine registers
The Haskell runtime makes use of "virtual registers" like Sp, Hp or R1 to implement the STG machine. The NCG (Native Code Generator) tries to map some of the virtual registers to real registers when generating assembly code. However, WebAssembly doesn't have language constructs that map to real registers, so we simply implement Cmm local registers as WebAssembly locals, and global registers as fields of StgRegTable.
WebAssembly currently enforces structured control flow, which prohibits arbitrary branching. Also, explicit tail calls are missing.
The Cmm control flow mainly involves two forms of branching: in-function and cross-function. Each function consists of a map from hoopl labels to basic blocks, plus an entry label. Branching happens at the end of each basic block.
In-function branching is relatively easy to handle. binaryen provides a "relooper" which can recover WebAssembly instructions with structured control flow from a control-flow graph. Note that we're using our own relooper though; see issue #22 for relevant discussion.
Cross-function branching (CmmCall) is tricky. WebAssembly lacks explicit tail calls, and the relooper can't easily be used in this case, since there's a computed goto and the potential targets include all Cmm blocks involved in linking. There are multiple possible ways to handle this situation:
Collect all Cmm blocks into one function and additionally add a "dispatcher" block. All CmmCalls save the callee to a register and branch to the "dispatcher" block, and the "dispatcher" uses br_table or a binary decision tree to branch to the entry block of the callee.
One WebAssembly function per CmmProc; upon a CmmCall the function returns the function id of the callee. A mini-interpreter function at the top level repeatedly invokes the functions using call_indirect. This approach is also used by the unregisterised mode of ghc.
We're using the latter approach: every CmmProc marshals to one WebAssembly function. This choice is tightly coupled with some other functionality (e.g. debug mode), and it would take quite some effort to switch away.
When producing a WebAssembly binary, we need to map CLabels to the precise linear memory locations for CmmStatics, or the precise table ids for CmmProcs. These are unknown when compiling individual modules, so binaryen is invoked only when linking, and during compilation we only convert CLabels to a serializable representation.
Currently the WebAssembly community has a proposal for a linkable object format, and it's prototyped by lld. We'll probably turn to that format and use lld some day, but right now we'll simply stick to our own format for simplicity.
Although wasm64 is scheduled, currently only wasm32 is implemented. However, we are running a 64-bit ghc, and there are several places which need extra care:
wasm32 uses uint32 when indexing into the linear memory.
CmmSwitch labels are 64-bit. CmmCondBranch also checks a 64-bit condition. br_if/br_table operate on uint32.
i32/i64 are supported as wasm32 value types, but in Cmm we also need arithmetic on 8-bit/16-bit integers.
We insert instructions for converting between 32/64 bits in the codegen. The binaryen validator also helps with checking bit lengths.
As for booleans: there's no native boolean type in either WebAssembly or Cmm. As a convention, we use uint32.
The WebAssembly linear memory has a hard-coded page size of 64KB. There are several places which operate in units of pages rather than raw bytes:
CurrentMemory/GrowMemory
The Memory component of a Module
When performing final linking, we lay out the static data segments in the linear memory. We ensure the memory size is always divisible by MBLOCK_SIZE, so it's easy to allocate new mega blocks and calculate the required page count.
The first 8 bytes of the linear memory (from 0x0 to 0x7) are uninitialized. 0x0 is treated as the null pointer, and loading/storing on the null pointer or other uninitialized regions is prohibited. In debug mode the program immediately aborts.
Using experimental WebAssembly features
By default, Asterius only emits code that uses WebAssembly MVP features. There are flags to make use of experimental WebAssembly features:
--tail-calls: emits tail call opcodes for Cmm function calls, overriding the default trampoline approach. Only supported by the wasm-toolkit backend at the moment.
--debug: uses i64/BigInt integration for passing i64 values between JS/Wasm.
The above features require specific flags to be switched on in V8. They are known to work in the latest Node.js 12.x versions, and we test them on CI.
The V8 team maintains a Node.js 13.x build which integrates V8 trunk, described here. It's possible to use that build to evaluate experimental WebAssembly features; we provide a script which unzips the latest test-passing build to the current directory, so the node binary can be used for testing bleeding-edge Wasm features in V8.
We are keeping an eye on the development of experimental WebAssembly features. Here is a list of V8 tracking issues for the features we are interested in. Some are already available in recent Node.js or Chromium releases.
We recommend using VSCode Remote Containers to reproduce the very same dev environment used by our core team members. The steps to set up the dev environment are:
Install VSCode (at least 1.45) and its remote extension
Install podman, and make sure the podman command works with the current user
Set up a docker symlink which points to podman, according to the VSCode announcement of podman support
docker pull terrorjack/asterius:dev
Opening the repo with remote containers for the first time will take some time, since it runs the build script to build asterius and perform booting. Later re-openings will be near instant, since the previous container is reused.
The dev image shall work with docker too, provided the userns-remap related settings are correctly set up. Check the documentation section above for the relevant explanation; when using docker with default settings, there is a file permission issue when mounting your local filesystem into the prebuilt container images.
direnv
If direnv is enabled, the PATH of the current shell session will be extended to include the locations of the Asterius executables. This means it's possible to run ahc-link .. instead of stack exec ahc-link -- ...
ghcid
A known-to-work workflow for hacking on Asterius is using ghcid. We also include an example .ghcid file, so running ghcid at the project root directory shall work out of the box.
Some notes regarding the usage of ghcid:
Run the utils/ghcid.sh script first.
Before committing changes in the Haskell codebase, it would be nice to run stack build --test --no-run-tests to make sure all executables are not broken by lib changes.
As described in the building guide, stack build only builds the Asterius compiler itself; additionally we need to run stack exec ahc-boot to run the compiler on the boot libs. This process is typically only needed once, but there are cases when it needs to be re-run:
The boot libs in ghc-toolkit/boot-libs are modified.
The Asterius.Types module is modified, so the IR types have changed.
The Asterius.CodeGen module is modified and you're sure different code will be generated when compiling the same Haskell/Cmm files.
Most other modifications in the Asterius lib/exes won't need a reboot. Specifically: Asterius.Builtins modifications don't impact the boot cache; the builtin module is generated on the fly with every linker invocation.
When rebooting, run utils/reboot.sh in the project root directory, so that booting uses the up-to-date version of asterius and the boot lib sources.
The ahc-boot process is configurable via these environment variables:
ASTERIUS_CONFIGURE_OPTIONS
ASTERIUS_BUILD_OPTIONS
ASTERIUS_INSTALL_OPTIONS
Use stack-profile.yaml to overwrite stack.yaml, and then run utils/reboot.sh to kick off the rebooting process. This will be quite slow due to the nature of profiled builds; all libraries will be rebuilt with the profiled flavor. It's better to perform a profiled build in a standalone git tree.
Once the profiled build is complete, it's possible to use RTS flags to obtain profile data when compiling Haskell sources. At runtime there are two ways to pass RTS flags to a Haskell executable:
The GHCRTS environment variable
+RTS ... -RTS command line arguments
Always use GHCRTS when running programs like ahc-link, since those programs can spawn other processes (e.g. ahc-ld), and we're often interested in the profile data of all Asterius executables. The GHCRTS environment variable can propagate to all processes.
See the relevant section in the GHC user guide for more information on profiling Haskell apps. There are also some third-party applications useful for analyzing the profiling data, e.g. eventlog2html, ghc-prof-flamegraph.
For now, a major problem with the profiled build is that it seems to emit dysfunctional code which doesn't work. Consequently, this affects the TH runner, so any dependencies relying on TH aren't supported by the profiled build.
Measuring time/allocation differences
When working on a performance-related PR, we often want to measure the time/allocation differences it introduces. The workflow is roughly:
Build two profiled Docker images, one from the master branch and one from the PR's branch.
Run ahc-link in the built images on the example program below, setting the necessary GHCRTS flags to generate the profile reports. The code should be put in two standalone directories, otherwise the .hi/.o files may conflict or be accidentally reused.
The profiled Docker images contain a pre-compiled Cabal. The example program we use to stress-test the linker is:
import Distribution.Simple
main = defaultMain
We chose this program because it's a classic and, although short, it pulls in a lot of data segments and functions, so it exposes the linker's performance bottlenecks pretty well.
Adding a test case
To add a test case, it is best to replicate what has been done for an existing test case. git grep bytearraymini should show all the places where the test case bytearraymini has been used. Replicating the same files for a new test case should "just work".

In Asterius we use ormolu for formatting Haskell and prettier for formatting JavaScript. Though not all parts of the codebase are currently formatted this way, it is recommended that when you submit a PR you run the respective formatters on the changed parts of the code, so that gradually the whole codebase is formatted uniformly.
build01
This section is for Tweagers only.
First, set up your build01 account according to the handbook. Don't forget to add the groups = ["docker"] line in your PR.
Once the PR is merged, you can SSH into build01 as a non-privileged NixOS user. You can check out the asterius repo, set up your favorite text editor, make edits and push to the remote.
To build/boot and run tests, a dev container needs to be built first. The dev.rootless.Dockerfile can be used to build an image which has the same UID as your user and doesn't mess up local file permissions:
$ docker build --build-arg UID=$(id -u) --file dev.rootless.Dockerfile --tag my_dev_image .
Building the image can take around 10min. After my_dev_image
is built, a dev container can be started:
$ docker run -it -v $(pwd):/asterius -w /asterius --name my_dev_container my_dev_image
The command above will start my_dev_container
from my_dev_image
, mount the current project directory to /asterius
and drop into the bash prompt, from where you can run build commands.
After exiting the current bash prompt of my_dev_container, it can be restarted later:
$ docker start -ai my_dev_container
If you're using VSCode remote SSH, the first attempt to set up will fail. A known-to-work workaround is available at https://github.com/microsoft/vscode-remote-release/issues/648#issuecomment-503148523.
Reading list
Here is a brief list of relevant readings about GHC internals and WebAssembly, suited for newcomers.
GHC documentation regarding the GHC API: a nice read for anyone looking to use the GHC API.
GHC commentary: a wiki containing lots of additional knowledge regarding GHC's implementation. Keep in mind that some content is outdated. One entry useful to this project covers the hooks mechanism (e.g. runPhaseHook), which we use to replace the default pipeline with our own, to enable manipulation of in-memory IRs.
Understanding the Stack: a blog post explaining how generated code works at the assembly level. Also see its sequel, Understanding the RealWorld.
The WebAssembly spec: a useful reference regarding what's already present in WebAssembly.
The binaryen C API: binaryen handles WebAssembly code generation. There are a few differences between the binaryen AST and the WebAssembly AST, the most notable ones being:
- binaryen uses a recursive BinaryenExpression which is side-effectful. The original WebAssembly standard instead uses a stack-based model and manipulates the operand stack with instructions.
- binaryen contains a "Relooper" which can recover high-level structured control flow from a CFG. However, the relooper doesn't handle jumping to unknown labels (aka computed goto), so we don't use it to handle tail calls.
The following entries are papers which take much more time to read, but are still quite useful for newcomers:
Making a fast curry: push/enter vs. eval/apply for higher-order languages: a thorough explanation of what STG is and how it is implemented (via two different groups of rewrite rules, also with real benchmarks).
The STG runtime system (revised): includes some details on the runtime system and is worth a read. It's a mystery why it's not merged into the commentary though. Install a TeX distribution like TeX Live or use a service like Overleaf to compile the .tex file to .pdf before reading.
The GHC storage manager: Similar to above.
Bringing the Web up to Speed with WebAssembly: The PLDI'17 paper about WebAssembly. Contains overview of WebAssembly design rationales and rules of small-step operational semantics.
Finally, the GHC codebase itself is also a must-read, but since it's huge we only need to check relevant parts when unsure about its behavior. Tips on reading GHC code:
There are a lot of insightful and up-to-date comments which all begin with "Notes on xxx". It's a pity the notes are collected neither into the sphinx-generated documentation nor into the haddock docs of the GHC API.
When writing build.mk for compiling GHC, add HADDOCK_DOCS = YES to ensure the haddock docs of the GHC API are built, and EXTRA_HADDOCK_OPTS += --quickjump --hyperlinked-source to enable symbol hyperlinks in the source pages. This will save you tons of time otherwise spent grepping the ghc codebase.
grepping is still unavoidable in some cases, since there's a lot of CPP involved and it isn't well handled by haddock.
The Asterius project has come a long way and some examples with complex dependencies already work. It's still less mature than GHCJS though; see the next section for details.
In general, it's hard to give an ETA for "production readiness", since improvements are continuous, and we haven't collected enough use cases from seed users yet. For more insight into what comes next for this project, we list our quarterly roadmap here.
Besides the goals in each quarter, we also do regular maintenance like dependency upgrades and bugfixes. We also work on related projects (mainly haskell-binaryen and inline-js) to ensure they are kept in sync and remain useful to regular Haskell developers.
Highlights of what currently works, and known limitations:
- foreign import javascript syntax. First-class garbage-collected JSVal type in Haskell land.
- ahc-cabal to compile libraries and executables. Support for custom Setup.hs is limited.
- Common Haskell packages like aeson.
- foreign export javascript syntax. Haskell closures can be passed across the Haskell/JavaScript boundary via StablePtr.
- binaryen raw bindings, plus a monadic EDSL to construct WebAssembly code directly in Haskell.
- wasm-toolkit: a Haskell library to handle WebAssembly code, which already powers binary code generation.
- Other than BigInt, there are no special requirements on the underlying JavaScript engine at the moment.
- BigInt support is required at the moment.
- Asterius uses Promise-based async JSFFI, while GHCJS uses callbacks.
- Some Template Haskell features (e.g. getQ/putQ) don't work yet.
- Custom Setup.hs support is limited: if it has setup-deps outside the GHC boot libs, it won't work.

For the past months before this update, I took a break from the Asterius project and worked on a client project instead. There's a saying "less is more", and I believe my absence from this project for a few months is beneficial in multiple ways:
Before I took the break, Asterius was stuck with a very complex & ad-hoc build system, and it was based on ghc-8.8. The most production-ready major version of ghc today is ghc-8.10. Therefore, the Q3 goals and roadmap have been adjusted accordingly:
What has been achieved so far:
- On the ghc-8.10 branch, the previous asterius-specific patches have all been ported, and I implemented nix-based logic to generate cabal-buildable ghc api packages to be used by Asterius, replacing the previous ad-hoc python script.
- The wasm32-unknown-wasi triple is used now, so that's a good start for the future work of properly transitioning Asterius to a wasi32 backend of ghc.

The remaining work of Q3 will be wrapping up #860 and merging it to master.
Beyond Q3, the overall plan is also guided by the "less is more" principle: to reduce code rather than to add, leveraging upstream logic whenever possible, while still maintaining and even improving the end-user experience. Many hacks were needed in the past for various reasons, and after all the lessons learned along the way, there are many things that should be shaved off:
- (We should have used wasi-sdk instead of emscripten in the first place.)

In 2020 Q4 we mainly delivered:
- Making ahc a proper GHC frontend exe, supporting ahc -c on non-Haskell sources.
- ahc-ar
- JSVal# closures

In 2021 Q1, the primary goals are:
- Support for integer-gmp and cbits in common packages.

The plan for achieving the above goals: use wasi-sdk as the C toolchain to configure the stage-1 GHC and finish the transition.

A longer-term goal beyond Q1 is upstreaming Asterius as a proper wasm backend of GHC. We need to play well with wasi-sdk for this to happen, so another thing we're working on in Q1 is refactoring the linker infrastructure to make it LLVM-compliant, which means managing non-standard entities (e.g. static pointers, JSFFI imports/exports) in a standard-compliant way.
In 2020 Q3 we mainly delivered:
- Support for using wasi-sdk to compile C/C++ sources. Right now this doesn't work with Cabal yet, so the C/C++ sources need to be manually added to asterius/libc to be compiled and linked. We already replaced quite a few legacy runtime shims with actual C code (e.g. cbits in bytestring/text), and more will come in the future.

Proper C/C++ support requires Asterius to be a proper wasm32-targeting cross GHC which is configured to use wasi-sdk as the underlying toolchain. The immediate benefits are:
- Relying fully on wasi-sdk instead of the current hacks; some packages (e.g. integer-gmp) are incompatible with these hacks.
- Support for cbits in user packages.
- No more i64/i32 pointer casting everywhere.
- Removing BigInt usage in the JavaScript runtime, and support for running the generated code in Safari.

Thus the goal of 2020 Q4 is finishing the 32-bit cross GHC transition. The steps to achieve this are roughly:
- Stop relying on the ghc lib of the host GHC and instead use Asterius's own stage-1 GHC API packages.
- Target wasm32-wasi, using wasi-sdk as the toolchain.

Work in 2020 Q3 is focused on:
The goals for Asterius are described on the page WebAssembly goals on the GHC Wiki. This document describes some milestones on the path to those goals.
Getting to JavaScript-free functionality
Although JavaScript interoperation is the big use case, much of the support needed for WebAssembly is independent of JavaScript.
Codegen: New back end
A new back end will have to be defined in a way that fits into GHC's existing structure.
GHC support required:
- Changes to Backend or (likely) changes to the NcgImpl record. Make the Backend type abstract, add a new value constructor for it.

WebAssembly lacks goto and provides only structured control flow: loops, blocks, if statements, and multilevel continue/break. A Cmm control-flow graph must be converted to this structured control.
Status: A prototype has been implemented and tested, but the prototype works only on reducible control-flow graphs. A transformation from irreducible to reducible CFGs has yet to be implemented.
GHC support required:
CmmGraph
The Asterius prototype emits object files that are represented in a custom format. This format contains ad hoc information that can be handled only by a custom linker. The information currently stored in custom object files must either be expressed using standard object files that conform to C/C++ toolchain convention, or it must be eliminated.
Status: All information currently emitted by the Asterius prototype can be expressed using standard object files, with one exception: JSFFI records. We plan to turn these records into standard data segments whose symbols will be reachable from related Haskell functions. Such segments can be handled by a standard C/C++ linker. The data segments will be consumed by the JavaScript adjunct to GHC's run-time system, which will use them to reconstruct imported and exported functions.
GHC support required:
Rather than attempt to prettyprint WebAssembly directly from Cmm, the WebAssembly back end will first translate Cmm to an internal representation of a WebAssembly module, tentatively to be called WasmModule
. A WasmModule
can be serialized to the standard WebAssembly binary format.
A preliminary design might look like this:
- A WasmModule contains sections
- Function bodies are built from a control-flow fragment (WasmStmt ...)

Status: Except for that WasmStmt fragment, which contains the WebAssembly control-flow constructs, the internal representation has yet to be defined. And we have yet to reach consensus on whether we wish to be able to emit both textual and binary WebAssembly, or whether we prefer to emit only binary WebAssembly and to rely on an external disassembler to produce a more readable representation. (External assemblers are apparently not good enough to be able to rely on emitting only a textual representation.)
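To make the discussion concrete, here is a purely hypothetical Haskell sketch of such a control-flow fragment; the constructor set and names are ours, not a settled design:

-- Hypothetical sketch only: structured control flow, parameterised over an
-- expression/instruction type e.
data WasmStmt e
  = Block [WasmStmt e]              -- wasm block: a break targets its end
  | Loop  [WasmStmt e]              -- wasm loop: a branch targets its start
  | If e [WasmStmt e] [WasmStmt e]  -- structured if/else on a condition
  | Br Int                          -- multilevel break/continue to the n-th enclosing label
  | Instr e                         -- a straight-line instruction

A WasmModule could then hold, among its other sections, a code section whose function bodies are built from a type like this.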
GHC support required:
We need a translator from CmmGroup
to WasmModule
. Our prototype relooper translates CmmGraph
to WasmStmt ...
, and the other parts of the translation should mostly be a 1-to-1 mapping. Some Cmm features can be translated in more than one way:
- Global registers. We can use the in-memory register table as in unregisterised mode, or one WebAssembly global for each global register, or use the WebAssembly multi-value feature to carry the registers around. Start with WebAssembly globals first: they are easy to implement and should be reasonably faster than memory loads/stores.
- Cmm tail calls. We can use the experimental WebAssembly tail-call feature, or do trampolining by making each Cmm function return its jump target (see the sketch below). Since WebAssembly tail calls are not widely implemented in engines yet, start with trampolining.
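A minimal sketch of the trampolining idea in plain Haskell (illustrative only; the real thing happens in generated WebAssembly, not in Haskell values):

-- Each function returns its jump target instead of tail-calling it directly.
newtype Code = Code (IO (Maybe Code))

-- A flat driver loop replaces the chain of tail calls, so the stack stays flat.
trampoline :: Code -> IO ()
trampoline (Code step) = do
  next <- step
  case next of
    Nothing       -> pure ()
    Just continue -> trampoline continue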
Status: Not started, but given the rich experience with the Asterius prototype, no difficulties are anticipated.
GHC support required:
The build system has to be altered to select the proper C code for the WebAssembly target. We're hoping for the following:
The build system can build and package the run-time system standalone.
The build system can easily cross-compile from a POSIX host to the Wasm target.
A developer can instruct the build system to choose Wasm-compatible features selectively to build and test on a POSIX platform (so-called "feature vector").
Meeting these goals will require both conditional build rules and CPP macros for code specific to wasm32-wasi
.
Status: Not yet begun.
GHC support required:
mmap
The run-time storage manager uses mmap
and munmap
to allocate and free MBlock
s. But mmap
and munmap
aren't available on the WASI platform, so we need to use standard libc allocation routines instead.
Status: we implemented the patch, tested with WebAssembly, i386 and x64-without-large-address-space.
GHC support required:
- New directory rts/wasi to go alongside rts/posix and rts/win32.
- Altered logic in rts/rts.cabal.in and elsewhere to use conditional compilation to select OSMem.c from the rts/wasi directory.
The run-time system currently uses a timer to know when to deliver a Haskell Execution Context (virtual CPU) to another Haskell thread. But the timer is implemented using pthreads and POSIX signals, which are not available on WebAssembly---so it has to go. We'll need some other method for deciding when to switch contexts.
This change will remove dependencies on pthreads and on a POSIX signal (VTALRM).
Status: We have patched the run-time system to disable that timer, and we have tested the patch on POSIX. In this patch, the scheduler does a context switch at every heap-block allocation (as in the -C0
RTS flag). Yet to be done: determine a viable long-term strategy for deciding when to context switch.
GHC support required:
The run-time system depends on the signals API in various ways: it can handle certain OS signals, and it can even support setting Haskell functions as signal handlers. Such functionality, which inherently depends on signals, must be made conditional on the target platform.
There is already a RTS_USER_SIGNALS
CPP macro that guards some signal logic, but not all. To make signals truly optional, more work is needed.
Status: In progress.
GHC support required:
libffi
is required for dynamic exports to C. It's technically possible to port libffi
to either pure WebAssembly or WebAssembly+JavaScript.
Status: Not yet implemented.
GHC support required:
(The audience for this section is primarily the Asterius implementation team, but there are a few things that ought to be communicated to other GHC implementors.)
RTS for JSFFI: representing and garbage-collecting foreign references
When Haskell interoperates with JavaScript, Haskell objects need to be able to keep JavaScript objects alive and vice versa, even though they live on different heaps. Similarly, JavaScript needs to be able to reclaim JavaScript objects once there are no more references to them.
We propose to extend GHC with a new primitive type JSVal#
, whose closure payload is a single word. The JavaScript adjunct uses this word to index into an internal table. After each major garbage collection, the collector notifies the JavaScript adjunct of all live JSVal#
closures. The adjunct uses this report to drop its references to JavaScript objects that cannot be reached from the Haskell heap.
Status: Not yet implemented.
GHC support required:
- Build-system support for the JavaScript adjunct to the RTS.
- New primitive type JSVal#.
- A patch to the garbage collector to report live JSVal# closures.
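As a purely illustrative model of that design (the real JSVal# would be a primitive type, so its single-word payload is faked with an Int here):

-- Hypothetical model: the payload is just an index into the table kept by
-- the JavaScript adjunct; the adjunct drops table entries whose indices the
-- GC no longer reports as live.
newtype AdjunctIndex = AdjunctIndex Int
data JSVal = JSVal AdjunctIndex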
Write down and document whatever API is needed for calls across the Haskell/JavaScript boundary and for sharing the single CPU among both Haskell threads and JavaScript's event loop. Ideal documentation would include a small-step operational semantics.
Status: Work in progress
GHC support required:
GHC's scheduler will need to be altered to support an event-driven model of concurrency. The details are work in progress.
Draft semantics of concurrency and foreign calls
Note: This document assumes that every function takes exactly one argument. Just imagine that it's the last argument in a fully saturated call.
Foreign export asynchronous
Suppose that a Haskell function f is exported to JavaScript asynchronously (which might be the default). When JavaScript calls the exported function with argument v, it has the effect of performing the IO action ⟦f⟧ v, where the translation ⟦f⟧ is defined as follows:
⟦f⟧ v = do
p <- allocate new promise
let run_f = case try (return $ f $ jsToHaskell v) of
Left exn -> p.fails (exnToJS exn)
Right a -> p.succeeds (haskellToJS a)
forkIO run_f
return p -- returned to JavaScript
Not specified here is whether the scheduler is allowed to steal a few cycles to run previously forked threads.
N.B. This is just a semantics. We certainly have the option of implementing the entire action completely in the runtime system.
Not yet specified: What is the API by which JavaScript would call an asynchronously exported Haskell function? Would it, for example, use API functions to construct a Haskell closure, then evaluate it?
Foreign import asynchronous
Suppose that a JavaScript function g is imported asynchronously (which might be the default). Let types a and b stand for two unknown but fixed types. The JavaScript function expects an argument of type a and returns a Promise that (if successful) eventually delivers a value of type b. When a Haskell thunk of the form g e is forced (evaluated), the machine performs the following monadic action, the result of which is (eventually) written into the thunk.
do let v = haskellToJS e -- evaluates e, converts result to JavaScript
p <- g v -- call returns a `Promise`, "immediately"
m <- newEmptyMVar
... juju to associate m with p ... -- RTS primitive?
result <- takeMVar m
case result of Left fails -> ... raise asynchronous exception ...
                  Right b -> return $ jsToHaskell b
CPU sharing
Suppose GHC wishes to say politely to the JavaScript engine, "every so often I would like to use the CPU for a bounded time." It looks like Haskell would need to add a message to the JavaScript message queue, such that the function associated with that message is "run Haskell for N ticks." Is the right API to call setTimeout with a delay of 0 seconds?
Let's suppose the state of a Haskell machine has these components:
F
("fuel") is the number of ticks a Haskell thread can execute before returning control to JavaScript. This component is present only when Haskell code is running.
R
("running") is either the currently running Haskell thread, or if no thread is currently running, it is • ("nothing")
Q
("run queue") is a collection of runnable threads.
H
("heap") is the Haskell heap, which may contain MVar
s and threads that are blocked on them.
Components R
and H
are used linearly, so they can be stored in global mutable state.
The machine will enjoy a set of labeled transitions such as are described in Simon PJ's paper on the "Awkward Squad." Call these the "standard transitions." (The awkward-squad machine state is a single term, formed by the parallel composition of R
with all the threads of Q
and all the MVars of H
. The awkward squad doesn't care about order, but we do.) To specify the standard transitions, we could add an additional clock that tells the machine when to switch the running thread R
out for a new thread from the queue. Or we could leave the context switch nondeterministic, as it is in the awkward-squad paper. Whatever seems useful.
Every state transition has the potential to use fuel. Fuel might actually be implemented using an allocation clock, but for semantics purposes, we can simply decrement fuel at each state transition, then gate the standard transitions on the condition F > 0.
At a high level, every invocation of Haskell looks the same: JavaScript starts the Haskell machine in a state ⟨F, •, Q, H⟩
, and the Haskell machine makes repeated state transitions until it reaches one of two stopping states:
- ⟨F', •, [], H'⟩: no Haskell threads are left to run
- ⟨0, R', Q', H'⟩: fuel is exhausted, in which case the machine moves the currently running thread onto the run queue, reaching state ⟨0, •, R':Q', H'⟩
Once one of these states is reached, GHC's runtime system takes two actions:
It allocates a polite request for the CPU and puts that request on the JavaScript message queue, probably using setTimeout
with a delay of 0 seconds.
It returns control to JavaScript.
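For exposition only, the state components and stopping condition above can be written down in Haskell (all names are ours, not an actual RTS API):

import Data.Maybe (isNothing)

data Heap = Heap  -- details of H are irrelevant to the scheduling story

data Machine t = Machine
  { fuel     :: Int      -- F: ticks left before control must return to JavaScript
  , running  :: Maybe t  -- R: the currently running thread, or nothing
  , runQueue :: [t]      -- Q: runnable threads
  , heap     :: Heap     -- H: the Haskell heap (MVars, blocked threads, ...)
  }

-- The machine stops when no thread is running and the run queue is empty,
-- or when fuel is exhausted.
stopped :: Machine t -> Bool
stopped m = (isNothing (running m) && null (runQueue m)) || fuel m <= 0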
All discussion in this document refers to the non-threaded RTS.
Potential semantics
GHC relies on the scheduler to manage both concurrency and foreign calls. Foreign calls are in play because most foreign calls are asynchronous, so implementing a foreign call requires support from the scheduler. A preliminary sketch of possible semantics can be found in the file semantics.md.
I have foo.hs. I can compile it to foo.wasm and foo.js. foo.wasm is a binary artifact that needs to be shipped with foo.js; there's nothing else you need to know about this file. foo.js conforms to some JavaScript module standard and exports a JavaScript object. Say this object is foo.
For each exported top-level Haskell function, foo contains a corresponding async method. Consider the most common case main :: IO (); then you can call foo.main(). For something like fib :: Int -> Int, you can do let r = await foo.fib(10) and get the number result in r. The arguments and result can be any JavaScript value if the Haskell type is JSVal.
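For instance, the fib method above would come from a Haskell export along these lines (a sketch following Asterius's foreign export javascript form):

-- An ordinary Haskell function...
fib :: Int -> Int
fib n = go n 0 1
  where
    go 0 a _ = a
    go k a b = go (k - 1) b (a + b)

-- ...exported to JavaScript under the name "fib".
foreign export javascript "fib" fib :: Int -> Int

After linking, foo.fib is the corresponding async JavaScript method, so the call site is let r = await foo.fib(10) as shown above.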
Now, suppose we await foo.main(), and main finished successfully. The RTS must remain alive, because:
- main might have forked other Haskell threads; those threads are expected to run in the background.
- main might have dynamically exported a Haskell function closure as a JSFunction. This JSFunction is passed into the outside JavaScript world, and it is expected to be called back some time in the future.

Notes regarding error handling: any unhandled Haskell exception is converted to a JavaScript error. Likewise, any JavaScript error is converted to a Haskell exception.
Notes regarding RTS startup: foo
encapsulates some RTS context. That context is automatically initialized no later than the first time you call any method in foo
.
Notes regarding RTS shutdown: not our concern yet. As long as the browser tab is alive, the RTS context should be alive.
Primer
ghc-devs thread: Thoughts on async RTS API?
ghc commentary: scheduler
Consider a native case...
Suppose we'd like to run some Haskell computation from C (e.g. the main function). After the RTS state is initialized, we need to:
- Call the rts_mk* functions in RtsAPI.h to convert C argument values to Haskell closures. Call rts_apply repeatedly to apply the Haskell function closure to argument closures, until we end up with a closure of Haskell type IO a or a, ready to be evaluated.
- Call one of the rts_eval* functions in RtsAPI.h. The eval function creates a TSO (Thread State Object), representing the Haskell thread where the computation happens.
- Call the rts_get* functions in RtsAPI.h to convert the result Haskell closure to a C value.
function which implements the scheduler loop. The implementation is quite complex, for now we only need to keep in mind:
select()
call, to ensure I/O can proceed for at least one file descriptor.Suppose we'd like to call an async JavaScript function and get the result in Haskell:
foreign import javascript safe "fetch($1)" js_fetch :: JSRequest -> IO JSResponse
In Haskell, when js_fetch
returns, the actual fetch()
call should have already resolved; if it rejected, then an exception should be raised in Haskell.
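As an illustration of the call site (a sketch that reuses the js_fetch import above), the rejection can be caught like any ordinary Haskell exception, and only the calling Haskell thread blocks while the Promise is pending:

import Control.Exception (SomeException, try)

-- Blocks this Haskell thread until the fetch Promise settles; a rejected
-- Promise surfaces here as a Haskell exception.
tryFetch :: JSRequest -> IO (Either SomeException JSResponse)
tryFetch req = try (js_fetch req)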
Now, the main thread calls js_fetch
at some point, no other threads involved. According to previous section, the current call stack is something like:
main -> rts_evalLazyIO -> scheduleWaitThread -> schedule -> fetch
The Haskell code does a fetch() call (or it arranges for the RTS to perform one). fetch() will immediately return a Promise handle. Now what? What do we do with this Promise thing? More importantly, the scheduler loop can't make any progress! The Haskell thread is blocked and suspended, the run queue is empty, and the RTS scheduler only knows about POSIX blocking read/write, so it doesn't know how to handle this situation.
After fetch()
returns, the call stack is:
main -> rts_evalLazyIO -> scheduleWaitThread -> schedule
Remember the "run-to-completion" principle of the JavaScript concurrency model! We're currently inside some JavaScript/WebAssembly function, which counts as a single tick in the entire event loop. The functions we're running right now must run to completion and return; only after that can the fetch() result become available.
And also remember how the WebAssembly/JavaScript interop works: you can only import synchronous JavaScript functions, and export WebAssembly functions as synchronous JavaScript functions. Every C function in the RTS that we cross-compile to WebAssembly is also synchronous; no magic blocking or preemptive context switch will ever take place!
What we need
All the scheduler-related synchronous C functions in the RTS, be it rts_eval* or schedule, only return when the initial Haskell thread completes. We must teach these functions to also return when the thread blocks, at least when the blocking reason is beyond conventional POSIX read/write.
Here's how things should look after the scheduler is refactored:
- When Haskell evaluation blocks on a foreign event, the scheduler returns and reports each Promise related to blocking, and also the blocked thread ids.
- Handlers are attached to each such Promise. These handlers will resume the entire RTS and carry on the Haskell computation.
- When an exported Haskell function is called, a Promise is returned immediately, but it's resolved/rejected in the future, when the corresponding Haskell thread runs to completion.

Draft: The RTS scheduler is synchronous. If you call rts_eval* to enter the scheduler and do some evaluation, it'll only return when the relevant Haskell thread is completed or killed. This model doesn't work if we want to be able to call async foreign functions without blocking the entire RTS. The root of this problem: the scheduler loop has no knowledge about foreign event loops.
Status: we have looked into this, and based on our experience in Asterius, the implementation plan is as follows:
- Add CPS-style async versions of the rts_eval* RTS API functions. The original sync versions continue to work, but panic with a reasonable error message when an unsupported foreign blocking event occurs.
- Break the scheduler loop down into "ticks". Each tick runs to the point where some Haskell computation finishes or blocks, much like a single iteration of the original scheduler loop. The scheduler ticks can be plugged into a foreign event loop, so Haskell evaluation fully interleaves with other foreign computation (see the illustrative sketch below).
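Purely as an illustration of the tick idea (all names here are hypothetical; the real API lives in the RTS and its JavaScript glue):

data JSPromise            -- stand-in for a JavaScript Promise handle
data TickOutcome
  = Finished              -- no runnable Haskell threads are left
  | BlockedOn [JSPromise] -- evaluation is blocked on these foreign events

-- One scheduler "tick": run until Haskell computation finishes or blocks.
schedulerTick :: IO TickOutcome
schedulerTick = pure Finished  -- placeholder body for the sketch

-- Hypothetical helper: run an action when a promise settles.
onSettled :: JSPromise -> IO () -> IO ()
onSettled _ k = k  -- placeholder body for the sketch

-- The driver plugs ticks into the foreign event loop: when a tick reports
-- blocking, it attaches resume handlers and returns control to JavaScript.
drive :: IO ()
drive = do
  outcome <- schedulerTick
  case outcome of
    Finished     -> pure ()
    BlockedOn ps -> mapM_ (\p -> onSettled p drive) ps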
GHC support required:
Making select/poll optional
In the current non-threaded RTS, when there are no immediately runnable Haskell threads, a select()
call will be performed on all the file descriptors related to blocking. The call returns when I/O is possible for at least one file descriptor, therefore some Haskell thread blocked on I/O can be resumed.
This may work for us when we target pure wasm32-wasi
instead of the browser. The WASI standard defines a poll_oneoff
syscall, and wasi-libc
implements select()
/poll()
using this syscall.
However, this doesn't work well with JavaScript runtime (or any foreign event loop in general). poll()
calls are blocking calls, so they can block the entire event loop, hang the browser tab and prevent "real work" (e.g. network requests) from proceeding.
Status: we have looked into this, and there are roughly two possible approaches:
- Implement poll_oneoff without actually blocking the entire event loop. Easy to implement, but it's a very ugly hack that also comes with a penalty in code size and performance.
- Don't do the poll() call at all. The higher-level caller of the scheduler ticks will be in charge of collecting blocking I/O events and handling them.

GHC support required: