A RetroSearch Logo

Home - News ( United States | United Kingdom | Italy | Germany ) - Football scores

Search Query:

Showing content from https://github.com/ocaml/ocaml/issues/13677 below:

atomic_load_acquire(&di->backup_thread_msg) == BT_INIT · Issue #13677 · ocaml/ocaml · GitHub

On Sunday in multicoretests we saw a new assertion error triggered on trunk that we haven't seen before:
https://github.com/ocaml-multicore/multicoretests/actions/runs/12337399971/job/34431080832

random seed: 164588596
generated error fail pass / total     time test name

[ ]    0    0    0    0 / 1000     0.0s Lin Weak HashSet stress test with Domain
[02] file runtime/domain.c; line 1126 ### Assertion failed: atomic_load_acquire(&di->backup_thread_msg) == BT_INIT
File "src/weak/dune", line 28, characters 7-24:
28 |  (name lin_tests_hashset)
            ^^^^^^^^^^^^^^^^^
(cd _build/default/src/weak && ./lin_tests_hashset.exe --verbose)
Command got signal ILL.

The relevant part of install_backup_thread reads a follows:

if (atomic_load_acquire(&di->backup_thread_running) != 0) { /* If the backup thread is running, but has been instructed to terminate, we need to wait for it to stop until we can spawn another. */ uintnat msg; msg = atomic_load_acquire(&di->backup_thread_msg); while (msg != BT_INIT) { /* Give a chance for backup thread on this domain to terminate */ caml_plat_unlock (&di->domain_lock); cpu_relax (); caml_plat_lock_blocking(&di->domain_lock); msg = atomic_load_acquire(&di->backup_thread_msg); } } CAMLassert(atomic_load_acquire(&di->backup_thread_msg) == BT_INIT);

Looking a bit into it, I spotted a couple of atomic updates above in backup_thread_func
thinking that it might be caused by the small window between these two:

atomic_store_release(&di->backup_thread_running, 0); atomic_store_release(&di->backup_thread_msg, BT_INIT);

I thus inserted a sleep to open the window a bit more:

diff --git a/runtime/domain.c b/runtime/domain.c
index 6086ed8155..bac5754173 100644
--- a/runtime/domain.c
+++ b/runtime/domain.c
@@ -1097,6 +1097,7 @@ static void* backup_thread_func(void* v)
 
   /* doing terminate */
   atomic_store_release(&di->backup_thread_running, 0);
+  thrd_sleep(&(struct timespec){.tv_sec=0, .tv_nsec=050000000}, NULL); /* sleep for 50ms */
   atomic_store_release(&di->backup_thread_msg, BT_INIT);
 
   return 0;

With this patch, any Domain-using test I've tried now fails immediately with an assertion error:

multicoretests$ _build/default/src/hashtbl/stm_tests.exe 
### OCaml runtime: debug mode ###
### set OCAMLRUNPARAM=v=0 to silence this message
random seed: 347896408
[02] file runtime/domain.c; line 1127 ### Assertion failed: atomic_load_acquire(&di->backup_thread_msg) == BT_INIT
Illegal instruction (core dumped)

What's a good fix for this? Switching the two atomic updates? (edit: apparently not) 🤔


RetroSearch is an open source project built by @garambo | Open a GitHub Issue

Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo

HTML: 3.2 | Encoding: UTF-8 | Version: 0.7.4