On Sunday in multicoretests we saw a new assertion error triggered on trunk
that we haven't seen before:
https://github.com/ocaml-multicore/multicoretests/actions/runs/12337399971/job/34431080832
random seed: 164588596
generated error fail pass / total time test name
[ ] 0 0 0 0 / 1000 0.0s Lin Weak HashSet stress test with Domain
[02] file runtime/domain.c; line 1126 ### Assertion failed: atomic_load_acquire(&di->backup_thread_msg) == BT_INIT
File "src/weak/dune", line 28, characters 7-24:
28 | (name lin_tests_hashset)
^^^^^^^^^^^^^^^^^
(cd _build/default/src/weak && ./lin_tests_hashset.exe --verbose)
Command got signal ILL.
The relevant part of install_backup_thread
reads a follows:
Looking a bit into it, I spotted a couple of atomic updates above in backup_thread_func
thinking that it might be caused by the small window between these two:
I thus inserted a sleep to open the window a bit more:
diff --git a/runtime/domain.c b/runtime/domain.c index 6086ed8155..bac5754173 100644 --- a/runtime/domain.c +++ b/runtime/domain.c @@ -1097,6 +1097,7 @@ static void* backup_thread_func(void* v) /* doing terminate */ atomic_store_release(&di->backup_thread_running, 0); + thrd_sleep(&(struct timespec){.tv_sec=0, .tv_nsec=050000000}, NULL); /* sleep for 50ms */ atomic_store_release(&di->backup_thread_msg, BT_INIT); return 0;
With this patch, any Domain-using test I've tried now fails immediately with an assertion error:
multicoretests$ _build/default/src/hashtbl/stm_tests.exe
### OCaml runtime: debug mode ###
### set OCAMLRUNPARAM=v=0 to silence this message
random seed: 347896408
[02] file runtime/domain.c; line 1127 ### Assertion failed: atomic_load_acquire(&di->backup_thread_msg) == BT_INIT
Illegal instruction (core dumped)
What's a good fix for this? Switching the two atomic updates? (edit: apparently not) 🤔
RetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.4