On multicoretests we have observed crashes and deadlocks on macOS ARM64 when running tests that perform random allocations and random operations from the Gc module. We've observed these on 5.2.0, 5.3.0, and on current trunk from today. Interestingly, the misbehaviour doesn't trigger when omitting Gc.compact.
The shape of the tests is the usual: a sequential prefix of random calls, followed by two Domain.spawns, each performing random calls too.
In between each iteration, after Domain.join-ing the two domains, we reset the allocation roots and invoke Gc.full_major in an attempt to reset the heap to a somewhat sane state. lldb backtraces reveal that the crashes may happen during this GC, with only the main domain running (see example below).
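For readers unfamiliar with the test harness, the shape described above can be sketched roughly as follows. This is a hand-written illustration, not the actual multicoretests code: the names roots, random_cmd, run_cmds and iteration, the choice of Gc calls, and the command counts are all assumptions made for the example.

```ocaml
(* Hypothetical sketch of the test shape; requires OCaml 5 for Domain. *)

(* Allocation roots shared by all domains. Concurrent writes to the array
   are racy but memory-safe in OCaml 5. *)
let roots : int list array = Array.make 16 []

(* One random command: a Gc operation or an allocation stored in a root. *)
let random_cmd rng =
  match Random.State.int rng 5 with
  | 0 -> Gc.minor ()
  | 1 -> Gc.full_major ()
  | 2 -> Gc.compact ()
  | 3 -> ignore (Gc.major_slice 0)
  | _ ->
    let i = Random.State.int rng (Array.length roots) in
    roots.(i) <- List.init (Random.State.int rng 1000) Fun.id

(* Run n random commands with a private PRNG state (one per domain). *)
let run_cmds seed n =
  let rng = Random.State.make [| seed |] in
  for _ = 1 to n do random_cmd rng done

let iteration seed =
  (* Sequential prefix on the main domain. *)
  run_cmds seed 10;
  (* Two parallel domains, each performing random calls. *)
  let d1 = Domain.spawn (fun () -> run_cmds (seed + 1) 10) in
  let d2 = Domain.spawn (fun () -> run_cmds (seed + 2) 10) in
  Domain.join d1;
  Domain.join d2;
  (* Cleanup: reset the roots, then try to bring the heap back to a sane state. *)
  Array.fill roots 0 (Array.length roots) [];
  Gc.full_major ()

let () =
  for seed = 0 to 99 do iteration seed done
```

In this sketch the final Gc.full_major in iteration corresponds to the cleanup step, which is where the backtrace shows the main domain faulting. The sketch is not claimed to reproduce the crash on its own.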
To recreate:
$ opam switch create . --empty
$ opam install . --inplace-build # build a local opam switch from, e.g., trunk
$ opam install qcheck-core
$ git clone -b gc-test-enable-compact https://github.com/ocaml-multicore/multicoretests.git
$ cd multicoretests
$ dune build
$ dune exec src/gc/stm_tests_par_stress.exe -- -v
Here's an example backtrace from lldb, with the main thread #1 encountering an invalid address during gc_full_major while the backup thread #2 is just waiting:
(lldb) bt all
warning: could not execute support code to read Objective-C class data in the process. This may reduce the quality of type information available.
* thread #1, stop reason = ESR_EC_DABORT_EL0 (fault address: 0x100d62c68)
* frame #0: 0x0000000100add134 stm_tests_par_stress.exe`do_some_marking [inlined] Hd_val(val=4309003376) at mlvalues.h:190:10 [opt]
frame #1: 0x0000000100add134 stm_tests_par_stress.exe`do_some_marking(stk=0x00006000038702a0, budget=11590) at major_gc.c:1222:21 [opt]
frame #2: 0x0000000100adba90 stm_tests_par_stress.exe`mark(budget=16384) at major_gc.c:1335:14 [opt]
frame #3: 0x0000000100adb0e4 stm_tests_par_stress.exe`major_collection_slice(howmuch=<unavailable>, participant_count=1, barrier_participants=0x000000012f008e00, mode=Slice_uninterruptible, force_compaction=0) at major_gc.c:1817:21 [opt]
frame #4: 0x0000000100adb970 stm_tests_par_stress.exe`stw_finish_major_cycle(domain=0x000000012ee047c0, arg=0x000000016f412f40, participating_count=1, participating=0x000000012f008e00) at major_gc.c:2050:5 [opt]
frame #5: 0x0000000100ac8c9c stm_tests_par_stress.exe`caml_try_run_on_all_domains_with_spin_work(sync=<unavailable>, handler=(stm_tests_par_stress.exe`stw_finish_major_cycle at major_gc.c:2030), data=0x000000016f412f40, leader_setup=<unavailable>, enter_spin_callback=<unavailable>, enter_spin_data=<unavailable>) at domain.c:1716:3 [opt]
frame #6: 0x0000000100ac7aec stm_tests_par_stress.exe`caml_try_run_on_all_domains(handler=<unavailable>, data=<unavailable>, leader_setup=<unavailable>) at domain.c:1738:7 [opt] [artificial]
frame #7: 0x0000000100ada944 stm_tests_par_stress.exe`caml_finish_major_cycle(force_compaction=0) at major_gc.c:2065:5 [opt]
frame #8: 0x0000000100ad13d4 stm_tests_par_stress.exe`gc_full_major_res at gc_ctrl.c:269:5 [opt]
frame #9: 0x0000000100ad1390 stm_tests_par_stress.exe`caml_gc_full_major(v=<unavailable>) at gc_ctrl.c:282:34 [opt]
frame #10: 0x0000000100af069c stm_tests_par_stress.exe`caml_c_call + 52
frame #11: 0x00000001009f8298 stm_tests_par_stress.exe`camlDune__exe__Stm_tests_spec$cleanup_1670 + 176
frame #12: 0x00000001009fbe94 stm_tests_par_stress.exe`camlSTM_domain$run_par_660 + 372
frame #13: 0x00000001009fc300 stm_tests_par_stress.exe`camlSTM_domain$stress_prop_par_793 + 32
frame #14: 0x0000000100a01ac8 stm_tests_par_stress.exe`camlUtil$fun_2474 + 120
frame #15: 0x0000000100a191d8 stm_tests_par_stress.exe`camlQCheck2$loop_4000 + 64
frame #16: 0x0000000100a190f8 stm_tests_par_stress.exe`camlQCheck2$run_law_3995 + 104
frame #17: 0x0000000100a19eec stm_tests_par_stress.exe`camlQCheck2$check_state_input_4063 + 204
frame #18: 0x0000000100a1a344 stm_tests_par_stress.exe`camlQCheck2$check_cell_inner_9740 + 316
frame #19: 0x0000000100a09ebc stm_tests_par_stress.exe`camlQCheck_base_runner$aux_map_1444 + 772
frame #20: 0x0000000100a57ba8 stm_tests_par_stress.exe`camlStdlib__List$map_334 + 72
frame #21: 0x0000000100a097cc stm_tests_par_stress.exe`camlQCheck_base_runner$run_tests_inner_2444 + 596
frame #22: 0x0000000100a0a550 stm_tests_par_stress.exe`camlQCheck_base_runner$run_tests_main_inner_3052 + 176
frame #23: 0x00000001009f6138 stm_tests_par_stress.exe`camlDune__exe__Stm_tests_par_stress$entry + 304
frame #24: 0x00000001009f0354 stm_tests_par_stress.exe`caml_program + 1596
frame #25: 0x0000000100af07dc stm_tests_par_stress.exe`caml_start_program + 132
frame #26: 0x0000000100af0058 stm_tests_par_stress.exe`caml_startup_common(argv=0x000000016f4132f0, pooling=<unavailable>) at startup_nat.c:127:9 [opt]
frame #27: 0x0000000100af00bc stm_tests_par_stress.exe`caml_main [inlined] caml_startup_exn(argv=<unavailable>) at startup_nat.c:134:10 [opt]
frame #28: 0x0000000100af00b4 stm_tests_par_stress.exe`caml_main [inlined] caml_startup(argv=<unavailable>) at startup_nat.c:139:15 [opt]
frame #29: 0x0000000100af00b4 stm_tests_par_stress.exe`caml_main(argv=<unavailable>) at startup_nat.c:146:3 [opt]
frame #30: 0x0000000100ada3f4 stm_tests_par_stress.exe`main(argc=<unavailable>, argv=<unavailable>) at main.c:37:3 [opt]
frame #31: 0x00000001875aa0e0 dyld`start + 2360
thread #2
frame #0: 0x00000001878f59ec libsystem_kernel.dylib`__psynch_cvwait + 8
frame #1: 0x000000018793355c libsystem_pthread.dylib`_pthread_cond_wait + 1228
frame #2: 0x0000000100ae4abc stm_tests_par_stress.exe`caml_plat_wait(cond=<unavailable>, mut=<unavailable>) at platform.c:146:21 [opt]
frame #3: 0x0000000100ac9ca8 stm_tests_par_stress.exe`backup_thread_func(v=0x0000000130048000) at domain.c:1080:11 [opt]