Describe the issue
See ocaml-multicore/domainslib#47 for the original problem.
See below for repro testcase.
When run enough times (5-10), sometimes it prints 'no work', but then recovers and finishes succesfully, but sometimes it just prints 'no work' in a neverending loop.
Although it is possible that I made a mistake in my use of (Mini)Task, AFAICT I only await
a promise once, and I await each async
promise that I get returned (although these awaits are nested inside other asyncs, so there might be some ping-pong of the Atomic.t between different domains), I don't think it should ever get stuck on 'no work' (well fair enough the worker threads could, but eventually all promises should be resolved and the main program quit).
To reproduce
I've simplified this so it doesn't depend on Domainslib and uses Atomic just for storing the result, and otherwise use a Mutex and a Queue:
module MiniTask = struct
let m = Mutex.create ()
let tasks = Queue.create ()
let within f =
Mutex.lock m ;
Fun.protect ~finally:(fun () -> Mutex.unlock m) f
let do_task (result, f) =
try
let r = f () in
Atomic.set result @@ Some (Ok r)
with e -> Atomic.set result @@ Some (Error e)
let rec worker () =
match within @@ fun () -> Queue.take_opt tasks with
| None ->
Domain.Sync.cpu_relax () ; worker ()
| Some task ->
do_task task
let domains = Array.init 24 (fun _ -> Domain.spawn worker)
let async f =
let result = Atomic.make None in
let () = within @@ fun () -> Queue.add (result, f) tasks in
result
let rec await promise =
match Atomic.get promise with
| Some (Ok r) ->
r
| Some (Error e) ->
raise e
| None -> (
match within @@ fun () -> Queue.take_opt tasks with
| None ->
prerr_endline "no work"; flush stderr;
Domain.Sync.cpu_relax () ; await promise
| Some task ->
let () = do_task task in
await promise )
end
let parallel_foreach_reduce :
fold:(('b -> 'a -> 'b) -> 'b -> 'b)
-> body:('a -> 'c)
-> ('c -> 'c -> 'c)
-> 'c
-> 'c =
fun ~fold ~body reduce init ->
let process tasks item =
let rec take_mergeable (level, acc) = function
| (level', task, id) :: rest when level' = level ->
take_mergeable (level + 1, (task, id) :: acc) rest
| rest ->
(level, acc, rest)
in
let level, await_now, tasks = take_mergeable (0, []) tasks in
let task =
MiniTask.async
@@ fun () ->
let r =
List.fold_left
(fun acc (e, _) -> reduce acc @@ MiniTask.await e)
(body item) await_now
in
r
in
(level, task, ()) :: tasks
in
List.fold_left (fun acc (_, e, _) -> reduce acc @@ MiniTask.await e) init
@@ fold process []
let () =
let body _ = 4 in
let a = Array.init 1000000 Fun.id in
let fold f init = Array.fold_left f init a in
let (_ : int) = parallel_foreach_reduce ~fold ~body ( + ) 0 in
()
Please provide here the build instructions for your issue, and a code sample if available.
ocamlopt yx.ml -o yx.exe
while ./yx.exe; do echo; done
Note that simply increasing the array size or adding a for loop wrapper inside the executable doesn't always makes this reproducible, I have to actually relaunch the executable multiple times until it fails once. This is in contrast to the original domainslib testcase which almost always fails, but that has a lot more atomic ops and a lot more parallelism since it doesn't use a mutex.
Multicore OCaml build version
Please provide the output of ocamlc -version
or strings your_binary.exe | grep OCAML_RUNTIME_BUILD_GIT_HASH_IS
.
ocamlc -version
4.12.0+domains
strings yx.exe | grep OCAML_RUNTIME_BUILD_GIT_HASH_IS
OCAML_RUNTIME_BUILD_GIT_HASH_IS_c79158cfbfae
Did you try running it with the debug runtime and heap verification ON?
See https://github.com/ocaml-multicore/ocaml-multicore/wiki/Report-an-issue-to-the-bugtracker
Yes, fails the same way. Also ocamlc (with or without debug runtime) fails with infinite 'no work' too.
** Environment
lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
Address sizes: 43 bits physical, 48 bits virtual
CPU(s): 24
On-line CPU(s) list: 0-23
Thread(s) per core: 2
Core(s) per socket: 12
Socket(s): 1
NUMA node(s): 1
Vendor ID: AuthenticAMD
CPU family: 23
Model: 113
Model name: AMD Ryzen 9 3900X 12-Core Processor
Stepping: 0
Frequency boost: enabled
CPU MHz: 4466.589
CPU max MHz: 4672.0698
CPU min MHz: 2200.0000
BogoMIPS: 7600.59
Virtualization: AMD-V
L1d cache: 384 KiB
L1i cache: 384 KiB
L2 cache: 6 MiB
L3 cache: 64 MiB
NUMA node0 CPU(s): 0-23
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Full AMD retpoline, IBPB conditional, STIBP conditional, RSB filling
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdts
cp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popc
nt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext p
erfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt
_a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsa
veerptr rdpru wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave
_vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca sme sev sev_es
Linux storm-broadband 5.14.9-200.fc34.x86_64 #1 SMP Thu Sep 30 11:55:35 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
RetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.4