A RetroSearch Logo

Home - News ( United States | United Kingdom | Italy | Germany ) - Football scores

Search Query:

Showing content from https://github.com/ocaml-multicore/ocaml-multicore/issues/670 below:

Domainslib nesting issue · Issue #670 · ocaml-multicore/ocaml-multicore · GitHub

Describe the issue

See ocaml-multicore/domainslib#47 for the original problem.
See below for repro testcase.
When run enough times (5-10), sometimes it prints 'no work', but then recovers and finishes succesfully, but sometimes it just prints 'no work' in a neverending loop.
Although it is possible that I made a mistake in my use of (Mini)Task, AFAICT I only await a promise once, and I await each async promise that I get returned (although these awaits are nested inside other asyncs, so there might be some ping-pong of the Atomic.t between different domains), I don't think it should ever get stuck on 'no work' (well fair enough the worker threads could, but eventually all promises should be resolved and the main program quit).

To reproduce

I've simplified this so it doesn't depend on Domainslib and uses Atomic just for storing the result, and otherwise use a Mutex and a Queue:

module MiniTask = struct
  let m = Mutex.create ()

  let tasks = Queue.create ()

  let within f =
    Mutex.lock m ;
    Fun.protect ~finally:(fun () -> Mutex.unlock m) f

  let do_task (result, f) =
    try
      let r = f () in
      Atomic.set result @@ Some (Ok r)
    with e -> Atomic.set result @@ Some (Error e)

  let rec worker () =
    match within @@ fun () -> Queue.take_opt tasks with
    | None ->
        Domain.Sync.cpu_relax () ; worker ()
    | Some task ->
        do_task task

  let domains = Array.init 24 (fun _ -> Domain.spawn worker)

  let async f =
    let result = Atomic.make None in
    let () = within @@ fun () -> Queue.add (result, f) tasks in
    result

  let rec await promise =
    match Atomic.get promise with
    | Some (Ok r) ->
        r
    | Some (Error e) ->
        raise e
    | None -> (
      match within @@ fun () -> Queue.take_opt tasks with
      | None ->
          prerr_endline "no work"; flush stderr;
          Domain.Sync.cpu_relax () ; await promise
      | Some task ->
          let () = do_task task in
          await promise )
end

let parallel_foreach_reduce :
       fold:(('b -> 'a -> 'b) -> 'b -> 'b)
    -> body:('a -> 'c)
    -> ('c -> 'c -> 'c)
    -> 'c
    -> 'c =
 fun ~fold ~body reduce init ->
  let process tasks item =
    let rec take_mergeable (level, acc) = function
      | (level', task, id) :: rest when level' = level ->
          take_mergeable (level + 1, (task, id) :: acc) rest
      | rest ->
          (level, acc, rest)
    in
    let level, await_now, tasks = take_mergeable (0, []) tasks in
    let task =
      MiniTask.async
      @@ fun () ->
      let r =
        List.fold_left
          (fun acc (e, _) -> reduce acc @@ MiniTask.await e)
          (body item) await_now
      in
      r
    in
    (level, task, ()) :: tasks
  in
  List.fold_left (fun acc (_, e, _) -> reduce acc @@ MiniTask.await e) init
  @@ fold process []

let () =
  let body _ = 4 in
  let a = Array.init 1000000 Fun.id in
  let fold f init = Array.fold_left f init a in
  let (_ : int) = parallel_foreach_reduce ~fold ~body ( + ) 0 in
  ()

Please provide here the build instructions for your issue, and a code sample if available.

ocamlopt yx.ml -o yx.exe
while ./yx.exe; do echo; done

Note that simply increasing the array size or adding a for loop wrapper inside the executable doesn't always makes this reproducible, I have to actually relaunch the executable multiple times until it fails once. This is in contrast to the original domainslib testcase which almost always fails, but that has a lot more atomic ops and a lot more parallelism since it doesn't use a mutex.

Multicore OCaml build version

Please provide the output of ocamlc -version or strings your_binary.exe | grep OCAML_RUNTIME_BUILD_GIT_HASH_IS.

ocamlc -version
4.12.0+domains
strings yx.exe | grep OCAML_RUNTIME_BUILD_GIT_HASH_IS
OCAML_RUNTIME_BUILD_GIT_HASH_IS_c79158cfbfae

Did you try running it with the debug runtime and heap verification ON?

See https://github.com/ocaml-multicore/ocaml-multicore/wiki/Report-an-issue-to-the-bugtracker

Yes, fails the same way. Also ocamlc (with or without debug runtime) fails with infinite 'no work' too.

** Environment

lscpu
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   43 bits physical, 48 bits virtual
CPU(s):                          24
On-line CPU(s) list:             0-23
Thread(s) per core:              2
Core(s) per socket:              12
Socket(s):                       1
NUMA node(s):                    1
Vendor ID:                       AuthenticAMD
CPU family:                      23
Model:                           113
Model name:                      AMD Ryzen 9 3900X 12-Core Processor
Stepping:                        0
Frequency boost:                 enabled
CPU MHz:                         4466.589
CPU max MHz:                     4672.0698
CPU min MHz:                     2200.0000
BogoMIPS:                        7600.59
Virtualization:                  AMD-V
L1d cache:                       384 KiB
L1i cache:                       384 KiB
L2 cache:                        6 MiB
L3 cache:                        64 MiB
NUMA node0 CPU(s):               0-23
Vulnerability Itlb multihit:     Not affected
Vulnerability L1tf:              Not affected
Vulnerability Mds:               Not affected
Vulnerability Meltdown:          Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; Full AMD retpoline, IBPB conditional, STIBP conditional, RSB filling
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Not affected
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdts
                                 cp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popc
                                 nt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext p
                                 erfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba ibpb stibp vmmcall fsgsbase bmi1 avx2 smep bmi2 cqm rdt
                                 _a rdseed adx smap clflushopt clwb sha_ni xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local clzero irperf xsa
                                 veerptr rdpru wbnoinvd arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold avic v_vmsave
                                 _vmload vgif v_spec_ctrl umip rdpid overflow_recov succor smca sme sev sev_es

Linux storm-broadband 5.14.9-200.fc34.x86_64 #1 SMP Thu Sep 30 11:55:35 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

RetroSearch is an open source project built by @garambo | Open a GitHub Issue

Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo

HTML: 3.2 | Encoding: UTF-8 | Version: 0.7.4