We've found that Stdlib/Out_channel.flush may raise a Sys_error exception when used in parallel with a close.
Consider this reproducer program:
let path = Filename.temp_file "stm-" ""

let test () =
  let channel = Atomic.make (Out_channel.open_text path) in
  let wait = Atomic.make true in
  let d1 = Domain.spawn (fun () ->
      while Atomic.get wait do Domain.cpu_relax() done;
      Out_channel.close (Atomic.get channel))
  in
  let d2 = Domain.spawn (fun () ->
      while Atomic.get wait do Domain.cpu_relax() done;
      (try Out_channel.flush (Atomic.get channel)
       with (Sys_error msg) -> Printf.printf "Out_channel.flush raised Sys_error %S\n%!" msg; assert false))
  in
  (* Output a sufficiently long string *)
  Out_channel.output_string (Atomic.get channel) (String.make (65531 (*65536*)) 'a');
  (* Let the races begin! *)
  Atomic.set wait false;
  let () = Domain.join d1 in
  let () = Domain.join d2 in
  (* Please leave the torture chamber nice and clean as you found it *)
  (try Out_channel.close (Atomic.get channel) with Sys_error _ -> ());
  Sys.remove path

let _ =
  for i = 1 to 50_000 do
    if i mod 250 = 0 then Printf.printf "#%!";
    test ()
  done
with this behaviour:
$ ocamlopt -g flushexc.ml
$ ./a.out
##############################################################################################################Out_channel.flush raised Sys_error "Bad file descriptor"
Fatal error: exception File "flushexc.ml", line 21, characters 94-100: Assertion failed
The documentation does not warn of unsynchronized access to Out_channel operations, as they are generally protected by internal channel locks.
The documentation for close mentions that flush should not raise Sys_error when called on a closed channel but, strictly speaking, does not specify the behaviour of flush in parallel with close:
val close : t -> unit
(** Close the given channel, flushing all buffered write operations.
    Output functions raise a [Sys_error] exception when they are
    applied to a closed output channel, except {!close} and {!flush},
    which do nothing when applied to an already closed channel.
    Note that {!close} may raise [Sys_error] if the operating
    system signals an error when flushing or closing. *)
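The sequential part of this contract can be checked directly. The following is a minimal sketch, assuming OCaml 5 or later; the function name closed_channel_behaviour is hypothetical and exists only for this illustration. It closes a channel and then applies flush, close, and output_string to it, with no parallelism involved.

```ocaml
(* Hypothetical helper (not part of Stdlib): exercises the documented
   behaviour of an output channel that has already been closed. *)
let closed_channel_behaviour () =
  let path = Filename.temp_file "stm-" "" in
  let oc = Out_channel.open_text path in
  Out_channel.close oc;
  (* Per the documentation, flush and close do nothing on a closed channel *)
  let flush_and_close_ok =
    match (Out_channel.flush oc; Out_channel.close oc) with
    | () -> true
    | exception Sys_error _ -> false
  in
  (* Per the documentation, other output functions raise Sys_error *)
  let output_raises =
    match Out_channel.output_string oc "x" with
    | () -> false
    | exception Sys_error _ -> true
  in
  Sys.remove path;
  (flush_and_close_ok, output_raises)
```

Both components of the result are expected to be true in the sequential case; the issue reported here only arises once flush and close run in parallel.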
The problem is that flush on a still-open channel ends up in caml_flush_partial, which calls check_pending, which may temporarily unlock the channel to process pending actions. This creates a small window for caml_ml_close_channel to take the lock, close the underlying file descriptor, and offset channel->curr by 1 from channel->buff, into dummy_buff. As a result, when caml_flush_partial resumes, it will attempt to output 1 character, fail, and raise an exception.
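For callers who need to avoid this window today, one user-level option is to serialize flush and close behind a lock of their own, so that close cannot run while a flush is in progress. This is a minimal sketch and not a proposed runtime fix: the Locked_channel module and the run_once function are hypothetical names, and it assumes OCaml 5, where Mutex is part of the standard library.

```ocaml
(* Hypothetical wrapper (not part of Stdlib): an output channel paired with
   a user-level mutex, independent of the runtime's internal channel lock. *)
module Locked_channel = struct
  type t = { oc : Out_channel.t; lock : Mutex.t }

  let make oc = { oc; lock = Mutex.create () }

  (* Run [f] on the channel while holding the wrapper's mutex; the mutex
     is released even if [f] raises. *)
  let with_lock t f =
    Mutex.lock t.lock;
    Fun.protect ~finally:(fun () -> Mutex.unlock t.lock) (fun () -> f t.oc)

  let flush t = with_lock t Out_channel.flush
  let close t = with_lock t Out_channel.close
  let output_string t s =
    with_lock t (fun oc -> Out_channel.output_string oc s)
end

(* The same race as in the reproducer, but going through the wrapper *)
let run_once path =
  let lc = Locked_channel.make (Out_channel.open_text path) in
  let wait = Atomic.make true in
  let spin () = while Atomic.get wait do Domain.cpu_relax () done in
  let d1 = Domain.spawn (fun () -> spin (); Locked_channel.close lc) in
  let d2 = Domain.spawn (fun () -> spin (); Locked_channel.flush lc) in
  Locked_channel.output_string lc (String.make 65531 'a');
  Atomic.set wait false;
  Domain.join d1;
  Domain.join d2
```

With this wrapper, either the flush completes before the close starts, or it runs on an already closed channel, which the documentation says does nothing. The cost is a second lock on every operation, and it only helps if all users of the channel go through the wrapper.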
Before jumping into discussions of fixes, we may want to discuss how to proceed.
Here are a few suggestions:
caml_flush_partial
flush
on the

Note: the above example program can easily be modified to race on two closes, also resulting in a Sys_error. The last sentence of the documentation could account for that behaviour, though.
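The two-close variation mentioned in the note could look roughly like the following. This is a sketch assuming OCaml 5, not the exact program used to observe the behaviour; the names test_close_race and count_close_errors are hypothetical. Instead of asserting, it counts how many iterations saw a Sys_error, since whether the race triggers depends on timing.

```ocaml
let path = Filename.temp_file "stm-" ""

(* Run one race between two closes on the same channel.
   Returns true if either close raised Sys_error. *)
let test_close_race () =
  let channel = Out_channel.open_text path in
  let wait = Atomic.make true in
  let racing_close () =
    while Atomic.get wait do Domain.cpu_relax () done;
    match Out_channel.close channel with
    | () -> false
    | exception Sys_error _ -> true
  in
  let d1 = Domain.spawn racing_close in
  let d2 = Domain.spawn racing_close in
  (* Leave buffered data behind so that close has something to flush *)
  Out_channel.output_string channel (String.make 65531 'a');
  Atomic.set wait false;
  let r1 = Domain.join d1 in
  let r2 = Domain.join d2 in
  (* Make sure the channel is closed whatever happened above *)
  (try Out_channel.close channel with Sys_error _ -> ());
  r1 || r2

(* Count how many of [n] iterations observed a Sys_error,
   then remove the temporary file. *)
let count_close_errors n =
  let errors = ref 0 in
  for _ = 1 to n do
    if test_close_race () then incr errors
  done;
  (try Sys.remove path with Sys_error _ -> ());
  !errors
```

Calling count_close_errors 50_000 mirrors the iteration count of the original reproducer; a result of 0 on a given machine does not show that the race is absent.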
Thanks to @ncik-roberts for figuring out the above explanation.
CC to @damiendoligez who last had his fingers in these parts in #12678