This library allows you to query the processor topology as well as set the processor affinity for the current process. This library does not depend on ocaml-5 (Multicore), but it can be used within ocaml-5 Domains as expected.
The topology can identify individual threads (SMT), cores, and sockets, as well as P-cores (performance) and E-cores (energy-efficient) on both AMD64 and Apple's ARM64 (M1, M2 & friends).
The library is split into 3 main modules:
Processor.Query retrieves the count of threads, cores, and sockets.
utop # Processor.Query.cpu_count;;
- : int = 8
utop # Processor.Query.core_count;;
- : int = 4
utop # Processor.Query.socket_count;;
- : int = 1
Processor.Topology builds an actual topology of each CPU. Each Cpu.t expresses a logical CPU with a logical id (id), a thread id (smt), a core id (core), a socket id (socket), and a kind, which can be P_core or E_core. The kind is only relevant for Intel Alder Lake and Apple's ARM64 machines.
The topology is built upon module load and stays static for the lifetime of the program.
utop # Processor.Topology.t;;
- : Processor.Cpu.t list =
[{Processor.Cpu.id = 0; kind = Processor.Cpu.P_core; smt = 0; core = 0; socket = 0};
{Processor.Cpu.id = 1; kind = Processor.Cpu.P_core; smt = 0; core = 1; socket = 0};
{Processor.Cpu.id = 2; kind = Processor.Cpu.P_core; smt = 0; core = 2; socket = 0};
{Processor.Cpu.id = 3; kind = Processor.Cpu.P_core; smt = 0; core = 3; socket = 0};
{Processor.Cpu.id = 4; kind = Processor.Cpu.P_core; smt = 1; core = 0; socket = 0};
{Processor.Cpu.id = 5; kind = Processor.Cpu.P_core; smt = 1; core = 1; socket = 0};
{Processor.Cpu.id = 6; kind = Processor.Cpu.P_core; smt = 1; core = 2; socket = 0};
{Processor.Cpu.id = 7; kind = Processor.Cpu.P_core; smt = 1; core = 3; socket = 0}]
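Once you have a list like the one above, grouping it by core shows which logical CPUs are SMT siblings. The following is a minimal sketch, not part of the library: the cpu record is a local stand-in that mirrors the fields printed by utop, so the code runs without the library installed, and siblings is a hypothetical helper name.

```ocaml
(* Local stand-in for the Processor.Cpu.t record printed above,
   so this sketch runs without the library. *)
type kind = P_core | E_core

type cpu = { id : int; kind : kind; smt : int; core : int; socket : int }

(* The 4-core, 2-threads-per-core topology from the utop session above. *)
let topology =
  List.init 8 (fun id ->
      { id; kind = P_core; smt = id / 4; core = id mod 4; socket = 0 })

(* Group logical CPU ids by their (socket, core) pair.
   [siblings] is a hypothetical helper, not a library function. *)
let siblings cpus =
  let tbl = Hashtbl.create 16 in
  List.iter
    (fun c ->
      let key = (c.socket, c.core) in
      let ids = Option.value (Hashtbl.find_opt tbl key) ~default:[] in
      Hashtbl.replace tbl key (c.id :: ids))
    cpus;
  Hashtbl.fold (fun key ids acc -> (key, List.sort compare ids) :: acc) tbl []
  |> List.sort compare

let () =
  List.iter
    (fun ((socket, core), ids) ->
      Printf.printf "socket %d core %d: cpus %s\n" socket core
        (String.concat "," (List.map string_of_int ids)))
    (siblings topology)
```

With the real library, the same grouping should work on Processor.Topology.t by reading the record fields directly.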
Processor.Affinity sets the processor affinity. Sometimes it is useful to see what happens when you restrict your application to a set of CPUs: maybe you don't want it to cross a socket, or maybe you want to see how it behaves without two threads fighting over the same core's resources.
The affinity applies to the running context that sets it, so if you are using Domains, it must be set individually within each domain.
Say you want to restrict the process to running only on the threads of core 0:
utop # Processor.Affinity.set_cpus (Processor.Cpu.from_core 0 Processor.Topology.t);;
- : unit = ()
utop # Processor.Affinity.get_cpus ();;
- : Processor.Cpu.t list =
[{Processor.Cpu.id = 0; kind = Processor.Cpu.P_core; smt = 0; core = 0; socket = 0};
{Processor.Cpu.id = 4; kind = Processor.Cpu.P_core; smt = 1; core = 0; socket = 0}]
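The two motivating cases mentioned earlier (staying on one socket, and avoiding two threads sharing a core) come down to filtering the topology list before handing it to the affinity call. Here is a minimal sketch under stated assumptions: the cpu record is a local stand-in for Processor.Cpu.t, on_socket and one_per_core are hypothetical helper names, and the sketch assumes every core numbers its threads starting from smt = 0, as in the output above.

```ocaml
(* Local stand-in for Processor.Cpu.t; real code would use the
   library's own record. *)
type kind = P_core | E_core

type cpu = { id : int; kind : kind; smt : int; core : int; socket : int }

(* Keep only the CPUs on the given socket (hypothetical helper). *)
let on_socket socket cpus = List.filter (fun c -> c.socket = socket) cpus

(* Keep the first hardware thread of each core (hypothetical helper).
   Assumes each core numbers its threads from smt = 0. *)
let one_per_core cpus = List.filter (fun c -> c.smt = 0) cpus

(* Same shape as the topology shown earlier in this document. *)
let topology =
  List.init 8 (fun id ->
      { id; kind = P_core; smt = id / 4; core = id mod 4; socket = 0 })

let () =
  let selected = topology |> on_socket 0 |> one_per_core in
  List.iter (fun c -> Printf.printf "cpu%d (core %d)\n" c.id c.core) selected
```

Since Processor.Affinity.set_cpus takes a list of CPUs in the session above, a list filtered this way from Processor.Topology.t could presumably be passed to it in the same manner.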
A simple binary called ocaml-processor-dump is provided:
$ ocaml-processor-dump
cpu_count: 8
core_count: 4
socket_count: 1
cpus-per-core: 2
cpus-per-socket: 8
cores-per-socket: 4
cpu0: smt=0 core=0 socket=0 kind=P_core
cpu1: smt=0 core=1 socket=0 kind=P_core
cpu2: smt=0 core=2 socket=0 kind=P_core
cpu3: smt=0 core=3 socket=0 kind=P_core
cpu4: smt=1 core=0 socket=0 kind=P_core
cpu5: smt=1 core=1 socket=0 kind=P_core
cpu6: smt=1 core=2 socket=0 kind=P_core
cpu7: smt=1 core=3 socket=0 kind=P_core
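The three per-unit figures in this dump are consistent with plain ratios of the three counts. The sketch below reproduces that arithmetic with the counts hard-coded from the output above; how the binary actually computes them is an assumption here.

```ocaml
(* Counts as reported in the dump above; with the library these
   would come from Processor.Query. *)
let cpu_count = 8
let core_count = 4
let socket_count = 1

(* Assumed derivation: simple integer ratios of the counts. *)
let cpus_per_core = cpu_count / core_count
let cpus_per_socket = cpu_count / socket_count
let cores_per_socket = core_count / socket_count

let () =
  Printf.printf "cpus-per-core: %d\ncpus-per-socket: %d\ncores-per-socket: %d\n"
    cpus_per_core cpus_per_socket cores_per_socket
```

Note that such ratios are only meaningful on homogeneous machines; on a hybrid part where P-cores have SMT and E-cores do not, the number of CPUs per core differs from core to core.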
It turns out all of this is harder than it should be: there are basically no portable APIs, and even the consensus on what a CPU thread is gets sketchy across architectures.
On AMD64 we visit each CPU by pinning our current context and then do the whole CPUID dance manually; the only thing we need from the system is a working pthread_setaffinity_np. Query and Topology will be accurate as long as the process doesn't start with an already restricted affinity.
On anything other than AMD64 we build a fake topology by using Query: each CPU will be its own core and everyone will be on the same socket.
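To make the shape of such a fake topology concrete, here is a minimal sketch. The cpu record is a local stand-in for Processor.Cpu.t, fake_topology is a hypothetical name, and the choices of kind = P_core and smt = 0 are assumptions of this sketch, since the text only specifies the core and socket layout.

```ocaml
(* Local stand-in for Processor.Cpu.t. *)
type kind = P_core | E_core

type cpu = { id : int; kind : kind; smt : int; core : int; socket : int }

(* Each logical CPU becomes its own core, all on socket 0.
   kind = P_core and smt = 0 are assumptions of this sketch. *)
let fake_topology cpu_count =
  List.init cpu_count (fun id ->
      { id; kind = P_core; smt = 0; core = id; socket = 0 })

let () =
  List.iter
    (fun c ->
      Printf.printf "cpu%d: smt=%d core=%d socket=%d\n" c.id c.smt c.core
        c.socket)
    (fake_topology 4)
```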
Initially I added support for parsing /proc/cpuinfo on Linux for other architectures, but the format is not standardized, so it wasn't worth it.
On these systems Query is accurate for cpu_count, but thread_count and socket_count will be faked, the topology will be faked, and affinity is a no-op. NetBSD and DragonFly BSD could have affinity support, but I don't want to maintain it. OpenBSD has no support for it.
Apple doesn't support affinity/pinning, so in order to retrieve the actual apicid on AMD64 we have to go through the horrible ioreg stuff from Apple, which we do. On Apple ARM64 we also go through ioreg to retrieve the relationship between E-cores and P-cores. On Apple, Query and Topology will always be accurate.
If you want to work on cache topology, I'll send you beers.