Hi,
I'm a happy lisp programmer, currently using Clozure CL on Mac OS.
A friend mentioned in casual conversation that "lisp is about as fast as compiled C".
I was fascinated by this statement, and immediately resolved to test this claim by converting a chess engine that I wrote in C to a nearly equivalent implementation in lisp.
To my surprise, my lisp implementation is 1,600 times slower than a nearly identical C implementation. I'm not sure what the cause of this is, since I actually expected to initially have a lisp implementation that was perhaps 10x slower and then improve the implementation until finally perhaps it was only slightly slower than an equivalent C implementation (in other words I pretty much believe the claim even though clearly there are some significant problems in my own efforts to demonstrate the truth of it).
I wonder, can anyone comment on the performance of Clozure CL on Mac OS? Is there another Common Lisp implementation available for Mac that has significantly improved performance? And/or, does anyone know what the highest speed lisp implementation is for unix/linux/freebsd type machines? I checked my functions in Clozure CL and it does seem to be compiling them down to assembly. I can't say for sure that the assembly is high performance.
I have another problem: there is no performance profiler on Clozure CL. I did some searching and found something called "Sam: A Tiny Concurrent Sampling Profiler for Clozure Common Lisp".
It is here: http://mr.gy/blog/sam.html
I have two problems when trying to use Sam. The first is I try to profile code for (roughly) 5 minutes. In this case, the whole of Clozure CL freezes up and I have to kill CCL and restart the app (total loss of all data). The second problem is, I have tried to profile the code for a much shorter period of time, perhaps 30 seconds. This "works" in the sense that Sam outputs a report of the top n functions that "use the most time" for the lisp function that was profiled. The only problem is that after profiling the same function with exactly the same arguments and data multiple times, it is now quite clear that the output of Sam is essentially a nice collection of random numbers. Functions that appear to take 9% of the time in one run jump to 20% of total run time in another (exactly the same inputs and data) run, and in a third run, the function is no longer listed (perhaps meaning less than 1% of total run time?).
I can't optimize performance using this tool. There are no consistent patterns that emerge given data from multiple runs. It's just noise.
Might anyone know of a high performance lisp implementation that also has a working code profiler?
One other totally unrelated question: Does anyone know of a lisp implementation that provides a really great real time 2D/3D graphics capability? The best option I have found (so far) appears to be to use lisp bindings into the SDL2 library (Simple Directmedia Layer version 2). I'm pretty sure I could get this to work but I have doubts as to how portable it would be and how easy it would be for a beginning lisp programmer to set up (so they can play with my code).
Thanks and sorry for such a long posting,
Rob
Hi,
I'm a happy lisp programmer, currently using Clozure CL on Mac OS.
A friend mentioned in casual conversation that "lisp is about as fast as compiled C".
I was fascinated by this statement, and immediately resolved to test
this claim by converting a chess engine that I wrote in C to a nearly
equivalent implementation in lisp.
First, read the references at http://cliki.net/Performance
It is possible to write CL programs and compile them with SBCL to be
even faster than some versions of gcc.
And indeed, with native CL compilers you should be able to get speeds on
same order of magnitude (2 or 3 times at most).
To my surprise, my lisp implementation is 1,600 times slower than a
nearly identical C implementation.
But also, it is very easy for newbies to be oblivious of the performance
impact of lisp forms.
I'm not sure what the cause of
this is,
We'd have to see the sources.
Profilers are useless when you have a 1600x slower program. In that
case, the problem is your algorithms and your data structures.
You may use the profiler to go from 5x to 2x.
One other totally unrelated question: Does anyone know of a lisp
implementation that provides a really great real time 2D/3D graphics
capability?
Lisp is a general programming language. Perhaps you want CUDA?
--
__Pascal J. Bourguignon
http://www.informatimago.com
this may not be too helpful but i found you can profile your code
You have to add everywhere that these integers are used, and make
sure that the compiler knows that the functions that return these
values are still of these types, so it won't box them. This is not
impossible but it is tricky, and may not be worth doing casually or
for leisure
But how?
By putting (THE (UNSIGNED-BYTE 64) X) around every X
but this may not be necessary after all... see below...
e.g. (let ((x (add_bit_to_bitfield (bitboard-white_pawns board) offset)))
(declare (type (unsigned-byte 64)) x)
(setf (the (unsigned-byte 64) (bitboard-white_pawns board)) x))
Do I need to switch to a different lisp implementation? Because there
is no way to view argument types in Clozure CL (that I have found).
I haven't seen argument types in any implementation, so I can't answer
that.
[mentioned where]
This link was provided by Robert Munyer at the very end of his most
recent post.
[I missed that thx]
I'm afraid The only trick in my book was that mentioned in Gary's
posts. This was also what Robert Munyer suggested: To declare
everything as (unsigned-byte 64) and to make sure the compiler doesnt
do any unboxing in the operations that use them.
But I now see you have already tried disassembling
add_bit_to_bitfield. Run the profiler to make sure that the problem
is bignum consing and not something else, maybe try to identify the
functions where it is happening. Maybe Declare add_bit_to_bitfield
inline and dissassemble the function calling it to make sure nothing
is getting boxed.
Remember, Clozure CL has no profiler. Maybe you have a suggestion of
a lisp implementation I could switch to that does have a profiler?
Hopefully it also prints out the argument types of functions so I can
check that too...
The code seems to be portable common lisp, it should run on any
implementation though I have no reason to recommend you to switch.
I admit the linux oprofile profiling of ccl documentation is daunting.
I noticed you said you couldnt run the SAM profiler, but you could still
have run it with a smaller depth (sam:profile () (perft $board 5)) --
which would still give results indicative of where the time was spent.
There is another profiler mentioned on the clozure mailing list
From: John Carroll <***@sussex.ac.uk>
Date: Wed, 3 Aug 2016 13:09:54 +0100
http://users.sussex.ac.uk/~johnca/profile.lisp
The comments at the beginning of this file show how you should read the
output.
I was able to get this running within 10 minutes on your code (after
indenting it to my taste so I could look at it), and you should get the
profiler working too, It is very informative, and the first run
(pro:profile-clear) (pro:profile (perft $board 3)) (pro:profile-results)
immediately suggested that my 64-bit handwaving was a bougyman and
logbitp was not the biggest culprit, instead it was an CCL internal list
DELETE function, called from REMOVE-IF in
remove_moves_leaving_own_king_in_check. This was easily replaced by a
loop
(defun remove_moves_leaving_own_king_in_check (board movelist)
(loop for x in movelist
when (let ((side (bitboard-side_to_move board))
(local_board (copy-structure board)))
(makemove local_board x)
(if (detect_check_fast local_board side) T nil))
collect x))
Try (time (perft $a 3)) - before and after this change. removing
remove-if-not seemed to cut consing down by a third! Profile showed this!
This was just the first iteration of profiling and optimising, I didn't
go further just yet..
Should I be worried about the tremendous data usage that I see when running perft?
(time (perft gameboard 5))
----> 3,234,097,952 bytes of memory allocated.
The C version only needs a small amount of memory, on the order of a
few kilobytes. The lisp implementation seems to need much more
memory, apparently 3.2 Gigabytes!
Is your C version allocating and freeing lists for every step like the
lisp code does? is it copying the board with malloc and free?
After that change on a 1Mhz box I get
(time (perft $a 5))
(PERFT $A 5) took 232,492,950 microseconds (232.492950 seconds) to run
with 2 available CPU cores.
During that period, 231,160,000 microseconds (231.160000 seconds) were
spent in user mode 1,160,000 microseconds (1.160000 seconds) were spent
in system mode
8,035,060 microseconds (8.035060 seconds) was spent in GC.
2,535,319,504 bytes of memory allocated.
It is still pretty bad. Your program is generating lisp lists during
the normal search operation so you would try to identify where this is
happening --- by profiling, the answer is still the obvious answer:
identify the biggest bottleneck, eliminating it, going after the next
biggest fruit, optimize that repeat.
I stopped here as I could not spot anything obvious optimization but I
am likely missing something obvious
How does one have Clozure CL report whether boxed or unboxed
representations are being used?
I'll defer on this... [The point I wanted to make was that CCL supports
(is supposed to support) logical operations and arithmetic on
(unsigned-byte 64), ie. it should compile down to the same instructions
as C, without bignums arithmetic. When we start looking at the
profiling results and come to the point when logbitp is a bottleneck, we
can consider this again]
Hi,
I'm a happy lisp programmer, currently using Clozure CL on Mac OS.
<snip>
I wonder, can anyone comment on the performance of Clozure CL on Mac OS? Is there another Common Lisp implementation available for Mac that has significantly improved performance? And/or, does anyone know what the highest speed lisp implementation is for unix/linux/freebsd type machines? I checked my functions in Clozure CL and it does seem to be compiling them down to assembly. I can't say for sure that the assembly is high performance.
I have another problem: there is no performance profiler on Clozure CL. I did some searching and found something called "Sam: A Tiny Concurrent Sampling Profiler for Clozure Common Lisp".
I do not use Mac OS, I use Linux. But since CPU-s are the same
performance on Linux on compute-intensive calls should be
similar. For me the fastest Lisp is sbcl. Clozure CL is
pretty fast and on some (many ???) tasks may be faster than
sbcl. However, sbcl have good profiler and with profiler
help I was able to get nice speedups. I was not able to
get _useful_ profiling information from Clozure CL, so
I do not know how to find and eliminate bottlenecks
specific to Clozure CL...
--
Waldek Hebisch
RetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.4