A RetroSearch Logo

Home - News ( United States | United Kingdom | Italy | Germany ) - Football scores

Search Query:

Showing content from https://comp.lang.lisp.narkive.com/p5mtOud5/common-lisp-implementation-performance below:

Common Lisp Implementation Performance

Hi,

I'm a happy lisp programmer, currently using Clozure CL on Mac OS.

A friend mentioned in casual conversation that "lisp is about as fast as compiled C".

I was fascinated by this statement, and immediately resolved to test this claim by converting a chess engine that I wrote in C to a nearly equivalent implementation in lisp.

To my surprise, my lisp implementation is 1,600 times slower than a nearly identical C implementation. I'm not sure what the cause of this is, since I actually expected to initially have a lisp implementation that was perhaps 10x slower and then improve the implementation until finally perhaps it was only slightly slower than an equivalent C implementation (in other words I pretty much believe the claim even though clearly there are some significant problems in my own efforts to demonstrate the truth of it).

I wonder, can anyone comment on the performance of Clozure CL on Mac OS? Is there another Common Lisp implementation available for Mac that has significantly improved performance? And/or, does anyone know what the highest speed lisp implementation is for unix/linux/freebsd type machines? I checked my functions in Clozure CL and it does seem to be compiling them down to assembly. I can't say for sure that the assembly is high performance.

I have another problem: there is no performance profiler on Clozure CL. I did some searching and found something called "Sam: A Tiny Concurrent Sampling Profiler for Clozure Common Lisp".

It is here: http://mr.gy/blog/sam.html

I have two problems when trying to use Sam. The first is I try to profile code for (roughly) 5 minutes. In this case, the whole of Clozure CL freezes up and I have to kill CCL and restart the app (total loss of all data). The second problem is, I have tried to profile the code for a much shorter period of time, perhaps 30 seconds. This "works" in the sense that Sam outputs a report of the top n functions that "use the most time" for the lisp function that was profiled. The only problem is that after profiling the same function with exactly the same arguments and data multiple times, it is now quite clear that the output of Sam is essentially a nice collection of random numbers. Functions that appear to take 9% of the time in one run jump to 20% of total run time in another (exactly the same inputs and data) run, and in a third run, the function is no longer listed (perhaps meaning less than 1% of total run time?).

I can't optimize performance using this tool. There are no consistent patterns that emerge given data from multiple runs. It's just noise.

Might anyone know of a high performance lisp implementation that also has a working code profiler?

One other totally unrelated question: Does anyone know of a lisp implementation that provides a really great real time 2D/3D graphics capability? The best option I have found (so far) appears to be to use lisp bindings into the SDL2 library (Simple Directmedia Layer version 2). I'm pretty sure I could get this to work but I have doubts as to how portable it would be and how easy it would be for a beginning lisp programmer to set up (so they can play with my code).

Thanks and sorry for such a long posting,

Rob


Post by r***@gmail.com

Hi,

I'm a happy lisp programmer, currently using Clozure CL on Mac OS.

A friend mentioned in casual conversation that "lisp is about as fast as compiled C".

I was fascinated by this statement, and immediately resolved to test

this claim by converting a chess engine that I wrote in C to a nearly

equivalent implementation in lisp.


First, read the references at http://cliki.net/Performance

It is possible to write CL programs and compile them with SBCL to be

even faster than some versions of gcc.

And indeed, with native CL compilers you should be able to get speeds on

same order of magnitude (2 or 3 times at most).


Post by r***@gmail.com

To my surprise, my lisp implementation is 1,600 times slower than a

nearly identical C implementation.


But also, it is very easy for newbies to be oblivious of the performance

impact of lisp forms.


Post by r***@gmail.com

I'm not sure what the cause of

this is,


We'd have to see the sources.

Profilers are useless when you have a 1600x slower program. In that

case, the problem is your algorithms and your data structures.

You may use the profiler to go from 5x to 2x.


Post by r***@gmail.com

One other totally unrelated question: Does anyone know of a lisp

implementation that provides a really great real time 2D/3D graphics

capability?


Lisp is a general programming language. Perhaps you want CUDA?

--
__Pascal J. Bourguignon
http://www.informatimago.com

this may not be too helpful but i found you can profile your code


Post by r***@gmail.com
Post by Madhu
Post by Madhu

You have to add everywhere that these integers are used, and make

sure that the compiler knows that the functions that return these

values are still of these types, so it won't box them. This is not

impossible but it is tricky, and may not be worth doing casually or

for leisure


But how?


By putting (THE (UNSIGNED-BYTE 64) X) around every X

but this may not be necessary after all... see below...

e.g. (let ((x (add_bit_to_bitfield (bitboard-white_pawns board) offset)))

(declare (type (unsigned-byte 64)) x)

(setf (the (unsigned-byte 64) (bitboard-white_pawns board)) x))


Post by r***@gmail.com

Do I need to switch to a different lisp implementation? Because there

is no way to view argument types in Clozure CL (that I have found).


I haven't seen argument types in any implementation, so I can't answer

that.


Post by r***@gmail.com
Post by Madhu

[mentioned where]


This link was provided by Robert Munyer at the very end of his most

recent post.


[I missed that thx]


Post by r***@gmail.com
Post by Madhu

I'm afraid The only trick in my book was that mentioned in Gary's

posts. This was also what Robert Munyer suggested: To declare

everything as (unsigned-byte 64) and to make sure the compiler doesnt

do any unboxing in the operations that use them.

But I now see you have already tried disassembling

add_bit_to_bitfield. Run the profiler to make sure that the problem

is bignum consing and not something else, maybe try to identify the

functions where it is happening. Maybe Declare add_bit_to_bitfield

inline and dissassemble the function calling it to make sure nothing

is getting boxed.


Remember, Clozure CL has no profiler. Maybe you have a suggestion of

a lisp implementation I could switch to that does have a profiler?

Hopefully it also prints out the argument types of functions so I can

check that too...


The code seems to be portable common lisp, it should run on any

implementation though I have no reason to recommend you to switch.

I admit the linux oprofile profiling of ccl documentation is daunting.

I noticed you said you couldnt run the SAM profiler, but you could still

have run it with a smaller depth (sam:profile () (perft $board 5)) --

which would still give results indicative of where the time was spent.

There is another profiler mentioned on the clozure mailing list

From: John Carroll <***@sussex.ac.uk>

Date: Wed, 3 Aug 2016 13:09:54 +0100

http://users.sussex.ac.uk/~johnca/profile.lisp

The comments at the beginning of this file show how you should read the

output.

I was able to get this running within 10 minutes on your code (after

indenting it to my taste so I could look at it), and you should get the

profiler working too, It is very informative, and the first run

(pro:profile-clear) (pro:profile (perft $board 3)) (pro:profile-results)

immediately suggested that my 64-bit handwaving was a bougyman and

logbitp was not the biggest culprit, instead it was an CCL internal list

DELETE function, called from REMOVE-IF in

remove_moves_leaving_own_king_in_check. This was easily replaced by a

loop

(defun remove_moves_leaving_own_king_in_check (board movelist)

(loop for x in movelist

when (let ((side (bitboard-side_to_move board))

(local_board (copy-structure board)))

(makemove local_board x)

(if (detect_check_fast local_board side) T nil))

collect x))

Try (time (perft $a 3)) - before and after this change. removing

remove-if-not seemed to cut consing down by a third! Profile showed this!

This was just the first iteration of profiling and optimising, I didn't

go further just yet..


Post by r***@gmail.com

Should I be worried about the tremendous data usage that I see when running perft?

(time (perft gameboard 5))

----> 3,234,097,952 bytes of memory allocated.

The C version only needs a small amount of memory, on the order of a

few kilobytes. The lisp implementation seems to need much more

memory, apparently 3.2 Gigabytes!


Is your C version allocating and freeing lists for every step like the

lisp code does? is it copying the board with malloc and free?

After that change on a 1Mhz box I get

(time (perft $a 5))

(PERFT $A 5) took 232,492,950 microseconds (232.492950 seconds) to run

with 2 available CPU cores.

During that period, 231,160,000 microseconds (231.160000 seconds) were

spent in user mode 1,160,000 microseconds (1.160000 seconds) were spent

in system mode

8,035,060 microseconds (8.035060 seconds) was spent in GC.

2,535,319,504 bytes of memory allocated.

It is still pretty bad. Your program is generating lisp lists during

the normal search operation so you would try to identify where this is

happening --- by profiling, the answer is still the obvious answer:

identify the biggest bottleneck, eliminating it, going after the next

biggest fruit, optimize that repeat.

I stopped here as I could not spot anything obvious optimization but I

am likely missing something obvious


Post by r***@gmail.com

How does one have Clozure CL report whether boxed or unboxed

representations are being used?


I'll defer on this... [The point I wanted to make was that CCL supports

(is supposed to support) logical operations and arithmetic on

(unsigned-byte 64), ie. it should compile down to the same instructions

as C, without bignums arithmetic. When we start looking at the

profiling results and come to the point when logbitp is a bottleneck, we

can consider this again]


Post by r***@gmail.com

Hi,

I'm a happy lisp programmer, currently using Clozure CL on Mac OS.


<snip>


Post by r***@gmail.com

I wonder, can anyone comment on the performance of Clozure CL on Mac OS? Is there another Common Lisp implementation available for Mac that has significantly improved performance? And/or, does anyone know what the highest speed lisp implementation is for unix/linux/freebsd type machines? I checked my functions in Clozure CL and it does seem to be compiling them down to assembly. I can't say for sure that the assembly is high performance.

I have another problem: there is no performance profiler on Clozure CL. I did some searching and found something called "Sam: A Tiny Concurrent Sampling Profiler for Clozure Common Lisp".


I do not use Mac OS, I use Linux. But since CPU-s are the same

performance on Linux on compute-intensive calls should be

similar. For me the fastest Lisp is sbcl. Clozure CL is

pretty fast and on some (many ???) tasks may be faster than

sbcl. However, sbcl have good profiler and with profiler

help I was able to get nice speedups. I was not able to

get _useful_ profiling information from Clozure CL, so

I do not know how to find and eliminate bottlenecks

specific to Clozure CL...

--
Waldek Hebisch


RetroSearch is an open source project built by @garambo | Open a GitHub Issue

Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo

HTML: 3.2 | Encoding: UTF-8 | Version: 0.7.4