> Executive summary for python-dev folks seeing this for the first time:
>
> This thread started at
>
> http://mail.python.org/pipermail/spambayes/2003-February/003520.html
>
> Running in a single interpreter loop, I can score roughly 46
> messages per second. Running from the shell using hammiefilter.py
> (which takes a msg on stdin and spits a scored message to stdout)
> performance drops to roughly 2 messages per second. Neil
> Schmenenauer noted all the failed open() calls during import
> lookup, which got me started trying to whittle them down.
>
> Two more things to try before abandoning this quixotic adventure...
>
> It appears $prefix/python23.zip is left in sys.path even if it doesn't exist
> (Just van Rossum explained to me in a bug report I filed that nonexistent
> directories might actually be URLs or other weird hacks which import hooks
> could make use of), so I went with the flow and created it, populating it
> with the contents of $prefix/python2.3. My average wallclock time went from
> 0.5 seconds to 0.47 seconds and user+sys times went from 0.43 seconds to
> 0.41 seconds. A modest improvement.
>
> One more little tweak. I moved the lib-dynload directory to the front of
> sys.path (obviously only safe if nothing there appears earlier in sys.path).
> Wall clock average stayed at 0.47 seconds and user+sys at 0.41 seconds,
> though the total number of system calls as measured by ktrace went from 3454
> to 3042.
>
> Hammiefilter itself really does very little. Looking at the last
> ktrace/kdump output, I see 3042 system calls. The hammie.db file isn't
> opened until line 2717. All the rest before that is startup stuff, the
> largest chunk of which are nami operations (731) and open (557) calls, most
> of them involving nonexistent files (as evidenced by seeing only 164 calls
> to close()). In contrast, only 278 system calls appear to be directly
> related to manipulating the hammie database.
>
> This is still somewhat off-topic for this list (except for the fact that my
> intention was to get hammiefilter to run faster), so I'll cc python-dev to
> keep Tim happy, and perhaps mildly irritate Guido by discussing specific
> apps on python-dev.

Far from it, I wish spambayes well (and wish I could still be
involved) :-).

The issue seems to be that a moderately sized application takes a long
time to start, right? How much of the user+sys time was user, how
much was sys? Have you used python -v to see which modules it
imports?

Long ago I knew Hammie; I believe it reads a possibly large database.
How much time does opening + closing the database take? (I presume
that the 46 messages/second was not opening the database afresh for
each message.)

--Guido van Rossum (home page: http://www.python.org/~guido/)
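
For anyone wanting to answer the first two questions on their own box, here is a rough sketch (not part of hammiefilter; the helper names time_child and count_imports are made up for illustration, and it uses modern Python rather than 2.3). It splits child-process time into user and sys via os.times(), and counts the "import" lines that python -v writes to stderr. It times a bare "python -c pass" as a stand-in for the real script, so the numbers only show interpreter startup overhead. On platforms where os.times() does not report child times, the user and sys figures will simply be zero.

```python
import os
import subprocess
import sys
import time


def time_child(argv, runs=5):
    """Run argv `runs` times; return average (wall, user, sys) seconds."""
    before = os.times()
    start = time.perf_counter()
    for _ in range(runs):
        subprocess.run(argv, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=True)
    wall = (time.perf_counter() - start) / runs
    after = os.times()
    # children_user / children_system accumulate over waited-for children.
    user = (after.children_user - before.children_user) / runs
    system = (after.children_system - before.children_system) / runs
    return wall, user, system


def count_imports(argv_tail):
    """Count the 'import ...' lines that `python -v` reports on stderr."""
    proc = subprocess.run([sys.executable, "-v"] + argv_tail,
                          stdout=subprocess.DEVNULL,
                          stderr=subprocess.PIPE, text=True, check=True)
    return sum(1 for line in proc.stderr.splitlines()
               if line.startswith("import "))


if __name__ == "__main__":
    # Stand-in workload: an interpreter that starts up and exits at once.
    wall, user, system = time_child([sys.executable, "-c", "pass"])
    print("wall %.3fs  user %.3fs  sys %.3fs" % (wall, user, system))
    print("modules imported at startup:", count_imports(["-c", "pass"]))
```

Replacing the "-c", "pass" arguments with the path to the real script (plus a message on stdin) would give the figures for the application itself; the difference between the two runs is roughly what the application's own imports and database handling cost.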