
Showing content from https://mail.python.org/pipermail/python-dev/2001-October/018002.html below:

[Python-Dev] python2.1.1 SEGV in GC on Solaris 2.7

Neil Schemenauer nas@python.ca
Thu, 18 Oct 2001 07:38:15 -0700
[I'm moving the discussion here from SF; using the tracker is too
painful.]

Anthony Baxter:
> I've got a Zope installation where python2.1.1 is
> segfaulting on Solaris 2.7 - it's running a largish
> ZEO server.
[..]
> Here's the trace with debugging enabled:
> 
> #0  0xff00 in ?? ()
> #1  0x402f0 in collect (young=0x9b538, old=0x9b544) at ./Modules/gcmodule.c:379
> #2  0x405a8 in collect_generations () at ./Modules/gcmodule.c:484
> #3  0x40624 in _PyGC_Insert (op=0xbc1f24) at ./Modules/gcmodule.c:507
> #4  0x5a224 in PyList_New (size=0) at Objects/listobject.c:61
> #5  0x21bc8 in eval_code2 (co=0x1cb370, globals=0x21bc0, locals=0x67,
>     args=0x0, argcount=1, kws=0xf89b24, kwcount=0, defs=0x0, defcount=0,
>     closure=0xbc1f24) at Python/ceval.c:1741
> 
> The next trick is to rebuild without any optimisation (sigh),
> as I suspect the compiler has inlined subtract_refs().

Martin v. Löwis:
> It would be interesting to know what the value of "gc" is
> at the time of the crash. It looks like you got an object
> that claims to support GC but has a null tp_traverse.

Anthony Baxter:
> Ok, I have an intact core file, and a matching binary,
> no optimisations, nothing. This core shows the
> crash at line 166 of gcmodule.c:
>  traverse = PyObject_FROM_GC(gc)->ob_type->tp_traverse;
> *PyObject_FROM_GC(gc) in this case is
> 
> $24 = {ob_refcnt = 1, ob_type = 0x0}
> 
> To check my logic, I checked gc_next and gc_prev using 
> the same GDB magic, and they correctly show up as a tuple
> and an instance method. 
> 
> Some fiddling around seems to rule out stack space as the
> problem, as well. We're going to try and see if purify 
> helps here, but the problem looks to be a junk object - 
> I have no idea how to track this down further. Help?
> Would the horrible, horrible hack of removing the
> object from the gc linked list if ob_type is null help?
> Well, it'd stop the crashes, anyway.
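Anthony's stop-gap can be sketched as a guarded removal from a circular doubly linked list. This is a minimal model, not Python 2.1's gcmodule.c: the structs `type_s` and `gc_node` and the function `gc_unlink_typeless` are hypothetical, and the `type` field only plays the role of ob_type. The unlinked node is deliberately leaked so that a leak checker can later report where it was allocated.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins for a type object and a GC list node;
   these are NOT the real CPython 2.1 definitions. */
typedef struct type_s {
    const char *name;
    int (*traverse)(void *self);
} type_s;

typedef struct gc_node {
    struct gc_node *next;
    struct gc_node *prev;
    type_s *type;              /* plays the role of ob_type */
} gc_node;

/* Walk the circular list headed by 'head'. Any node whose type pointer
   is NULL is unlinked (and deliberately leaked), with a message on
   stderr. Returns the number of nodes removed. */
int gc_unlink_typeless(gc_node *head)
{
    int removed = 0;
    gc_node *n = head->next;
    while (n != head) {
        gc_node *next = n->next;          /* save before unlinking */
        if (n->type == NULL) {
            n->prev->next = n->next;
            n->next->prev = n->prev;
            n->next = n->prev = NULL;     /* make any stale use obvious */
            fprintf(stderr, "gc: dropped typeless object at %p\n",
                    (void *)n);
            removed++;
        }
        n = next;
    }
    return removed;
}
```

In the real interpreter such a pass would have to run before collect() dereferences ob_type, and it only hides the symptom: whatever cleared the field is still at large.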

Martin v. Löwis:
> There are two options:
> 
> a) the object isn't really a GC object, i.e. has no GC
> header. In gdb, you can try to cast gc to PyObject*, then
> see if the resulting pointer has a better ob_type (this is
> unlikely, though, since the code that inserted the object
> was already using fromgc/togc)
> 
> b) somebody has cleared the ob_type field.
> 
> Can you guarantee that all extension modules have been
> compiled with the 2.1.1 header files?
> 
> Is the problem repeatable in the sense that gc will have the
> same pointer value on each crash? If so, it is relatively
> easy to track down: just set a gdb watchpoint on the
> ob_type field of the object at that address (note that
> setting watchpoints is not possible until memory is actually
> mapped at that address).
> 
> If you can't analyse it with watchpoints, I recommend
> annotating the interpreter in the following way:
> in pyobject_init, put a printf that prints the address and
> the tp_name of the type. In subtract_refs, if the ob_type
> slot is null, print the address of the object and abort.
> Then analyse the log to see whether an object really has been
> allocated at that address, and what its type was (make sure
> you consider the possibility that addresses are off by the
> delta that FROM_GC adds).

Anthony Baxter:
> It's not a GC object. I'm positive all the extension 
> objects are correct - I just recompiled, without the
> 1.5/2.0 headers around.
> It's a different pointer each time round, unfortunately. It 
> also takes anything from 5 minutes to 2 hours to reproduce.
> I've got about 4 copies of it running now, and I've got a
> bunch of different core files. I've grabbed purify and an
> eval license, and I'm feeding it the binary. 
> 
> The printf approach is probably not going to work - these
> are busy busy Zope servers. Instead, my plan, assuming that
> purify doesn't immediately spot a problem, is to change the
> code so that if it gets a dud GC object, it will just bust
> it out of the tree and let it leak, and print a message 
> saying so. Then I can quit the program, and purify will
> tell me 'hey, you leaked!' and also tell me where it was
> allocated. 
> 
> More concerning, about half the segfaults are not from the
> GC at all, but from realloc in PyFrame_New (line 161 of
> frameobject.c). These are the only two I'm getting - it's
> split 50-50 amongst the 10 coredumps I have now. I'm not
> sure whether to open a separate bug for this.
> 
> Has python2.1.1 been purified? With Zope and zope's 
> extensions?
> 
> 
> Wow - it's amazing how this SF bug thing is so painful for
> conversations :)

The ob_type pointer must be getting cleared after the object has been
added to the GC lists.  The PyObject_IS_GC call in _PyGC_Insert would
have segfaulted otherwise.  Knowing the type of the object would be
helpful in debugging the problem.  I suggest reconsidering Martin's
printf idea.  You could add something like this to _PyGC_Insert:

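    void
    _PyGC_Insert(PyObject *op)
    {
        static int did_open = 0;
        static FILE *log = NULL;
        if (!did_open) {
            did_open = 1;
            log = fopen("type.log", "w");
        }
        if (log != NULL) {
            /* record the object, its type pointer and the type name */
            fprintf(log, "%p %p %s\n", (void *)op, (void *)op->ob_type,
                    op->ob_type->tp_name);
            /* flush now: the process may crash before stdio does */
            fflush(log);
        }
    ...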
        
Debugging this type of problem is really hard (as you already know)
because the effect of the bug is found so far away from the source.

  Neil
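Since the crash shows up far from the bug, one hedged way to move detection closer to the source is to audit the whole GC list on each insertion and abort at the first corrupt node, so the core dump is taken soon after the damage. The sketch uses a hypothetical `node` struct rather than the real PyGC_Head, and `gc_audit` is an invented name.

```c
#include <stddef.h>

/* Hypothetical model of a GC list node; not CPython's PyGC_Head. */
typedef struct node {
    struct node *next;
    struct node *prev;
    const void *type;          /* stands in for ob_type */
} node;

/* Walk the circular list headed by 'head' and return the first node
   that looks corrupt: a NULL type pointer, a NULL next pointer, or a
   successor whose prev does not point back. Returns NULL if the list
   looks consistent. A caller would abort() on a non-NULL result. */
node *gc_audit(node *head)
{
    node *n;
    for (n = head->next; n != head; n = n->next) {
        if (n->type == NULL || n->next == NULL || n->next->prev != n)
            return n;
    }
    return NULL;
}
```

The audit makes every insertion O(n) in the size of the list, so on a busy Zope server it may only be practical to run it every Nth call, which widens the window again.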


