On 03/03/2011 15:09, Graham Stratton wrote:
> On Mar 2, 3:01 pm, Graham Stratton<grahamstrat... at gmail.com> wrote:
>> We are using marshal for serialising objects before distributing them
>> around the cluster, and extremely occasionally a corrupted marshal is
>> produced. The current workaround is to serialise everything twice and
>> check that the serialisations are the same. On the rare occasions that
>> they are not, I have dumped the files for comparison. It turns out
>> that there are a few positions within the serialisation where
>> corruption tends to occur (these positions seem to be independent of
>> the data or the size of the complete serialisation). These are:
>>
>> 4 bytes starting at 548867 (0x86003)
>> 4 bytes starting at 4398083 (0x431c03)
>> 4 bytes starting at 17595395 (0x10c7c03)
>> 4 bytes starting at 19794819 (0x12e0b83)
>> 4 bytes starting at 22269171 (0x153ccf3)
>> 2 bytes starting at 25052819 (0x17e4693)
>> 3 bytes starting at 28184419 (0x1ae0f63)
>
> I modified marshal.c to print when it extends the string used to write
> the marshal to. This gave me these results:
>
> >>> s = marshal.dumps(list((i, str(i)) for i in range(1400000)))
> Resizing string from 50 to 1124 bytes
> Resizing string from 1124 to 3272 bytes
> Resizing string from 3272 to 7568 bytes
> Resizing string from 7568 to 16160 bytes
> Resizing string from 16160 to 33344 bytes
> Resizing string from 33344 to 67712 bytes
> Resizing string from 67712 to 136448 bytes
> Resizing string from 136448 to 273920 bytes
> Resizing string from 273920 to 548864 bytes
> Resizing string from 548864 to 1098752 bytes
> Resizing string from 1098752 to 2198528 bytes
> Resizing string from 2198528 to 4398080 bytes
> Resizing string from 4398080 to 8797184 bytes
> Resizing string from 8797184 to 17595392 bytes
> Resizing string from 17595392 to 19794816 bytes
> Resizing string from 19794816 to 22269168 bytes
> Resizing string from 22269168 to 25052814 bytes
> Resizing string from 25052814 to 28184415 bytes
> Resizing string from 28184415 to 31707466 bytes
>
> Every corruption point occurs exactly three bytes above an extension
> point (rounded to the nearest word for the last two). This clearly
> isn't a coincidence, but I can't see where there could be a problem.
> I'd be grateful for any pointers.
>
I haven't found the cause, but I have found something else I'm
suspicious of in the source for Python 3.2.

In marshal.c there's a function "w_object", and within that function is
this:

     else if (PyAnySet_CheckExact(v)) {
         PyObject *value, *it;

         if (PyObject_TypeCheck(v, &PySet_Type))
             w_byte(TYPE_SET, p);
         else
             w_byte(TYPE_FROZENSET, p);

"w_byte" is a macro which includes an if-statement, not a function.
Doesn't it need some braces? (There are braces in the other places
they're needed.)
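
The claim in the quoted message that every corruption point sits three bytes above a buffer extension point can be checked with a few lines of arithmetic. The sketch below copies the offsets from the two lists above. The 4-byte word size and the choice to round up (rather than to the nearest word) are assumptions made to reproduce the "rounded to the nearest word" remark; neither is stated in the report.

```python
# Start offsets of the corrupted regions, copied from the quoted report.
corruption_starts = [548867, 4398083, 17595395, 19794819,
                     22269171, 25052819, 28184419]

# New buffer sizes from the matching "Resizing string" lines in the report.
resize_boundaries = [548864, 4398080, 17595392, 19794816,
                     22269168, 25052814, 28184415]

WORD = 4  # assumption: 4-byte words; the report only says "word"


def round_up_to_word(n, word=WORD):
    """Round n up to the next multiple of `word` (assumed rounding direction)."""
    return -(-n // word) * word


# Distance from each boundary to the corruption start, without rounding.
raw_differences = [c - b for c, b in zip(corruption_starts, resize_boundaries)]

# The same distance after rounding each boundary up to a word boundary.
offsets_above_boundary = [c - round_up_to_word(b)
                          for c, b in zip(corruption_starts, resize_boundaries)]

print(raw_differences)         # [3, 3, 3, 3, 3, 5, 4]
print(offsets_above_boundary)  # [3, 3, 3, 3, 3, 3, 3]
```

The first five boundaries are already multiples of four, so only the last two are changed by the rounding step. After rounding, all seven offsets come out as three, which matches the observation in the quoted message.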