Showing content from http://mail.python.org/pipermail/python-dev/attachments/20170908/625f10df/attachment.html below:
<div dir="ltr">I also like having the header fixed-size, so it might be possible to rewrite headers (e.g. to flip the source bit) without moving the rest of the file.<br></div><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Sep 8, 2017 at 3:38 AM, Antoine Pitrou <span dir="ltr"><<a href="mailto:solipsis@pitrou.net" target="_blank">solipsis@pitrou.net</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">On Fri, 8 Sep 2017 12:04:52 +0200<br>
Antoine Pitrou <<a href="mailto:solipsis@pitrou.net">solipsis@pitrou.net</a>> wrote:<br>
> On Thu, 7 Sep 2017 18:47:20 -0700<br>
> Nick Coghlan <<a href="mailto:ncoghlan@gmail.com">ncoghlan@gmail.com</a>> wrote:<br>
> > However, I do wonder whether we could encode *all* the mode settings<br>
> > into the magic number, such that we did something like reserving the<br>
> > top 3 bits for format flags:<br>
> ><br>
> > * number & 0x1FFF -> the traditional magic number<br>
> > * number & 0x8000 -> timestamp or hash?<br>
> > * number & 0x4000 -> checked or not?<br>
> > * number & 0x2000 -> reserved for future format changes<br>
><br>
> I'd rather a single magic number and a separate bitfield that tells<br>
> what the header encodes exactly. We don't *have* to fight for a tiny<br>
> size reduction of pyc files.<br>
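<br>
For comparison, the flags-in-the-magic-number layout quoted above could be decoded roughly as follows.<br>
This is only a sketch: the constant and function names are hypothetical, and treating a set 0x8000 bit as "hash" is an assumption, since the list only says "timestamp or hash?".<br>

```python
# Masks taken from the quoted list; the names are illustrative only.
MAGIC_MASK = 0x1FFF     # number & 0x1FFF -> the traditional magic number
HASH_FLAG = 0x8000      # timestamp or hash? (assumed: set means hash)
CHECKED_FLAG = 0x4000   # checked or not?
RESERVED_FLAG = 0x2000  # reserved for future format changes


def split_magic(number):
    """Split a 16-bit value into (magic, hash_based, checked, reserved)."""
    return (
        number & MAGIC_MASK,
        bool(number & HASH_FLAG),
        bool(number & CHECKED_FLAG),
        bool(number & RESERVED_FLAG),
    )


# 3379 is just an example value that fits in the low 13 bits.
flags_demo = split_magic(HASH_FLAG | CHECKED_FLAG | 3379)
plain_demo = split_magic(3379)
```

Every tool that compares magic numbers would need to apply the mask first, which is the maintenance cost the reply above objects to.<br>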
<br>
Let me expand a bit on this. Currently, the format is:<br>
<br>
- bytes 0..3: magic number<br>
- bytes 4..7: source file timestamp<br>
- bytes 8..11: source file size<br>
- bytes 12+: pyc file body (marshal format)<br>
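<br>
As a rough illustration, the 12-byte layout listed above could be read like this.<br>
The function name is hypothetical, and the little-endian byte order is an assumption not stated in the message.<br>

```python
def read_current_header(data):
    """Parse the 12-byte header: magic, timestamp, size, then the body."""
    if len(data) < 12:
        raise ValueError("truncated pyc header")
    magic = data[0:4]                            # bytes 0..3
    mtime = int.from_bytes(data[4:8], "little")  # bytes 4..7
    size = int.from_bytes(data[8:12], "little")  # bytes 8..11
    body = data[12:]                             # bytes 12+ (marshal data)
    return magic, mtime, size, body


# Made-up sample bytes, not a real pyc file.
sample = (
    b"\x33\x0d\r\n"
    + (1504865092).to_bytes(4, "little")
    + (1234).to_bytes(4, "little")
    + b"body"
)
current_demo = read_current_header(sample)
```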
<br>
What I'm proposing is:<br>
<br>
- bytes 0..3: magic number<br>
- bytes 4..7: header options (bitfield)<br>
- bytes 8..15: header contents<br>
  Depending on header options:<br>
  - bytes 8..11: source file timestamp<br>
  - bytes 12..15: source file size<br>
  or:<br>
  - bytes 8..15: 64-bit source file hash<br>
- bytes 16+: pyc file body (marshal format)<br>
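<br>
A minimal sketch of a reader for the proposed 16-byte layout might look like the following.<br>
The proposal does not assign bit positions, so HASH_BASED = 0x1 is an assumption, as are the function name and the little-endian byte order.<br>

```python
HASH_BASED = 0x1  # assumed bit; the proposal does not fix an assignment


def read_proposed_header(data):
    """Parse magic, options bitfield, 8 bytes of contents, then the body."""
    if len(data) < 16:
        raise ValueError("truncated pyc header")
    magic = data[0:4]                              # bytes 0..3
    options = int.from_bytes(data[4:8], "little")  # bytes 4..7
    if options & HASH_BASED:
        # bytes 8..15: 64-bit source file hash, kept as raw bytes
        contents = {"source_hash": data[8:16]}
    else:
        contents = {
            "mtime": int.from_bytes(data[8:12], "little"),  # bytes 8..11
            "size": int.from_bytes(data[12:16], "little"),  # bytes 12..15
        }
    return magic, options, contents, data[16:]


# Made-up sample headers, one per variant.
magic_bytes = b"\x33\x0d\r\n"
ts_pyc = (
    magic_bytes
    + (0).to_bytes(4, "little")
    + (1504865092).to_bytes(4, "little")
    + (1234).to_bytes(4, "little")
    + b"body"
)
hash_pyc = magic_bytes + (1).to_bytes(4, "little") + bytes(range(8)) + b"body"
ts_demo = read_proposed_header(ts_pyc)
hash_demo = read_proposed_header(hash_pyc)
```

In both variants the body starts at offset 16, so a reader never has to branch on header length.<br>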
<br>
This way, we keep a single magic number, a single header size, and<br>
there's only a per-build variation in the middle of the header.<br>
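<br>
Because the header size is constant under this layout, an option bit could in principle be toggled by overwriting bytes 4..7 in place, leaving the body where it is.<br>
The sketch below assumes little-endian fields and a hypothetical CHECK_SOURCE bit, and uses an in-memory buffer in place of a real file.<br>

```python
import io

CHECK_SOURCE = 0x2  # hypothetical bit, chosen only for illustration


def flip_option(f, bit):
    """Toggle one option bit in place; f is a seekable binary file object."""
    f.seek(4)
    options = int.from_bytes(f.read(4), "little")
    f.seek(4)
    f.write((options ^ bit).to_bytes(4, "little"))


# 4-byte magic, 4-byte options, 8 bytes of header contents, then the body.
pyc = io.BytesIO(b"\x33\x0d\r\n" + bytes(4) + bytes(8) + b"body")
size_before = len(pyc.getvalue())
flip_option(pyc, CHECK_SOURCE)
flipped = pyc.getvalue()
```

The same function should work on a file opened with mode "r+b", since it only seeks, reads and overwrites four bytes.<br>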
<br>
<br>
Of course, there are other possible ways to encode information. For<br>
example, the header could be a sequence of Type-Length-Value triplets,<br>
perhaps prefixed with header size or body offset for easy seeking.<br>
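<br>
One way the Type-Length-Value idea could look, purely as a sketch.<br>
Everything concrete here is an assumption: one-byte type and length fields, a 4-byte little-endian body offset after the magic number, and the type codes themselves.<br>

```python
TYPE_MTIME, TYPE_SIZE = 1, 2  # hypothetical type codes


def parse_tlv_header(data):
    """Read magic, body offset, then TLV entries up to the body offset."""
    magic = data[0:4]
    body_offset = int.from_bytes(data[4:8], "little")
    fields = {}
    pos = 8
    while pos < body_offset:
        ftype, length = data[pos], data[pos + 1]
        fields[ftype] = data[pos + 2:pos + 2 + length]
        pos += 2 + length
    return magic, fields, data[body_offset:]


# Made-up header with two entries: a timestamp and a source size.
entries = (
    bytes([TYPE_MTIME, 4]) + (1504865092).to_bytes(4, "little")
    + bytes([TYPE_SIZE, 4]) + (1234).to_bytes(4, "little")
)
header = b"\x33\x0d\r\n" + (8 + len(entries)).to_bytes(4, "little") + entries
tlv_magic, tlv_fields, tlv_body = parse_tlv_header(header + b"body")
```

A reader that only wants the code object can jump straight to the body offset and skip entry types it does not know.<br>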
<br>
My whole point here is that we can easily avoid the annoyance of dual<br>
magic numbers and encodings which must be maintained in parallel.<br>
<br>
Regards<br>
<br>
Antoine.<br>
<br>
<br>
</blockquote></div><br><br clear="all"><br>-- <br><div class="gmail_signature" data-smartmail="gmail_signature">--Guido van Rossum (<a href="http://python.org/~guido" target="_blank">python.org/~guido</a>)</div>
</div>