On 2016-01-27 3:46 PM, Glenn Linderman wrote: > On 1/27/2016 12:37 PM, Yury Selivanov wrote: >> >>> >>> MicroPython also has dictionary lookup caching, but it's a bit >>> different to your proposal. We do something much simpler: each opcode >>> that has a cache ability (eg LOAD_GLOBAL, STORE_GLOBAL, LOAD_ATTR, >>> etc) includes a single byte in the opcode which is an offset-guess >>> into the dictionary to find the desired element. Eg for LOAD_GLOBAL >>> we have (pseudo code): >>> >>> CASE(LOAD_GLOBAL): >>> key = DECODE_KEY; >>> offset_guess = DECODE_BYTE; >>> if (global_dict[offset_guess].key == key) { >>> // found the element straight away >>> } else { >>> // not found, do a full lookup and save the offset >>> offset_guess = dict_lookup(global_dict, key); >>> UPDATE_BYTECODE(offset_guess); >>> } >>> PUSH(global_dict[offset_guess].elem); >>> >>> We have found that such caching gives a massive performance increase, >>> on the order of 20%. The issue (for us) is that it increases bytecode >>> size by a considerable amount, requires writeable bytecode, and can be >>> non-deterministic in terms of lookup time. Those things are important >>> in the embedded world, but not so much on the desktop. >> >> That's a neat idea! You're right, it does require bytecode to become >> writeable. > > Would it? > > Remember "fixup lists"? Maybe they still exist for loading function > addresses from one DLL into the code of another at load time? > > So the equivalent for bytecode requires a static table of > offset_guess, and the offsets into that table are allocated by the > byte-code loader at byte-code load time, and the byte-code is "fixed > up" at load time to use the correct offsets into the offset_guess > table. It takes one more indirection to find the guess, but if the > result is a 20% improvement, maybe you'd still get 19%... Right, in my current patch I have an offset table per code object. Essentially, this offset table adds 8bits per opcode. It also means that only first 255 LOAD_GLOBAL/LOAD_METHOD opcodes *per-code-object* are optimized (because the offset table only can store 8bit offsets), which is usually enough (I think you need to have more than a 500 lines of code function to reach that limit). Yury
RetroSearch is an open source project built by @garambo | Open a GitHub Issue
Search and Browse the WWW like it's 1997 | Search results from DuckDuckGo
HTML:
3.2
| Encoding:
UTF-8
| Version:
0.7.4