On 7/28/20 2:25 PM, Mehdi AMINI wrote:
> On Tue, Jul 28, 2020 at 12:07 PM Johannes Doerfert <
> johannesdoerfert at gmail.com> wrote:
>
>> [I removed all but the data layout question, that is an important topic]
>>
>> On 7/28/20 1:03 PM, Mehdi AMINI wrote:
>> >> TL;DR
>> >> -----
>> >>
>> >> Let's allow to merge two LLVM-IR modules for different targets (with
>> >> compatible data layouts) into a single LLVM-IR module to facilitate
>> >> host-device code optimizations.
>> >>
>> >
>> > I think the main question I have is with respect to this limitation
>> > on the datalayout: isn't it too limiting in practice?
>> > I understand that this is much easier to implement in LLVM today,
>> > but it may get us into a fairly limited place in terms of what can
>> > be supported in the future.
>> > Have you looked into what it would take to have heterogeneous
>> > modules that have their own DL?
>>
>>
>> Let me share some thoughts on the data layouts situation, not all of
>> which are fully matured but I guess we have to start somewhere:
>>
>> If we look at the host-device interface there has to be some agreement
>> on parts of the datalayout, namely what the data that the host sends
>> over and expects back looks like. If I'm not mistaken, GPUs will match
>> the host in things like padding, endianness, etc. because you cannot
>> translate things "on the fly". That said, there might be additional
>> "address spaces" on either side that the other one is not
>> matching/aware of. Long story short, I think host & device need to,
>> and in practice do, agree on the data layout of the address space they
>> use to communicate.
>>
>> The above is for me a strong hint that we could use address spaces to
>> identify/distinguish differences when we link the modules. However,
>> there might be the case that this is not sufficient, e.g., if the
>> default alloca address space differs. In that case I don't see a
>> reason to not pull the same "trick" as with the triple. We can specify
>> additional data layouts, one per device, and if you retrieve the data
>> layout, or triple, you need to pass a global symbol as an "anchor".
>> For all intraprocedural passes this should be sufficient as they are
>> only interested in the DL and triple of the function they look at. For
>> IPOs we have to distinguish the ones that know about the host-device
>> calls and the ones that don't. We might have to teach all of them
>> about these calls but as long as they are callbacks through a driver
>> routine I don't even think we need to.
>>
>> I'm curious if you or others see an immediate problem with both a
>> device specific DL and triple (optionally) associated with every
>> global symbol.
>>
>
> Having a triple/DL per global symbol would likely solve everything, I
> didn't get from your original email that this was considered.
> If I understand correctly what you're describing, the DL on the Module
> would be a "default" and we'd need to make the DL/triple APIs on the
> Module "private" to force queries to go through an API on GlobalValue
> to get the DL/triple?

That is what I tried to describe, yes. The "patch" I posted does this
"conceptually" for the triple. You make them private or require a global
value to be passed as part of the request, same result I guess. The key
is that the DL/triple is a property of the global symbol.

I'll respond to Renato's concerns on this as part of a response to him.

>>
>> ~ Johannes
>>
>
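
The per-symbol lookup discussed above can be sketched as a small toy model. This is not LLVM's API and not the posted patch: `TargetInfo`, `Module.add_device`, `GlobalValue.get_triple` and `GlobalValue.get_data_layout` are hypothetical names chosen for illustration, and the data layout strings are abbreviated placeholders. The sketch only shows the idea that the module keeps a default triple/DL plus optional per-device ones, and that every query is anchored on a global symbol.

```python
from typing import Dict, Optional


class TargetInfo:
    """Triple and data layout string for one target (host or device)."""

    def __init__(self, triple: str, data_layout: str) -> None:
        self.triple = triple
        self.data_layout = data_layout


class Module:
    """Toy module: a default target plus optional per-device targets.

    The target lookup is "private" (leading underscore); callers are
    expected to ask a GlobalValue instead of the module.
    """

    def __init__(self, default: TargetInfo) -> None:
        self._default = default
        self._devices: Dict[str, TargetInfo] = {}

    def add_device(self, name: str, info: TargetInfo) -> None:
        self._devices[name] = info

    def _target_for(self, device: Optional[str]) -> TargetInfo:
        # Symbols without a device tag fall back to the module default.
        if device is None:
            return self._default
        return self._devices[device]


class GlobalValue:
    """Toy global symbol; `device` is None for host symbols."""

    def __init__(self, name: str, parent: Module,
                 device: Optional[str] = None) -> None:
        self.name = name
        self.parent = parent
        self.device = device

    def get_triple(self) -> str:
        return self.parent._target_for(self.device).triple

    def get_data_layout(self) -> str:
        return self.parent._target_for(self.device).data_layout


# Illustrative strings only; the device layout uses a different default
# alloca address space ("A5"), the case mentioned in the discussion.
host = TargetInfo("x86_64-unknown-linux-gnu", "e-i64:64-n8:16:32:64-S128")
gpu = TargetInfo("amdgcn-amd-amdhsa", "e-i64:64-A5")

mod = Module(host)
mod.add_device("gpu0", gpu)

main_fn = GlobalValue("main", mod)                # host symbol
kernel = GlobalValue("kernel", mod, device="gpu0")  # device symbol
```

An intraprocedural pass working on `kernel` would call `kernel.get_data_layout()` and never see the host layout, while `main_fn` falls back to the module default.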