On Tue, Jul 28, 2020 at 12:07 PM Johannes Doerfert
<johannesdoerfert at gmail.com> wrote:

> [I removed all but the data layout question, that is an important topic]
>
> On 7/28/20 1:03 PM, Mehdi AMINI wrote:
> >> TL;DR
> >> -----
> >>
> >> Let's allow to merge two LLVM-IR modules for different targets (with
> >> compatible data layouts) into a single LLVM-IR module to facilitate
> >> host-device code optimizations.
> >
> > I think the main question I have is with respect to this limitation
> > on the datalayout: isn't it too limiting in practice?
> > I understand that this is much easier to implement in LLVM today, but
> > it may get us into a fairly limited place in terms of what can be
> > supported in the future.
> > Have you looked into what it would take to have heterogeneous modules
> > that have their own DL?
>
> Let me share some thoughts on the data layout situation, not all of
> which are fully matured, but I guess we have to start somewhere:
>
> If we look at the host-device interface, there has to be some agreement
> on parts of the datalayout, namely what the data that the host sends
> over and expects back looks like. If I'm not mistaken, GPUs will match
> the host in things like padding, endianness, etc. because you cannot
> translate things "on the fly". That said, there might be additional
> "address spaces" on either side that the other one is not
> matching/aware of. Long story short, I think host & device need to, and
> in practice do, agree on the data layout of the address space they use
> to communicate.
>
> The above is for me a strong hint that we could use address spaces to
> identify/distinguish differences when we link the modules. However,
> there might be the case that this is not sufficient, e.g., if the
> default alloca address space differs. In that case I don't see a reason
> to not pull the same "trick" as with the triple. We can specify
> additional data layouts, one per device, and if you retrieve the data
> layout, or triple, you need to pass a global symbol as an "anchor". For
> all intraprocedural passes this should be sufficient as they are only
> interested in the DL and triple of the function they look at. For IPOs
> we have to distinguish the ones that know about the host-device calls
> and the ones that don't. We might have to teach all of them about these
> calls, but as long as they are callbacks through a driver routine I
> don't even think we need to.
>
> I'm curious if you or others see an immediate problem with both a
> device-specific DL and triple (optionally) associated with every global
> symbol.

Having a triple/DL per global symbol would likely solve everything; I
didn't get from your original email that this was considered.
If I understand correctly what you're describing, the DL on the Module
would be a "default", and we'd need to make the DL/triple APIs on the
Module "private" to force queries to go through an API on GlobalValue to
get the DL/triple?

>
> ~ Johannes
>
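
To make the question above concrete, here is a minimal standalone sketch of the "module default plus per-global override" lookup being discussed. It is only an illustration under stated assumptions: the classes in namespace `sketch` (`TargetInfo`, `Module`, `GlobalValue`, `getTargetInfo`) are hypothetical stand-ins and not the real LLVM classes or APIs, and nothing here reflects an actual or proposed LLVM implementation.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>

// Hypothetical stand-ins for illustration only; NOT the real LLVM classes.
namespace sketch {

// Bundles the two per-target properties discussed in the thread.
struct TargetInfo {
  std::string Triple;
  std::string DataLayout;
};

class GlobalValue;

class Module {
public:
  explicit Module(TargetInfo Default) : DefaultTarget(std::move(Default)) {}

private:
  // The module-level accessor is private, so every query has to go
  // through a global symbol that acts as the "anchor".
  friend class GlobalValue;
  const TargetInfo &getDefaultTarget() const { return DefaultTarget; }

  TargetInfo DefaultTarget;
};

class GlobalValue {
public:
  GlobalValue(const Module *Parent, std::string Name,
              std::optional<TargetInfo> Override = std::nullopt)
      : Parent(Parent), Name(std::move(Name)), Override(std::move(Override)) {
    assert(Parent && "a global symbol must belong to a module");
  }

  // Returns the symbol's own triple/DL if one is attached, otherwise the
  // module default.
  const TargetInfo &getTargetInfo() const {
    return Override ? *Override : Parent->getDefaultTarget();
  }

  const std::string &getName() const { return Name; }

private:
  const Module *Parent;
  std::string Name;
  std::optional<TargetInfo> Override;
};

} // namespace sketch
```

With this shape, a host function constructed without an override reports the module's default triple and data layout, while a device kernel constructed with its own `TargetInfo` reports the device values. An intraprocedural pass would only ever call `getTargetInfo()` on the function it is looking at.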