Hi,

Heterogeneous modules seem like an important feature when targeting accelerators.

On Mon, Jul 27, 2020 at 11:01 PM Johannes Doerfert via llvm-dev <llvm-dev at lists.llvm.org> wrote:

> TL;DR
> -----
>
> Let's allow merging two LLVM-IR modules for different targets (with
> compatible data layouts) into a single LLVM-IR module to facilitate
> host-device code optimizations.

I think the main question I have is with respect to this limitation on the data layout: isn't it too limiting in practice? I understand that this is much easier to implement in LLVM today, but it may get us into a fairly limited place in terms of what can be supported in the future. Have you looked into what it would take to have heterogeneous modules where each part keeps its own DL?

> Wait, what?
> -----------
>
> Given an offloading programming model of your choice (CUDA, HIP, SYCL,
> OpenMP, OpenACC, ...), the current pipeline will most likely optimize
> the host and the device code in isolation. This is problematic as it
> makes everything from simple constant propagation to kernel
> splitting/fusion painfully hard. The proposal is to merge host and
> device code into a single module during the optimization steps. This
> should not induce any cost (if people don't use the functionality).
>
>
> But how do heterogeneous modules help?
> --------------------------------------
>
> Assuming we have heterogeneous LLVM-IR modules, we can look at
> accelerator code optimization as an interprocedural optimization
> problem. You basically call the "kernel" but you cannot inline it. So
> you know the call site(s) and arguments, can propagate information back
> and forth (=constants, attributes, ...), and modify the call site as
> well as the kernel simultaneously, e.g., to split the kernel or fuse
> consecutive kernels. Without heterogeneous LLVM-IR modules we can do all
> of this, but it requires a lot more machinery. Given abstract call sites
> [0,1] and enabled interprocedural optimizations [2], host-device
> optimizations inside a heterogeneous module are really not (much)
> different from any other interprocedural optimization.
>
> [0] https://llvm.org/docs/LangRef.html#callback-metadata
> [1] https://youtu.be/zfiHaPaoQPc
> [2] https://youtu.be/CzWkc_JcfS0
>
>
> Where are the details?
> ----------------------
>
> This is merely a proposal to get feedback. I talked to people before and
> got mixed results. I think this can be done in an "opt-in" way that is
> non-disruptive and without penalty. I sketched some ideas in [3] but
> *THIS IS NOT A PROPER PATCH*. If there is interest, I will provide more
> thoughts on design choices and potential problems. Since there is not
> much yet, I was hoping this would be a community effort from the very
> beginning :)
>
> [3] https://reviews.llvm.org/D84728
>
>
> But MLIR, ...
> -------------
>
> I imagine MLIR can be used for this and there are probably good reasons
> to do so. We might not want to do it *only* there, for largely the same
> reasons other things are still developed at the LLVM-IR level. Feel
> free to ask though :)

(+1 : MLIR is not intended to be a reason to not improve LLVM!)

--
Mehdi
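P.S. To make the data-layout question concrete, here is a minimal sketch (my own illustration, not the mechanism proposed in D84728) of what naively pulling a device module into a host module with today's llvm::Linker looks like if we insist on identical data layouts. The file names and the DL guard are hypothetical; the point is only to show where the "compatible data layouts" restriction bites.

  // Hypothetical sketch: naively link a host and a device module with the
  // existing llvm::Linker, refusing to proceed when the data layouts differ.
  // Not the design from D84728; paths are made up for illustration.
  #include "llvm/IR/LLVMContext.h"
  #include "llvm/IR/Module.h"
  #include "llvm/IRReader/IRReader.h"
  #include "llvm/Linker/Linker.h"
  #include "llvm/Support/SourceMgr.h"
  #include "llvm/Support/raw_ostream.h"
  #include <memory>

  using namespace llvm;

  int main() {
    LLVMContext Ctx;
    SMDiagnostic Err;

    // Parse the two single-target modules (hypothetical file names).
    std::unique_ptr<Module> Host = parseIRFile("host.ll", Err, Ctx);
    std::unique_ptr<Module> Device = parseIRFile("device.ll", Err, Ctx);
    if (!Host || !Device) {
      Err.print("merge-sketch", errs());
      return 1;
    }

    // The "compatible data layouts" restriction from the proposal: bail out
    // if the DLs disagree. The open question above is whether each part of a
    // heterogeneous module could instead keep its own DL.
    if (Host->getDataLayout() != Device->getDataLayout()) {
      errs() << "data layouts differ, refusing to merge\n";
      return 1;
    }

    // Naive merge: this flattens everything into the host module, losing the
    // device target triple and ignoring name clashes, which is part of why a
    // dedicated "heterogeneous module" design is needed in the first place.
    if (Linker::linkModules(*Host, std::move(Device))) {
      errs() << "linking failed\n";
      return 1;
    }

    Host->print(outs(), /*AAW=*/nullptr);
    return 0;
  }

IIRC the IR linker today only warns on a triple or DL mismatch and keeps the destination module's, so anything beyond this naive flattening needs the opt-in representation Johannes is sketching.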