On Mon, Jan 12, 2015 at 12:48 PM, Philip Reames <listmail at philipreames.com> wrote:
>
> On 01/11/2015 10:17 PM, Daniel Berlin wrote:
>
> On Thu, Jan 8, 2015 at 3:33 PM, Philip Reames <listmail at philipreames.com> wrote:
>
>> On 01/07/2015 05:33 PM, Chandler Carruth wrote:
>>
>>> How does this compare with classical approaches of loop peeling,
>>> partitioning, fission, or whatever you might call it?
>>
>> I'm still developing a good sense for this, but let me lay out some
>> observations. Fair warning, this is going to be long and rambling.
>>
>> Let's start with a toy example:
>>
>>   while (c) {
>>     x = this->x;
>>     y = this->y;
>>     if (x == y) {
>>       rare();
>>     }
>>   }
>>
>> If we could tell x and y were loop invariant, we could unswitch this
>> loop. However, the rare call clobbers our view of memory, so LICM fails,
>> and thus unswitch fails.
>>
>> We'd like to apply PRE here and push the reload into the loop preheader
>> and the rare case. This currently fails for two reasons: 1) We don't allow
>> code duplication during PRE,
>
> ?????
> If we don't, we aren't doing real PRE. So I'm not entirely sure what you
> mean here.
>
> As I think you comment elsewhere in your response, we currently do not
> duplicate loads; we only move them. The current structure is based around
> finding a legal point which doesn't introduce a load along any path where
> one didn't exist previously. If we have a path which has two copies of the
> load, we try to find a place to move one of them such that all paths still
> have the available load and we've removed the extra load along the one
> path.
>
> (Note: The above explanation is rough and does not parallel how the code
> is organized.)
>
>> and 2) the canonical loop-simplify form of having a single latch means
>> that PRE doesn't actually see the rare block at all; it sees the preheader
>> and the join point after the if block.
>>
>> I think both of the previous are fixable:
>
> GCC's PRE already does the above.
> It doesn't do profile guided duplication.
> We aren't doing anything special with these blocks.
>
> Here is the code I used to test:
>
>   struct a {
>     int x;
>     int y;
>   };
>   extern void rare();
>   int mai(int c, struct a *this)
>   {
>     int d = 0;
>     while (c) {
>       int x = this->x;
>       int y = this->y;
>       d += x + y;
>       if (x == y) {
>         rare();
>       }
>     }
>     return d;
>   }
>
> It will do exactly what you expect; it is transformed into:
>
>   struct a {
>     int x;
>     int y;
>   };
>   extern void rare();
>   int mai(int c, struct a *this)
>   {
>     int d = 0;
>     int pretemp1 = this->x;
>     int pretemp2 = this->y;
>
>     while (c) {
>       pretemp1phi = phi(rare block pretemp1, preheader pretemp1);
>       pretemp2phi = phi(rare block pretemp2, preheader pretemp2);
>
>       d += pretemp1phi + pretemp2phi;
>       if (pretemp1phi == pretemp2phi) {
>         rare();
>         pretemp1 = this->x;
>         pretemp2 = this->y;
>       }
>     }
>     return d;
>   }
>
> I don't see why profile guided duplication is necessary here. This is a
> basic load PRE case. It is handled by the first version of GVN-based Load
> PRE I wrote for GCC. It is always a win.
>
> Well, to be pedantic, it is not always profitable. If this loop runs
> exactly one iteration, this is both a code size pessimization and possibly
> a runtime execution pessimization.

Sure, but it is a lifetime optimal PRE case :)

> While in this particular case, you probably don't care, I'd worry that an
> algorithm which gets this might also have other pessimization cases which
> are more worrying.

Then you shouldn't use any basic PRE at all. It will always make these
decisions. Profile guided PRE implementations are more complex.
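To spell out the shape of that transform without the phi pseudo-syntax: the
sketch below is a hand-written C rendering of the PRE'd mai() above, not
compiler output. The phi temporaries simply become ordinary locals, and the
rewrite is only valid under the usual single-threaded, non-volatile
assumptions that PRE already makes.

  int mai(int c, struct a *this)
  {
      int d = 0;
      int x = this->x;        /* loads hoisted into the preheader */
      int y = this->y;
      while (c) {
          d += x + y;
          if (x == y) {
              rare();
              x = this->x;    /* reload only on the rare path, since  */
              y = this->y;    /* rare() may have clobbered the fields */
          }
      }
      return d;
  }

The two reloads inside the if block are exactly what the phi nodes in the
pseudocode merge with the preheader loads at the top of the loop.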
> However, I think I've actually come to the same underlying conclusion.
> This is a case that our PRE algorithm should handle without needing to
> resort to profiling data.
>
> I have to admit that I am unfamiliar with the GCC implementation. Could
> you outline the algorithm and its important concepts? (links are fine too)

It uses an SCC-based value numbering implementation
(http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.56.1723), builds
value expressions, and then performs GVN-PRE on the value expressions
(https://www.cs.purdue.edu/homes/hosking/papers/cc04.pdf).

> Looking at what LLVM does, the failing on the PRE side is that our
> PRE/GVN models are not strong enough to handle this. I'm not sure at all
> why we think anything else is necessary. It's certainly not requiring
> special code duplication heuristics, etc.
>
> So either you are thinking of a case that is different from the above,
> or I am seriously confused :)
>
> Well, I think we've gotten to the same point here. An improved PRE
> implementation should handle most of the cases I've actually encountered.
> Having said that, I'm not yet willing to say that the profile guided loop
> splitting isn't *also* worth exploring. From a practical perspective, a
> lot of our existing optimizations are structured on loops. Letting those
> kick in without falling back to (expensive) GVN might be worthwhile. This
> is as much a pass ordering problem as anything else.
>
> p.s. I've started to post patches which improve the results given by our
> current PRE to the llvm-commits list. So far, it's just simple stuff, but
> that's the direction I'm currently working in. I'm going to defer looking
> into the loop nest splitting until I've pushed PRE as far as I easily can.

I truly think you would be better off coming up with a real PRE
implementation than trying to push the current one, but it's your time :)
The current PRE is an iterative PRE with a lot of limitations.

> Philip
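For concreteness on the loop splitting idea: below is one possible
hand-written shape of profile-guided loop splitting applied to the toy mai()
example, assuming profile data says the x == y path is cold. This is a
sketch only (including the made-up mai_split name), not output from any pass
or from the patches under discussion, and the toy loop never modifies c, so
treat it as purely structural.

  int mai_split(int c, struct a *this)
  {
      int d = 0;
      while (c) {
          int x = this->x;
          int y = this->y;
          /* Hot inner loop: it contains no call, so memory is not
             clobbered and the loads above can legally stay outside it;
             ordinary loop passes can work here without seeing rare(). */
          while (c && x != y) {
              d += x + y;
          }
          if (!c)
              break;
          /* Cold path: the rare condition was hit; take the call, then
             fall back to the outer loop, which reloads the fields that
             rare() may have changed. */
          d += x + y;
          rare();
      }
      return d;
  }

Whether this buys anything over simply strengthening PRE is exactly the pass
ordering question raised above; the split form mainly lets the existing
loop-structured optimizations run on the hot loop without a call in the way.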