On 12:07 am, janssen at parc.com wrote:
>exarkun at twistedmatrix.com wrote:
>>On 08:31 pm, janssen at parc.com wrote:
>> >My Intel Snow Leopard 2 build slave has gone into outer-space again.
>> >
>> >When I look at it, I see buildslave taking up most of a CPU (80%), and
>> >nothing much else going on. The twistd log says:
>> >
>> >[... much omitted ...]
>> >2011-04-04 08:35:47-0700 [-] sending app-level keepalive
>> >2011-04-04 08:45:47-0700 [-] sending app-level keepalive
>> >2011-04-04 08:55:47-0700 [-] sending app-level keepalive
>> >2011-04-04 09:03:15-0700 [Broker,client] lost remote
>> >2011-04-04 09:03:15-0700 [Broker,client] lost remote
>> >2011-04-04 09:03:15-0700 [Broker,client] lost remote
>> >2011-04-04 09:03:15-0700 [Broker,client] lost remote
>> >2011-04-04 09:03:15-0700 [Broker,client] lost remote
>> >2011-04-04 09:03:15-0700 [Broker,client] Lost connection to dinsdale.python.org:9020
>> >2011-04-04 09:03:15-0700 [Broker,client] <twisted.internet.tcp.Connector instance at 0x101629ab8> will retry in 3 seconds
>> >2011-04-04 09:03:15-0700 [Broker,client] Stopping factory <buildslave.bot.BotFactory instance at 0x1016299e0>
>> >2011-04-04 09:03:18-0700 [-] Starting factory <buildslave.bot.BotFactory instance at 0x1016299e0>
>> >2011-04-04 09:03:18-0700 [-] Connecting to dinsdale.python.org:9020
>> >2011-04-04 09:03:18-0700 [Uninitialized] Connection to dinsdale.python.org:9020 failed: Connection Refused
>> >2011-04-04 09:03:18-0700 [Uninitialized] <twisted.internet.tcp.Connector instance at 0x101629ab8> will retry in 8 seconds
>> >2011-04-04 09:03:18-0700 [Uninitialized] Stopping factory <buildslave.bot.BotFactory instance at 0x1016299e0>
>> >2011-04-04 09:03:27-0700 [-] Starting factory <buildslave.bot.BotFactory instance at 0x1016299e0>
>> >2011-04-04 09:03:27-0700 [-] Connecting to dinsdale.python.org:9020
>> >
>> >So it's been spinning its wheels for 3 days.
>>
>>Does this mean that the "2011-04-04 09:03:27-0700 [-] Connecting to
>>dinsdale.python.org:9020" message in the logs is the last one you see
>>until you restart the slave?
>
>Yes, that's the last line in the file.

>>Or does it mean that the logs go on and on for three days with these
>>"Connecting to dinsdale...." / "Connection Refused" / "... will retry
>>in N seconds" cycles, thousands and thousands of times?
>
>Well, it's doing something, chewing up cycles, but there's only one
>"Connecting" line at the end of the log file.

That's very interesting. It may be worth doing some gdb or dtrace
investigation next time it gets into this state.

>>What does the buildmaster's info page for this slave say when the
>>slave is in this state? In particular, what does it say about
>>"connects/hour"?
>
>Ah, good question. Too bad I restarted the slave after I sent out my
>info. Is there some way to recover that from earlier? If not, it will
>undoubtedly fail again in a few days.

If the master logs are available, that would provide some information.
Otherwise, I think waiting for it to happen again is the thing to do.
Since there were no other messages in the log file, I expect the
connects/hour value will be low - perhaps 0.

Jean-Paul
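
If the slave's twistd log is still around, a rough slave-side count of connection attempts per hour can be pulled from it. This is only a minimal sketch, assuming the log line format quoted above; the function name `connects_per_hour` and the `SAMPLE` text are made up for illustration, and the result is the number of attempts the slave logged, not the connects/hour figure the buildmaster reports.

```python
import re
from collections import Counter

# Matches twistd log lines shaped like the ones quoted in this thread, e.g.
# "2011-04-04 09:03:18-0700 [-] Connecting to dinsdale.python.org:9020".
# The captured group is the "YYYY-MM-DD HH" prefix, used as the hour bucket.
CONNECT_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}):\d{2}:\d{2}[+-]\d{4} \[[^\]]*\] Connecting to "
)


def connects_per_hour(lines):
    """Count 'Connecting to' log lines per hour (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = CONNECT_RE.match(line.strip())
        if match:
            counts[match.group(1)] += 1
    return counts


# Sample lines copied from the log excerpt quoted in this message.
SAMPLE = """\
2011-04-04 08:55:47-0700 [-] sending app-level keepalive
2011-04-04 09:03:15-0700 [Broker,client] Lost connection to dinsdale.python.org:9020
2011-04-04 09:03:18-0700 [-] Connecting to dinsdale.python.org:9020
2011-04-04 09:03:18-0700 [Uninitialized] Connection to dinsdale.python.org:9020 failed: Connection Refused
2011-04-04 09:03:27-0700 [-] Connecting to dinsdale.python.org:9020
"""

if __name__ == "__main__":
    for hour, count in sorted(connects_per_hour(SAMPLE.splitlines()).items()):
        print(hour, count)
```

For a real log, the same function can be fed an open file object instead of `SAMPLE.splitlines()`. An hour bucket that is missing from the output simply had no logged attempts, which would match the expectation of a low value here.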