We have now received responses from eight different vendors (with more inquiries outstanding) on the Intel Atom C2000 bug. Intel noted on its latest earnings call that it needed to set aside a reserve for this bug. At STH we were the first to review Avoton and Rangeley products, so we started digging. We also want to note that we have 22 Rangeley server and networking products in production. The newest is 20 months old, the oldest is over 3 years old, and we have had zero failures thus far. We expect to see more vendors announce solutions as early as this week, but wanted to provide a bit of insight into what is happening. We also believe (unconfirmed) that this may be a reason we recently saw yet another Denverton delay.
The Bug – A few non-NDA Sources

You can see a new erratum in the latest Intel Atom C2000 family spec update. Cisco has also published a page on the bug and its equipment replacement program. Here is AVR54, which appeared in the latest spec update:
AVR54. System May Experience Inability to Boot or May Cease Operation
Problem: The SoC LPC_CLKOUT0 and/or LPC_CLKOUT1 signals (Low Pin Count bus clock outputs) may stop functioning.
Implication: If the LPC clock(s) stop functioning the system will no longer be able to boot.
Workaround: A platform level change has been identified and may be implemented as a workaround for this erratum.
(Source: Intel Atom C2000 family spec update dated January 2017)
From Cisco’s FAQ:
Q: Do you expect these products to fail at 18 months in operation?
Although the issue may occur beginning at 18 months in operation, based on information provided by the supplier, we don’t expect an unusual spike in failures until year three of runtime.
(Source: Cisco Clock Signal Component Issue FAQ)
Responses from Vendors

Cisco was perhaps the most aggressive in getting platforms RMA’d and fixed and has been very public about it. Other vendors have been slower to announce plans.
If you are from a company or know of a company that has publicly disclosed their plans, please let me know (e-mail patrick at this domain) and we will update this post.
Industry-wide NDA?

As you can see in Cisco’s FAQ, Cisco is declining to provide specifics:
Q: Who supplies the impacted component?
As a matter of policy, Cisco stands behind the reputation of our products. We do not intend to publicly name the supplier.
(Source: Cisco Clock Signal Component Issue FAQ)
Thus far (February 7, 2017) we have received responses from eight vendors who supply Rangeley products, excluding Cisco. Every single vendor has declined to discuss specifics, citing NDA. Some called me directly to say they were not responding due to NDA concerns.
No vendor has confirmed this. However, putting the pieces together, we can see that all of the vendors are giving us the “cannot talk about this due to NDA restrictions” response. We can also see that Intel set aside a large reserve. Our educated guess is that Intel may have tied access to those reserve funds to signing an NDA that prohibits discussing the issue.
How bad is it?

Since we now have almost two dozen Intel Atom C2000 series machines, from 6 different vendors, that have been deployed for 20-40 months, we do not think this will be a case where every machine fails immediately at 18 months or 36 months. While this is a small sample size, we can at least rule out “every” device failing. We did confirm this with a few of STH’s web hosting industry readers who we know have Avoton/Rangeley deployed in much greater quantities.
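To put a rough number on what a zero-failure sample of this size can and cannot tell us, here is a minimal sketch. It is our own illustration, not anything provided by Intel or a vendor. It assumes each unit fails independently with the same probability over the observed period, which is a simplification since our units range from 20 to 40 months old and the failure risk reportedly grows with runtime. The function names are hypothetical.

```python
import math


def zero_failure_upper_bound(n_units: int, confidence: float = 0.95) -> float:
    """One-sided upper bound on the per-unit failure probability
    after observing zero failures in n_units.

    Assumption (ours): units fail independently with identical probability p.
    Zero failures then has probability (1 - p) ** n_units, and the bound is
    the largest p for which that outcome is still at least (1 - confidence) likely.
    """
    if n_units <= 0:
        raise ValueError("n_units must be positive")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be between 0 and 1")
    alpha = 1.0 - confidence
    return 1.0 - alpha ** (1.0 / n_units)


def rule_of_three(n_units: int) -> float:
    """Common quick approximation of the 95% bound: 3 / n."""
    if n_units <= 0:
        raise ValueError("n_units must be positive")
    return 3.0 / n_units


if __name__ == "__main__":
    n = 22  # units in production with zero failures so far
    print(f"exact 95% upper bound: {zero_failure_upper_bound(n):.1%}")
    print(f"rule-of-three estimate: {rule_of_three(n):.1%}")
```

Under those assumptions, 22 units with zero failures is enough to rule out “every device fails” but is still compatible with a failure rate in the low double-digit percent range, which is why the larger deployments matter.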
UPDATE 2017-02-09: Online.net just posted that they are aware of a vendor component issue but are not seeing high failure volumes, even across almost 68,000 nodes. Assuming this is related to the C2000 bug, it would support what we have seen in our small deployments and heard from other web hosts that have thousands of units installed.
At the same time, we do urge manufacturers to have longer-term replacement plans available in the event failures do occur. We also hope to see vendors clearly spell out what their replacement policies are on this issue in a centralized place, similar to how Cisco has done.
Please do let us know if you hear of any programs or responses to this Intel Atom C2000 series bug and we will update this article accordingly.
Discuss this article on the STH forums.