Any solutions for avoiding the damage if you happen to get a new one?
What, if anything, can customers do to slow or stop degradation ahead of the microcode update?
Intel recommends that users adhere to Intel Default Settings on their desktop processors, along with ensuring their BIOS is up to date. Once the microcode patch is released to Intel partners, we advise users check for the relevant BIOS updates.
I destroyed my second CPU, a 14900KF, despite already knowing about that recommendation. Before I ever inserted the replacement CPU, I had disabled all of the settings of that sort that the motherboard vendor enabled by default, and I only ever ran the CPU with them disabled. It still destroyed itself, just like the first. I am very confident that you can still destroy a CPU even after doing that.
That isn't to say that using conservative settings is a bad idea (and maybe going further, like running memory at its minimum frequency rather than just swapping the motherboard vendor's defaults for Intel's recommended defaults, might reliably avoid CPU damage). But I am confident that running standard Intel-recommended settings alone is not enough to avoid damage.
Completely agree; that was just a quote ripped straight from the article. From everything I've heard, people are having problems even at stock settings. Your best bet to absolutely avoid any damage is probably to literally shut your system down until the patches are available.
There’s no 100% way until the new microcode is released next month. All affected CPUs are at risk of silicon degradation by the excessive voltage.
There are some power limits and July BIOS updates you can use that Intel says can help reduce the damage, or prevent it entirely in some scenarios. I believe the damage is specifically caused by voltage spikes under single-threaded load, so reducing LLC (load-line calibration) and running something like Prime95 in the background might hold the voltage low enough that it won't happen. But there is no fix yet, so if your CPU is susceptible, running it will degrade it, at least until the fix is out.
If you can avoid using a new one, I would. I would not buy or use an unused 13th gen or 14th gen Intel CPU until Intel completes their updates.
In my case, there was a period of time where I had an old, damaged 13th gen CPU, and a new, unused 14th gen.
I was always able to use my damaged CPUs without problems as long as I booted Linux and told it to use only one core (maxcpus=1 on the kernel command line, passed via GRUB). With even two cores enabled, it couldn't boot towards the end, but I never saw corruption with one.
If I could rewind time, I would keep using my old CPU and avoid using the new one. I would add maxcpus=1 to my Linux kernel command line (to make it apply on every boot, edit /etc/default/grub, then run sudo update-grub on Debian-family systems). And I'd use the damaged CPU on a single core until I knew that Intel had a microcode workaround and my motherboard had the relevant BIOS update applied, and only then would I swap in the replacement CPU.
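For anyone wanting to try the single-core approach described above, here is a minimal sketch of the persistent change, assuming a Debian-family system whose /etc/default/grub already has a GRUB_CMDLINE_LINUX_DEFAULT line. The existing "quiet" value is only an example; keep whatever options your file already has and just append maxcpus=1.

```shell
# /etc/default/grub (excerpt, assumed Debian-family layout).
# The existing options vary by distribution; "quiet" is only an example.
# Append maxcpus=1 inside the quotes so the kernel brings up a single CPU at boot.
GRUB_CMDLINE_LINUX_DEFAULT="quiet maxcpus=1"
```

After saving, run sudo update-grub and reboot. Afterwards, cat /proc/cmdline should show maxcpus=1, and nproc should report a single available CPU. To undo it, remove the parameter and run sudo update-grub again.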
If I didn't have a known-damaged CPU, just a still-working 13th or 14th gen processor, and could get by with an old desktop or laptop or something until the update is out, I'd probably do that if at all possible, so that I don't incur damage.
I have one in the box from Christmas. Kinda scared to use it.
Best to throw that away. Good job keeping it from affecting the performance of your PC.
At this rate I'll make a keychain out of it. It sucks 'cause it's above my normal price range and was a gift.
If I had a known unused one, I would absolutely not use it until Intel finishes putting out their patch to motherboards to address this. You have no idea whether you could cause damage that won’t be detected, leaving you with a slightly damaged processor that malfunctions occasionally.
Intel may publish guidance on how to use unpatched processors. If they don’t – they sure have not been forthcoming with information thus far – here’s my own suggestion.
If I were going to use it, I would, before booting any OS on the CPU, go into the BIOS and set everything related to the CPU to minimal performance: memory speed down, Intel Turbo Boost disabled, everything. If you can disable cores there, disable all but one. Even my severely damaged pair of CPUs could still boot without corrupting my root filesystem as long as I ran on a single core (two cores induced problems), and I'd take that as an argument in favor of one core being preferable, though I cannot say for sure whether doing so helps avoid damaging the chip, rather than just avoiding the effects of damage already incurred.
And the first thing I’d do, booted into that minimal-performance-CPU-environment, would be to do that motherboard BIOS update. Then go back and reset the motherboard to defaults and use the thing normally.
Maybe that's over-cautious, but we know that the processors destroy themselves with use, and we have no idea what the minimum time to incur damage is, if there is one. Unless Intel comes out with some kind of diagnostic that reliably detects damaged CPUs, you won't know whether you damaged your CPU in that window before the BIOS update and it is now occasionally corrupting data, which I'd guess is a situation you don't want to be in for the lifetime of the CPU.
Some motherboards can update the BIOS without a CPU installed. Look for a BIOS flash button on the motherboard.
I recently built a 12th Gen PC, expecting 13th Gen to be a cheap and significant upgrade path soon. Now there isn't going to be any way to know whether a second-hand CPU has been damaged this way.
It can get a whole lot worse.
I bought a $500 13th gen CPU that destroyed itself, replaced it (and didn’t keep the dead CPU) with a $500 14th gen CPU that destroyed itself, and spent another ~$500 on related hardware and dumping Intel stuff to go AMD to get a working system. I also spent a lot of time trying to resolve the problem. I’d bet that I’m not the person burned worst, because someone could very easily have replaced their motherboard or memory or power supply unit in the hopes of fixing the issue, as any of these could have looked like potential causes, and there’d be no way for anyone to prove to Intel that this was the cause even if Intel intended to reimburse for these.
I might get $500 back at most, if Intel reimburses me for the 14th gen CPU; based on what they've been doing so far, I'd assume that at best they'd send out another Intel CPU (which I no longer have a use for, having gone AMD).
And I was mostly using this system for fun. While I was corrupting my root filesystem regularly at boot at the end, I ultimately didn’t – as far as I know – suffer any serious data loss or expense from the data that the processor was corrupting. My system was mostly to be used for my own entertainment. I didn’t miss deadlines or lose critical information.
As Steve Burke has pointed out in earlier episodes on this, there are people who have been impacted by those secondary costs, some of which might make my own costs look irrelevant.
He was talking to video game companies who were using affected processors as well as having customers who were affected; they had apparently banned some customers for cheating because they knew that the internal state of the game was incorrect; they couldn’t figure out what the customers were doing, but knew that their game state was being modified. It apparently wasn’t the customers cheating, but their CPU, which had partially destroyed itself, and was now corrupting memory.
Another had been using CPUs for video game servers and those kept dying and taking down service; another company estimated that they’d lost $100k in player business due to the problem.
Apparently these were also popular, due to high single-threaded performance, with hedge funds that do stock trading. I imagine that a system that suddenly stops working or corrupts data can very quickly become extremely expensive in that context, far in excess of what the CPUs cost.
OEMs who built and sold systems containing these CPUs had apparently been taking back systems and repeatedly replacing parts; they probably incurred substantial costs and damage to their own reputations, as customers are upset with them.
Same thing with datacenter providers, who incurred a lot of costs investigating and mitigating problems and swapping parts and CPUs. Burke quoted one of them as advising customers to use an alternate AMD-based system; if a customer insisted on the Intel one, the provider would charge an additional $1000 service fee to cover the costs it was taking on in dealing with systems based on these CPUs. That gives an idea of what they were losing.
God only knows what the impact of having a ton of data around the world corrupted is. Probably no more than a tiny fraction of the problems related to corruption will ever actually be attributed to the CPUs themselves.
And I don’t know how many systems out there may not be fully-tracked – so they don’t get updates to avoid the problem – and have the CPUs built into them. Industrial automation hardware? Ship navigation systems? Who knows? All kinds of things that might fail in absolutely spectacular ways if they work for a period of time, then down the road, eventually start corrupting data more and more severely.
I mean, Intel might, at best, provide a cash refund for a dead CPU. But they aren’t gonna cover losses from secondary problems, and there’s no realistic way that most businesses and people who bought these could prove them, anyway.
Buying the last CPU generation they made before this clusterfuck is maybe one of the best things you could have done; you're still indirectly affected, but you got a reasonably fast system that isn't directly affected. If I'd known about this in advance, rather than Intel staying silent, I'd happily have purchased a 12th gen CPU instead of another $1k in useless hardware, and I wouldn't have spent a ton of time trying to resolve my problems. At upgrade time you'll have the option to go AMD, or 15th gen Intel on LGA 1851 if you want to hope that Intel's 15th gen is more solid than their previous two. That just means a new motherboard and, if you're using DDR4 memory, tossing that and buying DDR5.
I would have gone AMD in the first place if this happened at the time of my purchase.
Oh well. Upgrade time is going to be a long way away. My last gaming PC served me well for almost 10 years before I did an in-socket upgrade.
I would have gone AMD in the first place if this happened at the time of my purchase.
Well, you've got better judgement than me. I'd been running just Intel for ~25 years and was comfortable with them, and even when ordering the replacement, I still wasn't absolutely certain that the CPU was at fault until the replacement (temporarily, for a few months) resolved all the problems.
Moving forward, I expect I’ll use AMD unless they manage to do something like this.
My last gaming PC served me well for almost 10 years before I did an in-socket upgrade.
Yeah, not a lot of annual single-threaded performance improvements since the early 2000s. Can very easily use older CPUs just fine for a long time these days, depending upon workload.
Anyone who knows of this and buys 15th gen over AMD is a fool imo. The risk is just so high and AMD has become so solid in the last decade.
And people laughed at me for sticking with my MOS 6502. Who’s laughing now?
“Pure, passive cooling. No fans or moving parts. Will be working a century from now.”
I thought I read that Intel said this was caused by people messing with voltages? I have had plenty of these processors in the last couple of years and never experienced crashes, but I don't overclock.
That was one initial theory, but it's known not to be the cause. An earlier video by Steve Burke and Wendell from Level1Techs had Wendell examine several hundred CPUs that were running in servers on non-Z790 motherboards (another potential cause that was initially blamed), at conservative settings and with temperatures known and logged for the lifetime of each server (so temperature was ruled out). He still saw about a 50% failure rate.
I also personally destroyed one of my CPUs with motherboard default settings, and the other with Intel’s recommended settings (less aggressive than the motherboard defaults), so I can personally attest to this not just being people running with crazy voltages or something.
There may also be other issues that people have caused by doing something else, but the elephant in the room has been narrowed down to processors destroying themselves while running well within spec.