No. If everyone were on Linux and a breaking change were introduced by a third party, there would be similar problems.
The problem is that critical infrastructure isn’t treated like critical infrastructure. If something you rely on can go down due to a single point of failure, maybe don’t fucking use it?! Have backups, have systems that can replace those systems, have contingency plans! Slapping Windows onto a small machine and running some shitty Chromium app as a cash register is a fucking stupid idea when that machine is responsible for your whole income.
The problem was never Windows. It was companies that were too cheap to have contingency plans, because an event like this was considered extraordinary and not worth investing in.
Nope, that’s not how it works on Linux. Even if someone introduced the most heinous breaking change, people would simply not update until things were fixed; in fact, an update is unlikely to break things in the first place, because they are tested before being pushed. If someone were running the latest of everything, say a Gentoo system building everything from git, maybe that person would be affected, and they would have to roll back to an earlier version and keep going, for a total downtime of 1h tops. And that is the most stupid way possible to run production.
The main reason why this will NEVER happen to a server running Linux is that updates are not automatic, i.e. they get triggered manually. So if there’s an issue upstream, you don’t update, and if you encounter one anyway, you roll back. The issue is not that Windows had a broken update; that can happen, and it’s fine. The issue is when the OS forcefully installs that update and breaks your system without you doing anything.
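To make that distinction concrete, here is a minimal sketch, assuming a hypothetical update agent (not any real package manager’s API), of pull-based updates with rollback versus vendor-pushed ones:

```python
# Hypothetical sketch of a pull-based update agent: nothing changes on the
# machine until an operator explicitly triggers it, and the previous version
# is kept around so a bad update can be rolled back in one step.

class UpdateAgent:
    def __init__(self, installed_version: str):
        self.installed = installed_version
        self.previous = None

    def check_available(self, upstream_version: str) -> bool:
        # Pull model: we only *learn* that an update exists; nothing is applied here.
        return upstream_version != self.installed

    def apply(self, new_version: str) -> None:
        # Runs only when an operator calls it, after their own testing.
        self.previous, self.installed = self.installed, new_version

    def rollback(self) -> None:
        # If the new version breaks, recovery is one step back.
        if self.previous is not None:
            self.installed, self.previous = self.previous, self.installed


agent = UpdateAgent("1.4.2")
if agent.check_available("1.5.0"):
    pass  # the operator decides: hold off until upstream is confirmed fixed

# The push model complained about above is the vendor effectively calling
# agent.apply(...) remotely, with no operator in the loop.
```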
And yeah, I know what I’m talking about: I worked as a software architect for a large website for a few years, and now I work as a software engineer on the servers of one of the largest online games.
Edit: re-reading your post, I would like to ask: how would you build this critical infrastructure with Windows? Because regardless of how you answer, you would have been affected by this.
> the issue is when the OS forcefully installs that update and breaks your system without you doing anything.
The CrowdStrike update was pushed out by their own software, I thought, not by the Windows update system?
Plus, CrowdStrike has caused similar issues on Linux systems before, so the solution is to just not use CrowdStrike and similar solutions on any OS.
> The issue is not that Windows had a broken update; that can happen, and it’s fine. The issue is when the OS forcefully installs that update and breaks your system without you doing anything.
I would have thought most businesses running Windows would do staged rollouts.
> the solution is to just not use CrowdStrike and similar solutions on any OS.
Exactly, and since Windows is similar, therefore…
The problem wasn’t with an update Microsoft pushed out. It was due to an update by CrowdStrike which, IIRC, ignored all settings for staged rollouts (or had no such settings at all).
It’s not like anyone outside CrowdStrike chooses to have these updates installed. It happened automatically, with no way of stopping it.
Yes, this specific problem wasn’t caused by Microsoft, but it was caused by CrowdStrike’s forced automatic update policy, which is the same behavior Windows has. So while it wasn’t Microsoft this time, next time it could be. And while you can prevent this from happening on your Linux box by choosing software that doesn’t do this, it’s impossible to prevent on a Windows box, because the OS itself does it.
That is a wild assumption with two key flaws:

- Windows in many workplaces has updates locked down too, except in circumstances where critical security or vulnerability patches are pushed through.
- The same is true for many servers that run Linux.
As someone who works on tier-1 services for arguably the biggest tech company right now: that’s how it works in most of FAANG. Updates are gated, sure, but as with many things there’s a vetting process, and some items that look super important and safe just slip through.
Regarding your edit: every case is different, I guess, but if your entire business requires you to be able to use a machine 100% of the time, then you should have the means to use a different machine to continue transactions (ideally one with a known state that won’t change, or that has been tested in the last few months). If you need to log transactions and process them 24-48 hours later, do that on something that’s locked down hard, with printed/hard-copy backups if necessary; a sketch of that fallback is below.
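As an illustration, here is a bare-bones sketch of that offline-logging fallback, assuming a hypothetical point-of-sale setup (the file name and fields are made up):

```python
# Hypothetical sketch: while the main system is down, append each sale to a
# local, append-only log; replay it against the real backend once service is
# restored (within the 24-48 hour window mentioned above).
import json
import time
from pathlib import Path

LOG = Path("offline_transactions.jsonl")

def record_offline(amount_cents: int, items: list[str]) -> None:
    # One JSON object per line; append-only, so a crash can't corrupt old entries.
    entry = {"ts": time.time(), "amount_cents": amount_cents, "items": items}
    with LOG.open("a") as f:
        f.write(json.dumps(entry) + "\n")

def replay(process) -> None:
    # `process` stands in for whatever posts to the restored backend.
    for line in LOG.read_text().splitlines():
        process(json.loads(line))

record_offline(1250, ["coffee", "sandwich"])
replay(print)  # stand-in for the real backend call
```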
Ultimately, risk is always something you factor in. If you don’t care about 48 hours of downtime over several years, it’s not a huge concern. I’d argue, though, that many companies lost more money during those days than they would have spent, in both money and people-hours, on a contingency system and training staff to use it during downtime.
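For a rough sense of that trade-off, a back-of-envelope comparison with entirely made-up numbers:

```python
# Back-of-envelope comparison with entirely made-up numbers: one multi-day
# outage versus the one-off cost of a contingency system plus staff training.
outage_hours = 48
revenue_per_hour = 2_000                    # hypothetical business
outage_cost = outage_hours * revenue_per_hour        # 96,000

spare_system = 5_000                        # locked-down fallback machine
training = 4 * 10 * 30                      # 4 hours x 10 staff x 30/hour
contingency_cost = spare_system + training           # 6,200

print(outage_cost, contingency_cost, outage_cost > contingency_cost)
```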
- Who determines which security updates are critical? In Windows’ case it’s ultimately Microsoft: if they say an update is critical, it will get installed on your machines whether you like it or not.
- On Linux, the update process has to be triggered manually, which is a big difference. No one external to your company can say “that computer will get this new software NOW”, and that’s the point you’re missing.
As for the other part of your edit: if all of those machines run Windows, they were all affected by the update, so having secondary or tertiary machines is pointless; all of them failed at the same time when an external source decided to install new software on all your computers.
Windows updates don’t happen automatically in an enterprise environment. They are tested and pushed out once a version is determined to be stable.
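The kind of staged (ring-based) rollout these comments describe might look like the sketch below; the ring names, sizes, and health check are hypothetical stand-ins:

```python
# Hypothetical sketch of a ring-based (staged) rollout: each ring receives the
# update only after the previous ring has run it without tripping a health
# check, so a broken build never reaches the whole fleet at once.
RINGS = [
    ("canary", 10),      # a handful of internal machines first
    ("early", 1_000),    # a small slice of the fleet
    ("broad", 100_000),  # everyone else
]

def healthy(ring_name: str) -> bool:
    # Stand-in for real telemetry: crash rates, boot loops, error budgets.
    return True

def rollout(version: str) -> None:
    for name, size in RINGS:
        print(f"deploying {version} to {name} ({size} hosts)")
        if not healthy(name):
            print(f"halting {version}: {name} ring unhealthy")
            return  # the bad build never reaches the broad ring

rollout("update-2024-07")
```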