To what extent, if at all, would CrowdStrike's faulty update have been easier to deal with on an immutable distro?

To deal with a bad update, I’d boot a Btrfs snapshot from before the update. ‘grub-btrfs’ is great. I confess it works great on my laptop, but I’ve not yet got it on any of my servers. When I finally rebuild my home server, I will, though. Work servers, I hope, won’t always be my problem!


shortwavesurfer@lemmy.zip

49 points

4 months ago

Turn off the computer, boot from the previous day’s image, wipe the current day’s image, continue using the computer.


intelisense@lemm.ee

18 points

4 months ago

That’s all well and good, but many of these Windows machines were headless or used by extremely non-technical people - think tills at your supermarket or airport check-in desks. Worse, some of these installations were running in the cloud, so console access would have been tricky.


shortwavesurfer@lemmy.zip

4 points

4 months ago

The cloud systems would have been a problem. On local systems, though, a non-technical user could easily have done it, because their IT department could simply tell them: turn on your computer, and when it gets to the screen with these words, press the down arrow key once and press Enter, and your computer will boot normally.


Irremarkable@fedia.io

20 points

4 months ago

You wildly overestimate the average person’s willingness to do that.


halcyoncmdr@lemmy.world

10 points

4 months ago

You clearly haven’t worked a help desk if you think even those simple instructions are something every end user is capable of or willing to do without issue.


2 points

4 months ago

It should be relatively straightforward to script the recovery of cloud VM images (even without snapshots). Good luck getting the unwashed masses to follow a script to manually enter recovery mode and delete files in a critical area of the OS.
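As a rough illustration of the scripted recovery: assuming the broken VM's system disk has already been attached to a healthy rescue machine and mounted somewhere, the cleanup step comes down to deleting the faulty channel file. The directory and the `C-00000291*.sys` pattern follow the widely circulated manual workaround; the function name and the mount point are hypothetical, and detaching and reattaching the disk is provider-specific and not shown.

```python
from pathlib import Path

# Location and name pattern taken from the widely circulated manual workaround.
CHANNEL_DIR = Path("Windows/System32/drivers/CrowdStrike")
FAULTY_PATTERN = "C-00000291*.sys"


def remove_faulty_channel_files(mount_root, dry_run=True):
    """Find (and optionally delete) faulty channel files on a mounted volume.

    mount_root is wherever the broken system disk is mounted on the rescue
    machine (a hypothetical path). Returns the list of matching files.
    """
    target_dir = Path(mount_root) / CHANNEL_DIR
    if not target_dir.is_dir():
        return []
    matches = sorted(target_dir.glob(FAULTY_PATTERN))
    if not dry_run:
        for path in matches:
            path.unlink()
    return matches
```

Defaulting to `dry_run=True` means a first pass only reports what would be removed, which seems like the safer default when looping over a whole fleet of disks.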


Lost_My_Mind@lemmy.world

15 points

4 months ago

Funny you should mention people at the airport. I work at the airport, but not for Frontier. My sister was flying on Thursday, and nobody could get a boarding pass printed. When I came down, thinking my sister was throwing a tantrum over nothing, I saw a line longer than a football field. When I tried to ask a Frontier employee what happened, he just threw his hands in the air and said “I DON’T FUCKING KNOW, OK??? NOBODY KNOWS WHAT THE FUCK IS GOING ON!!! YOU SEE THIS??? YOU SEE THIS SHIT??? YOU THINK I’M JUST DENYING PEOPLE FOR FUN??? WHY DON’T I GO GRAB MY TRIDENT, AND I CAN STAB ALL OF YOU OVER AN OPEN FLAME!!! BECAUSE I’M THE DEVIL, RIGHT??? RIGHT??? THAT’S WHAT YOU’RE SAYING!!!”

And all I said was “Hey, my sister is flying today and…”

You think THAT guy is going to sit there and reformat a PC, or restore a PC snapshot from before the update? He’s the kind of guy who SHOULD BE smoking weed at work. This platform is very tech-savvy, but people here often forget that only a very, very small percentage of people share their PC knowledge. Now what would happen if I threw a tech-savvy person into an auto garage and told them to replace the gaskets of an engine? Would they know how? Would they enjoy a room full of mechanics laughing at them?

I’m not saying you specifically. I’m agreeing with you. I’m just adding to your point for an audience that I think sometimes misses the forest for the trees.


Yaztromo@lemmy.world

10 points

4 months ago

…until the CrowdStrike agent updated, and you wind up dead in the water again.

The whole point of CrowdStrike is to be able to detect and prevent security vulnerabilities, including zero-days. As such, they can release updates multiple times per day. Rebooting in a known-safe state is great, but unless you follow that up by preventing the agent from redownloading the sensor configuration update, you’re just going to wind up in a BSOD loop.

A better architectural solution would have been to have Windows drivers run in Ring 1, giving the kernel the ability to isolate those that are misbehaving. But that risks a small decrease in performance, and Microsoft didn’t want that, so we’re stuck with a Ring 0/Ring 3 only architecture in Windows that can cause issues like this.


nous@programming.dev

3 points

4 months ago

That assumes the file is not stored on a writable section of the filesystem and treated as application data, in which case it wouldn’t be affected by a rollback. Which it likely would be.


Artyom@lemm.ee

3 points

4 months ago

Wouldn’t help (on its own), you’d still get auto-updated to the broken version.


shortwavesurfer@lemmy.zip

4 points

4 months ago

If I’m correct, a fix was found and deployed within several hours, so the next auto-update likely would not have had the same issue.


fmstrat@lemmy.nowsci.com

1 point

4 months ago

Would still need to be on site.


shortwavesurfer@lemmy.zip

0 points

4 months ago

True


Lodra@programming.dev

2 points

4 months ago

I’m familiar enough with Linux but have never used an immutable distro. I recognize the technical difference between what you describe and “go delete a specific file in safe mode”. But how about the more generic statement? Is this much different from “boot in a special way and go fix the problem”? Is it any easier or more difficult than what people had to do on Windows?


shortwavesurfer@lemmy.zip

4 points

4 months ago

Primarily it’s different because you would not have had to boot into any safe mode. You would have just booted from the last good image from like a day ago and deleted the current image and kept using the computer.


Lodra@programming.dev

1 point

4 months ago

What’s the user experience like there? Are you prompted to do it if the system fails to boot “happily”?


17 points

4 months ago

Immutable, not really a difference. Bad updates can still break the OS.

With A/B root, however, it would be much easier to fix, but it would still be a manual process.


brian@programming.dev

5 points

4 months ago

idk if it would be manual; isn’t the point of A/B root to roll back if it doesn’t boot properly afterwards?


barsoap@lemm.ee

2 points

4 months ago

Honestly, if you’re managing kernel and userspace remotely, it’s your own fault if you don’t netboot. Or maybe Microsoft’s; I don’t know what the netboot situation looks like in Windows land.


sugar_in_your_tea@sh.itjust.works

10 points

4 months ago

Aren’t most immutable Linux distros AB, almost by definition? If it’s immutable, you can’t update the system because it’s immutable. If you make it mutable for updates, it’s no longer immutable.

The process should be:

1. Boot from A
2. Install new version to B
3. Reboot into B
4. If unstable, go to 1
5. If stable, repeat from 1, but with A and B swapped

That’s how immutable systems work. The main alternative is a PXE system, and in that case you fix the image in one place and power cycle all your machines.

If you’re mounting your immutable system as mutable for updates, congratulations, you have the worst of immutable and mutable systems and you deserve everything bad that happens because of it.
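The A/B process described above can be sketched as a toy model. This is only an illustration under stated assumptions: the `ABSystem` class and the `boots_ok` health check are hypothetical names, and a real system would do the slot switch in the bootloader rather than in Python.

```python
from dataclasses import dataclass


@dataclass
class ABSystem:
    """Toy model of an A/B root layout: two slots, one active at a time."""
    versions: dict          # slot name -> installed version
    active: str = "A"

    @property
    def inactive(self):
        return "B" if self.active == "A" else "A"

    def update(self, new_version, boots_ok):
        """Install new_version to the inactive slot and try to boot it.

        boots_ok is a hypothetical health check: a callable that takes a
        version and returns True if it boots cleanly. The running slot is
        never modified. Returns True if the system switched slots.
        """
        target = self.inactive
        self.versions[target] = new_version   # write only to the slot not in use
        if boots_ok(new_version):
            self.active = target              # A and B swap roles
            return True
        return False                          # stay on the known-good slot
```

The point of the model is that a failed update never touches the slot you are running from, so "go to 1" is just booting the old slot again.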


chameleon@fedia.io

31 points

4 months ago

Realistically, immutability wouldn’t have made a difference. Definition updates like this are generally not considered part of the provisioned OS (since they change somewhere around hourly) and would go into /var or the like, which is mutable persistent state on nearly every otherwise immutable OS. Snapshots like Timeshift are more likely to help.
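To make the distinction concrete, here is a minimal sketch of the check involved: whether a given file lives in mutable persistent state that an image rollback leaves untouched. The list of mutable prefixes and the example paths are assumptions for illustration; the actual layout depends on the distro and on where a given agent stores its definitions.

```python
from pathlib import PurePosixPath

# Hypothetical layout: directories that stay writable and persist across
# image rollbacks on an otherwise immutable system.
MUTABLE_PREFIXES = ("/var", "/etc", "/home")


def survives_rollback(path, mutable_prefixes=MUTABLE_PREFIXES):
    """Return True if path is in mutable state that a rollback leaves as-is."""
    p = PurePosixPath(path)
    for prefix in mutable_prefixes:
        root = PurePosixPath(prefix)
        # Match the prefix itself or anything below it, by path component.
        if p == root or root in p.parents:
            return True
    return False


# A definitions file under /var would still be there after booting the old image.
print(survives_rollback("/var/lib/example-agent/definitions.bin"))
print(survives_rollback("/usr/lib/example-agent/agent.so"))
```

If the bad file falls in the first category, booting yesterday's image brings back yesterday's OS but today's broken definitions.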


sugar_in_your_tea@sh.itjust.works

1 point

4 months ago

It’s a huge reason why I use BTRFS snapshots. I’m a bit more lax about what gets snapshotted on my desktop, but on a server, everything should live in a snapshot. If an update goes bad, revert to the last snapshot (and snapshots are cheap, so run one with every change and delete older ones).
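A minimal sketch of that "snapshot on every change, delete older ones" routine, assuming snapshots are kept under a single directory and named so that they sort oldest first. The function name, the `/.snapshots` path, and the retention count are all hypothetical; the commands are only built as argument lists, not executed.

```python
def plan_snapshot_rotation(existing, new_name, keep=5,
                           source="/", snapshot_dir="/.snapshots"):
    """Return btrfs commands to take a read-only snapshot and prune old ones.

    existing: names of current snapshots, oldest first.
    keep: how many snapshots to retain, counting the new one (assumed policy).
    """
    if keep < 1:
        raise ValueError("keep must be at least 1")
    commands = [["btrfs", "subvolume", "snapshot", "-r",
                 source, f"{snapshot_dir}/{new_name}"]]
    all_names = list(existing) + [new_name]
    stale = all_names[:-keep]   # everything except the newest `keep` entries
    for name in stale:
        commands.append(["btrfs", "subvolume", "delete",
                         f"{snapshot_dir}/{name}"])
    return commands
```

Each returned list could be handed to `subprocess.run` by whatever hook runs before an update; keeping the planning separate from the execution makes the retention logic easy to check without root.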


wisha@lemmy.ml

1 point

4 months ago

Anything that’s updated with the OS can be rolled back. Now, Windows is Windows, so CrowdStrike handles things its own way. But I bet if Canonical or Red Hat were to make their own versions of CrowdStrike, they would push updates through the regular package repo, allowing them to be rolled back.
