49 points

Turn off the computer, boot from the previous day’s image, wipe the current day’s image, and continue using the computer.

2 points

I’m familiar enough with Linux but have never used an immutable distro. I recognize the technical difference between what you describe and “go delete a specific file in safe mode”. But how about the more generic statement? Is this much different from “boot in a special way and go fix the problem”? Is it any easier or more difficult than what people had to do on Windows?

4 points

Primarily it’s different because you would not have had to boot into any safe mode. You would have just booted from the last good image from like a day ago and deleted the current image and kept using the computer.
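
A minimal sketch of the rollback described here, assuming a hypothetical `Image` record and `roll_back` helper (neither is a real distro API, and the dates are made up). It only models the decision: pick the newest image that still boots and drop anything newer.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Image:
    """One bootable system image (hypothetical model, not a real distro API)."""
    built_on: date
    boots_ok: bool


def roll_back(images: list[Image]) -> tuple[Image, list[Image]]:
    """Return the newest image that boots, plus the images to keep.

    Every image newer than the chosen one is dropped, which mirrors
    "boot the last good image and delete the current one".
    """
    ordered = sorted(images, key=lambda img: img.built_on)
    good = [img for img in ordered if img.boots_ok]
    if not good:
        raise RuntimeError("no bootable image left to roll back to")
    target = good[-1]
    kept = [img for img in ordered if img.built_on <= target.built_on]
    return target, kept


# Made-up example data: the newest image is the broken one.
images = [
    Image(date(2024, 1, 1), boots_ok=True),
    Image(date(2024, 1, 2), boots_ok=True),
    Image(date(2024, 1, 3), boots_ok=False),
]
target, kept = roll_back(images)
print(target.built_on)  # 2024-01-02
print(len(kept))        # 2
```

On a real immutable distro the boot menu and the image tooling do this work; the sketch only shows why no "safe mode" step is involved.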

1 point

What’s the user experience like there? Are you prompted to do it if the system fails to boot “happily”?

18 points

That’s all well and good, but many of these Windows machines were headless or used by extremely non-technical people - think tills at your supermarket or airport check-in desks. Worse, some of these installations were running in the cloud, so console access would have been tricky.

4 points

The cloud systems would have been a problem. Any local system, though, a non-technical user could easily have fixed, because their IT department could simply tell them: turn on your computer, and when it gets to this screen with these words, press the down arrow key one time and press Enter, and your computer will boot normally.

20 points

You wildly overestimate the average person’s willingness to do that.

10 points

You clearly haven’t worked a help desk if you think even those simple instructions are something every end user is capable of or willing to do without issue.

2 points

It should be relatively straightforward to script the recovery of cloud VM images (even without snapshots). Good luck getting the unwashed masses to follow a script to manually enter recovery mode and delete files in a critical area of the OS.
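
A rough sketch of the file-cleanup step such a script might run, assuming the broken VM's disk has already been detached and mounted on a helper machine (those cloud-specific steps are provider-dependent and left out). The function name, directory layout, and file pattern are all hypothetical placeholders.

```python
import tempfile
from pathlib import Path


def remove_bad_files(mounted_root: Path, relative_dir: str, pattern: str) -> list[str]:
    """Delete files matching `pattern` under `mounted_root/relative_dir`.

    Returns the sorted names of the deleted files so the caller can log them.
    """
    target_dir = mounted_root / relative_dir
    if not target_dir.is_dir():
        return []
    removed = []
    for path in sorted(target_dir.glob(pattern)):
        if path.is_file():
            path.unlink()
            removed.append(path.name)
    return removed


# Demo against a throwaway directory standing in for the mounted disk.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    drivers = root / "drivers" / "vendor"   # hypothetical layout
    drivers.mkdir(parents=True)
    (drivers / "bad-update-001.sys").write_text("broken")
    (drivers / "good.sys").write_text("fine")

    removed = remove_bad_files(root, "drivers/vendor", "bad-update-*.sys")
    remaining = sorted(p.name for p in drivers.iterdir())

print(removed)    # ['bad-update-001.sys']
print(remaining)  # ['good.sys']
```

Running the same function over a list of mounted disks is the part that scales in the cloud and does not scale across desks and tills.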

15 points

Funny you should mention people at the airport. I work at the airport, but not for Fronteer. My sister was flying on Thursday, and nobody could get a boarding pass printed. When I came down, thinking my sister was throwing a tantrum over nothing, I saw a line longer than a football field. When I tried to ask a Fronteer employee what happened, he just threw his hands in the air and said “I DON’T FUCKING KNOW, OK??? NOBODY KNOWS WHAT THE FUCK IS GOING ON!!! YOU SEE THIS??? YOU SEE THIS SHIT??? YOU THINK I’M JUST DENYING PEOPLE FOR FUN??? WHY DON’T I GO GRAB MY TRIDENT, AND I CAN STAB ALL OF YOU OVER AN OPEN FLAME!!! BECAUSE I’M THE DEVIL, RIGHT??? RIGHT??? THAT’S WHAT YOU’RE SAYING!!!”

And all I said was “Hey, my sister is flying today and…”

You think THAT guy is going to sit there and reformat a PC, or restore a PC snapshot from before the update? He’s the kind of guy who SHOULD BE smoking weed at work. This platform is very tech-savvy, but people here often forget that only a very small percentage of people share their PC knowledge. Now what would happen if I threw a tech-savvy person into an auto garage and told them to replace the gaskets of an engine? Would they know how? Would they enjoy a room full of mechanics laughing at them?

I’m not saying you specifically. I’m agreeing with you. I’m just adding to your point for an audience that I think sometimes misses the forest for the trees.

1 point

Would still need to be on site.

0 points

True

3 points

Wouldn’t help (on its own), you’d still get auto-updated to the broken version.

4 points

If I’m correct, wasn’t a fix found and deployed within several hours? Then the next auto-update would likely not have had the same issue.

10 points

…until the CrowdStrike agent updated, and you wind up dead in the water again.

The whole point of CrowdStrike is to be able to detect and prevent security vulnerabilities, including zero-days. As such, they can release updates multiple times per day. Rebooting into a known-safe state is great, but unless you follow that up by stopping the agent from redownloading the sensor configuration update, you’re just going to wind up in a BSOD loop.
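
The loop described above can be modelled in a few lines. This is a toy simulation under stated assumptions, not how the real agent works: version numbers are made up, `boots_after_rollback` is a hypothetical helper, and every boot is assumed to start from the rolled-back state.

```python
def boots_after_rollback(good: int, published: int, bad: set[int],
                         pinned: bool, max_boots: int = 5) -> list[str]:
    """Return the outcome of each boot attempt after a rollback.

    good:      config version present in the rolled-back image
    published: version the vendor currently serves
    bad:       versions that crash the machine
    pinned:    True if the agent is prevented from updating
    """
    log = []
    for _ in range(max_boots):
        installed = good               # each boot starts from the rolled-back state
        if not pinned:
            installed = published      # agent re-downloads whatever is current
        if installed in bad:
            log.append("crash")        # roll back again and retry
            continue
        log.append("stable")
        break
    return log


print(boots_after_rollback(1, 2, {2}, pinned=False))  # ['crash', 'crash', 'crash', 'crash', 'crash']
print(boots_after_rollback(1, 2, {2}, pinned=True))   # ['stable']
print(boots_after_rollback(1, 3, {2}, pinned=False))  # ['stable']
```

The last call covers the case raised earlier in the thread: once the vendor publishes a fixed version, an unpinned agent recovers on its own.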

A better architectural solution would have been to have Windows drivers run in Ring 1, giving the kernel the ability to isolate those that are misbehaving. But that risks a small decrease in performance, and Microsoft didn’t want that, so we’re stuck with a Ring 0/Ring 3 only architecture in Windows that can cause issues like this.

3 points

That assumes the file is not stored on a writable section of the filesystem and treated as application data, and thus wouldn’t survive a rollback. Which it likely would.

12 points

You mean like NixOS?

It wouldn’t technically stop anything, it would just make your life Hell on Earth if you tried to add that self-updating ring-0 proprietary software to your servers.

But I guess what you are looking for is immutable infrastructure? That one would stop the problem.

5 points

Can’t see many Linux or BSD admins being happy with “self-updating ring-0 proprietary software”. That’s very much a Windows culture thing.

2 points

Did you hear about it when that same software had that same problem on its Linux endpoint system a couple of months ago?

Well, me neither. I can’t tell how much of it is “anybody willing to use something like that will also want a Windows server” (crazy people), or “nobody that wants Linux would accept it”. Those two are not exactly the same, and I don’t know how well the auditors that keep pushing this kind of shit into companies interact with the culture.

1 point

Yeah, I didn’t, but this does seem a very Windows’y way of doing things, so I can’t see it being widely done in the Linux/BSD/Unix world.

14 points

If the sensor was using eBPF (as any modern sensor on Linux should) then the faulty update would have made the sensor crash, but the system would still be stable. But CrowdStrike has a long history of using stupid forms of integration, so I wouldn’t put it past them to also load a kernel module that fucks things up unless it’s blacklisted in the bootloader. Fortunately that kind of recovery is, if not routine, at least well documented and standardized.

2 points

I did hear that one of their newer versions does use eBPF, but I haven’t even remotely looked into it.

https://nondeterministic.computer/@mjg59/112816011370924959

1 point

They do have a bpf sensor. It’s still shite, managing to periodically peg a CPU core on an idle system. They just lifted and shifted their legacy code into the bpf sensor, they don’t actually make good use of eBPF capabilities.

27 points

Laypeople would be even less able to fix it.

0 points

They can’t fix Windows either, so that’s not an argument.

At least if it’s a Linux system, they don’t need to buy any software to sort it out. It’s free and out in the open.

1 point

Yeah? Immutable distro, clownstrike kernel panic, what tool do you use now? Remember, you ‘need’ clownstrike.

-1 points

I don’t need some closed blob, with auto updates, in my OS. I doubt many Linux people would be happy with that.

To deal with a bad update, I’d boot a Btrfs snapshot from before the bad update. ‘grub-btrfs’ is great. I confess, it works great for my laptop, but I’ve not yet got it on one of my servers. When I finally rebuild my home server, I will though. Work servers, I hope, won’t always be my problem!

1 point

None. You’d still have to be on site for every machine.


Technology

!technology@lemmy.world
