All our servers and company laptops went down at pretty much the same time. Laptops have been bootlooping to blue screen of death. It’s all very exciting, personally, as someone not responsible for fixing it.

Apparently caused by a bad CrowdStrike update.

Edit: now being told we (who almost all generally work from home) need to come into the office Monday as they can only apply the fix in-person. We’ll see if that changes over the weekend…

215 points

Reading into the updates some more… I’m starting to think this might just destroy CrowdStrike as a company altogether. Between the mountain of lawsuits almost certainly incoming and the total destruction of any public trust in the company, I don’t see how they survive this. Just absolutely catastrophic on all fronts.

127 points

If all the computers stuck in boot loop can’t be recovered… yeah, that’s a lot of cost for a lot of businesses. Add to that all the immediate impact of missed flights and who knows what happening at the hospitals. Nightmare scenario if you’re responsible for it.

This sort of thing is exactly why you push updates to groups in stages, not to everything all at once.
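
For what it’s worth, the staged approach mentioned above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not how CrowdStrike or any vendor actually ships updates: the stage percentages, the `deploy` and `healthy` callbacks, and the host IDs are all hypothetical.

```python
import hashlib

# Hypothetical rollout stages: percentage of the fleet covered at each step.
STAGES = (1, 10, 50, 100)


def rollout_bucket(host_id: str, buckets: int = 100) -> int:
    """Map a host to a stable bucket in [0, buckets) by hashing its ID."""
    digest = hashlib.sha256(host_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def hosts_for_stage(host_ids, percent: int):
    """Return the hosts whose bucket falls under the given rollout percentage."""
    return [h for h in host_ids if rollout_bucket(h) < percent]


def staged_rollout(host_ids, deploy, healthy, stages=STAGES):
    """Deploy stage by stage and stop as soon as an updated host is unhealthy.

    `deploy(host)` and `healthy(host)` are caller-supplied callbacks
    (assumptions for this sketch). Returns (success, set_of_updated_hosts).
    """
    updated = set()
    for percent in stages:
        for host in hosts_for_stage(host_ids, percent):
            if host not in updated:
                deploy(host)
                updated.add(host)
        # Halt before widening the blast radius if anything looks wrong.
        if not all(healthy(h) for h in updated):
            return False, updated
    return True, updated
```

Because each host hashes to a fixed bucket, every stage is a superset of the previous one, so a bad update caught at the first non-empty stage only ever touches that small slice of machines.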

77 points

Looks like the laptops are able to be recovered with a bit of finagling, so fortunately they haven’t bricked everything.

And yeah staged updates or even just… some testing? Not sure how this one slipped through.

128 points

Not sure how this one slipped through.

I’d bet my ass this was caused by terrible practices brought on by suits demanding more “efficient” releases.

“Why do we do so much testing before releases? Have we ever had any problems before? We’re wasting so much time that I might not even be able to buy another yacht this year”

3 points

One of my coworkers, while waiting on hold for 3+ hours with our company’s outsourced helpdesk, noticed after booting into safe mode that the Crowdstrike update had triggered a snapshot that she was able to roll back to and get back on her laptop. So at least that’s a potential solution.

48 points

Agreed, this will probably kill them over the next few years unless they can really magic up something.

They probably don’t get sued - their contracts will have indemnity clauses against exactly this kind of thing, so unless they seriously misrepresented what their product does, this probably isn’t a contract breach.

If you are running crowdstrike, it’s probably because you have some regulatory obligations and an auditor to appease - you aren’t going to be able to just turn it off overnight, but I’m sure there are going to be some pretty awkward meetings when it comes to contract renewals in the next year, and I can’t imagine them seeing much growth

22 points

Nah. This has happened with every major corporate antivirus product. Multiple times. And the top IT people advising on purchasing decisions know this.

13 points

Yep. This is just uninformed people thinking this doesn’t happen. It’s been happening since AV was born. It’s not new, and this will not kill CS; they’re still king.

2 points

At my old shop we still had people giving money to checkpoint and splunk, despite numerous problems and a huge cost, because they had favourites.

6 points

Don’t most indemnity clauses have exceptions for gross negligence? Pushing out an update this destructive without it getting caught by any quality control checks sure seems grossly negligent.

40 points
Deleted by creator
11 points

explain to the project manager with crayons why you shouldn’t do this

Can’t; the project manager ate all the crayons

3 points

Why is it bad to do on a Friday? Based on your last paragraph, I would have thought Friday is probably the best week day to do it.

21 points

Most companies, mine included, try to roll out updates during the middle or start of a week. That way if there are issues the full team is available to address them.

5 points
Deleted by creator
1 point

Was it not possible for MS to design their safe mode to still “work” when Bitlocker was enabled? Seems strange.

3 points

I’m not sure what you’d expect to be able to do in a safe mode with no disk access.

1 point

rolling out an update to production that there was clearly no testing

Or someone selected “env2” instead of “env1” (#cattleNotPets names) and tested in prod by mistake.

Look, it’s a gaffe and someone’s fired. But it doesn’t mean fuck ups are endemic.

23 points

I think you’re on the nose, here. I laughed at the headline, but the more I read the more I see how fucked they are. Airlines. Industrial plants. Fucking governments. This one is big in a way that will likely get used as a case study.

13 points

The London Stock Exchange went down. They’re fukd.

18 points

Yeah saw that several steel mills have been bricked by this, that’s months and millions to restart

10 points

Got a link? I find it hard to believe that a process like that would stop because of a few windows machines not booting.

13 points

a few windows machines with controller application installed

That’s the real kicker.

2 points

There are a lot of heavy manufacturing tools that are controlled and have their interface handled by Windows under the hood.

They’re not all networked, and some are super old, but a more modernized facility could easily be using a more modern version of Windows and be networked to have flow of materials, etc more tightly integrated into their systems.

The higher precision your operation, the more useful having much more advanced logs, networked to a central system, becomes in tracking quality control.

Imagine if after the fact, you could track a set of .1% of batches that are failing more often and look at the per second logs of temperature they were at during the process, and see that there’s 1° temperature variance between the 30th to 40th minute that wasn’t experienced by the rest of your batches. (Obviously that’s nonsense because I don’t know anything about the actual process of steel manufacturing. But I do know that there’s a lot of industrial manufacturing tooling that’s an application on top of windows, and the higher precision your output needs to be, the more useful it is to have high quality data every step of the way.)

16 points

Testing in production will do that

10 points

Not everyone is fortunate enough to have a separate testing environment, you know? Manglement has to cut cost somewhere.

9 points

Manglement is the good term lmao

-1 points

What lawsuits do you think are going to happen?

5 points

They can have all the clauses they like but pulling something like this off requires a certain amount of gross negligence that they can almost certainly be held liable for.

-1 points

Whatever you say my man. It’s not like they go through very specific SLA conversations and negotiations to cover this or anything like that.

1 point

Forget lawsuits, they’re going to be in front of congress for this one

-2 points

For what? At best it would be a hearing on the challenges of national security with industry.

-2 points

Don’t we blame MS at least as much? How does MS let an update like this push through their Windows Update system? How does an application update make the whole OS unable to boot? Blue screens on Windows have been around for decades, why don’t we have a better recovery system?

13 points

Crowdstrike runs at ring 0, effectively as part of the kernel. Like a device driver. There are no safeguards at that level. Extreme testing and diligence is required, because these are the consequences for getting it wrong. This is entirely on crowdstrike.

4 points

This didn’t go through Windows Update. It went through the CrowdStrike software directly.

194 points

The amount of servers running Windows out there is depressing to me

81 points

The four multinational corporations I worked at were almost entirely Windows servers with the exception of vendor specific stuff running Linux. Companies REALLY want that support clause in their infrastructure agreement.

25 points

I’ve worked as an IT architect at various companies in my career and you can definitely get support contracts for engineering support of RHEL, Ubuntu, SUSE, etc. That isn’t the issue. The issue is that there are a lot of system administrators with “15 years experience in Linux” that have no real experience in Linux. They have experience googling for guides and tutorials while having cobbled together documents of doing various things without understanding what they are really doing.

I can’t tell you how many times I’ve seen an enterprise patch their Linux solutions manually (if they patched them at all, with some ridiculous rubber-stamped POA&M), without deploying a repo and updating the repo, treating it as you would a WSUS. Hell, I’m pleasantly surprised if I see them joined to a Windows domain (a few times) or an LDAP (once, but they didn’t have a trust with the Domain Forest or use sudoer rules…sigh).

15 points

The issue is that there are a lot of system administrators with “15 years experience in Linux” that have no real experience in Linux.

Reminds me of this guy I helped a few years ago. His name was Bob, and he was a sysadmin at a predominantly Windows company. The software I was supporting, however, only ran on Linux. So since Bob had been a UNIX admin back in the 80s they picked him to install the software.

But it had been 30 years since he ever touched a CLI. Every time I got on a call with him, I’d have to give him every keystroke one by one, all while listening to him complain about how much he hated it. After three or four calls I just gave up and used the screenshare to do everything myself.

AFAIK he’s still the only Linux “sysadmin” there.

7 points

“googling answers”, I feel personally violated.

/s

To be fair, there is no reason to memorize things that you need only once or twice. Google is a tool, and a good one for Linux issues. Why debug an issue for a few hours if you can Google the resolution in minutes?

5 points

Companies REALLY want that support clause in their infrastructure agreement.

RedHat, Ubuntu, SUSE - they all exist on support contracts.

17 points

I dunno, but doesn’t like a quarter of the internet kinda run on Azure?

34 points
8 points

so 40% of azure crashes a quarter of the internet…

4 points

I guess Spotify was running on the other 40%, as many other services

2 points

doesn’t like a quarter of the internet kinda run on Azure?

Said another way, 3/4 of the internet isn’t on Azure cloud blah-blah.

And azure is - shhh - at least partially backed by Linux hosts. Didn’t they buy an AWS clone and forcibly inject it with money like Bobby Brown on a date in the hopes of building AWS better than AWS like they did with nokia? MS could be more protectively diverse than many of its best customers.

16 points

I’ve had my PC shut down for updates three times now, while using it as a Jellyfin server from another room. And I’ve only been using it for this purpose for six months or so.

I can’t imagine running anything critical on it.

41 points

Windows server, the OS, runs differently from desktop windows. So if you’re using desktop windows and expecting it to run like a server, well, that’s on you. However, I ran windows server 2016 and then 2019 for quite a few years just doing general homelab stuff and it is really a pain compared to Linux which I switched to on my server about a year ago. Server stuff is just way easier on Linux in my experience.

11 points

It doesn’t have to, though. Linux manages to do both just fine, with relatively minor compromises.

Expecting an OS to handle keeping software running is not a big ask.

4 points

Not judging, but why wouldn’t you run Linux for a server?

2 points

Because I only have one PC (that I need for work), and I can’t be arsed to cock around with dual boot just to watch movies. Especially when Windows will probably break that at some point.

-8 points
Removed by mod
7 points

Wow dude you’re so cool. I bet that made you feel so superior. Everyone on here thinks you are so badass.

7 points

Where did you think Microsoft was getting all (hyperbole) of their money from?

1 point

I know, I was really surprised how many there are. But honestly, think of how many companies are using Active Directory and Azure.

169 points

>Make a kernel-level antivirus
>Make it proprietary
>Don’t test updates… for some reason??

154 points

never do updates on a Friday.

77 points
Deleted by creator

And especially now the work week has slimmed down where no one works on Friday anymore

Excuse me, what now? I didn’t get that memo.

15 points

Yeah it’s great :-) 4 10hr shifts and every weekend is a 3 day weekend

6 points
Deleted by creator
15 points

Yep, anything done on Friday can enter the world on a Monday.

I don’t really have any plans most weekends, but I sure as shit don’t plan on spending it fixing Friday’s fuckups.

3 points

And honestly, anything that can be done Monday is probably better done on Tuesday. Why start off your week by screwing stuff up?

We have a team policy to never do externally facing updates on Fridays, and we generally avoid Mondays as well unless it’s urgent. Here’s roughly what each day is for:

  • Monday - urgent patches that were ready on Friday; everyone WFH
  • Tuesday - most releases; work in-office
  • Wed - fixing stuff we broke on Tuesday/planning the next release; work in-office
  • Thu - fixing stuff we broke on Tuesday, closing things out for the week; WFH
  • Fri - documentation, reviews, etc; WFH

If things go sideways, we come in on Thu to straighten it out, but that almost never happens.
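
A schedule like the one above can be enforced with a small guard in a deploy script. This is only a sketch of that idea: the function name `may_release` and the `urgent` flag are made up for illustration, and the day sets simply mirror the list (Tuesday for normal releases, Monday for urgent patches only, nothing external on the other days).

```python
from datetime import date

# date.weekday() numbering: Monday == 0 ... Sunday == 6
RELEASE_DAYS = {1}       # Tuesday: most releases
URGENT_ONLY_DAYS = {0}   # Monday: only urgent patches that were ready on Friday


def may_release(day: date, urgent: bool = False) -> bool:
    """Return True if an externally facing release is allowed on `day`."""
    weekday = day.weekday()
    if weekday in RELEASE_DAYS:
        return True
    if weekday in URGENT_ONLY_DAYS:
        return urgent
    # Wednesday through Sunday: fix-ups, docs and reviews, no new releases.
    return False


if __name__ == "__main__":
    friday = date(2024, 7, 19)
    print(friday.strftime("%A"), may_release(friday))
```

Wiring this into the release pipeline as a hard failure, rather than relying on people remembering the rule, is one way to keep "just this once" Friday pushes from happening.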

2 points

Actually I was not even joking. I also work in IT and have exactly the same opinion. Friday is for easy stuff!

3 points

You posted this 14 hours ago, which would have made it 4:30 am in Austin, Texas where CrowdStrike is based. You may have felt the effect on Friday, but it’s extremely likely that the person who made the change did it late on a Thursday.
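
The time-zone arithmetic behind that estimate can be checked with a few lines of Python. This is a sketch with assumed inputs: the "seen at" timestamp is hypothetical (chosen so the numbers match the 4:30 am figure), and Austin is modelled as a fixed UTC-5 offset (Central Daylight Time, which applies in July) rather than looked up in a time-zone database.

```python
from datetime import datetime, timedelta, timezone

# Central Daylight Time as a fixed offset (assumption: the date falls in summer).
CDT = timezone(timedelta(hours=-5), "CDT")


def local_time_when_posted(seen_at_utc: datetime, age: timedelta) -> datetime:
    """Subtract the post's age from the time it was read, then convert to CDT."""
    return (seen_at_utc - age).astimezone(CDT)


# Hypothetical moment the "14 hours ago" label was read, expressed in UTC.
seen = datetime(2024, 7, 19, 23, 30, tzinfo=timezone.utc)
posted_local = local_time_when_posted(seen, timedelta(hours=14))
print(posted_local.strftime("%A %H:%M %Z"))
```

With these inputs the post lands in the early hours of Friday local time, which fits the point that a change made only a few hours earlier would still have been Thursday evening in Texas.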

-34 points

Never update unless something is broken.

58 points

This is fine as long as you politely ask everyone on the Internet to slow down and stop exploiting new vulnerabilities.

22 points

I think vulnerabilities found count as “something broken”, and the chap you replied to simply did not think that far ahead hahah

8 points

That’s advice so smart you’re guaranteed to have massive security holes.

2 points

This is AV, and it’s even possible that this was part of a definitions update (for example, some Windows file deleted as a false positive). You update those daily.

114 points

Yeah my plans of going to sleep last night were thoroughly dashed as every single windows server across every datacenter I manage between two countries all cried out at the same time lmao

58 points

I always wondered who even used Windows Server given how marginal its market share is. Now I know from the news.

39 points

Marginal? You must be joking. A vast amount of servers run on Windows Server. Where I work alone we have several hundred and many companies have a similar setup. Statista put the Windows Server OS market share over 70% in 2019. While I find it hard to believe it would be that high, it does clearly indicate it’s most certainly not a marginal percentage.

7 points

I’m not getting an account on Statista, and I agree that its marketshare isn’t “marginal” in practice, but something is up with those figures, since overwhelmingly internet hosted services are on top of Linux. Internal servers may be a bit different, but “servers” I’d expect to count internet servers…

11 points

Well, I’ve seen some, but they usually don’t have automatic updates and generally do not have access to the Internet.

10 points

This is a crowdstrike issue specifically related to the falcon sensor. Happens to affect only windows hosts.

7 points

It’s only marginal for running custom code. Every large organization has at least a few of them running important out-of-the-box services.

6 points

Not too long ago, a lot of Customer Relationship Management (CRM) software ran on MS SQL Server. Businesses made significant investments in software and training, and some of them don’t have the technical, financial, or logistical resources to adapt - momentum keeps them using Windows Server.

For example, small businesses that are physically located in rural areas can’t use cloud-based services because rural internet is too slow and unreliable. It’s not quite the case that there’s no amount of money you can pay for a good internet connection in rural America, but last time I looked into it, Verizon wanted to charge me $20,000 per mile to run a fiber optic cable from the nearest town to my client’s farm.

2 points

Almost everyone, because the Windows server market share isn’t marginal at all.

1 point

My current company does and I hate it so much. Who even got that idea in the first place? Linux always dominated server-side stuff, no?

2 points

Yes, but the developers learned on Windows, so they wrote software for Windows.

2 points

You should read the saga of when MS bought Hotmail. The work they had to do to be able to run it on Windows was incredible. It actually helped MS improve their server OS, and it still wasn’t as performant when they switched over.

1 point

No, Linux doesn’t now nor has it ever dominated the server space.

1 point

In university computer science, in the states, MS server was the main server OS that they taught my class during our education.

Microsoft loses money to let the universities and students use and learn MS server for free, or at least they did at the time. This had the effect of making a lot of fresh grad developers more comfortable with using MS server, and I’m sure it led to MS server being used in cases where there were better options.

15 points

How many cups of coffee have you drunk in the last 12 hours?

37 points

I work in a data center

I lost count

16 points

What was Dracula doing in your data centre?

4 points

I work in a datacenter, but no Windows. I slept so well.

Though a couple years back some ransomware that also impacted Linux ran through, but I got to sleep well because it only bit people with easily guessed root passwords. It bit a lot of other departments at the company though.

This time even the Windows folks were spared, because CrowdStrike wasn’t the solution they infested themselves with (they use other providers, who I fully expect to screw up the same way one day).

3 points

There was a point where words lost all meaning and I think my heart was one continuous beat for a good hour.

10 points

Did you feel a great disturbance in the force?

3 points

Oh yeah I felt a great disturbance (900 alarms) in the force (Opsgenie)

4 points

How’s it going, Obi-Wan?

