The machines, now inaccessible, are arguably more secure than before.

30 points

Fair warning that I’ll be ranty because I hate losers talking about DEI hires.

> So why is memory address 0x9c trying to be read from? Well because… programmer error.

> So what happened is that the programmer forgot to check that the object it’s working with isn’t valid, it tried to access one of the objects member variables…

This is a huge assumption. The last rumor I’ve read from actual cybersecurity people is that Crowdstrike’s update files were corrupt (update: disproven by Crowdstrike’s blog post). If this is true it’s likely still from programmer error at some level, but maybe not as simple as “whoopsie I forgot an if (data == nullptr) teehee”.

He, like the rest of us who don’t work at Crowdstrike, has no idea what happened. I have seen computers do the weirdest gosh darn things. I know better than to assume anything at this point. I wouldn’t even rule out weird stuff yet, like the data getting corrupted between release qualification and release.
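For illustration only, a minimal C++ sketch of why “forgot a null check” and “the file was corrupt” are different failure modes. The function `read_field` and its fixed-offset record layout are hypothetical, not anything from CrowdStrike’s code: the null check covers the “whoopsie I forgot an if” case, but truncated or corrupt input also needs a bounds check before any read.

```cpp
#include <cstddef>   // std::size_t
#include <cstdint>   // std::uint8_t, std::uint32_t
#include <cstring>   // std::memcpy
#include <optional>  // std::optional, std::nullopt

// Hypothetical reader: fetch a 32-bit field at `offset` from an untrusted
// buffer. Returns nullopt instead of reading when the input can't be trusted.
std::optional<std::uint32_t> read_field(const std::uint8_t* data,
                                        std::size_t size,
                                        std::size_t offset) {
    // The check the thread jokes about: no buffer at all.
    if (data == nullptr) return std::nullopt;
    // The check a null test alone misses: the buffer is too short for the
    // field, e.g. because the file was truncated or corrupted.
    if (offset > size || size - offset < sizeof(std::uint32_t)) {
        return std::nullopt;
    }
    std::uint32_t value;
    // memcpy avoids an unaligned load straight out of the byte buffer.
    std::memcpy(&value, data + offset, sizeof value);
    return value;
}
```

Note that the bounds check is written as `size - offset < 4` after confirming `offset <= size`, so the subtraction cannot wrap around.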

> It turns out that C++, the language crowdstrike is using, likes to use address 0x0 as a special value to mean “there’s nothing here”, don’t try to access it or you’ll die.

This thread is full of these sorts of small technical inaccuracies and oversimplifications so I won’t point out all of them, but nothing in the C++ standard requires null pointers to refer to memory address 0x0. Nor does it require that dereferencing a null pointer terminates the program.

Windows died not because C++ asked it nicely to, but because a driver tried to access an address that wasn’t mapped.
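Where the 0x9c figure comes from, sketched under assumptions: on mainstream platforms a null pointer is represented as address zero (the standard does not require this, as noted above), and `ptr->member` compiles to “base address + member offset”. So if the faulting member sits 0x9c bytes into its object, a null base yields a load from 0x9c. The struct `HypotheticalRecord` below is made up; CrowdStrike’s real layout is unknown.

```cpp
#include <cstddef>   // offsetof
#include <cstdint>   // std::uint8_t, std::uint32_t, std::uintptr_t

// Hypothetical layout: 0x9c bytes of other members, then the member the
// faulting code tried to read. Not CrowdStrike's actual structure.
struct HypotheticalRecord {
    std::uint8_t other_fields[0x9c];
    std::uint32_t field;
};

// 0x9c is a multiple of 4, so no padding is inserted before `field`.
static_assert(offsetof(HypotheticalRecord, field) == 0x9c,
              "field sits 0x9c bytes past the start of the object");

// Address the CPU would be asked to load for `ptr->field`, computed on plain
// integers so that nothing is actually dereferenced.
constexpr std::uintptr_t field_address(std::uintptr_t base) {
    return base + offsetof(HypotheticalRecord, field);
}
```

Actually evaluating `ptr->field` with `ptr == nullptr` is undefined behaviour in C++, which is why the sketch only does the arithmetic. In practice, in a kernel driver the resulting load from a low, unmapped address is reported by the CPU as a page fault the kernel cannot resolve.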

> Crowdstrike should have set up automated testing using address sanitizer and thread sanitizer that runs on every code update.

The funny thing about accessing an unmapped address in kernel space:

  1. It will crash regardless of whether it’s running under ASan or not; sanitizers are literally irrelevant based on what we know so far.
  2. The ASan version he linked to is for user space. In the Windows kernel you’d need KASAN instead.

(If this was a simple nullptr dereference on bad input data, then perhaps a fuzzer would have helped. Fuzzers are great, though I have no idea how hard they are to use with kernel drivers.)
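A user-space sketch of what the fuzzing suggestion could look like, assuming the parsing logic can be compiled outside the driver (which may not hold for real kernel code). `LLVMFuzzerTestOneInput` is libFuzzer’s entry-point signature; the record format (one count byte followed by that many 32-bit entries) and `sum_entries` are made up for illustration.

```cpp
#include <cstddef>   // std::size_t
#include <cstdint>   // std::uint8_t, std::uint32_t, std::uint64_t
#include <cstring>   // std::memcpy
#include <optional>  // std::optional, std::nullopt

// Hypothetical parser: one count byte, then `count` 32-bit entries.
// Returns the sum of the entries, or nullopt on malformed input.
std::optional<std::uint64_t> sum_entries(const std::uint8_t* data,
                                         std::size_t size) {
    if (data == nullptr || size < 1) return std::nullopt;
    const std::size_t count = data[0];
    // Need 1 + 4 * count bytes; compare without risking overflow.
    if ((size - 1) / sizeof(std::uint32_t) < count) return std::nullopt;
    std::uint64_t total = 0;
    for (std::size_t i = 0; i < count; ++i) {
        std::uint32_t entry;
        std::memcpy(&entry, data + 1 + i * sizeof entry, sizeof entry);
        total += entry;
    }
    return total;
}

// libFuzzer entry point. When built with -fsanitize=fuzzer, libFuzzer supplies
// main() and calls this repeatedly with generated inputs.
extern "C" int LLVMFuzzerTestOneInput(const std::uint8_t* data,
                                      std::size_t size) {
    (void)sum_entries(data, size);  // only crashes/sanitizer reports matter
    return 0;                       // libFuzzer expects 0 here
}
```

Built with something like `clang++ -g -fsanitize=fuzzer,address harness.cpp`, AddressSanitizer would report any out-of-bounds read the length check failed to prevent. Whether an equivalent setup is practical for a Windows kernel driver is the open question in the comment above.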

> C++ is hard. Maybe they have a DEI engineer that did this

Dude would probably call me a “DEI hire”, but I bet I could beat him in a C++ deathmatch so neener neener.

8 points

@sailor_sega_saturn And given enough time and enough scale even the most improbably weird things will eventually happen. Update file corrupted by a storage controller that flips a couple of bits at random after every 720 hours of uptime but only if it’s 23.682 seconds after the hour? Weirder shit has happened.

16 points

I once helped one of my company’s customers troubleshoot a ridiculous edge-case error that they had seen happen three times over the course of a few years. At one point the actual sustaining developer we worked with was able to narrow it down to a specific bit that was somehow getting flipped, and pitched cosmic radiation as a plausible explanation given how rarely this kind of thing affected other customers.

It was at this point that we remembered that the customer was either a university with a nuclear physics lab or a hospital with a nuclear medicine program (can’t remember now, ironically enough) that the server rack lived adjacent to.

11 points

some twenty-four years ago i managed, amongst others, a company’s samba and print server (that was at the time when all the company’s servers were beige boxes with less memory and disk than the laptop i’m using to type this – and still they served a few hundred employees).

the machine developed a strange custom of hard-resetting itself, which we initially tracked to specific files being sent for printing; the behaviour was fully reproducible.

as it happened, it was a hardware fault somewhere between the mainboard and the integrated SCSI card; installing a separate SCSI card and reconnecting the disks and backup tape device fixed the problem. (i did not have the budget for a new server, no.)

establishing the actual cause took me fucking weeks.

9 points

As someone who is still confused why C++ is different from a B-, thank you for your sacrifice in wading through that nonsense.

6 points

“DEI hire” is arrogant. That’s a great way to other people instead of owning the flaw. I appreciate the call for maturity in the field. Own your flaws.

19 points

the use of “DEI hire” is a shorthand for “i’m a massive racist shitweasel”

12 points

Hold on now, it could also be shorthand for “I’m a massive misogynist shitweasel”.

17 points

It actually blows my mind that these people can see a bad thing happen, know exactly zero about it, and conclude “must have been a (insert slur) who did that”. They did the same shit with the Baltimore bridge collapse.

12 points

Mention C (and to an extent C++) and turbo nerds froth to show off how ultra cool they are cause they are LoW lEvEl programmers. But like most things, these loud freaks are mostly incoherent with their random insertion of tech words. Putting aside the DEI stuff cause I will rant forever against this racist and sexist fuckwit, it’s massively annoying working in an industry where dummies love to be all hand-wavy and suggest something like sanitizers. Thanks bro, let’s all add runtime sanitizers and watch perf tank in the most critical section of your computer. And as you pointed out, he doesn’t even mention the right one.

Next time Crowdstrike should just add an if to check all registers after every instruction and make sure their values are within your address space! And and and make sure a woman doesn’t program it cause according to him they are exempt from code reviews cause of the left agenda or some bullshit

4 points

> (update: disproven by Crowdstrike’s blog post).

How do you mean? The current top post on the blog seems to mention .sys files as part of the problem very prominently.

> Channel file “C-00000291*.sys” with timestamp of 0527 UTC or later is the reverted (good) version. Channel file “C-00000291*.sys” with timestamp of 0409 UTC is the problematic version.

11 points

https://www.crowdstrike.com/blog/technical-details-on-todays-outage/

> This is not related to null bytes contained within Channel File 291 or any other Channel File.

That to me implied that the channel file wasn’t necessarily corrupt (or as corrupt as people thought), but that it triggered a logic error. In particular, this point implies that it wasn’t from garbage zero bytes in the file.

(That said I could have worded this better, in my defense I’m sick in bed and only half thinking straight)

6 points

I see, thank you.

3 points

yeah that phrase of “null bytes” reads like addressing one of the rumours

“what was the problem?” “well it wasn’t null bytes” “so… what was it then?” “have definitely eliminated null bytes from the running!”

14 points

Also, and this shouldn’t be left unsaid, we’re talking about the Windows kernel here. A place with C++ code so cursed it is legendarily unhealthy to work in, as the cosmic horrors contained within slowly eat away at your sanity and warp the perception of time and space. Seeing that code for a few hours is enough to make a grown man cry. Seeing that code for a few weeks is enough to make you never cry again, as the terrible truth worms its way into your mind.

“DEI hire”, hah! The creature makes no distinction for race or gender as it fattens itself upon your failure! Even a glimpse at the edge of its abyss is enough to trigger a cycle of revelation - all modern software lies upon a rotting pile of ancient mistakes.

7 points

From a lovely response to the Crowdstrike error and various speculation on what caused it (https://ruby.social/deck/@V0ldek@awful.systems/112824202708490681), comes this gem:

> all modern software lies upon a rotting pile of ancient mistakes.

To be clear: this is 100% true. As we slowly, painfully work our way toward being less awful at software engineering, we are better than we have ever been. As fucked as modern code is, old code was worse.

The lower in the stack you go, the more horrifying the revelations, just as a rule.

4 points

@V0ldek @sailor_sega_saturn Thanks! You are talking straight from my heart!

7 points

@V0ldek @sailor_sega_saturn “That gibbering under the desk? Oh, that’s just Azathoth. Poor thing got a look at the pump controller code last year. It’s never been quite the same since.”

4 points

Absolutely stellar writing, except for this one weird bit

Database people are systems people. Modern databases have their own memory management, thread scheduler, and a fucking compiler inside. A promising research direction is to just bundle the database with your own bloody kernel that you handwrote with a box of scraps to make the entire thing less cursed and not have to wrestle with Linux.

You know, just in case you were looking for people to include in your postapo gang, database experts will also murder whatever you want with bare hands.

0 points

@sailor_sega_saturn You’re a nigger?!

6 points

go for the whole instance, the fucker is the admin there.

4 points

how did you find that? is it published by some part of lemmy?


TechTakes

!techtakes@awful.systems

Big brain tech dude got yet another clueless take over at HackerNews etc? Here’s the place to vent. Orange site, VC foolishness, all welcome.

This is not debate club. Unless it’s amusing debate.

For actually-good tech, you want our NotAwfulTech community
