The machines, now inaccessible, are arguably more secure than before.
Fair warning that I’ll be ranty because I hate losers talking about DEI hires.
> So why is memory address 0x9c trying to be read from? Well because… programmer error.
> So what happened is that the programmer forgot to check that the object it’s working with isn’t valid, it tried to access one of the object's member variables…
This is a huge assumption. The last rumor I’ve read from actual cybersecurity people is that Crowdstrike’s update files were corrupt (update: disproven by Crowdstrike’s blog post). If this is true, it’s likely still from programmer error at some level, but maybe not as simple as “whoopsie I forgot an if (data == nullptr) teehee”.
He, like the rest of us that don’t work at Crowdstrike, has no idea what happened. I have seen computers do the weirdest gosh darn things. I know better than to assume anything at this point. I wouldn’t even rule out weird stuff like the data getting corrupted between release qualification and release yet.
> It turns out that C++, the language crowdstrike is using, likes to use address 0x0 as a special value to mean “there’s nothing here”, don’t try to access it or you’ll die.
This thread is full of these sorts of small technical inaccuracies and oversimplifications, so I won’t point out all of them, but nothing in the C++ standard requires null pointers to refer to memory address 0x0. Nor does it require that dereferencing a null pointer terminates the program.
Windows died not because C++ asked it nicely to, but because a driver tried to access an address which wasn’t paged in.
> Crowdstrike should have set up automated testing using address sanitizer and thread sanitizer that runs on every code update.
The funny thing about accessing into non-paged memory in kernel space:
- It will crash regardless of whether it’s running under ASan; sanitizers are literally irrelevant based on what we know so far
- The ASan version he linked to is for user space. In the Windows kernel you’d need KASAN instead.
(If this was a simple nullptr dereference on bad input data then perhaps a fuzzer would have helped. Fuzzers are great, though I have no idea how hard they are to use with kernel drivers.)
> C++ is hard. Maybe they have a DEI engineer that did this
Dude would probably call me a “DEI hire”, but I bet I could beat him in a C++ deathmatch, so neener neener.
@sailor_sega_saturn And given enough time and enough scale even the most improbably weird things will eventually happen. Update file corrupted by a storage controller that flips a couple of bits at random after every 720 hours of uptime but only if it’s 23.682 seconds after the hour? Weirder shit has happened.
I once helped one of my company’s customers troubleshoot an issue where the same ridiculous edge-case error had happened three times over the course of a few years. At one point the actual sustaining developer we worked with was able to narrow it down to a specific bit that was getting flipped somehow, and pitched cosmic radiation as a plausible explanation given how rarely this kind of thing impacted other customers.
It was at this point that we remembered that the customer was either a university with a nuclear physics lab or a hospital with a nuclear medicine program (can’t remember now, ironically enough) that the server rack lived adjacent to.
some twenty-four years ago i managed, amongst others, a company’s samba and print server (that was at a time when all the company’s servers were beige boxes with less memory and disk than the laptop i’m using to type this – and still they served a few hundred employees).
the machine developed a strange custom of hard-resetting itself, which we initially tracked to specific files being sent for printing; the behaviour was fully reproducible.
as it happened, it was a hardware fault somewhere between the mainboard and the integrated SCSI card; installing a separate SCSI card and reconnecting the disks and backup tape device to it fixed the problem. (i did not have the budget for a new server, no.)
establishing the actual cause took me fucking weeks.
“DEI hire” is arrogant. That’s a great way to other people instead of owning the flaw. I appreciate the call for maturity in the field. Own your flaws.
the use of “DEI hire” is a shorthand for “i’m a massive racist shitweasel”
Hold on now, it could also be shorthand for “I’m a massive misogynist shitweasel”.
Mention C (and to an extent C++) and turbo nerds froth to show off how ultra cool they are cause they are LoW lEvEl programmers. But like most things, these loud freaks are mostly incoherent with their random insertion of tech words. Putting aside the DEI stuff cause I will rant forever against this racist and sexist fuckwit, it’s massively annoying working in an industry where dummies love to be all hand-wavy and suggest something like sanitizers. Thanks bro, let’s all add runtime sanitizers and watch perf tank in the most critical section of your computer. And as you pointed out, he doesn’t even mention the right one.
Next time Crowdstrike should just add an if check on all registers after every instruction to make sure their values are within your address space! And and and make sure a woman doesn’t program it, cause according to him they are exempt from code reviews because of the left agenda or some bullshit.
> (update: disproven by Crowdstrike’s blog post).
How do you mean? The current top post on the blog seems to mention .sys files as part of the problem very prominently.
> Channel file “C-00000291*.sys” with timestamp of 0527 UTC or later is the reverted (good) version. Channel file “C-00000291*.sys” with timestamp of 0409 UTC is the problematic version.
https://www.crowdstrike.com/blog/technical-details-on-todays-outage/
> This is not related to null bytes contained within Channel File 291 or any other Channel File.
To me that implied that the channel file wasn’t necessarily corrupt (or as corrupt as people thought), but that it triggered a logic error. In particular, this point implies that it wasn’t caused by garbage zero bytes in the file.
(That said, I could have worded this better, but in my defense I’m sick in bed and only half thinking straight.)
yeah that phrase of “null bytes” reads like addressing one of the rumours
“what was the problem?” “well it wasn’t null bytes” “so… what was it then?” “have definitely eliminated null bytes from the running!”
Also, and this shouldn’t be left unsaid, we’re talking about the Windows kernel here. A place with C++ code so cursed it is legendarily unhealthy to work in, as the cosmic horrors contained within slowly eat away at your sanity and warp the perception of time and space. Seeing that code for a few hours is enough to make a grown man cry. Seeing that code for a few weeks is enough to make you never cry again, as the terrible truth worms its way into your mind.
“DEI hire”, hah! The creature makes no distinction for race or gender as it fattens itself upon your failure! Even a glimpse at the edge of its abyss is enough to trigger a cycle of revelation - all modern software lies upon a rotting pile of ancient mistakes.
From a lovely response to the Crowdstrike error and various speculation on what caused it (https://ruby.social/deck/@V0ldek@awful.systems/112824202708490681), comes this gem:
> all modern software lies upon a rotting pile of ancient mistakes.
To be clear: this is 100% true. As we slowly, painfully work our way toward being less awful at software engineering, we are better than we have ever been. As fucked as modern code is, old code was worse.
The lower in the stack you go, the more horrifying the revelations, just as a rule.
@V0ldek @sailor_sega_saturn Thanks! You are talking straight from my heart!
@V0ldek @sailor_sega_saturn “That gibbering under the desk? Oh, that’s just Azathoth. Poor thing got a look at the pump controller code last year. It’s never been quite the same since.”
@V0ldek @sailor_sega_saturn have you read the writings of James Mickens, e.g. https://www.usenix.org/system/files/1311_05-08_mickens.pdf ?
Absolutely stellar writing, except for this one weird bit
Database people are systems people. Modern databases have their own memory management, thread scheduler, and a fucking compiler inside. A promising research direction is to just bundle the database with your own bloody kernel that you handwrote with a box of scraps to make the entire thing less cursed and not have to wrestle with Linux.
You know, just in case you were looking for people to include in your postapo gang, database experts will also murder whatever you want with bare hands.