I mean a simple
grep -r “string” *
Does wonders to find anything, but you need to know what you’re looking for. I’d probably look for DNS names that end in government or China specific TLDs to start with.
it’s trivial to break that approach by obfuscating strings. You can do things like using base64 encoded strings in the source code, building strings from smaller component parts, or using rot13 on, say, the host component of a URI. That last one could be pretty interesting if you, as a threat actor, owned both permutations. The hostname (minus TLD) in the source code could be the nice, human readable version (www.happysite.org) that appears to be something legit. Then, when you rot13 it to www.uncclfvgr.org, traffic is sent to the evil site doing scary things. People can be far more tricksy than that. There’s also the whole issue around whether or not the binaries you’re running actually match the code in the repo. The xz kerfuffle showed how much can be hidden that way.
EDIT: I should make it clear that I don’t use Deepin or the DE it provides because I only use WMs with no desktop, so the distro and DE are of no interest to me. I don’t know if it’s a security hazard or not, I have no horse in this fight.
grep -r "evil spyware" *
nothing? awesome, I guess this software is safe to use. Let’s gooo
I know you’re be facetious here and I’m ignorant to actual application security methodology. I do have to ask though, when you are looking for something in code that could be a security risk, isn’t it possible to look for methods or functions used to lookup DNS, outbound network calls, or even libraries used to obfuscate code? It seems to me that most programmers wouldn’t go through lengths to obfuscate their code and would want it to be readable/maintainable, so doing so would be a red flag.
Obviously no one is going to search for “evil spyware” when auditing code. Your point stands it is not as simple as that.
You’re totally right, I just think you underestimate how long it takes to rigorously audit a whole codebase. Let’s say you look for outvound network calls. Now you need to figure out for all of them whether they are malicious, which will require specific domain knowledge. And that’s assuming you find them all, the network call could be hidden away in a dependency. None of this is impossible but it requires a serious effort.