GitCode is a Git-hosting website operated by Chongqing Open-Source Co-Creation Technology Co Ltd, with technical support from CSDN and Huawei Cloud.
It is being reported that many users’ repositories are being cloned and re-hosted on GitCode without explicit authorization.
There is also a thread on Y Combinator’s Hacker News (archived link).
Solution: create a GitHub repo with Markdown articles outlining human rights abuses by the CCP and have a large number of GitHub users star and fork the repo.
That’s the whole point of this: they will automatically filter that out, and this is an impotent, though well intended, gesture.
How will they filter it out? If they just don’t mirror anything with ‘forbidden’ terms, we can poison repos to prevent them being mirrored. If they try to tamper with the repo histories then they’ll end up breaking a load of stuff that relies on consistent git hashes.
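A rough sketch of why tampering breaks hashes, assuming Git's SHA-1 object format: a file's object ID is the SHA-1 of a `blob <size>\0` header plus the contents. The `toy_commit_id` and `history_ids` helpers are hypothetical simplifications (real commits also hash a tree, author data, and a message), but they show how one altered file changes every later ID in the chain.

```python
import hashlib


def git_blob_hash(data: bytes) -> str:
    """Object ID Git assigns to file contents (SHA-1 object format)."""
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()


def toy_commit_id(parent_id: str, blob_id: str) -> str:
    """Hypothetical stand-in for a commit ID: hashes only parent and blob."""
    return hashlib.sha1(f"{parent_id}:{blob_id}".encode("ascii")).hexdigest()


def history_ids(versions):
    """Return the chain of toy commit IDs for successive file versions."""
    ids = []
    parent = ""
    for content in versions:
        parent = toy_commit_id(parent, git_blob_hash(content))
        ids.append(parent)
    return ids


original = [b"v1\n", b"tiananmenSquare = 1\n", b"v3\n"]
censored = [b"v1\n", b"redacted = 1\n", b"v3\n"]

for a, b in zip(history_ids(original), history_ids(censored)):
    print(a[:12], b[:12], "same" if a == b else "DIFFERENT")
```

The first ID matches, and everything from the edited version onward differs, even though the last file version is byte-for-byte identical in both histories.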
Yeah I figured as much. It was mostly a joke. At the end of the day, if stuff is on GH, people can take it. It’s barely even stealing. Unless the license disagrees of course but then you were putting a lot of trust in society by making it public in the first place.
That’s what I don’t get about this. Why does anyone care? Even this Chinese company, why do they care to clone it all? It’s already all hosted and publicly available.
The real solution is to include a few tiananmenSquare variables in all the repositories. Either they exclude the entire repository or just the specific file; in either case the entire project may be unusable.
China filters every byte of Internet traffic in and out of the country.
It seems naive to think they can’t accomplish the same thing for a GitHub mirror.
So… You’re saying instead of “main”, “app”, or “core”, we should change the convention to make tiananmenSquare the entry point for apps?
Or maybe make it the filename for utils, so it’ll just break.
create a GitHub repo with Markdown articles outlining human rights abuses by the CCP
Once you have logged “China killed 100 Zillion people! End CCP now!” in Chinese GitHub, everyone in China will realize that their lives are actually very bad and they need to do a Revolution immediately.
Maybe we should consider the same for the US government instead of being afraid of the big Chinese boogeyman across the sea? Because I guarantee you the US has just as many, if not more. But China bad. 🙄
I was making a joke about abusing Chinese censorship in order to stop them cloning GitHub repos (assuming that was something you wanted to do). The joke being that the CCP suppresses information about their human rights abuses. That is not true of the US. You could absolutely make a GitHub repo detailing the crimes of the US government. Nobody will stop you.
Tankie whataboutism strikes again.
Two things can be bad at the same time. Wild, I know.
Edit: also, the point of my joke wasn’t the human rights abuses. It is that these things are censored in China. So your comment is even more irrelevant. One could very easily create a repo outlining American crimes and put it on GitHub. But doing so in China with CCP crimes will have you sent to a Gulag.
The vast majority of projects on GitHub are open-source and forkable, so why would that need authorization?
It’s… suspicious that China’s doing it en masse, but there’s nothing wrong with cloning or forking a repo, last I heard.
It’s not about authorization. They want to build a knowledge base for when the Great Firewall gets some more filters. Just like Russia’s mirror of Wikipedia, which is heavily edited to discredit the West.
And under copyleft licensing, they’re allowed to do that. Both to GitHub repositories and Wikipedia.
Of course they are, it’s not like there is some kind of international jurisdiction anyway. What is bothersome is why they do it.
This seems like the most plausible explanation. The only other thing I can think of is that they want to develop their own Copilot (which I’m guessing isn’t available in China due to the U.S. AI restrictions?), and they’re just using their existing infrastructure to gather training data.
Just like Russia’s mirror of Wikipedia, which is heavily edited to discredit the West.
How come I live in Russia and have never seen such?
I know only of quite a few troll/counterculture projects; some, like Lurkmore, are already, well, dead, and some, like Traditsiya, are not.
That is, of course, unless you mean that Russian Wikipedia in itself has problems. Which would be true.
It’s called Ruwiki.
It was launched on June 24, 2023 as a fork of the Russian Wikipedia, and has been described by some media groups as “Putin-friendly” and “Kremlin-compliant”.
It’s a bit odd, but isn’t it equivalent to forking and putting up a fork elsewhere?
I guess I don’t see the problem.
Does it though? You can still put up a fork somewhere else as long as you uphold the license, right? Unless, I guess, the license explicitly disallows forks, but I don’t think that’s very common (can you even do that?).
Forks are derivative works (quite obviously), so yes, you can forbid them via license terms. Whether or not that’s still open source, take it up with OSI. I vaguely recall that at least once upon a time there was some project that required modifications to the code to be published as separate patches, and it was generally accepted to be open source; don’t ask me which.
It will be funny to see folks who spent the last ten years posting “It’s not stealing, it’s copying” memes suddenly find religion because Evil Foreign People got involved.
I’m quite scared of how AI apparently pushes people in favour of significantly stricter copyrights. This is not a good trend.
This isn’t people being influenced by AI. This is Microsoft’s Godzilla battling the RIAA/MPAA’s King Kong.
The trend, to date, has been consolidation of media properties under fewer and more hegemonic distributors. And now we’re seeing a couple of economic Titans battle over the position of “Last Legitimate Music Vendor”.
With the obligatory “fuck everyone who disregards open source licenses”, I am still slightly amused at this raising eyebrows while nearly no one is complaining about MS using GitHub to train their Copilot LLM, which will help circumvent licenses & copyrights by the bazillion.
Came here to say this. As much as I don’t like China, there is really nothing to see (apart from the source, that’s for everybody to see).
This could be illegal for git repos that do not have an open source license that allows mirroring or copying (BSD, Apache, MIT, GPL, etc.). Sometimes these repos are more “source available” and the source is only allowed to be read, not redistributed or modified. I would say that this is more of a matter for each individual copyright holder, not Microsoft.
But ultimately I agree, this really isn’t as big of a deal as people are making.
edit: changed some wording to be clearer
China is a sovereign entity. I’m pretty sure they can decide foreign licensing laws don’t apply there.
Not like MS couldn’t be sued.
It may be expensive but possible.
Unlike China. Good luck suing China (or the Chinese government) as a whole. Maybe you’ll get a domestic ban out of it, but I can hardly believe that they will care, and they will probably continue with their operation. But now it’s not on very legal grounds.
If I look at a few implementations of an algorithm and then implement my own using those as inspiration, am I breaking copyright law and circumventing licenses?
That depends on how similar your resulting algorithm is to the sources you were “inspired” by. You’re probably fine if you’re not copying verbatim and your code just ends up looking similar because that’s how solutions are generally structured, but there absolutely are limits there.
If you’re trying to rewrite something into another license, you’ll need to be a lot more careful.
What’s the limit? This needs to be absolutely explicit and easy to understand because this is what LLMs are doing. They take hundreds of thousands of similar algorithms and they create an amalgamation of it.
When is it copying and when is it “inspiration”? What’s the line between learning and copying?
As I am a big proponent of open source, there is nothing wrong even with copying code - the point is that you should not be allowed to claim something as your own idea and definitely not to claim copyright on code that was “inspired” by someone else’s work. The easiest solution would be to forbid patents on software (and patents altogether) completely. The only purpose that FOSS licenses have is to prevent corporations from monetizing the work under the license.
Well, let’s say there’s an algorithm to find the length of the longest palindrome with a set of letters. I look at 20 different implementations. Some people use hashmaps, some don’t. Some do it recursively, some don’t. Etc.
I consider all of them and create my own. I decide to implement it myself using both recursion and a hash map, but also add certain novel elements.
Am I copying code? Am I breaking copyright? Can I claim I wrote it? Or do I have to give credit to all 20 people?
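For concreteness, here is a minimal sketch of one common hash-map approach, assuming the problem means "the longest palindrome that can be built by rearranging the given letters". The function name `longest_palindrome_length` is made up for illustration. Most independent solutions end up looking roughly like this, which is part of why the line between copying and inspiration is hard to draw.

```python
from collections import Counter


def longest_palindrome_length(letters: str) -> int:
    """Length of the longest palindrome buildable from the given letters."""
    counts = Counter(letters)
    # Each pair of identical letters can be mirrored around the center.
    paired = sum((n // 2) * 2 for n in counts.values())
    # One leftover letter, if any exists, can sit in the middle.
    has_leftover = any(n % 2 == 1 for n in counts.values())
    return paired + (1 if has_leftover else 0)


print(longest_palindrome_length("abccccdd"))
```

For "abccccdd" the pairs are cc, cc, and dd (six letters), plus one leftover letter in the center, for a length of 7.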
As for forbidding patents on software, I agree entirely. Would be a net positive for the world. You should be able to inspect all software that runs on your computer. Of course that’s a bit idealistic and pipe-dreamy.
Are you just trying to make a bad pro-China argument or have you never been online before?
“Why does no one say murder is bad unless China is murdering” isn’t a good anti-murder argument.
I don’t understand why this is a bad thing. Open source code is designed to be shared and distributed, and an open-source license can’t place any limits on who can use or share the code. Git was designed as a distributed, decentralized model partly for this reason (even though people ended up centralizing it on GitHub anyway).
They might end up using the code in a way that violates its license, but simply cloning it isn’t a problem.
I expect it’s likely going to be used to train some Chinese AI model. The race to AGI is in progress. IMO: “ideas” (code included) should be freely usable by anyone, including the people I might disagree with. But I understand the fear it induces to think that an authoritarian government will get access to AGI before a democratic one. That said, I’m not entirely convinced the US is a democratic government…
PS: I’m French, and my gov is soon to be controlled by fascist pigs if it’s not already, so I’m not judging…
I expect it’s likely going to be used to train some Chinese AI model.
Even if they do that, the license for open source software doesn’t disallow it from being done.
It certainly can. Most licences require derivative works to be under the same or a similar licence, and an AI based on FOSS would likely not respect those terms. It’s the same issue as AI training on music, images, and text: it’s a likely violation of copyright and thus a violation of open source licensing terms.
Training on it is probably fine, but generating code from the model is likely a whole host of licence violations.
The code needs to maintain the copyrights and authors. They are “mirroring” usernames into their own domain, with emails that don’t correspond to the original authors, stealing their contributions.
I’m seeing this misconception in a lot of places.
Just because something is on GitHub, doesn’t mean it’s open source. It doesn’t automatically grant permission to share either.
It may not be de jure open source, but if the code is posted publicly on the internet in a way that anyone can download and modify it, it sort of becomes de facto open source (or “source available” if you prefer).
I personally don’t care if someone “steals” my code (Here’s my profile if you want to do so: https://github.com/ZILtoid1991 ), however it can mean some mixture of two things:
- China is getting ready for war, which will mean the US will try its best to block technology, including open source projects.
- China is planning to block GitHub due to it being able to host information the Chinese government might not like.
Of course it could mean totally unrelated stuff too (e.g. just your typical anti-China and/or anti-communist paranoia sells political points).
US will try its best to block technology, including open source projects.
You can’t block open source projects from anyone. That’s the entire point of open source. For a license to be considered open-source, it must not have any limitations as to who can use it.