83 points

As one person on Mastodon said, “AI is a toxic industry created by toxic people with toxic ideals”.

8 points

I wouldn’t go that far. As it turns out, AI is a buzzword, and buzzwords have little meaning.

4 points

Yeah, I thought about that too. But apparently some people find “AI” useful.

1 point

I find LLMs very useful

2 points

If an LLM can save me 30 minutes writing nice emails and responses and help me brainstorm, debug, or elucidate my thoughts then it is very useful.

51 points

From the article:

Also, in 2022, several unidentified developers sued OpenAI and GitHub based on claims that the organizations used publicly posted programming code to train generative models in violation of software licensing terms

They can argue about it not being a copy all they want. If a single GPL-licensed line of code was scraped, then anything they produce is a derivative work & must be licensed under the GPL.

nice.

2 points

I’ll play the uninformed devil’s advocate here:

  1. Is the GPL license enforceable?
  2. And if so, I assume “derivative” will still be subjective to some degree. Where do we draw the line between derivative and non-derivative?

I’m torn about my personal opinion about copyrights and software licensing in general. I think the main problem is the huge power imbalance between people and corporations, not so much the fact a company analyzed a bunch of available data to solve programming problems.

They don’t copy the data and sell it verbatim to others which would be a legal issue and in my mind also a moral issue, as they don’t add any additional value.

2 points

1: yes

2: Normally derivative works are patched or modified versions of the original. I think the common English meaning would apply & ChatGPT et al. are fucked. I doubt there is a precedent for this yet.

1 point

The only way I can see them weaseling out of this is by keeping the program that runs the model in-house and proprietary while releasing the model in a format unusable without the base (proprietary) program. But maybe the GPL forbids such obfuscation efforts (I don’t know, I haven’t studied it in detail).

1 point

GPL v2 doesn’t, which led to tivoization. But Linus himself didn’t agree with that standing.

24 points

I’m fine with that, but let’s put some rules against this.

  • Any AI models should be able to determine the source of their data to a defined level of accuracy.
  • There should be a well-defined way to block data from being used by AI. If one of these ways (e.g. robots.txt) has been breached, the model has to be rebuilt without the data, and reparations made to the content owners.
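
For the robots.txt example in the second rule, here is a minimal sketch of what a crawler-side check could look like, using Python's standard `urllib.robotparser`. The crawler name `ExampleAIBot`, the sample rules, and the `may_fetch` helper are made up for illustration. robots.txt is only a request, so this shows what a well-behaved crawler would do, not an enforcement mechanism.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that blocks one AI crawler and allows everyone else.
EXAMPLE_ROBOTS = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/post/1"
    print(may_fetch(EXAMPLE_ROBOTS, "ExampleAIBot", page))
    print(may_fetch(EXAMPLE_ROBOTS, "SomeBrowser", page))
```

A scraper that logs the result of this check next to every page it stores would also have a record of whether the block was respected.
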
4 points

What you’re asking for is literally impossible.

A neural network is basically nothing more than a set of weights. If one word makes a weight go up by 0.0001 and then another word makes it go down by 0.0001, and you do that billions of times for billions of weights, how do you determine what in the data created those weights? Every single thing that’s in the training data had some kind of effect on everything else.

It’s like combining billions of buckets of water together in a pool and then taking out 1 cup from that and trying to figure out which buckets contributed to that cup. It doesn’t make any sense.
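
A toy sketch of the point about weights, assuming a single weight trained with plain gradient descent on made-up numbers (real networks have billions of weights and far more complicated updates). The `train` function and the example data are hypothetical. Every example nudges the same number, and each nudge depends on what earlier examples already did to it, so the final weight on its own does not say which example contributed what. The per-example breakdown exists here only because the loop writes it to a log while training.

```python
def train(examples, lr=0.01, epochs=1):
    """Fit y ~ w * x with per-example gradient steps on squared error."""
    w = 0.0
    update_log = []  # (example index, change applied to w); kept only if we choose to
    for _ in range(epochs):
        for idx, (x, y) in enumerate(examples):
            grad = 2 * (w * x - y) * x   # derivative of (w*x - y)**2 with respect to w
            delta = -lr * grad           # depends on w, i.e. on all earlier examples
            w += delta
            update_log.append((idx, delta))
    return w, update_log


# Made-up training data, roughly y = 2x.
examples = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2)]
w, log = train(examples)

print(w)    # the model: a single number
print(log)  # the bookkeeping: only available because it was recorded during training
```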

3 points

It’s not impossible lol. All a company would need to do is keep track of where they were getting content. If I use a script to download as much of the internet as possible and end up with a bunch of copyrighted content, I could still get in trouble. Hell, there was even a guy arrested for downloading JSTOR without authorization. Stop letting these guys get away with crimes just because you like the idea of the end product.

0 points

Sounds like homeopathy lol

11 points

Respectfully, I worked for Alexa AI on compositional ML, and we were largely able to do exactly this with customer utterances, so to say it is impossible is simply not true. Many companies have to have some degree of ability to remove troublesome data, and while tracing data inside a model is rather difficult (historically it would be done during the building of datasets or measured at evaluation time) it’s definitely something that most big tech companies will do.

2 points

Sorry, I misinterpreted what you meant. You said “any AI models” so I thought you meant the model itself should somehow know where the data came from. Obviously the companies training the models can catalog their data sources.

But besides that, if you work on AI you should know better than anyone that removing training data is counter to the goal of fixing overfitting. You need more data to make the model more generalized. All you’d be doing is making it more likely to reproduce existing material because it has less to work off of. That’s worse for everyone.

15 points

I went into a smidge more detail over on my Mastodon last night, but my response is summed up as “WTAF? No! Freeware is an explicit license, as anyone from the BBS days will recall.”

1 point

Would you mind sharing a link to it here if it’s not any trouble? (Or your handle if that’s easier for you) I’m always looking for new stuff to check out and new people to follow on Mastodon

152 points

Cool, so we can just make up our own rules now. Well, all Microsoft products are freeware now, for the same reason this guy gave.

10 points

Ok… so from now on… when I see a “repackaged” Microsoft product that for some reason… which I don’t care to know… doesn’t ask for a payment… I can use it without restrictions?! That’s really nice of you, Microsoft… thank you.

79 points

Windows XP code was leaked 2 years ago, so it’s freeware according to this idi… stable genius.

