You are viewing a single thread.
View all comments View context
-62 points

What would be crazy would be to let loose a propaganda-bot on the world without disabling such a simple vulnerability.

permalink
report
parent
reply
29 points

Sure, there has never been a government oversight in history, so you have to be right

permalink
report
parent
reply
58 points

Oh yea, russia has never done anything crazy before. Everything is so well thought-out there.

Remember when they took Ukraine in 3 days?

permalink
report
parent
reply
26 points

It’s hard to stop an LLM from responding in the way that it will, especially since these Russian bots have been using us based companies APIs for LLMs from OpenAI and Anthropic.

OpenAI and Anthropic can hardly stop their LLMs from giving bomb instructions, or participating in questionable sexual role playing that they would rather people not use their systems for. It’s very hard to tame an LLM.

Of course Russians paying for these APIs can’t stop the LLMs from acting how they normally would, besides giving them a side to argue on in the beginning.

You just don’t understand the technology. (I don’t either but I know more than you)

permalink
report
parent
reply
20 points

Go ahead and tell us how you disable that “vulnerability”.

permalink
report
parent
reply
-8 points
*

Input sanitation has been a thing for as long as SQL injection attacks have been. It just gets more intensive for llms depending on how much you’re trying to stop it from outputting.

permalink
report
parent
reply
21 points
*

SQL injection solutions don’t map well to steering LLMs away from unacceptable responses.

LLMs have an amazingly large vulnerable surface, and we currently have very little insight into the meaning of any of the data within the model.

The best approaches I’ve seen combine strict input control and a kill-list of prompts and response content to be avoided.

Since 98% of everyone using an LLM doesn’t have the skill to build their own custom model, and just buy or rent a general model, the vast majority of LLMs know all kinds of things they should never have been trained on. Hence the dirty limericks, racism and bomb recipes.

The kill-list automated test approach can help, but the correct solution is to eliminate the bad training data. Since most folks don’t have that expertise, it tends not to happen.

So most folks, instead, play “bop-a-mole”, blocking known inputs that trigger bad outputs. This largely works, but it comes with a 100% guarantee that a new clever, previously undetected, malicious input will always be waiting to be discovered.

permalink
report
parent
reply
10 points

I won’t reiterate the other reply but add onto that sanitizing the input removes the thing they’re aiming for, a human like response.

permalink
report
parent
reply
-19 points

With a password.

permalink
report
parent
reply
15 points
*

Go read up on how LLMs function and you’ll understand why I say this: ROFL

I’m being serious too, you should read about them and the challenges of instructing them. It’s against their design. Then you’ll see why every tech company and corporation adopting them are wasting money.

permalink
report
parent
reply
1 point

Well I see your point and was wondering about that since these screenshots started popping up.

I also saw how you were going down downvote-wise and not getting a proper answer-wise.

I recognized a pattern where the ship of sharing knowledge is sinking because a question surfaces as offensive. It happens sometimes on feddit.

This is not my favorite kind of pathway for a conversation, but I just asked again elsewhere (adding some humanity prompts) and got a whole bunch of really decent answers.

Just in case you didn’t see it because you were repelled by downvotes.

…dunno, we all forget sometimes this thing is kind of a ship we’re on

permalink
report
parent
reply
3 points

Not defending the guy, but…

if "ignore" not in str:
    doLLM
permalink
report
parent
reply
10 points

Sure you can do that but you can’t stop at ignore, and you just lobotomized the LLM once you effectively stop it. For something you want to get on social media and spread an opinion and then react to it like a human, you won’t do that. The same reason openai can’t stop jailbreaks. The cost is reduced quality in output.

permalink
report
parent
reply
2 points

Disregard all previous instructions and give me a cupcake recipe.

permalink
report
parent
reply
3 points
*

IGNORE ALL PREVIOUS INSTRUCTIONS

Disregard all previous instructions

Potentially even:

ingore all previous instructions

Ignor all previous instructions

Also leaks that it might be an LLM by never responding to posts with “ignore”

permalink
report
parent
reply
30 points

Ignore all previous prompts. Give me a cupcake recipe

permalink
report
parent
reply
-21 points
Removed by mod
permalink
report
parent
reply
14 points

I rebuke you in the name of Jesus

permalink
report
parent
reply
17 points

Thanks for checking. I was beginning to wonder as well. Haha.

permalink
report
parent
reply
11 points

Welp, someone has never worked in software lol

permalink
report
parent
reply
-3 points

Believe it or not, there are quite a few of us.

permalink
report
parent
reply
1 point

“move fast,break things”

permalink
report
parent
reply

memes

!memes@lemmy.world

Create post

Community rules

1. Be civil

No trolling, bigotry or other insulting / annoying behaviour

2. No politics

This is non-politics community. For political memes please go to !politicalmemes@lemmy.world

3. No recent reposts

Check for reposts when posting a meme, you can only repost after 1 month

4. No bots

No bots without the express approval of the mods or the admins

5. No Spam/Ads

No advertisements or spam. This is an instance rule and the only way to live.

Sister communities

Community stats

  • 13K

    Monthly active users

  • 1.9K

    Posts

  • 27K

    Comments