feddit.org

report

[ - ]

in sneerclub@awful.systems•That Time Eliezer Yudkowsky recommended a really creepy sci-fi book to his audience

21 points

20 days ago

In case anybody skips the article, it’s a six year old cybernetically force grown to the body of a horny 13 to 14 year old.

The rare sentence that makes me want to take a shower for having written it.

report

[ - ]

in techtakes@awful.systems•"The Subprime AI Crisis" - Ed Zitron on the bubble's impending collapse

21 points

3 months ago

On each step, one part of the model applies reinforcement learning, with the other one (the model outputting stuff) “rewarded” or “punished” based on the perceived correctness of their progress (the steps in its “reasoning”), and altering its strategies when punished. This is different to how other Large Language Models work in the sense that the model is generating outputs then looking back at them, then ignoring or approving “good” steps to get to an answer, rather than just generating one and saying “here ya go.”

Every time I’ve read how chain-of-thought works in o1 it’s been completely different, and I’m still not sure I understand what’s supposed to be going on. Apparently you get a strike notice if you try too hard to find out how the chain-of-thinking process goes, so one might be tempted to assume it’s something that’s readily replicable by the competition (and they need to prevent that as long as they can) instead of any sort of notably important breakthrough.

From the detailed o1 system card pdf linked in the article:

According to these evaluations, o1-preview hallucinates less frequently than GPT-4o, and o1-mini hallucinates less frequently than GPT-4o-mini. However, we have received anecdotal feedback that o1-preview and o1-mini tend to hallucinate more than GPT-4o and GPT-4o-mini. More work is needed to understand hallucinations holistically, particularly in domains not covered by our evaluations (e.g., chemistry). Additionally, red teamers have noted that o1-preview is more convincing in certain domains than GPT-4o given that it generates more detailed answers. This potentially increases the risk of people trusting and relying more on hallucinated generation.

Ballsy to just admit your hallucination benchmarks might be worthless.

The newsletter also mentions that the price for output tokens has quadrupled compared to the previous newest model, but the awesome part is, remember all that behind-the-scenes self-prompting that’s going on while it arrives to an answer? Even though you’re not allowed to see them, according to Ed Zitron you sure as hell are paying for them (i.e. they spend output tokens) which is hilarious if true.

report

[ - ]

in techtakes@awful.systems•Google Search is getting worse and worse

22 points

1 month ago

Maybe Momoa’s PR agency forgot to send an appropriate tribute to Alphabet this month.

report

[ - ]

22 points

11 days ago

in sneerclub@awful.systems•TPOT hits the big time!

thinkers like computer scientist Eliezer Yudkowsky

That’s gotta sting a bit.

report

[ - ]

in techtakes@awful.systems•Stubsack: weekly thread for sneers not worth an entire post, week ending 23rd December 2024

25 points

4 days ago

Rationalist debatelord org Rootclaim, who in early 2024 lost a $100K bet by failing to defend covid lab leak theory against a random ACX commenter, will now debate millionaire covid vaccine truther Steve Kirsch on whether covid vaccines killed more people than they saved, the loser gives up $1M.

One would assume this to be a slam dunk, but then again one would assume the people who founded an entire organization about establishing ground truths via rationalist debate would actually be good at rationally debating.

report

[ - ]

in sneerclub@awful.systems•Bostrom's advice for the ethical treatment of LLMs: remind them to be happy

27 points

4 months ago

Archive the weights of the models we build today, so we can rebuild them in the future if we need to recompense them for moral harms.

To be clear, this means that if you treat someone like shit all their life, saying you’re sorry to their Sufficiently Similar Simulation™ like a hundred years after they are dead makes it ok.

This must be one of the most blatantly supernatural rationalist Accepted Truths, that if your simulation is of sufficiently high fidelity you will share some ontology of self with it, which by the way is how the basilisk can torture you even if you’ve been dead for centuries.

report

[ - ]

in techtakes@awful.systems•"Sam Altman is one of the dullest, most incurious and least creative people to walk this earth."

29 points

5 days ago

It’s useful insofar as you can accommodate its fundamental flaw of randomly making stuff the fuck up, say by having a qualified expert constantly combing its output instead of doing original work, and don’t mind putting your name on low quality derivative slop in the first place.

report

[ - ]

in sneerclub@awful.systems•what if, right, what *if* our super-duper-autocomplete was just *tricking* us so it could TAKE OVER ZEE VORLD AHAHAHAHAHAHA! that'd be wild, hey

31 points

6 months ago

I’m not spending the additional 34min apparently required to find out what in the world they think neural network training actually is that it could ever possibly involve strategy on the part of the network, but I’m willing to bet it’s extremely dumb.

I’m almost certain I’ve seen EY catch shit on twitter (from actual ml researchers no less) for insinuating something very similar.

report

[ - ]

in techtakes@awful.systems•Apple Intelligence AI mangles headlines so badly the BBC officially complains

40 points

7 days ago

In every RAG guide I’ve seen, the suggested system prompts always tended to include some more dignified variation of “Please for the love of god only and exclusively use the contents of the retrieved text to answer the user’s question, I am literally on my knees begging you.”

Also, if reddit is any indication, a lot of people actually think that’s all it takes and that the hallucination stuff is just people using LLMs wrong. I mean, it would be insane to pour so much money into something so obviously fundamentally flawed, right?

report