diz

diz@awful.systems

Joined 1 year ago

1 post • 16 comments


diz@awful.systems

7 points

3 months ago

in sneerclub@awful.systems•OAI employees channel the spirit of Marvin Minsky

Frigging exactly. It’s a dumb-ass dead end that is fundamentally incapable of doing the vast majority of things ascribed to it.

They keep imagining that it would actually learn some underlying logic from a lot of text. All it can do is store a bunch of applications of said logic, as in a giant table. Deducing underlying rules instead of simply memorizing particular instances of rules is a form of compression. There wasn’t much compression going on to begin with, and now that the models are so over-parametrized, there is even less.


diz@awful.systems (OP)

8 points

3 months ago

in sneerclub@awful.systems•[long] Some tests of how much AI "understands" what it says (spoiler: very little)

Perhaps it was nearly ready to emit a stop token after “the robot can take all 4 vegetables in one trip if it is allowed to carry all of them at once.” but “However” won, and then after “However” it had to say something else because that’s how “however” works…

Agreed on the style being absolutely nauseating. It wasn’t a very good style when humans were using it, but now it is just the style of absolute bottom-of-the-barrel, top-of-the-search-results garbage.


diz@awful.systems (OP)

9 points

3 months ago

in sneerclub@awful.systems•[long] Some tests of how much AI "understands" what it says (spoiler: very little)

I think you can make a slight improvement to Wolfram Alpha: using an LLM to translate natural language queries into queries WA can consume, then feeding them into WA. WA always reports exactly what it computed, so if it “misunderstands” you, it’s a lot easier to notice.

The problem here is that AI boys got themselves hyped up for it being actually intelligent, so none of them would ever settle for some modest application of LLMs. Google fired the authors of the “stochastic parrots” paper, AFAIK.

simply pasting LLM output into CAS input and then the CAS output back into LLM input (which, let’s be honest, is the first thing tech bros will try as it doesn’t require much basic research improvement), will not help that much and will likely generate an entirely new breed of hilarious errors and bullshit (I like the term bullshit instead of hallucination, it captures the connotation errors are of a kind with the normal output).

Yeah, I have examples of that as well. I asked GPT4 at work to calculate the volume of a 10 cm long, 0.1 mm diameter wire. It seems to be doing correct arithmetic by some mysterious means which do not use scientific notation, and then, because the LLM cannot actually count, it miscounts zeroes and outputs a result that is 1000x larger than the correct answer.
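For reference, the wire calculation is short enough to check by hand or in a few lines of code. The sketch below is only an illustration of the correct arithmetic, treating the wire as a plain cylinder; the function name `wire_volume_mm3` is made up for this example and is not something from the original exchange.

```python
import math

def wire_volume_mm3(length_mm: float, diameter_mm: float) -> float:
    """Volume of a cylinder, with all lengths given in millimetres."""
    radius_mm = diameter_mm / 2
    return math.pi * radius_mm ** 2 * length_mm

# The wire from the comment: 10 cm = 100 mm long, 0.1 mm in diameter.
volume_mm3 = wire_volume_mm3(length_mm=100.0, diameter_mm=0.1)
volume_cm3 = volume_mm3 / 1000.0  # 1 cm^3 = 1000 mm^3

print(f"{volume_mm3:.4f} mm^3")  # 0.7854 mm^3
print(f"{volume_cm3:.3e} cm^3")  # 7.854e-04 cm^3
```

Keeping everything in one unit and using scientific notation for the final conversion is exactly what makes a dropped or extra zero easy to spot.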


diz@awful.systems (OP)

9 points

3 months ago

in sneerclub@awful.systems•[long] Some tests of how much AI "understands" what it says (spoiler: very little)

GPT4 supposedly (it says that it is GPT4). I have access to one that is cleared for somewhat sensitive data, so presumably my queries aren’t getting flagged and human reviewed by OpenAI.


diz@awful.systems (OP)

12 points

3 months ago

in sneerclub@awful.systems•[long] Some tests of how much AI "understands" what it says (spoiler: very little)

I feel like letter counting and other letter manipulation problems kind of under-sell the underlying failure to count - LLMs work on tokens, not letters, so they are expected to have difficulty with letters.

The inability to count is of course wholly general - in a river crossing puzzle an LLM cannot keep track of what’s on either side of the river, for example, and sometimes misreports how many steps it has output.
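To illustrate the tokens-versus-letters point, here is a minimal sketch of a toy tokenizer. Everything in it is an assumption made for illustration: `TOY_VOCAB` and `toy_tokenize` are hypothetical, and real models use learned subword vocabularies (such as byte-pair encoding) that split text differently. The only point is that the model's input is a list of token IDs, in which individual letters are not directly visible.

```python
# Hypothetical toy vocabulary; real vocabularies are learned from data
# and contain tens of thousands of entries.
TOY_VOCAB = {"straw": 101, "berry": 102, "s": 1, "t": 2, "r": 3, "a": 4,
             "w": 5, "b": 6, "e": 7, "y": 8}

def toy_tokenize(text: str) -> list[int]:
    """Greedy longest-match tokenization against TOY_VOCAB."""
    ids = []
    pos = 0
    while pos < len(text):
        # Try the longest candidate piece first, then shrink it.
        for end in range(len(text), pos, -1):
            piece = text[pos:end]
            if piece in TOY_VOCAB:
                ids.append(TOY_VOCAB[piece])
                pos = end
                break
        else:
            raise ValueError(f"no token for {text[pos]!r}")
    return ids

word = "strawberry"
print(toy_tokenize(word))  # [101, 102]
print(word.count("r"))     # 3
```

Counting the letter is trivial on the string, but nothing in `[101, 102]` says how many times "r" occurs.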


diz@awful.systems (OP)

14 points

3 months ago

in sneerclub@awful.systems•[long] Some tests of how much AI "understands" what it says (spoiler: very little)

But if your response to the obvious misrepresentation that a chatbot is a person of ANY level of intelligence is to point out that it’s dumb you’ve already accepted the premise.

How am I accepting the premise, though? I do call it an Absolute Imbecile, but that’s more of a word play on the “AI” moniker.

What I do accept is an unfortunate fact that they did get their “AIs” to score very highly on various “reasoning” benchmarks (some of their own design), standardized tests, and so on and so forth. It works correctly across most simple variations, such as changing the numbers in a problem or the word order.

They really did a very good job at faking reasoning. I feel that even though LLMs are complete bullshit, the sheer strength of that bullshit is easy to underestimate.


diz@awful.systems (OP)

17 points

3 months ago

in sneerclub@awful.systems•[long] Some tests of how much AI "understands" what it says (spoiler: very little)

Yeah I think that’s why we need an Absolute Imbecile Level Reasoning Benchmark.

Here’s what the typical PR from AI hucksters looks like:

https://www.anthropic.com/news/claude-3-family

Fully half of their claims about performance are for “reasoning”, with names like “Graduate Level Reasoning”. OpenAI is even worse - recall them claiming to have scored in the 90th percentile on the LSAT?

On top of it, LLMs are fine-tuned to convince some dumb-ass CEO who “checks it out”. Even though you can pay for the subscription, you’re neither the customer nor the product, you’re just collateral eyeballs on the ad.


diz@awful.systems (OP)

22 points

3 months ago

in sneerclub@awful.systems•[long] Some tests of how much AI "understands" what it says (spoiler: very little)

Also, my thought on this is that since an LLM has no internal state with which to represent the state of the problem, it can’t ever actually solve any variation of the river crossing. Not even those that it “solves” correctly.

If it outputs the correct sequence, inside your head the model of the problem will be in the solved state, but on the LLM’s side there’s just a sequence of steps that it wrote down, with those steps directly inhibiting production of another “Trip” token, until that crosses a threshold. There isn’t an inventory or even a count of items, there’s an unrelated number that weighs for or against “Trip”.

If we are to anthropomorphize it (which we shouldn’t, but anyway), it’s bullshitting up an answer and it gradually gets a feeling that it has bullshitted enough, which can happen at the right moment, or not.
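For contrast, this is roughly what an explicit inventory looks like: a minimal sketch that replays a list of trips while tracking what is on each bank. The function `run_trips`, the capacity rule, and the vegetable names are all hypothetical, chosen only to illustrate the kind of state the comment says is missing. They are not the exact puzzle from the original post.

```python
def run_trips(items, trips, capacity):
    """Replay a list of trips while tracking what is on each bank.

    items: names starting on the left bank (the carrier starts there too).
    trips: list of sets; each set is the cargo carried on that crossing.
    Returns (left, right) after the last trip; raises on an invalid trip.
    """
    banks = {"left": set(items), "right": set()}
    side = "left"
    for number, cargo in enumerate(trips, start=1):
        other = "right" if side == "left" else "left"
        if len(cargo) > capacity:
            raise ValueError(f"trip {number}: cargo exceeds capacity")
        if not cargo <= banks[side]:
            raise ValueError(f"trip {number}: cargo is not on the {side} bank")
        banks[side] -= cargo   # cargo leaves the current bank
        banks[other] |= cargo  # and arrives on the opposite bank
        side = other
    return banks["left"], banks["right"]

# Hypothetical instance: four vegetables, a carrier that can hold all four.
vegetables = {"carrot", "potato", "onion", "leek"}
left, right = run_trips(vegetables, [set(vegetables)], capacity=4)
print(len(left), len(right))  # 0 4
```

Here the solved state is something the program can actually inspect (`left` is empty), and an impossible step, such as carrying an item that is on the other bank, fails immediately instead of being written down as if it had happened.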


diz@awful.systems

23 points

3 months ago

in sneerclub@awful.systems•Tech Bros Invented Trains And It Broke Me

Another thing to add is that there are just one or two people on the train providing service for hundreds of other people or millions of dollars’ worth of goods. Automating those people away is simply not economical, not even in terms of the headcount replaced vs. the headcount that has to be hired to maintain the automation software and hardware.

Unless you’re a techbro, who deeply resents labor, someone who would rather hire 10 software engineers than 1 train driver.


diz@awful.systems (OP)

23 points

3 months ago

in sneerclub@awful.systems•[long] Some tests of how much AI "understands" what it says (spoiler: very little)

Both parties are buying into a premise we already know to be incorrect.

We may know it is incorrect, but LLM salesmen are claiming things like “90th percentile on LSAT”, high scores on a “college level reasoning benchmark” and so on and so forth.

They are claiming “yeah yeah, there are all the anecdotal reports of glue pizza, but objectively, our AI is more capable than your workers, so you can replace them with our AI”, and this is starting to actually impact the job market.


[long] Some tests of how much AI "understands" what it says (spoiler: very little)

posted 3 months ago

diz@awful.systems

sneerclub@awful.systems

51 comments