LLMs don’t do formal reasoning - and that is a HUGE problem (garymarcus.substack.com)

posted 27 days ago

☆ Yσɠƚԋσʂ ☆@lemmy.ml

technology@lemmy.ml

14 comments

JackGreenEarth@lemm.ee

32 points

27 days ago

It’s only a problem if you expect them to do formal reasoning. They are fancy word predictors, useful for when your output doesn’t need to be factually accurate. If you use them for things they’re not designed for, you’ll get bad results, but that would be your fault for using them in an incorrect manner, not the LLMs’ fault. You don’t use a screwdriver to bang in a nail and say the screwdriver ‘has a HUGE problem’ when it does a bad job.

Hazzard@lemm.ee

15 points

27 days ago

I think it is a problem. Maybe not for people like us, who understand the concept and its limitations, but “formal reasoning” is exactly how this technology is being pitched to the masses. “Take a picture of your homework and OpenAI will solve it”, “have it reply to your emails”, “have it write code for you”. All reasoning-heavy tasks.

On top of that, Google and Bing have it answering user questions directly, and it’s commonly pitched as a “tutor” or an “assistant”. The OpenAI API is being shoved into everything under the sun for all kinds of tasks, and nobody is attempting to clarify its weaknesses in their marketing.

As it becomes more and more common, more and more users who don’t understand it’s fundamentally incapable of reliably doing these things will crop up.

geekwithsoul@lemm.ee

11 points

27 days ago

The problem is that laymen expect it to do reasoning, so the sales & marketing team says it can do reasoning, and then the CEO drinks the Kool-Aid and restructures the company because he believes it can do reasoning.

☆ Yσɠƚԋσʂ ☆@lemmy.ml (OP)

4 points

27 days ago

Right, I find LLMs are fundamentally no different from Markov chains. That doesn’t mean they’re not useful; they’re a tool that’s good for certain use cases. Unfortunately, we’re in a hype phase right now where people are trying to apply them to a lot of cases they’re terrible at, and where better tools already exist to boot.

vrighter@discuss.tchncs.de

2 points

26 days ago

They aren’t. The only difference is that the state transition table is so unimaginably gargantuan that we can only generate an approximation of a tiny slice of it, instead of it being literally a table.
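
To make the “literal table” point concrete, here is a minimal sketch of a toy word-level Markov chain whose state is the last `order` tokens. Everything in it is illustrative: the function names, the tiny corpus, and the 50,000-token vocabulary are assumptions made for the example, not anything taken from a real model.

```python
from collections import Counter, defaultdict


def build_transition_table(tokens, order):
    """Count next-token frequencies for every context of `order` tokens."""
    table = defaultdict(Counter)
    for i in range(len(tokens) - order):
        context = tuple(tokens[i:i + order])
        table[context][tokens[i + order]] += 1
    return table


def next_token_distribution(table, context):
    """Turn the counts stored for one context into probabilities."""
    counts = table.get(tuple(context), Counter())
    total = sum(counts.values())
    return {tok: n / total for tok, n in counts.items()}


def full_table_rows(vocab_size, order):
    """Number of distinct contexts a literal, exhaustive table would need."""
    return vocab_size ** order


# Toy corpus (assumed for illustration only).
tokens = "the cat sat on the mat and the cat ran".split()
table = build_transition_table(tokens, order=2)
dist = next_token_distribution(table, ["the", "cat"])

# Assumed vocabulary of 50,000 tokens and a context of just 3 tokens.
rows_needed = full_table_rows(50_000, 3)
```

The toy table only holds the handful of contexts that actually occur in the corpus, while an exhaustive table grows as vocabulary size raised to the context length. With the assumed numbers above that is already 1.25 × 10^14 rows for a three-token context, which is why nobody stores it as a table.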

☆ Yσɠƚԋσʂ ☆@lemmy.ml (OP)

1 point

26 days ago

Exactly.

slacktoid@lemmy.ml

7 points

26 days ago

This knife is a bad hammer… I wonder why?

Letstakealook@lemm.ee

6 points

27 days ago

And yet people will continue to argue that LLMs are demonstrating understanding and problem solving. This shit is just ELIZA on steroids. I’m not saying it didn’t require skill or knowledge to create, but it is in no way close to what it is being billed as.

m_f@midwest.social

-5 points

27 days ago

Gary Marcus is an AI crank and should be disregarded.

☆ Yσɠƚԋσʂ ☆@lemmy.ml (OP)

13 points

27 days ago

Should the research he’s discussing also be disregarded? https://arxiv.org/pdf/2410.05229

m_f@midwest.social

3 points

27 days ago

Gary Marcus should be disregarded because he’s emotionally invested in The Bitter Lesson being wrong. He really wants LLMs to not be as good as they already are. He’ll find some interesting research about “here’s a limitation that we found” and turn that into “LLMS BTFO IT’S SO OVER”.

The research is interesting for helping improve LLMs, but that’s the extent of it. I would not be worried about the limitations the paper found for a number of reasons:

- There doesn’t seem to be any reason to believe that there’s a ceiling on scaling up
- LLMs’ reasoning abilities improve with scale. Notice that for the kiwi example they included the answers from o1-mini and llama3-8B, which are much smaller models with much more limited capabilities. GPT-4o got the problem correct when I tested it, without any special prompting techniques or anything
- Techniques such as RAG and Chain of Thought help immensely on many problems
- Basic prompting techniques help, like “Make sure you evaluate the question to ignore extraneous information, and make sure it’s not a trick question”
- LLMs are smart enough to use tools. They can go “Hey, this looks like a math problem, I’ll use a calculator”, just like a human would
- There’s a lot of research happening very quickly here. For example, LLMs improve at math when you use a different tokenization method, because it changes how the model “sees” the problem
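
As a rough illustration of the calculator idea above, here is a minimal sketch of handing arithmetic to a deterministic tool instead of letting the model predict the digits. The `CALC:` prefix and the `answer_with_tools` helper are hypothetical conventions invented for this example (real tool-calling APIs use structured messages), and the numbers are just illustrative, in the spirit of the kiwi problem.

```python
import ast
import operator

# Only plain arithmetic is allowed; anything else is rejected.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def calculator(expression):
    """Evaluate +, -, *, / over numeric literals without calling eval()."""
    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -_eval(node.operand)
        raise ValueError("unsupported expression")

    return _eval(ast.parse(expression.strip(), mode="eval"))


def answer_with_tools(model_output):
    """Hypothetical convention: the model emits 'CALC: <expr>' to request the tool."""
    prefix = "CALC:"
    if model_output.startswith(prefix):
        return str(calculator(model_output[len(prefix):]))
    return model_output


# Stubbed model output: the model decides *what* to compute, the tool computes it.
result = answer_with_tools("CALC: 44 + 58 + 2 * 44")
```

The tool guarantees the arithmetic is right, but the model still has to set up the expression, which is exactly where an irrelevant detail in the question can throw it off.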

Until we hit a wall and really can’t find a way around it for several years, this sort of research falls into the “huh, interesting” territory for anybody that isn’t a researcher.

☆ Yσɠƚԋσʂ ☆@lemmy.ml (OP)

9 points

27 days ago

Actually, we do know that there are diminishing returns from scaling already. Furthermore, I would argue that there are inherent limits in simply using correlations in text as the basis for the model. Human reasoning isn’t primarily based on language; we create an internal model of the world that acts as a shared context. The language is rooted in that model, and that’s what allows us to communicate effectively and understand the actual meaning behind words. Skipping that step leads to the problems we’re seeing with LLMs.

That said, I agree they are a tool, and they obviously have uses. I just think that they’re going to be a part of a bigger tool set going forward. Right now there’s an incredible amount of hype associated with LLMs. Once the hype settles we’ll know what use cases are most appropriate for them.
