
This kind of seems like a non-article to me. LLMs are trained on the corpus of written text that exists out in the world, which is overwhelmingly standard English. American dialects effectively exist only in speech, be it a regional or city dialect, the Black or Chicano dialect, etc. So how would LLMs learn them? Seems like not a bias by AI models themselves, rather a reflection of the source material.

It’s not an article about LLMs not using dialects. In fact, they have learned said dialects and will use them if asked.

What they did was ask the LLM to suggest adjectives associated with sentences, and it associated more aggressive or negative adjectives with African American English.

> Seems like not a bias by AI models themselves, rather a reflection of the source material.

All (racial) bias in AI models is actually a reflection of the training data, not of the modelling.

I would assume the small amount of training data written that way doesn’t contain many professional research papers, corporate emails, or calm poetry, but consists mostly of social media posts and comments, which skew heavily toward aggressive and negative language.

> Seems like not a bias by AI models themselves, rather a reflection of the source material.

That’s what is usually meant by AI bias: a bias in the material used to train the model that is reflected in its behavior.

But why is it even mentioned then? It’s FUCKING OBVIOUS. It’s like saying “AIs are biased towards English and neglect Latin” or smth ffs

I feel like not everyone is aware of these biases, and we need to raise awareness and try to prevent, for example, HR people from buying AI-based screening software with a strong bias that their vendors don’t disclose (because why would they advertise that?).

Great comparison: a dialect used by millions of people versus a dead language. It really shows how much you care about the people who speak that dialect…

> It’s FUCKING OBVIOUS

What is obvious to you is not always obvious to others. There are already countless examples of AI being used to sort job applicants, to decide who gets audited by child protective services, and to decide who can get a visa for a country.

But it’s also more insidious than that, because the far-reaching implications of this bias often cannot be predicted. For example, in a real-world case of AI-assisted financial lending, excluding all gender data from training ended up making sexism worse, and the same was true for Apple’s credit card. There are even full-blown articles showing how removing data can actually reinforce bias, which indicates that it’s not just what material is used to train the model that matters, but also what data is left out or explicitly removed.

This is so much more complicated than “this is obvious,” and there are a lot of signs pointing toward the need for regulation of AI and ML models used where it really matters, such as decision-making, until we understand them a lot better.

Yeah, this seems like a non-issue to me as well; the source material for the models is probably the cause of this bias.

I also don’t think there are a lot of sources for this manner of speaking. Let’s also not forget that LLMs are often given instructions to avoid certain topics, which they do in fact follow.

I’m from the Midwest US and I know there are words and sounds I pronounce with a Midwestern accent but I can still type and spell them correctly.

If’n I typ lik dis den o’course people gonna think I hev the big dumb or that I’m a mole from a Redwall book.

Technology

!technology@beehaw.org

A nice place to discuss rumors, happenings, innovations, and challenges in the technology sphere. We also welcome discussions on the intersections of technology and society. If it’s technological news or discussion of technology, it probably belongs here.

Remember the overriding ethos on Beehaw: Be(e) Nice. Each user you encounter here is a person, and should be treated with kindness (even if they’re wrong, or use a Linux distro you don’t like). Personal attacks will not be tolerated.

This community’s icon was made by Aaron Schneider, under the CC-BY-NC-SA 4.0 license.