The new global study, in partnership with The Upwork Research Institute, interviewed 2,500 global C-suite executives, full-time employees and freelancers. Results show that the optimistic expectations about AI’s impact are not aligning with the reality faced by many employees. The study identifies a disconnect between the high expectations of managers and the actual experiences of employees using AI.
Despite 96% of C-suite executives expecting AI to boost productivity, the study reveals that 77% of employees using AI say it has added to their workload and created challenges in achieving the expected productivity gains. Not only is AI increasing the workloads of full-time employees, it’s hampering productivity and contributing to employee burnout.
AI is stupidly used a lot but this seems odd. For me GitHub Copilot has sped up writing code. Hard to say how much but it definitely saves me seconds several times per day. It certainly hasn’t made my workload more…
Probably because the vast majority of the workforce does not work in tech but has had these clunky, failure-prone tools foisted on them by tech. Companies are inserting AI into everything, so what used to be a problem that could be solved in 5 steps now takes 6 steps, with the new step being “figure out how to bypass the AI to get to the actual human who can fix my problem”.
I’ve thought for a long time that there are a ton of legitimate business problems out there that could be solved with software. Not with AI. AI isn’t necessary, or even helpful, in most of these situations. The problem is that creating meaningful solutions requires the people who write the checks to actually understand some of these problems. I can count on one hand the number of business executives that I’ve met who were actually capable of that.
They’ve got a guy at work whose job title is basically AI Evangelist. This is terrifying in that it’s a financial tech firm handling twelve figures a year of business, the last place where people will put up with “plausible bullshit” in their products.
I grudgingly installed the Copilot plugin, but I’m not sure what it can do for me better than a snippet library.
I asked it to generate a test suite for a function, as a rudimentary exercise, so it was able to identify “yes, there are n return values, so write n test cases” and “You’re going to actually have to CALL the function under test”, but was unable to figure out how to build the object being fed in to trigger any of those cases; to do so would require grokking much of the code base. I didn’t need to burn half a barrel of oil for that.
I’d be hesitant to trust it with “summarize this obtuse spec document” when half the time said documents are self-contradictory or downright wrong. Again, plausible bullshit isn’t suitable.
Maybe the problem is that I’m too close to the specific problem. AI tooling might be better for open-ended or free-association “why not try glue on pizza” type discussions, but when you already know “send exactly 4-7-Q-unicorn emoji in this field or the transaction is converted from USD to KPW” having to coax the machine to come to that conclusion 100% of the time is harder than just doing it yourself.
I can see the marketing and sales people love it, maybe customer service too, click one button and take one coherent “here’s why it’s broken” sentence and turn it into 500 words of flowery says-nothing prose, but I demand better from my machine overlords.
Tell me when Stable Diffusion figures out that “Carrying battleaxe” doesn’t mean “katana randomly jutting out from forearms”, maybe at that point AI will be good enough for code.
Maybe the problem is that I’m too close to the specific problem. AI tooling might be better for open-ended or free-association “why not try glue on pizza” type discussions, but when you already know “send exactly 4-7-Q-unicorn emoji in this field or the transaction is converted from USD to KPW” having to coax the machine to come to that conclusion 100% of the time is harder than just doing it yourself.
I, too, work in fintech. I agree with this analysis. That said, we currently have a large mishmash of regexes doing classification and they aren’t bulletproof. It would be useful to see about using something like a fine-tuned BERT model for doing classification for transactions that passed through the regex net without getting classified. And the PoC would be just context stuffing some examples for a few-shot prompt of an LLM and a constrained grammar (just the classification, plz). Because our finance generalists basically have to do this same process, and it would be nice to augment their productivity with a hint: “The computer thinks it might be this kinda transaction”
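To make that PoC idea concrete, here is a rough sketch of the flow: regexes first, then a few-shot prompt for whatever falls through, with the answer only accepted if it is exactly one of the known labels. Everything specific in it is made up for illustration: the labels, the patterns, the example transactions, and the `model` callable, which stands in for whatever LLM client you would actually use. The label check is also just a post-hoc filter, a stand-in for a real constrained grammar, which depends on the inference stack.

```python
import re
from typing import Callable, Optional, Tuple

# Hypothetical labels, patterns, and examples, invented for this sketch.
LABELS = ("payroll", "refund", "subscription")
REGEX_RULES = [
    (re.compile(r"\bpayroll\b|\bsalary\b", re.IGNORECASE), "payroll"),
    (re.compile(r"\brefund\b|\bchargeback\b", re.IGNORECASE), "refund"),
]
FEW_SHOT_EXAMPLES = [
    ("ACME CORP SALARY MARCH", "payroll"),
    ("REFUND ORDER 1182", "refund"),
    ("NETFLIX.COM MONTHLY", "subscription"),
]


def classify_with_regex(description: str) -> Optional[str]:
    """Return the label of the first matching regex rule, or None."""
    for pattern, label in REGEX_RULES:
        if pattern.search(description):
            return label
    return None


def build_few_shot_prompt(description: str) -> str:
    """Stuff the labelled examples into a prompt ending at the open label slot."""
    parts = ["Classify the transaction as one of: " + ", ".join(LABELS) + "."]
    for text, label in FEW_SHOT_EXAMPLES:
        parts.append(f"Transaction: {text}\nLabel: {label}")
    parts.append(f"Transaction: {description}\nLabel:")
    return "\n\n".join(parts)


def constrain_to_label(raw_output: str) -> Optional[str]:
    """Accept the model output only if it is exactly one known label."""
    candidate = raw_output.strip().lower()
    return candidate if candidate in LABELS else None


def classify(
    description: str, model: Callable[[str], str]
) -> Tuple[Optional[str], str]:
    """Return (label, source); the model is consulted only after regexes miss."""
    label = classify_with_regex(description)
    if label is not None:
        return label, "regex"
    hint = constrain_to_label(model(build_few_shot_prompt(description)))
    if hint is not None:
        return hint, "model-hint"
    return None, "unclassified"
```

The `source` half of the result is the point: anything tagged `model-hint` would be shown to the finance generalist as a suggestion, not written to the ledger as fact, and anything `unclassified` goes to a human exactly as it does today.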
I’d be hesitant to trust it with “summarize this obtuse spec document” when half the time said documents are self-contradictory or downright wrong. Again, plausible bullshit isn’t suitable.
That’s why I have my doubts when people say it’s saving them a lot of time or effort. I suspect it’s planting bombs that they simply haven’t yet found. Like it generated code and the code seemed to work when they ran it, but it contains a subtle bug that will only be discovered later. And the process of tracking down that bug will completely wreck any gains they got from using the LLM in the first place.
Same with the people who are actually using it on human languages. Like, I heard a story of a government that was overwhelmed with public comments or something, so they were using an LLM to summarize those so they didn’t have to hire additional workers to read the comments and summarize them. Sure… and maybe it’s relatively close to what people are saying 95% of the time. But 5% of the time it’s going to completely miss a critical detail. So, you go from not having time to read all the public comments so not being sure what people are saying, to having an LLM give you false confidence that you know what people are saying even though the LLM screwed up its summary.
Again, plausible bullshit isn’t suitable.
It is suitable when you’re the one producing the bullshit and you only need it accepted.
Which is what people pushing for this are. Their jobs and occupations are tolerant to just imitating, so they think that for some reason it works with airplanes, railroads, computers.
I’ll say that so far I’ve been pretty unimpressed by Codeium.
At the very most it has given me a few minutes total of value in the last 4 months.
I’ve gotten some benefit from various generic chat LLMs like ChatGPT, but most of that has been somewhat improved versions of the kind of info I was getting from Stack Exchange threads and the like.
There’s been some mild value in some cases but so far nothing earth shattering or worth a bunch of money.
I have never heard of Codeium but it says it’s free, which may explain why it sucks. Copilot is excellent. Completely life changing, no. That’s not the goal. The goal is to reduce the manual writing of predictable and boring lines of code and it succeeds at that.
Cool, totally worth burning the planet to the ground for it. Also love that we are spending all this time and money to solve this extremely important problem of coding taking slightly too long.
Think of all the progress being made!
I presume it depends on the area you would be working with and what technologies you are working with. I assume it does better for some popular things that tend to be very verbose and tedious.
My experience including with a copilot trial has been like yours, a bit underwhelming. But I assume others must be getting benefit.
GitHub Copilot is about the only AI tool I’ve used at work so far. I’d say it overall speeds things up, particularly with boilerplate-type code that it can just bang out, reducing a lot of the tedious but not particularly difficult coding. For more complicated things it can also be helpful, but I find it’s also pretty good at suggesting things that look correct at a glance but are actually subtly wrong, leading to either having to carefully double check what it suggests, or having to fix bugs in code that I wrote but didn’t actually write.
Leading to either having to carefully double check what it suggests, or having to fix bugs in code that I wrote but didn’t actually write.
100% this. A recent update from JetBrains turned on the AI shitcomplete (I guess my org decided to pay for it). Not only is it slow af, but in trying it, I discovered that I have to fight the suggestions because they are just wrong. And what is terrible is I know my coworkers will definitely use it and I’ll be stuck fixing their low-skill shit that is now riddled with subtle AI shitcomplete. The tools are simply not ready, and anyone who tells you they are does not have the skill or experience to back up their assertion.
Every time I’ve discussed this on Lemmy someone says something like this. I haven’t usually had that problem. If something it suggests seems like more than something I can quickly verify is intended, I just ignore it. I don’t know why I am the only person who has good luck with this tech but I certainly do. Maybe it’s just that I don’t expect it to work perfectly. I expect it to be flawed because how could it not be? Every time it saves me from typing three tedious lines of code it feels like a miracle to me.