What Is AI Good For Anyway: Telling You What You Mostly Already Know
"You are a helpful AI assistant who is an expert on hunting snipe..."
Two years into the AI revolution and I’m not out of a job yet. Though aspects of my day-to-day software development have become less tedious, artificial intelligence has yet to change it in any fundamental way. For instance, AI coding assistants are useful. I leave Copilot on all the time now, and though its ability to spit out half a page of boilerplate code can come in handy, it feels more like an incremental improvement on non-stochastic syntax completion than a game-changer. AI’s biggest contribution to my IDE has been in providing a novel way to anthropomorphize the thing. I still don’t attribute as much agency to Copilot as I do to my Roomba (to whom I will shout encouraging words from the other room along the lines of, “You getting all the dust up, little guy?”), but there is still some degree of emotional involvement.1 Sometimes I get annoyed because Copilot’s suggestions come too fast and too voluminously for me to dismiss them. At such times the machine seems too eager to please, a pick-me-bot. Other times, though, I will type the name of a function and then sit back with arms folded, waiting to see how the machine responds. When it produces code that I could have written myself, I smile and say, “Good job!” Copilot is not my God, but it has become my pet.
LLMs are actually useful to me when they perform tasks that I know how to do but don’t know how to do quickly. Write me a Dockerfile to do such-and-such. Write me a React app that displays a chat window. Write me the single line of pandas or numpy code that puts my data through the particular set of contortions I need at the moment. Stuff I call the Just-Tell-Me-the-Magic-Word problems: conceptually simple, but requiring hours of slogging through a thicket of arbitrary syntactic conventions. Until recently you solved them by perusing similar questions on Stack Overflow, then stitching together a few of the most relevant answers. Now you can literally say to Claude or ChatGPT2 “Just tell me the magic words,” and it does the searching and synthesizing for you.
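To make “the magic words” concrete, here is an invented example of the pandas variety (the data and column names are made up, and the task is deliberately mundane). The answer turns out to be one line, but not a line you would guess without already knowing the incantation:

```python
import pandas as pd

# Invented data: quarterly sales that arrive "wide" (one column per quarter)
# but need to be "long" (one row per store/quarter) before they can be
# grouped or plotted.
wide = pd.DataFrame({
    "store": ["A", "B"],
    "q1": [100, 80],
    "q2": [120, 95],
    "q3": [90, 110],
})

# The magic word turns out to be "melt":
long = wide.melt(id_vars="store", var_name="quarter", value_name="sales")
print(long.sort_values(["store", "quarter"]))
```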
The tendency of LLMs to hallucinate facts isn’t a problem here, because if the machine gives you an incorrect suggestion the code simply won’t run. Software fact-checks itself. However, LLMs’ tendency to come off as serenely self-confident even when they have no idea what they’re talking about3 can still get you in trouble. The process is just more involved.
For example, I am interested in doing reinforcement learning for multiple asynchronous agents. To this end, I used Python’s asyncio library to build a generic agent class. Unlike Docker, React, pandas, etc., which I have learned how to use, until recently I had no idea how asyncio worked. My only experience with it consisted of occasionally cargo-culting the word “async” in front of FastAPI route definitions for reasons I did not understand. But that was fine. I had no interest in understanding asyncio. I wanted to finish the agent class and get on with the fun reinforcement learning stuff. I wanted the machine to do the boring part for me.
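If you have never touched asyncio, the shape of what I was trying to build looks roughly like the sketch below. It is a stripped-down illustration with invented names, not my actual class, and the interesting agent logic is stubbed out with a sleep:

```python
import asyncio

class Agent:
    """A minimal sketch of a generic asynchronous agent: it pulls messages
    off a queue, "thinks" about each one, and pushes a reply onto another
    queue. Names and behavior are invented for illustration."""

    def __init__(self, name: str, inbox: asyncio.Queue, outbox: asyncio.Queue):
        self.name = name
        self.inbox = inbox
        self.outbox = outbox

    async def act(self, message: str) -> str:
        # Stand-in for the interesting part (an LLM call, a policy, etc.).
        await asyncio.sleep(0.1)
        return f"{self.name} saw: {message}"

    async def run(self) -> None:
        # Loop forever, yielding control to the event loop while waiting.
        while True:
            message = await self.inbox.get()
            reply = await self.act(message)
            await self.outbox.put(reply)

async def main() -> None:
    inbox, outbox = asyncio.Queue(), asyncio.Queue()
    agent_task = asyncio.create_task(Agent("agent-1", inbox, outbox).run())
    await inbox.put("hello")
    print(await outbox.get())  # "agent-1 saw: hello"
    agent_task.cancel()        # shut the agent down
    try:
        await agent_task
    except asyncio.CancelledError:
        pass

if __name__ == "__main__":
    asyncio.run(main())
```

The whole point of the exercise is that run() spends most of its time suspended at an await, which is what lets many agents share one thread without stepping on each other.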
When it came time to implement the UI, I went to ChatGPT and said “Write me a Streamlit app that handles asynchronous messages using asyncio.” It cheerfully complied. I copy-pasted the code it generated into my editor, ran it, and reported back the errors, for which ChatGPT apologized and immediately offered corrected versions, and the next thing you know I had wasted two weeks on a fool’s errand. I don’t feel like going into the technical details for non-Python programmers, so just know that asking “How can I make a Streamlit app handle asyncio messages?” is sort of like asking “How do I build a sewer main out of cardboard?” Though ChatGPT did a convincing impersonation of a helpful and enthusiastic colleague, an actually helpful colleague would have said, “You can’t do that. That’s not how it works.”
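(For the Python programmers who do want a taste, the dead end looks roughly like the sketch below. It is a simplified, invented example rather than anything ChatGPT actually produced, but it captures the mismatch: Streamlit wants to rerun your whole script from the top every time the user touches a widget, and an event loop that waits forever for messages has nowhere sensible to live inside that model.)

```python
import asyncio
import streamlit as st

# The general shape of the dead end (names invented, details simplified).
# It looks plausible; it just can't do what you want it to.

async def consume(inbox: asyncio.Queue) -> None:
    # Wait for messages from the agents and display each one as it arrives.
    while True:
        message = await inbox.get()
        st.write(message)

inbox = asyncio.Queue()  # the agents would need to share this queue
st.title("Agent chat")

# Streamlit's model is to rerun this entire script, top to bottom, whenever
# the user interacts with a widget. This call parks the script run inside a
# loop that never returns, and any fresh rerun starts over with a fresh,
# empty queue anyway. Cardboard, meet sewer main.
asyncio.run(consume(inbox))
```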
I gradually came to realize that the machine had no idea what it was talking about and fell back on tried-and-true methods: skimming Stack Overflow, asking questions in help forums, reading tutorials. I finally broke down and just learned how to use the asyncio library, which I should have done in the first place. It would have been faster.
In that instance I made the mistake of trusting an AI, but this was a particular kind of misplaced trust. At no point did ChatGPT hallucinate untrue facts. Its educated guesses at how to combine two incompatible pieces of software were reasonable. They were the sort of guesses a skilled programmer who hadn’t taken the time to understand an unfamiliar Python library would make. They were the sort of guesses I would make, which was precisely the problem.
So even though the misinformation originated from a machine, it took human psychological factors (whether real or emergently feigned) to transform it from a high-probability string of tokens into an error: my impatience, ChatGPT’s overconfidence, a tendency to overvalue thought processes that resemble one’s own, and the inability to pick up on subtle social cues indicating your interlocutor may not be wholly reliable when said interlocutor is a lap-sized plastic box. The problem did not lie in either my brain or its CPU, but rather in the shared social space that enables information to pass between us.
Involvement, or projection? The real question is, what’s the difference?
I vacillate between the two, settling on whichever strikes me as the more charming on any given day.
Arguably, artificial intelligence’s most human-like trait.