AI’s babbling towers: ChatGPT and large language models
Okay. I didn’t want to write a post on ChatGPT, because its public release has been like shrapnel through the meat of the internet – but at the end of the day this IS a language blog, and I would be remiss not to do my due diligence.
If you’ve been under a rock for the past few months, ChatGPT is a new chatbot that answers user questions, generating natural and conversational (“human”) sounding text responses. Beyond the standard chat applications, it can be used to write fiction or non-fiction from a prompt, generate computer code, and translate between languages, among other things. (If you’d like to try it out for yourself, follow the link here.)
I’ll go over some basics first. Then we’ll explore several major concerns about the technology, along with more specific limitations. I won’t be discussing all the things ChatGPT does well, because (despite working in tech) I’m a luddite, and prefer to gripe. Plus it’s more entertaining to inspect the flaws.
How it works (the nutshell version)
ChatGPT stands for “Chat Generative Pre-Trained Transformer”. OpenAI, the company that produced it, started from their existing large language model (LLM) architecture (specifically, a model in the GPT-3.5 series), then fine-tuned it with both supervised and reinforcement learning (the latter called Reinforcement Learning from Human Feedback, or RLHF). Broken down a bit:
- Large language model: A machine-learning model that tries to predict the next word in a sentence by assigning probabilities to sequences of words. The model is trained on enormous amounts of text.
- GPT-3 has 175 billion parameters, and was trained on 570 gigabytes of text.
- Supervised learning: Uses labeled datasets and correct answers to teach algorithms how to classify data or predict a particular outcome. In the case of ChatGPT, human trainers created and labeled two-sided conversations, which were then mixed into the existing dataset.
- Reinforcement learning: A dynamic process that trains algorithms via a system of reward and punishment. In its generic form, the algorithm learns without human intervention by maximizing its rewards (for correct answers) and minimizing its punishments (for incorrect answers). With ChatGPT, human trainers ranked two or more model responses by quality, and these rankings were used to create reward models for training.
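For the curious, here is a minimal sketch of how human rankings can be turned into a reward signal. It is a toy, not OpenAI's actual method: a real reward model is a neural network that scores arbitrary text, whereas this one just learns a single number per response. The function names, the example responses, and the pairwise logistic (Bradley-Terry style) loss are my own assumptions for illustration.

```python
import math
from itertools import combinations

def pairs_from_ranking(ranking):
    """Turn a best-to-worst ranking into (preferred, rejected) pairs."""
    return list(combinations(ranking, 2))

def fit_scores(rankings, steps=500, lr=0.1):
    """Fit one scalar score per response so preferred responses score higher.

    Hypothetical stand-in for a reward model: a lookup table of scores
    trained with a pairwise logistic loss.
    """
    scores = {r: 0.0 for ranking in rankings for r in ranking}
    pairs = [p for ranking in rankings for p in pairs_from_ranking(ranking)]
    for _ in range(steps):
        for better, worse in pairs:
            # Probability the current scores assign to the human's preference.
            p = 1.0 / (1.0 + math.exp(-(scores[better] - scores[worse])))
            # Gradient step on -log(p): raise the preferred score, lower the other.
            scores[better] += lr * (1.0 - p)
            scores[worse] -= lr * (1.0 - p)
    return scores

# One human ranking of three made-up responses, best first.
rankings = [["helpful answer", "vague answer", "rude answer"]]
scores = fit_scores(rankings)
```

After fitting, the scores preserve the human ordering, and a reinforcement learning step could then reward the chatbot for producing responses that score highly.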
ChatGPT was trained on mountains of data scraped from the internet, along with text sources from public and research domains. It finished training at the beginning of 2022.
Large language models
A few qualities of large language models more generally:
- LLMs are a type of deep neural network – a neural network that has more than two layers.
- The scale of these models changes their behavior dramatically – so that they’re able to perform tasks they weren’t explicitly trained on (called zero-shot learning).
- They’re able to do things that range from searching, summarizing text, and engaging in chatbot dialog, to creating charts from text descriptions, translating between languages, generating code, and writing essays.
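To make "predicting the next word by assigning probabilities" a little more concrete, here is a toy sketch. It is a bigram counter, which only looks at the single previous word; real LLMs are transformers with billions of parameters, so treat this purely as an illustration of the idea. The corpus and function names are made up for the example.

```python
from collections import Counter, defaultdict

def train_bigram(text):
    """Count, for each word, which words follow it in the training text."""
    counts = defaultdict(Counter)
    words = text.lower().split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def next_word_probs(counts, prev):
    """Convert the follow-up counts for `prev` into probabilities."""
    following = counts.get(prev.lower(), Counter())
    total = sum(following.values())
    return {w: c / total for w, c in following.items()} if total else {}

# A tiny made-up "training set".
corpus = "the cat sat on the mat the cat ate the fish"
model = train_bigram(corpus)
probs = next_word_probs(model, "the")
```

Here "cat" follows "the" in two of its four occurrences, so it gets probability 0.5, while "mat" and "fish" get 0.25 each. An LLM does something similar in spirit, but conditions on a long stretch of preceding text rather than one word.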
Quite a few LLMs exist already, predating the grandiose hullabaloo surrounding ChatGPT. Examples include Pathways Language Model, or PaLM (540 billion parameters) and Language Models for Dialog Applications, or LaMDA (137 billion parameters) from Google Research; Megatron-Turing NLG (530 billion parameters) from Microsoft/Nvidia; and BLOOM – BigScience Large Open-science Open-access Multilingual Language Model (176 billion parameters).
People are fired up about ChatGPT. Articles from every corner of the web extol its virtues, and also caution against its limitations and potential for harm. Below is a summary of the concerns I’ve seen.
- Gives inaccurate, often completely fabricated responses
- Hallucinates responses that, nonetheless, sound very plausible.
- Also, does not (yet) contain identifying sources for the information it provides, so that human users cannot vet the source material for reliability.
- Environmental impact
- Training LLMs takes enormous environmental resources.
- A paper from 2019 estimated that “training a neural architecture search based model with 213 million parameters” would “produce more than five times the lifetime carbon emissions from the average car.” Then compare those 213 million parameters with the 175 billion of GPT-3, the model family behind ChatGPT: roughly 800 times as many.
- Exhibits political bias
- From Wikipedia: “According to the BBC, as of December 2022 ChatGPT is not allowed to ‘express political opinions or engage in political activism.’ Yet, research suggests that ChatGPT exhibits a pro-environmental, left-libertarian orientation when prompted to take a stance on political statements from two established voting advice applications.”
- I’ve asked it political questions myself, and confirm that it has a clear left-wing bias.
- Facilitates cheating on school assignments
- The fear is that this technology will hinder learning and critical thinking on a massive scale, unless safeguards are put in place. (Some plagiarism-detection software has already been released; I haven’t looked into how well it works.)
- Potential for geopolitical consolidation of power
- From El País article (about AI more generally): “Disinformation campaigns, cyberattacks, data collection, facial recognition and the like are already a significant problem, and may develop even more sophisticated capabilities that can disrupt democracies and consolidate internal control in authoritarian regimes.”
- Used underpaid laborers to build toxic content filters
- Since most of ChatGPT’s training data was collected from the internet, OpenAI wanted to build an “AI-powered safety mechanism” (basically a filter for toxic content). To accomplish this, they used outsourced Kenyan workers, paid less than $2 per hour, to label examples of violence, hate speech, sexual abuse, etc., that could then be used as training data for the AI safety system. Beyond the low pay, workers reported experiencing deep psychological harm from repeated exposure to such content…which obviously raises some important ethical questions.
- Invasion into creative sectors
- ChatGPT and other AI tools (e.g. image generation software) are already disrupting the artistic world, and will continue to – pursuits that until now were uniquely human, such as painting, creative writing, and music.
- Responses may be insulting / biased
- I haven’t personally experienced this one, and to be honest, it doesn’t bother me much (I think a bit of Stoic training would do wonders for those sunk deep in our current hyper-sensitive culture), but as this is widely reported to be a serious negative, I’m including it here.
In the above section, I outlined some large-scale drawbacks of ChatGPT and this kind of AI technology generally. Next I’d like to detail ChatGPT-specific limitations that I’ve confirmed or discovered by giving it various prompts myself.
First, I will say that a few aspects did really impress me – one, the speed with which the chatbot answers; two, the incredibly fluid language of the responses; and three, its “memory” of past dialog.
Now onto the fun stuff!
- Confirmed: ChatGPT is super quick to hallucinate
- I asked it about a current event, which it couldn’t comment on because its last training data dates to September 2021. I asked the same question, but pretended the event happened in 2019. It quickly fabricated a believable account of the event.
- Confirmed: ChatGPT may categorize terms and concepts completely wrong
- It categorized the clause “it was” as a conjunction, and gave an incorrect example based on that misclassification.
- It categorized “olive oil” as a “seed oil” (an olive is a fruit, not a seed).
Below are features that voice assistants like Google Assistant and Alexa currently have, and that ChatGPT does not. (Although I’ve heard that integrations between the chatbot and voice assistants are planned or already in progress.)
- Asking disambiguation questions
- To my question about Portland, I would prefer a follow-up question from ChatGPT, asking something like, “Do you mean Portland, Oregon, or Portland, Maine?”, instead of having to correct its first assumption.
- Pretending to emote
- Some people may view this as a positive, however (as they’re loath to continue blurring the line between human and AI). Then there are the people pining for a reality like that in the movie Her.
- For the record, when I asked the same question to Google Assistant, it responded with “Being helpful to you is the best part of my day 😊”, which admittedly is a bit indirect, but still more personable.
- Making use of personal user context/information to give the most relevant responses
- ChatGPT can’t recommend businesses “nearby” because it doesn’t have your location data (unlike location-enabled search engines and voice assistants).
- It does give restaurant recommendations within a particular location if your query contains the location name, such as “can you recommend a good Thai restaurant in San Jose?”
- Controlling device behavior
- By “device behavior”, I mean actions like setting an alarm on your phone, turning off the lights in your kitchen, or playing a song on your smart speaker. ChatGPT does not (yet) have the ability to remotely control any smart devices.
There is a personal side to the chatbot brouhaha for me, as I was recently let go in the giant wave of Google / tech layoffs… And although it seems like the layoff decisions were complex, multifactored, and mainly algorithmic, I was working in a field (natural language understanding – or NLU – for Google Assistant) into which Google leadership is now incredibly keen to push LLM technology. And who needs linguists when you have LLMs?! (An answer: If you followed the recent Bard demo debacle, which wiped roughly $100 billion off Google’s market value in a single-day share drop, well… my linguist colleague found the training data source and we quickly spotted the ambiguous phrasing and context that contributed to Bard’s mistake. Hmm…)
I have to end with the hackneyed but nevertheless true observation (repeated across many other recent articles on ChatGPT and AI) that, despite the host of current issues, it behooves us all to reconcile with this technology, and to improve it across every dimension – because it is very likely here to stay.
Note: The featured images at the top of the post were made with Midjourney (an image generator), using the prompt “a babel-like explosion of words a tower a creepy beautiful woman – all in the style of Anne Bachelier”. Bizarrely, I tried multiple prompts, with more explicit focus on “words” and “letters”, but could not get a picture that rendered actual words of any kind.
See a fuller list at https://en.wikipedia.org/wiki/Language_model#Notable_language_models
 Email me or otherwise get in touch if you’d like to know the details!