A Whirlwind Tour of Linguistics and Linguists in Tech

At the beginning of May I presented a talk at “AVL Tech Connect,” a monthly event run by the AVL Digital Nomads group (a wonderful community that I’ve joined here in Asheville, NC). The presentation was well received, and I thought I’d share it with the wider internet.

Linguists bring unique qualifications to areas like natural language processing (NLP), natural language understanding (NLU), and AI. My hope is that the explanations and examples in these slides help highlight (particularly for those in tech, but really for anyone who’s interested) what some of these qualifications are and why they’re valuable, even (or especially!) in the age of LLMs. (But first, enjoy a lightning-fast primer on the field of linguistics. 😊)

You can view the full presentation here if you’d like to look at the deck separately.

Without further ado, here are the slides.

Note: I’ve inserted my speaker notes and additional details below where relevant.

Phonetics/Phonology

Phonetics studies the physical properties of speech sounds (e.g. their articulation and their acoustic aspects).
A “phoneme” is the smallest unit of speech that distinguishes one word or word element from another in a particular language.
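
To make the definition of a phoneme concrete, here is a minimal sketch of a “minimal pair” check: two words that differ in exactly one sound, which is the classic test for whether two sounds are separate phonemes in a language. The transcriptions and the function name `is_minimal_pair` are assumptions made up for illustration, not part of the original slides.

```python
# Toy, simplified phonemic transcriptions of a few English words.
# These entries are hypothetical illustration data, not a real lexicon.
TRANSCRIPTIONS = {
    "pat": ["p", "æ", "t"],
    "bat": ["b", "æ", "t"],
    "pit": ["p", "ɪ", "t"],
    "spat": ["s", "p", "æ", "t"],
}


def is_minimal_pair(word_a, word_b):
    """Return True if the two words differ in exactly one phoneme position."""
    a = TRANSCRIPTIONS[word_a]
    b = TRANSCRIPTIONS[word_b]
    # Words of different lengths cannot differ by a single substitution.
    if len(a) != len(b):
        return False
    differences = sum(1 for x, y in zip(a, b) if x != y)
    return differences == 1


print(is_minimal_pair("pat", "bat"))   # True: only /p/ vs. /b/ differs
print(is_minimal_pair("pat", "pit"))   # True: only the vowel differs
print(is_minimal_pair("bat", "pit"))   # False: two sounds differ
print(is_minimal_pair("pat", "spat"))  # False: different lengths
```

Because “pat” and “bat” are different words that differ only in their first sound, /p/ and /b/ count as distinct phonemes in English.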

Morphology

A “morpheme” is the smallest unit of language that has its own meaning (either a word or a part of a word).
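
As a rough illustration of the definition above, the sketch below splits a few words into morphemes using tiny hand-written lists of prefixes, suffixes, and roots. The lists and the function name `segment` are hypothetical, and real morphological analysis has to handle spelling changes, irregular forms, and ambiguity that this toy ignores.

```python
# Hypothetical toy inventories, just large enough for the examples below.
PREFIXES = ["un", "re"]
SUFFIXES = ["ness", "ing", "s"]
ROOTS = {"kind", "drink", "parakeet", "read"}


def segment(word):
    """Split a word into (prefix) + root + (suffix) using the toy lists above.

    Tries every prefix/suffix combination, including "no prefix" and
    "no suffix", and accepts the first one that leaves a known root.
    """
    for prefix in [""] + PREFIXES:
        for suffix in [""] + SUFFIXES:
            if word.startswith(prefix) and word.endswith(suffix):
                root = word[len(prefix):len(word) - len(suffix)]
                if root in ROOTS:
                    return [m for m in (prefix, root, suffix) if m]
    # No analysis found: treat the whole word as a single unit.
    return [word]


print(segment("unkindness"))  # ['un', 'kind', 'ness']
print(segment("drinks"))      # ['drink', 's']
print(segment("reading"))     # ['read', 'ing']
print(segment("parakeet"))    # ['parakeet']
```

Note that “reading” comes out as read + ing rather than re + ading, because a split only counts when the leftover piece is a known root.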

Syntax

- Images: The white “tree” is a constituency syntax parse tree; the dark grey image is a dependency syntax parse. The sentence parsed in both, since it’s probably very hard to read from there, is “my parakeet drinks mimosas in the morning.” Dependency and constituency parsers have been used in NLP for some time, but are now being replaced by (or used in conjunction with) neural net / deep learning-based parsers.
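
For readers who can’t make out the images, the two kinds of parse can be sketched as plain data structures. This is a hand-written approximation, not the output of any parser and not necessarily the exact labels shown on the slide: the category labels (S, NP, VP, PP) and the Universal Dependencies-style relation names are my assumptions, and `leaves` and `dependents_of` are hypothetical helper names.

```python
# Constituency parse: nested (label, children...) tuples that group words
# into phrases. Labels are assumed, not copied from the slide.
CONSTITUENCY = (
    "S",
    ("NP", ("Det", "my"), ("N", "parakeet")),
    ("VP",
     ("V", "drinks"),
     ("NP", ("N", "mimosas")),
     ("PP", ("P", "in"), ("NP", ("Det", "the"), ("N", "morning")))),
)

# Dependency parse: (dependent, relation, head) arcs between words.
# Relation names follow Universal Dependencies conventions (an assumption).
DEPENDENCIES = [
    ("my", "nmod:poss", "parakeet"),
    ("parakeet", "nsubj", "drinks"),
    ("drinks", "root", None),
    ("mimosas", "obj", "drinks"),
    ("in", "case", "morning"),
    ("the", "det", "morning"),
    ("morning", "obl", "drinks"),
]


def leaves(tree):
    """Return the words of a constituency tree in left-to-right order."""
    _label, *children = tree
    if len(children) == 1 and isinstance(children[0], str):
        return [children[0]]  # a (part-of-speech, word) leaf
    words = []
    for child in children:
        words.extend(leaves(child))
    return words


def dependents_of(head):
    """Return the words that attach directly to the given head word."""
    return [dep for dep, _rel, h in DEPENDENCIES if h == head]


print(leaves(CONSTITUENCY))
# ['my', 'parakeet', 'drinks', 'mimosas', 'in', 'the', 'morning']
print(dependents_of("drinks"))
# ['parakeet', 'mimosas', 'morning']
```

The constituency version groups words into nested phrases, while the dependency version links each word directly to the word it depends on, with the verb “drinks” as the root.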

Semantics

- Within this domain, there’s lexical semantics (the study of word meaning) and sentential or phrasal semantics (the study of phrase/sentence meaning, including propositional logic).

Pragmatics

- Another common example folks like to use: When you’re sitting at the dinner table and your friend says “Can you pass the salt?” you know that she’s making a request, not asking about your physical capability to perform the action.

Traditionally

Linguistics has fused with many of its related disciplines to create specialties like Psycholinguistics, Cognitive Linguistics, Sociolinguistics, and Neurolinguistics.

Recent shift

This timeframe and these observations are just impressionistic (based on my experience) – I haven’t researched the actual trends to confirm, but I have seen them first-hand.

Even more titles similar to “Analytical Linguist”: AI Linguist, Data Specialist, Language Engineer, Language Scientist
Note: FAANG (Facebook/Meta, Amazon, Apple, Netflix, and Google), Microsoft, and other tech giants employ all of the first 3 types of people. I haven’t listed them under every type, both to avoid redundancy and to show that companies outside of big tech also employ linguists.
A few good articles on Knowledge Graphs (KGs) and ontologies, if you’d like to learn more:

Ex. (A): [Article link] Google’s “GoEmotions” dataset: A human-labeled dataset of 58K Reddit comments categorized according to 27 emotions.

Built for use in chatbots, for suggesting emojis in conversational text, and for improving customer support interactions
The labeling missed idioms, sarcasm, profanity, basic English, US politics & culture, and Reddit memes
Problem #1: Comments presented with no extra metadata
Problem #2: “All raters are native English speakers from India” – unfamiliar with US English & culture (esp. Reddit culture)
–> Underscores the importance of accurate data labeling, and the importance of the data labelers’ knowledge of context for the data (see also next slide).

Ex. (B): The “Set goals” image presents options from Grammarly’s AI writing assistant for the user.

Another example of verbosity issues in this space: ChatGPT has a very verbose, super polite but also pedantic register/style that bothers many users (likely due to reinforcement learning). Linguists could help fix this, sparing users from constantly having to tweak their prompts.
ChatGPT’s verbosity has been a sticking point for me personally.

Ex. (A): Here’s an excellent, fun list of hundreds of such titles: songs-that-are-also-names-of-movies.

“AI chatbot models”: “The researchers tracked back the processes the LLM went through to answer each prompt. They found that the path of processing through the layers almost always passes via what they call the English subspace. If asked to translate Chinese to Russian, the correct Russian characters travel through the English subspace, before going back to Russian”
(A): Another example of a lexical-semantic problem is how languages carve up the spatial domain. Concepts/words like the English prepositions ‘on’ or ‘in’ aren’t encoded the same way in other languages.

* * * * * * * * * * * * * * *

A final note: If by chance you’re a linguist working in industry, I would love to hear what you do, skills you use regularly, problems you help solve, et cetera. Share in the comments or contact me!
