A Whirlwind Tour of Linguistics and Linguists in Tech


At the beginning of May I presented a talk at an event called “AVL Tech Connect,” a monthly gathering of the AVL Digital Nomads group (a wonderful community that I’ve joined here in Asheville, NC). The presentation was well-received, and I thought I’d share it with the wider internet.

Linguists bring unique qualifications to areas like natural language processing (NLP), natural language understanding (NLU), and AI. My hope is that the explanations and examples in these slides help highlight (particularly for those in tech, but really, for anyone who’s interested) what some of these qualifications are, and why they’re valuable, even (or especially!) in the age of LLMs. (And first, enjoy a lightning-fast primer on the field of linguistics. 😊 )

You can view the full presentation here if you’d like to look at the deck separately.

 

Without further ado, here are the slides.

Note: I’ve inserted my speaker notes and additional details below where relevant.

Phonetics/Phonology:

  • Phonetics includes the physical properties of speech sounds (e.g., their articulation and acoustic properties); phonology covers how those sounds pattern and contrast within a particular language.
  • A “phoneme” is the smallest unit of speech that distinguishes one word or word element from another in a particular language.
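
A quick way to make “phoneme” concrete is the CMU Pronouncing Dictionary, which ships as an NLTK corpus and lists words as sequences of phoneme symbols. A minimal sketch (assuming NLTK is installed and the cmudict corpus has been downloaded):

```python
# Minimal-pair sketch using the CMU Pronouncing Dictionary via NLTK.
# Assumes: pip install nltk, then a one-time nltk.download("cmudict").
from nltk.corpus import cmudict

pron = cmudict.dict()     # maps word -> list of ARPABET pronunciations
print(pron["bat"][0])     # ['B', 'AE1', 'T']
print(pron["pat"][0])     # ['P', 'AE1', 'T']
# "bat" and "pat" differ in exactly one phoneme (/b/ vs. /p/); swapping it
# changes the word, which is what makes /b/ and /p/ distinct phonemes of English.
```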

Morphology:

  • A “morpheme” is the smallest unit of language that has its own meaning (either a word or a part of a word); for example, “unhappiness” breaks down into un- + happy + -ness.

Syntax:

  • Images: The white “tree” is a constituency syntax parse tree; the dark grey image is a dependency syntax parse. The sentence parsed in both (probably hard to read from there) is “My parakeet drinks mimosas in the morning.” Dependency and constituency parsers have been used in NLP for some time, but they’re now being replaced by (or used in conjunction with) neural-net / deep-learning-based parsers (see the short sketch below).
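
If you’d like to see a dependency parse without squinting at the slide, here’s a minimal sketch using spaCy (assuming the small English model en_core_web_sm is installed; spaCy’s parser is itself one of the neural parsers mentioned above):

```python
# Dependency-parse sketch with spaCy.
# Assumes: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("My parakeet drinks mimosas in the morning.")

for token in doc:
    # Each word attaches to a syntactic head via a labeled dependency relation.
    print(f"{token.text:<10} --{token.dep_}--> {token.head.text}")

# Rough shape of the output: "parakeet" is the nominal subject (nsubj) of "drinks",
# "mimosas" is its object, and "in the morning" hangs off "drinks" as a modifier.
```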

Semantics:

  • Within this domain, there’s lexical semantics (the study of word meaning) and sentential or phrasal semantics (the study of phrase/sentence meaning, including propositional logic).
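
Lexical semantics gets concrete quickly with a resource like WordNet. A minimal sketch via NLTK (assuming the wordnet corpus has been downloaded):

```python
# Lexical-semantics sketch: listing the senses of an ambiguous word with WordNet.
# Assumes: pip install nltk, then a one-time nltk.download("wordnet").
from nltk.corpus import wordnet as wn

for synset in wn.synsets("bank"):
    print(synset.name(), ":", synset.definition())

# The output mixes the financial-institution senses with the river-bank senses
# (plus several others): exactly the kind of ambiguity lexical semantics studies.
```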

Pragmatics:

  • Another common example folks like to use: when you’re sitting at the dinner table and your friend says “Can you pass the salt?”, you know that she’s making a request, not asking about your physical capability to perform the action.

Traditionally

  • Linguistics has fused with many of its related disciplines to create specialties like Psycholinguistics, Cognitive Linguistics, Sociolinguistics, and Neurolinguistics.

Recent shift

  • This timeframe and these observations are just impressionistic (based on my experience) – I haven’t researched the actual trends to confirm, but I have seen them first-hand.

Ex. (A): [Article link] Google’s “GoEmotions” dataset: A human-labeled dataset of 58K Reddit comments categorized according to 27 emotions.

  • Built for use in chatbots, for the task of suggesting emojis in conversational text, and for improving customer support interactions.
  • The raters missed idioms, sarcasm, profanity, basic English, US politics & culture, and Reddit memes.
  • Problem #1: Comments were presented to raters with no extra metadata.
  • Problem #2: “All raters are native English speakers from India” – unfamiliar with US English & culture (esp. Reddit culture).
  • → Underscores the importance of accurate data labeling, and of the labelers’ knowledge of the data’s context (see also the next slide, and the snippet below).
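
If you want to poke at GoEmotions yourself, a copy is available through the Hugging Face datasets library. A sketch (assuming that library and the “go_emotions” dataset name on the Hub); note that the examples really do arrive as bare comment text, with none of the thread context a rater would need:

```python
# Sketch: inspecting GoEmotions with the Hugging Face `datasets` library.
# Assumes: pip install datasets, and the "go_emotions" dataset name on the Hub.
from datasets import load_dataset

ds = load_dataset("go_emotions", split="train")
label_names = ds.features["labels"].feature.names   # the 27 emotions + neutral

example = ds[0]
print(example["text"])                               # the bare comment, no thread context
print([label_names[i] for i in example["labels"]])   # the emotions the raters assigned
```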

Ex. (B): The “Set goals” image shows the options Grammarly’s AI writing assistant presents to the user.

  • Another example of verbosity issues in this space: ChatGPT has a very verbose, super-polite but also pedantic register/style that bothers many users (likely due to reinforcement learning from human feedback). Linguists could help fix this, so users don’t have to constantly tweak their prompts (one possible sketch follows below).
  • ChatGPT’s verbosity has been a sticking point for me personally.
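
As one illustration of the kind of fix a linguist could prototype: write the target register down once, as a system message, instead of making every user re-negotiate it in their prompts. A rough sketch with the OpenAI Python client (the model name and the wording of the style guide are placeholders, not recommendations):

```python
# Sketch: constraining register/verbosity once via a system message.
# Assumes: pip install openai (v1+), and an API key in OPENAI_API_KEY;
# the model name below is a placeholder.
from openai import OpenAI

client = OpenAI()

STYLE_GUIDE = (
    "Answer in at most three sentences. Use a plain, direct register: "
    "no preamble, no apologies, no restating the question."
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder
    messages=[
        {"role": "system", "content": STYLE_GUIDE},
        {"role": "user", "content": "What's the difference between a phoneme and a morpheme?"},
    ],
)
print(response.choices[0].message.content)
```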

Ex. (A): Here’s an excellent, fun list of hundreds of such titles: songs-that-are-also-names-of-movies.

  • From an article on AI chatbot models: “The researchers tracked back the processes the LLM went through to answer each prompt. They found that the path of processing through the layers almost always passes via what they call the English subspace. If asked to translate Chinese to Russian, the correct Russian characters travel through the English subspace, before going back to Russian.”
  • (A): Another lexical-semantic example is how languages carve up the spatial domain: concepts/words like the English prepositions “on” or “in” aren’t encoded the same way in other languages.

 

* * * * * * * * * * * * * * *

 

A final note: If by chance you’re a linguist working in industry, I would love to hear what you do, skills you use regularly, problems you help solve, et cetera. Share in the comments or contact me!
