Career interviews: Computational linguist for a virtual assistant

Wugs go to work

After much delay (eek! just realized it’s been a year!), I have another interview with a career linguist for your reading pleasure. [See the first interview here.] Even though I still get the “I’ve never met a real-live linguist” reaction when telling folks what I do, these days there are indeed people working full-time, earning living wages, as these specialized language nuts – and not all as professors in academia, or as translators/interpreters for the UN.

* * * * *

As with my last interviewee, I met Allan at Samsung Research America, where we worked together on Bixby, Samsung’s virtual voice assistant. On the Bixby linguist team, we worked with engineers, Quality Assurance (QA) testers, and others to develop a personal assistant that could carry out thousands of different spoken user commands. Also as with my last interviewee, Allan is no longer at the job I interviewed him about. (He’s now a Language Engineer on Amazon’s Alexa!) I’m keeping questions and answers in present tense, however, because I feel like it.

Allan Schwade, a graduate student in linguistics, won the Humanities Division Dean's Award for his poster on the adaptation of Russian words by English speakers.

  1. What kind of work do you do?

I’m a computational linguist, which means I create solutions for natural language processing problems using computers. More specifically, I work on the systems and machine learning models that enable your smart devices to understand you when you say “set an alarm for 7am” or “tell me the weather in Chicago”.

  2. Describe a typical day at your job.

I usually start the day by meeting with my manager. The lab I work in supports products in production and also conducts research and development for smart devices. If there is an issue with a product in production, I’ll work with the team to solve the problem. Usually this involves curating the training data for the machine learning models: removing aberrant data from training sets or generating new data to cover missing patterns. If nothing is on fire, there are usually several projects I’ll be working on at any given time. Projects generally start with me doing a lot of reading on the state of the art; then I reach a point where I’m confident enough to build a proof of concept (POC). While I’m creating the POC, the linguists generate data for the models. Once the code and data are ready, I build the models and keep iterating until performance is satisfactory. The only really dependable things in my schedule are lunch and a mid-afternoon coffee break with colleagues, both of which are indispensable.

  3. How does your linguistics background inform your current work?

My degree in linguistics is crucial for my current line of work. When building machine learning models, so much rests on the data you feed into them. If your data set is diverse and representative of the problem, your model will be robust.

Having a linguistics background also gives me quick insight into data sets and how to balance them. Understanding the latent structures in the data allows me to engineer informative feature vectors for my models (feature vectors are derived from the utterances collected and are the true inputs to the machine learning model).
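For readers curious what “feature vectors derived from utterances” might look like, here is a minimal bag-of-words sketch in Python. It is a hypothetical toy, not Allan’s or Bixby’s actual pipeline, since the interview doesn’t say which features his models used. Each utterance becomes a list of word counts over a vocabulary built from the training utterances. The helper names (`tokenize`, `build_vocabulary`, `featurize`) are made up for illustration, and production systems typically use much richer features.

```python
from collections import Counter

PUNCTUATION = ".,!?\"'"

def tokenize(utterance: str) -> list[str]:
    """Lowercase an utterance, split on whitespace, and strip edge punctuation."""
    tokens = []
    for raw in utterance.lower().split():
        token = raw.strip(PUNCTUATION)
        if token:  # skip tokens that were pure punctuation
            tokens.append(token)
    return tokens

def build_vocabulary(utterances: list[str]) -> list[str]:
    """Collect every distinct token in the training utterances, sorted for a stable order."""
    vocab: set[str] = set()
    for utterance in utterances:
        vocab.update(tokenize(utterance))
    return sorted(vocab)

def featurize(utterance: str, vocabulary: list[str]) -> list[int]:
    """Map an utterance to a count vector with one slot per vocabulary word.

    Words outside the vocabulary are silently ignored.
    """
    counts = Counter(tokenize(utterance))
    return [counts[word] for word in vocabulary]

training = ["set an alarm for 7am", "tell me the weather in Chicago"]
vocab = build_vocabulary(training)
print(vocab)
# ['7am', 'alarm', 'an', 'chicago', 'for', 'in', 'me', 'set', 'tell', 'the', 'weather']
print(featurize("Set an alarm for 7am!", vocab))
# [1, 1, 1, 0, 1, 0, 0, 1, 0, 0, 0]
```

A word never seen in training, like “Boston” in “weather in Boston”, simply gets no slot. This is one small illustration of why a diverse, representative training set matters.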

  4. What do you enjoy most and/or least about the job?

I really enjoy getting to see differences between human and machine learning. We have a pretty good idea of the types of things humans will attend to when learning language, but sometimes those things aren’t informative for machines. It can be frustrating when something I’d call “obvious” is useless in a model, and even more frustrating when something “marginal” is highly informative. But I never tire of the challenge; the satisfaction I feel at the end of a project is worth it.

The thing I enjoy least is data annotation. The process of doing it is indispensable because you become intimately familiar with the problem, but after a couple of hours of it my mind goes numb.

  5. What did you study in college and/or grad school?

I got my BA from Rutgers University and my MS from the University of California, Santa Cruz. Both degrees were in linguistics, and both schools specialized in generative linguistic theory. I enjoyed a lot about the programs, but they did a better job of preparing people for careers in academia than in industry. Learning programming, or common annotation tools and schemas, before graduating would have made industry life easier for me.

  6. What is your favorite linguistic phenomenon?

Loanword adaptation! I wrote my master’s thesis on it. Seeing how unfamiliar phonemes are digested by speakers never fails to pique my interest. In general, I love it when stable systems are forced to reconcile things outside their realm of experience.

  7. (If you had the time) what language would you learn, and why?

As a phonetician, I’d love to learn Georgian for its consonant clusters, Turkish for its morphophonology, Hmong for its tones, or ASL because it’s a completely different modality from the one I specialized in. As a subjective entity who does things for personal enjoyment, I’d love to learn Japanese.

  8. Do you have any advice for young people looking to pursue a career in linguistics?

If you want to go into industry doing natural language processing, I cannot stress enough how important the ability to code is. It’s true that you usually won’t need it for annotation work, but if you want to be an annotation lead, the ability to write utility scripts will save you a lot of time. Also, my transition from annotator to computational linguist came from showing basic coding competency: the engineers were too busy to work on some projects, so they threw the smaller ones my way. This brings me to my next piece of advice: always voice your interest in things that interest you to those with the potential to get you involved. Telling your co-worker you really want to work on a cool project will do next to nothing, but telling your manager or the project lead that you are interested in a project may get you involved.

Career interviews: Linguistics Project Manager at a branding firm

Wugs go to work

One thing I’ve been planning to post occasionally is interviews with career linguists and related language folk, especially those working outside of academia. Yes, they (we) exist! Until recently these were rare birds, but lately their numbers are growing. I credit several factors: the growth of the discipline generally; the growth of technology industries trying to wrangle natural language; globalization; and (sadly) the increasing difficulty of landing a faculty position that pays a living wage, at least in the U.S.

Another (more popular!) language and linguistics blog has been running a job interview series over the last several years as well. I encourage you to also take a look over there: Superlinguo Linguist Job Interviews.

* * * * *

I met Noah on our team of linguists at Samsung Research America. For this interview, I asked him to talk about the job he had before Samsung, at Lexicon, a branding agency based in Northern California. Lexicon has come up with brand names for some of today’s most popular products, including BlackBerry, Febreze, (Subaru) Outback, Dasani, Swiffer, Pentium, and ThermaCare.

  1. What kind of work did you do at Lexicon?

I was a linguistics project manager, which meant that I coordinated with Lexicon’s network of linguists worldwide (85 countries, with something like 50 or 60 languages represented). I sent them lists of names for real-time evaluation, and I also helped coordinate with another linguist in Quebec to prepare reports that took deeper dives into particular names, to pin down any issues a name might face in a target language or culture. Basically, you learn a lot of multilingual profanity doing this, and realize you shouldn’t name a company Zinda.

  2. Describe a typical day at that job.

It was a small company, so I wore whatever hats were necessary. I prepared real-time and comprehensive reports, editing them and working with the linguists to determine whether a given name (one a client had brought to us, or one we had come up with) would work well in a particular language. I also tried to read between the lines to figure out whether we should take a linguist’s comments at face value or do a little more digging and cross-checking, including interviews with native speakers. That digging was mostly done by our network, not in-house. But aside from the linguisty side, I also created names. Lots of names for lots of projects. Most of them didn’t make it through, but it was still a pursuit that stretched my creativity. I also helped write a program to categorize and classify names and to try to identify good brand names using NLP [Natural Language Processing] techniques, looking at things like consonance, assonance, and alliteration. It was pretty helpful for going through our backlog of names and finding viable ones to use going forward.
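Noah doesn’t say how his program worked, so the sketch below is purely hypothetical. It shows a few lines of Python that score candidate names for alliteration, assonance, and consonance, then rank a backlog by a naive combined score. It uses spelling as a crude stand-in for sound. A real tool would work from phonemic transcriptions, since English letters and sounds don’t line up one-to-one. The function names and the scoring weights are invented for illustration.

```python
import re
from collections import Counter

# Spelling is a crude stand-in for sound here; a real tool would compare
# phonemic transcriptions instead of letters.
VOWELS = set("aeiou")  # 'y' is treated as a consonant, for simplicity

def words(name: str) -> list[str]:
    """Lowercase alphabetic words with doubled letters collapsed ('Swiffer' -> ['swifer'])."""
    return [re.sub(r"(.)\1+", r"\1", w) for w in re.findall(r"[a-z]+", name.lower())]

def repeats(letters: list[str]) -> int:
    """Count the extra occurrences of any letter that appears more than once."""
    return sum(n - 1 for n in Counter(letters).values() if n > 1)

def alliteration(name: str) -> bool:
    """True if a multiword name has every word starting with the same letter."""
    ws = words(name)
    return len(ws) > 1 and len({w[0] for w in ws}) == 1

def assonance(name: str) -> int:
    """Repeated vowel letters, e.g. the two a's in 'Dasani'."""
    return repeats([c for w in words(name) for c in w if c in VOWELS])

def consonance(name: str) -> int:
    """Repeated consonant letters, e.g. the two b's in 'BlackBerry'."""
    return repeats([c for w in words(name) for c in w if c not in VOWELS])

def score(name: str) -> int:
    """Naive combined score; the weight of 2 for alliteration is an arbitrary choice."""
    return 2 * alliteration(name) + assonance(name) + consonance(name)

def rank(backlog: list[str]) -> list[str]:
    """Order candidate names from highest to lowest score; ties keep their input order."""
    return sorted(backlog, key=score, reverse=True)

print(rank(["Pentium", "Dasani", "BlackBerry", "Swiffer"]))
# ['Dasani', 'BlackBerry', 'Pentium', 'Swiffer']
```

Under these rules, “Dasani” earns an assonance point for its two a’s, and “BlackBerry” earns a consonance point for its two b’s. The double f in “Swiffer” is collapsed first, because it spells a single sound. A multiword name like “Coca-Cola” (not a Lexicon name, just a handy example) would also get the alliteration bonus.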

  3. How did your linguistics background inform that work?

Well, my fascination with language itself got me the job and kept me entertained, though I think the job would easily have been doable with some other kind of background. But linguistics came into play in creating good names, actually looking into the science of sound symbolism, and helping with a few linguistic studies. Pretty cool stuff.

  4. What did you enjoy most and/or least about the job?

I most enjoyed getting to see what kinds of things big clients were trying to market and create next. There were some pretty cool things there, and I got an insider’s perspective on what the market was going to look like in the future. The downsides were managerial in nature, combined with the claustrophobia that a small company can engender, but overall it was a very good way to get some experience in the field.

  5. What did you study in college and/or grad school?

Major: Linguistics. Minor: English. Minor: Business Administration (useless). Interest: everything else.

  6. What is your favorite linguistic phenomenon?

Splicing. Or whatever that thing was that we came up with as an inside joke that you should write a fake blogpost on.[1]

  7. (If you had the time) what language would you learn, and why?

ASL [American Sign Language]. As a monolingual, I’ve always been most fascinated by sign language, oddly enough. Alas, those CSD [Communication Sciences and Disorders] students and their required classes.

  8. Do you have any advice for young people looking to pursue a career in linguistics?

Be overzealous, and marry linguistics to another discipline. In industry, you’ve mostly got three choices: linguistics + [design, management, or computational]. Or astrobiology and linguistics if you happen to work with NASA. Might be cool. But yeah, linguistics is interdisciplinary by nature, so I assume everyone who studies it must enjoy the interplay of different subjects like I do. Oh, and maybe start a computational linguistics club in undergrad when you don’t know anything about computational linguistics. It’ll make you learn, if nothing else.

[1] Noah sent me his responses long enough ago now that I cannot for the life of me remember what this was. Not that I would explain it even if I could remember, to preserve the opportunity of writing said fake blogpost. 😛

What is linguistics, and what do linguists do?

I love patterns. They’re how we learn and evolve, and they’re everywhere.

Here’s a pattern for you.
When I tell someone new that I do linguistics, their response often goes like this:

Nod and/or smile and/or give small verbal acknowledgment.
Slight awkward pause.
“And what is linguistics again exactly?”[1]

People know that linguistics has to do with language, but beyond that, things get fuzzy. My goal with this post is to unfuzz (defuzz? disfuzz?) the basics of the field.

Most succinctly put, “Linguistics is the scientific study of language”. Like all sciences, linguistics is about patterns. Identifying them, analyzing them, making generalizations about them, making predictions (or hypotheses) from the generalizations, and then testing the predictions. What does that mean more specifically?

Well, what is language? Language is a conventionalized and arbitrary pairing of form and meaning. The form is usually sound, but it can also be gesture – in the case of sign language. There are many levels at which we might observe and analyze such form-meaning pairings, and these levels comprise the main subfields of the linguistics discipline. I’ll introduce each subfield through a couple of questions:

  1. How do our mouth, tongue, and throat produce consonants versus vowels? How do we segment a continuous stream of speech into words, so that we may understand it? How do we perceive sounds as belonging to our native language(s) versus other languages?

The study of speech sounds is Phonetics, and the study of sound patterns is Phonology.

  2. What is going on when we add the prefix un- to the word happy, and the resulting word (unhappy) means the opposite of happy? How do words like steampunktoberfest, appletini, or totes come about? Why is the plural of cat cats, while the plural of mouse is mice?

The study of word structure and formation is Morphology.

  3. Why do we say the red car in English (with the adjective before the noun), when French has la voiture rouge and Spanish el coche rojo (both with the adjective after the noun)? Why is the interpretation of John saw the man with a telescope ambiguous?

The study of sentence structure is Syntax.

  4. How do we know that a poodle is a type of dog, or that if something is alive it cannot also be dead, or that Maddie plays the drums like a rock star must imply that Maddie plays the drums?

The study of meaning is Semantics.

  5. Why do we understand that it is annoying to say “Yes” (and take no subsequent action) in answer to your dinner partner’s question “Can you pass the salt”?

The study of discourse in context is Pragmatics.

Once we’ve discussed what linguistics is, the question that inevitably follows looks something like this: where does studying language patterns get you in the real world? What do linguists actually do for a living? Until fairly recently, linguists were generally confined to teaching and research within academia, and many still follow that path. However, in the last couple of decades, various industrial sectors have realized the need to employ people with serious language knowledge. Here is a short list of possible careers outside of academia for those with a linguistics background:

  • Computational Linguist (works on improving computers’ ability to “understand” and generate human language – often in machine learning contexts)
  • Conlanger for Movie/TV Industry (invents new languages based on attested linguistic principles)
  • Data Scientist (statistically analyzes large amounts of data to provide business insights)
  • Field Linguist/Researcher (documents endangered or dying languages – although often from a university position)
  • Forensic Linguist (analyzes legal and judicial language; provides linguistic evidence in legal proceedings)
  • Lexicographer (builds dictionaries)
  • Naming/Branding Consultant
  • Nonprofit sociolinguistic research
  • Second or Foreign Language Instructor
  • Speech-Language Pathologist (diagnoses and treats communication, voice, and swallowing disorders)
  • Translator & Interpreter

Here are a few cool examples of actual people using their linguistics training in the real world:

One of my acquaintances is an interactional sociolinguist at the FrameWorks Institute, a nonprofit organization that conducts research on sociopolitical and scientific topics like aging, criminal justice, and climate change. FrameWorks investigates the language used in talk about these subjects, and teaches ways of reframing each issue. The woman I know manages the Institute’s Learning Unit, where she organizes professional learning events for advocates who want to change particular social dialogues.

Another friend of mine is a Speech-Language Pathologist, or SLP. She works with veterans at the VA Hospital in San Francisco. Her patients have swallowing conditions, aphasia, and other disorders that interfere with speaking or understanding. The SLP path requires a master’s in Communicative Disorders/Speech-Language Pathology. Although it doesn’t require a degree in linguistics, my friend has one of those too, and she says it has given her a deeper understanding of the disorders she’s trying to treat, as well as of the subtleties involved in clinician-patient communication.

David Peterson is neither a friend nor an acquaintance, although I wish he were. He is the conlanger who created Dothraki and Valyrian for the HBO series “Game of Thrones”. Dothraki and Valyrian are not just random sets of made-up words. They are full languages, with their own phonology, morphology, and syntax. For example, to form a question in Dothraki, as in Hash yer ray tih zhavors chiorisi anni (“Have you seen my lady’s dragon?”), one must include a word whose main purpose is to form questions: hash. English lacks a single separate word with just this function; instead, we use multifunctional auxiliary verbs like do, be, and have, or rising intonation. French, on the other hand, does have a dedicated question marker: est-ce que (subject-verb inversion and rising intonation are other possible strategies). Conlanging for film basically started in the eighties with Marc Okrand, the inventor of Vulcan and Klingon, used in the Star Trek movies. With sci-fi/fantasy shows becoming more and more involved these days, the opportunity for such constructed language work seems to be growing.

And take a look at these videos: Anna Marie Trester, author of the Career Linguist blog, has interviewed and recorded multiple linguist folks (me among them!) working in different areas of industry.

I’d like to wrap up with some historical and contextual nuggets about the field.

Linguistics as a named, independent discipline is relatively new. It arose at the beginning of the twentieth century; the University of California (Berkeley) formed America’s first “Department of Linguistics” in 1901. Edward Sapir and Leonard Bloomfield were two prominent early linguists. There was also structuralism, or structural linguistics, which dealt with signs, syntax, and other formal units of language; key figures included Ferdinand de Saussure and Roman Jakobson. In the 1950s, Noam Chomsky devised his generative theory of language and Universal Grammar, and the discipline really took off. Chomsky is thus often called the “father of modern linguistics”.

Before the twentieth century, from the Middle Ages through the 1800s, there was philology (the study of ancient languages and texts) and then comparative philology (the comparative study of European languages and language groups). The first formal study of language comes from India: around the fifth century BC, a man named Pāṇini categorized Sanskrit consonants and vowels, word classes like nouns and verbs, and other patterns.

One curious aspect of linguistics is that it has borrowed a good bit of terminology (and corresponding concepts) from biology. My brother is getting his PhD in lichenology, a little-known subfield of biology, and it’s super fun for us to chat about our respective fields because there’s an immediate overlap of understanding. For instance, linguistics uses the terms root, stem, and tree to describe words and phrase structures. It adopts jargon like morphology, genealogy, diachronic, and convergent and divergent evolution. A fascinating “language as organism”[2] metaphor appears frequently.

Lastly, linguistics is a small field. Even large university departments usually count no more than twenty to twenty-five graduate students at a time. For me at least, meeting another linguist at random, outside of dedicated school or work contexts, is a rare treat. Meeting people who want to talk about language, however, is wonderfully common! And understandably so: it applies to us all. I hope my post has provided a sprinkling of insight into this universal human subject.

Please check back soon for upcoming content – planned posts include a linguist’s perspective on speech errors, an explanation of the nifty phenomenon of metathesis (where sounds, syllables, or words are switched around), and summaries of Japanese and Korean writing systems.

[1] Another frequent response is: “So how many languages do you speak?” See this great post addressing the topic: http://allthingslinguistic.com/post/48473292525/why-linguists-hate-being-asked-how-many-languages

[2] Janse, M., Verlinden, A., & Uhlenbeck, E.M. (1998). Productivity and Studies in General and Descriptive Linguistics in Honor of E.M. Uhlenbeck. Trends in linguistics, 116, 197.