After much delay (eek! just realized it’s been a year!), I have another interview with a career linguist for your reading pleasure. [See the first interview here.] Even though I still get the “I’ve never met a real-live linguist” reaction when telling folks what I do, these days there are indeed people working full-time, earning living wages, as these specialized language nuts – and not all as professors in academia, or as translators/interpreters for the UN.
* * * * *
Just like with my last interviewee, I met Allan at Samsung Research America, where we worked together on Bixby, Samsung’s virtual voice assistant. On the Bixby linguist team, we worked with engineers, Quality Assurance (QA) testers and others to develop a personal assistant that would carry out thousands of different spoken user commands. Also like with my last interviewee, Allan is no longer at the job I interviewed him about. (He’s now a Language Engineer on Amazon’s Alexa!) I’m keeping questions and answers in present tense, however, because I feel like it.
- What kind of work do you do?
I’m a computational linguist, which means I create solutions for natural language processing problems using computers. More specifically, I work on the systems and machine learning models that enable your smart devices to understand you when you say “set an alarm for 7am” or “tell me the weather in Chicago”.
- Describe a typical day at your job.
I usually start the day by meeting with my manager. The lab I work in supports products in production and conducts research and development for smart devices. If there is an issue with a product in production, I’ll work with the team to solve the problem. Usually this involves curating the training data for the machine learning models – removing aberrant data from training sets or generating new data to support missing patterns. If nothing is on fire, there are usually several projects I’ll be working on at any given time. Projects generally start out with me doing a lot of reading on the state of the art, then I’ll reach a point where I’m confident enough to build a proof of concept (POC). While I’m creating the POC, the linguists will generate data for the models. Once the code and data are ready, I’ll build the models and keep iterating until performance is satisfactory. The only really dependable thing in my schedule is lunch and a mid-afternoon coffee break with colleagues, which are both indispensable.
- How does your linguistics background inform your current work?
My degree in linguistics is crucial for my current line of work. When building machine learning models, so much rests on the data you feed into your models. If your data set is diverse and representative of the problem, your model will be robust.
Having a linguistics background also gives me quick insight into data sets and how to balance them. Understanding the latent structures in the data allows me to engineer informative feature vectors for my models (feature vectors are derived from the utterances collected and are the true inputs to the machine learning model).
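For readers curious what a feature vector can look like, here is a minimal sketch of my own. It is not Allan's code or anything from Bixby. It assumes the simplest possible scheme, a bag-of-words count over whitespace-separated tokens, and the function names `build_vocabulary` and `to_feature_vector` are made up for illustration. Real systems use much richer features.

```python
from collections import Counter

def build_vocabulary(utterances):
    """Map each distinct lowercase token to a fixed column index.

    Assumption: tokens are just whitespace-separated words.
    """
    tokens = sorted({tok for utt in utterances for tok in utt.lower().split()})
    return {tok: index for index, tok in enumerate(tokens)}

def to_feature_vector(utterance, vocabulary):
    """Turn one utterance into a list of per-token counts.

    Tokens that are not in the vocabulary are ignored.
    """
    vector = [0] * len(vocabulary)
    for tok, count in Counter(utterance.lower().split()).items():
        if tok in vocabulary:
            vector[vocabulary[tok]] = count
    return vector

# The two example commands quoted earlier in the interview.
utterances = [
    "set an alarm for 7am",
    "tell me the weather in Chicago",
]
vocabulary = build_vocabulary(utterances)

for utt in utterances:
    print(utt, to_feature_vector(utt, vocabulary))
```

Each utterance becomes a list of numbers with one slot per known word, and that list, rather than the raw text, is what a model would actually receive.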
- What do you enjoy most and/or least about the job?
I really enjoy getting to see differences between human and machine learning. We have a pretty good idea of the types of things humans will attend to when learning language, but sometimes those things aren’t informative for machines. It can be frustrating when something I’d call “obvious” is useless in a model and even more frustrating when something “marginal” is highly informative. But I never tire of the challenge; the satisfaction I feel at the end of a project is worth it.
The thing I enjoy least is data annotation. The process of doing it is indispensable because you become intimately familiar with the problem, but after a couple of hours of it my mind goes numb.
- What did you study in college and/or grad school?
I got my BA from Rutgers University and my MS from the University of California, Santa Cruz. Both degrees were in linguistics and both schools specialized in generative linguistic theory. I enjoyed a lot about the programs but they did a better job of preparing people for careers in academia than industry. Learning programming or common annotation tools and schemas before graduating would have made industry life easier for me.
- What is your favorite linguistic phenomenon?
Loanword adaptation! I wrote my master’s thesis on it. Seeing how unfamiliar phonemes are digested by speakers never fails to pique my interest. In general, I love it when stable systems are forced to reconcile things outside their realm of experience.
- (If you had the time) what language would you learn, and why?
As a phonetician I’d love to learn Georgian for its consonant clusters, Turkish for its morpho-phonology, Hmong for its tones, or ASL because it’s a completely different modality than what I specialized in. As a subjective entity who does things for personal enjoyment, I’d love to learn Japanese.
- Do you have any advice for young people looking to pursue a career in linguistics?
If you want to go into industry doing natural language processing, I cannot stress enough how important the ability to code is. It’s true that for annotation work you won’t usually need it, but if you want to be annotation lead, the ability to write utility scripts will save you a lot of time. Also, my transition from annotator to computational linguist came from showing basic coding competency – the engineers were too busy to work on some projects so they threw the smaller ones my way. This brings me to my next piece of advice: always voice your interest in things that interest you to those with the potential to get you involved. Telling your co-worker you really want to work on a cool project will do next to nothing, but telling your manager or the project lead that you are interested in a project may get you involved.