Digital voice agents like Alexa, Siri, and Google Assistant are all the rage these days. But when we talk to our smart devices, are they actually “understanding” our speech in the same way that another human understands it? Take the command, “Find flights from Chicago to New York on February 21.” We can easily comprehend this sentence; our newborn brains were predisposed to acquire language, and we’ve been using it ever since.
Computers, on the other hand, cannot acquire language. They must be trained. In order to train them, computational linguists, other linguists, and engineers have broken language down into more manageable parts that can be tackled individually. Automatic speech recognition (ASR) deals with training machines to recognize speech (via acoustic properties, etc.) and convert that speech to text. Next, natural language processing (NLP) attempts to figure out what is meant by that text[1]. An NLP system is itself composed of multiple modules[2], one of which will likely be a syntactic parser.
Today we’re going to delve into the parser component. Let’s start with some syntactic basics!
Syntax is the set of rules and processes governing sentence structure in any natural language. It involves things like word order and constituents (words or phrases that form functional units). One of the most common ways to represent syntactic information (at least as of the 20th century) is with a syntax tree. Traditional syntax trees specify:
- The words of a phrase/sentence
- Part of speech for each word, usually abbreviated:
  - N (noun); V (verb); P (preposition); D or DET (determiner, a.k.a. article); A (adjective); etc.
- Larger phrases, also abbreviated:
  - S (sentence); NP (noun phrase); VP (verb phrase); etc.
- Relationships between all of the words and phrases
  - These are hierarchical relationships that show how constituents combine into larger ones (or split into smaller ones, if starting from the opposite end of the tree)
Here’s a tree diagram (specifically, a constituency tree) for the sentence, “My parakeet drinks mimosas in the morning”:
You can see that my parakeet forms a larger chunk, a noun phrase (NP); in the morning forms a prepositional phrase (PP); drinks mimosas in the morning forms an even larger chunk, a verb phrase (VP); and the NP and VP combine to form the largest chunk of all, the full sentence S. Remember that syntax focuses on phrasal order and structure, not meaning or context – so it can’t tell us why on earth you’re feeding boozy orange juice to your pet bird.
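If you’d like to poke at that structure yourself, here’s a minimal sketch using NLTK’s Tree class (my choice of toolkit here is just an assumption – any library that reads bracketed trees would do) to build and print the same constituency tree:

```python
# A sketch: the parakeet constituency tree written in bracket notation and
# drawn with NLTK. Requires: pip install nltk
from nltk import Tree

bracketed = """
(S
  (NP (D My) (N parakeet))
  (VP
    (VP (V drinks) (NP (N mimosas)))
    (PP (P in) (NP (D the) (N morning)))))
"""

tree = Tree.fromstring(bracketed)
tree.pretty_print()   # prints an ASCII rendering of the tree in the terminal
```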
On to the parsing! Very generally, a parser is a piece of software (often a trained machine learning model) that takes input text and outputs a parse tree or similar structural representation, based on syntactic rules and statistics learned from its training data.
Syntactic parsers include a component called a Context-Free Grammar, which has:
- A set of non-terminal symbols – abbreviations for language constituents (lexical parts of speech and phrasal types):
  {S, NP, VP, PP, D, N, A…}
- A set of terminal symbols – words of the phrase/sentence:
  {drinks, parakeet, mimosas, morning, my, in, the}
- A set of rules like:
  S → NP VP (a sentence S is composed of a noun phrase NP and verb phrase VP)
  NP → D N (a noun phrase NP is composed of a determiner D and a noun N)
  VP → VP PP (etc.)
  PP → P NP
- A start symbol: S
The parser starts at S and applies its rules successively until it arrives at the terminal symbols. The resulting parse is the set of labeled relationships connecting those terminals (i.e. the words).
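To make that concrete, here’s a toy version of such a grammar in NLTK. The rule set and word lists below are my own minimal assumptions – they only cover the parakeet sentence; a real grammar would be far larger, or learned from a treebank:

```python
# A toy context-free grammar and chart parser in NLTK, covering only the
# parakeet sentence. Requires: pip install nltk
import nltk

grammar = nltk.CFG.fromstring("""
S -> NP VP
NP -> D N | N
VP -> V NP | VP PP
PP -> P NP
D -> 'my' | 'the'
N -> 'parakeet' | 'mimosas' | 'morning'
V -> 'drinks'
P -> 'in'
""")

parser = nltk.ChartParser(grammar)
tokens = "my parakeet drinks mimosas in the morning".split()

# The parser starts from S and applies rules until every token is covered,
# yielding one tree per valid analysis (just one here).
for tree in parser.parse(tokens):
    tree.pretty_print()
```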
There are two main kinds of syntactic parsers: dependency and constituency. To keep this post to a reasonable length, I’ll focus on dependency only, but constituency parsers output structures similar to the parakeet tree above[3]. A dependency parser builds a tree for each input sentence by starting with a sentence root (usually the main verb) and assigning a head word to each word until it reaches the end of the sentence. (Heads link to dependents.) When it’s done, each word has at least one branch, or relationship, with another word. The parser also characterizes each word-word relationship. These are things like: nominal subject of a verb (“nsubj”); object of a verb or a preposition (“dobj” and “pobj,” respectively); conjunction (“cc” for the conjunction word, and “conj” for the elements being conjoined); determiner (“det”); and adverbial modifier (“advmod”).
A visualized example will probably help. Taking that same sentence, “My parakeet drinks mimosas in the morning,” a visualization of the dependency parse might look like this:
Can you spot the root, or main verb? It’s the one without any arrows going towards it: drinks. The parser then finds the subject of drinks, which is parakeet, and labels that relationship “nsubj.” It finds mimosas as the direct object of drinks, and labels it “dobj.” And so on and so forth.
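If you have spaCy installed (and its small English model downloaded), you can reproduce a parse like this in a few lines. The exact labels can vary a bit between model versions, so treat the output shown in the comments as illustrative:

```python
# Inspecting the dependency parse with spaCy.
# Setup: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("My parakeet drinks mimosas in the morning")

for token in doc:
    # token.dep_ is the relation label; token.head is the word it points back to
    print(f"{token.text:10} {token.dep_:8} <- {token.head.text}")

# Expected output, roughly:
#   My         poss     <- parakeet
#   parakeet   nsubj    <- drinks
#   drinks     ROOT     <- drinks
#   mimosas    dobj     <- drinks
#   in         prep     <- drinks
#   the        det      <- morning
#   morning    pobj     <- in
```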
Let’s look at another example, for a dollop of variety. Here is “Mr. Vanderloop had smiled and said hello”:
In this one, the past participle smiled is the root/main verb, which has multiple dependents: its subject Vanderloop, its auxiliary (a.k.a. “helping verb”) had, its conjunction and, and the other verb with which it conjoins, said. The subject Vanderloop has a dependent Mr., with which it forms a compound (proper) noun; said’s dependent is the interjection hello.
How about our sentence from the beginning, “Find flights from Chicago to New York on February 21”? How might it be parsed? (You can check your hypotheses by typing the sentence into an interactive demo of the displaCy dependency visualizer, from which the visualizations above also came[4].) Something to keep in mind here is that English imperative structure leaves the subject – whoever is being addressed – implicit.
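You can also check your hypotheses programmatically rather than in the browser demo. Again, exact labels depend on the model version, so this is a sketch rather than the definitive parse:

```python
# Parsing the imperative sentence and rendering it with displaCy locally.
# Note there is no "nsubj" arc – the implicit "you" of the imperative never
# appears in the text, so the parser has nothing to attach a subject to.
import spacy
from spacy import displacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Find flights from Chicago to New York on February 21")

print(doc[0].text, doc[0].dep_)   # expect something like: Find ROOT
displacy.serve(doc, style="dep")  # serves the same arrow diagram in a browser
```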
A slight aside: I’ve chosen simple examples for demonstration, but parsing gets decidedly complicated when input sentences are themselves complicated. Questions, subordinate clauses, coordination (or all three: “What’s the name of the movie where the guy drives a flying taxi and saves the human race from aliens?”), and structurally ambiguous sentences (“The horse raced past the barn fell”) get tricky quickly.
So now we have some parsed output. How is this structured, annotated data useful? Well, one thing you can do with these word relations is identify noun phrases. Identifying noun phrases across sentences helps with another step in the NLP pipeline called Named Entity Recognition, or NER. NER tries to recognize nouns/noun phrases (names, places, dates, etc.) and label them with categories of concepts from the real world. In our flights example, “Chicago” and “New York” should get tagged with some label like CITY or GEOGRAPHIC LOCALE, and “February 21” should get tagged with DATE. Once a text has been automatically annotated for such named entities, information about those entities can then be pulled from a knowledge base (say, Wikipedia).
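Sticking with spaCy for illustration, the named entity recognizer runs in the same pipeline as the parser, so the entities are already on the parsed document. GPE (“geopolitical entity”) and DATE are the labels spaCy’s English models happen to use; other toolkits use different tag sets:

```python
# Named Entity Recognition on the flights sentence with spaCy.
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Find flights from Chicago to New York on February 21")

for ent in doc.ents:
    print(ent.text, ent.label_)

# Expected output, roughly:
#   Chicago      GPE
#   New York     GPE
#   February 21  DATE
```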
Having parts of speech and word relations also makes it easier to match up the specifics of a given user command (e.g. “Text mom saying I’ll call tonight,” or “Show popular Thai restaurants near me”) with slightly more generalized intents (e.g. Send text or Get restaurants); machine learning models can start learning how words typically pattern across the main verb and direct object positions for various commands. Code then uses the more generalized intent to fulfill that request on a device – be it smartphone, tablet, or home speaker. “Find flights from Chicago to New York on February 21” would hopefully be matched with a more general Get flights intent, and the particular noun phrases could be passed to fields for origin, destination, and date.
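Here’s a deliberately naive sketch of that last step – real assistants use trained intent classifiers and slot-filling models, but the parse is what makes the relevant pieces easy to pick out. The intent names and slot handling below are purely illustrative assumptions:

```python
# A naive intent/slot sketch built on top of spaCy's parse and entities.
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Find flights from Chicago to New York on February 21")

root = next(t for t in doc if t.dep_ == "ROOT")          # main verb: "Find"
dobjs = [t for t in root.children if t.dep_ == "dobj"]   # direct object: "flights"
intent = f"Get {dobjs[0].lemma_}" if dobjs else "Unknown"

# Named entities become candidate slot values; a real system would also use the
# prepositions ("from", "to", "on") to decide which slot each entity fills.
slots = [(ent.label_, ent.text) for ent in doc.ents]

print(intent)   # e.g. "Get flight" (lemma of "flights")
print(slots)    # e.g. [('GPE', 'Chicago'), ('GPE', 'New York'), ('DATE', 'February 21')]
```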
* * * * *
Before leaving you to your holiday leftovers, I’d like to reiterate that syntactic parsing is only one step in an NLP system. Its parses don’t tell us much about the actual semantics of the linguistic input. Language meaning, however, is a whole other ball of wax, best left for the new year…
[1] There is often terminological confusion between NLP and NLU (natural language understanding). See this graphic for one common breakdown, although I’ve heard the terms used interchangeably as well.
[2] If you’re interested to learn about other NLP steps, read this accessible post, Natural Language Processing is Fun!
[3] You can also play around with this interactive demo from Stanford CoreNLP, http://corenlp.run. In the second “Annotations” field dropdown, make sure you have “constituency parse” selected.
[4] The visualizer is from the creators of spaCy, an awesome open-source NLP library in Python; a dependency parser is one of its components.