Frame Semantics and FrameNet

[Image: FrameNet]

I’d like to discuss a theory in cognitive linguistics which is very near to my heart[1]: frame semantics. I’ll also present FrameNet, a database built using frame semantic theory, which has been and continues to be an excellent resource in the fields of natural language processing (NLP) and machine learning (ML).

Why is frame semantics cool? Why should you want to learn about it? Just this: the theory is an intuitive and comprehensive way to categorize the meaning of any scenario you could possibly dream up and express via language. Unlike many other semantic and syntactic theories, the core concepts are quickly understandable to the non-linguist. What’s more, frame semantics can apply to language meaning at many different levels (from the tiniest morpheme to entire swaths of discourse), and it works equally well for any particular language – be it English, Mandarin, Arabic, or Xhosa. I’ll try to demonstrate the theory’s accessibility and applicability with some details.

American linguist Charles Fillmore developed the frame semantics research program in the 1980s, using the central idea of a frame: a cognitive scene or situation which is based on a person’s prototypical understanding of real-world (social, cultural, biological) experiences. A frame is ‘evoked’ by language – this can be a single word (called a lexical unit), a clause, a sentence, or even longer discourse. Each frame contains various participants and props, called frame elements (FEs). If you’ve studied syntax/semantics (the generative grammar kind), FEs are somewhat analogous to traditional theta roles.
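To make those concepts concrete, here's a minimal sketch of how a frame might be represented in code – the class and field names are my own invention, not an official FrameNet format:

```python
from dataclasses import dataclass

@dataclass
class Frame:
    """A deliberately simplified stand-in for a semantic frame."""
    name: str             # e.g. "Motion"
    definition: str       # prose description of the scene
    frame_elements: list  # participants and props (FEs)
    lexical_units: list   # words that evoke the frame

motion = Frame(
    name="Motion",
    definition="An entity starts out in one place and ends up in another.",
    frame_elements=["Theme", "Source", "Goal", "Path", "Area", "Direction", "Distance"],
    lexical_units=["move.v", "drift.v", "float.v", "roll.v", "go.v"],
)
```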

FrameNet is a corpus-based lexicographic and relational database (sort of a complex dictionary) of English frames, the lexical units evoking them, annotated sentences containing those lexical units, and a hierarchy of frame-to-frame relations. It was built and continues to grow at the International Computer Science Institute (ICSI), a nonprofit research center affiliated with UC Berkeley. FrameNets have also been developed in other languages, such as Spanish, Brazilian Portuguese, Japanese, Swedish, French, Chinese, Italian, and Hebrew.

Each frame entry includes a definition, example sentences, frame elements, lexical units, and annotation that illustrates the various fillers (words) of the FEs as well as their syntactic patterns. Let’s unpack all of this!

We’ll take a look at the Motion frame in FrameNet. Some screenshots of the frame entry follow.

[Screenshot: the Motion frame entry in FrameNet – definition and example sentences]

The Motion frame is first defined. Its definition includes the frame elements that belong to the frame (the text with color highlighting):

“Some entity (Theme) starts out in one place (Source) and ends up in some other place (Goal), having covered some space between the two (Path). Alternatively, the Area or Direction in which the Theme moves or the Distance of the movement may be mentioned.”

After the definition come example sentences, featuring lexical units that evoke the frame (the black-backgrounded text) such as move, drift, float, roll, go.

Further down is the list of frame elements with their definitions and examples.

[Screenshot: the Motion frame entry – frame elements with definitions and examples]

Here, the Theme FE is “the entity that changes location,” while the Goal FE is “the location the Theme ends up in.” For language to evoke this Motion frame, it must contain words or phrases that instantiate FEs like these. In the examples above, me is a Theme in The explosion made [me] MOVE in a hurry; and into the slow lane is a Goal in The car MOVED [into the slow lane].
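In machine-readable form, one of these annotated examples might look something like the sketch below (the span-based layout is illustrative, not FrameNet's actual file format):

```python
# A hypothetical span-based annotation for "The car MOVED into the slow lane."
annotation = {
    "sentence": "The car moved into the slow lane.",
    "target": {"lexical_unit": "move.v", "span": (8, 13)},  # character offsets of "moved"
    "frame": "Motion",
    "frame_elements": [
        {"name": "Theme", "span": (0, 7)},    # "The car"
        {"name": "Goal",  "span": (14, 32)},  # "into the slow lane"
    ],
}
```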

At the bottom of the entry is a list of lexical units that belong to or evoke the frame, as well as links to annotation of sentences from real data that contain those words.

[Screenshot: the Motion frame entry – lexical units and annotation links]

Verbs like come, glide, roll, travel, and zigzag all evoke, quite sensibly, the Motion frame.

Once you click on the “Annotation” link for a particular lexical item, you’re taken to a page that looks like this:

[Screenshot: FrameNet annotation page for a Motion lexical unit]

Natural language sentences pulled from online corpora (texts from newspapers, magazines, books, TV transcripts, scholarly articles, etc.) are annotated for their Motion FEs. Annotation for the lexical item glide gives us an idea of the types of “entities” (the purple-backgrounded text, or Theme FEs) that “change location” (i.e. that glide) – boats, pink clouds, men, cars, planes, gondolas, and so on.

* * * * *

After this mini FrameNet dive, you may be wondering how the database is used in a concrete sense. To illustrate, let’s compare two sentences:

  1. The boat GLIDED into the harbor.
  2. The dinghy DRIFTED away from the harbor.

The entities differ (boat vs. dinghy), the verbs differ (glide vs. drift) and the prepositions differ (into vs. [away] from). Yet at a higher level, both of these sentences describe a Theme which “changes location” – either moving towards a Goal in (1), or from a Source in (2). They both indicate motion. Because FrameNet helps machines “learn” that sentences with a variety of nouns, verbs, prepositions, and syntactic patterns can basically point to the same scenario, it’s a useful tool for many applications in the computational realm.
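In computational terms, the payoff is that two superficially different sentences can be normalized to one frame-level representation. A rough sketch of that shared output (again, the format is invented for illustration):

```python
# Two different surface forms, one underlying frame.
parse_1 = {"frame": "Motion", "Theme": "the boat",   "Goal":   "into the harbor"}
parse_2 = {"frame": "Motion", "Theme": "the dinghy", "Source": "away from the harbor"}

# A downstream system can now treat both as instances of the same scenario:
assert parse_1["frame"] == parse_2["frame"]  # both describe motion
```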

These days computers do all kinds of language-y things for us: answer questions, paraphrase texts, extract relevant information from text (and then maybe organize it thematically – for instance, around people, places, or events), and even generate new texts. These feats require that a computer parse natural language into accurate semantic chunks. FrameNet’s semantically- and syntactically-annotated data can be used as training input for machine models that “learn” how to analyze such meaning chunks, enabling our electronic devices to respond, paraphrase, or extract information appropriately.
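If you'd like to poke at the data programmatically, NLTK ships a FrameNet corpus reader; something like the following should work after a one-time data download:

```python
import nltk
nltk.download("framenet_v17")  # one-time download of the FrameNet 1.7 data

from nltk.corpus import framenet as fn

motion = fn.frame("Motion")
print(motion.definition)                  # the frame's definition text
print(sorted(motion.FE.keys()))           # FEs: Area, Direction, Distance, Goal, ...
print(sorted(motion.lexUnit.keys())[:5])  # LUs like 'drift.v', 'float.v', ...
```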

To peruse a (very long) list of the projects which have used FrameNet data (organized by requester/researcher), check out the FrameNet Downloaders page.

So – on the off-chance that you find yourself stuck at home and bored out of your mind (?!?!)… you might perhaps enjoy a little investigation of frame-semantic characterization of scenes that involve applying heat, intoxication, or temporal collocation. 🙂

 

[1] Why am I so fond of frame semantics? A terrific professor of mine during grad school introduced the theory, and it resonated with me immediately. I used it in my master’s thesis, then presented the paper at the International Conference on Construction Grammar in 2014. Eventually, I had the privilege of working at FrameNet, where I came to know the brilliant lexicographers/semanticists/cognitive linguists who have dedicated decades of their lives to the theory and the project. Sadly, I never met the legendary Chuck Fillmore, as he passed away the year before I joined the FrameNet team.

O Syntax Tree, O Syntax Tree!

[Image: how_lovely]

Digital voice agents like Alexa, Siri, and Google Assistant are all the rage these days. But when we talk to our smart devices, are they actually “understanding” our speech in the same way that another human understands it? Take the command, “Find flights from Chicago to New York on February 21.” We can easily comprehend this sentence; our newborn brains were predisposed to acquire language, and we’ve been using it ever since.

Computers, on the other hand, cannot acquire language. They must be trained. To train them, linguists (computational and otherwise) and engineers have broken language down into more manageable parts that can be tackled individually. Automatic speech recognition (ASR) deals with training machines to recognize speech (via acoustic properties, etc.) and convert that speech to text. Next, natural language processing (NLP) attempts to figure out what is meant by that text[1]. An NLP system itself is composed of multiple modules[2], one of which will likely be a syntactic parser.

Today we’re going to delve into the parser component. Let’s start with some syntactic basics!

Syntax is the set of rules and processes governing sentence structure in any natural language. It involves things like word order and constituents (words or phrases that form functional units). One of the most common ways to represent syntactic information (at least as of the 20th century) is with a syntax tree. Traditional syntax trees specify:

  • The words of a phrase/sentence
  • Part of speech for each word, usually abbreviated
    • N (noun); V (verb); P (preposition); D or DET (determiner, a.k.a. article); A (adjective); etc.
  • Larger phrases, also abbreviated
    • S (sentence); NP (noun phrase); VP (verb phrase); etc.
  • Relationships between all of the words and phrases
    • These are hierarchical relationships that show how constituents combine into larger ones (or split into smaller ones, if starting from the opposite end of the tree)

Here’s a tree diagram (specifically, a constituency tree) for the sentence, “My parakeet drinks mimosas in the morning”:

[Image: constituency tree for “My parakeet drinks mimosas in the morning”]

You can see that my parakeet forms a larger chunk which is a noun phrase, in the morning forms a larger chunk which is a prepositional phrase, drinks mimosas in the morning forms an even larger chunk which is a verb phrase, and both the NP and VP combine to form the largest chunk, a full sentence S. Remember that syntax focuses on phrasal order and structure, not meaning or context – so it can’t tell us why on earth you’re feeding boozy orange juice to your pet bird.
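The same structure can be written as a labeled bracketing and explored with NLTK's Tree class – the phrase labels below follow the simplified scheme of the diagram:

```python
from nltk import Tree

# "My parakeet drinks mimosas in the morning" as a labeled bracketing
t = Tree.fromstring(
    "(S (NP (D My) (N parakeet))"
    "   (VP (VP (V drinks) (NP (N mimosas)))"
    "       (PP (P in) (NP (D the) (N morning)))))"
)
t.pretty_print()  # draws an ASCII rendering of the tree
print(t.label())  # 'S' – the largest chunk
```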

On to the parsing! Very generally, a parser is a piece of software (often a trained machine learning model) that takes input text and outputs a parse tree or similar structural representation, based on syntactic rules and statistics learned from its training data.

Syntactic parsers include a component called a Context-Free Grammar, which has:

  1. A set of non-terminal symbols – abbreviations for language constituents (lexical parts of speech and phrasal types):

{S, NP, VP, PP, D, N, A…}

  2. A set of terminal symbols – words of the phrase/sentence:

{drinks, parakeet, mimosas, morning, my, in, the}

  3. A set of rules like:

S → NP VP  (a sentence S is composed of a noun phrase NP and verb phrase VP)

NP → D N  (a noun phrase NP is composed of a determiner D and a noun N)

VP → VP PP  (etc.)

PP → P NP

  4. A start symbol: S

The parser starts at S, and applies its rules successively, until it arrives at the terminal symbols. The resulting parse is the labeled relationships connecting those terminals (i.e. words).
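Here's a minimal runnable version of that grammar in NLTK, covering just our parakeet sentence:

```python
import nltk

grammar = nltk.CFG.fromstring("""
    S  -> NP VP
    NP -> D N | N
    VP -> V NP | VP PP
    PP -> P NP
    D  -> 'my' | 'the'
    N  -> 'parakeet' | 'mimosas' | 'morning'
    V  -> 'drinks'
    P  -> 'in'
""")

parser = nltk.ChartParser(grammar)
tokens = "my parakeet drinks mimosas in the morning".split()
for tree in parser.parse(tokens):  # applies the rules from S down to the terminals
    tree.pretty_print()
```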

There are two main kinds of syntactic parsers: dependency and constituency. To keep this post to a reasonable length, I’ll focus on dependency only, but constituency parsers output structures similar to the parakeet tree above[3]. A dependency parser builds a tree for each input sentence by starting with a sentence root (usually the main verb), and assigning a head word to each word, until it gets to the end of the sentence. (Heads link to dependents.) When it’s done, each word has at least one branch, or relationship, with another word. The parser also characterizes each word-word relationship. These are things like: nominal subject of a verb (“nsubj”); object of a verb or a preposition (“dobj” and “pobj,” respectively); conjunction (“cc” for the conjunction word, and “conj” for the elements being conjoined); determiner (“det”); and adverbial modifier (“advmod”).

A visualized example will probably help. Taking that same sentence, “My parakeet drinks mimosas in the morning,” a visualization of the dependency parse might look like this:

[Image: displaCy dependency parse of “My parakeet drinks mimosas in the morning”]

Can you spot the root, or main verb? It’s the one without any arrows going towards it: drinks. The parser then finds the subject of drinks, which is parakeet, and labels that relationship “nsubj.” It finds mimosas as the direct object of drinks, and labels it “dobj.” And so on and so forth.
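You can reproduce this parse yourself with spaCy (assuming you've installed it and its small English model via `python -m spacy download en_core_web_sm`; exact labels can vary a bit across model versions):

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("My parakeet drinks mimosas in the morning")

for token in doc:
    # each word, its relation label, and the head word it depends on
    print(f"{token.text:10} {token.dep_:10} head: {token.head.text}")
```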

Let’s look at another example, for a dollop of variety. Here is “Mr. Vanderloop had smiled and said hello”:

[Image: displaCy dependency parse of “Mr. Vanderloop had smiled and said hello”]

In this one, the past participle smiled is the root/main verb, which has multiple dependents: its subject Vanderloop, its auxiliary (a.k.a. “helping verb”) had, its conjunction and, and the other verb with which it conjoins, said. The subject Vanderloop has a dependent Mr., with which it forms a compound (proper) noun; said’s dependent is the interjection hello.

How about our sentence from the beginning, “Find flights from Chicago to New York on February 21”? How might it be parsed? (You can check your hypotheses by typing the sentence into an interactive demo of the displaCy dependency visualizer, from which the visualizations above also came[4].) Something to keep in mind here is that English imperative structure leaves the subject – whoever is being addressed – implicit.
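If you'd rather generate the visualization locally than use the web demo, displaCy ships with spaCy:

```python
import spacy
from spacy import displacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Find flights from Chicago to New York on February 21")
displacy.serve(doc, style="dep")  # serves the arrow diagram in your browser
```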

A slight aside: I’ve chosen simple examples for demonstration, but parsing gets decidedly complicated when input sentences are themselves complicated. Questions, subordinate clauses, coordination (or all three: “What’s the name of the movie where the guy drives a flying taxi and saves the human race from aliens?”), and structurally ambiguous sentences (“The horse raced past the barn fell”) get tricky quickly.

So now we have some parsed output. How is this structured, annotated data useful? Well, one thing you can do with these word relations is identify noun phrases. Identifying noun phrases across sentences helps with another step in the NLP pipeline called Named Entity Recognition, or NER. NER tries to recognize nouns/noun phrases (names, places, dates, etc.) and label them with categories of concepts from the real world. In our flights example, “Chicago” and “New York” should get tagged with some label like CITY or GEOGRAPHIC LOCALE, and “February 21” should get tagged with DATE. Once a text has been automatically annotated for such named entities, information about those entities can then be pulled from a knowledge base (say, Wikipedia).
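With spaCy, the named entities live on the same parsed doc object (labels like GPE and DATE are the model's conventions, and the exact output depends on the model version):

```python
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Find flights from Chicago to New York on February 21")

for ent in doc.ents:
    print(ent.text, ent.label_)
# expect something like: Chicago GPE / New York GPE / February 21 DATE
```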

Having parts of speech and word relations also makes it easier to match up the specifics of a given user command (e.g. “Text mom saying I’ll call tonight,” or “Show popular Thai restaurants near me”) with slightly more generalized intents (e.g. Send text or Get restaurants); machine models can start learning how words typically pattern across the main verb and direct object positions for various commands. Code then uses the more generalized intent to fulfill that request on a device – be it smartphone, tablet, or home speaker. “Find flights from Chicago to New York on February 21” would hopefully be matched with a more general Get flights intent, and the particular noun phrases could be passed to fields for origin, destination, and date.
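As a toy illustration of that matching step, here's a sketch that pulls the root verb and its direct object out of a dependency parse and looks them up in a hand-written intent table – the table and the intent names are invented for this example:

```python
import spacy

nlp = spacy.load("en_core_web_sm")

# Hypothetical mapping from (root verb lemma, direct object lemma) to an intent.
INTENT_TABLE = {
    ("find", "flight"): "Get flights",
    ("show", "restaurant"): "Get restaurants",
    ("text", "mom"): "Send text",
}

def match_intent(command):
    """Return a generalized intent for a parsed command, or None."""
    doc = nlp(command)
    root = next(tok for tok in doc if tok.dep_ == "ROOT")
    dobj = next((tok for tok in root.children if tok.dep_ == "dobj"), None)
    if dobj is None:
        return None
    return INTENT_TABLE.get((root.lemma_.lower(), dobj.lemma_.lower()))

print(match_intent("Find flights from Chicago to New York on February 21"))
# hopefully: 'Get flights'
```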

* * * * *

Before leaving you to your holiday leftovers, I’d like to reiterate that syntactic parsing is only one step in an NLP system. Its parses don’t tell us much about the actual semantics of the linguistic input. Language meaning, however, is a whole other ball of wax, best left for the new year…

 

[1] There is often terminological confusion between NLP and NLU (natural language understanding). See this graphic for one common breakdown, although I’ve heard the terms used interchangeably as well.

[2] If you’re interested to learn about other NLP steps, read this accessible post, Natural Language Processing is Fun!

[3] You can also play around with this interactive demo from Stanford CoreNLP, http://corenlp.run. In the second “Annotations” field dropdown, make sure you have “constituency parse” selected.

[4] The visualizer is from the creators of spaCy, an awesome open-source NLP library in Python; a dependency parser is one of its components.

On machine translation, the media, and meaning (a response)

[Image: header from “The Shallowness of Google Translate”]

I’m a Douglas Hofstadter fan. I read his book I Am a Strange Loop years ago, and it remains one of my three favorite non-fiction books, period. I highly recommend it to anyone who is at all interested in the nature of consciousness. The cognitive scientist’s Pulitzer Prize-winning Gödel, Escher, Bach: An Eternal Golden Braid has also been on my to-read list for a long time. So I was excited to see this article by him in The Atlantic, on another area that interests me: machine translation and machine “intelligence”.

Early on in the piece, Hofstadter says he has a “longstanding belief that it’s important to combat exaggerated claims about artificial intelligence”. Having worked in the machine learning/AI field for a little under a year now (but what an intense year it has been!), and having read countless popular media articles touting the astonishing advances in natural language processing/understanding, ML, and AI, I heartily agree with his sentiment. Such reporting is as misleading as it is annoying.

I came across a statement of this type the other day, in Stanford AI researchers make ‘socially inclusive’ NLP:

“The average person working with NLP today may consider language identification a solved problem.”

I have trouble believing that any researcher working in NLP/NLU/ML/AI thinks anything is a solved problem. Despite much progress, the field is still in its infancy. Doesn’t anyone remember Einstein’s quote (adapted from a similar idea expressed by Socrates) – “The more I learn, the more I realize how much I don’t know”? Where I work, every possible solution to a given problem brings up more questions, and even the “simplest” “facts” cannot always be taken for granted. (Remember when you were taught parts of speech like verb, noun, and preposition in grade school? Working at the level of detail we do, even these fundamental rules are often inadequate, requiring further specification. Turns out it’s hard to throw messy, real language into clean, fixed bins.) So I think the media does the field, its researchers, and the reading public a great disservice by sensationalizing and oversimplifying the challenges.

Hofstadter’s argument about understanding is even more poignant:

“The practical utility of Google Translate and similar technologies is undeniable, and probably it’s a good thing overall, but there is still something deeply lacking in the approach, which is conveyed by a single word: understanding. Machine translation has never focused on understanding language. Instead, the field has always tried to ‘decode’— to get away without worrying about what understanding and meaning are.”

We call the study of meaning and understanding semantics and pragmatics. People’s real-world knowledge plays a key role here as well. To my mind, meaning (only complete when tied to real-world knowledge) is the last frontier for AI and language. Today’s mobile/home voice assistants have definitely not yet mastered meaning. Technologies have made serious headway in resolving structural patterns (syntax), proper nouns (Named Entity Recognition), and some other aspects of language. But meaning, that great magical beast, eludes its pursuers. It is really, really challenging to computationally model the depth and complexity of human understanding. Although language itself is quite complicated, it’s still an impoverished medium for conveying the millions of subtle things we want and are able to convey – it relies heavily on context, implicature, presupposition, entailment, prosody, the speaker-listener relationship, etc. I agree again with the author when he says that human-like machines are “not around the corner.”

I do think Hofstadter, even while recognizing how hard the task of modeling meaning is, doesn’t give enough credit for what has been accomplished. Google Translate is way better now than it was at its inception over ten years ago. I also think he glosses over the tool’s main usage – which is more functional than artistic or poetic. Am I wrong to assume people use Translate much more for a quick word or phrase, when traveling or speaking on the fly, than for translating longer passages of literature? If they’re doing the latter… per the author’s experimental results, they clearly shouldn’t be.

 

What do you think – about media reportage of technology, machine “intelligence”, Hofstadter’s article? Feel free to comment!