Frame Semantics and FrameNet


I’d like to discuss a theory in cognitive linguistics which is very near to my heart[1]: frame semantics. I’ll also present FrameNet, a database built using frame semantic theory, which has been and continues to be an excellent resource in the fields of natural language processing (NLP) and machine learning (ML).

Why is frame semantics cool? Why should you want to learn about it? Just this: the theory is an intuitive and comprehensive way to categorize the meaning of any scenario you could possibly dream up and express via language. Unlike many other semantic and syntactic theories, the core concepts are quickly understandable to the non-linguist. What’s more, frame semantics can apply to language meaning at many different levels (from the tiniest morpheme to entire swaths of discourse), and it works equally well for any particular language – be it English, Mandarin, Arabic, or Xhosa. I’ll try to demonstrate the theory’s accessibility and applicability with some details.

American linguist Charles Fillmore developed the frame semantics research program in the 1970s and 1980s, using the central idea of a frame: a cognitive scene or situation which is based on a person’s prototypical understanding of real-world (social, cultural, biological) experiences. A frame is ‘evoked’ by language – this can be a single word (called a lexical unit), a clause, a sentence, or even longer discourse. Each frame contains various participants and props, called frame elements (FEs). If you’ve studied syntax/semantics (the generative grammar kind), FEs are somewhat analogous to traditional theta roles.

FrameNet is a corpus-based lexicographic and relational database (sort of a complex dictionary) of English frames, the lexical units evoking them, annotated sentences containing those lexical units, and a hierarchy of frame-to-frame relations. It was built and continues to grow at the International Computer Science Institute (ICSI), a nonprofit research center affiliated with UC Berkeley. FrameNets have also been developed in other languages, such as Spanish, Brazilian Portuguese, Japanese, Swedish, French, Chinese, Italian, and Hebrew.

Each frame entry includes a definition, example sentences, frame elements, lexical units, and annotation that illustrates the various fillers (words) of the FEs as well as their syntactic patterns. Let’s unpack all of this!

We’ll take a look at the Motion frame in FrameNet. Some screenshots of the frame entry follow.

[Screenshot: FrameNet Motion frame entry, part 1]

The Motion frame is first defined. Its definition includes the frame elements that belong to the frame (the text with color highlighting):

“Some entity (Theme) starts out in one place (Source) and ends up in some other place (Goal), having covered some space between the two (Path). Alternatively, the Area or Direction in which the Theme moves or the Distance of the movement may be mentioned.”
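Think of a frame entry as a data structure. Its core pieces are a name, a definition, a set of FEs, and the lexical units that evoke it. Here's a minimal Python sketch of the Motion frame along those lines. The `Frame` class and its field names are my own hypothetical choices, not FrameNet's actual schema. The real data is distributed as XML, and NLTK includes a reader for it (`nltk.corpus.framenet`) once the FrameNet corpus has been downloaded. Only the FE names, the definition text, and the lexical units come from the entry itself.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    """A hypothetical, heavily simplified frame entry (not FrameNet's real schema)."""
    name: str
    definition: str
    # Frame element (FE) names only; real entries also give each FE a definition.
    frame_elements: tuple[str, ...]
    # Lexical units written "lemma.pos", e.g. "glide.v".
    lexical_units: set[str] = field(default_factory=set)

    def evoked_by(self, lu: str) -> bool:
        """Return True if the lexical unit is listed as evoking this frame."""
        return lu in self.lexical_units


MOTION = Frame(
    name="Motion",
    definition=(
        "Some entity (Theme) starts out in one place (Source) and ends up in some "
        "other place (Goal), having covered some space between the two (Path)."
    ),
    frame_elements=("Theme", "Source", "Goal", "Path", "Area", "Direction", "Distance"),
    lexical_units={"move.v", "drift.v", "float.v", "roll.v", "go.v"},
)

print(MOTION.evoked_by("drift.v"))  # True
print(MOTION.evoked_by("eat.v"))    # False
```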

After the definition come example sentences, featuring lexical units that evoke the frame (the black-backgrounded text), such as move, drift, float, roll, and go.

Further down is the list of frame elements with their definitions and examples.

[Screenshot: FrameNet Motion frame entry, part 2]

Here, the Theme FE is “the entity that changes location,” while the Goal FE is “the location the Theme ends up in.” When language evokes this Motion frame, its words or phrases instantiate some of these FEs (the Theme, the Goal, and so on), although not every FE needs to be expressed in a given sentence. In the examples above, me is a Theme in The explosion made [me] MOVE in a hurry; and into the slow lane is a Goal in The car MOVED [into the slow lane].
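At its simplest, annotation means two things. You mark the word that evokes the frame (the target), and you label stretches of the sentence with FE names. The Python sketch below records each label as a pair of character offsets, loosely in the spirit of FrameNet's own annotation (whose real format is richer). It then rebuilds the bracket notation used in this post. The `Span` class and the `span_of` and `render` helpers are hypothetical, written only for this illustration.

```python
from dataclasses import dataclass


@dataclass
class Span:
    label: str  # an FE name such as "Goal", or "TARGET" for the frame-evoking word
    start: int  # offset of the first character
    end: int    # offset just past the last character


def span_of(sentence: str, text: str, label: str) -> Span:
    """Locate the first occurrence of `text` in `sentence` and label it."""
    start = sentence.index(text)  # raises ValueError if `text` is missing
    return Span(label, start, start + len(text))


def render(sentence: str, spans: list[Span]) -> str:
    """Rebuild this post's notation: target in CAPS, FE fillers in [brackets]."""
    out, pos = [], 0
    for span in sorted(spans, key=lambda s: s.start):
        out.append(sentence[pos:span.start])
        chunk = sentence[span.start:span.end]
        out.append(chunk.upper() if span.label == "TARGET" else f"[{chunk}]")
        pos = span.end
    out.append(sentence[pos:])
    return "".join(out)


sentence = "The car moved into the slow lane."
spans = [span_of(sentence, "moved", "TARGET"),
         span_of(sentence, "into the slow lane", "Goal")]
print(render(sentence, spans))  # The car MOVED [into the slow lane].
```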

At the bottom of the entry is a list of lexical units that belong to or evoke the frame, as well as links to annotation of sentences from real data that contain those words.

[Screenshot: FrameNet Motion frame entry, part 3]

Verbs like come, glide, roll, travel, and zigzag all evoke, quite sensibly, the Motion frame.
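You can also go the other way, from a word to the frames it can evoke, with a simple index inversion. In FrameNet a single lemma can belong to several frames (one lexical unit per sense), so the index maps each lemma to a list. Here's a minimal Python sketch, assuming a hand-copied excerpt of the Motion lexical units mentioned in this post. The names `FRAME_TO_LUS` and `build_lemma_index` are hypothetical.

```python
from collections import defaultdict

# Hypothetical excerpt: frame name -> lexical units ("lemma.pos") mentioned in this post.
FRAME_TO_LUS = {
    "Motion": ["come.v", "drift.v", "float.v", "glide.v", "go.v",
               "move.v", "roll.v", "travel.v", "zigzag.v"],
}


def build_lemma_index(frame_to_lus):
    """Invert frame -> LUs into lemma -> sorted list of frames the lemma can evoke."""
    index = defaultdict(set)
    for frame, lus in frame_to_lus.items():
        for lu in lus:
            lemma, _, _pos = lu.partition(".")
            index[lemma].add(frame)
    return {lemma: sorted(frames) for lemma, frames in index.items()}


INDEX = build_lemma_index(FRAME_TO_LUS)
print(INDEX["glide"])         # ['Motion']
print(INDEX.get("eat", []))  # []
```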

Once you click on the “Annotation” link for a particular lexical item, you’re taken to a page that looks like this:

[Screenshot: FrameNet annotation page for a Motion lexical unit]

Natural language sentences pulled from online corpora (texts from newspapers, magazines, books, TV transcripts, scholarly articles, etc.) are annotated for their Motion FEs. Annotation for the lexical item glide gives us an idea of the types of “entities” (the purple-backgrounded text, or Theme FEs) that “change location” (i.e. that glide) – boats, pink clouds, men, cars, planes, gondolas, and so on.

* * * * *

After this mini FrameNet dive, you may be wondering how the database is used in a concrete sense. To illustrate, let’s compare two sentences:

  1. The boat GLIDED into the harbor.
  2. The dinghy DRIFTED away from the harbor.

The entities differ (boat vs. dinghy), the verbs differ (glide vs. drift), and the prepositions differ (into vs. [away] from). Yet at a higher level, both of these sentences describe a Theme which “changes location” – either moving towards a Goal in (1), or from a Source in (2). They both indicate motion. Because FrameNet helps machines “learn” that sentences with a variety of nouns, verbs, prepositions, and syntactic patterns can basically point to the same scenario, it’s a useful tool for many applications in the computational realm.
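To make the point concrete, here's a deliberately naive Python sketch that maps both sentences onto the same frame. It is a toy, not how real frame-semantic parsers work (those are statistical models trained on annotation like FrameNet's). The verb lexicon `VERB_FRAMES` and the preposition cues in `PREP_TO_FE` are hypothetical and cover only these two sentences. The subject is simply assumed to be the Theme.

```python
# Hypothetical toy lexicon: inflected verb form -> (lemma, frame it evokes).
VERB_FRAMES = {
    "glided": ("glide", "Motion"),
    "drifted": ("drift", "Motion"),
}

# Hypothetical cue: first word of the phrase after the verb -> frame element.
PREP_TO_FE = {"into": "Goal", "away": "Source", "from": "Source"}


def toy_parse(sentence):
    """Label one frame in a simple 'Subject Verb PrepPhrase.' sentence, or return None."""
    tokens = sentence.rstrip(".").split()
    for i, token in enumerate(tokens):
        entry = VERB_FRAMES.get(token.lower())
        if entry is None:
            continue
        lemma, frame = entry
        # Everything before the verb is treated as the Theme.
        result = {"frame": frame, "target": lemma, "Theme": " ".join(tokens[:i])}
        rest = tokens[i + 1:]
        if rest and rest[0].lower() in PREP_TO_FE:
            result[PREP_TO_FE[rest[0].lower()]] = " ".join(rest)
        return result
    return None


for s in ["The boat GLIDED into the harbor.", "The dinghy DRIFTED away from the harbor."]:
    print(toy_parse(s))
```

Both sentences come out labeled with the Motion frame. The first gets a Goal (into the harbor), and the second gets a Source (away from the harbor).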

These days computers do all kinds of language-y things for us: answer questions, paraphrase texts, extract relevant information from text (and then maybe organize it thematically – for instance, around people, places, or events), and even generate new texts. These feats require that a computer parse natural language into accurate semantic chunks. FrameNet’s semantically- and syntactically-annotated data can be used as training input for machine models that “learn” how to analyze such meaning chunks, enabling our electronic devices to respond, paraphrase, or extract information appropriately.
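One common way to turn span annotation into training input (though not the only one) is to convert it into per-token BIO tags. B- marks the first token of an FE, I- marks the tokens inside it, and O marks everything else. A sequence-labeling model then learns to predict those tags. Here's a minimal Python sketch using the Motion example sentence from earlier, with token spans I labeled by hand for this illustration.

```python
def to_bio(tokens, spans):
    """Convert frame-element spans into one BIO tag per token.

    `spans` holds (label, start, end) tuples over token indices, with `end`
    exclusive; spans are assumed not to overlap. The frame-evoking target word
    is left as "O" here, on the assumption that a model receives it separately.
    """
    tags = ["O"] * len(tokens)
    for label, start, end in spans:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


tokens = ["The", "car", "moved", "into", "the", "slow", "lane"]
spans = [("Theme", 0, 2), ("Goal", 3, 7)]  # hand-written for this example
for token, tag in zip(tokens, to_bio(tokens, spans)):
    print(f"{token}\t{tag}")
```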

To peruse a (very long) list of the projects which have used FrameNet data (organized by requester/researcher), check out the FrameNet Downloaders page.

So – on the off-chance that you find yourself stuck at home and bored out of your mind (?!?!)… you might enjoy a little investigation of the frame-semantic characterization of scenes that involve applying heat, intoxication, or temporal collocation. 🙂

 

[1] Why am I so fond of frame semantics? A terrific professor of mine during grad school introduced the theory, and it resonated with me immediately. I used it in my master’s thesis, then presented the paper at the International Conference on Construction Grammar in 2014. Eventually, I had the privilege of working at FrameNet, where I came to know the brilliant lexicographers/semanticists/cognitive linguists who have dedicated decades of their lives to the theory and the project. Sadly, I never met the legendary Chuck Fillmore, as he passed away the year before I joined the FrameNet team.

Back from hiatus

Why, hello there! It’s been ages since I’ve posted, but I’ve been pretty busy with a tiny new experiment:

[Photo: Ryden]

Ryden was born in October (the photo was taken at not-quite-two-months) and is now emphatically ingesting solids, crawling (but only backwards), and beginning to babble.

Now that my life has gone from hallucinatorily topsy-turvy to relatively stable (in a pandemic – yes, that’s how childbirth and newborn-land will relativize things), I plan on posting again more regularly. Coming up, stuff on:

  • frame semantics and FrameNet
  • “parentese” (apropos, yes?)
  • another linguist career interview
  • “crashblossoms”

Hurray!

 

A Norwegian smörgåsbord


Okay, “smörgåsbord” is a Swedish borrowing, but close enough. It’s appropriate for this post, which will be a buffet of miscellaneous facts about the Norwegian language.

I became interested in and started learning Norwegian because my brother has been living in Oslo for the past several years, where he is getting his Ph.D. in lichenology.[1] My family and I traveled to visit him last summer. To characterize the country in a few words, I’d say Norway is – more iconically – Vikings, fjords, trolls, nature, Norse mythology, and – more personally – lichens, stellar black coffee, gross sweet brown cheese, overly-restricted booze-purchasing hours, part of my paternal ancestry, and vampires.[2]

Heddal stavkirke (stave church), built in the early 13th century

So what’s cool about Norwegian?

Dialects

First (as I mentioned in one of the recent dialect posts), Norwegian forms a dialect continuum with Swedish and Danish, languages with which it is, to a greater or lesser extent, mutually intelligible. These are Scandinavian or North Germanic languages, along with Icelandic and Faroese. My brother, who now has a decent command of Norwegian, says he can understand Swedish relatively well too, although Danish is harder. Have a listen to differences between Danish and Norwegian in this video.

However, there are also a staggering number of Norwegian dialects spread across Norway. People claim it’s often harder to understand someone from a different part of the country (for example, Oslo inhabitants vs. speakers of trøndersk, a group of sub-dialects in north-central Trøndelag county) than it is to understand a Swede speaking Swedish. Wikipedia corroborates: “Variations in grammar, syntax, vocabulary, and pronunciation cut across geographical boundaries and can create a distinct dialect at the level of farm clusters. Dialects are in some cases so dissimilar as to be unintelligible to unfamiliar listeners.”

There are two official standard forms for the written language, even though there is no standard for spoken Norwegian (since local dialects rule in most situations). Bokmål (literally “book tongue”) is used in the majority of publications, and Nynorsk (“new Norwegian”) in under 10% of written communication.

Lexicon and Morphology

On to smaller language-y bits: words and morphemes. Norwegian is super fun because it is prone to extensive compounding (like German), and these compounds often break down into etymologically amusing or charming pieces. By this I mean that the component words reveal interesting (but usually sensible) semantic relationships with the larger compound. Let me give you some examples:

Norwegian compound English word Individual morphemes
fruktkjøtt “pulp” frukt (“fruit”) + kjøtt (“meat”)  ⇒  “fruit meat”
matbit “snack” mat (“food”) + bit (“bite”)  ⇒  “food bite”
sommerfugl “butterfly” sommer (“summer”) + fugl (“bird”)  ⇒  “summer bird”
morkake “placenta” mor (“mother”) + kake (“cake”)  ⇒  “mother cake”
verdensrommet “(outer) space” verden (“world”) + s (possessive) + romm (“room”) + et (“the”)  ⇒  “the room of the world”
sykehus “hospital” syke (“sick”) + hus (“house”)  ⇒  “sick house”
grønnsak “vegetable” grønn (“green”) + sak (“thing”)  ⇒  “green thing”
støvsuger “vacuum cleaner” støv (“dust”) + suger (“suck[er]”)  ⇒  “dust suck[er]”
flaggermus “bat” flagger (“flutter[ing]”) + mus (“mouse”)  ⇒  “fluttering mouse”
piggsvin “hedgehog” pigg (“spike”) + svin (“pig”)  ⇒  “spike pig”
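Going in the opposite direction, from a compound back to its pieces, is a common preprocessing step when computers handle compounding languages like Norwegian and German. Here's a minimal dictionary-based sketch in Python, assuming a tiny hypothetical lexicon built from the table. Real decompounders typically rely on large lexicons and frequency statistics to choose between competing splits. The sketch allows the linking -s- from verdensrommet and leaves off the definite ending for simplicity.

```python
# Hypothetical mini-lexicon built from the table above (real systems use large lexicons).
LEXICON = {
    "frukt", "kjøtt", "mat", "bit", "sommer", "fugl", "mor", "kake",
    "verden", "rom", "syke", "hus", "grønn", "sak", "støv", "suger",
    "pigg", "svin",
}


def split_compound(word, lexicon=LEXICON):
    """Split `word` into known parts, allowing a linking "s" between parts.

    Tries the longest possible first part first and backtracks if the
    remainder cannot be split. Returns None when no split is found.
    """
    if word in lexicon:
        return [word]
    for i in range(len(word) - 1, 0, -1):
        head, rest = word[:i], word[i:]
        if head not in lexicon:
            continue
        tail = split_compound(rest, lexicon)
        if tail is not None:
            return [head] + tail
        # Linking -s-, as in verden + s + rom.
        if rest.startswith("s") and len(rest) > 1:
            tail = split_compound(rest[1:], lexicon)
            if tail is not None:
                return [head, "s"] + tail
    return None


print(split_compound("sommerfugl"))  # ['sommer', 'fugl']
print(split_compound("verdensrom"))  # ['verden', 's', 'rom']
```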

Morphosyntax 

Rest stop on the road back to Oslo. Rømmegraut is the Nynorsk word for a traditional porridge – kind of like cream of wheat, but sweeter and topped with butter.

One facet of Norwegian morphosyntax that was novel to me is the structure of its determiners. In English, both definite (“the”) and indefinite (“a / an”) articles are independent words that always precede their noun or noun phrase. So we have:

“the house”          “the big blue house”
“a house”             “a big blue house”

The same is true for the Romance languages I know about (French, Spanish, Italian), the other Germanic language I’m familiar with (German)… and it is simply not relevant for the Asian languages I’ve dabbled in (Japanese, Cantonese) because they lack articles entirely.

In Norwegian (as well as in Swedish and Danish), indefinite articles are, familiarly, the independent words which precede the noun, while definite articles are actually suffixes, which attach to the end of the noun they modify. What’s more – if you place something in front of the noun, like an adjective or a number, there’s another set of determiners to use, called demonstratives (in English: this, that, these, those). These precede the noun phrase (adjective/number + noun), where the noun already contains its definite suffix. Again, a table might help illustrate:

Norwegian (Bokmål) determiners

Indefinite articles
  • Masc. singular: en (en sykkel, “a bicycle”)
  • Fem. singular: ei (ei jente, “a girl”)
  • Neuter singular: et (et hus, “a house”)

Definite articles
  • Masc. singular: -en (bilen, “the car”)
  • Fem. singular: -a (døra, “the door”)
  • Neuter singular: -et (huset, “the house”)

Demonstratives + noun phrase
  • Masc. singular: den (den røde bilen, “the red car”)
  • Fem. singular: den (den røde døra, “the red door”)
  • Neuter singular: det (det røde huset, “the red house”)
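The pattern in this table is regular enough to sketch in a few lines of Python. This is a deliberately simplified, hypothetical model. It covers only regular singular nouns and adjectives whose definite form just adds -e (rød → røde). It ignores irregular adjectives (liten → lille), and it ignores the common Bokmål option of using masculine forms for feminine nouns (jenten alongside jenta). The gender codes and dictionary names are my own.

```python
# Simplified Bokmål determiner forms, keyed by gender: "m", "f", "n".
INDEFINITE = {"m": "en", "f": "ei", "n": "et"}
DEFINITE_SUFFIX = {"m": "en", "f": "a", "n": "et"}
DEMONSTRATIVE = {"m": "den", "f": "den", "n": "det"}


def definite(noun: str, gender: str) -> str:
    """Attach the definite suffix to a regular singular noun."""
    suffix = DEFINITE_SUFFIX[gender]
    if noun.endswith("e"):
        if gender == "f":
            return noun[:-1] + suffix  # jente -> jenta
        return noun + suffix[1:]       # avoid a double e: bakke -> bakken, eple -> eplet
    return noun + suffix               # bil -> bilen, dør -> døra, hus -> huset


def forms(noun: str, gender: str, adjective: str) -> dict[str, str]:
    """Indefinite, definite, and demonstrative + adjective + definite-noun forms."""
    return {
        "indefinite": f"{INDEFINITE[gender]} {noun}",
        "definite": definite(noun, gender),
        # Assumes the adjective's definite form just adds -e (rød -> røde).
        "with_adjective": f"{DEMONSTRATIVE[gender]} {adjective}e {definite(noun, gender)}",
    }


for noun, gender in [("bil", "m"), ("dør", "f"), ("hus", "n")]:
    print(forms(noun, gender, "rød"))
```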

Because Norwegian and English are closely related in their linguistic genealogy, a native English speaker may have less trouble learning Norwegian than, say, Taa (also known as !Xóõ, a southern African language with possibly the largest phoneme inventory in the world, including dozens of clicks) – but as the determiner situation here demonstrates, it’s still no piece of bløtkake.


View (!) from our rental house deck on Hardangerfjord

Phonology and Prosody

Norwegian is what’s called a pitch-accent language. There are roughly three categories of languages when it comes to stress and pitch. Here’s a super abridged breakdown [3]:

  1. Stress-accented languages

Stress (emphasis) is placed on a syllable in a word, or on a word in a phrase/sentence. This can create a difference in word meaning, but it doesn’t have to. Stress is a combination of loudness, length, and higher pitch.

  • Example languages: English, Czech, Finnish, Classical Arabic, Quechua, Italian
  • Example words/phrases [English]:
    • On a word in a sentence (no difference in meaning) – “I REALLY like your jacket”
    • On a syllable in a word (meaning difference) –

NOUNS vs. VERBS
REcord vs. reCORD
INcrease vs. inCREASE
PERmit vs. perMIT

  2. Pitch-accented languages

A syllable in a word or morpheme is accentuated by a particular pitch contour (instead of by stress). So only pitch is involved, not loudness or length. Distinct tonal patterns occur in words that otherwise look and sound the same, giving them different meanings.

  • Example languages: Norwegian, Swedish, Japanese, Turkish, Filipino, Yaqui (a Native American language)
  • Example words/phrases [Norwegian]:
    • Norwegian has two kinds of tonal accents or pitch patterns:

ACCENT 1 (ACUTE) and ACCENT 2 (GRAVE)

(Audio extracted from video by “Norwegian Teacher – Karin”)

hender – “hands” vs. hender – “happens”
ånden – “the spirit” vs. ånden – “the breath”
bønder – “farmers” vs. bønner – “beans”
været – “the weather” vs. være – “to be”

  3. Tonal languages

Each syllable of the language has an independent tone or pitch contour. Tones are used to distinguish between words (they create a difference in meaning between words that otherwise look and sound the same).

  • Example languages: Mandarin, Cantonese, Thai, Zulu, Navajo, Yucatec (a Mayan language)
  • Example words/phrases [Mandarin]:
    • Tones combine with the syllable ma, resulting in different words:
  1. mā “mother” [high level tone]
  2. má “hemp” [mid pitch rising to high pitch]
  3. mǎ “horse” [low with slight fall]
  4. mà “scold” [short, sharply falling tone]
  5. ma (an interrogative particle) [neutral, used on weak syllables]

 

The pitch-accent feature of Norwegian contributes to the language’s sing-song quality. Just listen to the melodiousness of Norway’s King Harald V as he gives a speech:

(Audio extracted from full NRK video)

Orthography

Norwegian writing uses the same Latin alphabet as English, except that it has three additional letters at the end – æ, ø, and å. I highly recommend (no, insist) that you watch this ridiculous video to hear how the vowels are pronounced and to be entertained in musically nerdy fashion. (Final note: Contrary to the video’s main argument, several letters – c, q, w, x, and z – are not actually used to spell native Norwegian words, although they’re sometimes used in loanwords. One could therefore quibble that they shouldn’t count towards the alphabet size…)
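Those three extra letters also matter for something as mundane as alphabetizing. Python's default string sort compares Unicode code points, which happen to put å before æ and ø, so a list of Norwegian words comes out in the wrong order. A minimal sketch with a custom sort key follows. It's a simplification; for real applications, a locale-aware collation library such as ICU is the safer choice. The word list is just an illustration.

```python
# The Norwegian alphabet: a to z, followed by æ, ø, å.
NORWEGIAN_ALPHABET = "abcdefghijklmnopqrstuvwxyzæøå"
LETTER_RANK = {letter: rank for rank, letter in enumerate(NORWEGIAN_ALPHABET)}


def norwegian_key(word: str) -> list[int]:
    """Sort key that ranks letters in Norwegian alphabetical order.

    Characters outside the alphabet (digits, accented letters, punctuation)
    are all ranked after å, which is a simplification.
    """
    return [LETTER_RANK.get(ch, len(NORWEGIAN_ALPHABET)) for ch in word.lower()]


words = ["øl", "ål", "ærfugl", "ost", "akevitt"]
print(sorted(words))                     # ['akevitt', 'ost', 'ål', 'ærfugl', 'øl']
print(sorted(words, key=norwegian_key))  # ['akevitt', 'ost', 'ærfugl', 'øl', 'ål']
```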


 

 

[1] If you want to ogle some gorgeous macrophotography of lichens, scope out his Instagram, https://www.instagram.com/lichens_of_norway/.

[2] The ancient stave churches for some reason reminded me of True Blood (plus three of the show’s main characters, Eric, Pam, and Godric, were Swedish and Norwegian); also I was coincidentally reading The Vampire Lestat while we were there… but NO I’m not generally obsessed with vampires.

[3] This subject gets really complex. There are a lot more subtleties and distinctions than I make above.

Career interviews: Linguistics Project Manager at a branding firm


Wugs go to work

Something I’ve been planning to post occasionally is a series of interviews with career linguists and related language folk – especially those working outside of academia. Yes, they (we) exist! Until recently these were rare birds, but lately the numbers are growing. I credit several factors: the growth of the discipline generally; the growth of technology industries trying to wrangle natural language; globalization; and, sadly, the increasing impracticality of landing a faculty position that pays a living wage, at least in the U.S.

Another (more popular!) language and linguistics blog has been running a job interview series over the last several years as well. I encourage you to also take a look over there: Superlinguo Linguist Job Interviews.

* * * * *

I met Noah on our team of linguists at Samsung Research America. For this interview, I asked him to talk about the job he had before Samsung – at Lexicon, a branding agency based in Northern California. Lexicon has come up with brand names for some of today’s most popular products, including BlackBerry, Febreze, (Subaru) Outback, Dasani, Swiffer, Pentium, and ThermaCare.


  1. What kind of work did you do at Lexicon?

I was a linguistics project manager, which basically meant that I coordinated with Lexicon’s network of linguists worldwide (85 countries with something like 50 or 60 languages represented). I sent them lists of names for real-time evaluation and also helped coordinate with another linguist in Quebec to prepare reports for deeper dives into particular names, in order to identify issues a name might face in a target language or culture. Basically, you learn a lot of multilingual profanity doing this, and realize you shouldn’t name a company Zinda.

  2. Describe a typical day at that job.

It was a small company, so I wore whatever hats were necessary. I prepared real-time and comprehensive reports, editing and working with the linguists to determine whether a given name (one a client had brought to us, or one we had come up with) would work well in a particular language. I also tried to read between the lines to figure out whether we should take a linguist’s comments at face value or do a little more digging and cross-checking, including interviews with native speakers. This was mostly done by our network, not in-house. But aside from the linguisty side, I also created names. Lots of names for lots of projects. Most of them didn’t actually make it through, but it was still a creative pursuit that stretched me. I also helped write a program to categorize, classify, and try to identify good brand names using NLP [Natural Language Processing] techniques: things like consonance, assonance, alliteration, etc. It was pretty helpful for going through our backlog of names and finding viable names to use going forward.
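Noah doesn't describe how that program worked. What follows is only a hypothetical Python sketch of the kinds of features he mentions, not Lexicon's code. It looks at spelling rather than pronunciation, which is a big simplification; a real system would compare sounds, for example via a pronouncing dictionary. The sketch counts a shared initial letter as alliteration, repeated vowel letters as assonance, and repeated consonant letters as consonance. “Pink Planet” is a made-up name for the demo.

```python
import re
from collections import Counter

VOWELS = set("aeiou")


def _extra_repeats(letters) -> int:
    """Count occurrences beyond the first for every letter that repeats."""
    return sum(n - 1 for n in Counter(letters).values() if n > 1)


def sound_features(name: str) -> dict:
    """Spelling-based stand-ins for alliteration, assonance, and consonance."""
    words = re.findall(r"[a-z]+", name.lower())  # non-ASCII letters are ignored
    letters = "".join(words)
    initials = Counter(word[0] for word in words)
    return {
        # Two or more words starting with the same letter.
        "alliteration": max(initials.values(), default=0) > 1,
        # Repeated vowel letters.
        "assonance": _extra_repeats(c for c in letters if c in VOWELS),
        # Repeated consonant letters.
        "consonance": _extra_repeats(c for c in letters if c not in VOWELS),
    }


for candidate in ["Febreze", "Swiffer", "Pink Planet"]:
    print(candidate, sound_features(candidate))
```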

  3. How did your linguistics background inform that work?

Well, my fascination with language itself got me the job and kept me entertained, though I think it would easily have been doable with some other kind of background. But creating a good name meant actually looking into the science of sound symbolism and helping with a few linguistic studies. Pretty cool stuff.

  4. What did you enjoy most and/or least about the job?

I most enjoyed getting to see what kinds of things big clients were trying to market and create next. Some pretty cool things there, with an insider’s perspective on what the market was going to look like in the future. The downsides were managerial in nature, combined with the claustrophobia that a small company can engender, but overall it was a very good way to get some experience in the field.

  5. What did you study in college and/or grad school?

Major: Linguistics. Minor: English. Minor: Business Administration (useless). Interest: everything else.

  6. What is your favorite linguistic phenomenon?

Splicing. Or whatever that thing was that we came up with as an inside joke that you should write a fake blogpost on.[1]

  7. (If you had the time) what language would you learn, and why?

ASL [American Sign Language]. As a monolingual, I’ve always been most fascinated by sign language, oddly enough. Alas, those CSD [Communication Sciences and Disorders] students and their required classes.

  8. Do you have any advice for young people looking to pursue a career in linguistics?

Be overzealous, and marry linguistics to another discipline. In industry, you’ve got for the most part three choices: linguistics + [design, management, or computational]. Or astrobiology and linguistics if you happen to work with NASA. Might be cool. But yeah, linguistics is interdisciplinary by nature, so I assume everyone who studies it must enjoy the interplay of different subjects like I do. Oh, maybe start a computational linguistics club in undergrad when you don’t know anything about computational linguistics. It’ll make you learn, if nothing else.

 

 

[1] Noah sent me his responses long enough ago now that I cannot for the life of me remember what this was. Not that I would explain it even if I could remember, to preserve the opportunity of writing said fake blogpost. 😛