Okay, “smörgåsbord” is a Swedish borrowing, but close enough. It’s appropriate for this post, which will be a buffet of miscellaneous facts about the Norwegian language.

I became interested in and started learning Norwegian because my brother has been living in Oslo for the past several years, where he is getting his Ph.D. in lichenology.[1] My family and I traveled to visit him last summer. To characterize the country in a few words, I’d say Norway is – more iconically – Vikings, fjords, trolls, nature, Norse mythology, and – more personally – lichens, stellar black coffee, gross sweet brown cheese, overly-restricted booze-purchasing hours, part of my paternal ancestry, and vampires.[2]

Heddal stavkirke (stave church), built in the early 13th century

So what’s cool about Norwegian?


First (as I mentioned in one of the recent dialect posts), Norwegian forms a dialect continuum with Swedish and Danish, languages with which it is, to a greater or lesser extent, mutually intelligible. These are Scandinavian or North Germanic languages, along with Icelandic and Faroese. My brother, who now has a decent command of Norwegian, says he can understand Swedish relatively well too, although Danish is harder. Have a listen to differences between Danish and Norwegian in this video.

However, there are also a staggering number of Norwegian dialects spread across Norway. People claim it’s often harder to understand someone from a different part of the country (for example, Oslo inhabitants vs. speakers of trøndersk, a group of sub-dialects in north-central Trøndelag county) than it is to understand a Swede speaking Swedish. Wikipedia corroborates: “Variations in grammar, syntax, vocabulary, and pronunciation cut across geographical boundaries and can create a distinct dialect at the level of farm clusters. Dialects are in some cases so dissimilar as to be unintelligible to unfamiliar listeners.”

There are two official standard forms for the written language, even if there is no standard for spoken Norwegian (since local dialects rule in most situations). Bokmål (literally “book tongue”) is used in the majority of publications, and Nynorsk (“new Norwegian”) in under 10% of written communication.

Lexicon and Morphology

On to smaller language-y bits: words and morphemes. Norwegian is super fun because it is prone to extensive compounding (like German), and these compounds often break down into etymologically amusing or charming pieces. By this I mean that the component words reveal interesting (but usually sensible) semantic relationships with the larger compound. Let me give you some examples:

Norwegian compound | English word | Individual morphemes
fruktkjøtt | “pulp” | frukt (“fruit”) + kjøtt (“meat”) ⇒ “fruit meat”
matbit | “snack” | mat (“food”) + bit (“bite”) ⇒ “food bite”
sommerfugl | “butterfly” | sommer (“summer”) + fugl (“bird”) ⇒ “summer bird”
morkake | “placenta” | mor (“mother”) + kake (“cake”) ⇒ “mother cake”
verdensrommet | “(outer) space” | verden (“world”) + s (possessive) + rom (“room”) + et (“the”) ⇒ “the room of the world”
sykehus | “hospital” | syke (“sick”) + hus (“house”) ⇒ “sick house”
grønnsak | “vegetable” | grønn (“green”) + sak (“thing”) ⇒ “green thing”
støvsuger | “vacuum cleaner” | støv (“dust”) + suger (“suck[er]”) ⇒ “dust suck[er]”
flaggermus | “bat” | flagger (“flying”) + mus (“mouse”) ⇒ “flying mouse”
piggsvin | “hedgehog” | pigg (“spike”) + svin (“pig”) ⇒ “spike pig”



Rest stop on the road back to Oslo. Rømmegraut is the Nynorsk word for a traditional porridge – kind of like cream of wheat, but sweeter and topped with butter.

One facet of Norwegian morphosyntax that was novel to me is the structure of its determiners. In English, both definite (“the”) and indefinite (“a / an”) articles are independent words that always precede their noun or noun phrase. So we have:

“the house”          “the big blue house”
“a house”             “a big blue house”

The same is true for the Romance languages I know about (French, Spanish, Italian), the other Germanic language I’m familiar with (German)… and it is simply not relevant for the Asian languages I’ve dabbled in (Japanese, Cantonese) because they lack articles entirely.

In Norwegian (as well as in Swedish and Danish), indefinite articles are, familiarly, the independent words which precede the noun, while definite articles are actually suffixes, which attach to the end of the noun they modify. What’s more – if you place something in front of the noun, like an adjective or a number, there’s another set of determiners to use, called demonstratives (in English: this, that, these, those). These precede the noun phrase (adjective/number + noun), where the noun already contains its definite suffix. Again, a table might help illustrate:

Norwegian (Bokmål) determiners

 | Masc. singular | Fem. singular | Neuter singular
Indefinite articles | en sykkel (“a bicycle”) | ei jente (“a girl”) | et hus (“a house”)
Definite articles | bilen (“the car”) | døra (“the door”) | huset (“the house”)
Demonstratives + noun phrase | den røde bilen (“the red car”) | den røde døra (“the red door”) | det røde huset (“the red house”)
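
For the programming-inclined, here is a rough Python sketch of the determiner pattern described above. It only covers regular singular nouns, and the function names (`indefinite`, `definite`, `definite_phrase`) and the "drop a final -e before the suffix" rule are my own simplifying assumptions; real Norwegian has plenty of irregular nouns that this would get wrong. The adjective is passed in already inflected (e.g. røde).

```python
# Simplified Bokmål determiners for regular singular nouns (illustrative only).
INDEFINITE_ARTICLE = {"masc": "en", "fem": "ei", "neut": "et"}
DEFINITE_SUFFIX = {"masc": "en", "fem": "a", "neut": "et"}
DEMONSTRATIVE = {"masc": "den", "fem": "den", "neut": "det"}


def indefinite(noun, gender):
    """Indefinite article is a separate word in front of the noun."""
    return f"{INDEFINITE_ARTICLE[gender]} {noun}"


def definite(noun, gender):
    """Definite article is a suffix attached to the end of the noun."""
    # Assumption: a final unstressed -e is dropped before the suffix
    # (jente -> jenta), which avoids forms like *jentea.
    stem = noun[:-1] if noun.endswith("e") else noun
    return stem + DEFINITE_SUFFIX[gender]


def definite_phrase(adjective, noun, gender):
    """Demonstrative + adjective + noun that still carries its definite suffix."""
    return f"{DEMONSTRATIVE[gender]} {adjective} {definite(noun, gender)}"


if __name__ == "__main__":
    print(indefinite("hus", "neut"))
    print(definite("hus", "neut"))
    print(definite_phrase("røde", "hus", "neut"))
```

Note how `definite_phrase` calls `definite` internally: that is the "double definiteness" from the table, where the noun keeps its suffix even though a determiner already sits in front of the phrase.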

Because Norwegian and English are closely related in their linguistic genealogy, a native English speaker may have less trouble learning Norwegian than, say, Taa (also known as !Xóõ, a southern African language with possibly the largest phoneme inventory in the world, including dozens of clicks) – but as the determiner situation here demonstrates, it’s still no piece of bløtkake.


View (!) from our rental house deck on Hardangerfjord

Phonology and Prosody

Norwegian is what’s called a pitch-accent language. There are roughly three categories of languages when it comes to stress and pitch. Here’s a super abridged breakdown [3]:

  1. Stress-accented languages

Stress (emphasis) is placed on a syllable in a word, or on a word in a phrase/sentence. This can create a difference in word meaning, but it doesn’t have to. Stress is a combination of loudness, length, and higher pitch.

  • Example languages: English, Czech, Finnish, Classical Arabic, Quechua, Italian
  • Example words/phrases [English]:
    • On a word in a sentence (no difference in meaning) – “I REALLY like your jacket”
    • On a syllable in a word (meaning difference) –

REcord vs. reCORD
INcrease vs. inCREASE
PERmit vs. perMIT

  2. Pitch-accented languages

A syllable in a word/morpheme is accentuated by a particular pitch contour (instead of by stress). So only pitch is involved, not loudness or length. Distinct tonal patterns occur in words that otherwise look and sound the same, giving them different meanings.

  • Example languages: Norwegian, Swedish, Japanese, Turkish, Filipino, Yaqui (a Native American language)
  • Example words/phrases [Norwegian]:
    • Norwegian has two kinds of tonal accents or pitch patterns:


(Audio extracted from video by “Norwegian Teacher – Karin”)

hender – “hands” vs. hender – “happens”
ånden – “the spirit” vs. ånden – “the breath”
bønder – “farmers” vs. bønner – “beans”
været – “the weather” vs. være – “to be”

  3. Tonal languages

Each syllable of the language has an independent tone or pitch contour. Tones are used to distinguish between words (they create a difference in meaning between words that otherwise look and sound the same).

  • Example languages: Mandarin, Cantonese, Thai, Zulu, Navajo, Yucatec (a Mayan language)
  • Example words/phrases [Mandarin]:
    • Tones combine with the syllable ma, resulting in different words:
  1. mā – “mother” [high level tone]
  2. má – “hemp” [mid pitch rising to high pitch]
  3. mǎ – “horse” [low with slight fall]
  4. mà – “scold” [short, sharply falling tone]
  5. ma – an interrogative particle [neutral, used on weak syllables]


The pitch-accent feature of Norwegian contributes to the language’s sing-song quality. Just listen to the melodiousness of Norway’s King Harald V as he gives a speech:

(Audio extracted from full NRK video)


Norwegian writing uses the same Latin alphabet as English, except that it has three additional letters at the end – æ, ø, and å. I highly recommend (no, insist) that you watch this ridiculous video to hear how the vowels are pronounced, as well as to be entertained in musically nerdy fashion. (Final note: Contrary to the video’s main argument, several letters – c, q, w, x, and z – are not actually used to spell Norwegian-native words, although they’re sometimes used in loan words. One could therefore quibble that they shouldn’t count towards the alphabet size…)




[1] If you want to ogle some gorgeous macrophotography of lichens, scope out his Instagram, https://www.instagram.com/lichens_of_norway/.

[2] The ancient stave churches for some reason reminded me of True Blood (plus three of the show’s main characters, Eric, Pam, and Godric, were Swedish and Norwegian); also I was coincidentally reading The Vampire Lestat while we were there… but NO I’m not generally obsessed with vampires.

[3] This subject gets really complex. There are a lot more subtleties and distinctions than I make above.

Of Kanji and Kana


The Japanese writing system, like other aspects of Japanese culture, is complicated and fascinating. Its three main character sets are a notorious struggle for second-language learners and young native speakers alike. While many tongues have what is called synchronic digraphia (where two or more writing systems for the same language coexist), Japanese is famous for having three main character sets within one single writing system.[1] Of interest to linguistics-minded folks, these three character sets systematically express different areas of the language’s grammar (word classes, for instance). Below is my attempt at a fun, informative introduction to the system.

The three main character sets of Japanese are kanji, hiragana, and katakana.

漢字 | KANJI

Kanji characters are logographic, meaning they cannot be spelled (sounded) out, but instead must be memorized whole. As many know, they were taken from the Chinese writing system. The term kanji literally means “Chinese characters”. If you’ve ever complained about the obtuse nature of English orthography, or remember the pain of memorizing weird word spellings as a child, consider this: a Japanese person of average education knows (i.e. has memorized) about three thousand kanji. Dictionaries contain about ten thousand kanji.[2]

Kanji are used for content words – nouns, verb stems, adjective stems, adverbs, personal names and place names. They’re composed of radicals, graphical pieces that often have either a semantic or phonetic quality (they indicate part of the meaning or the sound of the character, respectively). There is a particular stroke order for each character, which everyone is expected to follow when writing. And as if all that wasn’t enough of a challenge, there are also two separate pronunciations – on’yomi and kun’yomi – that depend on context or conjugation.

Here are some examples of kanji:

東京 – Tokyo (place name)                                   長谷川 – Hasegawa (surname)

薔薇 – bara (a noun, means “rose”)

違う – chigau (a verb or adjective, means “to be wrong” or “wrong”. Only the first character, the verb stem, is kanji; the second character, or conjugation, is hiragana)

Kana characters include the two sets hiragana and katakana. They’re both phonetic, meaning they can be sounded out. Kana also originally came from Chinese, but the characters are so altered and simplified that their sources are not apparent today. Japan adopted Chinese writing in the third century, and ran into trouble since the two spoken languages were completely unrelated. They began using characters not for their meanings, but for their sound values only. Both modern-day kana sets have an inventory of 46 characters (along with two types of diacritics), and these constitute a syllabary[3] of consonant-vowel pairings.


Hiragana has rounded symbols, smooth curves. The hiragana syllabary is used for native words, and grammatical elements like particles, auxiliary verbs, and inflections (e.g. verb conjugations, noun suffixes). Japanese children’s books are mostly in hiragana since younger kids haven’t yet learned many kanji. When books do include kanji, they have small furigana by the side – hiragana or katakana to help with pronunciation.

Here are some examples of hiragana:

               ありがとう – arigatou (“thank you”)                ください – kudasai (“please”)

               です – desu (auxiliary verb, “is”)                           の, は, を – no, wa, o (particles)


With katakana, you’ll notice similarities to hiragana, but the symbol shapes are clearly more angular. Katakana is used for foreign names and words, loanwords, onomatopoeia, and emphasis.

Here are some examples of katakana:

アメリカ – amerika (foreign name, “America”)

サラリーマン – sararii man (“salary man”, i.e. office worker)

テレビ – terebi (loanword, “television”)

ニャンニャン – nyan nyan (onomatopoeia, sound of cat meowing)

* * * * *

The Japanese system has TWO directions for writing: vertical (tategaki), and horizontal (yokogaki). Vertical is the traditional form, running from top to bottom, right to left on the page. Books written with vertical text open the opposite way from Western language books. Horizontal is the direction Western language readers are used to – left to right on the page. This Western style is used in more modern applications, like websites. To maximize space, newspapers, magazines, and signs frequently use both directions![4] Then, because we still haven’t juggled enough variables, Japanese text doesn’t include spaces between words, so readers must infer based on context where divisions are to be made.

Cool Japanese literature tangent: The Tale of Genji (源氏物語 – Genji Monogatari), written by noblewoman Lady Murasaki Shikibu in the early 11th century, is frequently considered the world’s first novel or first modern novel.

I’ll leave you with some marvelously idiosyncratic Japanese words and concepts, for which there are definitely no concise words/phrases in English. You can observe how the three character sets interact in various ways. (Most of the words come from this site).


Japanese | Pronunciation (in rōmaji) | Character set(s) | Definition | Literal meaning
教育ママ | kyouiku mama | kanji + katakana | A mother who is obsessed with her children’s education |
バーコード人 | baakoudo jin | katakana + kanji | Men with ridiculous comb-overs | “barcode people”
横飯 | yoko meshi | kanji | Western food | “horizontal rice”
侘寂 | wabi-sabi | kanji | An aesthetic that sees beauty in the ephemerality and imperfection of things both natural and manmade |
ぽかぽか | poka poka | hiragana | Feeling warm throughout one’s body |
口寂しい | kuchi sabishii | kanji + hiragana | When you’re not hungry but you eat anyway | “mouth lonely”
猫糞 | neko baba | kanji | To steal/pocket and pretend innocence | “cat feces”
ありがた迷惑 | arigata meiwaku | hiragana + kanji | “An act someone does for you that you didn’t want to have them do and tried to avoid having them do, but they went ahead anyway, determined to do you a favor, and then things went wrong and caused you a lot of trouble, yet in the end social conventions required you to express gratitude”[5] |
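
If you want to check the “Character set(s)” column yourself, here is a small Python sketch that labels each character by its Unicode block. The function names (`script_of`, `character_sets`) are made up for this post, and the ranges are deliberately limited to the basic Hiragana, Katakana, and CJK Unified Ideographs blocks, so rarer kanji in extension blocks would come back as “other”.

```python
def script_of(ch):
    """Label one character by the Unicode block it falls in."""
    code = ord(ch)
    if 0x3040 <= code <= 0x309F:   # Hiragana block
        return "hiragana"
    if 0x30A0 <= code <= 0x30FF:   # Katakana block (includes the long-vowel mark ー)
        return "katakana"
    if 0x4E00 <= code <= 0x9FFF:   # CJK Unified Ideographs (the common kanji)
        return "kanji"
    return "other"


def character_sets(word):
    """Return the scripts used in a word, in order of first appearance."""
    seen = []
    for ch in word:
        label = script_of(ch)
        if label not in seen:
            seen.append(label)
    return seen


if __name__ == "__main__":
    for word in ["教育ママ", "バーコード人", "ぽかぽか", "口寂しい", "ありがた迷惑"]:
        print(word, " + ".join(character_sets(word)))
```

Keeping the scripts in order of first appearance is what lets the output line up with labels like “kanji + katakana” versus “katakana + kanji”.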


[1] I say three “main” character sets because there are actually more, if you count Arabic numerals, rōmaji (i.e. the Roman alphabet), punctuation, etc. Also, this person argues that the focus on three+ character sets in Japanese is silly and that English and other writing systems have multiple sets as well (capital and lowercase letters in English, for example), but in order to keep things succinct here, I didn’t go into that level of detail. Additionally, I disagree with them that capital vs. lowercase Roman letters possess the same grammatical significance as kanji/hiragana/katakana and so it’s not an apples to apples comparison.

[2] https://nihongoichiban.com/2011/05/24/the-japanese-writing-system/

[3] Where each symbol represents a syllable.

[4] See this nice article with lots of illustrative pictures.

[5] https://sobadsogood.com/2012/04/28/25-words-that-simply-dont-exist-in-english/

Voynich: The manuscript that keeps on giving


The Voynich manuscript is one of those marvels that, even in these times of boundless knowledge and incredible technology, eludes continual efforts to understand it.

Not heard of the thing? Welcome to the show. There has been a vigorous little dance of press coverage over the past couple years. It goes something like this:

Step to your left.  “An eternal mystery.”
Step to your right.  “I’ve cracked the code!” – some dude
Step back.  “Nope, you’re full of shit.”
Step forward.  “We’ve solved it this time for sure.” – some other dudes

The manuscript is a hand-written, illustrated codex that’s been shown through carbon dating to have originated in the early fifteenth century (1404–1438). The writing system used throughout its approximately 240 pages has yet to be identified.[1] Cryptographers, historians, computer scientists and others have proposed numerous hypotheses over the decades, including that it’s a hoax. Based on the illustrations, scholars divide the manuscript into five thematic sections: Herbal, Astrological, Biological, Pharmacological, and Recipes.

Below I list links to the (more recent) rhythmic pulse of “discoveries” and rejections, in chronological order. Under each link I’ve pulled out quotes of the more intriguing tidbits.

* * * * *

November 30, 2016: https://www.newyorker.com/books/page-turner/the-unsolvable-mysteries-of-the-voynich-manuscript

“The first half of the book is filled with drawings of plants; scholars call this the “herbal” section. None of the plants appear to be real, although they are made from the usual stuff (green leaves, roots, and so on […]). The next section contains circular diagrams of the kind often found in medieval zodiacal texts; scholars call this part “astrological,” which is generous. Next, the so-called “balneological” section shows “nude ladies,” in Clemens’s words, in pools of liquid, which are connected to one another via a strange system of tubular plumbing that often snakes around whole pages of text. […] Then we get what appear to be instructions in the practical use of those plants from the beginning of the book, followed by pages that look roughly like recipes.”

“The Voynich MS was an early attempt to construct an artificial or universal language of the a priori type.   –Friedman.”

* * * * *

September 8, 2017: https://arstechnica.com/science/2017/09/the-mysterious-voynich-manuscript-has-finally-been-decoded/

“Now, history researcher and television writer Nicholas Gibbs appears to have cracked the code, discovering that the book is actually a guide to women’s health that’s mostly plagiarized from other guides of the era.”

“Gibbs realized he was seeing a common form of medieval Latin abbreviations, often used in medical treatises about herbs. ‘From the herbarium incorporated into the Voynich manuscript, a standard pattern of abbreviations and ligatures emerged from each plant entry,’ he wrote. ‘The abbreviations correspond to the standard pattern of words used in the Herbarium Apuleius Platonicus – aq = aqua (water), dq = decoque / decoctio (decoction), con = confundo (mix), ris = radacis / radix (root), s aiij = seminis ana iij (3 grains each), etc.’ So this wasn’t a code at all; it was just shorthand. The text would have been very familiar to anyone at the time who was interested in medicine.”

“Gibbs concluded that it’s likely the Voynich Manuscript was a customized book, possibly created for one person, devoted mostly to women’s medicine.”

* * * * *

September 10, 2017: https://www.theatlantic.com/science/archive/2017/09/has-the-voynich-manuscript-really-been-solved/539310/

“This week, the venerable Times Literary Supplement published as its cover story a ‘solution’ for the Voynich manuscript. The article by Nicholas Gibbs suggests the manuscript is a medieval women’s-health manual copied from several older sources. And the cipher is no cipher at all, but simply abbreviations that, once decoded, turn out to be medicinal recipes.”

“‘Frankly I’m a little surprised the TLS published it,’ says Lisa Fagin Davis, executive director of the Medieval Academy of America. When she was a doctoral student at Yale—whose Beinecke Library holds the Voynich manuscript—Davis read dozens of theories as part of her job. ‘If they had simply sent it to the Beinecke Library, they would have rebutted it in a heartbeat,’ she says.”

“In the second part—only two paragraphs long—Gibbs gets into the meat of his solution: Each character in the manuscript is an abbreviated word, not a letter. This could be a breakthrough, but the TLS presents only two lines decoded using Gibbs’s method. Davis did not find those two lines convincing either. ‘They’re not grammatically correct. It doesn’t result in Latin that makes sense,’ she says.”

* * * * *

February 1, 2018: https://www.atlasobscura.com/articles/voynich-manuscript-artificial-intelligence-solved

“There are two problems with this notoriously difficult puzzle—it’s written in code, and no one knows what language that code enciphers.”

“‘That was surprising,’ Kondrak said, in a statement. ‘And just saying “this is Hebrew” is the first step. The next step is how do we decipher it.’ The scientists think the code used in the manuscript might have been created using alphagrams. (In standard alphagrams, the letters in a word are placed in alphabetical order—the alphagram of ‘alphagram,’ for example, is ‘aaaghlmpr.’) Vowels also seemed to have been dropped. These assumptions made, they tried to come up with an algorithm to decipher this scrambled Hebrew text, to striking effect. ‘It turned out that over 80 percent of the words were in a Hebrew dictionary,’ said Kondrak.”

“Hebrew-speaking data scientist Shlomo Argamon offered some excoriating feedback. ‘They are saying it looks more like Hebrew than other languages,’ he said. ‘In my opinion, that’s not necessarily saying all that much.’ The use of Google Translate, too, struck him as somewhat unscientific. […] Other scholars have raised doubts about the scientists’ use of modern, rather than medieval, Hebrew.”
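
To make the alphagram idea from the quotes above concrete, here is a tiny Python sketch. It is only an illustration of the two assumptions mentioned (letters sorted alphabetically, vowels dropped), not the researchers’ actual algorithm. The function names are my own, I use English vowels as a stand-in, and the order of the two steps is an assumption.

```python
def alphagram(word):
    """Put the letters of a word in alphabetical order."""
    return "".join(sorted(word))


def drop_vowels(word, vowels="aeiou"):
    """Remove vowels (English vowels here, purely as a stand-in)."""
    return "".join(ch for ch in word if ch not in vowels)


def scramble(word):
    """Assumed encoding order: drop the vowels first, then sort what is left."""
    return alphagram(drop_vowels(word))


if __name__ == "__main__":
    print(alphagram("alphagram"))
    print(scramble("alphagram"))
```

The catch for any would-be decoder is that both steps throw information away: many different words collapse to the same scrambled string, so going backwards means guessing among candidates from a dictionary.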

* * * * *

Certain researchers have made a compelling case against the “hoax” hypothesis, in any event. In 2013, an interesting paper analyzed the Voynich manuscript from an information theory perspective. They looked at organizational structure resulting from word distribution over the entire text, and concluded that there was “presence of a genuine linguistic structure”.[2] You can read the full paper here.

A couple information theory takeaways:

  1. Highly informative content words occur much more irregularly (and in clusters) throughout a text, while more uninformative function words tend to have a more homogeneous or uniform distribution. So it’s the content words that indicate specific text sections.
  2. Words that are semantically related tend to co-occur in the same sections of a text.
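
Here is a toy Python sketch of the first takeaway. To be clear, this is not the measure used in the paper; it is a simplified stand-in I chose for illustration: split the text into equal-sized sections and compute the entropy of where a word’s occurrences fall. A word spread evenly over the sections scores high, while a word clustered in one section scores zero. The function name and the toy token list are hypothetical.

```python
import math
from collections import Counter


def section_entropy(tokens, word, n_sections):
    """Entropy (in bits) of how a word's occurrences spread over equal-sized sections."""
    size = math.ceil(len(tokens) / n_sections)
    # Count occurrences of the word per section index.
    counts = Counter(i // size for i, tok in enumerate(tokens) if tok == word)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Shannon entropy of the word's distribution across sections.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


if __name__ == "__main__":
    # Toy "text": two sections of four tokens each.
    tokens = ["the", "herb", "herb", "the", "star", "the", "the", "star"]
    for word in ["the", "herb", "star"]:
        print(word, section_entropy(tokens, word, n_sections=2))
```

In this toy text the function word “the” shows up equally in both halves, while “herb” and “star” each stick to a single half, which is the clustering behavior you would expect from content words tied to a particular section.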


Who will claim to have cracked the code next? My personal opinion, of course, is that they should throw some linguists on it.


[1] https://en.wikipedia.org/wiki/Voynich_manuscript

[2] Montemurro MA, Zanette DH. (2013). Keywords and Co-Occurrence Patterns in the Voynich Manuscript: An Information-Theoretic Analysis. PLoS ONE 8(6): e66344, 5. https://doi.org/10.1371/journal.pone.0066344