Accents & dialects (part I): Yinzers and jawn


What’s the difference between an accent, a dialect, and a language? These concepts are prone to a multitude of misconceptions, often with adverse consequences for the millions of people whose speech doesn’t fall within what’s considered “standard” for their region. In this series of posts, I summarize three articles about accents and dialects, and I hope to pique your interest enough that you check out the full pieces!

To answer the initial question: an accent is one’s pronunciation and prosody (intonation, tone, stress, etc.) as shaped by individual, geographic, temporal, cultural, and socioeconomic factors. A dialect is an accent PLUS all the other linguistic features of a language (syntax, lexicon, idioms, slang), also influenced by those factors. A language is basically a convenient abstraction over a group of mutually intelligible dialects. It helps us conceptualize things, but it’s sometimes hard to draw foolproof, scientifically valid lines between a language and a dialect, and factors like culture and nationality further muddle these attempts.

Consider two cases that linguists cite frequently. Swedish, Norwegian, and Danish are relatively mutually intelligible (in reality, they’re probably closer to dialects of a single Scandinavian language), but because they’re spoken in separate countries, they’re considered separate languages. The opposite holds for “Chinese.” There is actually no single “Chinese” language. There are Mandarin and Cantonese, which are NOT mutually intelligible, as well as hundreds of other “dialects” across China whose speakers can’t necessarily understand one another. However, because all of these speakers live within a single nation (and share a writing system, among other things), Mandarin, Cantonese, and the rest are usually considered “dialects” of a single, monolithic “Chinese” language.

To use an oft-quoted expression: “a language is a dialect with an army and navy.” The idea is that the line between a language and a dialect is arbitrary and can only be drawn through a social lens: a language almost always has more official recognition, more cultural clout, more political power, etc. than a dialect.

Okay, time for some articles. The piece linked and summarized below is an enjoyable read about Pennsylvania dialects. My Part II follow-up will discuss accents, and Part III will consider Black English (also called AAVE – African-American Vernacular English).


Pennsylvania dialects

Article: “Where Yinz At” (Slate)

“Pennsylvania, in case yinz didn’t know, is a regional dialect hotbed nonpareil.”

While the average state has two or three dialects, Pennsylvania has five, the best known being those of Philadelphia and Pittsburgh.

“The Philadelphia dialect features a focused avoidance of the ‘th’ sound, the swallowing of the L in lots of words, and wooder instead of water, among a zillion other things. In Pittsburgh, it’s dahntahn for downtown, and words like nebby and jagoff and yinz.”

(To sample the actual dialects, watch the funny clip embedded in the article – a skit of a Philly-Pittsburgh phone conversation between two pawnbrokers.)

Geography and migration likely shaped the distinctive speech patterns of the Keystone State. North of Interstate 80, which roughly bisects the state, speech was influenced by immigrants from southern England; south of that line, settlers came from northern England, Scotland, and Northern Ireland. The Allegheny Mountains also formed a barrier between Pittsburgh and the rest of the state. After a couple hundred years, Philadelphians and Pittsburghers have come to sound quite distinct from each other.

“[…] people from Pittsburgh are talking about ‘gettin’ off the caach and gone dahntawn on the trawly to see the fahrworks for the Fourth a July hawliday n’at,’ while Philadelphia folks provide linguistic gems like the one Monahan offered up as the most Philly sentence possible: ‘Yo Antny, when you’re done your glass of wooder, wanna get a hoagie on Thirdyfish Street awn da way over to Moik’s for de Iggles game?’”

University of Pennsylvania linguistics professor William Labov says the Philadelphia dialect is generally a source of pride for residents, most of whom feel positive about the city. Pittsburghese is similarly well regarded by its speakers, and it has received a good deal of attention since linguists began visiting the area in the 1930s.

However, as more young people go farther away for college, Philly and Pittsburgh accents and dialects are being dropped (college students, after all, want to fit in and be understood). Contrary to popular belief, the surge in online and text communication is not speeding up that decline; in fact, greater exposure to multiple dialects means people judge others’ speech less than they used to. Both Labov and Carnegie Mellon University English/linguistics professor Barbara Johnstone rightly point out that the Philly and Pittsburgh dialects, like all language, are constantly evolving.

* * * * * * * * * * * * * * *

I’ll leave you with a few extra Pennsylvania dialect delights:


  • “The Enduring Mystery of ‘Jawn,’ Philadelphia’s All-Purpose Noun”
    • “The word ‘jawn’ is unlike any other English word. In fact, according to the experts that I spoke to, it’s unlike any other word in any other language. It is an all-purpose noun, a stand-in for inanimate objects, abstract concepts, events, places, individual people, and groups of people. It is a completely acceptable statement in Philadelphia to ask someone to ‘remember to bring that jawn to the jawn.’”

Lastly and more generally, if you really want to know just how complicated the dialect situation is in North America, take a gander at this incredibly detailed map/site.


*Photo attributions: Yinzers In The Burgh Sign; Greetings from Pennsylvania; Buses speak #Pittsburghese now, too. “Need vaccinated.”

Voynich: The manuscript that keeps on giving


The Voynich manuscript is one of those marvels that, even in these times of boundless knowledge and incredible technology, eludes continual efforts to understand it.

Not heard of the thing? Welcome to the show. There has been a vigorous little dance of press coverage over the past couple years. It goes something like this:

Step to your left.  “An eternal mystery.”
Step to your right.  “I’ve cracked the code!” – some dude
Step back.  “Nope, you’re full of shit.”
Step forward.  “We’ve solved it this time for sure.” – some other dudes

The manuscript is a handwritten, illustrated codex that carbon dating has placed in the early fifteenth century (1404–1438). The writing system used throughout its approximately 240 pages has yet to be identified.[1] Cryptographers, historians, computer scientists, and others have proposed numerous hypotheses over the decades, including that it’s a hoax. Based on the illustrations, scholars divide the manuscript into five thematic sections: Herbal, Astrological, Biological, Pharmacological, and Recipes.

Below I list links to the (more recent) rhythmic pulse of “discoveries” and rejections, in chronological order. Under each link I’ve pulled out quotes of the more intriguing tidbits.

* * * * *

November 30, 2016:

“The first half of the book is filled with drawings of plants; scholars call this the ‘herbal’ section. None of the plants appear to be real, although they are made from the usual stuff (green leaves, roots, and so on […]). The next section contains circular diagrams of the kind often found in medieval zodiacal texts; scholars call this part ‘astrological,’ which is generous. Next, the so-called ‘balneological’ section shows ‘nude ladies,’ in Clemens’s words, in pools of liquid, which are connected to one another via a strange system of tubular plumbing that often snakes around whole pages of text. […] Then we get what appear to be instructions in the practical use of those plants from the beginning of the book, followed by pages that look roughly like recipes.”

“The Voynich MS was an early attempt to construct an artificial or universal language of the a priori type.” –Friedman

* * * * *

September 8, 2017:

“Now, history researcher and television writer Nicholas Gibbs appears to have cracked the code, discovering that the book is actually a guide to women’s health that’s mostly plagiarized from other guides of the era.”

“Gibbs realized he was seeing a common form of medieval Latin abbreviations, often used in medical treatises about herbs. ‘From the herbarium incorporated into the Voynich manuscript, a standard pattern of abbreviations and ligatures emerged from each plant entry,’ he wrote. ‘The abbreviations correspond to the standard pattern of words used in the Herbarium Apuleius Platonicus – aq = aqua (water), dq = decoque / decoctio (decoction), con = confundo (mix), ris = radacis / radix (root), s aiij = seminis ana iij (3 grains each), etc.’ So this wasn’t a code at all; it was just shorthand. The text would have been very familiar to anyone at the time who was interested in medicine.”

“Gibbs concluded that it’s likely the Voynich Manuscript was a customized book, possibly created for one person, devoted mostly to women’s medicine.”
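To make the “shorthand, not cipher” idea concrete, here is a minimal Python sketch of what reading by abbreviation amounts to: a lookup table rather than a decryption key. The table holds three of the pairs from the Gibbs quote (keeping both readings he gives for dq); the `expand` helper and the sample input are hypothetical, made up for illustration, and multi-token entries like s aiij are left out for simplicity.

```python
# Three abbreviation pairs taken from the Gibbs quote; the slash keeps
# both readings he gives for "dq" instead of picking one.
ABBREVIATIONS = {
    "aq": "aqua",              # water
    "dq": "decoque/decoctio",  # decoction
    "con": "confundo",         # mix
}


def expand(shorthand: str) -> str:
    """Hypothetical helper: swap each known abbreviation for its full word.

    Tokens not in the table pass through unchanged, so gaps in the
    table stay visible in the output.
    """
    return " ".join(ABBREVIATIONS.get(token, token) for token in shorthand.split())


if __name__ == "__main__":
    # Made-up input line for illustration; not a transcription of the manuscript.
    print(expand("aq dq con"))  # aqua decoque/decoctio confundo
```

Note what a table like this can’t check: it will expand any string, whether or not the result is grammatical Latin, which is exactly the kind of objection Davis raises in the next excerpt.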

* * * * *

September 10, 2017:

“This week, the venerable Times Literary Supplement published as its cover story a ‘solution’ for the Voynich manuscript. The article by Nicholas Gibbs suggests the manuscript is a medieval women’s-health manual copied from several older sources. And the cipher is no cipher at all, but simply abbreviations that, once decoded, turn out to be medicinal recipes.”

“‘Frankly I’m a little surprised the TLS published it,’ says Lisa Fagin Davis, executive director of the Medieval Academy of America. When she was a doctoral student at Yale—whose Beinecke Library holds the Voynich manuscript—Davis read dozens of theories as part of her job. ‘If they had simply sent it to the Beinecke Library, they would have rebutted it in a heartbeat,’ she says.”

“In the second part—only two paragraphs long—Gibbs gets into the meat of his solution: Each character in the manuscript is an abbreviated word, not a letter. This could be a breakthrough, but the TLS presents only two lines decoded using Gibbs’s method. Davis did not find those two lines convincing either. ‘They’re not grammatically correct. It doesn’t result in Latin that makes sense,’ she says.”

* * * * *

February 1, 2018:

“There are two problems with this notoriously difficult puzzle—it’s written in code, and no one knows what language that code enciphers.”

“‘That was surprising,’ Kondrak said, in a statement. ‘And just saying “this is Hebrew” is the first step. The next step is how do we decipher it.’ The scientists think the code used in the manuscript might have been created using alphagrams. (In standard alphagrams, the letters in a word are placed in alphabetical order—the alphagram of ‘alphagram,’ for example, is ‘aaaghlmpr.’) Vowels also seemed to have been dropped. These assumptions made, they tried to come up with an algorithm to decipher this scrambled Hebrew text, to striking effect. ‘It turned out that over 80 percent of the words were in a Hebrew dictionary,’ said Kondrak.”

“Hebrew-speaking data scientist Shlomo Argamon offered some excoriating feedback. ‘They are saying it looks more like Hebrew than other languages,’ he said. ‘In my opinion, that’s not necessarily saying all that much.’ The use of Google Translate, too, struck him as somewhat unscientific. […] Other scholars have raised doubts about the scientists’ use of modern, rather than medieval, Hebrew.”
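The alphagram idea is simple enough to sketch. Below is a toy Python version of the assumed scheme (drop the vowels, then sort the remaining letters) plus the reverse step: index a dictionary by the same key, then look up scrambled tokens and count how many find a match. The English word list, the vowel set, and all function names are assumptions for illustration; this is not the researchers’ actual algorithm, which worked on Hebrew and had to cope with far messier input.

```python
VOWELS = set("aeiou")  # English vowels; an assumption for this toy example


def alphagram(word: str) -> str:
    """Letters of the word in alphabetical order."""
    return "".join(sorted(word.lower()))


def encode(word: str) -> str:
    """Assumed scheme: drop the vowels, then take the alphagram of what's left.

    Applying it twice changes nothing, so it also normalizes scrambled tokens.
    """
    return alphagram("".join(ch for ch in word.lower() if ch not in VOWELS))


def build_index(dictionary: list[str]) -> dict[str, list[str]]:
    """Group dictionary words by their encoded form (several words can collide)."""
    index: dict[str, list[str]] = {}
    for word in dictionary:
        index.setdefault(encode(word), []).append(word)
    return index


def candidates(token: str, index: dict[str, list[str]]) -> list[str]:
    """Dictionary words that could have produced this scrambled token."""
    return index.get(encode(token), [])


def coverage(tokens: list[str], index: dict[str, list[str]]) -> float:
    """Share of tokens with at least one dictionary match."""
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if candidates(t, index)) / len(tokens)


if __name__ == "__main__":
    print(alphagram("alphagram"))  # aaaghlmpr
    index = build_index(["water", "root", "seed", "leaf"])
    scrambled = ["rtw", "ds", "xq"]  # encodings of "water" and "seed", plus noise
    for token in scrambled:
        print(token, candidates(token, index))
    print(f"coverage: {coverage(scrambled, index):.0%}")  # coverage: 67%
```

Collisions are the catch: with the vowels gone, many words share a key (“root” and “rate” both become “rt”), which is one reason a high dictionary-match rate on its own is weak evidence of a decipherment.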

* * * * *

Certain researchers have, in any event, made a compelling case against the “hoax” hypothesis. In 2013, Montemurro and Zanette analyzed the Voynich manuscript from an information-theory perspective. They examined the organizational structure that emerges from how words are distributed across the entire text, and concluded that it shows the “presence of a genuine linguistic structure.”[2] You can read the full paper here.

A couple of information-theory takeaways:

  1. Highly informative content words occur much more irregularly (and in clusters) throughout a text, while less informative function words tend to be distributed more homogeneously. So it’s the content words that mark out specific sections of a text.
  2. Words that are semantically related tend to co-occur in the same sections of a text.
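Both takeaways can be illustrated with a few lines of Python. This is a simplified sketch, not the paper’s method (Montemurro and Zanette’s measure is more involved): it cuts a text into equal sections, uses the Shannon entropy of a word’s counts across sections as a clustering score (high for evenly spread function words, low for clustered content words), and uses cosine similarity between two words’ section-count vectors as a rough co-occurrence score. The toy text and all function names are made up for illustration.

```python
import math

# Made-up toy text: "the" is spread evenly; content words cluster by section.
TOY_TEXT = (
    "the leaf the root the leaf the root "
    "the water the water the bath the bath "
    "the water the bath the water the bath "
    "the star the moon the star the moon"
).split()


def split_sections(words: list[str], n: int) -> list[list[str]]:
    """Cut a token list into n contiguous, (nearly) equal-length sections."""
    return [words[i * len(words) // n:(i + 1) * len(words) // n] for i in range(n)]


def count_vector(sections: list[list[str]], word: str) -> list[int]:
    """How many times the word occurs in each section."""
    return [section.count(word) for section in sections]


def section_entropy(counts: list[int]) -> float:
    """Shannon entropy (bits) of a word's spread over sections.

    Maximal (log2 of the section count) when the word is spread evenly,
    zero when all of its occurrences fall in a single section.
    """
    total = sum(counts)
    if total == 0:
        return 0.0
    return sum((c / total) * math.log2(total / c) for c in counts if c)


def cosine(u: list[int], v: list[int]) -> float:
    """Cosine similarity of two count vectors; near 1 means the words share sections."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    sections = split_sections(TOY_TEXT, 4)
    for word in ["the", "water", "leaf"]:
        print(word, section_entropy(count_vector(sections, word)))
    # the 2.0 / water 1.0 / leaf 0.0
    print("leaf~root", cosine(count_vector(sections, "leaf"), count_vector(sections, "root")))
    print("leaf~star", cosine(count_vector(sections, "leaf"), count_vector(sections, "star")))
```

On a real manuscript you’d want a baseline, such as comparing each word’s entropy against the same text with its words shuffled, so that rare words aren’t flagged as “clustered” just because they occur only once or twice.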


Who will claim to have cracked the code next? My personal opinion, of course, is that they should throw some linguists on it.



[2] Montemurro MA, Zanette DH. (2013). Keywords and Co-Occurrence Patterns in the Voynich Manuscript: An Information-Theoretic Analysis. PLoS ONE 8(6): e66344, 5.