Thursday, January 8, 2009

How Many Words?

If you are reading this post, then the chances are that you have a pretty good vocabulary. You know lots of words and can use them. I was talking to a coworker today about the uses of vocabulary analysis, and he talked about the fact that while we may have large vocabularies, we don't use the whole thing. What if, he asked, you needed to figure out what someone's active vocabulary is, through conversation?

In ordinary conversation, I (like most people) use a relatively small subset of the words that I know. How long does it take me to say, for instance, 10% of my active speaking vocabulary? (Let's define that as the set of all the English words I've spoken up to now.) 25%? 75%? The time needed will plainly go to infinity well before the percentage hits 100%: there are some words that I will simply never say again. If you plot the percentage against conversation time, the curve is going to look like a boomerang: it will rise very quickly, and then level out.
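To get a feel for that curve, here is a rough simulation. It is only a sketch under a big assumption of mine: that word choice follows a Zipf-like distribution, where the k-th most common word is spoken with weight 1/k. The vocabulary size and word counts are made-up numbers, and `coverage_curve` is just a name I picked for illustration.

```python
import random

def coverage_curve(vocab_size, n_words, seed=0):
    """Fraction of the vocabulary heard after each spoken word."""
    rng = random.Random(seed)
    # Assumption: Zipf-like usage, the k-th most common word has weight 1/k.
    weights = [1.0 / rank for rank in range(1, vocab_size + 1)]
    spoken = rng.choices(range(vocab_size), weights=weights, k=n_words)
    seen = set()
    curve = []
    for word in spoken:
        seen.add(word)
        curve.append(len(seen) / vocab_size)
    return curve

# Made-up sizes: a 5,000-word vocabulary and 50,000 spoken words.
curve = coverage_curve(vocab_size=5000, n_words=50000)
for n in (100, 1000, 10000, 50000):
    print(n, round(curve[n - 1], 3))
```

Under that assumption the early words add coverage quickly and the later ones add very little, which is the fast-rise-then-level-out shape described above. Real speech depends on context, so treat the numbers as illustrative only.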

Talking this over with ratatosk, the subject of specialized vocabularies and context came up. A lot of my regular vocabulary revolves around computers and networking, and uses terms that I would not use when talking to, say, my grandmother. (Actually, I use a lot of words I wouldn't say in front of my grandmother, but that's another point entirely.) Furthermore, the existence of acronyms and slang will prevent (or at least delay) the use of other terms: why say "deoxyribonucleic" when "DNA" rolls off the tongue? That graph, then, could look very different depending on who you're talking to.

Now, why was I talking about vocabulary in the first place? Well, you can talk about a vocabulary of words, including favorite words, unusual words, and words you don't use anymore, but you can also talk about a person's vocabulary of places in much the same way. If you watch me for a week, you'll quickly discover some percentage of my vocabulary of places: my offices, my home, my girlfriend's home, the pub up the hill, a grocery store or two, and a handful of parking lots and streets. Watch for a month and you'll see more grocery stores and restaurants, one or two friends' homes, a couple of bookstores, etc. But you could be watching for years before I go back to visit West Virginia, and (though I hope not!) decades before I go back to the Acropolis.

Now, let's say that you picked a dumb private eye to tail someone. He catches sight of my car, decides that's the one he wants to follow, and follows it for a random couple of non-consecutive days -- can you figure out from his reports that he's not following the person you hired him to follow? What about if you hired him because this person's habits abruptly changed? Can you figure out that it's me in particular?
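One naive way to attack the first question is to compare the places in the private eye's reports against the place vocabulary you already know for each candidate. The sketch below uses Jaccard similarity (shared places divided by all places seen in either set). That measure is my choice for illustration, not an established method for this problem, and every place name and profile in it is invented.

```python
def overlap(observed, profile):
    """Jaccard similarity between two sets of places."""
    observed, profile = set(observed), set(profile)
    if not observed and not profile:
        return 0.0
    return len(observed & profile) / len(observed | profile)

# Hypothetical place vocabularies, invented for illustration.
profiles = {
    "target": {"bank", "gym", "cafe", "school", "marina"},
    "me": {"office", "home", "pub", "grocery", "bookstore"},
}
# Hypothetical report from a few non-consecutive days of tailing.
report = ["office", "pub", "grocery", "office", "parking lot"]

scores = {name: overlap(report, places) for name, places in profiles.items()}
best = max(scores, key=scores.get)
print(scores, best)
```

With only a couple of days of data, the report will mostly contain common places, so a low score against the intended target is more telling than a high score against anyone in particular. It also ignores how often each place is visited, which matters once habits change.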


  1. Oops, I meant to expand on that analogy with respect to jargon and slang: You can frequently tell that someone is a subject expert by the language they use. Is that analogous to the way locals know the best pizza joints tucked away in the middle of nowhere (jargon?) and the fastest shortcuts that you should use instead of the highway (slang? acronyms?)?

  2. I read something a while ago about a Media Lab project that used cell phone GPS to track people's daily habits, and got to the point that they could guess what you'd be in the mood to do in the afternoon based on startlingly little data from the morning.

  3. Interesting that you bring that up -- the coworker I was talking to about this is using that dataset.