[225004600010] |A new drink from the SPECULATIVE GRAMMARIAN: The Psycholinguist. Ingredients: wine (any kind: color is not a dependent variable in this study); several glasses; 1 stopwatch. Pour the wine into a glass while whining about how no one has properly modeled the process of wine pouring.
[225004600020] |Observe the wine under controlled conditions for an hour.
[225004600030] |Present a wordy but content-less paper to an international conference on what wine might look like in infants.
[225004600040] |Rerun the analysis in a different glass in case the receptor affects the nature of the process.
[225004600050] |Wait another hour.
[225004600060] |Drink the wine.
[225004600070] |Drink more wine.
[225004600080] |Fall onto the floor drunk, bumping your head on a pipe on the way down.
[225004600090] |Write an even less coherent paper on the effects of head bumping on linguistic processing.
[225004600100] |Gain professorship.
[225004610010] |pimp grammar
[225004610020] |There's a pimp's handwritten business plan floating around the interwebz.
[225004610030] |While the soundness of its basic logic cannot be denied ("Treat This Pimpin Like it's a Business" indeed), the former writing teacher in me could not help but pull out the old red pen and make a few suggestions.
[225004610040] |But here's the thing: it's a fact of contemporary college education that most writing teachers are loath to outright criticize or correct their students (they're paying tuition, after all).
[225004610050] |You see, outside of the Ivy League, most college writing teachers are faced with whole classrooms filled with pimps like Keep It Pimpin', and they're our bread and butter (we can't all be blessed with students like the Winkelvi, can we?).
[225004610060] |As a result, we are careful to word our feedback delicately, so as not to offend the senses of the ones who pad our, admittedly thin, paychecks*.
[225004610070] |*Absolute truth: I taught college level research writing courses for the whopping total price of $1250/semester.
[225004610080] |The MOST I ever got paid for teaching a college level course was $2800.
[225004610090] |In the (modified) words of my literary hero DJay: You know it's hard out here for a [rhetoric & writing instructor]. When he tryin to get this money for the rent. For the Cadillacs and gas money spent Because a whole lotta [students] talkin [nonsense].
[225004610100] |HT kottke.
[225004620010] |a debate!
[225004620020] |From The Economist: This house believes that the language we speak shapes how we think.
[225004620030] |Discuss...
[225004630010] |so you want to study linguistics?
[225004630020] |Recently, a reader asked me for advice about studying linguistics.
[225004630030] |She is an undergraduate in the USA at a college that does not offer a BA in linguistics and she likes math and language, particularly historical linguistics.
[225004630040] |I've posted advice to students before here, but this new request was a particularly interesting variation.
[225004630050] |What do you do if you're a smart 20 year old at a school that does not quite offer what you want?
[225004630060] |What follows is an edited version of the email I sent back: I must begin with a warning: academic linguistics is a small field, and there is precious little room for mediocrity.
[225004630070] |There are two kinds of academic linguists, the top 15% and the unemployed.
[225004630080] |With that said, if your school doesn't offer linguistics as a degree, then I suggest psychology (the experimental, lab-based kind) or computer science.
[225004630090] |Get hands-on experience in lab settings where you are collecting and analyzing data.
[225004630100] |Learn basic scientific method.
[225004630110] |Both psychology and computer science can offer that.
[225004630120] |Computational linguistics is a hot field with lots of opportunities in all sub-fields of linguistics.
[225004630130] |Plus, they can get jobs, hehe.
[225004630140] |High paying jobs!
[225004630150] |Computational linguists are one of the few who can get jobs outside of academia, but the truth is most industry CL jobs are really programming jobs where your programming skills are the real reason you get a job; your Natural Language Processing (NLP) skills are little more than icing on the cake.
[225004630160] |The industry is really looking for engineers with some NLP experience, not linguists with some programming skills.
[225004630170] |There's nothing wrong with majoring in math (I definitely think all 21st Century linguists should study math), though I think knowing stats is preferable, and that's really a separate field.
[225004630180] |There is some controversy regarding whether linear algebra or calculus is better for linguistics (see here, especially the comments), but I really do think stats is key.
[225004630190] |Studying biology or genetics is a possibility (neurolinguistics is a hot field).
[225004630200] |Liberman posted about genetics and linguistics here.
[225004630210] |Probably the single best thing you can do for yourself right now is work your way through the NLTK book.
[225004630220] |This will teach you about basic concepts, plus teach you basic tools as well, and it's completely free!
[225004630230] |You could also start learning the R language, a great stats based language that many linguists are using these days.
[225004630240] |You could also work your way through Tarski's World because basic logic is a sound foundation for all disciplines.
[225004630250] |If you want a serious challenge, get your hands on the late Partha Niyogi's 'The Computational Nature of Language Learning and Evolution'.
[225004630260] |He passed away recently, far too young for a rising star.
[225004630270] |He was a pioneer in using mathematical models to understand linguistics.
[225004630280] |If you're interested in cognitive science and linguistics, I suggest regularly reading the Child's Play blog, written by two Stanford cognitive science grad students.
[225004630290] |My general advice to any undergrad is simple: don't sweat your undergrad too much; it's the least important part of your education.
[225004630300] |Just get it done, regardless of which major you choose, and move on to the good stuff in grad school.
[225004640010] |harvard jumps the linguistic shark
[225004640020] |Harvard Business Review editor Julia Kirby adds to the mountain of pseudo-scientific bullshit filling the innerwebz by taking the modest results of a small study (about the fact that mimicking accents helps sentence comprehension) and jumping to the wild and unfounded conclusion that salespeople should start faking accents.
[225004640030] |It would make a great Monty Python skit, but it's a sad blog post from an editor of a prestigious business magazine.
[225004640040] |Money quote:
[225004640050] |But this study suggests another possibility.
[225004640060] |Perhaps part of why mirroring and matching works is not because of how it operates on the prospect in a sales conversation, but how it operates on the salesperson.
[225004640070] |When we switch into another person's mode, however superficially, perhaps our brains are triggered to do so on a deeper level, and we become more able to receive the information that person is trying to convey.
[225004640080] |We all know the key to empathy is to walk a mile in another's shoes.
[225004640090] |That can never literally be done, especially in brief sales encounters.
[225004640100] |But at least we can put on their brogues.
[225004640110] |sigh...
[225004650010] |From Stephen Fry's twitter feed: Just had an fMRI scan at UCL (part of BBC doc on language I'm making).
[225004650020] |Had to play Just A Minute while being scanned.
[225004650030] |Fun.
[225004650040] |A BBC documentary about language?
[225004650050] |Ugh...I don't think even the talents of Stephen Fry can save that one.
[225004660010] |google has a huge tool
[225004660020] |NPR ran a story today called Google Book Tool Tracks Cultural Change With Words.
[225004660030] |It's about "the biggest collection of words ever assembled*". Google's 500 billion word corpus is drawn from the books they've scanned, but here's the catch: many of those books are copyrighted, so Google pulled a trick that goes back to the very beginnings of computational linguistics. They present the words as an unordered set, or bag o' words: Many of these books are covered by copyright, and publishers aren't letting people read them online.
[225004660040] |But the new database gets around that problem: It's just a collection of words and phrases, stripped of all context except the date in which they appeared.
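To make the idea concrete, here is a minimal sketch of a bag of words keyed by date. The toy corpus and the function name `bag_of_words_by_year` are made up for illustration; this is not Google's pipeline, and it counts single words only, whereas the real dataset also includes phrases.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus: (publication year, text) pairs standing in for scanned books.
books = [
    (1860, "the question of slavery divides the nation"),
    (1860, "slavery and the union"),
    (1970, "the nation remembers"),
]

def bag_of_words_by_year(corpus):
    """Count word occurrences per year, throwing away word order entirely."""
    counts = defaultdict(Counter)
    for year, text in corpus:
        counts[year].update(text.lower().split())
    return counts

counts = bag_of_words_by_year(books)
print(counts[1860]["slavery"])  # 2
print(counts[1970]["slavery"])  # 0
```

Nothing in `counts` lets you reconstruct a sentence from any of the books, which is roughly why this kind of representation can sidestep the copyright problem.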
[225004660050] |I first learned about this technique back in 1999 in an intro to computational linguistics course (bit of trivia: we were using an incomplete pre-print of Martin and Jurafsky; as I recall, the discourse chapter was composed entirely of one page that read 21 Computational Discourse write something here...) and I remember being appalled at its crass simplicity.
[225004660060] |I mean, how dare those idiot engineers reduce language down to simple lists of words.
[225004660070] |How dare they try to use simple word lists to discover important facts about language and devise important linguistic tools.
[225004660080] |It took less than a week for me to change my tune.
[225004660090] |The fact is, the bag o' words technique is remarkably powerful and useful.
[225004660100] |No, it doesn't solve all problems in one swoop, but it solves a hell of a lot more than I could possibly predict as a naive 2nd year linguistics grad student.
[225004660110] |For example:
[225004660120] |Irregular verbs are used as a model of grammatical evolution.
[225004660130] |For each verb, researchers plotted the usage frequency of its irregular form in red ("thrived"), and the usage frequency of its regular past-tense form in blue ("throve/thriven").
[225004660140] |Virtually all irregular verbs are found from time to time used in a regular form, but those used more often tend to be used in a regular way more rarely.
[225004660150] |Google labs lets you play with its tool here (hehe).
[225004660160] |*Not sure where this claim originated, but Google has already released a 1 trillion word corpus via LDC, the Web 1T 5-gram Version 1.
[225004670010] |magical machine translation
[225004670020] |This is un-fucking-believable:
[225004670030] |The future has arrived.
[225004670040] |HT kottke
[225004680010] |ngram or n-gram?
[225004680020] |The hottest story of the day is clearly Google's Ngram Viewer.
[225004680030] |It's all over blogs, twitter and even the MSM.
[225004680040] |But why did Google call it the Ngram Viewer and not the N-gram Viewer?
[225004680050] |The hyphenated form is more common in the NLP industry and in general search results (by a 10-1 margin at that).
[225004680060] |Nunberg's LL post and Languagehat's post both prefer n-gram when speaking about the tokens themselves and only use Ngram when referencing Google's named product.
[225004680070] |Even Google's own people used n-gram in a blog post here.
[225004680080] |You gotta wonder what kind of branding process Google went through to decide on ngram (they are notoriously conscious about that kind of thing).
[225004680090] |The popularity of this story also demonstrates how much more media savvy Google is because Microsoft has almost exactly the same tool, but no one knows about it.
[225004680100] |See here.
[225004680110] |The difference is that Microsoft didn't link its use to studying culture and history and give us a nifty online tool to play with, making it more dull sounding than perhaps it otherwise would.
[225004680120] |Also, note Microsoft uses N-gram ... frikkin Microsoft.
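For anyone wondering what the tokens themselves look like, here is a minimal sketch of n-gram extraction. The function name `ngrams` and the whitespace tokenization are simplifying assumptions for illustration, not a description of how Google or Microsoft build their data.

```python
def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "the hottest story of the day".split()
print(ngrams(tokens, 2))
# [('the', 'hottest'), ('hottest', 'story'), ('story', 'of'), ('of', 'the'), ('the', 'day')]
```

With n=1 you get plain word counts; larger n keeps a little local word order.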
[225004690010] |how NOT to interpret ngrams
[225004690020] |Andrew Sullivan has predictably misunderstood the value of Google's Ngram Viewer.
[225004690030] |He spent all day yesterday posting trite and simplistic mis-interpretations of the data.
[225004690040] |For example,
[225004690050] |the concept of ideology is a relatively recent one because the word ideology has become more frequent recently (this is almost certainly false).
[225004690060] |Jesus "wins" (his word, not mine) against the Beatles because the word Jesus is more frequent.
[225004690070] |I like the Ngram Viewer, but simply plotting the frequency of words against each other to determine something about culture or concepts is a very weak technique that leads to massive mis-interpretations, as we've seen recently with things like counting the number of times President Obama uses pronouns in his speeches.
[225004690080] |I discussed the failings of simple word counts as a technique here.
[225004690090] |To sum up,
[225004690100] |We don't know what causes word frequencies.
[225004690110] |We don't know what the effects of word frequencies are.
[225004690120] |There are good alternatives.
[225004700010] |the linguistics of the simpsons
[225004700020] |The magnificent and admirable Snowclone X is the Y of Z made a surprising and instructive appearance on The Simpsons tonight*: Marge -- Don't worry Lisa, you could still go to McGill, it's the Harvard of Canada.
[225004700030] |Lisa -- Anything that is the something of the something isn't the anything of anything... Too true, Lisa, too true.
[225004700040] |It's never good to be the shadow of something else.
[225004700050] |*This appears to have been a repeat of the 10-10-2010 episode MoneyBART (a nice allusion to Moneyball, btw).
[225004710010] |ngram roundup
[225004710020] |It's not difficult to find glee and excitement surrounding Google's new Ngram Viewer.
[225004710030] |Hyperbolic praise is whirling around the innerwebz like mad.
[225004710040] |As an antidote and a nod to the role skepticism should play in our contemporary society, I present a brief round up of criticisms: Geoffrey Nunberg: ...there are still a fair number of misdated works, and there's no way to restrict a query by genre or topic.
[225004710050] |But in the end, the most important consequence of the Science paper, and of allowing public access to the data, is that it puts "culturomics" into conversational play.
[225004710060] |Mark Davies: Google Books can't use wildcards to search for parts of words.
[225004710070] |For example, try searching for freak* out (all forms: freak_, freaked, freaking, etc) or even a simple search like teenager* ... if Google Books doesn't know about part of speech tags or variant forms of a word, then how can it look at change in grammar?
[225004710080] |... To use collocates with Google Books, you would have to manually download thousands or millions of hits to your hard drive, and then use another program to look for and categorize the collocates.
[225004710090] |Mark Liberman: The Science paper says that "Culturomics is the application of high-throughput data collection and analysis to the study of human culture".
[225004710100] |But as long as the historical text corpus itself remains behind a veil at Google Books, then "culturomics" will be restricted to a very small corner of that definition, unless and until the scholarly community can reproduce an open version of the underlying collection of historical texts.
[225004710110] |David Crystal: ...this is just a collection of books - no newspapers, magazines, advertisements, or other orthographic places where culture resides.
[225004710120] |No websites, blogs, social networking sites.
[225004710130] |No spoken language, of course, so over 90 percent of the daily linguistic usage of the world isn't here...The approach, in other words, shows trends but can't interpret or explain them.
[225004710140] |It can't handle ambiguity or idiomaticity..
[225004710150] |The Binder Blog: The value of the Ngrams Viewer rests on a bold conceit: that the number of times a word is used at certain periods of time has some kind of relationship to the culture of the time.
[225004710160] |For example, the fact that the word “slavery” peaks around 1860 suggests that people in 1860 had a lot to say about slavery.
[225004710170] |Another spike around the 1970s meshes nicely with the Civil Rights Movement.
[225004710180] |Well, that’s sort of interesting.
[225004710190] |However, I didn’t need ngrams to tell me that a lot of people were writing about slavery in 1860.
[225004710200] |These data are broad but not deep, which makes them relatively useless to most humanities majors interested in intensive study.
[225004710210] |The one positive comment that I think bears repeating is the role this fun little tool might play in sparking the imagination of young students interested in the role technology can play in the humanities.
[225004710220] |Geoffrey Nunberg:
[225004710230] |Whatever misgivings scholars may have about the larger enterprise, the data will be a lot of fun to play around with.
[225004710240] |And for some—especially students, I imagine—it will be a kind of gateway drug that leads to more-serious involvement in quantitative research.
[225004720010] |digg's c**ktail
[225004720020] |[UPDATE below] I couldn't help but notice a story on Digg: Images of alcoholic drinks under the microscope from vodka c**ktails to pina colada.
[225004720030] |I checked the original Daily Mail story and saw that the word cocktail was not censored.
[225004720040] |I looked for other instances of cocktail on Digg's site and found that all instances look censored, except when the string c-o-c-k occurs in a user name, as the image below demonstrates:
[225004720050] |This appears to be a candidate for unnecessary censorship.
[225004720060] |I sent an email to Digg asking them if this is intentional censorship or an inside joke within the site.
[225004720070] |I'll report any response (don't hold your breath).
[225004720080] |[UPDATE: 3:01 Eastern] Digg support did in fact reply, noting that it was a function of a profanity filter that can be turned off: Hello, You see that because you have the profanity filter enabled.
[225004720090] |To disable it just log in and go to: http://digg.com/settings/preferences --Digg Support
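Digg hasn't said how its filter works, but the c**ktail behavior is what you'd expect from plain substring matching. Here is a minimal sketch contrasting that with a whole-word match. The blocklist, the masking rule, and the function names are all assumptions for illustration.

```python
import re

BLOCKED = ["cock"]  # hypothetical one-word blocklist, for illustration only

def mask(word):
    """Keep the first and last letters and star out the middle: cock -> c**k."""
    return word[0] + "*" * (len(word) - 2) + word[-1]

def naive_filter(text):
    """Mask every occurrence of a blocked string, even inside longer words."""
    for word in BLOCKED:
        text = re.sub(re.escape(word), lambda m: mask(m.group(0)),
                      text, flags=re.IGNORECASE)
    return text

def whole_word_filter(text):
    """Mask a blocked string only when it stands alone as a word."""
    for word in BLOCKED:
        pattern = r"\b" + re.escape(word) + r"\b"
        text = re.sub(pattern, lambda m: mask(m.group(0)),
                      text, flags=re.IGNORECASE)
    return text

headline = "vodka cocktails to pina colada"
print(naive_filter(headline))       # vodka c**ktails to pina colada
print(whole_word_filter(headline))  # vodka cocktails to pina colada
```

The whole-word version leaves cocktails alone, at the cost of missing deliberate run-together spellings; that trade-off is the usual reason filters like this end up over-censoring.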
[225004730010] |language and thought votes
[225004730020] |On the eve of the conclusion to Mark Liberman and Lera Boroditsky's debate at The Economist, there are two vote totals that are interesting to compare.
[225004730030] |The obvious one is the lopsided results so far on the main question: Do you agree with the motion?
[225004730040] |Here, Boroditsky has a 77%-23% advantage.
[225004730050] |However, if you mouse-over each day's vote, it tells you how many yes's have switched to no and vice versa.
[225004730060] |The totals there are the near exact opposite: by a 5-1 margin yes's have switched to no.
[225004730070] |You are free to interpret this as you wish.
[225004730080] |Unfortunately I don't see any raw totals for the number of people voting, so it's anyone's guess what proportion of votes the 6 changes represent (likely, a very small percentage).
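To see how much that guess matters, here is a back-of-the-envelope sketch. The 6 switchers come from the mouse-over counts above; the voter totals are purely hypothetical, since The Economist doesn't publish them.

```python
def share_of_switchers(switchers, total_voters):
    """Fraction of all voters who changed their vote."""
    return switchers / total_voters

# Hypothetical totals; the real number of voters is unknown.
for total in (100, 1_000, 10_000):
    print(total, f"{share_of_switchers(6, total):.2%}")
# 100 6.00%
# 1000 0.60%
# 10000 0.06%
```

Unless turnout was tiny, the switchers are a rounding error next to the 77%-23% split.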
[225004740010] |half a million language deaths?
[225004740020] |Lera Boroditsky's recent concluding statement in The Economist's debate about how language shapes thought states "At the moment we have good linguistic descriptions of only about 10% of the world's existing languages (and we know even less about the half a million or so languages that have existed in the past)" (emphasis added).
[225004740030] |In my previous post on language death here, I used the number 100,000 to estimate how many languages have previously existed and related it favorably to David Crystal's 64,000 to 140,000 reasonable guesstimate.
[225004740040] |I'm just curious to know where Boroditsky came up with the half million number?
[225004740050] |I've managed to come up with a few references to this 500,000 number, but they claim it's a "radical estimate" (e.g., see here).
[225004740060] |My hunch is that this is yet another example of Boroditsky's profound-problem.
[225004740070] |She has a tendency to call modest results profound when they are not.
[225004740080] |She is, I suspect, a tad prone to hyperbole.
[225004750010] |i know your email address...so what?
[225004750020] |Cory Doctorow over at Boing Boing makes the bold claim that there's no compelling evidence that obscuring your email address online using techniques like john DOT smith at host DOT com actually reduces the amount of spam you receive.
[225004750030] |As long as his spam filters are catching the spam effectively, he doesn't mind sharing his email address with the world.
[225004750040] |Are you willing to follow his lead?
[225004760010] |my bad, global edition
[225004760020] |Manute Bol is often credited with coining the phrase my bad (see here and here, or here for alternate hypotheses).
[225004760030] |It has apparently made the jump, in some way, to international usage; it's just not clear to me how.
[225004760040] |While watching The Girl Who Played with Fire again last night, I noticed Lisbeth says something that is translated as my bad, but what she actually says is in Swedish, of course.
[225004760050] |(screen shot from Netflix)
[225004760060] |To my non-Swedish speaking ears, it sounds like she says mitt viel, which would mean something closer to my very, if Google translate is any help.
[225004760070] |Google translates my bad into Swedish as mitt dåliga (dåliga appears to be a literal translation of bad).
[225004760080] |I'm pretty sure that's not what she said, but I'd have to re-listen to be sure.
[225004760090] |So, the linguistic questions are these:
[225004760100] |What does she say in Swedish?
[225004760110] |What is the history of the Swedish phrase?
[225004760120] |Is my bad the best English translation (given its history in slang and in pop culture)?
[225004770010] |bustin' a cap
[225004770020] |Watching the original True Grit on teevee and what do I hear?
[225004770030] |Ned Pepper (Robert Duvall) says something to the effect "I ain't never busted a cap in no girl before."
[225004770040] |I thought only contemporary gangsta movies and rap lyrics used that phrase (and yes, I did find some examples of bust(-ed) a cap using the Ngram Viewer).
[225004790010] |true grit
[225004790020] |I posted recently about the phrase "bust a cap" occurring in the original 1969 John Wayne movie True Grit.
[225004790030] |I got a chance to see the new Coen Bros version and my reactions are worth airing...or not, you decide... First, it turns out the phrase true grit has a storied history in the history of English letters:
[225004790040] |But this review is destined to be of the non-linguistic kind... I also had the chance to re-watch the original John Wayne version just a couple days before watching the new one.
[225004790050] |While it may be the case that this is a bit unfair because it means the recent version is asked to live up to the original in some ways, nonetheless, it is instructive (insofar as it does NOT).
[225004790060] |I hereby forgive the Coen Bros for not watching the original again in preparation for their version.
[225004790070] |Surely this would have scuttled their project.
[225004790080] |Let me make it clear that the individual performances in the Coen Bros movie alone make it worth watching.
[225004790090] |Each actor is given great opportunity to breathe life into their character and I respect the Coen Bros for allowing that.
[225004790100] |They are truly dedicated to the fine craft of acting and I enjoyed watching their version of True Grit.
[225004790110] |Frankly, I could watch Jeff Bridges eat oatmeal and be amazed at how weirdly and wonderfully he did it.
[225004790120] |Nonetheless, my primary complaint is devastating: the new Coen Bros version lacks the basic narrative structure and emotional depth that made the original so fundamentally enjoyable and satisfying.
[225004790130] |For the record, I have never read the novel, so I have no clue what it says and the Coen Bros based their new version entirely on that.
[225004790140] |However, I can say that one of the most deeply satisfying elements of the John Wayne movie is the development of the relationships that evolve between the child Mattie Ross, the drunken but courageous Rooster Cogburn, and the goofy, but basically decent La Boeuf.
[225004790150] |Throughout the original movie, those three characters find a way to forge a sort of dysfunctional, yet basically good and meaningful family unit between them.
[225004790160] |This family unit is completely absent from the new version.
[225004790170] |And I missed it.
[225004790180] |One of the most touching and important moments of the original movie involves Rooster finally opening up to Mattie about his past and his wife and son while the two sit and wait for Ned Pepper's gang to arrive.
[225004790190] |This scene reveals Rooster's humanity and deeply emotional character.
[225004790200] |It is this scene that helps forge a familial bond, almost like an uncle/niece relationship, between Rooster and Mattie.
[225004790210] |And this deep relationship is played out for the rest of the movie.
[225004790220] |Developing this scene during a crucial moment of patience and waiting is pure narrative brilliance.
[225004790230] |Yet, the Coen Bros took this and turned it into camp and parody.
[225004790240] |The lines about his wife and son are basically thrown away in a drunken mumbling as his horse barely manages to carry his heavy frame while they plod along meaninglessly.
[225004790250] |What should be a deeply emotional connection forged in a tense moment of expectation becomes slapstick and meaningless.
[225004790260] |Why throw this away?
[225004790270] |I would need a copy of the new film to point out all of the moments lacking narrative continuity, but here are a few to suffice: Late in both movies, Mattie stumbles upon her nemesis Tom Chaney while gathering water from a river.
[225004790280] |In the original film, the proximity of Ned Pepper's gang is made clear and ominous.
[225004790290] |The likelihood that she would find trouble while going for water is made plain.
[225004790300] |But in the new version, it plays out like some wildly random coincidence.
[225004790310] |The ending of both movies requires these events to take place, but the original movie at least gives us some reasons behind the events, not just chaos and random nothingness.
[225004790320] |Ned Pepper is a critical character in the story.
[225004790330] |In the original movie, the truly great actor Robert Duvall is given the chance to give the man some decency and honor.
[225004790340] |He is a killer, yes, but he also saves Mattie's life, despite claiming to be willing to end it.
[225004790350] |In fact, it is Ned Pepper, more than anyone else (in the original), who keeps Mattie alive (until the snake-hole scene at least).
[225004790360] |Robert Duvall was given the opportunity to create a Ned Pepper who is full and complex.
[225004790370] |In the Coen Bros version the actor Barry Pepper (seriously, no joke, that's his name, weird right?)
[225004790380] |is barely a grubby and dirty (really seriously dirty, nasty dirty, disgustingly dirty...) killer.
[225004790390] |The pathos of Ned Pepper is gone.
[225004790400] |By far, the most iconic moment of the original movie is the scene where Rooster takes the reins of his horse in his mouth and single-handedly draws down against four armed opponents.
[225004790410] |This is one of the greatest moments of American Western lore, involving the single greatest actor of American Western mythology.
[225004790420] |It is truly a moment of cinematic greatness.
[225004790430] |Leading up to this, Rooster describes a previous moment in his storied life much like this (earlier in both films) and it forms a crucial part of his legend and character.
[225004790440] |When the ultimate moment arrives in the original version, it is a moment of destiny, built up by the dialogue and scenes that have come before it.
[225004790450] |But in the Coen Bros version, the whole raison d'etre has been obscured by mumbling and misdirection.
[225004790460] |It's almost as if this were every bit as random as everything else that came before it.
[225004790470] |You may well argue that randomness and chaos are in fact the Coen Bros' raison d'etre, and I can't argue against that.
[225004790480] |Fair enough.
[225004790490] |But then, why bother making a movie about a story for which destiny and courage is so crucial a factor?
[225004790500] |Without the great inevitable showdown of Rooster's grit against the despots' manpower, well, why make this movie at all?
[225004790510] |If you believe in pure chaos, fine, make No Country For Old Men over and over, got it.
[225004790520] |That makes sense.
[225004790530] |That's coherent.
[225004790540] |But why take this novel and make a movie?
[225004790550] |If your primary goal as movie makers is to take previous material well loved by the public and trash it for your own philosophical gain, that's just pure douchebaggery, so screw you Joel and Ethan.
[225004800010] |another lingo toy...
[225004800020] |I love free online lingo toys like BYU's Corpora and Google's Ngram Viewer and now there's a new one: The Human Speechome Project from MIT "provides a look into the most complete record of a single child’s speech development ever created.
[225004800030] |The data has been organized to show the age of the child when he spoke each of his first 400 words." It's profiled in Forbes here.
[225004800040] |And they provide a nifty interactive graph to sort the data:
[225004820010] |the linguistics of brand names
[225004820020] |The Neurocritic reviews evidence for the whopping increase in drug brand names beginning with the letters z and x starting in 1986 and quotes the conclusion of the study's authors:
[225004820030] |Reflecting their infrequent occurrence in English words, x and z count for 8 and 10 points in Scrabble, the highest values (along with j and q) in the game.
[225004820040] |So names that contain them are likely to seem special and be memorable.
[225004820050] |“If you meet them in running text, they stand out,” is the way one industry insider explained.
[225004820060] |Generally, they are also easy to pronounce.
[225004820070] |The last point about being easy to pronounce is basically nonsense, so forgive them that, but their basic point that infrequent sounds are more memorable is basically a restatement of Zipf's Law and may have some truth to it.
[225004820080] |I can tell you this, there are entire companies that charge high fees to help manufacturers develop brand names (see here for a discussion of what brand name developers do).
[225004820090] |I worked at one of them ever so briefly and I found a mix of legitimate linguistics and voodoo linguistics in the "research" they prepared for their customers.
[225004820100] |I also found a resistance to serious linguistics for two reasons: 1) the customers didn't like science (I'm not joking; this was a serious obstacle) and 2) serious linguistics took too long and didn't come to firm conclusions.
[225004820110] |Typically, we were asked to initiate, perform, and complete linguistic research on brand names in a matter of weeks.
[225004820120] |Ultimately, though, it was my conclusion that a product's name simply was not that crucial to its success, which teetered on the manufacturer's overall marketing strategy more than the name.
[225004820130] |Think about Google vs. Microsoft.
[225004820140] |So, the rise in z and x named drug products is a fad based more in the board room than in the marketplace.
[225004830010] |non-linguistic CAPTCHA
[225004830020] |David Bradley, writing at sciencetech, reports on a new face-based CAPTCHA process, quoting the team that created it, "Unlike a text-based CAPTCHA, a major benefit of the proposed image-based face detection CAPTCHA is that it does not have any language barriers..." I guess it never really occurred to me that there would be language barriers in CAPTCHAs because so many of the strings are in fact nonsense words, but I guess language-specific phonotactics are helpful (often the identity of a single letter is quite ambiguous).
[225004840010] |not any or not one??
[225004840020] |The NYT's recent grammar blog post The Number of None brings up an interesting question: is none semantically closer to not any or not one?
[225004840030] |And what should its morphosyntactic agreement be, singular or plural?
[225004840040] |The Times takes the not any, plural position, but I am inclined to disagree based on my intuition about substitution.
[225004840050] |Below are the two sentences the Times uses to illustrate:
[225004840060] |None of the interim employers or temporary agencies have contributed to a 401(k)
[225004840070] |None of the works have gained a foothold in the seasonal repertory.
[225004840080] |Now, with the substitutions and my personal acceptability rating (where * means mildly unacceptable/not sure and ** means completely unacceptable).
[225004840090] |Not one of the interim employers or temporary agencies has contributed to a 401(k)
[225004840100] |Not one of the works has gained a foothold in the seasonal repertory.
[225004840110] |**Not any of the interim employers or temporary agencies have contributed to a 401(k)
[225004840120] |**Not any of the works have gained a foothold in the seasonal repertory.
[225004840130] |Not one of the interim employers or temporary agencies have contributed to a 401(k)
[225004840140] |Not one of the works have gained a foothold in the seasonal repertory.
[225004840150] |**Not any of the interim employers or temporary agencies has contributed to a 401(k)
[225004840160] |**Not any of the works has gained a foothold in the seasonal repertory.
[225004840170] |The above ratings suggest that I make no distinction in acceptability between none has and none have.
[225004840180] |But wait, there's more.
[225004840190] |Let's remove the lengthy PP and see how this pans out:
[225004840200] |*Not one of them has contributed to a 401(k)
[225004840210] |*Not one of them has gained a foothold in the seasonal repertory.
[225004840220] |**Not any of them have contributed to a 401(k)
[225004840230] |**Not any of them have gained a foothold in the seasonal repertory.
[225004840240] |Not one of them have contributed to a 401(k)
[225004840250] |Not one of them have gained a foothold in the seasonal repertory.
[225004840260] |**Not any of them has contributed to a 401(k)
[225004840270] |**Not any of them has gained a foothold in the seasonal repertory.
[225004840280] |I seem to slightly prefer the singular reading when the word none is close to the verb but with a plural noun heading the PP.
[225004840290] |But this is not true if we delete the PP altogether:
[225004840300] |Not one has contributed to a 401(k)
[225004840310] |Not one has gained a foothold in the seasonal repertory.
[225004840320] |*Not one have contributed to a 401(k)
[225004840330] |*Not one have gained a foothold in the seasonal repertory.
[225004840340] |It would appear I have an incoherent grammar (surely this is true as I believe all grammars are, in some way, incoherent.
[225004840350] |As Sapir said, all grammars leak).
[225004840360] |But, there's at least one other factor muddying the linguistic waters.
[225004840370] |The fact is that one also acts as a pronoun, as in one does one's duty.
[225004840380] |When acting as a pronoun, it takes 3rd person singular agreement, as in one has to do one's duty (think he has to do his duty), not *one have to do one's duty.
[225004840390] |It may be that this pronoun agreement is interfering with my reading when one occurs right next to the verb.
[225004840400] |Also, I did this pretty fast, so I wouldn't be surprised if I change my mind by COB... Of course, how could I resist:
[225004840410] |I believe I got the full paradigm:
[225004840420] |not one of them has
[225004840430] |not one of them have
[225004840440] |not any of them has
[225004840450] |not any of them have
[225004840460] |none of them has
[225004840470] |none of them have
[225004840480] |none has
[225004840490] |none have
[225004840500] |It appears as though none have had a hell of a start to the 19th Century, but got killed off along with the buffalo.
[225004850010] |refudiate, the word that won't die
[225004850020] |Thanks in no small measure to the Oxford University Press naming refudiate its Word Of The Year plus The Daily Dish rekindling its favorite topic, we have a new round of he-said-she-said to deal with.
[225004850030] |Made famous by Sarah Palin this past summer (see Liberman's original post here, and others here), it is yet again the object of speculation as to why Palin used the form to begin with.
[225004850040] |Palin herself poured fuel on this fire two days ago by tweeting that it was a typo.
[225004850050] |Liberman thinks that explanation didn't hold water the first time around because she first said it aloud on teevee: the original example [on teevee] wasn't a slip of the tongue, but a symptom of the fact that Ms. Palin had a blend of repudiate and refute as a well-established entry in her mental lexicon [note added].
[225004850060] |Why the fuss?
[225004850070] |There's nothing particularly interesting or telling about the linguistic blending of repudiate and refute.
[225004850080] |Everyone does this kind of thing now and again and sometimes it sticks.
[225004850090] |Some people like to beat up on public figures any time they can, so something like this is a target.
[225004850100] |But the more serious speculation is that the Palin Camp's public responses expose something important about Sarah Palin's inner circle and consultation.
[225004850110] |I'll leave it to the political pundits to fight that one out.
[225004850120] |For now,
[225004860010] |etymologists, unite!
[225004860020] |A buddy wrote me an interesting question (to which I did not have an answer): It's been driving me crazy, is there a term of art for when the etymological root of a word is the opposite of the word's modern meaning?
[225004860030] |For example, asbestos means "an unquenchable fire"; philander means "a lover of men" etc.
[225004860040] |Cheers, A. Anyone know this?
[225004870010] |dialects map
[225004870020] |Extremely detailed North American English Dialects, Based on Pronunciation Patterns.
[225004870030] |The site could use a bit of a web re-design ... looks circa 1999.
[225004870040] |Anyone care to offer free web design help to clean up this otherwise useful resource a little?
[225004880010] |does asbestos really mean 'unquenchable'?
[225004880020] |Yes, at least etymologically.
[225004880030] |The Online Etymology Dictionary explains its etymology this way: ...from O.Fr.
[225004880040] |abeste, from L. asbestos "quicklime" (which "burns" when cold water is poured on it), from Gk.
[225004880050] |asbestos, lit. "inextinguishable," from a- "not" + sbestos, verbal adj.
[225004880060] |from sbennynai "to quench," from PIE base *(s)gwes- "to quench, extinguish" (cf. Lith.
[225004880070] |gestu "to go out," O.C.S. gaso, Hittite kishtari "is being put out") (emphasis added).
[225004880080] |Like people, every word has lived its own peculiar and unique life.
[225004880090] |Riffing on my post below regarding words that have the opposite meaning of their etymology, my friend Andy (who did graduate work in Classics, and hence, actually reads Greek) challenged me to help him understand why the word asbestos, whose etymology literally means 'unquenchable', is used today to mean a substance that cannot burn.
[225004880100] |With some Googling, I found this (PDF): "First mention of asbestos appeared in the Greek text On Stones, written by Theophrastus, one of Aristotle’s students.
[225004880110] |Theophrastus referred to a substance that resembled rotten wood and burned (right) without being harmed when doused with oil."
[225004880120] |So, Ol' Theophrastus kept pouring oil onto this stuff, but it never burnt, so he kept pouring, but the stuff was never quenched by oil/fire.
[225004880130] |Hence, it was unquenchable.
[225004880140] |That's my story and I'm sticking to it (for now).
[225004880150] |Andy did some follow-up of his own and provides the following: Yes, that's one of the more likely explanations.
[225004880160] |In my research I came across the use of asbestos as permanent wicks in lamps, but never noted the bit about being unquenchable with oil.
[225004880170] |That Theophrastos citation really belongs in the dictionary entry below, as it's the only cite that explains the meaning under A. The lexicon below is massively comprehensive (if you couldn't tell) so it's odd they missed Theo. The other possible explanation is II.
[225004880180] |or "unslaked lime", as quick lime burns underwater.
[225004880190] |This was a key component in later "Greek fire", but so far I haven't been able to find any ancient source that cites an unquenchable substance (Greek Fire dates to 500 AD, white phosphorus, which also burns underwater, dates to 1600 AD, and sodium, which explodes on contact with water, dates to 1800 AD). If I had the time and language skill I used to have I would search my CD of all Greek text up to 600 AD for cites of asbestos and then comb thru them, but that would be a day's worth of work. I'm pleased that we got close to the meaning in online research and I'm not sure that looking up every instance of asbestos would change anything.
[225004880200] |Andy also provided the following reference:
[225004880210] |ἄσβεστος , ον, also η, ον Il.16.123:—
[225004880220] |A. unquenchable, inextinguishable, “φλόξ” Il. l. c.; not quenched, “πῦρ ἄ.” D.H.3.67, Plu.Num.9; “κλέος” Od.4.584; “γέλως” Il.1.599; “βοή” 11.50; “ἐργμάτων ἀκτὶς καλῶν ἄ.
[225004880230] |αἰεί” Pi.I.4(3).42; ἄ.
[225004880240] |πόρος ὠκεανοῦ ocean's ceaseless flow, A.Pr.532 (lyr.); πῦρ, of hell, Ev.Marc.9.43.
[225004880250] |II.
[225004880260] |as Subst., ἄσβεστος (sc.
[225004880270] |τίτανος), ἡ, unslaked lime, Dsc.5.115, Plu.Sert.17, Eum.16; “ἄ.
[225004880280] |κονία” Lyc. ap.
[225004880290] |Orib.8.25.16.
[225004880300] |2.
[225004880310] |a mineral or gem, Plin.HN37.146.
[225004880320] |ἀσβεστώδης: tofus, Gloss.
[225004880330] |Henry George Liddell, Robert Scott. A Greek-English Lexicon, revised and augmented throughout by Sir Henry Stuart Jones with the assistance of Roderick McKenzie. Oxford: Clarendon Press, 1940.
[225004890010] |plagiarism and n-grams
[225004890020] |Big media plagiarism is once again in the news as ESPN has suspended an on-air host for plagiarizing three sentences from a newspaper columnist.
[225004890030] |The on-air host has admitted the plagiarism*, issued an apology, and asked for forgiveness.
[225004890040] |The multiple and confusing ethical standards for plagiarism have been the subject of several LL posts (recently here) and this led me to wonder about what counts as plagiarism in the first place.
[225004890050] |Clearly a three-sentence, 45-word passage, almost word for word identical with another, in the same semantic domain with the same referents, is a case of plagiarism.
[225004890060] |But what about a 20 word passage?
[225004890070] |10 word?
[225004890080] |4 word**?
[225004890090] |Many short phrases are highly frequent, right?
[225004890100] |You couldn't felicitously accuse me of plagiarism for using the phrase "I am going..." could you?
[225004890110] |Even though there can be no doubt that someone else used it before me.
[225004890120] |Yes, I know you can find guidelines for plagiarism in college student handbooks and such.
[225004890130] |I dealt with those for years when I taught college writing courses (and I recall flunking at least three students for plagiarism, but those were whole papers, really stupid stuff).
[225004890140] |But I wonder, now that we have a 500 million word corpus available to us, couldn't we simply compare all n-grams to discover how likely it is that any given 5-gram is repeated?
[225004890150] |I'd prefer to do this up to 20-grams and such, but wouldn't we predict that there comes a point at which the likelihood that a particular phrase was plagiarized (given that we had found two alike) would be based solely on the general likelihood that n-grams of that size are repeated?
[225004890160] |The situation would be this: you discover that a particular 11 word passage has an identical twin from 2 years ago.
[225004890170] |Without bothering to look into whether or not the author had access to the previous work, you simply look up the likelihood that any 11-gram passage is repeated and discover that there is a 0.0002% chance that a phrase that long will be repeated.
[225004890180] |With some effort, you could then derive predictions for near-identical passages (using WordNet and similar resources)... just thinking out loud... *I am ignorant of the role ESPN's producers play in the writing of on-air speeches, but the quote seems clearly to have been written on a teleprompter at the time of speaking, which means someone else was involved, even if unwittingly.
[225004890190] |Nonetheless, the host is taking the fall willingly.
[225004890200] |**Excluding obviously famous phrases like Ich bin ein Berliner.
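To make the thinking-out-loud above a bit more concrete, here is a minimal sketch in Python of the basic measurement: for a given n, what fraction of distinct n-grams turn up in more than one document? Everything here is an assumption for illustration. The three toy sentences stand in for a real corpus, `repeat_rate` is a made-up helper name, and whitespace splitting is a crude substitute for real tokenization.

```python
from collections import defaultdict

def ngrams(tokens, n):
    """Return all contiguous n-word sequences in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def repeat_rate(documents, n):
    """Fraction of distinct n-grams that appear in more than one document."""
    seen_in = defaultdict(set)  # n-gram -> ids of the documents containing it
    for doc_id, text in enumerate(documents):
        for gram in ngrams(text.lower().split(), n):
            seen_in[gram].add(doc_id)
    if not seen_in:
        return 0.0
    shared = sum(1 for docs in seen_in.values() if len(docs) > 1)
    return shared / len(seen_in)

# Toy stand-in for a real corpus (hypothetical sentences, not real data).
toy_corpus = [
    "i am going to the store to buy some milk",
    "i am going to the game with my brother tonight",
    "she said i am going to regret this decision",
]

# Short n-grams repeat across documents; long ones quickly stop repeating.
for n in (2, 3, 5, 8):
    print(n, round(repeat_rate(toy_corpus, n), 3))
```

Run over a genuinely large corpus, a table of these rates by n would be the lookup I have in mind: the rarer a repeat of that length is in general, the harder it is to call a given match a coincidence.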
[225004920010] |how we hear ourselves speak
[225004920020] |Science Daily has a nice article on new neurolinguistic research out of Cal linking auditory and speech processes: "We used to think that the human auditory system is mostly suppressed during speech, but we found closely knit patches of cortex with very different sensitivities to our own speech that paint a more complicated picture," said Adeen Flinker, a doctoral student in neuroscience at UC Berkeley and lead author of the study. "We found evidence of millions of neurons firing together every time you hear a sound right next to millions of neurons ignoring external sounds but firing together every time you speak," Flinker added. "Such a mosaic of responses could play an important role in how we are able to distinguish our own speech from that of others."
[225004920030] |HT Linguistic News Feeds
[225004930010] |the germans fear my language too, muahahaha
[225004930020] |It's a mighty era to be a native speaker of English.
[225004930030] |It seems the world fears my language and is instituting fruitless policies to protect their languages against my own.
[225004930040] |First the Chinese banned English words and phrases.
[225004930050] |Now, the Germans are getting on the banning bandwagon: Germany's Transport Minister claimed to have struck an important blow for the preservation of the German language yesterday after enforcing a strict ban on the use of all English words and phrases within his ministry. Peter Ramsauer stopped his staff from using more than 150 English words and expressions that have crept into everyday German shortly after being appointed in late 2009. His aim, which was backed by Chancellor Angela Merkel, was to defend his language against the spread of "Denglish" – the corruption of German with words such as "handy" for mobile phone and other expressions including "babysitten" and "downloaden".
[225004930060] |As a result, words such as "laptop", "ticket" and "meeting" are verboten in Mr Ramsauer's ministry.
[225004930070] |Instead, staff must use their German equivalents: "Klapprechner", "Fahrschein" and "Besprechung" as well as many other common English words that the minister has translated back into German.
[225004940010] |naive bayes knows restaurants better than 5,000 mechanical turks
[225004940020] |Yelp recently sponsored a bake-off between a Naive Bayes classifier and the online crowd-sourcing site Mechanical Turk.
[225004940030] |The task was classifying web sites according to their business category (i.e., is it a restaurant or a doctor's office?).
[225004940040] |The classifier beat the turkers handily:
[225004940050] |Money quote: In almost every case, the algorithm, which was trained on a pool of 12 million user-submitted Yelp reviews, correctly identified the category of a business a third more often than the humans.
[225004940060] |In the automotive category, the computer was twice as likely as the assembled masses to correctly identify a business.
[225004940070] |There are a variety of qualifications (why did 99% of Turkers who applied for the task fail the basic test?
[225004940080] |ESL issues perhaps?).
[225004940090] |But it's an interesting result.
[225004940100] |HT kdnuggets
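For anyone who hasn't seen one, here is a minimal sketch of a multinomial Naive Bayes text classifier in plain Python. It is not Yelp's system, which I know nothing about beyond the quote above. The training snippets, the category labels, and the function names `train` and `classify` are all invented for illustration, and the smoothing is plain add-one.

```python
import math
from collections import Counter, defaultdict

def train(labeled_texts):
    """Count words per category and documents per category."""
    word_counts = defaultdict(Counter)
    doc_counts = Counter()
    for text, label in labeled_texts:
        doc_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, doc_counts, vocab

def classify(text, model):
    """Return the category with the highest log posterior (add-one smoothing)."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        score = math.log(doc_counts[label] / total_docs)  # log prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.lower().split():
            if word in vocab:  # skip words never seen in training
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical review snippets, made up for this example.
training = [
    ("great pasta and friendly waiter", "restaurant"),
    ("the burger and fries were delicious", "restaurant"),
    ("the doctor was kind and the nurse was quick", "doctor"),
    ("short wait friendly nurse good doctor", "doctor"),
    ("fixed my brakes and changed the oil", "automotive"),
    ("honest mechanic fair price on tires", "automotive"),
]
model = train(training)
print(classify("delicious pasta", model))
print(classify("the mechanic changed my tires", model))
```

The "naive" part is the assumption that each word contributes independently given the category, which is plainly false of language and yet works surprisingly well for this kind of task.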
[225004950010] |jobs for linguists
[225004950020] |As the economy slowly starts to wake, I hope and expect to see more jobs like this one popping up where general linguistics skills are being sought by innovative tech companies (these were a dime a dozen in the glory days of the tech boom 90s).
[225004950030] |Were I a bit younger, and less well-paid, I'd probably consider applying myself.
[225004950040] |We are seeking a Linguist interested in joining a rapidly growing organization.
[225004950050] |The Linguist will work closely with our NLP Team in researching and developing lexica and grammars specific to various languages (“Language Packs”) that will be used for various NLP tasks.
[225004950060] |She/he will be expected to contribute substantive insight/action with regard to developing language packs and must have a keen eye for understanding the end-user experience. Specific responsibilities include: - Research specific languages for their lexical, morphological, and grammatical structures - Develop original lexicons and reformat acquired lexicons - Create grammatical rules using the research done above or other sources - Analyze results from the system for mistakes and plan for improvement - Willingness to focus research and development of Language Packs on meeting the end-user’s needs If you're a linguist interested in a non-academic career, you could do worse than apply here.
[225004950070] |And for the record, I have no association with this company, have never worked for them, get nothing from posting this, but I do know one of their employees (we went to grad school together).
[225004960010] |annals of unnecessary censorship, literary canon edition
[225004960020] |Upcoming NewSouth 'Huck Finn' Eliminates the 'N' Word.
[225004960030] |Twain scholar Alan Gribben and NewSouth Books plan to release a version of Huckleberry Finn, in a single volume with The Adventures of Tom Sawyer, that does away with the "n" word (as well as the "in" word, "Injun") by replacing it with the word "slave."
[225004960040] |[...]
[225004960050] |"What he suggested," said La Rosa, "was that there was a market for a book in which the n-word was switched out for something less hurtful, less controversial.
[225004960060] |We recognized that some people would say that this was censorship of a kind, but our feeling is that there are plenty of other books out there—all of them, in fact—that faithfully replicate the text, and that this was simply an option for those who were increasingly uncomfortable, as he put it, insisting students read a text which was so incredibly hurtful."
[225004960070] |I'm curious about this notion of replacement as an "option" for two reasons.
[225004960080] |First, it reminds me of Ted Turner's infamous and ill-fated 1980s colorization project whereby he went back and artificially colorized black and white movies.
[225004960090] |As I recall, Turner also spoke of it as an "option", but it failed miserably as a cultural movement.
[225004960100] |Second, now that eReaders are becoming commonplace I wonder if publishers will begin to offer sanitized versions of books as an option.
[225004960110] |I don't have an eReader, so maybe this is already available, but I could imagine a filter that you click on and magically Henry Miller's Tropic of Cancer becomes a weirdly different novel.
[225004960120] |HT kottke
[225004970010] |adults process language in a baby way!
[225004970020] |Do babies process language in a "grown-up" way?
[225004970030] |First, read this from UCSD: Babies, even those too young to talk, can understand many of the words that adults are saying – and their brains process them in a grown-up way. Combining the cutting-edge technologies of MRI and MEG, scientists at the University of California, San Diego show that babies just over a year old process words they hear with the same brain structures as adults, and in the same amount of time.
[225004970040] |Moreover, the researchers found that babies were not merely processing the words as sounds, but were capable of grasping their meaning [emphasis added].
[225004970050] |It certainly is an interesting finding to discover that infant and adult lexical processing may be similar, but why couch it in asymmetrical phrasing?
[225004970060] |Given the facts as this press release states them, could we equally well say that adults process language in a baby way?
[225004970070] |This wouldn't get any press attention, though, would it?
[225004970080] |Or worse, it would be mocked.
[225004970090] |The author of the press release, Debra Kain, is referred to as a spokesperson for the UCSD Medical Center in this article.
[225004970100] |But it's not clear she consulted Jeff Elman, a very well respected cognitive scientist who participated in the research.
[225004970110] |I'm not sure how comfortable he would have been with the somewhat excitable language.
[225004980010] |biggest linguistics story of 2010?
[225004980020] |I have nothing but respect and admiration for Erin McKean, CEO and Co-Founder of the awesome Wordnik project as well as the person who has given by far the single greatest lingo-TED-talk ever; nonetheless, I take exception to her most recent column in the Boston Globe titled The year in language which is an article about the best and worst language stories of 2010.
[225004980030] |She notes many worthy events, yet... With no offense meant, I can say that I was shocked, SHOCKED!
[225004980040] |to discover that no mention whatsoever was made of what I consider to be the single most important and shocking linguistics related story of 2010: the revelation that Harvard's Marc Hauser fabricated data regarding rule learning by monkeys.
[225004980050] |For years, Hauser has posed as a giant in the Chomsky camp, and created an ivy-league cottage industry based on his research.
[225004980060] |2010's revelations of his still-unclear-yet-nonetheless-obvious-forgery are a shock-wave whose full power and ramifications have yet to be fully understood.
[225004980070] |Plus, it was the Boston Globe itself, the paper Erin publishes in, that broke the original story.
[225004980080] |Language Log's extensive discussions of the Hauser story can be found here.
[225004990010] |replace QWERTY with little circles?
[225004990020] |Android users can look forward to a new typing layout by 8pen, specifically designed for one-handed typing on hand-held devices.
[225004990030] |There have long been alternatives to the traditional QWERTY layout, but this one replaces keys with hand motion, so rather than landing your finger on the letter you want to type (the conceptual foundation of most keyboard concepts) this one rests on the idea that you make little circles on the screen while different letters are accessed.
[225004990040] |In the words of the horse from Ren and Stimpy, no sir, I don't like it.
[225004990050] |Why not?
[225004990060] |While inefficient and clumsy, the classic idea of touching the letter you want is fundamentally natural and clear.
[225004990070] |Any child or lazy adult can grasp it immediately.
[225004990080] |The little circles idea creates an artificial and unnatural interface that puts you multiple steps away from what you want.
[225004990090] |I'm not trying to make circles, I'm trying to type a frikkin k.
[225004990100] |I'm sure with practice anyone could get good at this, but I don't wanna practice typing for frik's sake!
[225004990110] |That's why I've been a clumsy hunt and pecker for 30 years with the damned QWERTY.
[225004990120] |I could have practiced typing on this damn thing also, but I didn't for the same reason I'm not gonna practice the little circles: I'm lazy.
[225004990130] |But at least with keys I can just touch the letter I want and get it.
[225004990140] |It's clear and obvious.
[225004990150] |I'm sure the little circles would drive me mad.
[225005000010] |The Psychological Functions of Function Words
[225005000020] |Here is Chung & Pennebaker's 2007 paper on function words which crucially relies on Pennebaker's LIWC data: The Psychological Functions of Function Words (pdf).
[225005000030] |I have long felt that function words have been wrongly ignored by computational linguists and SEO specialists.
[225005000040] |While the use of stop lists has sped up processing time considerably, it has also wiped out huge amounts of semantically meaningful data.
[225005000050] |Nonetheless, I also feel that Pennebaker's LIWC corpus is not as transparent or as comprehensive as I would prefer it to be.
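A toy illustration of what a stop list throws away, sketched in Python. The stop list here is a tiny made-up one (real lists run to hundreds of words), the two sentences are invented, and the helper names are hypothetical. It has nothing to do with how LIWC itself is built.

```python
from collections import Counter

# A tiny illustrative stop list; real stop lists are much longer.
STOP_WORDS = {"i", "we", "you", "the", "a", "of", "to", "and",
              "my", "our", "but", "not"}

def content_words(text):
    """Tokens that survive stop-list filtering."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]

def function_word_profile(text):
    """Counts of exactly the words the stop list would discard."""
    return Counter(w for w in text.lower().split() if w in STOP_WORDS)

first = "I lost my keys and I blame my luck"
second = "We lost our keys and we blame our luck"

# After filtering, the two sentences look identical...
print(content_words(first) == content_words(second))
# ...but the discarded words carry the singular vs. plural perspective.
print(function_word_profile(first))
print(function_word_profile(second))
```

Once the stop list has been applied, the only thing that distinguished the two speakers (I/my versus we/our) is gone, and that is exactly the kind of signal the Chung & Pennebaker paper cares about.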
[225005010010] |Do rich families talk to their kids more than poor families?
[225005010020] |Are children in professional families talked to three times as much as the children in welfare families? That's the underlying assumption behind a new program at Bellevue Hospital designed to coach "poor families on how to talk to their infant children, encouraging more interaction."
[225005010030] |At least, that's how the Huffington Post wants you to think about this story: University of Kansas graduate student Betty Hart and her professor, Todd Risley, wanted to figure out the cause of the education gap between the rich and poor.
[225005010040] |So, they targeted early education and headed a study that recorded the first three years of 40 infants' lives.
[225005010050] |The conclusion?
[225005010060] |Rich families talk to their kids more than poor families.
[225005010070] |Pretty impressive, huh?
[225005010080] |Sounds cutting edge, right?
[225005010090] |With a little searching I discovered the following:
[225005010100] |Betty Hart was a grad student at KU in the 1960s.
[225005010110] |The research data for this study was collected in the early 1980s.
[225005010120] |The results were published in 1995.
[225005010130] |I have no problem with the common sense underlying these notions: talking to babies a lot helps them achieve higher success in academics later in life.
[225005010140] |Good advice all around, no doubt.
[225005010150] |But I'm suspicious of several assumptions about the finding of the original paper.
[225005010160] |From Alix Spiegel: According to their research, the average child in a welfare home heard about 600 words an hour while a child in a professional home heard 2,100. "Children in professional families are talked to three times as much as the average child in a welfare family," Hart says [emphasis added].
[225005010170] |Hearing words in your environment, being talked to, and child-directed speech are three different things and need to be distinguished.
[225005010180] |All I have are secondary sources, not the 1995 book (Spiegel's article is the most thorough), so I can't tell how the data was coded and what they looked for (did they make the above three distinctions?).
[225005010190] |But more to the point is the contemporary rush to paint these old findings as rationale to create new programs aimed at poor parents as if being poor makes your language use wrong somehow.
[225005010200] |It strikes me as convoluted logic to take a 15 year old book (based on 20 year old data) and decide that poor parents need linguistic intervention.
[225005010210] |Exactly how much grant money did Dr. Mendelsohn spend on this program?
[225005010220] |Even if the 3-1 ratio holds true (I suspect it would not under close scrutiny), what other factors might be affecting this?
[225005010230] |It struck me that people with basically good intentions took a small amount of science out of context and used it to reinforce class stereotypes and class pressure.
[225005020010] |doggie do do at the HuffPo
[225005020020] |The Huffington Post is resetting the bar for astoundingly stupid science reporting: They report on a dog, Chaser, who has been trained to accurately fetch over 1000 toys by the sound of their names and conclude that the dog's abilities, wait for it, place her at an intelligence level equivalent to a three-year-old human child!
[225005020030] |My oh my, their view of the cognitive ability of 3 year olds is as depressing as it is profoundly wrong.
[225005020040] |Sorry, 3 year old humans can do more than make one-to-one correspondences between sounds and objects.
[225005020050] |They can, for example, recognize that the sound swing can mean an object with a seat attached to ropes OR the action you perform when you move your body back and forth on that thing with the ropes, they can watch TV and follow plot developments, ... sigh, I mean fuck it, it's not worth debunking ... UPDATE: Sean at Replicated Typo reviews the original research involving Chaser.
[225005040010] |how distinctive is app store?
[225005040020] |Microsoft is arguing that Apple cannot trademark the term app store because it is a generic term. "An 'app store' is an 'app store'," Russell Pangborn, Microsoft's associate general counsel, said, according to the BBC. "Like 'shoe store' or 'toy store', it is a generic term that is commonly used by companies, governments and individuals that offer apps."
[225005040030] |A commenter at Hacker News begs to differ: Ngram data shows no usage of "App Store" or "app store" from the time of 1800 to 2008.
[225005040040] |I was suspicious of this, but using the terms "app,store" separately produced lots of data points.
[225005040050] |My tentative hypothesis is that Ngram is using data that existed before the App Store went public and thus will not show up in Ngram.
[225005040060] |I'm no trademark expert, but the basic idea, as Wikipedia defines it, is distinctiveness: A trademark may be eligible for registration, or registrable, if amongst other things it performs the essential trademark function, and has distinctive character.
[225005040070] |Registrability can be understood as a continuum, with "inherently distinctive" marks at one end, "generic" and "descriptive" marks with no distinctive character at the other end, and "suggestive" and "arbitrary" marks lying between these two points.
[225005040080] |First, I used BYU's Corpus of Contemporary American English and found an instance in 2009 of 'app store' being used to describe Zune's product: Oh, the Zune has an app store, all right.
[225005040090] |As of today, there are exactly nine programs in the Zune App Store.
[225005040100] |A quick Google search reveals that it commonly gets applied to non-Apple products as well: Yep, Amazon Launching Their Own App Store For Android Too.
[225005040110] |While it may be the case that Apple introduced the term in 2008, it seems to have expanded to generic use in less than a year and now gets used at least semi-regularly for non-Apple products.
[225005040120] |I'm not an Apple user myself and my own reading of app store is definitely generic.
[225005040130] |It does not distinctly mean Apple's product at all, to me.
[225005040140] |I have no clue if a court would agree.
[225005050010] |true grit phonological ambiguity
[225005050020] |Thanks to Jeff Bridges' now infamous mumbling performance, the clever folks at College Humor give True Grit a version of the lip reading treatment that Star Trek received not too long ago.
[225005050040] |apologies for the weird embedding, I don't know how to fix it (I just pasted the embed code into the Blogger HTML with no option to adjust size)...and yes, I'll have some more of that woop woop, please...
[225005060010] |god awful is an odd phrase
[225005060020] |I used the phrase god awful in a comment at Language Log and it occurs to me that it's an odd little creature.
[225005060030] |From the OED*: Pronunciation: /ˌgɒdˈɔːfʊl/ Forms: Also God awful, Godawful. Etymology: < god n. + awful adj. slang (orig.
[225005060040] |U.S.). Terrible; extremely unpleasant.
[225005060050] |(In quot.
[225005060060] |1878 the sense is ‘impressively large’.) 1878 J. H. Beadle Western Wilds xxxvii.
[225005060070] |611 Put thirty acres‥into wheat, and went to work with a hurrah in 1874 to make a God-awful crop. 1897 C. M. Flandrau Harvard Episodes 88 Ellis is such a God awful fool. 1930 W. S. Maugham Breadwinner ii.
[225005060080] |124 Your affairs are in a god-awful mess. 1946 ‘S. Russell’ To Bed with Grand Music i. 14 Listen to the most godawful programmes on the radio. 1958 R. Graves in Times Lit.
[225005060090] |Suppl.
[225005060100] |15 Aug. p. x/4 The credible and vivid story that any context (red-brick, yellow-brick, or otherwise God-awful) offers. 1959 P. McCutchan Storm South iv.
[225005060110] |63, I heard the most God-awful racket above my head. The meaning is derived from using god as an intensifier like very.
[225005060120] |Fine, I get this analysis, it makes sense.
[225005060130] |But is god ever used in any other construction to intensify a negative quality like awful?
[225005060140] |This is a case where corpora are not terribly useful because the instances of god are so frequent, and so frequently NOT in this kind of construction, it's difficult to discover automatically.
[225005060150] |I could go all qualitative and just read a million phrases with god in them, but that would take a really long time and still have a low probability of success.
[225005060160] |HT to the OED for making their site freely available this month!
[225005060170] |Use name/password trynwoed/trynewoed.
[225005070010] |the most difficult linguistics sentence ever?
[225005070020] |Imagine I give you the sentence template that follows:
[225005070030] |If speakers omit X to avoid Y, optional Z should be less likely if W.
[225005070040] |Question: What X, Y, Z and W could possibly make that sentence EASIER to understand?
[225005070050] |For no particular reason other than (that) I love linguistics and will read any free article that catches my fancy, I've been reading Florian Jaeger's Phonological Optimization and Syntactic Variation: The Case of Optional that.
[225005070060] |Submitted for Proceedings of 32nd BLS (pdf).
[225005070070] |I have nothing but respect for Jaeger as a linguist* and this is a very interesting paper that I have enjoyed reading**.
[225005070080] |But flo*** has a knack for producing very difficult to read sentences.
[225005070090] |Here's the original that produced the template above: If speakers omit optional that to avoid segmental OCP violations with the immediately preceding or following segment, optional that should be less likely if the segments was to share some articulatory feature with the adjacent segment of that.
[225005070100] |It actually got worse WITH context, right?
[225005070110] |And I read the actual paper, with all kinds o' context.
[225005070120] |And I still had to re-read that sentence many many times.
[225005070130] |I'm still not sure I understand it.
[225005070140] |I may have to whip out PowerPoint, a laser pointer, and a flashlight before I figure it out for sure.
[225005070150] |Now, I'm prepared to admit that the three pints of BBC Bourbon Barrel Stout at Galaxy Hut may have influenced my critique ...
[225005070160] |...but not entirely for the worse.
[225005070170] |If I ever get around to typing up my awesome and prodigious commentary, it might make a great blog post ... but don't hold your breath.
[225005070180] |I have a stack of linguistics articles I've read and reviewed over the last 12 months and yet somehow, I just never get around to typing up my truly awesome comments (including in-depth discussion of flo's partner-in-crime Peter Graff's Longitudinal Phonetic Variation in a Closed System -- I got mad comments on that one).
[225005070190] |Maybe I should have called this blog The Lazy Linguist?
[225005070200] |*I've never met the guy so maybe he's a bastard in person, I dunno, I hope not... **Not in the least because it has some tangential connection to my somewhat defunct dissertation research.
[225005070210] |***Hey, he calls himself that on his site...
[225005080010] |oh snap! daume talkin trash 'bout "stupid" penn tree bank
[225005080020] |Hal Daume at his excellent NLPers blog is wondering aloud about parsing algorithms doing "real" syntax: One thing that stands in our way, of course, is the stupid Penn Treebank, which was annotated only with very simple transformations (mostly noun phrase movements) and not really "deep" transformations as most Chomskyan linguists would recognize them [emphasis added].
[225005080030] |Oh no he di'nt!
[225005080040] |[UPDATE: hal responds thoughtfully in the comments and properly corrects my misunderstandings of his post.]
[225005080050] |It's certainly fair to say that the Penn Treebank is not annotated for everything.
[225005080060] |Sure.
[225005080070] |But show me the perfect resource and I'll let you throw all the stones you want.
[225005080080] |More to the point, once you get beyond deciding what the basic chunks are (NPs, VPs, PPs, etc.), there's little agreement on what is and what is not a "real" syntactic thing.
[225005080090] |In order to annotate anything above this level, you have to choose a theoretical camp to park your tent in.
[225005080100] |You have to take sides. Daume is happy to be a Chomskyan.
[225005080110] |He's taken his side.
[225005080120] |Good for him. In order to annotate Daume's beloved deep transformations, one must first admit such things exist.
[225005080130] |I do not.
[225005080140] |And if Daume started annotating the Penn Treebank with such things, I wouldn't care.
[225005080150] |I would argue he is wasting his time chasing unicorns.
[225005080160] |Daume may believe that Chomskyan theory is "real" syntax, but I do not.
[225005080170] |Nor do most linguists (if you surveyed all linguists throughout the world, yes, I do believe a majority would disagree with the statement "I believe in Chomskyan deep structure").
[225005080180] |UPDATE: Daume's comments and his responses are well worth reading.
[225005090010] |worst word play of the year?
[225005090020] |Your call...
[225005100010] |like wikipedia with a voice?
[225005100020] |It can be difficult to get a feel for what some tech start-ups are going for.
[225005100030] |This demo of Qwiki at a TechCrunch event asks us to think of information as an experience.
[225005100040] |I'm pretty sure the voice is synthesized because of some odd prosody and the weird way Yelp is pronounced (oh, and the unlikelihood that they could pre-record all the possible narration ... yeah, that too).
[225005100050] |At the end, all I could think of was "it's like Wikipedia with a voice..."
[225005110010] |the perils of translation: does und mean well?
[225005110020] |I'm watching the truly powerful 2009 Oscar-nominated German film The White Ribbon on Netflix.
[225005110030] |Even after a few minutes it has grabbed me and impressed me with its simplicity and power, in the style of many great films.
[225005110040] |Hollywood used to make films like this.
[225005110050] |Films that mattered.
[225005110060] |Films that taught deep truths about what it means to be human.
[225005110070] |Films like "Inherit The Wind", "Guess Who's Coming To Dinner", "To Kill A Mockingbird".
[225005110080] |Now Hollywood makes three Jennifer Aniston rom-coms a year and panders to fan boys... but I digress ... My German is pretty rusty, but the film's dialogue is simple enough (in a good way) for me to catch most of it even without the subtitles, which is exactly the source of the linguistic point I want to discuss.
[225005110090] |In one early scene, the narrator, a teacher, recounts an incident involving himself* and a student wherein the student was endangering himself, so the teacher demands the student explain his actions.
[225005110100] |When he fails to get a proper response, he says repeatedly Und?
[225005110110] |... Und?
[225005110120] |... German und is easily translated as and but the film's translators choose to use the English word well instead.
[225005110130] |(screen grab from Netflix)
[225005110140] |As a native speaker of English, I can see the reasoning behind well, yet I must say, it's equally plausible to use and in that situation as well, maybe even more so.
[225005110150] |The use of well in English would suggest a certain formality that the translator felt was proper, but it also makes me, as an English speaker, feel a bit awkward, like I'm being fed an anachronism.
[225005110160] |Perhaps that's appropriate for the movie, I'm not sure, it just struck me as an interesting linguistic choice.
[225005110170] |It's a nice example of the beautiful ambiguity of lexical items, really.
[225005110180] |For example, just a few scenes later the teacher encounters Eva and asks her about who she is and what he's heard about her, namely that she's a new nanny in town, and her response is und, but it is translated as English so:
[225005110190] |(screen grab from Netflix)
[225005110200] |Again, as a native speaker of English, I can "get" the translation, but still, I'd be perfectly happy with and in both.
[225005110210] |I've never been a translator and I have much respect for the difficult job professional translators do navigating these treacherous waters.
[225005110220] |I don't mean to second guess.
[225005110230] |Rather, it strikes me as an interesting point of discussion.
[225005110240] |*why can't I say hisself?
[225005110250] |Oh, where are you Jeff Runner when I need you!
[225005120010] |Obama's State Of The Union and word frequency
[225005120020] |In anticipation of President Obama's 2011 State Of The Union speech tonight, and the inevitable bullshit word frequency analysis to follow, I am re-posting my post from last year's SOTU reaction, in hope that maybe, just maybe, some political pundit might be slightly less stupid than they were last year ... sigh ..
[225005120030] |here's to hope ...
[225005120040] |(cropped image from Huffington Post)
[225005120050] |It has long been a grand temptation to use simple word frequency* counts to judge a person's mental state.
[225005120060] |Like Freudian Slips, there is an assumption that this will give us a glimpse into what a person "really" believes and feels, deep inside.
[225005120070] |This trend came and went within linguistics when digital corpora were first being compiled and analyzed several decades ago.
[225005120080] |Linguists quickly realized that this was, in fact, a bogus methodology when they discovered that many (most) claims or hypotheses based solely on a person's simple word frequency data were easily refuted upon deeper inspection.
[225005120090] |Nonetheless, the message of the weakness of this technique never quite reached the outside world and word counts continue to be cited, even by reputable people, as a window into the mind of an individual.
[225005120100] |Geoff Nunberg recently railed against the practice here: The I's Dont Have It.
[225005120110] |The latest victim of this scam is one of the blogging world's most respected statisticians, Nate Silver, who performed a word frequency experiment on a variety of U.S. presidential State Of The Union speeches going back to 1962 HERE.
[225005120120] |I have a lot of respect for Silver, but I believe he's off the mark on this one.
[225005120130] |Silver leads into his analysis talking about his own pleasant surprise at the fact that the speech demonstrated "an awareness of the difficult situation in which the President now finds himself."
[225005120140] |Then, he justifies his linguistic analysis by stating that "subjective evaluations of Presidential speeches are notoriously useless.
[225005120150] |So let's instead attempt something a bit more rigorous, which is a word frequency analysis..."
[225005120160] |He explains his methodology this way: To investigate, we'll compare the President's speech to the State of the Union addresses delivered by each president since John F. Kennedy in 1962 in advance of their respective midterm elections.
[225005120170] |We'll also look at the address that Obama delivered -- not technically a State of the Union -- to the Congress in February, 2009.
[225005120180] |I've highlighted a total of about 70 buzzwords from these speeches, which are broken down into six categories.
[225005120190] |The numbers you see below reflect the number of times that each President used term in his State of the Union address. The comparisons and analysis he reports are bogus and at least as "subjective" as his original intuition.
[225005120200] |Here's why:
[225005120210] |1. We don't know what causes word frequencies.
[225005120220] |2. We don't know what the effects of word frequencies are.
[225005120230] |3. His sample is skewed.
[225005120240] |4. Silver invented categories that have no cognitive reality.
[225005120250] |5. There are good alternatives.
[225005120260] |We don't know what causes word frequencies.
[225005120270] |Why does a person use one word more than another?
[225005120280] |WE.
[225005120290] |DON'T.
[225005120300] |KNOW.
[225005120310] |I understand the simple intuition that this should mean something, but no one actually knows what it means.
[225005120320] |We simply don't understand the workings of the brain well enough to study the speech production system well enough to answer this question (despite these guys' suspect claims).
[225005120330] |So we are left with pure intuition (which is generally bad in the cognitive sciences because we don't think the way we think we do).
[225005120340] |So, again, this methodology is not "objective" as Silver claims (not the simplistic way he implemented it, anyway).
[225005120350] |We don't know what the effects of word frequencies are.
[225005120360] |The correlate to #1: When a person hears another person use one word more than another, what effect does it have?
[225005120370] |WE.
[225005120380] |DON'T.
[225005120390] |KNOW.
[225005120400] |Same reasons as above.
[225005120410] |This remains the realm of intuition and guesswork.
[225005120420] |His sample is skewed. While I understand that to the lay person, the set of SOTU speeches seems like a coherent category to analyze, it is in fact a linguistically incoherent grouping because these sorts of speeches are constructed slowly, painfully, over time, by teams of individuals, NOT spoken extemporaneously by a single individual.
[225005120430] |Silver could spin this as a positive in the sense that the speeches represent presidential administrations as a whole, but this makes the "evidence" (i.e., word frequency) extremely messy.
[225005120440] |What factor is driving the frequency of a particular word in a speech?
[225005120450] |No clue.
[225005120460] |The variables are numerous and unknown (two bad things for "rigorous" analysis).
[225005120470] |Having such a messy data set makes interpretation nearly impossible even if we DID know the answers to #1 and #2 (which we don't).
[225005120480] |Silver invented categories that have no cognitive reality.
[225005120490] |Silver's 70 buzzwords are shoved into six arbitrary categories.
[225005120500] |Linguists have been keen on word categories for ... well ... let's say at least 2500 years.
[225005120510] |This we care about.
[225005120520] |Deeply.
[225005120530] |William Labov famously wrote, "If linguistics can be said to be any one thing it is the study of categories" (full text here).
[225005120540] |More recently, in the last few decades, linguists have expanded their repertoire of tools for analyzing lexical categories using psycholinguistic, cognitive linguistic, and computational linguistic tools and methods.
[225005120550] |None of these were employed by Silver in determining whether or not his six categories have any coherence or cognitive reality.
[225005120560] |He just made them up.
[225005120570] |How is this MORE objective than intuition?
[225005120580] |There are alternatives.
[225005120590] |Let me be clear.
[225005120600] |I am a fan of corpus linguistics.
[225005120610] |Counting words is good (as Nunberg says, and as many linguists say.
[225005120620] |We like this).
[225005120630] |But this is just the beginning of a long road of analysis.
[225005120640] |It must be done in a systematic and sophisticated way to be of any use.
[225005120650] |There are numerous software tools and methodologies that Silver could have made use of that would have given him a more nuanced analysis.
[225005120660] |There are whole books that teach people how to do this, such as Corpora in Cognitive Linguistics (just one of many).
[225005120670] |Again, I have a lot of respect for Silver and his advanced skill set in stats.
[225005120680] |I would love to see Silver bring the full weight of his skills to bear on linguistic analysis (as I've said, every linguist should study math and stats), but this experiment falls far short of the mark and he should know better.
[225005120690] |To a certain extent, this critique is unfair to Silver because he implicitly seemed to be acknowledging many of these deficits.
[225005120700] |All he wanted to do was get a more objective picture of what the SOTU speech meant and how it fits into a bigger picture.
[225005120710] |On the other hand, it's a fair critique because he put in a lot of effort and posted the results to his popular and influential blog (yes, I note my blog is neither); one ought not to waste such effort.
[225005120720] |There is the glaringly negative possibility that his popularity and influence as a statistician will actually serve to further strengthen the popular but wrong notion that simple word counts are somehow meaningful.
[225005120730] |This would be bad. *By "simple word frequency counts" I mean counting the words a person uses (say, in a speech) without counting anything else or adding any other data to give the frequency counts meaning and context.
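To make the footnote concrete, here is a minimal sketch of what a "simple word frequency count" amounts to. It is not Silver's actual procedure, which I haven't seen; the function name, the toy speech, and the buzzword list are all invented for illustration.

```python
import re
from collections import Counter


def simple_word_counts(text, buzzwords):
    """Count whole-word, case-insensitive occurrences of each buzzword.

    Nothing else is counted or added: no context, no speaker,
    no genre, no baseline. That is the whole "method".
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    # Counter returns 0 for words that never occur.
    return {word: counts[word] for word in buzzwords}


# Invented toy "speech" and buzzword list (assumptions, not real data).
speech = "Jobs, jobs, jobs. We will create jobs and cut taxes."
print(simple_word_counts(speech, ["jobs", "taxes", "freedom"]))
```

The output is a bare table of numbers. Everything that would make those numbers interpretable (why the word was chosen, who wrote it, what it is being compared against) is exactly what this kind of count leaves out.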
[225005130010] |Call For Participation
[225005130020] |More and more researchers are using the web to gather data for serious research, but they need your help as participants.
[225005130030] |As a proponent, I like to do my part and share the calls for participation that I know about.
[225005130040] |If you know of any others, I'm happy to add: MPI -- The Max Planck Institute for Psycholinguistics: Investigates how people use language.
[225005130050] |Cue-word memories -- Clare Rathbone, University of Reading: The study is specifically interested in the way people remember events from their lives.
[225005130060] |You will be asked to recall 16 memories and then rate them for details, such as vividness and how often you have thought about the memories before.
[225005130070] |You will also be asked to fill in two short questionnaires.
[225005130080] |Please note, this questionnaire is for people over the age of 40 only - please do not take part if you are aged 39 or younger.
[225005130090] |Phrase Detectives -- University of Essex: Lovers of literature, grammar and language, this is the place where you can work together to improve future generations of technology.
[225005130100] |By indicating relationships between words and phrases you will help to create a resource that is rich in linguistic information.
[225005130110] |Games With Words -- Joshua Hartshorne, Harvard University: Test your language sense!
[225005130120] |Play a game while participating in cutting-edge research.
[225005130130] |How good is your language sense?
[225005130140] |Color Naming -- Dimitris Mylonas, London College of Communication: This is a multi-lingual colour naming experiment.
[225005130150] |It is part of research on colour naming and colour categorisation within different cultures, and aims to improve the inter-cultural colour dialogue.
[225005130160] |By taking part you are helping us to develop an online colour naming model which will be based on the "natural" language provided from your responses.
[225005130170] |CogLab 2.0 -- The Cognitive Psychology Online Laboratory: Aggregated set of many online research projects.
[225005140010] |do we need parsed corpora?
[225005140020] |Maybe not, according to Google: For many tasks, words and word combinations provide all the representational machinery we need to learn from text...invariably, simple models and a lot of data trump more elaborate models based on less data.
[225005140030] |I've been wondering about this very issue for 5 years or so.
[225005140040] |When I first started collecting parsed BNC data for my defunct dissertation, I needed sentences involving various verbs and prepositions, but the examples I found were often of the wrong structural type because of preposition attachment ambiguity.
[225005140050] |I used Tgrep2 queries to find proper examples, but even then there were false positives, so I did some error correction.
[225005140060] |One of the more interesting discoveries I made was a relationship between a verb's role in its semantic class and its error rate.
[225005140070] |I was trying to find a way to objectively define core members of a semantic verb class and peripheral members.
[225005140080] |I had a pretty good intuition about which were which, but I wanted to get beyond intuition (yes yes, it's all very Beth Levin).
[225005140090] |For example, one of the objective clues for barrier verbs (a class of negative verbs encoding obstruction, like prevent, ban, exclude, etc) was the unusual role of the preposition from in sentences like these:
[225005140100] |She prevented them from entering the pub.
[225005140110] |He banned them from the pub.
[225005140120] |They were excluded from the pub.
[225005140130] |The preposition from is usually used to mark sources (He drove here from Buffalo) but in these sentences it's acting much more like a complementizer.
[225005140140] |This is fairly unique to barrier verbs and I felt it was distinctive of the verb class, so I wanted a bunch of examples.
[225005140150] |Because I needed to exclude examples involving old-fashioned source from, I used a Tgrep2 search that required the PP to be in a particular relationship to the verb (the BNC parse was a bit odd as I recall, and required some gymnastics).
[225005140160] |Again, I had a lot of false positives even with Tgrep2 so I did some manual error analysis and discovered that certain verbs had very low error rates while others had very high rates and the difference coincided nicely with my intuition about which verbs were core members of the class and which were peripheral: core members like prevent had very low error rates.
[225005140170] |This means that when prevent is followed by a from-PP, it's almost always the complementizer from; obvious to adults, the meaning of a barrier verb doesn't easily include source (necessary for old-fashioned from), but how would a kid learn that?
[225005140180] |If I ban you from the pub, how does a kid know the pub is NOT where you started (source) but rather the opposite, it's where you're not allowed to end up (goal)?
[225005140190] |Cool little learning problem, I thought ... and with a data set other than frikkin dative (which Pinker and Levin have, let's face it, done to death).
[225005140200] |I assumed there was something central to the meaning of the verb class that caused this special use of from.
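The per-verb error rate I'm talking about is just the share of query hits that turned out, on manual inspection, to be false positives. A rough sketch, assuming the manual judgments are stored as (verb, is_false_positive) pairs; the data below is invented and "peripheral_verb" is a placeholder, not a real finding.

```python
from collections import defaultdict


def error_rates(judgments):
    """Return {verb: fraction of checked hits that were false positives}.

    judgments: iterable of (verb, is_false_positive) pairs produced
    by manually checking each hit returned by the corpus query.
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for verb, is_false_positive in judgments:
        totals[verb] += 1
        if is_false_positive:
            errors[verb] += 1
    return {verb: errors[verb] / totals[verb] for verb in totals}


# Invented judgments for illustration only.
checked = [
    ("prevent", False), ("prevent", False), ("prevent", False), ("prevent", True),
    ("peripheral_verb", True), ("peripheral_verb", False),
]
rates = error_rates(checked)
for verb, rate in sorted(rates.items(), key=lambda item: item[1]):
    print(f"{verb}: {rate:.2f}")
```

Sorting verbs by this rate is one way to turn the core-versus-peripheral intuition into something you can rank.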
[225005140210] |Then it occurred to me, if this is true, why do I need the parse?
[225005140220] |Imagine I ignore structure, take all sentences where from follows a relevant verb, then sample for false positives.
[225005140230] |That should give me basically the same thing.
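The parse-free version of the search could look something like the sketch below. Assumptions are mine: "follows" is read as "within a few tokens after the verb" (the window size is arbitrary), the verb forms are listed by hand, and the example sentences are the ones from this post plus a couple of invented distractors.

```python
import random
import re

# Hand-listed inflected forms of a few barrier verbs (an assumption;
# a real study would need a fuller list or a lemmatizer).
BARRIER_FORMS = {
    "prevent", "prevents", "prevented", "preventing",
    "ban", "bans", "banned", "banning",
    "exclude", "excludes", "excluded", "excluding",
}


def from_after_verb(sentence, forms=BARRIER_FORMS, window=4):
    """True if 'from' occurs within `window` tokens after a listed verb form.

    No parse is consulted; this is plain token matching.
    """
    tokens = re.findall(r"[a-z']+", sentence.lower())
    for i, token in enumerate(tokens):
        if token in forms and "from" in tokens[i + 1 : i + 1 + window]:
            return True
    return False


def sample_for_review(sentences, k, seed=0):
    """Keep the string-matched hits and draw up to k of them for manual checking."""
    hits = [s for s in sentences if from_after_verb(s)]
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    return rng.sample(hits, min(k, len(hits)))


corpus = [
    "She prevented them from entering the pub.",
    "He banned them from the pub.",
    "They were excluded from the pub.",
    "He drove here from Buffalo.",   # source 'from', but no barrier verb
    "They prevented the riot.",      # barrier verb, but no 'from'
]
for sentence in sample_for_review(corpus, k=2):
    print(sentence)
```

The sampled hits would then be hand-labeled as complementizer from or source from, giving the same kind of per-verb error rate without ever touching a tree.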
[225005140240] |I became increasingly fascinated with this methodology.
[225005140250] |I was now interested in how I was studying language, not what I was studying.
[225005140260] |And that led me to ask whether or not the parse info was all that valuable for other linguistic studies.
[225005140270] |But then I realized that when big news stories start getting old, the media always, always starts reporting on themselves, on how the news gets made ... I didn't like where I was heading ... ...and then I got a job and that was that ... HT: Melodye
[225005150010] |more jobs for linguists
[225005150020] |As the economy continues to grow (Dow over 12,000), so do the non-academic opportunities for linguists.
[225005150030] |Here's an interesting one for an Analyst in the California* Bay Area: The Analyst looks for opportunities to improve our Natural Language and Directed-dialog applications using the data logged by them.
[225005150040] |The Analyst is primarily the responsible team member charged with analyzing the data and making new implementation recommendations [...] Besides analyzing our speech applications and improving our Analytics framework, you will also have the opportunity to carry out independent research, which forms a big part of the success of our speech applications [...] *living in the metro DC region has taught this Northern California boy that there's more than one "Bay Area."
[225005160010] |can a machine learn jazz?
[225005160020] |There's a contest dedicated to trying to answer that question: ISMIS 2011 Contest: Music Information Retrieval.
[225005160030] |Computer scientists and engineers have long used contests and bake-offs to stimulate cutting edge research in linguistics (e.g., MUC), but linguists have lagged in this department.
[225005160040] |You rarely if ever hear about contests that pit one linguistic theory against another using a standardized data set (or maybe I've just missed them).
[225005160050] |Nobel prize winning economist Joseph Stiglitz argues here that prizes are good for stimulating academic research.
[225005160060] |I agree wholeheartedly and would like to see more direct competition between theorists.
[225005160070] |Exactly how a contest would be constructed is up for debate (I have a vague memory of some group trying to devise criteria by which to evaluate linguistic theories, maybe out of UCLA, but I can't seem to track it down; it's a remarkably difficult Google query to form).
[225005160080] |HT: Jochen L. Leidner
[225005170010] |the linguistics of heaven and hell
[225005170020] |The value of pop culture data for legitimate research is being put to the test.
[225005170030] |Exactly what, if anything, can the reality show Big Brother tell us about language change over time?
[225005170040] |Voice Onset Time is a measure of how long you wait to begin vibrating your vocal folds after you release a stop consonant.
[225005170050] |Voiced stop consonants like /b/ and /d/ require two things: 1) stop all airflow from escaping by closing off the vocal tract (at the lips for /b/, behind the teeth for /d/) and 2) after the air is released, begin vibrating the vocal folds (by using the rushing air).
[225005170060] |For non-linguists, think of a garden hose.
[225005170070] |Imagine you use your thumb to stop the water for a second and you let the pressure build, then you let go and water rushes out, but then you use your thumb to clamp down just a bit on the water to spray it.
[225005170080] |This is kinda like the speech production of voiced stop consonants in human language.
[225005170090] |(image from Kval.com)
[225005170100] |Though I’m no phoneticist, I really like VOT as a target of linguistic study for one crucial reason: it’s a clear example of a linguistic feature that varies according to your human language system but which you do NOT have conscious control over.
[225005170110] |What that means is that you cannot consciously change the length of your own personal VOT.
[225005170120] |Go ahead, try it.
[225005170130] |Make your VOT 20 milliseconds longer.
[225005170140] |Go ahead, I’ll wait… Of course you can’t. Well, not consciously, but what researchers have found is that your brain, quite independent of conscious will or knowledge, can!
[225005170150] |Lab studies have found that people will unknowingly alter their VOTs according to certain situations, and the results are predictable.
[225005170160] |For example, they found that when listening to a set of long VOT stimuli, subjects will begin to lengthen their own VOTs, in essence accommodating the longer VOTs.
[225005170170] |Over the longer term it has also been shown that people will lengthen their VOT over their lifetime to accommodate cultural shifts.
[225005170180] |It has been shown that The Queen Mother herself had a longer VOT in her later life than during her younger days (few other people have been recorded consistently over a long period to provide such valuable data).
[225005170190] |Here’s what Bane et al. did: They took recordings of confessional sequences from the UK reality TV show Big Brother (where groups of strangers are made to live with each other and occasionally speak to a camera alone like a video diary) and tested what happened to 4 crucial individuals (the ones that stayed on the show long enough to provide several months worth of data points).
[225005170200] |What they found was that their VOTs did in fact change, though no linear pattern was discovered (i.e., they did not simply get longer in a steady line).
[225005170210] |This paper is labeled as a progress report because they don't have a firm hypothesis about what actually is happening.
[225005170220] |Nice trick there boys, ;) They did find one interesting thing: During part of the show, the house mates were physically divided into basically a caste system where half the people were low caste and half were high (a heaven and a hell).
[225005170230] |And this seemed to have an effect on VOT as well (sociolinguists are slap happy about this, I'm sure).
[225005170240] |I haven’t looked at the actual numbers very closely, but in section 6, they say “Housemate trajectories seem to diverge when the divide is present…” However, just taking a glance at Figure 3, it looks like they diverge at the beginning, then converge at the end, episode 65 (and remain somewhat similar until several episodes of non-DIVIDE have gone by).
[225005170250] |If my cursory glance is correct, I would assume it takes a while for the convergence to manifest, and then it persists for a while after DIVIDE is gone.
[225005170260] |But this is just me looking at the picture, not the actual data.
[225005170270] |Finally, and this is just a readability point, but I would order the names in Figure 3 in the same order as the end point of each trajectory, making it easier to follow who is doing what.
[225005170280] | Max Bane, Peter Graff, & Morgan Sonderegger (2011).
[225005170290] |Longitudinal phonetic variation in a closed system Linguistic Society of America 2011 Annual Meeting.
[225005180010] |chomsky and performance art
[225005180020] |Artist Annie Dorsen has created a chatbot performance piece around the debate between Noam Chomsky and Michel Foucault on Dutch TV in 1971 (videos here).
[225005180030] |This snippet is mildly interesting, but I couldn't help wondering what technology was used, especially the speech synthesis, because, frankly, it's a bit clunky and old-fashioned.
[225005180040] |Perhaps that's part of the point.
[225005180050] |The computer screens appear to be running DOS shells too.
[225005180060] |Nothing wrong with that, purists will likely prefer it even, but combined with the clunky speech, the performance appears to be trading on a very outdated computational linguistic aesthetic.
[225005180070] |Truly the desert of the real?
[225005180080] |(okay, I had to throw in some Baudrillard)
[225005190010] |the sociolinguistics of height in China
[225005190020] |Ingrid at Language on the Move has some thoughtful comments on the relationship between height and learning English in China.
[225005190030] |If you're under 1.6 meters, forget it.
[225005190040] |There are subtle but very real socioeconomic barriers in your way.
[225005190050] |Money quote: I have supervised research related to English language learning and teaching in China for almost a decade and have read most of the research on the topic published in English.
[225005190060] |However, never before have I come across the importance of height.
[225005190070] |I take this as evidence for the importance of doing ethnographic research.
[225005190080] |Otherwise, what is the point of doing sociolinguistic research if you can’t discover anything you hadn’t already decided in advance would be important?!
[225005190090] |I taught EFL in China back in 1998 at a private school in Guangzhou catering mostly to working professionals.
[225005190100] |Much has changed since then, as China has changed so much.
[225005190110] |I don't recall ever talking about height as a factor, but certainly cost and hours were a significant issue that made it virtually impossible for any poor workers to consider taking English classes.
[225005190120] |As a 6 foot 4 blond American, though, I was treated like a rock star.
[225005190130] |It was kinda weird.
[225005200010] |the false narrative of small town slang
[225005200020] |There is a common critique of journalists that they often let an internal narrative color their reporting, to the point where they simply parrot back the narrative in their head rather than report the facts on the ground (see here for a discussion of this).
[225005200030] |My hometown of Chico got its moment in the sun recently because its favorite son Aaron Rodgers is the star quarterback of the Packers about to play in the Super Bowl.
[225005200040] |Unfortunately, the NYT's article is a near perfect example of journalists letting a narrative do the talking when the facts blatantly contradict their claims: The usual slang words like awesome or cool are not heard much.
[225005200050] |Nice is in.
[225005200060] |As in: “You won the lottery?
[225005200070] |Nice.” The narrative this spins is that small towns are all Mayberrys where everyone is pure and innocent and righteous and better than them damned city-folk.
[225005200080] |It has been evoked routinely in political reporting.
[225005200090] |I'm a Chico boy.
[225005200100] |I graduated from Chico jr.
[225005200110] |High and walked across the street and graduated from Chico High then walked across the street and graduated from Chico State*.
[225005200120] |And I can assure you that awesome and cool are every bit as frequent there as anywhere else (personally, I had an unhealthy fondness for hella back in 1987).
[225005200130] |And believe me, if you won the lottery in Chico, no one would say nice.
[225005200140] |They would say, "No fukkin way!
[225005200150] |No fukkin way!
[225005200160] |Really!
[225005200170] |No fukkin way!"
[225005200180] |... just like everywhere else.
[225005200190] |*no joke, those three schools are literally across the street from each other.
[225005210010] |how (not) to do linguistics
[225005210020] |Jonah Lehrer, the neuro-blogger, has a mixed track record, as far as I'm concerned.
[225005210030] |His initial blogging was nice, but a tad lightweight, then he started to sound a bit too Malcolm Gladwell-ee (in that I wasn't entirely sure he knew what he was talking about beyond having a few short phone calls with one or two scientists then babbling on about a topic).
[225005210040] |But he's hit a home run with this long New Yorker piece about the failure of the journal review process in science: The Truth Wears Off.
[225005210050] |He draws examples from medicine, physics, and psychology.
[225005210060] |Perhaps the most disappointing part is the realization that the standards of testing and conclusiveness in linguistics are so far from those in more established sciences.
[225005210070] |Before the effectiveness of a drug can be confirmed, it must be tested and tested again.
[225005210080] |Different scientists in different labs need to repeat the protocols and publish their results.
[225005210090] |The test of replicability, as it’s known, is the foundation of modern research.
[225005210100] |Replicability is how the community enforces itself.
[225005210110] |It’s a safeguard for the creep of subjectivity.
[225005210120] |Repeating studies is virtually unheard of in linguistics.
[225005210130] |Also, Lehrer mentions the publication bias in journals.
[225005210140] |When a result is discovered, there is a bias towards positive results.
[225005210150] |After a while, once the result is accepted, then only negative results are published because only that is "interesting" anymore.
[225005210160] |But I would expand this point to say this same bias exists at every stage of the research process.
[225005210170] |We want to find things that happen; we don't care about spending 5 years and thousands of hours discovering that X does NOT cause Y!
[225005210180] |So when young grad students begin scoping out a new study, they throw away anything that doesn't seem fruitful, where fruitful is defined as yielding positive results.
[225005210190] |This bias affects the very foundation of the research process, namely answering the basic question: what should I study?
[225005210200] |As a side note, engineers seem perfectly happy to follow through on null results.
[225005210210] |They need to know the full scope of their problem before solving it.
[225005210220] |Scientists can learn a lot from engineers (and vice versa).
[225005210230] |[Psychology professor Jonathan] Schooler recommends the establishment of an open-source database, in which researchers are required to outline their planned investigations and document all their results.
[225005210240] |“I think this would provide a huge increase in access to scientific work and give us a much better way to judge the quality of an experiment,” Schooler says.
[225005210250] |“It would help us finally deal with all these issues that the decline effect is exposing.” Coincidentally, I was recently tweeting with moximer and jasonpriem about this and we agreed that research wikis are worth exploring.
[225005210260] |My vision would be something akin to Wikipedia but where a researcher stores all of their data, stimuli, results, etc, finished or not.
[225005210270] |The data could be tagged as tentative, draft, failed, successful, etc.
[225005210280] |As the research goes on, the data get updated.
[225005210290] |Not only would this record failure (which, as Lehrer points out in the article, is as valuable as success), it would also record change.
[225005210300] |How did a study evolve over time?
[225005210310] |True, the data would become huge over time across many disciplines, but that just means we need better and better data mining tools (and the boys at LingPipe are working away at those tools).
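To make the wiki idea a bit more concrete, here is a minimal sketch of what one entry might look like. Everything in it is my own assumption for illustration: the names `Status` and `StudyRecord`, the fields, and the idea of keeping a simple list as the change history. It is not a description of any existing tool, just the four tags from above plus a record of how the entry changed.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    # The four tags mentioned in the post; a real wiki might want more.
    TENTATIVE = "tentative"
    DRAFT = "draft"
    FAILED = "failed"
    SUCCESSFUL = "successful"


@dataclass
class StudyRecord:
    """Hypothetical wiki entry for one study, finished or not."""
    title: str
    status: Status = Status.TENTATIVE
    # Each history item is (old_status, new_status, note), oldest first.
    history: list = field(default_factory=list)

    def update(self, new_status, note):
        """Change the tag and keep the old one, so failure and change are both recorded."""
        self.history.append((self.status, new_status, note))
        self.status = new_status


record = StudyRecord("Does X cause Y?")
record.update(Status.DRAFT, "pilot data collected")
record.update(Status.FAILED, "null result: X does NOT cause Y")
```

The point of the `history` list is that a failed study never disappears: the entry ends up tagged as failed, and the path it took to get there stays attached to it.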
[225005210320] |HT rapella
[225005220010] |my classic snowclone rant
[225005220020] |As yet another winter storm threatens the US, lingo-tweeter cum lingo-grad student Lauren Ackerman marvels at the media's lust for snowmageddon and terms of its ilk, and I was reminded of my own ruminations on the many words for snow in my own peculiar dialect (it helped that I spent 6 hours in near motionless traffic a few days ago while the DC metro region was castrated by a vicious and sudden sleet storm that halted traffic as well as sanity).
[225005220030] |So I offer this re-post from February 5, 2010: As the snow descends upon Northern Virginia in the latest winter storm, and as DC's elite line up at their local Whole Foods and Trader Joe's clutching their reusable bags filled with heavily packaged prepared meals, cardboard-container salads, 6 bottles of wine, and one bottle of water ('cause, ya know, it's an "emergency"), I am struck by the fact that the great Eskimo vocabulary hoax (pdf) is no hoax at all!
[225005220040] |It turns out that I too have a great many words for snow.
[225005220050] |This evening, while running a few modest errands before the night's predicted 20 inch snow drop, I meticulously recorded the various terms I uttered as synonyms for the fluffy white stuff which descended, rather gracefully, upon the landscape.
[225005220060] |A few choice examples (NSFW): shit
[225005220070] |"Why do people drive like such morons in this shit?"
[225005220080] |"Hey asshole!
[225005220090] |This shit's not Vaseline!
[225005220100] |You can drive faster than 6 miles an hour!"
[225005220110] |crap
[225005220120] |"This crap's gonna be piled up in disgusting dirty brown heaps for weeks."
[225005220130] |fuck
[225005220140] |"Fuck these fucking fuckers who can't drive in this fuck!"
[225005220150] |asshole-shit-motherfucker*
[225005220160] |"Ahhhh!
[225005220170] |You drive on this asshole-shit-motherfucker like it's nuclear!"
[225005220180] |fucking-fuck-fuck
[225005220190] |(directed at a plow driver) "push the fucking fuck fuck onto the curb, not back into the road!"
[225005220200] |grrrrrrr
[225005220210] |"gawd I hate everybody!
[225005220220] |All of you!
[225005220230] |All because of this ... grrrrrrr!"
[225005220240] |(picture head exploding)
[225005220250] |*asshole-shit-motherfucker is actually quite productive in my dialect.
[225005220260] |It replaces a great many phrases.
[225005220270] |Addendum (1-31-2011): I want to see a Visual Thesaurus word map of my words for snow!
[225005230010] |Neuro-blogger Bradley Voytek posts a nice discussion helping us all understand how to consume neuroscience in the news: In this post, I will teach you all how to be proper, skeptical neuroscientists.
[225005230020] |By the end of this post, not only will you be able to spot "neuro nonsense" statements, but you'll also be able to spot nonsense neuroscience questions.
[225005230030] |Well worth the read.
[225005250010] |Linguist List FAIL
[225005250020] |I've been kicked around a few NLP blocks in my time so I've developed a sixth sense about what employers are looking for when they post job announcements.
[225005250030] |When I read this one from Intelius on The Linguist List today, my reaction was clear, concise, and unconditional: This is NOT for linguists.
[225005250040] |This posting says engineers only to me!
[225005250050] |There's nothing wrong with that, but why use the Linguist List's job postings board with a job that no actual linguist will be considered for?
[225005250060] |My reaction is based on what I consider to be engineering dog-whistles that are designed to encourage the "right" people to apply (i.e., engineers) and the wrong people to go away (i.e., linguists).
[225005250070] |A quick breakdown of their rhetorical dog-whistles:
[225005250080] |The Data Research Group is a team of scientists at Intelius... Much as I would like linguists to be considered scientists, the truth is, in the "real world" of job announcements, they are not.
[225005250090] |This is a red flag.
[225005250100] |Team members have published papers in top research conferences...Ah hah, not "conferences" per se, but "research conferences".
[225005250110] |This means ACL.
[225005250120] |Mentors will include Dr. Vitor R. Carvalho and Dr. Andrew Borthwick (diss PDF)... NOT linguists.
[225005250130] |Required Skills: Strong hands-on skills in Java and/or Python... i.e., we assume you lie awake at night worrying about arrays and functions, not unaccusative marking and tone sandhi.
[225005250140] |Required Skills: Self-motivated, creative, and independent researching skills ... we will teach you nothing.
[225005250150] |You are on your own.
[225005250160] |Your teachers are gone.
[225005250170] |What can you give us?
[225005250180] |FYI: Recently, bulbul has quite rightly taken me to task for being a tad hypocritical in arguing two seemingly contradictory points: (1) that 21st Century linguists should study math and (2) that the time consuming effort of learning computational tools is a deterrent to being a linguist.
[225005250190] |I can imagine this post as falling victim to that same complaint.
[225005250200] |My pre-defense is that I believe there is a skill set distinct to linguists that is valuable and worthy of investment by NLP capitalists that has been largely ignored.
[225005250210] |Engineers alone will not solve the critical language issues necessary to create the great products of the next generation of NLP tools.
[225005250220] |I believe in team building where linguists and engineers work together as equals.
[225005260010] |our foundational tongues?
[225005260020] |A commentator at The Daily Dish writes: I recently learned that in our foundational tongues of Latin, Greek, and Hebrew the words for breath and spirit are one and the same: spiritus, pneuma, and ruach [emphasis added].
[225005260030] |I'm not sure what the author had in mind for "our foundational tongues."
[225005260040] |Assuming the author is referring to English, then Latin, okay sure, Romance languages have had an important influence on English.
[225005260050] |Greek, less so.
[225005260060] |But Hebrew???
[225005260070] |What's most striking is the notable lack of Germanic languages as "foundational."
[225005260080] |This author needs a Ling 101 class.
[225005260090] |And as for the author's claim about words for breath and spirit being the same, there is a related poetic pairing common to good ol' fashioned English.
[225005260100] |The word breath is often used as a metonym for life or spirit.
[225005260110] |Here are a few choice examples: The Bard Henry V -- King Henry's Once more unto the breach, dear friends speech (III, 1): Now set the teeth and stretch the nostril wide, Hold hard the breath and bend up every spirit To his full height.
[225005260120] |On, on, you noblest English. Whose blood is fet from fathers of war-proof! In my reading of this line, King Henry pairs holding of the breath with spiritual courage to draw a parallel between the two.
[225005260130] |Hamlet -- Hamlet's Mother, Queen Gertrude, whilst arguing with her tortured son (III, 4): Be thou assured, if words be made of breath, And breath of life, I have no life to breathe What thou hast said to me .
[225005260140] |Prior to this line, Hamlet prods his mother to stop sleeping with his uncle/king and to "break your own neck down." In my reading of her lines, Gertrude connects the dots between words, breath, and spirit because of her son's harsh words.
[225005260150] |She is saying it is not in my spirit to do what you are asking of me.
[225005260160] |And here is a really nice 2009 discussion of poetry and breath by Melissa Zeiger: Grace Paley's Poetics of Breath.
[225005260170] |Money quote: The Romantic poets reemphasized breath as a force in poetry, liking to imagine that poetic breath mediated between the human and the transcendent, as, famously, in Coleridge's “The Eolian Harp,” where the wind joins breath to participate in “one Life within us and abroad,/ Which meets all motion and becomes its soul” And this trope is not limited to Western literature either.
[225005260180] |The traditional Chinese concept of Qi is deeply rooted in an analogy of breath = life.
[225005260190] |From the Wikipedia page: Qi is frequently translated as "energy flow".
[225005260200] |Qi is often compared to Western notions of energeia or élan vital (vitalism), as well as the yogic notion of prana, meaning vital life or energy, and pranayama, meaning control of breath or energy.
[225005260210] |The literal translation of "qi" is air, breath, or gas.
[225005260220] |Compare this to the original meaning of the Latin word "spiritus", meaning breathing; or the Koine Greek "πνεῦμα", meaning air, breath, or spirit; and the Sanskrit term "prana", meaning breath.
[225005260230] |What this suggests to me is that there is something deeply natural to our cognitive perceptions about this analogy between breath and life.
[225005260240] |It is natural for humans to perceive breathing and thinking to be related somehow.
[225005260250] |Without breath, you cannot think.
[225005260260] |Fair enough.
[225005260270] |But this might be a deeply human logic insofar as ants or dolphins may not conceive of this relationship in the same way.
[225005260280] |I blogged about this last year in Dolphin-Bikes and The Iconicity Effect.
[225005260290] |I'm still waiting for a dolphin bike.
[225005270010] |Ngram Viewer sucks, true dat
[225005270020] |proof positive....
[225005270030] |true dat...
[225005280010] |evolution = chaos?
[225005280020] |Kottke points to a graphical variation of the Chinese whispers game whereby an original sign (in this case, a line drawn by a human) is rapidly degraded by multiple repetitions (the more people try to repeat the original line, the less line-like it becomes, eventually degrading into chaos).
[225005280030] |A Sequence of Lines Traced by Five Hundred Individuals from clement valla on Vimeo.
[225005280040] |Kottke marvels that "The lines get really messy surprisingly fast [...] this is a nice demonstration of evolution."
[225005280050] |But is it?
[225005280060] |Is it the case that evolution leads to chaos*?
[225005280070] |I don't think so.
[225005280080] |Evolution leads to variation and change, sure, but chaos?
[225005280090] |The difference between evolution and this line transformation, I think, is pressures.
[225005280100] |In evolution, there are pressures that greatly affect which changes last more than one generation and hence become permanent and stable.
[225005280110] |But in this game, there are no pressures, as far as I can tell.
[225005280120] |There is no survival of the fittest because each turn gets to survive for exactly one generation with no pressure to be fitter than another in order to persist beyond one generation.
[225005280130] |So this exercise, cute as it may be, does not resemble evolution at all, I don't think.
[225005280140] |*or messiness in Kottke's phrasing
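For the curious, here is a toy simulation of the difference I have in mind. It is only a sketch under my own assumptions: a "line" is a list of vertical offsets from a perfectly straight line, each tracing adds a little Gaussian error to every point, and "fitness" is simply straightness (a stand-in I made up; real selection pressures are nothing so tidy). The function names and parameter values are hypothetical too.

```python
import random


def messiness(line):
    """Mean absolute deviation from a perfectly straight line (all zeros)."""
    return sum(abs(y) for y in line) / len(line)


def trace(line, rng, noise=1.0):
    """One imperfect copy: every point is nudged by a small Gaussian error."""
    return [y + rng.gauss(0.0, noise) for y in line]


def copy_chain(generations, points=20, seed=0):
    """No pressure: each tracing is copied exactly once, whatever it looks like."""
    rng = random.Random(seed)
    line = [0.0] * points
    for _ in range(generations):
        line = trace(line, rng)
    return messiness(line)


def copy_chain_with_selection(generations, brood=10, points=20, seed=0):
    """Pressure: each generation makes several tracings, and only the
    straightest one (the assumed 'fittest') gets copied onward."""
    rng = random.Random(seed)
    line = [0.0] * points
    for _ in range(generations):
        line = min((trace(line, rng) for _ in range(brood)), key=messiness)
    return messiness(line)


print("no selection:  ", copy_chain(200))
print("with selection:", copy_chain_with_selection(200))
```

Without selection the errors simply pile up, which is the line-tracing game. With even this crude pressure, the worst tracings never get copied, so the line changes from generation to generation without sliding into a scribble.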
[225005290010] |economists are bad linguists
[225005290020] |Dominik Lukes at Metaphor Hacker has a thorough discussion of Harvard economist Ed Glaeser's mis-use of metaphor theory by trying to use NYC restaurants as a metaphor for schools.
[225005290030] |Lukes teases out the mis-mappings that Glaeser fails to recognize.
[225005290040] |Money quote: [Restaurants] also use a number of tricks to make the dining experience better – cheat on ingredients, serve small portions on large plates, etc.
[225005290050] |They rely on ‘secret recipes’ – the last thing we want to see in education.
[225005290060] |And this is exactly the experience of schools that compete in the market.
[225005290070] |They fudge, cheat and flat out lie to protect their competitive advantage.
[225005290080] |They provide the minimum of education that they can get away with to look good.
[225005290090] |Glaeser, as he conveniently forgets, there is a huge amount of centralized oversight of New York restaurants – much more, in some ways, than on charter schools.
[225005290100] |The full discussion is thorough and well worth reading.
[225005300010] |fuck C++
[225005300020] |Andrew Vos provides us with valuable data analysis of the correlation between programming languages and profanity: The plan was to find out how much profanity I could find in commit messages, and then show the stats by language.
[225005300030] |These are my findings: Out of 929857 commit messages, I found 210 swear words (using George Carlin's Seven dirty words).
[225005300040] |Oh, Python, beautiful Python ... no wonder the NLTK guys chose it as their NLP language of choice.
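In the spirit of the thing, here is a rough Python sketch of how such a tally could be done. To be clear, this is my guess at the general approach and not Vos's actual code: the function name `count_profanity`, the (language, message) input format, and the sample commits are all made up for illustration. The word list is Carlin's seven.

```python
import re
from collections import Counter

# George Carlin's seven dirty words, lowercased for matching.
SEVEN_WORDS = {"shit", "piss", "fuck", "cunt",
               "cocksucker", "motherfucker", "tits"}


def count_profanity(commits):
    """Tally swear words per language.

    commits: iterable of (language, message) pairs (a hypothetical format).
    Only exact word matches count, so inflected forms like "fucking"
    are missed by this simple version.
    """
    counts = Counter()
    for language, message in commits:
        tokens = re.findall(r"[a-z]+", message.lower())
        counts[language] += sum(1 for t in tokens if t in SEVEN_WORDS)
    return counts


# Invented sample data, not real commit messages.
sample = [
    ("C++", "fix this shit"),
    ("Python", "tidy up the docs"),
    ("C++", "fuck, the build is broken again"),
]
print(count_profanity(sample))
```

A fairer comparison would divide each count by the number of commits per language, since a language with more commits has more chances to swear.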
[225005310010] |the linguistics of 404 FILE NOT FOUND
[225005310020] |A cute site providing humorous translations of the world's most frustrating search result.
[225005310030] |Personal favs:
[225005310040] |American South - Ah cain't find th' page yer lookin' fer.
[225005310050] |Australia - Strewth mate yer bloody page has shot through.
[225005310060] |Blond - like omg!
[225005310070] |ur file has not been found, go paint ur nails and try back later, lol^^....I FOUND A QUARTER!
[225005310080] |Cockney - No chance luv, carrnt find it neever.
[225005310090] |Pirate - Haaarr, Lubber!
[225005310100] |I've sailed yon seas with toil and trial, and yet I cannot find ye file!
[225005310110] |Pittsburghese - This page needs fixed n'at... it's all caddywhompus!
[225005310120] |Yinz needs look somewheres else.
[225005310130] |Zombie - Arrgrg 404 BrAiNs aAAArrggh No ggrrgrh page brAiNz heRe BrAAAAIIINNSSSS!
[225005320010] |Hosni prefers "Hosny" in transliterated attire
[225005320020] |Rachel Maddow et al.
[225005320030] |discovered a delicious gem fit for the annals of transliteration.
[225005320040] |Namely, how to write a specific Arabic name in the Roman alphabet (what we English speakers like to call "regular spelling").
[225005320050] |She (and her staff) reported that Hosni Mubarak attended a head-of-state meeting in Albania a couple years ago wearing the world's most narcissistic pinstriped suit*, where the pin stripes were actually composed of lines of his name written in Roman alphabetic transliteration (this man really knows how to live the life of a tyrant, am I right?):
[225005320060] |It is a troublesome fact of human language that writing the damned thing down is never easy.
[225005320070] |It's difficult enough to construct a writing system that is consistent for a single language, more difficult still to take a linguistic term (like a person's name) and write it down in a script which was not designed for that particular language.
[225005320080] |So when English language writers (like journalists) have to write down Arabic names in "regular spelling" they inevitably face difficult choices about which letters to use to represent particular sounds.
[225005320090] |Vowels are particularly difficult creatures to pin down with alphabetic rope (e.g., the whole "and sometimes y" fiasco).
[225005320100] |The act of writing a linguistic term in a foreign script is called transliteration, and it's troublesome enough to have spawned a cottage industry sub-field within computational linguistics.
[225005320110] |For example, if you wanted to Google information about the currently exiled president of Egypt, you would be wise to Google the term "Hosni Mubarak."
[225005320120] |That is by far the most common spelling of the man's name on the internet (by a better than 20-1 margin, at least according to Google hit counts).
[225005320130] |Even if you choose the "Hosny" variant, you're basically just redirected to the "Hosni" results anyway.
[225005320140] |Yet the tyrant himself, ever the maverick, prefers the road less traveled.
[225005320150] |Sadly, there's not much more to say about this than to emphasize the simple fact that transliteration is largely arbitrary and disputes about guidelines are largely trivial.
[225005320160] |Just flip a coin and move on ... (I just seriously pissed off the world's four transliteration experts).
[225005320170] |...and in closing I'd like to repeat my assertion that Hosni/y Mubarak looks suspiciously like The Face of Bo**:
[225005320180] |*FYI, I have no independent verification of the truth of this story.
[225005320190] |If Maddow's staff got punk'd, their bad.
[225005320200] |**Damn you Captain Jack!!
[225005330010] |turning gaga into water = 200 terabytes
[225005330020] |How much storage would it take to store the first 5 years of a child's linguistic environment?
[225005330030] |Apparently, 200 terabytes.
[225005330040] |From Fast Company: ...cognitive scientist Deb Roy Wednesday shared a remarkable experiment that hearkens back to an earlier era of science using brand-new technology.
[225005330050] |From the day he and his wife brought their son home five years ago, the family's every movement and word was captured and tracked with a series of fisheye lenses in every room in their house.
[225005330060] |The purpose was to understand how we learn language, in context, through the words we hear.
[225005330070] |A combination of new software and human transcription called Blitzscribe allowed them to parse 200 terabytes of data to capture the emergence and refinement of specific words in Roy’s son’s vocabulary.
[225005330080] |The data visualization techniques he uses are pretty cutting edge ... and awesome!
[225005330090] |I love the fact that he is trying to use visualization techniques to help us understand something beyond raw statistics (which is where most graphs and pie charts die miserable deaths).
[225005330100] |Statistics are like molecules.
[225005330110] |Visualize them one by one and it's difficult for the average person to conceptualize the big picture of how they work together to create a grander whole.
[225005330120] |Roy appears to be trying to get beyond the yawn-inducing graphs that plague modern science.
[225005330130] |I mean, he uses freaky-deaky time-worms!
[225005330140] |How cool is that!
[225005330150] |Roy talks about feedback loops as well: ...“Caregiver speech dipped to a minimum and slowly ascended back out in complexity.” In other words, when mom and dad and nanny first hear a child speaking a word, they unconsciously stress it by repeating it back to him all by itself or in very short sentences.
[225005330160] |Then as he gets the word, the sentences lengthen again.
[225005330170] |The infant shapes the caregivers’ behavior, the better to learn.
[225005330180] |He gave a TED talk recently, but the video is not yet available.