Speaking of the joy of archives, here’s a real nugget from the team behind the Google Books digitisation effort: the first report on a wealth of themes that were previously impossible to study because the information was not available electronically. Using simple quantitative techniques, they have looked at how often certain phenomena occur in books from the 1700s to the present. The findings are crude and have not really been subjected to any stringent analysis, but just the thought of what can be gleaned from the database in the future is exciting. The dataset currently contains 500 billion words.
So what have they found so far? Well, for example, verbs are becoming more regular. Rationality prevails! Is there hope for an Esperanto-speaking world? Not really.
For me, the most interesting finding is the one about how we discuss the past and new technology. Check this out:
“’1951’ was rarely discussed until the years immediately preceding 1951. Its frequency soared in 1951, remained high for three years, and then underwent a rapid decay, dropping by half over the next fifteen years.” But the shape of these graphs is changing. The peak gets higher with every year and we are forgetting our past with greater speed. The half-life of ‘1880’ was 32 years, but that of ‘1973’ was a mere 10 years.
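To make the half-life idea concrete, here is a minimal sketch of how one might measure it from a yearly frequency series. It is not the team's actual method: the function name `half_life`, the choice to count from the peak year, the linear interpolation, and the made-up data series are all my own assumptions for illustration.

```python
def half_life(years, freqs):
    """Years from the peak until the frequency first drops to half the peak value.

    Hypothetical helper: measures from the single highest point in the series.
    """
    peak_idx = max(range(len(freqs)), key=freqs.__getitem__)
    half = freqs[peak_idx] / 2
    for i in range(peak_idx + 1, len(freqs)):
        if freqs[i] <= half:
            # Linearly interpolate between the two samples straddling the half level.
            y0, y1 = years[i - 1], years[i]
            f0, f1 = freqs[i - 1], freqs[i]
            crossing = y0 + (f0 - half) * (y1 - y0) / (f0 - f1)
            return crossing - years[peak_idx]
    return None  # the series never fell to half its peak within the data


# Synthetic, made-up series (not real Google Books data): a low baseline,
# a peak in 1951, then exponential decay that halves every 15 years.
years = list(range(1940, 2001))
freqs = [0.1 if y < 1951 else 0.5 ** ((y - 1951) / 15) for y in years]
print(half_life(years, freqs))
```

Run on real counts for ‘1880’ and ‘1973’, a measure along these lines is what would let you compare how quickly each year fades from print.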
The future, however, is becoming ever more easily ingrained. The team found that new technology permeates our culture with growing speed. By scanning the corpus for 154 inventions created between 1800 and 1960, from microwave ovens to electroencephalographs, they found that more recent ones took far less time to become widely discussed.
Amazing stuff. Once they get beyond simple word frequencies, this should be really exciting. The future is a wonderful place to live. But will we remember?