Monday, October 25, 2010

The Chinese Language is the Deep Web

Reading Nicholas Kristof's post "Liu Xiaobo and Chinese Democracy", about Mr. Liu's recent Nobel Peace Prize, I noticed a passage that stood out, not only for what it said, but also for the offhand way in which it was presented:

Today, Liu presumably doesn’t know that he has won the prize, and the Chinese government is trying to censor the news. But China is changing and censorship no longer works so effectively. It can ban mobile phone users from texting the characters for his name, but young Chinese are smart enough to use substitute characters.

Assuming this actually is the case, it means that within the Chinese languages (and it's clear that they are separate languages, not dialects of one overarching, crazily heterogeneous Chinese language) lies a hidden world of possible ideogram-meaning combinations, connected by sound. Here's how that would work:

Every Chinese character represents a word. (Linguists: I know there are exceptions. Thanks.) For example, the word for "work" is 工, pronounced "gong" with a high, steady tone. The word for "attack" is 攻, also pronounced "gong" with a high, steady tone. The word for "supply" as in "power supply" is 供, also "gong" with a high, steady tone. The same goes for the words for "official business", "palace", and "bow" as in "bow and arrow".
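To make the idea concrete, here is a minimal Python sketch that groups characters by how they sound. The table is hand-written for illustration: the first three entries come from the examples above, while 公, 宫, and 弓 are my guess at the characters meant by "official business", "palace", and "bow", and 狗 ("dog") is added only as a contrast. The function names are made up for this sketch.

```python
from collections import defaultdict

# Illustrative table: character -> (syllable, tone number, rough gloss).
# Tone 1 is the "high, steady" tone described above.
READINGS = {
    "工": ("gong", 1, "work"),
    "攻": ("gong", 1, "attack"),
    "供": ("gong", 1, "supply"),
    "公": ("gong", 1, "official business"),
    "宫": ("gong", 1, "palace"),
    "弓": ("gong", 1, "bow"),
    "狗": ("gou", 3, "dog"),  # contrast entry: different sound
}


def homophone_groups(readings):
    """Group characters that share both syllable and tone."""
    groups = defaultdict(list)
    for char, (syllable, tone, _gloss) in readings.items():
        groups[(syllable, tone)].append(char)
    return dict(groups)


def substitutes_for(char, readings):
    """Return the other characters that sound exactly like `char`."""
    syllable, tone, _gloss = readings[char]
    return [
        other
        for other, (s, t, _g) in readings.items()
        if (s, t) == (syllable, tone) and other != char
    ]


if __name__ == "__main__":
    print(homophone_groups(READINGS))
    print(substitutes_for("工", READINGS))
```

Every character in a group is a candidate stand-in for any other, which is the raw material for the substitution trick Kristof describes.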

Right now, the censors at Great Firewall HQ (actually called the Propaganda Department, I kid you not) are poring over blog posts, texts, and other electronic content, finding subversive messages and stamping them out like bugs. Now, I imagine that a bit of this is done automatically, by keyword, and a great deal more is done by a large government department, full of the average office assortment of flunkies, middle managers, angry bosses, and the ennui that comes along with this setup.

Now imagine an undercurrent of blogs that don't seem to make sense at first glance. They bring up no poisonous keyword hits. They carry no familiar subversive slogans. But for those who would read them aloud, they convey hopeful messages of democracy, commentary on the Chinese political situation, and perhaps even plans for meetups and other events.
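Here is a rough sketch of why a purely mechanical keyword filter would miss such posts. Everything in it is hypothetical: I have no idea how the real filters work, the blocklist holds a single made-up entry (工会, "trade union"), and the swap table maps 工 to its soundalike 公. It only assumes the simplest possible filter, an exact substring match.

```python
# Hypothetical blocklist and swap table, for illustration only.
BLOCKLIST = {"工会"}   # pretend this one term is banned
SWAPS = {"工": "公"}   # both characters are pronounced "gong", high steady tone


def is_blocked(message, blocklist=BLOCKLIST):
    """Naive filter: flag a message if it contains any banned term verbatim."""
    return any(term in message for term in blocklist)


def disguise(message, swaps=SWAPS):
    """Replace each listed character with its same-sounding stand-in."""
    return message.translate(str.maketrans(swaps))


if __name__ == "__main__":
    original = "参加工会"          # "join the trade union"
    hidden = disguise(original)    # reads aloud the same way
    print(original, is_blocked(original))
    print(hidden, is_blocked(hidden))
```

The disguised message sounds identical when read aloud but no longer contains the banned string, so the exact-match filter passes it. A censor would need either a filter that compares pronunciations or a human reader to catch it.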

This sound-meaning correspondence is much like what serious internet people call the "deep web". The deep web consists of all the data on the Internet that's not directly accessible to the average end user of a search engine. Deep web data is significantly more voluminous than surface web data. From the Wikipedia page:

Deep Web search reports cannot display URLs like traditional search reports. End users expect their search tools to not only find what they are looking for quickly, but to be intuitive and user-friendly. In order to be meaningful, the search reports have to offer some depth to the nature of content that underlie the sources or else the end-user will be lost in the sea of URLs that do not indicate what content lies underneath them.

By moving the meaning out of the written characters and into their sound, writers of these messages of Chinese democracy could easily circumvent any mechanical attempt at censorship. Certainly, it's not perfect, but even in a worst-case scenario, this practice could burden the Propaganda Department with the need for more human censors.
