Ben Summers’ blog

What’s the equivalent of Twitter’s 140 character limit for non-Latin character sets?

Everyone on Twitter has the same 140-character limit. But if you don’t write in English, can you fit more into those 140 characters? This weekend, I did some Real Science to find out.

If you speak Japanese (日本語), each character you type is equivalent to several English characters, or even an entire word. For other non-Latin languages, like Russian or Thai, it’s less clear whether there’s an inherent advantage.

To get some numbers, I hooked up Twitter’s API to Google’s AJAX Translate API to fetch some tweets, translate them, and measure the equivalent length in English. And so the Tweet Measurer was born, allowing anyone to perform dubious calculations of Tweet Length from the comfort of their own web browser.

Equivalent limits

Running the Measurer against a few users gives these approximate results:

Language          Equivalent length in English (characters)
Russian           145
Farsi/Persian     170
Arabic            175
Thai              185
Japanese          260

The conclusion is clear. If you find that you have too much to say for a mere 140 characters, you should learn Japanese. The moderate effort of learning an entirely new language will give you the equivalent of 260 English characters, an increase of over 85%.

UPDATE! A couple of commenters have pointed out that Chinese is even better. If only I hadn’t been so lazy and restricted myself to the languages listed on the Twitter search page…

Try it yourself

It’s oddly compelling watching tweets being fetched and translated. Try it yourself – click the button to get started. It’ll open a new window.

Start by finding people twittering in a suitable language. Twitter’s advanced search is perfect: just choose a language in the “written in” drop-down, and enter some English proper nouns which might be mentioned. Names of computer products seem to get good results: try ‘java’, ‘mac’, ‘apple’, and so on.

Enter the username into the Measurer and click ‘Go’. The user’s tweets will be loaded, translated, and an equivalent length calculated.

All the results are anonymously reported back to my server. If I get lots of data, I’ll analyze it in a few days and post a summary.

Implementation notes

My initial approach was to write a script and run it on my own computer, but I couldn’t find a suitable text translation API: Google’s API can only be used via AJAX from within web pages.

The obvious answer was to write it as a web application. It’s not a particularly complicated bit of JavaScript:

  • Using jQuery, directly fetch tweets from Twitter in JSON (as JSONP).
  • Loop through the tweets, using the Google API to translate them. Use language auto-detection to simplify the logic and user interface.
  • Count the characters in the original and translated text, keeping a running total.
  • Display the results!
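The counting and calculation steps above can be sketched roughly as follows. This is a hypothetical illustration, not the Measurer’s actual source, which isn’t shown here. It skips the jQuery fetch and the Google translation call, assumes each tweet is already available as an { original, translated } pair, and assumes the equivalent length is the 140-character limit scaled by the ratio of translated (English) characters to original characters. The names countChars, measure and TWEET_LIMIT are made up for this sketch.

```javascript
// Hypothetical sketch: not the Measurer's real code.
// Assumption: equivalent length = 140 * (translated chars / original chars).
const TWEET_LIMIT = 140;

// Count Unicode code points rather than UTF-16 code units (String#length),
// so a character outside the Basic Multilingual Plane counts once, not twice.
function countChars(text) {
  return Array.from(text).length;
}

// Keep running totals over already-translated tweets, then scale the limit.
// `pairs` is an array of { original, translated } objects (assumed shape).
function measure(pairs) {
  let originalTotal = 0;
  let translatedTotal = 0;
  for (const { original, translated } of pairs) {
    originalTotal += countChars(original);
    translatedTotal += countChars(translated);
  }
  if (originalTotal === 0) {
    throw new Error("no original text to measure");
  }
  const ratio = translatedTotal / originalTotal;
  return {
    originalTotal,
    translatedTotal,
    // Equivalent length in English characters, rounded to a whole number.
    equivalentLength: Math.round(TWEET_LIMIT * ratio),
    // Increase over the 140-character limit, as a percentage to one decimal.
    percentIncrease: Math.round((ratio - 1) * 1000) / 10,
  };
}
```

With totals in the ratio 260:140, percentIncrease comes out at 85.7, consistent with the “over 85%” figure for Japanese. For Russian, Farsi, Arabic, Thai and Japanese text, code points and UTF-16 code units give the same count, since those scripts sit in the Basic Multilingual Plane; the distinction only matters for characters such as emoji.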

Since Google helpfully hosts jQuery as well as a simple JavaScript interface to their API, everything fits in a single 7.5 KB HTML file. There’s no server-side code, since the web browser talks directly to both services.

The wonders of machine translation

There’s something quite fantastic about machine translated tweets. I leave you with these words of wisdom from a Thai twitterer:

Pants in good health, who had entered empty.

No doubt, words to live by.

Hello, I’m Ben.

I’m the Technical Director of ONEIS, a platform for information management.

Twitter: @bensummers