Dotsub.com November Statistics

Here is the December edition of our regular section, giving you, the Dotsub community, an idea of where in the world our users were using Dotsub and what languages they were working in during November 2014. As always, there is a fascinating piece of geography trivia at the end, this month with some Natural History thrown in.

English, Spanish and Portuguese are well established at the top of the rankings these days and in the last few months French and Czech have been consistently 4th and 5th, with Dutch, Italian and German vying for the next 3 places. Russian is the first language with a different alphabet at 9th and Japanese, Chinese, Korean, Greek, Hebrew and Arabic are also represented in the top 20. So there are 13 of the top 20 using the Latin alphabet with 7 using other characters.

As always I have removed the top few (4 in this case) to make the graph a little more discernible.

 

In the countries section, Spain took second place, pushing Canada to third, while Brazil stayed at 4th and the UK at 5th. Slovenia roared back into the top 20 at #10 and New Zealand appeared at #14. Argentina dropped from #13 to #18 and Israel crept in at #20.

And removing the US allows everything else to be seen a little more easily.

 

Geography Trivia. The intriguing part of the data to me, as regular readers know, is the countries and/or territories that are at the other end of the list with only one or two visits. This month the unique visits came primarily from islands or groups of islands: 10 of the 12 unique visitors came from island nations and the other 2 from African nations, one of those being the newest nation on earth, South Sudan.

The island group we will mention this month is Kiribati. Kiribati, officially the Independent and Sovereign Republic of Kiribati, is an island nation in the central tropical Pacific Ocean. The permanent population is just over 100,000 (2011) on 800 square kilometers (310 sq mi). The nation is composed of 32 atolls and one raised coral island, Banaba, dispersed over 3.5 million square kilometers (1,351,000 square miles), straddling the equator and bordering the International Date Line at its easternmost point amidst the Line Islands.

The name Kiribati is the local pronunciation of Gilberts, which derives from the main island chain, named the Gilbert Islands after the British explorer Thomas Gilbert, who sailed through the islands in 1788. The capital, South Tarawa, consists of a number of islets connected through a series of causeways, located in the Tarawa archipelago. Kiribati became independent from the United Kingdom in 1979. It is a member of the Commonwealth of Nations, the IMF and the World Bank, and became a full member of the United Nations in 1999.

It reached the zenith of its popularity on December 31st, 1999 when, because of its proximity to the International Date Line, it was the first nation to see the new millennium. It was probably also the first nation to realize that the Y2K fears were greatly exaggerated.

Its flag …

See you next month when we will do a wrap up of the stats for 2014 and see how things changed throughout the year.

The Languages Less Browsed

Those of you who regularly read the Newsletter’s statistics section will know that we write a short piece about countries that have one or two visits in a month to our website. While I was compiling this month’s edition, it struck me that we had never looked at the languages that were used infrequently. This is more of academic interest than particularly informative as, the internet tells me, many computers’ browsers are set to US English on installation and are never changed. However, never one to allow facts to get in the way of conjecture, I thought I would take a look at the languages that show up on the less frequently used list.

Google Analytics attributed fewer than five visits to the Dotsub website in November 2014 to each of these languages. They are, in no particular order:

Afrikaans, Irish, Khmer, Tamil, Urdu, Bosnian, Gujarati, Marathi, Amharic, Assamese, Gaelic, Hausa, Latin, Armenian, Maori, Burmese, Icelandic, Malayalam, Mongolian, Albanian, Telugu, Yiddish, Luxembourgish

As you can see, they span the spectrum. At one end are Latin and Gaelic, which have very few speakers, so one can only imagine that even fewer people set their browser to those languages. At the other end are Indian languages such as Gujarati, Marathi, Urdu, etc., each of which is spoken by more than 70 million people who probably use another language for their internet usage.

We, at Dotsub, are very proud of our linguistic and cultural diversity and hope to further the ideals of information accessibility, irrespective of what language(s) you speak.

The Global Language Network

Translation Big Data Mapped in New Study

What are the most influential languages in the world? Researchers at MIT, led by César Hidalgo, set out to answer that question.

How would you even begin?  They began with books, Wikipedia, and Twitter and then mapped the number of translations between languages.  [Ed Note: Why not videos and movies??]  The researchers were rigorous in weeding out commercial tweets, bot generated content, sales/marketing messages, etc.  Translations were used as a metric for the greatest ability to reach other people and thereby influence them.

The hub languages?  English turned out to be the largest hub for information translated from one language into another in all three data sets. Other languages including Russian, German, and Spanish also serve as hubs to other languages.  It should be remembered that these are not based on number of speakers or even who is doing the writing – it is based on the number of translations.

“Of the many languages that have ever been spoken, only a few of them have been able to achieve global prominence, they have been important enough to become a global language,” Hidalgo told Serious Science.

The results and details are beautifully laid out on the interactive website:  The Global Language Network.   Be forewarned that you might get in there and not get out for quite a while.

Screen shots below.

The Wikipedia Data Set


The Twitter Data Set


The Books Data Set

Videum Ends the Year with a Relaunch!

Videum, a Dotsub partner, recently relaunched its website, which focuses on delivering health & medical videos to a global audience.  Videum (www.videum.com) aggregates videos from some of the top health content producers and distributes them to leading health and video sites around the world.  With more than 200 publishers in 86 countries, Videum works with its strategic partner, Publicis Healthcare, to reach a target audience of healthcare professionals and consumers.  A primary feature of the Videum platform is its support for multi-language subtitles, making content accessible across language barriers.

“Videum.com is powered by Dotsub,” explains Paul Dinsmore, President & COO of Videum, “Dotsub lets us make good on the promise of outstanding health content made effective because it can be delivered in native languages.”

New functionality on the website includes an enhanced user experience, viewing through topical and custom Watchlists, and ‘lean-back’ viewing of health and medical videos by category via Videum TV.

For more information, please contact Paul Dinsmore at paul.dinsmore@videum.com.

Make Way for Millennials!


Kay Koplovitz founded the USA Network and was the first woman to serve as a network president in television history.  Recognized as an authority in broadcast, communications and technology, Koplovitz explores how the millennials are building the entrepreneurial community in a two-part article for Forbes.  The youth-oriented organizations include the Kairos Society and the Thiel Foundation.  The common themes are cross-discipline collaboration, a focus on solving big problems on a large and holistic scale, and strong, encouraging mentorship.  That’s where Dotsub’s Founder, Michael Smolens, comes in (Part 2).  Koplovitz’s conclusion is that in such smart and hard-working hands, the future looks bright indeed.

Part 1.

Part 2.

 

What Do Countries Name Themselves?

This is a world map of Endonyms written in either the official character set or the character set most prevalent in the location.


This map shows the names countries give themselves.  An endonym is a name used by a group or category of people to refer to themselves, their language, or their country, as opposed to a name given to them by other groups. For example, Deutschland is the endonym of the country known in English as Germany, and Suomen Tasavalta is the endonym of Finland.  These names change with wars, upheavals, coups, or sometimes because the people simply decide to choose for themselves.

The map is quite controversial, as some groups exist within others and some names are under debate for political or religious reasons.  Most of the data here comes from the United Nations Group of Experts on Geographical Names and the U.N.’s database of country names.

To view an interactive map which lets you focus more closely, visit here.

To read another article about this click here.

Autoplaying Brightcove Captions and Subtitles

We recently had a customer ask us how to have captions autoplay in their Brightcove player without any changes to their pages.  It turns out this is not currently possible out of the box. Since this is such an important and simple use case, we decided not only to solve it for our client, but also to release the code on GitHub for anyone to use.

If you are familiar with Brightcove you’ll know that there are two parts to any plugin: both a Flash and a JavaScript version need to be created.

First, let’s look at the Flash component. Brightcove provides the ‘CustomModule’ class as a starting point for your plugin. All we have to do is override initialize() and set captions enabled to ‘true’.

package {

import com.brightcove.api.APIModules;
import com.brightcove.api.CustomModule;
import com.brightcove.api.modules.CaptionsModule;

/**
 * A Brightcove plugin that auto loads captions.
 */
public class CaptionConfigurationModule extends CustomModule {

    override protected function initialize():void {
        var captionModule:CaptionsModule = player.getModule(APIModules.CAPTIONS) as CaptionsModule;
        captionModule.setCaptionsEnabled(true);
    }
}
}

The JavaScript plugin is just as simple. Once the player is ready we set captions enabled to ‘true’.

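(function() {
    // Assumption: the Smart Player plugin API exposes the current player
    // through brightcove.api.getExperience(); the original snippet used
    // `player` without defining it.
    var player = brightcove.api.getExperience();

    // Turn captions on once the player template has loaded.
    function onPlayerReady() {
        var captionsModule = player.getModule(brightcove.api.modules.APIModules.CAPTIONS);
        captionsModule.setCaptionsEnabled(true);
    }

    // Run immediately if the player is already ready, otherwise wait for TEMPLATE_READY.
    var experience = player.getModule(brightcove.api.modules.APIModules.EXPERIENCE);
    if (experience.getReady()) {
        onPlayerReady();
    } else {
        experience.addEventListener(brightcove.player.events.ExperienceEvent.TEMPLATE_READY, onPlayerReady);
    }
}());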

There you have it; two simple plugins to enable caption autoplay on your Brightcove player. You can also see these plugins in our GitHub account: https://github.com/dotsub/api-samples/tree/master/brightcove-autoplay-captions

Dotsub.com October Statistics

Here is the November edition of our regular section, giving you, the Dotsub community, an idea of where in the world our users were using Dotsub and what languages they were working in during October 2014. As always, there is a fascinating piece of geography trivia at the end, this month with some Natural History thrown in.


English, Spanish and Portuguese are well established at the top of the rankings these days and the rest of the world is coming in a poor fourth. The major European languages French, Italian and German are always there or thereabouts, and most of the other entrants in the top 20 are European languages, with Japanese, Chinese and Korean the exceptions. This month languages 18-20, Hebrew, Catalan and Romanian, were new, replacing Slovakian, Hungarian and Arabic, which were in those positions last month.

As always I have removed the top few (4 in this case) to make the graph a little more discernible.


 

In the countries section, Canada reclaimed its second place and Spain moved up to 3rd, pushing Brazil down to 4th. Greece climbed a few spots while Japan and Germany fell down the charts a little. Welcome to Peru, which made it in at #18 at the expense of Slovenia.

And removing the US allows everything else to be seen a little more easily.

 

Geography Trivia. The intriguing part of the data to me, as regular readers know, is the countries and/or territories that are at the other end of the list with only one or two visits. This month we primarily had unique visits from islands or groups of islands. Of the singletons, 6 of the 8 fell into that category.

The island group we will mention this month is Comoros. This sovereign archipelago island nation in the Indian Ocean is located at the northern end of the Mozambique Channel off the eastern coast of Africa. Comoros has endured more than 20 coups or attempted coups since gaining independence from France in 1975. In 1997, the islands of Anjouan and Moheli declared independence from Comoros. In 1999, military chief Col. Azali seized power of the entire government in a bloodless coup, and helped negotiate the 2000 Fomboni Accords power-sharing agreement in which the federal presidency rotates among the three islands, and each island maintains its local government. Its main languages are Arabic (the official language), French and Shikomoro (a blend of Swahili and Arabic). Its population of about 770,000 is 98% Sunni Muslim.  One of its major industries is perfume distillation, as it is a major producer of ylang-ylang (a perfume essence), which makes up about 30% of Comoros’ exports.

One of its claims to fame is that the Coelacanth, a so-called living fossil, can be found off its shores. This fish is thought to represent a very early step in the evolution of fish to terrestrial four-legged animals like amphibians. The fish was thought to have gone extinct with the dinosaurs 65 million years ago, but one was found in 1938 and started a heated debate about how this creature fits into the evolution of land animals.

Its flag …

Comoros flag

See you next month.

Captioning Your Videojs Videos

For a little change of pace, I thought it might be fun to do a write up on adding captions to an open source video player. Using Dotsub’s API, adding captions to most players is dead simple.

I decided to use Video.js’s HTML5 player for this demo. Video.js is a great player and comes with built-in subtitle support, which is provided by HTML5’s track element. Dotsub’s API allows you to directly access various subtitle formats. Video.js uses WebVTT files, which we can fetch from the Dotsub API using this URL pattern:

http://dotsub.com/media/<video_id>/c/<language_code>/vtt

This will fetch the WebVTT file for your video directly from our servers. We, by default, enable CORS on all WebVTT file requests, so you do not have to worry about same-origin policy issues.
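If you are building these URLs in code, the pattern above could be wrapped in a small helper. This is only a sketch: the function name dotsubVttUrl is made up for illustration and is not part of the Dotsub API; it simply fills in the two placeholders of the URL pattern.

```javascript
// Hypothetical helper (not part of the Dotsub API): fills in the pattern
// http://dotsub.com/media/<video_id>/c/<language_code>/vtt
function dotsubVttUrl(videoId, languageCode) {
    if (!videoId || !languageCode) {
        throw new Error('videoId and languageCode are both required');
    }
    // encodeURIComponent leaves letters, digits, '-' and '_' untouched,
    // so ordinary video ids and language codes pass through unchanged.
    return 'http://dotsub.com/media/' + encodeURIComponent(videoId) +
        '/c/' + encodeURIComponent(languageCode) + '/vtt';
}
```

For example, dotsubVttUrl('5d5f008c-b5d5-466f-bb83-2b3cfa997992', 'eng') produces the English WebVTT URL for that video.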

Adding captions to a video player then only requires a track tag for every language you want to add. A simple player like:

<video id="dotsub_example" class="video-js vjs-default-skin" width="640" height="264" poster="http://video-js.zencoder.com/oceans-clip.png" controls preload="auto" data-setup='[]'>
<source src="http://video-js.zencoder.com/oceans-clip.mp4" type='video/mp4' />
<source src="http://video-js.zencoder.com/oceans-clip.webm" type='video/webm; codecs="vp8, vorbis"' />
<source src="http://video-js.zencoder.com/oceans-clip.ogg" type='video/ogg; codecs="theora, vorbis"' />
</video>

Becomes caption enabled by simply adding:

<video id="dotsub_example" class="video-js vjs-default-skin" width="640" height="264" poster="http://video-js.zencoder.com/oceans-clip.png" controls preload="auto" data-setup='[]'>
<source src="http://video-js.zencoder.com/oceans-clip.mp4" type='video/mp4' />
<source src="http://video-js.zencoder.com/oceans-clip.webm" type='video/webm; codecs="vp8, vorbis"' />
<source src="http://video-js.zencoder.com/oceans-clip.ogg" type='video/ogg; codecs="theora, vorbis"' />
<track kind='captions' src='http://dotsub.com/media/5d5f008c-b5d5-466f-bb83-2b3cfa997992/c/eng/vtt' srclang='en' label='English' default />
<track kind='captions' src='http://dotsub.com/media/5d5f008c-b5d5-466f-bb83-2b3cfa997992/c/spa/vtt' srclang='es' label='Spanish' />
<track kind='captions' src='http://dotsub.com/media/5d5f008c-b5d5-466f-bb83-2b3cfa997992/c/fre_ca/vtt' srclang='fr' label='French' />
</video>
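If a video has many languages, the track tags shown above could be generated rather than typed by hand. The following is a minimal sketch under a few assumptions: buildTrackTags and the shape of its languages argument are invented for this example, the values are trusted (nothing is HTML-escaped), and the first language is treated as the default track.

```javascript
// Hypothetical helper: returns one <track> tag per language for a Dotsub video.
// Each entry in `languages` is { code, srclang, label }, where `code` is the
// language code used in the Dotsub URL (e.g. 'eng') and `srclang` is the
// value for the track's srclang attribute (e.g. 'en').
// Assumption: all values are trusted, so no HTML escaping is done here.
function buildTrackTags(videoId, languages) {
    return languages.map(function (lang, index) {
        var src = 'http://dotsub.com/media/' + videoId + '/c/' + lang.code + '/vtt';
        // Only the first language is marked as the default track.
        var isDefault = index === 0 ? ' default' : '';
        return "<track kind='captions' src='" + src + "' srclang='" + lang.srclang +
            "' label='" + lang.label + "'" + isDefault + " />";
    }).join('\n');
}

var tags = buildTrackTags('5d5f008c-b5d5-466f-bb83-2b3cfa997992', [
    { code: 'eng', srclang: 'en', label: 'English' },
    { code: 'spa', srclang: 'es', label: 'Spanish' }
]);
```

The resulting string can be placed inside the video element before the page is rendered.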

Now you have a beautiful video.js player that supports captions:

 

Around the Globe: Second Languages

This fascinating infographic shows the second most used languages in countries throughout the world.  The rise and fall of these secondary languages is of wide interest to companies and organizations that serve, or sell to, these populations.  Dotsub translated combinations of over 50 different languages in recent months, often to meet the demands of non-primary language speakers.

There are, of course, many different and intertwined reasons for the rise and fall of particular language usage.  There is history: war, occupation and migration.  Examples shown here include Tatar in Russia and Nahuatl (informally known as Aztec) in Mexico.  Then there is proximity that enables trade such as the use of Swedish in Finland and the use of Danish in Iceland.

Immigration is a driving factor as well.  In the U.S., the Spanish-speaking population is the fastest growing population, which has fueled powerhouses like Univision and Telemundo.  Meanwhile, a large wave of Polish speakers has migrated to the UK since Poland joined the EU in 2004.  That Polish is England’s second language may seem surprising, but it shouldn’t be.
