Local Languages Dominate Country Code and Geo TLDs: CENTR Study

English may be the world’s lingua franca, online and offline, but on websites using European country code and geographic top-level domains, local languages dominate, as would be expected. The proportion of content in English and in the local language, however, varies markedly across the TLDs. And endangered languages, as defined by UNESCO, are even rarer in web content than they are offline.

A study published by CENTR, titled “Diversity through localization: How ccTLDs enhance linguistic diversity online”, found that the principal languages spoken in the country or territory of each of the 10 European TLDs studied comprised at least 64% of the content in the zone, and the average rate per TLD was 76%. The TLDs with the least English content were those of the Russian Federation (.ru /.рф), while Slovakia’s .sk and Denmark’s .dk had the most. For internationalised domain names (IDNs), local language sites comprise 84% on average, with rates of 90% and above under .ch, .nu, .se and .рф. The lowest rate of local language content in IDNs was under .cat, at 69%.

English, though, is the second most popular language for web content in each of the 10 TLDs studied except Catalonia’s .cat, where Catalan is the number one language, ahead of Spanish and English. Interestingly, Romanian is the third most popular language on websites using .se (Sweden) internationalised domain names, albeit with less than 1%.

One TLD that partly reflects this general pattern is Switzerland’s .ch. Switzerland has four official languages: German, French, Italian and Romansh. German, the most spoken language in Switzerland, is the most used language on .ch websites, but English is second and French third, with other languages accounting for around 1% of web content.

One of the main reasons English is the second most popular language overall is that domain names used for parked websites have a higher proportion of English content. When parked and single page websites, which accounted for 37% of domains in the study, are excluded, the amount of English content drops markedly in each of the TLDs.

The study also tried to examine endangered languages as defined by UNESCO. 2019 is UNESCO’s International Year of Indigenous Languages, and UNESCO identifies 2,680 languages as being in danger of extinction. Several European languages are on UNESCO’s list of endangered languages, including Corsican, Galician, Irish, Welsh and Basque. The study notes that the web environment favours English and other major languages, with endangered languages even rarer in web content than they are in the offline world.

The study also attempted to check whether indigenous or endangered languages were represented in the web content of the TLDs in the sample. However, this proved difficult, as many of these endangered languages aren’t supported by online translation.

The TLDs analysed for the study were .cat (Catalonia), .ch (Switzerland), .dk (Denmark), .nl (Netherlands, with data provided by the .nl registry SIDN), .nu (Niue), .pt (Portugal), .ru /.рф (Russian Federation), .se (Sweden) and .sk (Slovakia). In total, 16.4 million domain names were analysed in the study.

To download the study by Emily Taylor in full, go to: centr.org/library/library/educational-promotional-material/20th-anniversary-paper-diversity-through-localization.html