Internationalised domain names (IDNs) are a key component of a truly multilingual internet, but progress in making the internet, and the Domain Name System, truly multilingual has been slow. The introduction of IDNs started around a decade ago with hybridised IDNs, in which the top level domain is in ASCII characters and IDN characters are available at the second, or sometimes third, level.
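As background not covered in the report summary: in the DNS itself, each non-ASCII label is stored in an ASCII-compatible "xn--" form (Punycode). The minimal sketch below uses Python's built-in `idna` codec to show what that looks like for a hybrid name. The domain `bücher.example` and the helper name `to_ascii` are illustrative assumptions, and the built-in codec follows the older IDNA 2003 rules, so registries may apply stricter validation than this does.

```python
def to_ascii(domain: str) -> str:
    """Convert each label of a domain to its ASCII-compatible form."""
    # Python's built-in "idna" codec encodes label by label:
    # ASCII labels pass through unchanged, others gain an "xn--" prefix.
    return domain.encode("idna").decode("ascii")


# Hypothetical hybrid IDN: non-ASCII second level, ASCII top level.
hybrid = "b\u00fccher.example"  # "bücher.example"

print(to_ascii(hybrid))         # second-level label is converted, TLD is not
print(to_ascii("example.com"))  # an all-ASCII name is left as it is
```

Only the second-level label changes, which is the sense in which such names are hybrid: the top level was ASCII to begin with.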
The slow progress is outlined in EURid’s recently released 2017 IDN World Report, which gives an update on their introduction. As of December 2016, more than 480 TLDs were offering IDNs (at top and second level). Of these, 400 were new gTLDs, including more than 30 IDN new gTLDs.
When hybrid IDNs were introduced, they only partially addressed the issue. It was a satisfactory outcome for Latin-based scripts used by most European languages, where the IDN element would commonly reflect accents or other diacritical marks on Latin characters. But, as the report notes, for speakers of languages not based on Latin scripts (for example, Chinese and Arabic), the hybrid IDN/ASCII domains were unsatisfactory. Right-to-left scripts, such as Arabic and Hebrew, created bi-directional domain names when combined with left-to-right TLD extensions. This requires internet users not only to be familiar with both their own script and Latin characters, but also to change script when typing in web addresses, and it potentially confuses the strict hierarchy of the DNS.
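The bi-directional problem can be made concrete with a small sketch. It classifies each label of a domain as left-to-right or right-to-left using the Unicode bidirectional class of its characters, and flags names that mix the two. The function names and the sample domains are hypothetical, and this is a simplification for illustration rather than the validation a registry or browser would apply.

```python
import unicodedata

# Unicode bidirectional classes used by right-to-left letters
# ("R" covers Hebrew, "AL" covers Arabic letters).
RTL_CLASSES = {"R", "AL"}


def label_direction(label: str) -> str:
    """Return 'rtl' if the label contains any right-to-left letter, else 'ltr'."""
    if any(unicodedata.bidirectional(ch) in RTL_CLASSES for ch in label):
        return "rtl"
    return "ltr"


def is_bidirectional(domain: str) -> bool:
    """True when the labels of a domain do not all run in the same direction."""
    directions = {label_direction(label) for label in domain.split(".")}
    return len(directions) > 1


# Arabic word for "example", written with escapes to avoid display reordering.
ARABIC_LABEL = "\u0645\u062b\u0627\u0644"

print(is_bidirectional(ARABIC_LABEL + ".com"))             # hybrid: Arabic label, ASCII TLD
print(is_bidirectional(ARABIC_LABEL + "." + ARABIC_LABEL)) # single script at both levels
```

A name that is right-to-left at every level avoids the mixed ordering, which is the case a full IDN at both the top and second level is meant to cover.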
As of December 2016 there were approximately 8.7 million registered IDNs, making up just 3% of the world’s 331 million registered domain names. However, growth was impressive, up 28% in the year from December 2015 to December 2016. The spike in growth was significantly affected by second level registrations under the Chinese ccTLD, .cn, which grew by more than 400% during the year. Discounting the contribution of .cn, the underlying growth rate during 2016 was 4%, less than the previous year’s growth rate of 9%.
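The headline figures can be checked with a few lines of arithmetic. The sketch below uses only the numbers quoted above: the share works out to about 2.6%, which rounds to the quoted 3%. The December 2015 total is not stated in the text, so the value computed here is an implied estimate derived from the 28% growth figure, not a number from the report.

```python
# Figures quoted in the text, in millions of domain names.
TOTAL_DOMAINS_M = 331.0
IDNS_DEC_2016_M = 8.7
GROWTH_2016 = 0.28  # 28% growth from December 2015 to December 2016

# Share of all registered domain names that are IDNs.
idn_share = IDNS_DEC_2016_M / TOTAL_DOMAINS_M

# Implied (not reported) IDN total a year earlier: reverse the growth rate.
implied_idns_dec_2015_m = IDNS_DEC_2016_M / (1 + GROWTH_2016)

print(f"IDN share of all domains: {idn_share:.1%}")
print(f"Implied IDNs in December 2015: {implied_idns_dec_2015_m:.1f} million")
```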
According to the report, where IDNs are in use, the language of web content is more diverse than it is with traditional ASCII domains. There is a long way to go before there is the same linguistic diversity online as there is offline, but it appears IDNs are helping redress the balance, at least as far as the most-spoken languages are concerned.
As a result of EURid’s analysis of the language of content associated with IDNs, the report states:
- IDNs help to enhance linguistic diversity in cyberspace
- The IDN market is more balanced in favour of emerging economies
- IDNs are accurate predictors of the language of web content.
In previous reports EURid has noted that the language of web content tends to follow IDN script: IDNs accurately signal what languages will be found. The analysis in 2017, as in previous years, found that the relationship between the language of web content and IDN script (gTLDs plus .eu) is not random. There is a very high correlation between the language of web content and the script of the IDN associated with it. In other words, the report notes, IDNs are in practice accurate predictors of the language in which their web content appears. Only English, which is commonly spoken around the world, is associated with a large number of scripts (Latin, Arabic, Cyrillic, Han, Katakana, Hiragana, Hangul, Greek, and others), and displays the more random pattern predicted in the “no connection” hypothesis.
EURid’s report found that in 2017 there was growth in Chinese-language content associated with IDNs, which reflects the growth of IDNs under .cn during 2016.
When it comes to registered IDNs, just three scripts represent 90% of the total: Han (associated with the Chinese language), Latin, and Cyrillic. Han, Katakana and Hiragana (associated with the Japanese language), and Hangul (associated with the Korean language) together represent 8% of IDNs. Major world scripts such as Arabic and Devanagari, which support some of the world’s top 10 most spoken languages, are barely represented in IDNs.
Since 2014, the proportion of Han script domains has increased from 34% to 48%, and the combination of Han, Katakana and Hiragana has increased from 2% to 5%. In the same period, the proportion of Latin script IDNs has declined from 41% to 30%, and Cyrillic script has declined by 2 percentage points to 12%.
The increased popularity of Han script IDNs is attributable to strong growth of second level IDNs in the Chinese ccTLD, .cn.
EURid also has its own IDN TLD, .ею, in addition to .eu. As part of its application for .ею, EURid committed that each IDN domain name in the EURid stable would be in a single script. As a result, 1,430 existing Cyrillic script second level .eu IDNs were cloned under the new TLD. In addition, 943 new domain names were registered under the new TLD, and by the close of 2016 there were a total of 2,373 .ею registrations.