Latin and Cyrillic scripts on the same site: SEO guide for the correct hreflang setup
There is fairly extensive documentation on the Internet that explains how to set up hreflang for a multilingual site. However, if you look for the setup needed for a site whose content is written in both Cyrillic and Latin script, things become quite complicated.
This guide explains what needs to be done so that the search bots (read: Google) understand that you are not a spammer who creates duplicate content, but a webmaster who addresses the local audience in both scripts the country uses.
First of all, let me explain what hreflang is. Hreflang is an attribute that tells Google which language, script, and location a particular page targets. The search engine then provides users with search results in exactly that language, written in that particular script and, of course, for that location.
The solution, for those who do not want to read the details
On a page with Latin script, use the following hreflang:
<link rel="alternate" href="http://www.site.com/latin-page" hreflang="sr-Latn-rs" />
<link rel="alternate" href="http://www.site.com/ћирилична" hreflang="sr-Cyrl-rs" />
<link rel="canonical" href="http://www.site.com/latin-page">
On a page with Cyrillic script, use the following hreflang:
<link rel="alternate" href="http://www.site.com/ћирилична" hreflang="sr-Cyrl-rs"/>
<link rel="alternate" href="http://www.site.com/latin-page" hreflang="sr-Latn-rs"/>
<link rel="canonical" href="http://www.site.com/ћирилична">
Pay attention to the self-referencing rel="canonical" tag. It is very important that the canonical tag references the URL of the page in the same script as the content (for example, the Cyrillic URL is the canonical on the page that has Cyrillic content).
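To make the canonical rule concrete, here is a sketch of the mistake to avoid, using the same placeholder URLs as above. It assumes a Cyrillic page whose canonical has been pointed at the Latin URL, which tells Google that the Latin page is the preferred version and works against the hreflang pair.

```html
<!-- On the Cyrillic page: NOT recommended -->
<link rel="alternate" href="http://www.site.com/ћирилична" hreflang="sr-Cyrl-rs"/>
<link rel="alternate" href="http://www.site.com/latin-page" hreflang="sr-Latn-rs"/>
<!-- Wrong: the canonical points to the Latin URL instead of this page's own URL -->
<link rel="canonical" href="http://www.site.com/latin-page">
```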
Below you can read more about the hreflang for the Serbian language.
What's the problem with the Latin and Cyrillic scripts?
Say I write a text in Serbian in Latin script and publish it, and the next day I write the same text in English. I apply the following hreflang and everything is OK.
The page in Serbian at www.nenad.blog/clanak has the following hreflang:
<link rel="alternate" href="https://www.nenad.blog/clanak" hreflang="sr-rs"/>
<link rel="alternate" href="https://www.nenad.blog/article" hreflang="en-rs"/>
<link rel="canonical" href="https://www.nenad.blog/clanak">
The page in English at www.nenad.blog/article has the following hreflang:
<link rel="alternate" href="https://www.nenad.blog/article" hreflang="en-rs"/>
<link rel="alternate" href="https://www.nenad.blog/clanak" hreflang="sr-rs"/>
<link rel="canonical" href="https://www.nenad.blog/article">
In short, hreflang="sr-rs" indicates that the text is in Serbian and that the location is Serbia, while hreflang="en-rs" marks the English text, which I want to show to users searching for English content in Serbia.
What happens if I publish the same text on the third day, but in Cyrillic? It would not be logical for both Serbian pages, Latin and Cyrillic, to carry rel="alternate" hreflang="sr-rs".
What does Google say?
That is exactly what I asked John Mueller (JohnMu), a Webmaster Trends Analyst at Google who runs the Sunday Webmaster Hangouts Q&A sessions.
Hi John, I have a site in the Serbian language that has content in Latin and Cyrillic script (both official letters in Serbia). ISO 639-1 only defines "sr" (Cyrillic). Is there a way to apply hreflang, and if not, will this situation cause problems with double content?
I wanted to find out how to correctly apply hreflang, and how Google looks at exactly identical content in the same language written in a different script.
ISO 639-1 is a standardized nomenclature that classifies languages.
The question is at 21:30.
The answer is:
• Google chooses on its own which page to display
• Google will not remove content in the other script from the index (no duplicate content penalty)
• There are ways to define scripts in hreflang tags
However, I did not get a concrete answer.
Finding an example on production websites
I looked through the source code of major domestic sites without success: Politika, RTS, and the sites of the National Assembly, the Government of the Republic of Serbia, and the National Bank of Serbia. These sites lack far more important on-page elements, let alone hreflang :)
Finally, I searched the Product Forum and found a topic created by Miloš Leković, who is, among other things, the organizer of the first SEO conference in the region, IT Open. Miloš found a solution.
Solution
It is necessary to use the ISO 15924 international standard for script codes. The script code goes between the language and the region, so for Cyrillic we use sr-Cyrl, and for Latin sr-Latn.
As I stated at the beginning of the text, the correct hreflang markup for the Serbian language in both scripts is the following.
On a page with Latin script:
<link rel="alternate" href="http://www.site.com/latin-page" hreflang="sr-Latn-rs" />
<link rel="alternate" href="http://www.site.com/ћирилична" hreflang="sr-Cyrl-rs" />
<link rel="canonical" href="http://www.site.com/latin-page">
On a page with Cyrillic script:
<link rel="alternate" href="http://www.site.com/ћирилична" hreflang="sr-Cyrl-rs"/>
<link rel="alternate" href="http://www.site.com/latin-page" hreflang="sr-Latn-rs"/>
<link rel="canonical" href="http://www.site.com/ћирилична">
Confirmation came through testing. After a few days, both hreflang tags were accepted as valid, which Gary Illyes from Google confirmed. In fact, Gary was asked this question at DIDS but did not know the exact answer, so he was pleased to read the results of my experiment.
[1/2] @methode You remember the question about hreflang for same language but different scripts (alphabets) like Serbian? I tested...
— Nenad Pantelic (@NenadPantelic) March 13, 2017
[2/2] @methode I tested with ISO-15924 by putting hreflang="sr-Latn-rs" & "sr-Cyrl-rs". Indexed, works. Luck or correct implementation?
— Nenad Pantelic (@NenadPantelic) March 13, 2017
Console was happy also :) pic.twitter.com/IhtJCc1V2t
— Nenad Pantelic (@NenadPantelic) March 13, 2017
I love it when you guys test things! If search console accepted it then it should work. It uses the production validators
— Gary "鯨理" Illyes (@methode) March 13, 2017
Pages tested:
www.nenadpantelic.com/o.html
www.nenadpantelic.com/o-аутору.html
Feel free to examine the DOM or the source code and check out the applied hreflang rules.
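If the same article also exists in English, as in the nenad.blog example earlier, the two-script pair could be extended into a set of three. The sketch below is an assumption rather than part of the tested setup: /english-page is a hypothetical URL, and the same three alternate lines would be repeated on every version, with only the canonical changing per page.

```html
<!-- In the <head> of the Latin page; /english-page is a hypothetical URL -->
<link rel="alternate" href="http://www.site.com/latin-page" hreflang="sr-Latn-rs"/>
<link rel="alternate" href="http://www.site.com/ћирилична" hreflang="sr-Cyrl-rs"/>
<link rel="alternate" href="http://www.site.com/english-page" hreflang="en-rs"/>
<!-- Self-referencing canonical: on the Cyrillic or English page, use that page's own URL -->
<link rel="canonical" href="http://www.site.com/latin-page">
```

Listing every version on every page, including the page itself, keeps the references reciprocal, which is what lets Google treat the pages as alternates of one another.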
What to do with the old tag?
Google has long ignored the tag, which it officially acknowledged in 2016. However, you should not remove it, because it helps with accessibility: screen reader software uses the language information from the tag for better pronunciation and accenting of the content.
Now there are no obstacles for Cyrillic
The heritage of many languages includes, among other things, the fact that they have two scripts. There is no longer any reason to favor one over the other. The old myth that Google does not support multi-script hreflang tags is now debunked.