The New Wave of Indian Type (design.google)
354 points by carbolite103 on March 7, 2018 | 182 comments



> The language and script change every five hundred miles

Huh, somebody is making a vast over-simplification, presumably in a well-intentioned attempt to package this in a way the Western mind can comprehend. Which frankly is futile. India is much more diverse than that.

In my youth I spent some time bumming around Saurashtra, a region in Gujarat state that's about 150 miles square. Nearly 50 languages are indigenous to that one region alone. Not dialects. Languages. It's wonderful, but nuts.

That was almost 20 years ago; no doubt it's more homogenous now, which is sad to think about. A century ago nearly 80 languages were spoken there, so linguistic diversity has been declining fast. Any effort to preserve it deserves applause.


> Nearly 50 languages are indigenous to that one region alone. Not dialects. Languages.

I am a native Indian, born, raised and living there (I mean in India). I find it hard to process this claim. I am pretty sure there are about 10 languages in any fairly diverse region of the country, but beyond that what we find are mostly dialects of various languages.

I am not sure if you are a 'western mind' (as you self identify in a later post) of Indian descent and whether you made that claim based on reading about Indian languages or by interacting with locals and asking them if they spoke languages or dialects. You see, almost all Indian languages (AFAIK) lack words to distinguish the concept of language and dialect. For academic purposes, linguists do use some words, but they haven't trickled down to the general public. The tendency is to use the word 'bhasha' (or its variations such as bhashe, basai etc), which just means 'language', for both language and dialect, and even for such ideas as register and slang. So, any claim made by the general public about speaking a different 'bhashe' must be evaluated properly and not taken at face value.


The distinction between dialects and languages is always rather dodgy and politically loaded (as anyone who uses the words "Chinese Language" soon finds out). One of the better working definitions I've seen is that "a language is a dialect with an army and a navy" -- which would explain why Hindi and Urdu get to be separate languages, but Bhojpuri is sometimes counted as a dialect. Anyhow, I personally am happy to respect both political and practical labelling of things as "language", so if its community of speakers think of something as a separate language then I'll respect that. A dialect may have different inflections and word-usages, but remains mutually comprehensible and is thought of as the same language by its community of speakers. By these definitions, there are roughly 1,600 languages in India[1], and perhaps twice as many dialects.

Anyhow, for Saurashtra, I really did mean "language" and not "dialect". Honestly it's not a typical example of India, but is fun to use for dramatic effect. You know those 500 princely states that existed before unification? 200 of them were in Saurashtra (which is tiny). Politically, it was basically the India of India.

The linguistics seem to reflect that. At one point I recall finding myself in a tiny village somewhere near Dwarka where the residents spoke something completely baffling to me, with not a shred of English or Hindi or Gujarati in the mix. I was completely out of luck communicating with the locals, and had to rely on passing bus drivers for orientation. Later I read a book which said that there were a handful of tiny little Dravidian isolate languages scattered around Saurashtra; I must have stumbled across one of those. The claim of "50 languages" comes from that book, whatever it was. Feels about right, based on personal experience, but I can't really defend the claim beyond that.

(Also I should re-iterate that this was almost 20 years ago, and I very much doubt that there are any places so isolated today.)

1: Per https://en.wikipedia.org/wiki/Languages_of_India: 122 major languages and 1599 minor languages counted in the 2001 census.


> Later I read a book which said that there were a handful of tiny little Dravidian isolate languages scattered around Saurashtra

Do you remember the title or author of the book by any chance?

This is fascinating. Because there is a pocket of Brahui (a Dravidian language) speakers in Pakistan.


>>I find it hard to process this claim.

I think he means dialects, which can often sound very different. During my first semester of engineering we had an electronics lecturer who came from North Karnataka. They speak a fairly different dialect of Kannada there. When I heard her talking for the first time, I was quite startled that something like that could also be Kannada. In Bangalore we have our typical street slang Kannada, heavily overloaded with English words.

The same is true of Urdu as well. Even the Dakhani accent has so many varieties that it can sound strange hearing someone speak Urdu in Bangalore, then Tumkur, and then Mysore (all districts close to each other).


I am pleasantly surprised to see Kannada and Karnataka mentioned. I am a Kannadiga from Bengaluru. As for street slang of the city, don't ignore the copious amounts of Urdu loan words!

As you might already know, the case with Kannada is more complex. Firstly, it has a high degree of diglossia. Even with all the unnecessary, flowery Sanskrit loan words removed, the spoken form is greatly different from the written form. Next, the spoken language has many regional variants, as with most languages in the world. Further, the spoken language varies according to social groups, aka caste, within the regions. One could even attempt to find variations based on economic status, aka a rich-to-poor scale, but as an amateur linguist I don't find anything particularly noteworthy to support such a classification, and, if anything, the variation is mostly confined to the tone of speech (not the same as being tonal) and a few inflectional oddities rather than vocabulary. In modern times, the rich tend to use more English words, which muddles the matter further. Beyond all this, there are about half a dozen languages that are identified as child languages of Kannada that are mostly associated with tribal groups living in forests and mountains. I guess this kind of diversity is expected of a language which has been in continuous use for nearly two millennia.

For those intrigued and interested by this, I can't give any references on the internet beyond the Wikipedia page because Kannada and Karnataka are severely underrated and underrepresented in the Indian historical scene. Any work of worth written in English belongs to colonial times, with the 1930s being the latest. The information in this post comes from knowledge acquired by reading books published, in Kannada, by language departments of universities of the state and the literary organization, and the rest is from personal experience.


While I was in the US, an Arab told me something similar about Arabic.

Apparently, if your language isn't used for economics, science or, so to say, activities that advance the state of world affairs, your language stagnates. For example, you would run out of words to describe something like a Lorry/Truck. So you start naming it on the basis of its number of wheels. For example, you would call an Auto Rickshaw ತ್ರಿ ಚಕ್ರ ವಾಹನ. The literal translation comes down to "three-wheeled vehicle". But it sounds strange to say it that way. So you just say 'Auto', a loanword from English.

So you end up borrowing a lot of words from other languages if the things of the modern world aren't being invented in your language.

And yeah, the other part is spot on. Kannadigas aren't very assertive of their identity. As a language Kannada is >2000 years old. There are countries on earth today which didn't have a written language back then. And then of course with such a long history you also get a lot of literature.


> Not dialects. Languages.

It doesn't get more specific than that. He/she means languages.


I would believe the top comment.

My family has their own language that is spoken by 2-3 villages max. 80 languages in a region is perfectly believable.


For example, when the article says:

> five of the most widely used scripts in India—Devanagari, Gujarati, Gurumukhi, Tamil, and Latin. Those five scripts support seven or eight of India’s most widely used languages, together spoken by hundreds of millions of people across the Indian subcontinent.

this list of scripts actually does not cover (for example) Bengali and Telugu, which together have more speakers than the population of the United States.


Which means Telugu and Bengali have more speakers, but their scripts are not as widely used in the digital world.


In India, language and food, both change every 200km. When you tell this to people who haven't been to India, they usually think you are exaggerating but it's surreal how diverse India actually is.


The best analogy for westerners IMHO is to compare India to the EU with a stronger central government.


That is my favorite analogy too. Easily the closest Western approximation.


Here in Northern Belgium, some people might be able to pinpoint your origin down to a 10-20km region. :)

Over a 200km distance, people definitely would have trouble understanding each other without using a common standardised language ("Standard Dutch", which we learn in school), even when "officially" they are speaking the same "language". On TV, people who don't use the standardised language are subtitled.

So, I haven't been to India, but what you describe seems quite natural.


I totally believe that, as it’s basically the same situation in China.


Why fetishize diversity for its own sake, especially when it's a barrier to communication?

I think we should all speak one language (preferably, the one we're speaking now). The information suddenly marketable to or directly consumable by anyone at any given place will significantly increase.


Diversity is not necessarily a barrier to communication. (In fact, the idea that diversity is a problem and a barrier to communication is itself a bigger problem IMO.) In India most people are multilingual, and there's usually a link-language. For instance, in the region mentioned in the comment you're replying to, probably a majority would speak Gujarati too as a second language, and a significant number of them Hindi, and many English as well. There's going to be a fair bit of interchange and translation going on too. (For example, even in the Anglosphere one is not completely blind to French or Italian or German language or culture, say, and is aware of at least a few of their peaks: enough to recognize they have something of value too.)

The existence of diversity is generally a sign of freedom and self-sufficiency: that the people in question were/are free to retain and develop/enrich their own culture, and yet function successfully (enough) in interaction with the broader society, without too much pressure to give up their ways. (See: “melting pot” and “salad bowl”.) Monocultures, monolingualism, monotheism, etc., do have certain advantages too of course, but I hope some can see why diversity is valuable too.


Why fetishize uniformity in the name of some absurd "efficiency"?

People are all different and why should they not choose to communicate/eat/enjoy themselves as they wish, with all the advantages and disadvantages of their choices?

(Plus in the case of languages there are things which can be said in some and really can't be said in others. My wife, child and I would typically speak multiple languages, sometimes even in the same sentence, in order to convey the nuance of what we wanted to say. Hell, even in English there are things said in discipline-specific jargons that aren't really communicable any other way).


The moment that happens, that language would start diverging again. You have to unify culture in order to unify language, and that doesn't sound as good anymore.

Settling on a "common" or "business" language, though, is entirely possible, and has been done many times in history. No language has ever encompassed the whole world, and if one language comes close, it's English, right now. Latin, Arabic, Mandarin, French, English, Spanish... The world has seen its share of common languages used within certain spheres of influence.


Case in point: the Proto-Indo-European language. It's clear most European languages (except the non-Indo-European ones, like Finnish, Estonian, Basque..) have a common parent language. For example, look at numbers. All those languages have awfully familiar numbers. If you speak one language, you can recognise most numbers in other languages rather easily. Yet they evolved into very different languages in different regions.


I think the numeral system is a recent introduction to European languages, adopted from India in the 10th century or so. But some other words are similar, like maatha (mother), pithaa (father), etc.!

https://en.m.wikipedia.org/wiki/Hindu–Arabic_numeral_system


Exactly, essential words that we needed thousands of years ago as well as today sound similar, because they are leftovers from when it was a single language. Modern words that were just invented tend to sound similar as well, e.g. "computer" in different languages. The vast difference is in words that came up between the split of the Indo-European tribes and modern times.


The written numerals are relatively recent, but the words for numbers are much older and still very similar across languages.


Weren't Roman numerals in use before the Indo-Arabic ones came to Europe? How come they are similar? They look worlds apart to me..

https://en.wikipedia.org/wiki/Roman_numerals


These are the words the Romans used:

unus duo tres quattuor quinque sex septem octo novem decem ...


Yes you are right.. indeed sounds similar..


The numerals are different but the words are the same. 5 and V are both pronounced “five”, just represented differently.


Ah my bad...


> Why fetishize diversity for its own sake, especially when it's a barrier to communication?

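Ĉar diverseco estas la bazo de ĉia belo kaj ĉia kreemo. Sen diverseco, vivo estus sensignifa. [Because diversity is the basis of all beauty and all creativity. Without diversity, life would be meaningless.]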

> I think we should all speak one language (preferably, the one we're speaking now).

Certe, mi konsentas -- sed nur aldone al via unika loka lingvo. Kaj ne la Anglan, mi petas! [Certainly, I agree -- but only in addition to your unique local language. And not English, please!]


When I looked into Esperanto a few years ago, I lost interest relatively quickly as I figured that this Euro-mashup of a language probably won't ever make it very far into areas where it would be most useful to me (i.e. outside of the Western world). Is this true, in your experience? Do you know of any Indian/African/Chinese Esperantists? Has Esperanto been a worthwhile investment of your time, and would you recommend studying it over "real" languages, on the merit of its community alone?


I'd say it's been a worthwhile investment -- but not really on the merit of the community. That's probably my own fault: the community seems cool and I probably ought to be less of a hermit. There are certainly plenty of Chinese Esperantists; China Radio International (the PRC's equivalent of Voice of America, roughly) even has an Esperanto service that's pretty interesting to listen to[1].

What's really made it a worthwhile investment is that it's given me a better understanding of language itself. All languages have idiosyncrasies, including Esperanto -- but Esperanto's idiosyncrasies are unusually consistent, making it really easy to map things onto. So I often find it easier to comprehend some weird construct in (say) Hindi or Japanese by mapping it onto straightforward Esperanto, rather than trying to map it onto an English construct that's probably even weirder.

Plus it's just such an easy win. I can order off a menu and find my way around town in a fair handful of languages, but haven't really gotten much further than that. But I can easily read just about anything in Esperanto, with 1/10th the effort that I've put into any other language. Which is just kinda gratifying to be able to do.

1: http://esperanto.cri.cn/


Thank you, that's interesting, glad I asked! I'll have to take another look then.


Esperanto has speakers in China, Japan, Korea, Indonesia and Thailand. No doubt others besides but these are the ones I've met either in my own country or in theirs.

You can see some Chinese uptake here: http://esperanto.china.org.cn/

Esperanto is very European in its appearance (both grammar and vocab). This seems to have helped it to gain a userbase, which allows others to adopt it who don't share its background. The regularity of the language helps people even if they have to learn a lot of weird roots and struggle to get the very Latinate relative clause system (a lot of English speakers struggle with that too though, because agreement is moribund).

Esperanto havas parolantojn en Cxini, Japanio, Koreio, Indonezio kaj Tajlando. Nedube ankauw aliaj sed cxi tiuj estas tiuj de kiuj mi renkontis esperantistojn, aw en mia lando aw en ilia.

Jene vi povas vidi ion de la cxina uzigxo: http://esperanto.china.org.cn/

Esperanto estas tre ewropa law sia formo (kaj gramatike kaj vorte). Sxajne tio helpis gxin obteni fruajn uzantojn, kiuj instigas ekuzi gxin aliajn kiu ne kunhavas gxian historion. La reguleco de la lingvo helpas homojn ecx se ili devas lerni multajn strangajn radikojn kaj penas kompreni la tre latinan relativan fraz-sistemon (multaj angla-parolantoj devas peni per tiu ankaw, cxar konsento estas mortonta).


"We should also have one programming language, one processor architecture (preferably the x86-64) and one browser"

If the above statement sounds wrong, it's because __it is__

We should let other cultures and languages thrive on their own, as this is the primary driver of the evolution of our business language too. If other languages suddenly ceased to exist, our business language would stop borrowing from them and would cease to advance by itself.


> I think we should all speak one language (preferably, the one we're speaking now).

Gods no. You'll have to pry my other two languages from my cold, dead tongue. Just the ability to have private conversations in public feels like a superpower. The amazing puns you can make up are also a huge benefit - I can amuse myself all the time.

Plus, you know, cultural diversity, accessing millennia of literature and history etc. But that's the boring stuff /s


sprechen se habla?


So, people should give up their culture to become easier prey for global consumerism? Diversity is not a problem - it's fundamental to the survival of a species.


No way, English is too dry and limiting due to the lack of emotional tones that are present at least in Spanish and Russian (because of the flexibility of words and their order in a sentence). I'm ok with using it for technical purposes but I would prefer something else to actually talk.


I know what you mean. I speak another language (natively), and it's surprising how much more emotionally expressive it is than English. Personally, though, I like English's relative dryness; it makes English seem much more civil and rational. That other language sounds vulgar and primal. D:


I think you are comparing with "business English" instead of actual locally spoken/written English here...


No, the one spoken with family and friends – it's still limiting. Profanities in English aren't diverse either.


> Profanities in English aren't diverse either

English profanity feels unsatisfying/watered-down. I suspect overuse in pop-culture has worn away most of the taboo factor, making it less cathartic.


And of the seven words you can't say we mostly use three.


Because diversity emerges, like it or not. Go back far enough and Swedish and Nepali were dialects of the same language.


> I think we should all speak one language (preferably, the one we're speaking now).

Good grief.


Seriously, I am trying my luck in NLP with some Indian languages like Marathi, Hindi, Gujarati and Tamil. The sheer number of dialects is driving me nuts.

Even dialects have very different-sounding words or pronunciations, which increases the complexity exponentially. But I am trying it only with an eye on its potential. NLP will simply act as the catalyst for technology adoption in rural and semi-rural areas.

I'm thinking of studying China's approach here, since I heard even Chinese is incredibly diverse.


China does have very diverse dialects. But Mandarin is taught in all schools in mainland China and we treat it as the standard. It is also the common way for people from different areas to communicate.

For the younger generation this is not a problem. Back in my college days, people in the same class came from different areas, from Xinjiang to Canton to the northeast corner. We had no problem understanding each other, though it was sometimes funny because of unique words or accents.

It seems that the problem is not as severe in China as in India.


And what are called "dialects" in the Chinese context are just as much dialects as French, Italian, and Romanian are "dialects" of Romance.


These mostly get called “dialects” for political reasons: the Chinese government doesn’t want to acknowledge that they are separate mutually unintelligible languages. Media in Chinese languages other than Mandarin is restricted, children are forced to use Mandarin in school, all official business is done in Mandarin, etc. There is a concerted effort to make other languages economically unviable, and generally to disempower and discourage regional / minority cultures. The grandparent poster’s experience is evidence that this strategy is working out.

It is similar to the way the Chinese government assigns non-native political officials to rule each region, and severely censors any politically controversial communication/media. That is, it is yet another tool of authoritarian social control, an effort to forestall any political opposition to the central government and its unresponsive top-down decision-making process.

India does not have the same kind of authoritarian governing institutions, so similar forced homogenization would not be politically viable.


> the Chinese government doesn’t want to acknowledge that they are separate mutually unintelligible languages.

Except they are not mutually unintelligible. Put someone from Heilongjiang province in Sichuan and they will still be able to understand the language, albeit with more difficulty.

Though there are dialects that do have completely different pronunciation, they all use the same underlying script, save a select few minority languages. Mandarin Chinese is taught in school, but everyone still uses the local dialect to speak with each other.

I'm not even denying the CCP has ulterior motives in doing this, but your original claim was simply incorrect and disingenuous.


Heilongjiang is part of the Northeast, aka former Manchuria, and was settled in the mid to late 1800s from the North Chinese plain. The entire North Chinese plain speaks variants of Mandarin for the same reason North American English is far less diverse than British and Irish English: there was a relatively small, recent founder population.

Wu (Shanghainese and the other related dialects of the Yangtze river delta), Yue (Cantonese), Hakka, Xiang and Min are absolutely languages. They're at least as divergent as the Romance languages or the different "dialects" of Arabic. Having a single written standard does not make the spoken varieties one language. And even if it did Cantonese has a written standard even if it's not used much, so there are at least two Chinese languages.

https://en.wikipedia.org/wiki/List_of_varieties_of_Chinese https://en.wikipedia.org/wiki/Written_Cantonese


I speak a branch of Wu myself, and it's ABSOLUTELY not a different language from other dialects of Chinese. There are small parts (also commonly used) of the dialects that are dramatically different from Mandarin, but most parts are still the same. Especially if you need to speak about things in a more formal context, or describe concepts that are more abstract, the dialects are no different from each other if written down.


The pronunciation of words in Wu Chinese and Mandarin is systematically different, and many common words are entirely unrelated. Typical Mandarin speakers can’t understand Wu Chinese at all.

The relationship is similar to that between English and German or French and Italian.


Cantonese, Hokkien, and Mandarin are not at all mutually intelligible. Mandarin speakers can't even read Hong Kong newspapers fluently.

Only Mandarin and Cantonese even have a fully developed way of writing with characters. Up until relatively recently Mandarin itself was considered a spoken language, until a written standard (i.e. correspondence of characters with the words people actually spoke) was developed. Hokkien is in the process of this now in Taiwan; they literally have a government department choosing characters for words (they started off with 900 or something, not sure where they are up to now).


> Only Mandarin and Cantonese even have a fully developed way of writing with characters. Up until relatively recently Mandarin itself was considered a spoken language

Native speaker here. I have absolutely no idea where you get that.

Cantonese is just one of many dialects, and in fact, it is not a single dialect: People from different parts of Guangdong province actually speak Cantonese very differently. Should you consider those different languages?

Cantonese, Hokkien and Mandarin do sound like different languages, but not all dialects do. Most Chinese speakers can understand dialects spoken in the central and northern parts of China, even though they usually can't speak those dialects.

Even though some of the dialects sound very different, the words, syntax, sentences being used are actually the same. That's how people can read what other people speaking other dialects write, with no problem.

To complicate the issue even more, there is not one, but two writing systems currently being used: Simplified Chinese is used in mainland China and Singapore, while Traditional Chinese is used in Hong Kong, Macao and Taiwan. That's the reason people from mainland China (no matter what dialect they speak, even Cantonese) cannot read Hong Kong newspapers fluently.

The two writing systems are different, but they have a one-to-one mapping for each character. So they are also not two unrelated systems.


> Native speaker here. I have absolutely no idea where you get that.

Mandarin wasn't really written until about 120 years ago. Before that, people would write classical Chinese. It's a "written vernacular/白話文".

> Cantonese is just one of many dialects, and in fact, it is not a single dialect: People from different parts of Guangdong province actually speak Cantonese very differently. Should you consider those different languages?

No, I'd consider them different dialects of a language called Cantonese.

> Even though some of the dialects sound very different, the words, syntax, sentences being used are actually the same.

Nonsense. Hokkien has a different grammar, even the personal pronouns don't match up 1:1. They're clearly in the same language family, sure, but so are English and German.

> That's the reason people from mainland China (no matter what dialect they speak, even Cantonese) cannot read Hong Kong newspapers fluently

No, it's not. I've seen Taiwanese people try and read Hong Kong newspapers, and they can't do it fluently.


> Mandarin wasn't really written until about 120 years ago. Before people would write classical Chinese. It's a "written vernacular/白話文".

This is true, but it's a separate issue from having various dialects. Language evolves as time goes by, and in the case of Chinese before "written vernacular", people would write in the same form of ancient Chinese (regardless of pronunciation) which diverged a lot from what people actually speak. With the promotion of "written vernacular", people started to write what they speak, irrelevant of dialects.

Even though each of the dialects has some special vocabulary, and even different grammar, that doesn't mean they're not still "MOSTLY" the same.

I speak a southern Chinese dialect myself (I don't know what to call it in English) which is also dramatically different from Mandarin. Yes it has a few words that we commonly used, that I thought to be unique, but it turns out all of the characters and words exist in standard Chinese dictionary. Those words have actually existed for a long time, and they are just no longer commonly used by other people. That said, learning to speak Mandarin is nowhere close to learning a new language. The vast majority of the language is still the same. Hokkien might be slightly more different but is still nowhere close to being a separate language.

On the other hand, Japanese is undoubtedly a separate language, even though it also uses Chinese characters (plus about 100 characters of its own), and many of the Chinese characters and words in Japanese actually mean the same as they do in Chinese. No one would claim Japanese is a dialect of Chinese, because the difference is both significant and clear.

I think you also agree that both Taiwanese and people from mainland China CAN read Hong Kong newspapers, just not as fluently. Part of the reason is what I said, different writing systems. The other reason is that people do use some different words, especially for new concepts. Since Taiwan, Hong Kong and mainland China have been fairly separated for a long time, divergence is inevitable. For example, new words like "program" are translated in different ways: "程序" in mainland China, and "程式" in Taiwan. Again, this has nothing to do with dialects. Mandarin speakers from mainland China would not be able to read Taiwan newspapers as fluently even though they are also in Mandarin. Cantonese speakers from Guangdong province couldn't read Hong Kong newspapers as fluently even though they are also in Cantonese.

On the other hand, I grew up watching many Hong Kong movies that were spoken in Cantonese but with Chinese captions. The captions needed no translation at all - they were just what the actors were saying. After watching many of those, I could even understand some Cantonese, though I still couldn't speak it. I admit there was occasionally a word that seemed unfamiliar to me, but it didn't matter much. I don't believe you can do that with a different language.


> This is true, but it's a separate issue from having various dialects. Language evolves as time goes by, and in the case of Chinese before "written vernacular", people would write in the same form of ancient Chinese (regardless of pronunciation) which diverged a lot from what people actually speak. With the promotion of "written vernacular", people started to write what they speak, irrelevant of dialects.

No, people started to write down what they spoke in Mandarin. It wasn't irrelevant to the language they spoke, it was tailor-made for Mandarin. Hong Kong went through this exact same process, and came out with a different written language. Which is why you have a Cantonese Wikipedia and a Mandarin (w/ traditional Chinese characters) Wikipedia. And which is why Taiwanese people who can read Mandarin in traditional characters fluently can't read Cantonese in traditional characters fluently.

French and Spanish are 'mostly the same'. Same alphabet and everything. And they're not dialects. With languages like Hokkien it's more like the difference between English and German.

> I speak a southern Chinese dialect myself (I don't know what to call it in English) which is also dramatically different from Mandarin. Yes it has a few words that we commonly used, that I thought to be unique, but it turns out all of the characters and words exist in standard Chinese dictionary.

That's because you never learned to write your language, because people don't consider it worth writing. You would have hammered Mandarin characters into the right shape, because - presumably - that was the only thing your family was literate in. Of course it turned out the characters matched fairly well.

Again, that's like me finding out that "bonjour" in French actually literally translates to "Good day" in English - doesn't mean they are the same language.

> I think you also agree that both Taiwanese and people from mainland China CAN read Hong Kong newspapers, just not as fluently.

Sure. I'm an English speaker that doesn't speak French or Dutch, but I can follow a French or Dutch newspaper well enough, if not all the details. Are French and Dutch just dialects of English? Goede Dag is just Good Day after all...


I don't know what to say. Believe in whatever you want to believe. I'm glad I can speak more languages than I thought.


It's funny to see how westerners talk about dialects in China. Western propaganda worked pretty well I guess :)

And yeah, most Chinese are bi-lingual by default. Now I know I can speak tons of northern Chinese languages. That's a big upgrade!

We all learn classical Chinese at school, and that must be a completely different language. Wait, is "classical Chinese" a single language? Should we count each of those spoken in different dynasties and different regions as a different language? Oh man, I can't even count how many languages I can speak.


I think it's even funnier to see how many Chinese people - even those Chinese people who are fluent in English - take the folksy word 方言 and try and shoehorn that into the precise linguistic term "dialect". For whatever reason Chinese the world over seem to believe in literal 1:1 translations of Chinese -> English terms, when they really don't mean the same thing. It makes it very hard to talk to them.

> We all learn classical Chinese at school, and that must be a completely different language.

Of course it's a different language. It's not completely different. Languages can be related to each other. Old English is a different language than English. If English was written in logograms instead of a phonetic alphabet, many of us could probably read Beowulf the same way many of you can read the works of 老子.

Again, if I had the Chinese mindset, there would be a language called "Indo European" which would have various dialects like "Persian" and "Russian" and "Icelandic". There would also be "Standard Indo-European" - aka, English. It makes no sense.


Sure, I love the fact that I can speak many languages. Why not.

If you don't like the translation from 方言 to dialect, should I just call it fangyan or 方言? Or is there a more precise linguistic term for that?

Our definition for "Language" is also different then. Should we call it "Yuyan" instead?


Terms in different languages don't map 1:1. You're trying to fit Chinese square pegs into English round holes. I think this is due to Chinese education methods, since you all do it - even very fluent writers such as yourself.

I'd translate "方言" to "dialect" when it's mutually understandable, and "language" when it's not. So Singapore Hokkien and Taiwan Hokkien are both dialects of the Hokkien language. But I'd translate 方言 to "language" when talking about Mandarin and Hokkien. They're languages not dialects in English because you cannot have a conversation - but they are part of the same language family.


A language is a dialect with army and navy.


If you can fluently understand people across northern China without explicitly learning to, then that means they speak “dialects” of your language.

But hundreds of millions of other people in China natively speak hundreds of other mutually unintelligible Chinese languages. Most of them are (at least) bilingual.

Here is a grossly simplified map showing major language groups: https://en.wikipedia.org/wiki/File:Map_of_sinitic_languages_...


So, even though most people cannot understand some southern dialects when they hear them, they can fluently understand other dialects without explicitly learning to do so when they're written down. Even when you explicitly write down exactly what you say in the dialects (like the Cantonese version of Wikipedia pages), people can still understand it without much difficulty. And yes, there are some words used differently from standard Chinese, but the difference is not much bigger than the difference between UK and US English.

For that reason, we define them as dialects not languages. We can't "speak" other dialects but we can certainly "read" and understand them.

If you find that definition unsatisfying, sure you can call them different languages. We just call them dialects and that's not going to change.


> If you find that definition unsatisfying, sure you can call them different languages. We just call them dialects and that's not going to change.

Sure, if you want to speak Chinglish, keep calling them dialects. If you want to speak English, listen to what people here are telling you.


Are you a native speaker? Because 廣東話 and 閩南話 are totally unintelligible to me, and I want your language superpower.


It's just typical Chinese nationalist mythology. 'We all speak the same language' is just another fairy tale they drill into their heads, along with '5000 years of culture'.

When you've talked to one of them you've talked to all of them.


A few dialects are unintelligible indeed, and they're mostly spoken in southern parts of China. But most people can understand dialects spoken in central and east (north of Changjiang river) fairly easily even though they cannot speak those dialects.


And Portuguese people can often understand Spanish. Are they dialects?


I don't see any part of your comment being correct.

First of all, "Mandarin" is spoken not only in China mainland, but also the standard in Singapore and Taiwan. It was a creation by the Republic of China (which later became Taiwan government) back in 1923, long before the current Chinese government came into power.

Children use Mandarin in school because they have to learn it to be able to communicate with people coming from other parts of China, which would have become a huge disadvantage to themselves. (I can't imagine how I would communicate with other people in college otherwise) It doesn't mean people will forget how to speak their own dialect. In fact, people from the same region almost always speak their own dialects.

Regional / minority cultures are generally protected by the government. The minority are almost always over-represented in all kinds of national events. Being a minority in China means you can get tons of advantage (lower score required to enter good colleges, financial aid, etc.)


The language we call Mandarin has been the native language of some parts of northern China for thousands of years, and was certainly not “created” any time recently. Some people speaking dialects of that language migrated to other parts of China. But there are various other languages natively spoken elsewhere in the country.

Singapore is a cosmopolitan port city, there are several Chinese languages spoken there, and Mandarin was not the dominant one until recently. There are also many other languages spoken in Singapore, and from what I understand English is the primary language used for official business. Taiwan was not natively Mandarin speaking but speaks it now because it was taken over (from the Japanese) by the fleeing Mandarin-speaking KMT after they were beaten militarily by the Communists during the Chinese Civil War. Both Singapore and Taiwan were ruled for decades by authoritarian governments. I’m not sure about Singapore but in Taiwan other Chinese languages were forcefully suppressed.

Plenty of other parts of the world manage to communicate across regional/national borders without restricting people’s ability to produce/distribute local media in their native languages.

There are many countries where students learn several languages in school (including their native regional language and a national language) from an early age.

(Disclaimer again: I’m not an expert in the history, politics, or comparative linguistics of China. I recommend Wikipedia as a better first summary, if you are curious to learn about these subjects.)


If you can read Chinese, the Chinese version of the Wikipedia page on Mandarin Chinese has a lot more detail on its origins: https://zh.wikipedia.org/wiki/普通话

If you cannot, I found an English article for you: http://www.alittledynasty.com/history-of-mandarin-chinese.ht...

To summarize, Mandarin is not created out of nothing for sure, but the concept of "Mandarin Chinese" (or rather, Standard Chinese) started with an effort of newly established Republic of China in 1913, to develop a standard phonetic system and to use as the national language in China. They later published the standard around 1920s, which is essentially a modified version of phonetic system used in Beijing. The dialect now spoken in Beijing is very close to Mandarin, but not exactly the same.

I grew up in China and lived in Singapore for a long time. I can tell you for sure, that the different dialects spoken by Chinese should not be confused with completely different languages. First of all they share the same writing system, the words and syntax we use in various dialects are mostly the same. (Some dialects use a few words differently from others, but that's not surprising at all considering UK english and US english are not exactly the same)

I speak a southern dialect myself which sounds very different from Mandarin. But there is a somewhat systematic mapping from the dialect to Mandarin, so it was really not much an effort to learn Mandarin.

I can imagine there must have been some effort to promote the standard in the very beginning, maybe even "forcefully suppressing" other dialects was needed at some point, but considering the huge benefit, it is undoubtedly the best invention in the history of the Chinese language.


> The language we call Mandarin has been the native language of some parts of northern China for thousands of years, and was certainly not “created” any time recently.

In the same way that Hindi has been the native language of northern India for thousands of years (which is to say that while the Mandarin of today has connections to earlier forms of Chinese, it is hardly a monolithic, unchanging remnant of thousands of years ago).


> I’m not sure about Singapore but in Taiwan other Chinese languages were forcefully suppressed.

Singapore was much more successful in suppressing Hokkien than Taiwan was.

https://en.wikipedia.org/wiki/Speak_Mandarin_Campaign


Yes, there was a campaign to promote speaking Mandarin, but suppression of dialects was never a major part of the campaign and Hokkien is still spoken by many Singaporeans.


> but suppression of dialects was never a major part of the campaign

From the article:

The initial goal of the campaign was for all young Chinese to stop speaking dialects in five years, and to establish Mandarin as the language of choice in public places within 10 years.

> Hokkien is still spoken by many Singaporean.

Again, the article shows that in 1980 81.4% of Chinese Singaporeans spoke a Chinese language at home that wasn't Mandarin. In 2015 it's 16.1%.


Please search for "suppression of dialects was never a MAJOR part" in the original article for more details. I copied the exact sentence from there.

Yes, there are far fewer Singaporeans speaking their dialects at home. I've actually met several Singaporeans who said they want to speak their dialects as much as possible as they feel they're endangered. However, what are the alternatives? Singaporean Chinese don't all speak the same dialect, and it's a small city. How do they communicate with each other in school, at work, and at home after marrying someone speaking a different dialect? They can speak English, in fact they do that as well, or they can speak Mandarin if they're communicating with another Chinese speaker. Either way, the chance they'll be able to speak their own dialects will diminish eventually.

In addition, most dialects are nowhere close to being endangered, as they're widely used in China. We speak in Mandarin with people who don't speak the same dialect, but at the same time, it'd also be weird if we were from the same place, spoke the same dialect and yet decided to speak in Mandarin.


Well one advantage is that Singaporean employers can put on superfluous "must speak Mandarin" sentences in their job descriptions, which lets them hire Chinese people and avoid hiring Malay and Indians, all the while landing on the right side of anti-discrimination laws.

Hard to do that when lots of people speak Hokkien, Cantonese etc.


English is the official language used in formal context in Singapore, not Chinese, Indian or Malay. All the documents you’d sign with a company are in English.


However, it's worth noting that you call the official language of China "Mandarin" for political reasons. The analogy would be if you called French "Bureaucratese" and said "Yes, but Breton and Occitan are not mutually intelligible with Bureaucratese".

The statements "Bureaucratese is not mutually intelligible with Occitan" and "Mandarin is not mutually intelligible with Cantonese" are both true, but we could just say "French is not mutually intelligible with Occitan" and "Chinese is not mutually intelligible with Cantonese".


I could call it Beijingese (or Pekingese) if you prefer. But many people might not know what I was talking about. Mandarin is the common name used in English to refer to this language.

I don’t have any problem if you want to talk about Parisian French (or pick your preferred other name for it), Castilian Spanish, etc.

Parisian French was pushed onto the people within the borders of the French nation-state by force, by a brutal authoritarian monarchy. Quoting Wikipedia,

‘The goals of the Public School System were made especially clear to the French speaking teachers sent to teach students in regions such as Occitania and Brittany; “And remember, Gents: you were given your position in order to kill the Breton language” were instructions given from a French official to teachers in the French department of Finistère (western Brittany).’

The French state continues to repress minority languages inside its borders. See https://en.wikipedia.org/wiki/Language_policy_in_France


The word Mandarin comes from the Sanskrit word "Mantri", for minister. I wonder why this word was chosen.



In fact, it's just called "Chinese" by Chinese speakers. I never heard of the word "Mandarin" before I came to US.


That's how languages form. You don't get a language like French just by hoping for it to emerge or by letting dialects run their own lives and continue to diverge; you get it by taking a bunch of Romance dialects including quite separate ones (the Langue d'Oc and the Langue d'Oil groups) and pushing them together through a common system of mass media (printed for the time, but still), education and cultural acceptance of a "proper dialect" at the top end of the society. That's how you get a strong language that helps you to unify a country and reduce internal barriers of communication; and that's what the Chinese are doing.


By "that's how languages form", I assume you mean something like "standardized languages", which then becomes somewhat circular, because it's certainly not how languages form generally (and even standardized languages can form without mass media, and certainly did, even pre-writing).

That's also not how standard French formed. Standardized French is Parisian French, so just the 'dialect' of the politically-important centre. Not too dissimilar to Latin (which was originally narrowly the dialect of Rome) in that.

Whatever the propaganda, China still contains a number of non-mutually intelligible (though related) languages, but of course Mandarin enjoys much prominence and has an enormous number of speakers.


I never heard that the language was standardized taking langue d’oc elements or any other French languages or dialects. You should give some sources to affirm that. The standard French is based on the French spoken in Paris which was then extended (or imposed in some case) in the whole country via the administration and the public school (forbidding the use of dialects).


While this might be true for spoken language, luckily the Chinese writing system does not represent how the words are pronounced. All of these "dialects" share the same writing system. So it's much easier for someone speaking one of these dialects to learn and use Mandarin.


How does phonological complexity affect these things? I'm not a linguist, but Hindi at least seems way complex, with a much larger phoneme inventory than other languages I've studied.


I don't think Hindi's phonology is more complex than English, but it is all explicit in the character set rather than implicit in the etymology. (The voicing difference between "this" and "thistle" is my favorite example of this, even beyond the fact that we're representing one of the most common consonant sounds in the language with two characters.)


In the World Atlas of Language Structures http://wals.info/chapter/1 , Hindi is considered to have a large consonant inventory. The total of Hindi consonants plus vowels is not hugely different from English. But even though it's relatively large, I certainly wouldn't describe it as complex. It's mostly just the same properties repeated in different locations.

Language areas aren't made more complex by additional phonological complexity. The question you've asked seems to be "Does phonological complexity cause more language diversity". When put this way, there doesn't seem to be any causal mechanism that could do it. For instance, one might say: Well, English has a lot of vowels. People in California might simplify them one way, whereas people in Texas might simplify them in another way so that they can't understand each other well: therefore, you get additional linguistic diversity. But this requires the Texans and the Californians to be isolated from each other which isn't what people mean when they say "India is a very linguistically diverse country".

If we try it the other way, "Does language diversity cause more phonological complexity", languages tend towards each other in case of diversity (because a person who speaks both Chinese and English will sometimes adopt features of one language into the other). This can sometimes lead to the propagation of more sounds (for instance, languages pretty much only use clicks if they're in contact with other languages that use clicks). And sometimes it can lead to the elimination of them. I'm not aware of any particular research about this question, but my guess is on average it would tend towards the average, and if you took English and put it in India it wouldn't be too long before English in India sounded a lot more Indian.


> presumably in a well-intentioned attempt to package this in a way the Western mind can comprehend

Did you intend for that to come across as condescending?


You mean to the Western mind? I've got one, so yeah I feel somewhat at liberty to make mildly self-deprecating jokes about it. Sort of the way that Woody Allen is allowed to make good-natured fun of pedophiles.

Egads, that's terrible company to keep. Moving on now...


There's nothing condescending about stating that there are some world views that may be beyond our immediate rapid comprehension. I have a 'western mind' and I was not offended.


I don't know if GP did, but I did not read it as such.


> but I did not read it as such.

The same here (granted, I'm not a "true" Westerner, as I live in Eastern Europe, but that's also West of India). Adding to the conversation, I have the same issue of trying to comprehend the linguistic craziness that the Caucasus area represents. I mean, as a language buff I'm always amazed whenever I look at this language-map of the area (https://upload.wikimedia.org/wikipedia/commons/b/b0/Caucasus...) and at the language families included in there.


This usually comes up in the skilled immigration backlog discussion when someone chimes in "But we want the skilled immigrants to be diverse. That's why we have a per-country cap!"


> Nearly 50 languages are indigenous to that one region alone. Not dialects.

Just nitpicking, but the distinction is entirely political. In China, people speak Chinese dialects, that are much more diverse than the difference between the Danish language and the Swedish language.

Since China wants to look united, these are "dialects", while Denmark and Sweden want to be separate sovereign nations, so they want to have distinct languages, and Danish and Swedish are not called Scandinavian dialects.


We have a saying in Hindi which translates into this.

> " kos kos par badle paani,chaar kos par baani, par ek hai jo nahi badalta vo hai Hindustani"

Rough Translation:

> Taste of water changes every 2 miles, language changes every 4 miles, but Indian still don't change.


* but what doesn't change is we are Indians

Literal translation doesn't work here (also one of the shortcomings of tools like Google Translate)


So... the comment wasn't stereotypical enough?


There is an actual Hindi aphorism which is similar, basically: "the language and the water change every five miles"


>in a way the Western mind can comprehend. Which frankly is futile.

Xenophobic much?


This made me think of Metafont[1] which D. Knuth developed in conjunction with TeX. The idea was to do font design by drawing with a virtual brush -- but with perfectly smooth curves and a built-in constraint solver. The end result is that the entire Computer Modern family is parameterized (and generated on demand) as TeX is using it. As an example, the serif, sans-serif and typewriter fonts are the same Metafont program, but just with different parameters.
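To make the "same program, different parameters" idea concrete, here is a toy sketch in Python. It is not Metafont and does not reproduce how Computer Modern is defined: Metafont describes pen strokes and solves equations for the curves, whereas this hypothetical `letter_i` function only emits rectangles. The parameter names and values (`stem`, `serif`, `slab`) are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Params:
    """Hypothetical design parameters shared by every glyph in a family."""
    stem: float           # thickness of the vertical stroke
    serif: float          # serif overhang on each side; 0 means sans-serif
    height: float = 10.0  # cap height
    slab: float = 1.0     # thickness of each serif


def letter_i(p: Params) -> list[tuple[float, float, float, float]]:
    """Return the rectangles (x0, y0, x1, y1) that make up a capital I."""
    x0 = p.serif  # leave room on the left for the serif overhang
    rects = [(x0, 0.0, x0 + p.stem, p.height)]  # the stem itself
    if p.serif > 0:
        width = p.stem + 2 * p.serif
        rects.append((0.0, 0.0, width, p.slab))                  # bottom serif
        rects.append((0.0, p.height - p.slab, width, p.height))  # top serif
    return rects


# Two "fonts" produced by one glyph program with different parameters.
SERIF = Params(stem=1.0, serif=1.5)
SANS = Params(stem=1.2, serif=0.0)
```

Calling `letter_i(SERIF)` yields a stem plus two serifs, while `letter_i(SANS)` yields only the stem; nothing in the glyph code itself changes between the two.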

Looking at the repo for these Indian fonts you can see that they use FontLab Studio[2] for the work. Browsing the homepage reveals how complicated and involved font crafting is -- let alone the design!

[1] Metafont: https://en.wikipedia.org/wiki/Metafont

[2] FontLab Studio: https://www.fontlab.com/font-editor/fontlab-studio/


Some Indic typefaces were created using METAFONT in fact, in the 80s and 90s. They used the abilities of METAFONT to provide multiple specific fonts. Some of them have features (variants for conjunct style, e.g. the “Bombay” versus “Calcutta” styles for Devanagari, etc.) that are still not present in any modern (TrueType/OpenType) font available today.


I always wondered why many non-Latin (mostly cursive) scripts have little variation across different typefaces. Maybe I wasn't looking hard enough? Well, the article mentions a similar observation by a Sri Lankan typographer, so I guess I am not alone. Can someone maybe point me to other non-Latin typefaces that have "their own typographic style"? I found the Baloo samples (last one in the article) refreshing. The style of the Tamil and Devanagari samples is very close to the Latin sample. For the Mina samples (first figure in the article), I can see that they try to capture the character of the Exo Latin typeface, with certain strokes getting narrower towards the end and its superelliptic curves (are there typographic terms for these?). I am not used to reading Bengali, but the style of the sample looks like it is in a font that has a different weight.


Arabic has significantly more variation than Latin scripts and always has. Given that painting of living beings is typically not allowed in the muslim religion, calligraphy came to be the artistic expression of choice.


> Can someone maybe point me to other non-Latin typefaces that have "their own typographic style"?

CJK scripts have traditionally used Ming, Song, and Gothic (sans-serif) typefaces.


I guess what I mean is styles besides gothic (undecorated strokes of even thickness) and those that mimic traditional calligraphy. I see a lot of other forms for Latin scripts, but for Arabic, CJK and Indian scripts most typefaces fall into the above two categories. I might be wrong though, maybe I am just not exposed to a lot of variety. I do find it most notable in mixed script printed text, e.g., Arabic, where the predominant form seems to be Naskh, which looks like calligraphy, and Latin, which typically doesn't look like calligraphy at all. This mix creates an image that looks very uneven to me, similar to when people use too many different typefaces in Latin text. Actually, I am not even sure whether the typographic style is dictated by Naskh, or whether its just the form of writing.


Tamil and Devanagari scripts are close to latin? in what way? Please, can you elaborate?


I didn't phrase it well. I was specifically talking about the Baloo sample and how the typographic style is the same across scripts. Sorry, I am a layman, so I don't know the proper terms, but I mean the similar stroke weight and curvature, stroke endings being pointy on one side and round on the other, etc.


I admit that despite the ubiquity of Indian fonts these days, I still use English to write messages conveying Bengali/Hindi words, because I'm more comfortable that way. I don't remember the last time I wrote a letter in Bengali/Bangla or Hindi, and when it comes to electronic communication, it has always been English for me. So it feels strange for me to start writing in Bengali using my keypad all of a sudden.


और क्या हाल है भाई? If you're using an Apple device, you could try using a transliteration keyboard. Type in English, the word is printed in Hindi (I dunno if they have a transliteration keyboard for Bengali)


Is there a difference in the minimum font size that you find comfortable reading between latin and non-latin alphabets?


Oh yes absolutely. Right now I see font size 9 and I'm fine with it for the latin alphabets. For the non-latin ones, I'd prefer 11. Maybe that's just a problem with the hindi font I have right now on my Firefox (Mukta Devanagiri), in fact I downloaded that font after reading the post in the link.


I feel like there's certainly room to grow as far as input devices for Indian scripts. I learnt to type Devanagari on the InScript keyboard, though I have mixed feelings about it. On the one hand it is laid out logically with voiced and unvoiced consonants each in their own rows, vowels are neatly placed on one side, the semivowels have their corner, etc, etc. It just gets annoying when switching back and forth between InScript and Latin QWERTY though (at least क-->k?). I think an input technique based around a mixture of a keypad and gestures would be interesting to try on a touchscreen. hmm...


I absolutely hate using latin script to write Hindi or to read it. I always use either the transliteration keyboard or google's absolutely amazing text-to-speech.

I also believe that google's text to speech is more 'easy' for Hindi than for English. Ambiguity is rarer in Hindi text-to-speech than in English text-to-speech. For instance, in English it would write the pun equivalent text "Crypto current seas", and later correct it to "cryptocurrencies" once it has more context, but this is rarely needed in Hindi.


Nice to see lot of Tamil love in there.


Tamil font had a lot of work done on it in the late 90s. Especially amongst the Sri Lankan Tamil diaspora who fled the nation but continued to want to read the news. Murasu was the font name I believe.


Tamil has the largest users in internet[1]. It does enjoy the highest adoption.

1. https://assets.kpmg.com/content/dam/kpmg/in/pdf/2017/04/Indi...


To clarify - That's % of population that has adopted, not the largest number of users.


also Google CEO himself being ...

well, I will not go there! But the absence of Telugu font in talking about Indian fonts is eerily disconcerting.


thanks for the link.


This is the first time I've ever seen this construction (using "that's" instead of "whose" as a relative possessive pronoun): "...the ongoing use of English, a language that's reach and influence has grown considerably..."


That's not very typical. (Pun intended, and John Clarke reference intended too.)

It was just a mistake. Given the topic, it's possible that that was written by someone who's not a native speaker - but whose writing is otherwise very good. (And arguably better than that showoff sentence!)

As you noted, a native American English speaker would write "...a language whose reach..." And that's that.

RIP John Clarke:

https://www.youtube.com/watch?v=3m5qxZm_JqM


Contractions in vernacular American English are such a shitshow and getting worse all the time, that I'm not sure any guardians of proper usage can stand against the tide.


Really impressive! Following Ek Type and ITF since so long and happy to see great things are coming out on Indian scale.


Based on what friends from South India tell me, it looks like South India in particular is embracing English. Even poor people are sending their children to schools where they can learn English. Most of the people I know, when they text on a phone use the Latin alphabet. India has fully embraced the global system which is based on English. I would guess that within 20 years, more people will speak English on a regular basis than Hindi.


Part of that is South India tends to be assertively anti-Hindi for various historical/cultural reasons. Given that, English is the only realistic option for communicating outside a given linguistic community.


It's not exactly true. I am from the South and I love Hindi as much as I love my mother tongue (Kannada). However, the reason there are revolts to the degree it becomes National news is primarily because of the thrust of Hindi on locals. What do I mean by that? Imagine a villager or someone not versed in English wanting to fill a form in a Governmental organisation or a Bank. For a long time, most forms were available only in English or Hindi and not in the local language. This naturally pissed that section of the populace as they had no grasp of either of the languages. This felt "imposing" on them.

I call it Governmental failure of either educating the masses with regards to learning Hindi or them being unable to cater to the local needs (not printing enough forms in the local language). This was in turn exploited by the regional parties who converted this issue to a language-war of sorts, trying to portray that the Central government is imposing Hindi and diminishing the local language. These "movements" were used mainly for political gain. The reason you don't see this happen in the North is because for most, Hindi is a natural fallback language as the script is similar (Devanagari) to their Mother tongue, unlike in the South.


>>Part of that is South India tends to be assertively anti-Hindi for various historical/cultural reasons.

We are pro-OurLanguage, whatever that language. If that feels anti-Hindi to you, one can't help.

You are almost suggesting that to be pro-YourLanguage one has to throw out their own language.


There are also a few anti-OtherLanguage protests.

For example, this is hardly "pro-OurLanguage" - http://www.thehindu.com/news/cities/bangalore/karnataka-raks...


That is pro-OurLanguage protest.

Also, as long as there is English, it's totally useless to learn Hindi.


How is that a pro-OurLanguage protest? The board already had Kannada and English. In addition, Hindi was being added. That is clearly an anti-Hindi protest. Not a pro-Kannada protest.


I will take your comment seriously when Kannada shows up on Delhi metro signage and nobody protests. Until then it's shoving things down people's throats and holding them responsible for not swallowing.


Nah, the issue of people learning English in school is totally orthogonal to the use of the Roman script in texting. People used the Roman script for texting when that was the only script available on the mobile phones. Android has really great support for most Indian scripts now (with the notable exception of Urdu) and I'm seeing a resurgence of texting in native scripts.

Learning English is viewed as a "job" thing -- it's what you do to have a good career. And English isn't used in the same ways as native languages are. A lot of the serious literature, art and politics happens exclusively in languages that are not English and using English in these contexts outs you as the equivalent of the "condescending coastal elite" in the US.


South India in particular has a higher literacy rate than the rest of the country, so it might take a while for this to spread.


Isn't English a required subject at Indian schools? It is an official language of India.


Language is the life of art & literature, much of the Indian subcontinent's culture has huge emphasis on that. So, this actually is a huge deal in preserving it for posterity.


Alright, this is cool. I was annoyed at the last Google India campaign article about how everyone was going to use voice-to-text.


Speaking of Asian fonts, does anyone know of CJK character sets that can be loaded via CDN à la Google Fonts -- for example, Google's Noto CJK Simplified Chinese fonts?


The unfortunate problem is that any one of these fonts increases the amount of data your page has to load by 8 megabytes.

I don't think there's a good solution to CJK web fonts yet. If one existed, I think it would need to involve a font rendering technology where character shapes can be generated from strokes and radicals, instead of the literal shape of every character having to appear separately in the font.


Why can't font files be compressed? That would just be using a dictionary, which I believe gzip already does.


They are compressed.

Do you perchance mean "dynamically loaded, on demand"? Because there's no standard for it. Because – as usual – "good enough for Western audiences".


That sounds like an attack on westerners. Obviously westerners develop ideas for westerners first; that’s what they understand and that’s what they need. The lack of solutions for the billions of users of a writing system should not be the fault of the people who don’t use it. Sure they can do better, but they don’t deserve that tone.


I fail to see how the tone in your comment is at all appropriate for the comment you are responding to. You seem overly sensitive on this issue.


I cannot argue with „first“, but considering that this „first“ was, what, five decades ago, thinking that a „second“ should have emerged by now may not be totally unreasonable.


Seems like there would be demand for a font format that allows character modifiers to dynamically be applied to characters. This would allow the language to be represented digitally similarly to how it's taught in the real world, rather than as thousands of unique Unicode characters.


That is at least conceptually part of Unicode. You can encode „ü“ both as the character (one code point) and as „u“ followed by „modifier Umlaut“ (two code points).
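
A minimal sketch of those two encodings, using Python's standard `unicodedata` module. The variable names are only for illustration; the two strings render the same but compare unequal until they are normalized to the same form.

```python
import unicodedata

precomposed = "\u00fc"   # "ü" as a single code point
decomposed = "u\u0308"   # "u" followed by COMBINING DIAERESIS (two code points)

# The raw strings differ in length and are not equal code point by code point.
lengths = (len(precomposed), len(decomposed))
raw_equal = precomposed == decomposed

# NFC composes to the single code point; NFD decomposes to base + combining mark.
nfc_equal = unicodedata.normalize("NFC", decomposed) == precomposed
nfd_equal = unicodedata.normalize("NFD", precomposed) == decomposed

print(lengths, raw_equal, nfc_equal, nfd_equal)
```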


Glyphs in fonts already make use of composite features (ü can be u + ¨, storing just the coordinates of the references). But the readability of type can be complex, so I don't know if that's a viable way to compose large non-Western fonts so that they read naturally. There may be very subtle differences that force the designer to decompose the parts and make modifications.


Ah! Knowing that, I bet that someone _could_ make a CJK font optimized for file size, which includes radicals by reference. It wouldn't work in every case, because radicals change size and strokes shift to make room for other strokes. In cases where the same radical appears in the same place, it seems like it would help.

But maybe that's not enough of a benefit unless there's also a way to say "okay you need to use this radical, but a bit narrower, but the strokes need to be the same width and not distorted".
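
A toy sketch of the "components by reference" idea discussed above, not any real font format. All names (`Component`, `OUTLINES`, `GLYPHS`, `flatten`) and coordinates are hypothetical. Each glyph stores only references to a shared outline plus an offset and scale, and the full shape is rebuilt on demand.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Point = Tuple[float, float]

@dataclass
class Component:
    ref: str          # name of the shared outline being reused
    dx: float = 0.0   # horizontal offset
    dy: float = 0.0   # vertical offset
    sx: float = 1.0   # horizontal scale
    sy: float = 1.0   # vertical scale

# Shared outlines are stored once (toy coordinates, not real glyph data).
OUTLINES: Dict[str, List[Point]] = {
    "box": [(0, 0), (10, 0), (10, 10), (0, 10)],
}

# Two hypothetical glyphs that reuse the same outline by reference.
GLYPHS: Dict[str, List[Component]] = {
    "wide": [Component("box")],
    "pair": [Component("box", sx=0.5), Component("box", dx=5, sx=0.5)],
}

def flatten(name: str) -> List[List[Point]]:
    """Expand a glyph's component references into concrete contours."""
    return [
        [(x * c.sx + c.dx, y * c.sy + c.dy) for x, y in OUTLINES[c.ref]]
        for c in GLYPHS[name]
    ]

print(flatten("pair"))
```

Note that the plain scale here squashes everything, stroke width included, which is exactly the distortion problem raised above: a narrower radical with unchanged stroke weight needs something smarter than a linear transform.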


The www.glyphsapp.com and www.fontlab.com font editor apps have "smart components" which do this. Sadly, the OpenType format is developed very conservatively by Microsoft, so these aren't part of the only widely supported font format today, despite the recent addition of run-time interpolation technology to the format.


The CSS standard for "dynamically loaded, on demand" is `unicode-range`.
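
Roughly, each `@font-face` rule can declare the code point ranges its file covers, and the browser only fetches a file if the page uses a character in one of those ranges. Below is a minimal Python model of that decision, not the browser's actual algorithm. The file names and the way the ranges are split are made up for illustration.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical subset files and the code point ranges they cover
# (illustrative only, not taken from any real font service).
SUBSETS: Dict[str, List[Tuple[int, int]]] = {
    "latin.woff2": [(0x0000, 0x00FF)],
    "kana.woff2": [(0x3040, 0x30FF)],
    "cjk-part-1.woff2": [(0x4E00, 0x62FF)],
    "cjk-part-2.woff2": [(0x6300, 0x9FFF)],
}

def subsets_needed(text: str) -> Set[str]:
    """Return the subset files whose ranges contain a character of `text`."""
    needed: Set[str] = set()
    for ch in text:
        cp = ord(ch)
        for filename, ranges in SUBSETS.items():
            if any(lo <= cp <= hi for lo, hi in ranges):
                needed.add(filename)
    return needed

print(subsets_needed("hello"))
print(subsets_needed("かな"))
```

A page that only contains Latin text would never trigger a download of the CJK pieces, which is how a large font can be split into many small files.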



Yes, Noto SC is available from fonts.google.com/earlyaccess


Side note: I didn't know Google had their own TLD. Is this new? Seems weird that an organization has the ability to do that.


You could do it, it just took $185,000 per application (https://newgtlds.icann.org/en/applicants/agb) and a super involved process.

Edit: s/can/could (thanks to child commenter).


> You can do it, it just takes $185,000 per application [...] and a super involved process.

And when you say “super involved”, keep in mind that part of the required process is to travel back in time to before April 2012, because “The application window for the first application round closed in April 2012. Comprehensive reviews of the program are currently underway to assess its performance in meeting intended objectives. These reviews will inform ongoing discussions with the ICANN community to determine when a second round will take place.” [0]

[0] https://newgtlds.icann.org/en/about/program


They've had it for over 3 years now; you can read more about it here: https://icannwiki.org/.google

blog.google is used pretty often.



Maybe I've just never really noticed before. Maybe a case of glancing and just imagining the `.com` after.


It was pretty gradual. For a while the only public-facing domain was elgoog.google which I think started as an April Fool's joke.


Not that it matters much, but IIRC the joke was http://com.google



Waiting for google.google to become a thing




Anyone can apply for their own TLD but it's not cheap. £130K+ a year iirc after a long application process


> Anyone can apply for their own TLD but it's not cheap. £130K+ a year iirc after a long application process

Anyone could during the one-time “new gTLD” scramble, but applications closed for that years ago; no decision on any subsequent round has yet been made, because the review of that first round has not yet been completed.

So, other than establishing a new country and getting a new ccTLD, there's no way to get a new TLD right now.


ICANN now allows organisations to purchase branded TLDs. Google, Sky, and Barclays, among others, have done it.


> ICANN now allows organisations to purchase branded TLDs

No, ICANN years ago allowed that for a limited time, and is currently reviewing the results of that to decide if, when, and how to do so again.


CERN has one too: https://home.cern


Why did CERN spend $185,000 plus labor on a TLD?


CERN is one of the rare cases this makes sense, since it's otherwise an international organisation with a domain under a ccTLD of a specific country (Switzerland).


> makes sense

It might make sense by the logic of domain names, but it doesn't by the logic of return on the investment of scarce resources. cern.org or cern.ch or cern.anything would meet all functional needs just as well and for much less.


What about cern.int instead of cern.ch?


People strongly underestimate the impact on the net once the many people who were hitherto disconnected finally come online.

Provided they realize there is a greater web beyond facebook and the various trap gardens.


How is this related to the topic, and why is it not downvoted?


It's in the first sentence of the article.



