Wednesday, July 18, 2018

resources - How to find alternative writings of a kanji in electronic form?


Some kanji have alternative writings that are significantly different from each other (e.g. different numbers of strokes). For example, see here.


I'm looking for ways to find alternative writings for a given kanji in electronic form (that I can copy-paste as text; IOW, bitmapped images won't do).


Does anyone know of an online reference for finding alternative drawings of a character?




NB: I wrote the example below before I was aware of the issues that Shen Kuo explained in his first answer. I now realize that this example is likely to be confusing rather than illuminating to most readers. I've kept it as is, since several comments and at least one answer refer to it. If you do find it confusing, please ignore it.





For an example of the sort of search I'm trying to perform, consider the character


enter image description here


Every place I've looked online shows the second character of the compound そうち (device) like this:


enter image description here


Note that the last stroke of this character is a simple horizontal one across the bottom.


In contrast, I have several printed sources that show a different writing for the second character of this compound, where the last stroke consists of a vertical component along the left side of the character, followed by a horizontal one across the bottom; some instances are shown below:


enter image description here


Note the last two characters in both books' titles. These two characters are supposed to be the compound 装置, but the second character is drawn differently from 置¹. Incidentally, my old Nelson kanji dictionary (2nd revised edition, 1974; 17th printing, 1984) shows the same drawing of this character as that shown in the picture above (see character 3644). AFAICT, it does not mention the form 置 at all.


FWIW, none of the search tools that I've tried that accept handdrawn characters as input have retrieved the alternative form of 置.





¹ These books use this alternative drawing of 置 throughout, not just in their cover titles. If you right-click on the image and open it at full resolution in a different tab, you will be able to find other instances of this alternative drawing of 置 besides those appearing in the main titles.



Answer



I'll leave my old answer up there, and write up what I said in the comments now that you've edited your question.


Character variants are uncountable. Officially, you already have:



  • Japanese 新字体, used in official print, according to the 常用漢字 list and 人名漢字 list

  • Japanese 旧字体 that weren't actually phased out fully, especially in names, according to the 人名漢字 list

  • 簡体字, simplified Chinese as published by the Chinese government

  • 繁体字, traditional Chinese as used in HK and published by their government

  • 繁体字 as used in Taiwan and published by their government (slightly different to that in HK)

  • Hanja (漢字) taught in Korea in high school as their government decrees


Then you get non-standard characters, like 飴, that people aren't taught in school but most can read and use, and that you often see in written language anyway.


Then you get 朝日文字 such as 𦜝 (officially 臍), which aren't standard 新字体, but apply their logic to characters that weren't simplified.


On top of that, there's ゲバ字, which pretty often simply use simplified Chinese in Japanese. In calligraphy too, many characters are written like simplified Chinese (many simplified characters were based off calligraphic variants). Fonts also have variations based upon calligraphy and handwriting style.


Then do we count ryakuji, too, like 㐧 rather than 第, and 门 rather than 門?


The creation of characters is ongoing, and there are always new abbreviations coming about as part of their evolution, so you won't find any single comprehensive work. The Kangxi dictionary contained 47,000 characters, of which about 40% are graphical variants, and new variants have been invented in the roughly 300 years since.


Digitally, there isn't even one system of encoding, and not all characters that you see written are encoded. Unicode is possibly the most commonly used standard, but it's not totally comprehensive. Still, most people use it, and you seem to as well, hence the problems with Han unification.
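For the variants that Unicode does encode, the Unihan database ships a tab-separated file, Unihan_Variants.txt, with fields such as kSimplifiedVariant, kTraditionalVariant, kSemanticVariant and kZVariant. The following is a minimal sketch of reading that format, assuming you have downloaded the file yourself; the function name `parse_variants` and the sample lines are made up for illustration, and the real data is no more complete than Unicode itself.

```python
import re
from collections import defaultdict

# Matches a "U+XXXX" code point reference, as used in Unihan data files.
CODEPOINT = re.compile(r"U\+([0-9A-Fa-f]{4,6})")


def parse_variants(lines):
    """Map each character to {field name: [variant characters]}.

    `lines` is any iterable of strings in the Unihan tab-separated layout:
    code point, field name, value. Hypothetical helper, not an official API.
    """
    table = defaultdict(dict)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split("\t")
        if len(parts) != 3:
            continue  # ignore anything that is not a three-column record
        source, field, value = parts
        match = CODEPOINT.fullmatch(source)
        if match is None:
            continue
        char = chr(int(match.group(1), 16))
        # A value may list several variants, each optionally followed by
        # a "<source" annotation; only the code points are kept here.
        table[char][field] = [chr(int(h, 16)) for h in CODEPOINT.findall(value)]
    return table


# Sample records built from the characters themselves (illustrative only).
sample = [
    "# sample data in the Unihan_Variants.txt layout",
    f"U+{ord('门'):04X}\tkTraditionalVariant\tU+{ord('門'):04X}",
    f"U+{ord('門'):04X}\tkSimplifiedVariant\tU+{ord('门'):04X}",
]

table = parse_variants(sample)
for char, fields in table.items():
    for field, variants in fields.items():
        print(char, field, " ".join(variants))
```

To run it against the real data, pass an open file instead of `sample`, for example `parse_variants(open("Unihan_Variants.txt", encoding="utf-8"))`. The results are copy-pasteable text, but glyph differences that Unicode treats as the same character will not show up in this file.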


You imply in your question that the Chinese 直 is a valid variant of 直 in Japanese. In this case, Wiktionary gets closest, because it lists every language's simplification of a given character, and has articles on most encoded variant characters, such as 𣥂 for 步.
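A quick way to tell whether two forms are separately encoded characters or just different renderings of one character is to look at their code points. The sketch below uses only Python's standard `unicodedata` module; the helper name `describe` is made up for illustration. Forms merged by Han unification, such as the Chinese and Japanese 直, have a single code point and can only be told apart by font or language tagging, whereas pairs like 步 and 𣥂 are distinct characters.

```python
import unicodedata


def describe(text):
    """Return (character, 'U+XXXX', Unicode name) for each character in text."""
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in text
    ]


# Pairs mentioned in this answer: a standard form next to a variant.
for row in describe("步𣥂第㐧門门"):
    print(*row)
```

If two forms print the same code point, no amount of searching will give you a different character to copy-paste; the difference lives in the font.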


But it's not finished, and new characters are made and encoded every day.



You can't beat it, but you can sit along for the ride.

