This directory contains the supporting code for the article "The Taxonomy of Writing Systems: How to Measure how Logographic a System is". The article proposes a novel measure of the degree of logography in a writing system. The measure uses an attention-based sequence-to-sequence model trained to predict the spelling of a token from its pronunciation in context.
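As a rough illustration only, and not the measure as defined in the article, one way to summarize how much such a model relies on context is the share of decoder attention weight that falls on context positions rather than on the target token's own pronunciation. The sketch below assumes an attention matrix with one row per output (spelling) step and one column per input position, plus a boolean mask marking which input positions are context. The function name `context_attention_share` and this input layout are hypothetical and are not taken from the code in this directory.

```python
from typing import Sequence


def context_attention_share(
    attention: Sequence[Sequence[float]], is_context: Sequence[bool]
) -> float:
    """Return the fraction of total attention mass on context positions.

    Hypothetical helper for illustration. `attention` has one row per
    output step and one column per input position; `is_context[j]` is
    True if input position j belongs to the surrounding context rather
    than to the target token's pronunciation.
    """
    total = 0.0
    context = 0.0
    for row in attention:
        if len(row) != len(is_context):
            raise ValueError("each attention row must cover every input position")
        for weight, in_context in zip(row, is_context):
            total += weight
            if in_context:
                context += weight
    # Guard against an all-zero attention matrix.
    return context / total if total > 0 else 0.0


# Hypothetical attention for two output characters over three input
# positions: the target pronunciation, then two context tokens.
attention = [
    [0.7, 0.2, 0.1],
    [0.5, 0.3, 0.2],
]
is_context = [False, True, True]
print(context_attention_share(attention, is_context))
```

Here the context columns receive 0.8 of the 2.0 total attention mass, giving a share of about 0.4. Under this assumed summary, a higher share would indicate that the spelling depends more on context than on the token's pronunciation.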
If you use this code, please cite the article:
@article{sproat:gutkin:cl:2021,
  title = {The Taxonomy of Writing Systems: How to Measure how Logographic a System is},
  author = {Richard Sproat and Alexander Gutkin},
  journal = {Computational Linguistics},
  volume = {47},
  number = {3},
  pages = {477--528},
  year = {2021},
  month = sep,
  doi = {10.1162/coli_a_00409},
  publisher = {MIT Press},
}
We will be expanding this documentation in due course. In the meantime, please
refer to the README files in the individual subdirectories.