Transliterated Names: An Automated System for Generating Multiple Spellings of Name Components



Development of algorithms for automated vowelization, automated syllabification, generation of syllable variants based on phonology, automatic generation of combinations of syllable variants, and elimination of implausible variant elements.
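The later stages of this pipeline (per-syllable variant generation, combination, and elimination of implausible results) can be illustrated with a minimal Python sketch. It is not the invention's actual implementation: the variant table, the plausibility rule, and the names SYLLABLE_VARIANTS, IMPLAUSIBLE_PAIRS, and generate_variants are all hypothetical, and the input is assumed to be already vowelized and syllabified.

```python
from itertools import product

# Hypothetical variant table: each syllable maps to spellings assumed plausible.
# A real system would derive these from phonological rules, not a fixed dict.
SYLLABLE_VARIANTS = {
    "mu": ["mu", "mo"],
    "ham": ["ham", "hum"],
    "mad": ["mad", "med"],
}

# Hypothetical plausibility rule: adjacent spellings assumed never to co-occur.
IMPLAUSIBLE_PAIRS = {("hum", "med")}


def generate_variants(syllables):
    """Combine per-syllable variants and drop combinations flagged as implausible."""
    # Syllables missing from the table are kept unchanged.
    options = [SYLLABLE_VARIANTS.get(s, [s]) for s in syllables]
    variants = []
    for combo in product(*options):  # every combination of syllable spellings
        adjacent = zip(combo, combo[1:])
        if any(pair in IMPLAUSIBLE_PAIRS for pair in adjacent):
            continue  # eliminate implausible variant
        variants.append("".join(combo))
    return variants


RESULT = generate_variants(["mu", "ham", "mad"])
```

With these assumed tables, the three syllables yield eight raw combinations, of which two are eliminated by the plausibility rule. Filtering during combination, rather than afterwards, keeps the output list limited to spellings the rules accept.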


This invention provides automated methods for developing and maintaining knowledge bases of plausible spelling variants of transliterated names. These knowledge bases not only constitute a searchable body of linguistic information, but also support information system users who need to match transliterated names in large data sets. The invention enables users to convert names to standard spellings, regardless of the original spelling.
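As a rough illustration of how such a knowledge base might support conversion to standard spellings and name matching, consider the following Python sketch. The data, the choice of standard forms, and the names KNOWLEDGE_BASE, build_index, standardize, and same_name are assumptions made for this example only; the structure of the invention's knowledge bases is not described here.

```python
# Hypothetical knowledge base: standard spelling -> variant spellings.
KNOWLEDGE_BASE = {
    "Muhammad": ["Mohammed", "Mohamed", "Muhammed"],
    "Hussein": ["Husayn", "Hussain", "Hosein"],
}


def build_index(kb):
    """Build a case-insensitive lookup from every spelling to its standard form."""
    index = {}
    for standard, variants in kb.items():
        for spelling in [standard, *variants]:
            index[spelling.casefold()] = standard
    return index


def standardize(name, index):
    """Return the standard spelling, or None if the name is not in the knowledge base."""
    return index.get(name.casefold())


def same_name(a, b, index):
    """Two spellings match only if both resolve to the same standard form."""
    standard_a = standardize(a, index)
    return standard_a is not None and standard_a == standardize(b, index)


INDEX = build_index(KNOWLEDGE_BASE)
```

Because a match is decided by a lookup in an explicit variant set, the reason two spellings matched (they share a standard form) can be reported directly to the user.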

The Transliterated Names System for generating variant spellings is linguistically sound and comprehensive. Statistical methods and regular expression representations, in contrast, overgenerate, producing lists that contain both valid and invalid spellings. When a set of variants is produced using the methods of this invention, however, the non-linguist can trust that the resulting spellings are correctly associated and need no further interpretation or second-guessing. Matchers based on these sets of variants are superior in precision and recall to other methods, and the match results can be explained to the end user.

The cost savings and the improvements in search time and quality realized by end users of this invention are disproportionately high. The transliteration system can be a standalone application, and can support transliteration of whole texts or searching within native script text for transliterated strings.

Applications based on this invention include tools that enable users to convert names to standard spellings, regardless of the original spelling, and applications that search document collections. A direct application of the automated transliteration system would enable a user who does not know the target language to transform all or part of a native script document into a Romanized form. A name or other string so transformed can then be used to search document collections without further transliteration, and without typing or even recognizing native script. For other applications, see the Technology Profile Fact Sheet for the Aladdin Arabic Name Matcher.

