Technical Description: This technique is a simple, fast, and general method of sorting and retrieving machine-readable text according to language, topic, or both. The method is independent of the particular languages or topics of interest and relies for guidance solely on examples provided by the user (e.g., existing documents or fragments). It employs no dictionaries, keywords, stoplists, stemming, syntax, semantics, or grammar; nevertheless, it is capable of distinguishing among closely related topics (previously considered inseparable) in any language, and it can do so even in text containing a great many errors (typically 10-15% of all characters). The technique can be quickly implemented in software on any computer system, from microprocessor to supercomputer, and can easily be implemented in inexpensive hardware as well. It scales directly to very large data sets (millions of documents). U.S. Patent No. 5,418,951.
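The description above does not spell out the mechanism, so the following is only an illustrative sketch of one approach that fits the stated properties (example-driven, no dictionaries or stoplists, tolerant of character errors): comparing character n-gram frequency profiles by cosine similarity. The use of n-grams and cosine similarity, the function names ngram_profile, cosine, and classify, and the default n=5 are all assumptions made for this example and should not be read as the patented method.

```python
from collections import Counter
from math import sqrt


def ngram_profile(text, n=5):
    """Return relative frequencies of overlapping character n-grams.

    Assumption: lowercasing and whitespace collapsing are the only
    preprocessing; no dictionary, stoplist, or stemming is used.
    """
    cleaned = " ".join(text.lower().split())
    grams = Counter(cleaned[i:i + n] for i in range(len(cleaned) - n + 1))
    total = sum(grams.values())
    return {g: c / total for g, c in grams.items()} if total else {}


def cosine(p, q):
    """Cosine similarity between two sparse profiles (dicts)."""
    dot = sum(w * q.get(g, 0.0) for g, w in p.items())
    norm = sqrt(sum(w * w for w in p.values())) * sqrt(sum(w * w for w in q.values()))
    return dot / norm if norm else 0.0


def classify(text, examples, n=5):
    """Score text against user-supplied examples, keyed by label.

    Returns the best-matching label and the full score table.
    """
    target = ngram_profile(text, n)
    scores = {label: cosine(target, ngram_profile(sample, n))
              for label, sample in examples.items()}
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    # Hypothetical example texts standing in for user-provided documents.
    examples = {
        "english": "the quick brown fox jumps over the lazy dog and the cat sleeps in the sun",
        "spanish": "el rapido zorro marron salta sobre el perro perezoso y el gato duerme al sol",
    }
    # The query contains a deliberate character error ("slxeps").
    label, scores = classify("the dog slxeps in the sun", examples, n=3)
    print(label, scores)
```

Because a single wrong character disturbs only the few n-grams that overlap it, most of a profile survives a moderate error rate, which is why this kind of comparison degrades gradually rather than failing outright. The labels here are whatever the user's examples represent, so the same sketch applies to topics as well as languages.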
Commercial Application:
Released: 1993
Reference Number: Acq.
If you are interested in exploring this technology further, please call 443-445-7159 or express your interest in writing to the National Security Agency, Domestic Technology Transfer Program, 9800 Savage Road, Suite 6541, Fort George G. Meade, Maryland 20755-6541.