Method of Summarizing Text Using Just the Text
Aliases:KODA text-similarity measure
Technical Challenge:To process overwhelmingly large textual data sets quickly and effectively so that they are easy to study and understand.
Description:The KODA similarity measure may be used to extract several short passages from a document that are indicative of the content of the whole document. Because KODA does not rely on document formatting or linguistic information and does not require any training, it may be used on large, diverse data sets composed of text documents in various forms and languages. Furthermore, it is fast and easily adapted to data sets in varying character sets, languages, and media.
KODA may be used in the following way:
The software reads in thousands of documents and publishes two or three sentences or passages from each one (the number may be user-defined). The content of each document can then be gauged from its representative passages. KODA also includes an option to identify the longer passages to which each published passage is most closely related, which yields a rudimentary outline of each document.
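The workflow described above can be illustrated with a minimal sketch. It does not implement the patented KODA measure, which this listing does not specify. As a stand-in it uses cosine similarity over character trigram counts, chosen only because it likewise needs no formatting cues, linguistic resources, or training. All function names, the sentence-splitting rule, and the default of two passages are assumptions made for illustration.

```python
import math
import re
from collections import Counter


def ngram_profile(text, n=3):
    """Count overlapping character n-grams; needs no tokenizer or training."""
    s = " ".join(text.lower().split())
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram count profiles (0.0 if either is empty)."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def split_passages(document):
    """Naive split after ., ! or ? (an assumption, not KODA's own rule)."""
    parts = re.split(r"(?<=[.!?])\s+", document.strip())
    return [p for p in parts if p]


def representative_passages(document, k=2):
    """Return the k passages most similar to the whole document, in original order."""
    passages = split_passages(document)
    whole = ngram_profile(document)
    scored = [(cosine(ngram_profile(p), whole), i) for i, p in enumerate(passages)]
    top = sorted(scored, key=lambda t: (-t[0], t[1]))[:k]
    return [passages[i] for _, i in sorted(top, key=lambda t: t[1])]


def closest_longer_passage(passage, longer_passages):
    """Index of the longer passage (e.g. a paragraph) most similar to `passage`.

    Assumes `longer_passages` is non-empty.
    """
    profile = ngram_profile(passage)
    scores = [cosine(profile, ngram_profile(lp)) for lp in longer_passages]
    return max(range(len(scores)), key=scores.__getitem__)
```

Calling representative_passages(text, k=3) on each document in a collection would give the per-document extracts, and closest_longer_passage(extract, paragraphs) would link each extract back to a paragraph to form a rough outline. Character n-grams are used here because they work across character sets without language-specific processing; the actual KODA measure may differ.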
Demonstration Capability:The software can be easily demonstrated.
Potential Commercial Application(s):The KODA text-similarity measure has implications for search engines as well as for single-document and multi-document summarization technologies. It may prove to be a powerful aid in Knowledge Management as well as in Data Mining.
Patent Status:Issued: United States Patent Number 6904564
Reference Number: 1199
If you are interested in exploring this technology further, please express your interest in writing to the:
National Security Agency
Date Posted: Jan 15, 2009 | Last Modified: Jan 15, 2009 | Last Reviewed: Jan 15, 2009