Summary Text Extract Association Metric (STEAM)
Aliases: Summary Extract Metric, Summary Association Measure.
Technical Challenge: Several methods currently exist for automatically summarizing a document; however, there is no universally accepted measure for evaluating and comparing these methods. If a particular machine-generated summarization method is commercially regarded as best, the STEAM approach could be used to rank its competitors against that established industry standard.
Description: STEAM is a new statistical algorithm for determining the amount of "agreement" between two textual summaries, each generated by extracting specific sentences from the same full-text document. STEAM preserves both sentence position in the document and sentence order given in the summary. The metric provides an objective, correlation-like measure of agreement between the two summary extracts on a normalized scale of -1 to +1. This scale is easily interpretable and allows convenient comparisons between distinct machine-generated summarization systems, or between machine-generated and human-generated summaries.
STEAM avoids a major problem of evaluation scoring: incurring a penalty when the summaries do not contain exactly matching sentences. It also circumvents the problems of subjective inputs and the lack of a baseline, since its value is independent of the summary's content and of any "relevance judgment" by experts.
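To illustrate what a correlation-like agreement score on a -1 to +1 scale between two sentence extracts can look like, here is a minimal sketch. It is not the STEAM algorithm, whose details this fact sheet does not give. It is a stand-in that computes the ordinary Pearson correlation between 0/1 sentence-selection vectors, so unlike STEAM it ignores the order of sentences within each summary and does penalize non-exact matches. The function name `extract_agreement` and its parameters are hypothetical.

```python
from math import sqrt


def extract_agreement(n_sentences, extract_a, extract_b):
    """Pearson correlation between two 0/1 sentence-selection vectors.

    Stand-in illustration only, NOT the STEAM metric.
    n_sentences: number of sentences in the full-text document.
    extract_a, extract_b: 0-based document positions of the sentences
        each summary extracted.
    Returns a value in [-1, +1].
    """
    chosen_a, chosen_b = set(extract_a), set(extract_b)
    # One indicator per document sentence: 1.0 if the summary selected it.
    a = [1.0 if i in chosen_a else 0.0 for i in range(n_sentences)]
    b = [1.0 if i in chosen_b else 0.0 for i in range(n_sentences)]

    mean_a = sum(a) / n_sentences
    mean_b = sum(b) / n_sentences
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)

    # The correlation is undefined if a summary selects no sentences
    # or every sentence.
    if var_a == 0 or var_b == 0:
        raise ValueError("each extract must select some but not all sentences")
    return cov / sqrt(var_a * var_b)


# Two extracts from a hypothetical 10-sentence document.
machine_extract = [0, 3, 7]
human_extract = [0, 4, 7]
print(extract_agreement(10, machine_extract, human_extract))
```

Identical extracts score +1 and extracts that together split the document into complementary halves score -1. A measure with STEAM's stated properties would additionally have to account for summary order and tolerate near matches.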
Demonstration Capability: This technology can be demonstrated readily by running software developed on a UNIX platform.
Potential Commercial Application(s): The primary utility of this new approach would be in evaluating two (or multiple pairs of) competing machine-generated textual summaries of a single full-text document.
Patent Status: A patent application has been filed with USPTO.
Reference Number: 1250
If you are interested in exploring this technology further, please express your interest in writing to the:
National Security Agency
Date Posted: Jan 15, 2009 | Last Modified: Jan 15, 2009 | Last Reviewed: Jan 15, 2009