Summary Text Extract Association Metric (STEAM)


Technical Challenge:

Several methods exist for automatically summarizing a document, but there is no universally accepted measure for evaluating and comparing them. If a particular machine-generated summarization method is commercially regarded as the best, the STEAM approach could be used to rank its competitors against that established industry standard.


STEAM is a new statistical algorithm for determining the amount of "agreement" between two textual summaries that are generated by extracting specific sentences from the same full-text document. STEAM preserves both each sentence's position in the document and the sentence order given in the summary. The metric provides an objective, correlation-like measure between these summary extracts on a normalized scale of -1 to +1. That scale is easily interpretable and allows convenient comparisons between distinct machine-generated summarization systems, or between machine-generated and human-generated summaries.

STEAM avoids the major problem of penalizing the evaluation score when the two summaries do not contain exactly matching sentences. It also circumvents the problems of subjective inputs and the lack of a baseline, because its value is independent of the summary's content and of any "relevance judgment" by experts.
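
To make the idea of a correlation-like score between two sentence extracts concrete, here is a minimal sketch. This listing does not give the STEAM formula, so the sketch swaps in an ordinary Pearson correlation over sentence positions purely as a stand-in. The function names, the equal-length requirement, and the example extracts are all assumptions made for illustration and are not part of STEAM.

```python
from math import sqrt


def extract_agreement(summary_a, summary_b):
    """Pearson correlation between two equal-length sentence extracts.

    Each argument lists sentence positions in the source document (1-based),
    in the order the sentences appear in that summary.
    NOTE: hypothetical stand-in for illustration; this is not the STEAM formula.
    """
    if len(summary_a) != len(summary_b) or len(summary_a) < 2:
        raise ValueError("extracts must have the same length (at least 2)")
    n = len(summary_a)
    mean_a = sum(summary_a) / n
    mean_b = sum(summary_b) / n
    dev_a = [x - mean_a for x in summary_a]
    dev_b = [y - mean_b for y in summary_b]
    cov = sum(x * y for x, y in zip(dev_a, dev_b))
    norm = sqrt(sum(x * x for x in dev_a) * sum(y * y for y in dev_b))
    if norm == 0:
        raise ValueError("an extract with no positional variation has no defined correlation")
    # Clamp so floating-point rounding cannot push the score outside [-1, +1].
    return max(-1.0, min(1.0, cov / norm))


def rank_against_reference(reference, candidates):
    """Order candidate system names from most to least agreement with the reference."""
    scored = {name: extract_agreement(reference, extract)
              for name, extract in candidates.items()}
    return sorted(scored, key=scored.get, reverse=True)


# Hypothetical extracts: sentence positions chosen by a reference system and two competitors.
reference = [1, 4, 9]
candidates = {"system_a": [2, 5, 10], "system_b": [9, 4, 1]}
ranking = rank_against_reference(reference, candidates)
print(ranking)
```

In this sketch, system_a shares no sentence with the reference yet still scores close to +1 because its selections track the same positions and order, while system_b selects the same sentences in reverse order and scores negatively. That mirrors, in a simplified way, the properties described above: no penalty for inexact sentence matches, and sensitivity to sentence position and order.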

Demonstration Capability:

This technology can be demonstrated readily by running software developed on a UNIX platform.

Potential Commercial Application(s):

The primary utility of this new approach would be in evaluating two (or multiple pairs of) competing machine-generated textual summaries of a single full-text document.

Patent Status:

A patent application has been filed with USPTO.

Reference Number: 1250

If you are interested in exploring this technology further, please express your interest in writing to the:

Date Posted: Jan 15, 2009 | Last Modified: Jan 15, 2009 | Last Reviewed: Jan 15, 2009