StringMatch Pro - Extract terminology and repetitive strings with ease


Extracting Terminology

In the usual approach, a subject-matter specialist extracts the terms that belong in the terminology.

StringMatch Pro applies a reverse approach: It lets you discard words that do not belong in the terminology.
This process can be iterated until a satisfactory "focus" on the subject matter is achieved.

Because the final terminology is usually a tiny fraction of the entire text, you will probably need to discard most of the text you analyze. It's like panning for gold in a brook.

Also, this work is not easily reproducible: different experts may arrive at partly different results, and there is no easy way to determine why.

StringMatch Pro provides a solution to both problems:

- It enables you to discard unneeded words singly or wholesale; and
- Reproducibility is built in: the program will only discard words you specify, so your elimination work is documented.

This approach also has the considerable advantage that as you discard unneeded words, less and less text remains for you to check again in the next iteration, and the remaining text will only contain words you have not yet decided upon.

Normally, words like 'the', 'a', 'an', 'was', 'is', past tenses of verbs, and quite a few other words and word groups do not belong in the terminology.
If you provide a list of these words in a text file, StringMatch Pro will discard them from the analyzed text, and output only the surviving strings that do not contain the 'trash' words you have provided.
You can then use the program's output to copy further words into your 'trash' list for the next iteration.
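
The filtering step described above can be pictured with a minimal Python sketch. It is not StringMatch Pro's actual implementation: the function name surviving_strings, the tokenization rule, and the choice to treat punctuation as a break are all assumptions made for illustration.

```python
import re

def surviving_strings(text, trash_words):
    """Split text at every trash word and return the word runs left in between.

    Hypothetical helper for illustration; punctuation is assumed to end a run.
    """
    trash = {w.lower() for w in trash_words}
    survivors, run = [], []
    # Tokens are either words or single punctuation marks.
    for token in re.findall(r"\w+|[^\w\s]", text):
        if token.lower() in trash or not token[0].isalnum():
            # A trash word or a punctuation mark closes the current run.
            if run:
                survivors.append(" ".join(run))
                run = []
        else:
            run.append(token)
    if run:
        survivors.append(" ".join(run))
    return survivors

sample = "The pump is connected to a pressure valve. The valve was opened."
trash = ["the", "a", "is", "was", "to"]
print(surviving_strings(sample, trash))
# ['pump', 'connected', 'pressure valve', 'valve', 'opened']
```

In a next iteration, words such as 'connected' and 'opened' could be added to the trash list, so that only 'pump', 'pressure valve' and 'valve' survive.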

Nor do you have to pick trash words one by one: intelligent sorting options enable you to copy and paste whole blocks of text from the program's output into your 'trash' list. One such option is backwards sorting, where words ending in 'a' are listed before words ending in 'b', etc. This means that linguistically similar words are grouped together, so you can copy and paste all past tenses, or all gerund forms, or all adjectives, etc. into your file of words to be discarded.
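
To show why backwards sorting groups similar word forms, here is a short Python sketch. It assumes that backwards sorting means comparing words from their last letter to their first; the function name backwards_sorted is hypothetical, and the program's real sort order may differ in detail.

```python
def backwards_sorted(words):
    """Sort unique words by their reversed spelling (assumed meaning of backwards sorting)."""
    # The reversed, lowercased word is the primary key; the word itself breaks ties.
    return sorted(set(words), key=lambda w: (w[::-1].lower(), w))

words = ["opened", "valve", "running", "connected", "pump", "testing", "closed"]
print(backwards_sorted(words))
# ['opened', 'closed', 'connected', 'valve', 'running', 'testing', 'pump']
```

The words ending in '-ed' and those ending in '-ing' each end up in one contiguous block, which is what makes block-wise copying into the trash list practical.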

While expert knowledge is still needed to create a good terminology or glossary, StringMatch Pro takes much of the hassle out of this work, and lets you concentrate on what matters.