Term extraction

Term extraction is an effective way to build a termbase from a representative text corpus. Special software extracts the key terminology and makes it available for further terminological processing. In this way, synonyms within the corpus can be identified, similar terms distinguished, and short forms matched with their full forms. The method helps you build an initial stock of terminological data, open up new subject fields through their terminology, or derive guidelines for term formation from the extracted vocabulary.

Term extraction pays off especially for internationally operating companies that have most of their documents translated. In such cases, having the correct specialist terminology at hand simplifies the standardization of translations and automated pretranslation. This in turn allows for a more efficient translation workflow and more consistent source and target texts, and consequently saves time and money.

However, term extraction can also serve as a basis for guidelines on orthography and term formation. Such a style guide is important, for instance, if you want to employ authoring support tools to attain a higher level of consistency within your technical documentation.

Terminology extraction tools fall into two categories: tools using a statistical engine and tools using a linguistic engine. Tools with a statistical engine suggest term candidates mainly based on their frequency of occurrence within a text corpus, whereas linguistic tools additionally draw on morphological and syntactic algorithms. Statistical methods are used for multilingual term extraction, while linguistic tools deliver term candidates in only one language. Depending on the project and the source and target languages, we decide which tools are most suitable to use or to combine. If you require German term candidates only, we perform a linguistic term extraction as a matter of principle.
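To illustrate the statistical approach in its simplest form, the following Python sketch counts single words and word sequences in a corpus and keeps those that occur often enough. It is only a rough illustration, not the engine of any particular tool: the function name extract_candidates, the tiny stop-word list, and the thresholds max_len and min_freq are assumptions chosen for this example.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real tools use far larger lists).
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "for", "on", "with", "by"}


def extract_candidates(text, max_len=3, min_freq=2):
    """Return (term, frequency) pairs for n-grams of 1..max_len words
    that occur at least min_freq times in the text."""
    counts = Counter()
    # Split on sentence punctuation so n-grams never cross sentence boundaries.
    for sentence in re.split(r"[.!?;:]+", text.lower()):
        tokens = re.findall(r"[a-zäöüß][a-zäöüß-]*", sentence)
        for n in range(1, max_len + 1):
            for i in range(len(tokens) - n + 1):
                gram = tokens[i:i + n]
                # Skip n-grams that start or end with a stop word.
                if gram[0] in STOP_WORDS or gram[-1] in STOP_WORDS:
                    continue
                counts[" ".join(gram)] += 1
    # Most frequent candidates first; drop everything below the threshold.
    return [(term, freq) for term, freq in counts.most_common() if freq >= min_freq]


corpus = (
    "The control unit starts the pump. "
    "Check the control unit before use. "
    "The pump is controlled by the control unit."
)
for term, freq in extract_candidates(corpus):
    print(freq, term)
```

A purely frequency-based list like this still contains noise (for example, both "control" and "control unit"), which is one reason why the candidates need to be reviewed and selected afterwards.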

Targeted term extraction that delivers meaningful results requires considerable knowledge and experience. For this very reason, we are happy to support you with our know-how and years of experience in setting the parameters (choice and configuration of tools) and choosing the methods (monolingual, multilingual, …) for the extraction, so that you obtain optimal results for your purposes.

We would be pleased to offer the following services to you:

  • selection of suitable tools and methods
  • processing of a wide variety of file formats (PDF, DOC, XLS, TMX, XML …)
  • extraction and selection of term candidates incl. significant context sentences from your text corpus to be delivered in the form of an Excel file
  • delta extraction (checking against an already existing termbase to avoid duplicate entries)
  • adding user-definable metadata to the extracted terms
  • preprocessing of the extracted terminology to enable the creation of a new terminology database or the import into an existing termbase
  • languages: German, English, French, Italian, Russian, Spanish
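The delta extraction listed above can be pictured as a simple comparison of the extracted candidates against the terms already present in a termbase. The Python sketch below is a minimal illustration under stated assumptions: the function names normalize and delta_extract are hypothetical, and matching is done only on case and whitespace, whereas a real comparison might also need to consider inflected forms or spelling variants.

```python
def normalize(term):
    """Lower-case a term and collapse runs of whitespace for comparison."""
    return " ".join(term.casefold().split())


def delta_extract(candidates, termbase):
    """Return the candidates that are not yet in the termbase,
    keeping their original order and dropping repeated candidates."""
    known = {normalize(t) for t in termbase}
    seen = set()
    new_terms = []
    for candidate in candidates:
        key = normalize(candidate)
        if key not in known and key not in seen:
            seen.add(key)
            new_terms.append(candidate)
    return new_terms


existing = ["control unit"]
extracted = ["Control unit", "pump housing", "Pump housing", "valve"]
print(delta_extract(extracted, existing))
```

In this example, only "pump housing" and "valve" remain, because "Control unit" is already in the termbase and "Pump housing" repeats an earlier candidate.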