Term extraction

Term extraction is an effective method to obtain a termbase from a representative text corpus. Using special software, the key terminology is extracted and made available for further terminological processing. This way, synonyms within the text corpus can be identified, similar terms can be distinguished and short forms can be matched with their full forms. This method makes it possible to build an initial terminological database, to tap into new subject fields or to create guidelines for term formation based on the extracted terminology.

Term extraction pays off especially for internationally operating companies that have most of their documents translated. In such cases, having the correct specialist terminology at hand simplifies standardisation and pretranslation, which in turn allows for a more efficient translation process and more consistent source and target texts, saving time and money.

Beyond that, the results of a term extraction can serve as a basis for writing and term formation guidelines. Such a style guide is important, for instance, if you want to achieve a higher level of consistency within your technical documentation by using authoring support tools.

Terminology extraction tools fall into two categories: tools using a statistical engine and tools using a linguistic engine. Tools with a statistical engine suggest term candidates mainly based on how frequently they appear within a text corpus, whereas linguistic tools additionally draw on morphological and syntactic algorithms. For multilingual term extraction, statistical methods are used, while linguistic tools deliver term candidates in only one language. Depending on the project and the source and target languages, we decide which tools are most suitable to use or to combine. If you require German term candidates only, we perform a linguistic term extraction as a matter of principle.
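To illustrate the statistical approach, here is a minimal Python sketch that proposes term candidates purely by frequency. It is a simplified illustration, not the method of any particular extraction tool: the function names, the tiny stop-word list and the thresholds `max_len` and `min_count` are all assumptions made for this example.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real tool would use a fuller one.
STOP_WORDS = {"the", "a", "an", "of", "is", "to", "and", "in", "for", "on"}


def tokenize(text):
    """Lowercase the text and split it into runs of letters (Unicode-aware)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def term_candidates(text, max_len=2, min_count=2):
    """Return (candidate, frequency) pairs, most frequent first.

    A candidate is a run of 1..max_len consecutive tokens that neither
    starts nor ends with a stop word and occurs at least min_count times.
    Sentence boundaries are ignored in this simplified sketch.
    """
    tokens = tokenize(text)
    counts = Counter()
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if gram[0] in STOP_WORDS or gram[-1] in STOP_WORDS:
                continue
            counts[" ".join(gram)] += 1
    frequent = [(term, c) for term, c in counts.items() if c >= min_count]
    # Sort by descending frequency, then alphabetically for a stable order.
    return sorted(frequent, key=lambda item: (-item[1], item[0]))
```

Called on a short text that mentions "control valve" twice, the sketch lists both the single words and the two-word candidate. A linguistic engine would go further, for example by filtering candidates by part of speech, which is beyond what pure counting can do.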

Term extraction that pursues a specific goal and delivers useful results requires considerable knowledge and experience. For this very reason, we are happy to support you with our know-how and years of experience in setting the parameters (choosing and configuring the tools) and choosing the methods (monolingual, multilingual, …) for the extraction, so that you obtain optimal results for your purposes.

We are happy to offer the following services to you:

  • choosing suitable tools and methods
  • processing a wide variety of file formats (PDF, DOC, XLS, TMX, XML …)
  • extracting and selecting term candidates, including representative context sentences, from your text corpus and delivering them to you as an Excel file
  • delta extraction (checking against an existing termbase to avoid duplicate entries)
  • adding individually defined metadata to the extracted terms
  • preprocessing the extracted terminology for the creation of a database or for import into an existing database
  • languages: German, English, French, Italian, Russian, Spanish
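The delta extraction listed among the services above can be pictured as a simple set comparison. The following Python sketch is only an illustration under assumptions: the function names and the normalisation rule (case folding and whitespace collapsing) are hypothetical choices for this example, and real termbases usually need more careful matching, such as handling inflected forms.

```python
def normalize(term):
    """Fold case and collapse whitespace so trivial variants compare equal."""
    return " ".join(term.casefold().split())


def delta_extraction(candidates, termbase):
    """Return the candidates that are not yet in the termbase.

    The original order and spelling of the candidates is kept, and
    duplicates within the candidate list itself are dropped as well.
    """
    known = {normalize(term) for term in termbase}
    seen = set()
    new_terms = []
    for candidate in candidates:
        key = normalize(candidate)
        if key in known or key in seen:
            continue
        seen.add(key)
        new_terms.append(candidate)
    return new_terms
```

Only the terms returned by `delta_extraction` would then need to be reviewed and imported, which is what keeps duplicate entries out of the existing database.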