Content Recycling Sunsets Translation Memory

Imagine you’re tasked with translating a revised 50-page contract that only has four changes. Would you opt for machine translation? Probably not, as the contract was already translated by a human, and the unchanged parts need to remain as they are. What about using a Translation Memory (TM)? Again, not the best option, considering the cost associated with every 100% match. The most efficient method would be to compare the two versions, identify the changes, and adjust only those few parts in the translation.
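The compare-and-adjust idea can be sketched in a few lines of Python. This is only an illustration, assuming both versions are already split into segments and the old translation is aligned one-to-one with the old source; `plan_update` is a hypothetical helper name, not part of any product mentioned here.

```python
from difflib import SequenceMatcher


def plan_update(old_source, new_source, old_target):
    """Pair each new source segment with a reusable translation, or None.

    Assumption: old_source and old_target are aligned segment by segment.
    """
    if len(old_source) != len(old_target):
        raise ValueError("old source and target must be aligned one-to-one")
    matcher = SequenceMatcher(a=old_source, b=new_source, autojunk=False)
    plan = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Unchanged run: carry the existing human translation over as-is.
            plan.extend(zip(new_source[j1:j2], old_target[i1:i2]))
        elif tag in ("replace", "insert"):
            # Changed or new segments still need a translator.
            plan.extend((segment, None) for segment in new_source[j1:j2])
        # "delete": segments dropped from the source simply disappear.
    return plan
```

Every pair whose second element is `None` is a segment that needs attention; everything else is carried over untouched.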

The problem is, how do you know that there were only four changes? It gets harder when a document mixes sections with minor changes, significant changes, and entirely new content. Fortunately, our technology offers a solution. The Content Recycler can analyze your document and apply the best translation strategy for each section.

Moving Beyond Outdated Architectures

Traditional segment-based TMs maintain a one-to-one correlation between source and target texts and disregard context, because each segment is stored only once. This approach, developed more than 30 years ago, was initially necessary for data compression due to hardware limitations. Surprisingly, even with those constraints long gone, the fundamental architecture of such systems, including the pioneer TRADOS, has seen little innovation. Attempts to address this, such as ICE matches (in-context exact matches), are a workaround at best and are hard to reconcile with exchange standards like TMX.

Introducing the Content Repository

The solution lies in evolving from segment pairs to a Content Repository that operates similarly to a CMS. This repository stores bilingual XLIFF files and creates a sophisticated index, allowing for the contextual and variable-length recycling of translated content. This method not only ensures the reuse of content within its original context but also avoids unnecessary review effort and cost. The new data model also accommodates multiple translations for the same source segment. A smart recycling algorithm uses metadata and a taxonomy (a Multilingual Knowledge System) to select the most appropriate translation.
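To make the data model more concrete, here is a minimal sketch of a repository that keeps several translations per source segment and picks one by context and metadata. It is not the actual recycling algorithm: the class names, the `domain` and `updated` fields, and the ranking order (context first, then domain, then recency) are all assumptions made for illustration, and the context is reduced to a single preceding segment rather than variable-length passages.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    target: str
    prev_source: str  # source segment that preceded this one in its document
    domain: str       # hypothetical metadata field
    updated: str      # ISO date, so string comparison orders by recency


class Repository:
    def __init__(self):
        # One source segment may map to many translations.
        self._index = defaultdict(list)

    def add_document(self, pairs, domain, updated):
        """Index one bilingual document given as (source, target) pairs."""
        prev = ""
        for source, target in pairs:
            self._index[source].append(Entry(target, prev, domain, updated))
            prev = source

    def lookup(self, source, prev_source, domain):
        """Return the best stored translation for source, or None."""
        candidates = self._index.get(source, [])
        if not candidates:
            return None
        # Assumed ranking: same context beats same domain beats recency.
        best = max(
            candidates,
            key=lambda e: (e.prev_source == prev_source, e.domain == domain, e.updated),
        )
        return best.target
```

A segment-pair TM would have to overwrite one of the two translations of the same sentence; here both survive, and the lookup decides which one fits the document at hand.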


The Content Repository is a key component of the Language Factory, which replaces the CAT tool-based process with an efficient multilingual data factory that makes optimal use of AI, NLP, and human know-how.

Tackling the Issue of Dirty TMs

Another significant drawback of traditional TMs is the rapid accumulation of outdated or incorrect data, making maintenance a daunting task. Attempts to clean these databases through manual correction or, as recently suggested, using AI are often futile due to the sheer volume of out-of-context segments. Users are left with drastic measures like deleting all old entries or starting anew, resulting in a considerable loss of potentially valuable data.

The Content Repository simplifies maintenance by automatically selecting the most relevant and up-to-date content for recycling. It gradually phases out obsolete content while preserving unique translations that may still be valuable. Should there be a need to remove specific content, such as discontinued product lines or divested business units, the process is straightforward: remove the associated XLIFF files, re-index, and you’re done. The system also handles terminology updates, such as deprecated terms, efficiently. It prompts for corrections only when segments containing these terms are reused, so no effort is wasted on segments that will most likely never be used again.
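The maintenance workflow described above can be illustrated roughly as follows. This is a sketch under simplifying assumptions: the files are represented as an in-memory dictionary of (source, target) pairs rather than parsed XLIFF, and `build_index` and `recycle` are hypothetical helper names.

```python
def build_index(files):
    """Rebuild the index from scratch.

    files: dict mapping a file name to a list of (source, target) pairs.
    """
    index = {}
    for name in sorted(files):
        for source, target in files[name]:
            index.setdefault(source, []).append((name, target))
    return index


def recycle(index, source, deprecated_terms):
    """Return (translation, flagged_terms) for a segment about to be reused.

    Deprecated terms are checked only here, at reuse time, so segments
    that are never recycled never trigger a correction prompt.
    """
    hits = index.get(source)
    if not hits:
        return None, []
    _, target = hits[-1]  # simplification: take the last-indexed hit
    flagged = [term for term in deprecated_terms if term.lower() in target.lower()]
    return target, flagged
```

Removing a discontinued product line then amounts to deleting its entry from `files` and calling `build_index` again; nothing else has to be cleaned by hand.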

Beyond Translation

Embracing Content Recycling over TMs marks a shift for organizations aiming to elevate their translation processes. This approach not only boosts efficiency but also ensures that only pertinent and up-to-date content is brought forward for reuse. The real game-changer, however, lies in the linguistic value a multilingual Content Repository offers compared to a conventional TM.

A Content Repository is a critical asset for businesses implementing a Language Operations (LangOps) strategy. It provides high-quality data for textual AI projects, empowering enterprises to innovate and enhance their linguistic models. A Content Repository not only streamlines translation workflows but also paves the way for advanced AI applications, making it an indispensable part of modern language management.
