Document Translation with Azure Translator service

Translator Service is a cloud-based neural machine translation service that is part of the Azure Cognitive Services family of REST APIs. Translator can be used with any operating system and powers many Microsoft products and services used by thousands of businesses worldwide to perform language translation and other language-related operations. In this overview, you'll learn how Translator can enable you to build intelligent, multi-language solutions for your applications across all supported languages.

What is Document Translation?

Document Translation is a cloud-based feature of the Azure Translator service and is part of the Azure Cognitive Services family of REST APIs. The Document Translation API translates multiple, complex documents across all supported languages and dialects while preserving the original document structure and data format.

This documentation contains the following article types:

  • Quickstarts are getting-started instructions to guide you through making requests to the service.
  • How-to guides contain instructions for using the feature in more specific or customized ways.
  • Reference articles provide REST API settings, values, keywords, and configuration.

Azure Document Translation key features

Feature | Description
Translate large files | Translate whole documents asynchronously.
Translate numerous files | Translate multiple files across all supported languages and dialects while preserving document structure and data format.
Preserve source file presentation | Translate files while preserving the original layout and format.
Apply custom translation | Translate documents using general and custom translation models.
Apply custom glossaries | Translate documents using custom glossaries.
Automatically detect document language | Let the Document Translation service determine the language of the document.
Translate documents with content in multiple languages | Use the autodetect feature to translate documents with content in multiple languages into your target language.

Document Translation development options

You can add Document Translation to your applications using the REST API or a client-library SDK:

  • The REST API is a language-agnostic interface that enables you to create HTTP requests and authorization headers to translate documents.
  • The client-library SDKs are language-specific classes, objects, methods, and code that you can quickly use by adding a reference in your project. Currently, Document Translation has programming language support for C#/.NET and Python.
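
To make the REST option concrete, the following is a minimal sketch of how a batch translation request might be assembled in Python. It only builds the request and does not send it. The URL path, the JSON field names (`inputs`, `source`, `sourceUrl`, `targets`, `targetUrl`, `language`), and the function name `build_batch_request` are assumptions for illustration and should be checked against the REST API reference for the API version you use. The endpoint, key, and container URLs are placeholders.

```python
import json

# Assumed path for the batch endpoint; verify it against the REST reference.
DEFAULT_BATCH_PATH = "/translator/text/batch/v1.0/batches"


def build_batch_request(endpoint, key, source_url, target_url, target_language,
                        batch_path=DEFAULT_BATCH_PATH):
    """Describe (but do not send) a Document Translation batch request.

    The body layout below is an assumption for illustration only.
    """
    body = {
        "inputs": [
            {
                # Container that holds the documents to translate.
                "source": {"sourceUrl": source_url},
                # One entry per target language and output container.
                "targets": [
                    {"targetUrl": target_url, "language": target_language}
                ],
            }
        ]
    }
    return {
        "method": "POST",
        "url": endpoint.rstrip("/") + batch_path,
        "headers": {
            # Authorization header carrying the resource key.
            "Ocp-Apim-Subscription-Key": key,
            "Content-Type": "application/json",
        },
        "body": json.dumps(body),
    }


if __name__ == "__main__":
    request = build_batch_request(
        endpoint="https://example.cognitiveservices.azure.com/",  # placeholder
        key="YOUR-KEY",                                           # placeholder
        source_url="https://example.blob.core.windows.net/source",
        target_url="https://example.blob.core.windows.net/target-fr",
        target_language="fr",
    )
    print(request["url"])
    print(request["body"])
```

The returned dictionary can be handed to any HTTP client. Keeping request construction separate from sending makes the payload easy to inspect before a key or storage URL is ever used.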

Supported document formats

The following document file types are supported by Document Translation:

File type | File extension | Description
Adobe PDF | pdf | Portable document file format.
Comma-Separated Values | csv | A comma-delimited raw-data file used by spreadsheet programs.
HTML | html, htm | Hyper Text Markup Language.
Localization Interchange File Format | xlf, xliff | A parallel document format, export of Translation Memory systems. The languages used are defined inside the file.
Markdown | markdown, mdown, mkdn, md, mkd, mdwn, mdtxt, mdtext, rmd | A lightweight markup language for creating formatted text.
MHTML | mhtml, mht | A web page archive format used to combine HTML code and its companion resources.
Microsoft Excel | xls, xlsx | A spreadsheet file for data analysis and documentation.
Microsoft Outlook | msg | An email message created or saved within Microsoft Outlook.
Microsoft PowerPoint | ppt, pptx | A presentation file used to display content in a slideshow format.
Microsoft Word | doc, docx | A text document file.
OpenDocument Text | odt | An open-source text document file.
OpenDocument Presentation | odp | An open-source presentation file.
OpenDocument Spreadsheet | ods | An open-source spreadsheet file.
Rich Text Format | rtf | A text document containing formatting.
Tab-Separated Values/TAB | tsv, tab | A tab-delimited raw-data file used by spreadsheet programs.
Text | txt | An unformatted text document.
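
A client can check file extensions against this list before uploading, so that unsupported files are caught locally. The sketch below is one way to do that in Python. The names `SUPPORTED_EXTENSIONS` and `is_supported_document` are hypothetical, and the check looks only at the file name, not the file contents.

```python
from pathlib import PurePosixPath

# Extensions taken from the supported document formats table above.
SUPPORTED_EXTENSIONS = frozenset({
    "pdf", "csv", "html", "htm", "xlf", "xliff",
    "markdown", "mdown", "mkdn", "md", "mkd", "mdwn", "mdtxt", "mdtext", "rmd",
    "mhtml", "mht", "xls", "xlsx", "msg", "ppt", "pptx", "doc", "docx",
    "odt", "odp", "ods", "rtf", "tsv", "tab", "txt",
})


def is_supported_document(filename):
    """Return True if the file name ends in a listed extension (case-insensitive)."""
    extension = PurePosixPath(filename).suffix.lower().lstrip(".")
    return extension in SUPPORTED_EXTENSIONS


if __name__ == "__main__":
    for name in ("report.DOCX", "slides.odp", "photo.png", "README"):
        print(name, is_supported_document(name))
```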

Legacy file types

Source file types are preserved during document translation, with the following exceptions:

Source file extension | Translated file extension
.doc, .odt, .rtf | .docx
.xls, .ods | .xlsx
.ppt, .odp | .pptx
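
When downstream code needs to locate translated output, the exceptions above can be captured in a small lookup. The following sketch predicts the translated file name from the source name. `LEGACY_OUTPUT_EXTENSIONS` and `translated_filename` are hypothetical names, and the sketch assumes the translated file keeps the source file's base name, which the table does not state.

```python
from pathlib import PurePosixPath

# Source extension -> translated extension, per the exceptions table above.
LEGACY_OUTPUT_EXTENSIONS = {
    ".doc": ".docx", ".odt": ".docx", ".rtf": ".docx",
    ".xls": ".xlsx", ".ods": ".xlsx",
    ".ppt": ".pptx", ".odp": ".pptx",
}


def translated_filename(source_name):
    """Return the expected translated file name for a source file name.

    Assumes the base name is unchanged; only listed extensions are remapped.
    """
    path = PurePosixPath(source_name)
    new_suffix = LEGACY_OUTPUT_EXTENSIONS.get(path.suffix.lower())
    if new_suffix is None:
        # Every other file type keeps its original extension.
        return source_name
    return str(path.with_suffix(new_suffix))


if __name__ == "__main__":
    for name in ("budget.xls", "letter.rtf", "deck.pptx", "manual.pdf"):
        print(name, "->", translated_filename(name))
```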

Supported glossary formats

The following glossary file types are supported by Document Translation:

File type | File extension | Description
Comma-Separated Values | csv | A comma-delimited raw-data file used by spreadsheet programs.
Localization Interchange File Format | xlf, xliff | A parallel document format, export of Translation Memory systems. The languages used are defined inside the file.
Tab-Separated Values/TAB | tsv, tab | A tab-delimited raw-data file used by spreadsheet programs.