Category: Software, Data, github

Introduced at AWS re:Invent 2018, Amazon Textract is a machine learning service that automatically extracts text, handwriting and data from scanned documents that goes beyond simple optical character recognition (OCR) to identify, understand, and extract data from forms and tables. In the past few months, we introduced specialized support for processing invoices and receipts and enhanced the quality of the underlying computer vision models that power extraction of handwritten text, forms, and tables with printed text support for English, Spanish, German, Italian, Portuguese, and French. Third-party auditors assess the security and compliance of Amazon Textract as part of multiple AWS compliance programs.

Customers use Amazon Textract to automate critical business process workflows (for example, in claims and tax form processing, loan applications, and accounts payable).

To learn more about Amazon Textract, see this tutorial for extracting text and structured data from a document, this code sample on GitHub, Amazon Textract documentation, and blog posts about Amazon Textract on the AWS Machine Learning Blog.

Related Articles