Amazon Textract - Developer Guide

Copyright © 2023 Amazon Web Services, Inc. and/or its affiliates. All rights reserved. Amazon's trademarks and trade dress may not be used in connection with any product or service that is not Amazon's, in any manner that is likely to cause confusion among customers, or in any manner that disparages or discredits Amazon. All other trademarks not owned by Amazon are the property of their respective owners, who may or may not be affiliated with, connected to, or sponsored by Amazon.

Table of Contents

What is Amazon Textract?
    First-Time Amazon Textract Users
    Working with AWS SDKs

How It Works
    Detecting Text
    Analyzing Documents
    Analyzing Invoices and Receipts
    Analyzing Identity Documents
    Input Documents
    Amazon Textract Response Objects
        Text Detection and Document Analysis Response Objects
        Invoice and Receipt Response Objects
        Identity Documentation Response Objects
        Analyze Lending Response Objects
    Item Location on a Document Page
        Bounding Box
        Polygon
    Using Analyze Lending for Document Classification and Extraction

Getting Started
    Step 1: Set Up a User
        Sign up for an AWS account
        Create an administrative user
        Next Step
    Step 2: Set Up the AWS CLI and AWS SDKs
        Download AWS CLI and SDK
        Granting Programmatic Access
        Next Step
    Step 3: Get Started Using the AWS CLI and AWS SDK API
        Formatting the AWS CLI Examples

Processing Documents with Synchronous Operations
    Calling Amazon Textract Synchronous Operations
        Request
        Response
    Detecting Document Text
    Analyzing Document Text
    Analyzing Invoice and Receipt Documents
    Analyzing ID Documents

Processing Documents with Asynchronous Operations
    Calling Asynchronous Operations
        Starting Text Detection
        Getting the Completion Status of an Amazon Textract Analysis Request
        Getting Amazon Textract Text Detection Results
    Configuring Asynchronous Operations
        Giving Amazon Textract Access to Your Amazon SNS Topic
        Permissions for Output Configuration
    Detecting or Analyzing Text in a Multipage Document
        Performing Asynchronous Operations
    Using the Analyze Lending Workflow
        Performing Asynchronous Lending Analysis
    Amazon Textract Results Notification

Handling Throttled Calls and Dropped Connections

Best Practices for Amazon Textract
    Provide an Optimal Input Document
    Use Confidence Scores
    Consider Using Human Review
    Best Practices for Queries
        Example Queries
        General Best Practices for Queries
        Extracting Cells from Tables
        Extracting Tables using Queries
        Long Answers
        Passing Only Hints
        General Phrasing of Questions
        Setting up Pages for Queries
    Best Practices for Bulk Document Uploader
        Limits

Tutorials
    Prerequisites
    Extracting Key-Value Pairs from a Form Document
    Exporting Tables into a CSV File
    Detecting text with an AWS Lambda function
        Step 1: Create an AWS Lambda function (console)
        Step 2: (Optional) Create a layer (console)
        Step 3: Add Python code (console)
        Step 4: Try your Lambda function
    Extracting and Sending Text to AWS Comprehend for Analysis
        Prerequisites
        Starting Asynchronous Document Text Detection
        Processing Your Documents and Sending the Text to Comprehend
    Additional Code Samples

Code examples
    Actions
        Analyze a document
        Detect text in a document
        Get data about a document analysis job
        Start asynchronous analysis of a document
        Start asynchronous text detection
    Scenarios
        Get started with document analysis
    Cross-service examples
        Create an Amazon Textract explorer application
        Detect entities in text extracted from an image

Amazon A2I and Amazon Textract
    Core Concepts of Amazon A2I
        Human Review Activation Conditions
        Human review workflow (flow definition)
        Human loops
    Get Started Using Amazon A2I
        Create a Human Review Workflow
        Analyze the Document
        Monitor Human Loop
        View Output Data and Worker Metrics

Security
    Data Protection
        Encryption in Amazon Textract
        Internetwork Traffic Privacy
    Identity and Access Management
        Audience
        Authenticating With Identities
        Managing Access Using Policies
        How Amazon Textract Works with IAM
        Identity-Based Policy Examples
        Troubleshooting
    Logging and Monitoring
        Monitoring
        CloudWatch Metrics for Amazon Textract
    Logging Amazon Textract API Calls with AWS CloudTrail
        Amazon Textract Information in CloudTrail
        Understanding Amazon Textract Log File Entries
    Compliance Validation
    Resilience
    Cross-service confused deputy prevention
    Infrastructure Security
    Configuration and Vulnerability Analysis
    VPC endpoints (AWS PrivateLink)
        Considerations for Amazon Textract VPC endpoints
        Creating an interface VPC endpoint for Amazon Textract
        Creating a VPC endpoint policy for Amazon Textract

API Reference
    Actions
        AnalyzeDocument
        AnalyzeExpense
        AnalyzeID
        DetectDocumentText
        GetDocumentAnalysis
        GetDocumentTextDetection
        GetExpenseAnalysis
        GetLendingAnalysis
        GetLendingAnalysisSummary
        StartDocumentAnalysis
        StartDocumentTextDetection
        StartExpenseAnalysis
        StartLendingAnalysis
    Data Types
        AnalyzeIDDetections
        Block
        BoundingBox
        DetectedSignature
        Document
        DocumentGroup
        DocumentLocation
        DocumentMetadata
        ExpenseCurrency
        ExpenseDetection
        ExpenseDocument
        ExpenseField
        ExpenseGroupProperty
        ExpenseType
        Extraction
        Geometry
        HumanLoopActivationOutput
        HumanLoopConfig
        HumanLoopDataAttributes
        IdentityDocument
        IdentityDocumentField
        LendingDetection
        LendingDocument
        LendingField
