/v1/documents/countries/us/ids/driver-licenses/parse
Input: US driver license file.
Example output:
{
  "address": "123 MAIN STREET\nAPT. 1\nHARRISBURG, PA 17101-0000",
  "dateOfBirth": "01/07/1973",
  "documentId": "99 999 999",
  "expirationDate": "01/08/2026",
  "familyName": "SAMPLE",
  "givenNames": "ANDREW JASON",
  "issueDate": "01/07/2022",
  "portraitImage": "base64 encoded portrait image"
}
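The `portraitImage` field in the parse responses is a base64-encoded image. A minimal decoding sketch, assuming a response dict shaped like the example above (the helper name and file handling are ours, not part of the API):

```python
import base64


def save_portrait(response: dict, path: str) -> int:
    """Decode the base64 `portraitImage` field and write it to disk.

    Returns the number of bytes written, or 0 when the field is
    empty or missing (the passport example below returns "").
    """
    encoded = response.get("portraitImage") or ""
    if not encoded:
        return 0
    data = base64.b64decode(encoded)
    with open(path, "wb") as f:
        f.write(data)
    return len(data)
```

The same helper works for any endpoint that returns a `portraitImage` field.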
/v1/documents/countries/us/ids/id-proof
Input: US ID document.
Example output:
{
  "fraudSignalsIsIdentityDocument": "PASS",
  "fraudSignalsSuspiciousWords": "SUSPICIOUS_WORDS_FOUND",
  "evidenceSuspiciousWord": ["SPECIMEN"],
  "evidenceInconclusiveSuspiciousWord": [],
  "fraudSignalsImageManipulation": "PASS",
  "fraudSignalsOnlineDuplicate": "POSSIBLE_ONLINE_DUPLICATE",
  "evidenceHostname": ["theforumnewsgroup.com"],
  "evidenceThumbnailUrl": [
    "https://encrypted-tbn3.gstatic.com/images?q=tbn:ANd9GcSSYKZslJGQPVhC8_IAz3wgo1gA2Hv7hO531VyxuP_J0Kgka_o7"
  ]
}
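Each id-proof check comes back as a signal string plus supporting evidence fields. A minimal sketch of turning the signals into a manual-review decision — treating any value other than `"PASS"` as a review reason is our assumption, not a documented API contract:

```python
# The four signal fields present in the example response above.
SIGNAL_FIELDS = (
    "fraudSignalsIsIdentityDocument",
    "fraudSignalsSuspiciousWords",
    "fraudSignalsImageManipulation",
    "fraudSignalsOnlineDuplicate",
)


def review_reasons(response: dict) -> list:
    """Return the fraud-signal fields that did not come back as PASS."""
    return [f for f in SIGNAL_FIELDS if response.get(f) != "PASS"]


def needs_review(response: dict) -> bool:
    """True when at least one signal is not a clean PASS (assumed policy)."""
    return bool(review_reasons(response))
```

For the example response above, `review_reasons` would flag the suspicious-words and online-duplicate signals; the `evidence*` fields can then be surfaced to a human reviewer.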
/v1/documents/countries/us/ids/passports/parse
Input: US passport file.
Example output:
{
  "address": null,
  "dateOfBirth": "05 FEB 1965",
  "documentId": "E00009349",
  "expirationDate": "09 JUL 2030",
  "familyName": "TRAVELER",
  "givenNames": "HAPPY",
  "issueDate": "10 JUL 2020",
  "mrzCode": "P<USATRAVELER<<HAPPY<<<<<<<<<<<<<<<<<<<<<<<<\nE000093499USA6502056M3007099340006673<085950",
  "portraitImage": ""
}
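The `mrzCode` field carries the two 44-character lines of the passport's TD3 machine-readable zone. The field positions and the 7-3-1 check-digit weighting below follow the ICAO 9303 standard for TD3 documents; the helper names are ours:

```python
def mrz_char_value(c: str) -> int:
    """ICAO 9303 character value: digits as-is, A-Z -> 10..35, filler '<' -> 0."""
    if c.isdigit():
        return int(c)
    if c == "<":
        return 0
    return ord(c) - ord("A") + 10


def mrz_check_digit(field: str) -> int:
    """Weighted sum of character values (weights 7, 3, 1 repeating) modulo 10."""
    weights = (7, 3, 1)
    return sum(mrz_char_value(c) * weights[i % 3] for i, c in enumerate(field)) % 10


def parse_td3_line2(mrz: str) -> dict:
    """Extract the fixed-position fields from the second MRZ line of a TD3 passport."""
    line2 = mrz.split("\n")[1]
    return {
        "documentNumber": line2[0:9].rstrip("<"),
        "documentNumberValid": mrz_check_digit(line2[0:9]) == int(line2[9]),
        "nationality": line2[10:13],
        "birthDate": line2[13:19],   # YYMMDD; the century must be inferred by the caller
        "sex": line2[20],
        "expiryDate": line2[21:27],  # YYMMDD
    }
```

Run against the example `mrzCode` above, this recovers document number `E00009349`, nationality `USA`, and birth/expiry dates consistent with the parsed `dateOfBirth` and `expirationDate` fields.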
/v1/documents/countries/us/patents/parse
Input: US patent file.
Example output:
{
  "applicantLine1": "Colby Green,",
  "applicationNumber": "679,694",
  "classInternational": "H04W 64/00",
  "classUS": "H04W 64/003",
  "filingDate": "Aug. 17, 2017",
  "inventorLine1": "Colby Green,",
  "issuer": "US",
  "patentNumber": "10,136,408",
  "publicationDate": "Nov. 20, 2018",
  "titleLine1": null
}
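Dates in the patent response keep the document's own formatting (`Aug. 17, 2017`) and numbers keep their thousands separators. A minimal post-processing sketch — the ISO date output and the choice of which fields to convert are ours, and the `%b.` pattern assumes the abbreviated month style shown in the example:

```python
from datetime import datetime


def normalize_patent(response: dict) -> dict:
    """Convert the string date and number fields of a patent parse response into typed values."""
    out = dict(response)
    for field in ("filingDate", "publicationDate"):
        if out.get(field):
            # e.g. "Aug. 17, 2017" -> "2017-08-17"
            out[field] = datetime.strptime(out[field], "%b. %d, %Y").date().isoformat()
    if out.get("patentNumber"):
        # "10,136,408" -> 10136408
        out["patentNumber"] = int(out["patentNumber"].replace(",", ""))
    return out
```

Fields the extractor could not find (like `titleLine1` above) stay `None` and pass through unchanged.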
- Create a Google Cloud Organization.
- Create a project in your Organization.
- Create a tag in your Organization that allows the creation of public services when the tag is bound to a service. See this article.
- Install the gcloud CLI.
- Run `gcloud auth login`.
- Run `gcloud auth application-default login`.
- Install `terraform`.
- Own a domain name, or be a domain name administrator able to create A records for the domain. This is required to set up HTTPS.
This process will:
- Enable the required APIs.
- Create the Google Cloud Storage bucket that will contain the Terraform state.
- Store the `terraform.tfvars` file content in Secret Manager.
- Create the service accounts the system will use and grant them the required IAM permissions.
- Create the Cloud Build triggers that will actually deploy the workloads.
You should perform this first. To do so:
- Fork this repository.
- `cd` into the `./infra/deployment/terraform/bootstrap` folder.
- Copy the `terraform.tfvars.template` file into a `terraform.tfvars` file.
- Fill out the variables. You can leave these two empty for now:
  - `sourcerepo_name`
  - `sourcerepo_branch_name`
- Comment out the entire contents of the `backend.tf` file.
- Run `terraform init`.
- Run `terraform apply -target=module.enable_apis` and type `yes`.
- Create a Cloud Source Repository by mirroring your forked GitHub repository.
- Fill out the `sourcerepo_name` variable with the Cloud Source repository name.
- Run `terraform apply` and type `yes`.
- Uncomment the contents of the `backend.tf` file and set the `bucket` attribute to the value of the `tfstate_bucket` output.
- Run `terraform init` and type `yes`.
This process will deploy the actual applications and their supporting infrastructure. To run it:
- Go to Cloud Build -> Triggers -> click the "Run" button on the `apps` trigger row -> click the "Run Trigger" button.
- Go to Cloud Build -> History, and follow the build's progress.
- Go to Load Balancing -> `api-url-map`, and copy the IP address. Follow this guide to have the SSL certificate signed and HTTPS set up.
The US Patent Parser is a Document AI Custom Document Extractor. To train it, follow the steps below:
- Go to Key Management and take note of the location of the `doc-ai-key`.
- Go to Cloud Storage -> click "Create" to create a GCS bucket. You can name it `<some random prefix>-us-patent-parser-v1-0-0-initial-data-import` -> for the location, select the same region as the `doc-ai-key` -> click "Continue" until the "Choose how to protect object data" section -> open the "Data encryption" accordion, click "Customer-managed encryption key (CMEK)", and select the `doc-ai-key` as the encryption key -> click "Create". Now click "Upload Folder" and upload the US patents labeled data folder.
- Go to Document AI -> My Processors -> click the `us-patent-parser` processor -> Train.
- Click "Show Advanced Options" -> click "I'll specify my own location" -> select the `<project_id>-us-patent-parser-dataset` bucket. Wait for the dataset configuration to finish.
- Click the "Import Documents" button -> click "Browse" -> select the bucket you uploaded the US patents labeled data to and select the `labeled` folder -> in the "Data split" dropdown on the right, select `Auto-split` -> click "Import". Wait for the import to finish.
- Click "Edit Schema", enable all the labels, set the labels according to the table below, and then click "Save":

| Name | Data type | Occurrence |
| --- | --- | --- |
| applicant_line_1 | Plain text | Required once |
| application_number | Number | Required once |
| class_international | Plain text | Required once |
| class_us | Plain text | Required once |
| filing_date | Datetime | Required once |
| inventor_line_1 | Plain text | Required once |
| issuer | Plain text | Required once |
| patent_number | Number | Required once |
| publication_date | Datetime | Required once |
| title_line_1 | Plain text | Required once |

- Go back to the "Train" tab and click "Train New Version". You can name the version `v1-0-0`, then click "Start Training". Wait for the training to finish: it can take more than an hour.
- Check the processor's F1 score: it should be above `0.9` for all labels.
- Go to the "Manage Versions" tab -> click the three dots on the right of the model version -> click "Deploy version", and wait for it to finish. It can take more than 10 minutes.
- Click the three dots again and click "Set as default".