/v1/documents/countries/us/ids/driver-licenses/parse
Input: US driver license file.
Example output:
{
  "address": "123 MAIN STREET\nAPT. 1\nHARRISBURG, PA 17101-0000",
  "dateOfBirth": "01/07/1973",
  "documentId": "99 999 999",
  "expirationDate": "01/08/2026",
  "familyName": "SAMPLE",
  "givenNames": "ANDREW JASON",
  "issueDate": "01/07/2022",
  "portraitImage": "base64 encoded portrait image"
}
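The `portraitImage` field in the parse responses is a base64-encoded image. A minimal decoding sketch, assuming a response dict shaped like the example above (the helper name and file handling are ours, not part of the API):

```python
import base64


def save_portrait(response: dict, path: str) -> int:
    """Decode the base64 `portraitImage` field and write it to disk.

    Returns the number of bytes written, or 0 when the field is
    empty or missing (the passport example below returns "").
    """
    encoded = response.get("portraitImage") or ""
    if not encoded:
        return 0
    data = base64.b64decode(encoded)
    with open(path, "wb") as f:
        f.write(data)
    return len(data)
```

The same helper works for any endpoint that returns a `portraitImage` field.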
/v1/documents/countries/us/ids/id-proof
Input: US ID document.
Example output:
{
  "fraudSignalsIsIdentityDocument": "PASS",
  "fraudSignalsSuspiciousWords": "SUSPICIOUS_WORDS_FOUND",
  "evidenceSuspiciousWord": ["SPECIMEN"],
  "evidenceInconclusiveSuspiciousWord": [],
  "fraudSignalsImageManipulation": "PASS",
  "fraudSignalsOnlineDuplicate": "POSSIBLE_ONLINE_DUPLICATE",
  "evidenceHostname": ["theforumnewsgroup.com"],
  "evidenceThumbnailUrl": [
    "https://encrypted-tbn3.gstatic.com/images?q=tbn:ANd9GcSSYKZslJGQPVhC8_IAz3wgo1gA2Hv7hO531VyxuP_J0Kgka_o7"
  ]
}
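Each id-proof check comes back as a signal string plus supporting evidence fields. A minimal sketch of turning the signals into a manual-review decision — treating any value other than `"PASS"` as a review reason is our assumption, not a documented API contract:

```python
# The four signal fields present in the example response above.
SIGNAL_FIELDS = (
    "fraudSignalsIsIdentityDocument",
    "fraudSignalsSuspiciousWords",
    "fraudSignalsImageManipulation",
    "fraudSignalsOnlineDuplicate",
)


def review_reasons(response: dict) -> list:
    """Return the fraud-signal fields that did not come back as PASS."""
    return [f for f in SIGNAL_FIELDS if response.get(f) != "PASS"]


def needs_review(response: dict) -> bool:
    """True when at least one signal is not a clean PASS (assumed policy)."""
    return bool(review_reasons(response))
```

For the example response above, `review_reasons` would flag the suspicious-words and online-duplicate signals; the `evidence*` fields can then be surfaced to a human reviewer.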
/v1/documents/countries/us/ids/passports/parse
Input: US passport file.
Example output:
{
  "address": null,
  "dateOfBirth": "05 FEB 1965",
  "documentId": "E00009349",
  "expirationDate": "09 JUL 2030",
  "familyName": "TRAVELER",
  "givenNames": "HAPPY",
  "issueDate": "10 JUL 2020",
  "mrzCode": "P<USATRAVELER<<HAPPY<<<<<<<<<<<<<<<<<<<<<<<<\nE000093499USA6502056M3007099340006673<085950",
  "portraitImage": ""
}
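The `mrzCode` field carries the two 44-character lines of the passport's TD3 machine-readable zone. The field positions and the 7-3-1 check-digit weighting below follow the ICAO 9303 standard for TD3 documents; the helper names are ours:

```python
def mrz_char_value(c: str) -> int:
    """ICAO 9303 character value: digits as-is, A-Z -> 10..35, filler '<' -> 0."""
    if c.isdigit():
        return int(c)
    if c == "<":
        return 0
    return ord(c) - ord("A") + 10


def mrz_check_digit(field: str) -> int:
    """Weighted sum of character values (weights 7, 3, 1 repeating) modulo 10."""
    weights = (7, 3, 1)
    return sum(mrz_char_value(c) * weights[i % 3] for i, c in enumerate(field)) % 10


def parse_td3_line2(mrz: str) -> dict:
    """Extract the fixed-position fields from the second MRZ line of a TD3 passport."""
    line2 = mrz.split("\n")[1]
    return {
        "documentNumber": line2[0:9].rstrip("<"),
        "documentNumberValid": mrz_check_digit(line2[0:9]) == int(line2[9]),
        "nationality": line2[10:13],
        "birthDate": line2[13:19],   # YYMMDD; the century must be inferred by the caller
        "sex": line2[20],
        "expiryDate": line2[21:27],  # YYMMDD
    }
```

Run against the example `mrzCode` above, this recovers document number `E00009349`, nationality `USA`, and birth/expiry dates consistent with the parsed `dateOfBirth` and `expirationDate` fields.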
/v1/documents/countries/us/patents/parse
Input: US patent file.
Example output:
{
  "applicantLine1": "Colby Green,",
  "applicationNumber": "679,694",
  "classInternational": "H04W 64/00",
  "classUS": "H04W 64/003",
  "filingDate": "Aug. 17, 2017",
  "inventorLine1": "Colby Green,",
  "issuer": "US",
  "patentNumber": "10,136,408",
  "publicationDate": "Nov. 20, 2018",
  "titleLine1": null
}
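Dates in the patent response keep the document's own formatting (`Aug. 17, 2017`) and numbers keep their thousands separators. A minimal post-processing sketch — the ISO date output and the choice of which fields to convert are ours, and the `%b.` pattern assumes the abbreviated month style shown in the example:

```python
from datetime import datetime


def normalize_patent(response: dict) -> dict:
    """Convert the string date and number fields of a patent parse response into typed values."""
    out = dict(response)
    for field in ("filingDate", "publicationDate"):
        if out.get(field):
            # e.g. "Aug. 17, 2017" -> "2017-08-17"
            out[field] = datetime.strptime(out[field], "%b. %d, %Y").date().isoformat()
    if out.get("patentNumber"):
        # "10,136,408" -> 10136408
        out["patentNumber"] = int(out["patentNumber"].replace(",", ""))
    return out
```

Fields the extractor could not find (like `titleLine1` above) stay `None` and pass through unchanged.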
- Create a Google Cloud Organization.
- Create a project in your Organization.
- Create a tag in your Organization that allows the creation of public services when the tag is bound to a service. See this article.
- Install the gcloud CLI.
- Run `gcloud auth login`.
- Run `gcloud auth application-default login`.
- Install `terraform`.
- Own a domain name, or be a domain name administrator able to create A records for the domain. This is required to set up HTTPS.
This process will:
- Enable the required APIs.
- Create the Google Cloud Storage bucket that will contain the Terraform state.
- Store the `terraform.tfvars` file content in Secret Manager.
- Create the service accounts the system will use and grant them the required IAM permissions.
- Create the Cloud Build triggers that will actually deploy the workloads.
You should perform this first. To do so:
- Fork this repository.
- `cd` into the `./infra/deployment/terraform/bootstrap` folder.
- Copy the `terraform.tfvars.template` file into a `terraform.tfvars` file.
- Fill out the variables. You can leave these two empty for now:
  - `sourcerepo_name`
  - `sourcerepo_branch_name`
- Comment out the entire contents of the `backend.tf` file.
- Run `terraform init`.
- Run `terraform apply -target=module.enable_apis` and type `yes`.
- Create a Cloud Source Repository by mirroring your forked GitHub repository.
- Fill out the `sourcerepo_name` variable with the Cloud Source repository name.
- Run `terraform apply` and type `yes`.
- Uncomment the contents of the `backend.tf` file and set the `bucket` attribute to the value of the `tfstate_bucket` output.
- Run `terraform init` and type `yes`.
This process will deploy the actual applications and their supporting infrastructure. To run it:
- Go to Cloud Build -> Triggers -> click the "Run" button on the `apps` trigger row -> click the "Run Trigger" button.
- Go to Cloud Build -> History, and follow the build's progress.
- Go to Load Balancing -> `api-url-map`, and copy the IP address. Follow this guide to have the SSL certificate signed and HTTPS set up.
The US Patent Parser is a Document AI Custom Document Extractor. To train it, follow the steps below:
- Go to Key Management and take note of the location of the `doc-ai-key`.
- Go to Cloud Storage -> click "Create" to create a GCS bucket. You can name it `<some random prefix>-us-patent-parser-v1-0-0-initial-data-import` -> for the location, select the same region as the `doc-ai-key` -> click "Continue" until the "Choose how to protect object data" section -> open the "Data encryption" accordion, click "Customer-managed encryption key (CMEK)", and select the `doc-ai-key` as the encryption key -> click "Create". Now click "Upload Folder" and upload the US patents labeled data folder.
- Go to Document AI -> My Processors -> click the `us-patent-parser` processor -> Train.
- Click "Show Advanced Options" -> click "I'll specify my own location" -> select the `<project_id>-us-patent-parser-dataset` bucket. Wait for the dataset configuration to finish.
- Click the "Import Documents" button -> click "Browse" -> select the bucket you uploaded the US patents labeled data to and select the `labeled` folder -> in the "Data split" dropdown on the right, select `Auto-split` -> click "Import". Wait for the import to finish.
- Click "Edit Schema", enable all the labels, set the labels according to the table below, and then click "Save":

| Name | Data type | Occurrence |
| --- | --- | --- |
| applicant_line_1 | Plain text | Required once |
| application_number | Number | Required once |
| class_international | Plain text | Required once |
| class_us | Plain text | Required once |
| filing_date | Datetime | Required once |
| inventor_line_1 | Plain text | Required once |
| issuer | Plain text | Required once |
| patent_number | Number | Required once |
| publication_date | Datetime | Required once |
| title_line_1 | Plain text | Required once |

- Go back to the "Train" tab and click "Train New Version". You can name the version `v1-0-0`, then click "Start Training". Wait for the training to finish: it can take more than an hour.
- Check the processor's F1 score: it should be above `0.9` for all labels.
- Go to the "Manage Versions" tab -> click the three dots on the right of the model version -> click "Deploy version", and wait for it to finish. It can take more than 10 minutes.
- Click the three dots again and click "Set as default".