Document Processing Accelerator

Overview

Azure Static Web Apps lets you build and deploy React apps in minutes. Use this repo with the React quickstart to build and customize a new static site and to automate the deployment of a functional, customizable proof-of-concept (POC) UI for document processing. This guide gives a high-level overview of the deployment architecture, followed by step-by-step instructions for deploying immediately without writing any code.


Architecture

Once you've created a Resource Group, you'll create an Azure DevOps pipeline and import/clone this repo. The pipeline automatically imports helper libraries and uses Azure Functions to deploy the set of Azure Cognitive Services and to manage the credentials of the new Azure resources in the background. Once the pipeline is deployed, a static web app is created with your customizable POC UI for document processing.

Currently Included Algorithms

The initial release includes two common NLP use cases: text classification and custom named entity recognition. Additional tasks and models are on the roadmap for inclusion (see the Roadmap section later in this document).

Text Classification

Text classification is a supervised learning task: predicting the category or class of a document from its text content. The state-of-the-art methods are based on neural networks of different architectures as well as pre-trained language models or word embeddings.
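To make the idea of supervised text classification concrete, here is a deliberately tiny sketch. It is not the accelerator's model and not a neural method: it uses plain word counts, and the training sentences and labels ("invoice", "medical") are invented for illustration only.

```python
from collections import Counter, defaultdict


def train(examples):
    """Count how often each word appears under each label."""
    counts = defaultdict(Counter)
    for text, label in examples:
        counts[label].update(text.lower().split())
    return counts


def predict(counts, text):
    """Pick the label whose training words overlap most with the text."""
    words = text.lower().split()
    scores = {label: sum(c[w] for w in words) for label, c in counts.items()}
    return max(scores, key=scores.get)


# Hypothetical labeled training data (text, class).
examples = [
    ("invoice total amount due", "invoice"),
    ("payment due on invoice", "invoice"),
    ("patient diagnosis and treatment", "medical"),
    ("treatment plan for patient", "medical"),
]

model = train(examples)
print(predict(model, "amount due for this invoice"))  # prints "invoice"
```

A real classifier replaces the word counts with learned representations, but the shape is the same: labeled documents in, a function from text to class out.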

Custom Named Entity Recognition

Named Entity Recognition (NER) is the task of detecting and classifying real-world objects mentioned in text. Common named entities include person names, locations, organizations, etc. State-of-the-art NER methods include combining a Long Short-Term Memory neural network with a Conditional Random Field (LSTM-CRF) and pretrained language models like BERT.

NER usually involves assigning an entity label to each word in a sentence, such as the entities shown below:

O: Not an entity (i.e., all other words)
I-LOC: Location
I-ORG: Organization
I-PER: Person

There are a few standard labeling schemes and you can find the details here. The data can also be labeled with custom entities as required by the use case.
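As an illustration of the per-word labels above, the following sketch groups consecutive tokens that share an entity label into entity spans. The sentence and its tags are made up for the example, and the helper name `extract_entities` is hypothetical rather than part of this accelerator.

```python
def extract_entities(tokens, tags):
    """Group consecutive tokens with the same I-<TYPE> tag into entities."""
    entities = []
    current_words, current_type = [], None
    for token, tag in zip(tokens, tags):
        # "I-LOC" -> "LOC"; "O" (not an entity) -> None
        entity_type = tag[2:] if tag.startswith("I-") else None
        if entity_type != current_type and current_words:
            # The label changed, so the running entity is complete.
            entities.append((" ".join(current_words), current_type))
            current_words = []
        if entity_type is not None:
            current_words.append(token)
        current_type = entity_type
    if current_words:
        entities.append((" ".join(current_words), current_type))
    return entities


tokens = ["Satya", "Nadella", "leads", "Microsoft", "in", "Redmond"]
tags = ["I-PER", "I-PER", "O", "I-ORG", "O", "I-LOC"]
print(extract_entities(tokens, tags))
# [('Satya Nadella', 'PER'), ('Microsoft', 'ORG'), ('Redmond', 'LOC')]
```

Note that with only `I-` and `O` tags, two adjacent entities of the same type are merged into one span; labeling schemes that add a begin tag exist to avoid that ambiguity.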

Prerequisites

  1. GitHub account
  2. Ensure your subscription has Microsoft.DocumentDB enabled
    To check:
    a. Go to your subscription within portal.azure.com
    b. Select Resource Providers at the bottom of the left navigation pane
    c. Within the Filter by name menu, search for Microsoft.DocumentDB
    d. Once Microsoft.DocumentDB is found, check whether the status is marked as "Registered". If it is marked as "NotRegistered", select "Register"
    Note: This process may take anywhere from several seconds to a few minutes; refresh the entire browser page periodically
  3. Ensure that you have accepted the terms and conditions for Responsible AI: "You must create your first Face, Language service, or Computer Vision resources from the Azure portal to review and acknowledge the terms and conditions. You can do so here: Face, Language service, Computer Vision. After that, you can create subsequent resources using any deployment tool (SDK, CLI, or ARM template, etc) under the same Azure subscription."
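The registration check in prerequisite 2 can also be scripted instead of clicked through in the portal. The sketch below assumes the Azure CLI (`az`) is installed and already logged in to the right subscription; the helper names are hypothetical. It wraps `az provider show` and reads the `registrationState` field from its JSON output.

```python
import json
import subprocess

NAMESPACE = "Microsoft.DocumentDB"


def show_provider_command(namespace):
    """Build the Azure CLI command that describes a resource provider."""
    return ["az", "provider", "show", "--namespace", namespace, "--output", "json"]


def is_registered(provider_json):
    """Return True if the provider JSON reports the 'Registered' state."""
    return json.loads(provider_json).get("registrationState") == "Registered"


def check_provider(namespace=NAMESPACE):
    """Run the CLI (assumes `az login` has been done) and report the state."""
    result = subprocess.run(
        show_provider_command(namespace),
        capture_output=True,
        text=True,
        check=True,
    )
    return is_registered(result.stdout)
```

Calling `check_provider()` returns `True` once the provider is registered. If it returns `False`, registering through the portal as described above, or with `az provider register --namespace Microsoft.DocumentDB`, should resolve it.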

Installation Steps

1. Clone the starter backend repo

Clone https://github.com/jameshoff-msft/bpa-backend to your GitHub account
Note: a Microsoft organization GitHub account is not required

2. Create a Resource Group in your Azure Portal

Select your preferred Region

3. Set up the Azure DevOps Pipeline

Note: You'll use Azure DevOps for running the multi-stage pipeline with build. If you don't already have an Azure DevOps organization, create one by following the instructions at Quickstart: Create an organization or project collection.

1. Navigate to Azure DevOps www.dev.azure.com

2. Create a new Project

Type in your Project name and select a Visibility setting (currently tested with Private)

3. Select Repos in left Navigation pane

4. Select Import a Repository

Select Git for Repository type. Paste the quick start repo https://github.com/jameshoff-msft/bpa-backend into the Clone URL field. This repo is used for the POC backend, e.g. creating the backend Cognitive Services and Azure Functions, and managing credentials

Note: You may leave Requires Authentication unchecked
Cloning may take several minutes.

Your cloned repository should mirror the below directory:

5. Navigate to Project Settings

6. Create Service Connection

This Service Connection will allow Azure DevOps to manage resources within your newly created Resource Group

  1. Click Service Connections in left navigation pane

  2. Select Create service connection - This authorizes Azure DevOps to manage your Azure resources on your behalf.
    Select Next.

  3. Select Azure Resource Manager
    Note: Service principal option is recommended

  4. Select your subscription level
    a. Subscription level scope is recommended.
    b. Select your Subscription.
    c. Define the Service Connection name (save the Service Connection name for reuse in the subsequent steps)
    Note: Recommended all lower case alphanumeric only
    d. Check the box for Grant access permission to all pipelines

  5. Input the same Resource group and Service connection name

  6. Select the checkbox for "Grant access permission to all pipelines"
    Note: Use alphanumeric lower case only, as multiple Azure services and resources are being used with different naming convention restrictions
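The naming recommendation above (lower case alphanumeric only) is easy to check before typing a name into the portal or the pipeline. This is a minimal sketch with a hypothetical helper name; it only checks the character set and does not enforce the length limits that individual Azure services may add.

```python
import re

# Lower-case letters and digits only, at least one character.
NAME_PATTERN = re.compile(r"[a-z0-9]+")


def is_valid_name(name):
    """Return True if the name is lower-case alphanumeric only."""
    return NAME_PATTERN.fullmatch(name) is not None


for candidate in ["bpademo01", "BPA-Demo", "bpa_demo"]:
    print(candidate, is_valid_name(candidate))
```

The same check applies to the service connection name here and to the project name filled in later in the pipeline YAML.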

7. Define Pipeline

  1. Navigate back to Pipelines in your left Navigation Pane
  2. Select Create Pipeline
  3. Select Azure Repos Git
  4. Select your previously cloned repo

8. Clone UI repo

This repo is used for the POC front end.
Fork the repository below to the same GitHub account that was used previously: https://github.com/jameshoff-msft/bpa-engine-frontend

  1. Ensure you are still logged in to your GitHub account
  2. Navigate to the above repo
  3. Select Fork in the upper right menu
  4. Select your GitHub account
    We will use the link to this newly forked repo (github.com/<your-github-account>/bpa-engine-frontend) in the next steps

9. Review your Pipeline YAML

We'll only need to update lines 12-17, using the following instructions:

  1. Azure subscription = service connection previously created
  2. Fill in Project name - must be unique (this name is used across most of the services created during this accelerator)
  3. Fill in resource group name
  4. Select your desired location
  5. Enter the URI of your previously forked bpa-engine-frontend repo
  6. Find your repository token
    i. On your GitHub repo page, click your profile
    ii. Select Settings
    iii. Select Developer settings at the bottom of the left navigation pane
    iv. Select Personal access tokens
    v. Select Generate personal access token
    vi. Under Select scopes, select the checkbox for workflow
    vii. Add your own description
    viii. Select Generate token
    ix. Copy your newly generated token
    Note: be sure to save this token for completing pipeline setup; otherwise the token will need to be regenerated
    x. Paste your newly generated token in the repositoryToken field

4. Save and Run!

Insert any commit message. You should see the pipeline stages workflow updating. Pipeline deployment will generally take several minutes. Monitor the status of your runs:

You can drill into each stage for a more detailed log.

5. Launch App

  1. Navigate to your Resource Group within your Azure Portal
  2. Select your static webapp
  3. Within the default Overview pane, select your URL to navigate to the newly launched web app

6. Load Documents!

Use Select PDF File to load your documents
Note: your documents should be in PDF or image format. The first document loaded may take several minutes; all subsequent documents should be processed much faster.

Check for your newly found custom entities!

You can further customize your UI via the front end repo https://github.com/<your-github-account>/bpa-engine-frontend. Simple instructions on how to quickly do so are coming soon.

Contacts

Please reach out to the AI Rangers for more info or feedback aka.ms/AIRangers

Roadmap

Priority | Item
Impending | Add instructions on basic UI customizations (e.g. adding header graphics, changing the title, etc.)
Impending | Add standard NER capability from Language Service (see "What is Named Entity Recognition (NER) in Azure Cognitive Service for Language")
TBD | Add text summarization
TBD | ...

References

Subject | Source (Link)
React source template | This project was bootstrapped with Create React App
Custom NER | https://github.com/microsoft/nlp-recipes/tree/master/examples/named_entity_recognition
Text Classification | https://github.com/microsoft/nlp-recipes/tree/master/examples/text_classification
