Skip to content

A tool that can mask words that match regular expression, keywords or PII (Personally Identifiable Information) in an image file.

License

Notifications You must be signed in to change notification settings

aws-samples/mask-words-in-image

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

25 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

mask-words-in-image

Introduction

A tool that can mask words in an image file. User specifies which words should be masked, it can be any combinations of the following three types.

  • Regular expressions
  • Keywords
  • PII (Personally Identifiable Information)

workflow

Under the hood, it use the Text Detection capability in Amazon Rekognition or AWS Textract to detect the texts in the image, then use Amazon Comprehend to detect PII entities in texts.

There are many use cases for such tooling, e.g masking sensitive information (PII) in the shared images on social media, 2nd hand car sales website and so on. The repository also includes a sample code of using this tool to detect and mask senstive information in Slack.

Requirements

  • Python 3.8+
  • AWS account

Usage

  1. Install the modules pip install -r src/requirements.txt.

  2. Set up your AWS credential.

  3. Follow the below usage to mask your first image. Here are some examples:

    • Mask AWS account ID in the design diagram: python src/mask_it.py -i assets/design_diagram.png -r rules.yaml
    • Mask the keyword "stack-set-id" in the code sample: python src/mask_it.py -i assets/code_sample.png -k stack-set-id
usage: mask_it.py [-h] -i INPUT_FILE [-o OUTPUT_FILE] [-r RULES_FILE] [-k KEYWORDS] [-e EXCLUDE_WORDS] [-t {document,photo}] [--pii] [--pii-confidence-threshold PII_CONFIDENCE_THRESHOLD] [--use-grey-rectangle] [--verbose]

options:
  -h, --help            show this help message and exit
  -i INPUT_FILE, --input-file INPUT_FILE
                        Original image file
  -o OUTPUT_FILE, --output-file OUTPUT_FILE
                        Masked image output file
  -r RULES_FILE, --rules-file RULES_FILE
                        Rules configuration file
  -k KEYWORDS, --keywords KEYWORDS
                        Mask the keywords (case insensitive)
  -e EXCLUDE_WORDS, --exclude-words EXCLUDE_WORDS
                        Exclude the words (case insensitive)
  -t {document,photo}, --image-type {document,photo}
                        Image type (default is document)
  --pii                 Mask Personally Identifiable Information (PII)
  --pii-confidence-threshold PII_CONFIDENCE_THRESHOLD
                        PII detection confidence threshold (in percentage). Default is 0.8
  --use-grey-rectangle  Use grey rectangle masking (default is asterisk)
  --verbose             Debugging mode

Tips

  • Use -t photo if the image is a photo of the real world (by default it is -t document).
  • Use -k <keyword1> -k <keyword2>... to specify mutliple keywords.
  • Use -e <keyword1> -e <keyword2>... to exclude multiple keywords (e.g exclude the words from the rules or pii).
  • Use --pii to mask PII in an image, e.g vehicle registration plate in a car image.
  • Use --pii-confidence-threshold to adjust the PII detection confidence threshold (in percentage).
  • Use --use-grey-rectangle if you prefer grey rectangle rather than asterisks.
  • Use --verbose to see detailed information.

Security

See CONTRIBUTING for more information.

License

This library is licensed under the MIT-0 License. See the LICENSE file.

About

A tool that can mask words that match regular expression, keywords or PII (Personally Identifiable Information) in an image file.

Topics

Resources

License

Code of conduct

Security policy

Stars

Watchers

Forks

Languages