deidentify

Deidentify people's names along with pronoun substitution

Synopsis

This is a command-line program that substitutes a person's given name and/or surname, along with any gender-specific pronouns. A Windows GUI for this program is also available.

Example

Input:
I think John Smith likes programming. You can tell he enjoys using Python.

Output:
I think PERSON likes programming. You can tell HE/SHE enjoys using Python.
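The transformation in the example above can be sketched in plain Python. This is only an illustration, not the tool's implementation: the actual program relies on spaCy to find names, whereas this sketch assumes the names are already known and passed in. The function name `deidentify`, the `PRONOUN_MAP` table, and every pronoun label other than `HE/SHE` are hypothetical.

```python
import re

# Hypothetical pronoun table. Only "HE/SHE" appears in the README example;
# the other labels are assumptions made for this sketch. The ambiguous
# word "her" (object or possessive) is deliberately left out.
PRONOUN_MAP = {
    "he": "HE/SHE",
    "she": "HE/SHE",
    "him": "HIM/HER",
    "his": "HIS/HER",
    "hers": "HIS/HERS",
    "himself": "HIMSELF/HERSELF",
    "herself": "HIMSELF/HERSELF",
}


def deidentify(text, names, replacement):
    """Replace each known name with `replacement`, then swap pronouns."""
    # Replace longer names first so "John Smith" wins over "John".
    for name in sorted(names, key=len, reverse=True):
        pattern = r"\b" + re.escape(name) + r"\b"
        # A function is used as the replacement so that backslashes in
        # `replacement` are never interpreted as regex escapes.
        text = re.sub(pattern, lambda _match: replacement, text)

    # Match whole-word pronouns only, ignoring case.
    pronouns = re.compile(r"\b(" + "|".join(PRONOUN_MAP) + r")\b", re.IGNORECASE)
    return pronouns.sub(lambda m: PRONOUN_MAP[m.group(1).lower()], text)


sample = "I think John Smith likes programming. You can tell he enjoys using Python."
result = deidentify(sample, ["John Smith"], "PERSON")
print(result)
```

In the real program the list of names would come from spaCy's named-entity recognition rather than being supplied by hand.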

Configuration

Installation

git clone https://github.com/jftuga/deidentify.git
python -m venv deidentify
cd deidentify
(Windows) - scripts\activate
(Linux/MacOS) - source bin/activate
python -m pip install --upgrade pip
pip install setuptools wheel
pip install spacy
python -m spacy download en_core_web_trf

Usage

usage: deidentify.py [-h] -r REPLACEMENT [-o OUTPUT_FILE] [-H] input_file

positional arguments:
  input_file            text file to deidentify

optional arguments:
  -h, --help            show this help message and exit
  -r REPLACEMENT, --replacement REPLACEMENT
                        a word/phrase to replace identified names with
  -o OUTPUT_FILE, --output_file OUTPUT_FILE
                        output file
  -H, --html            output in HTML format
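
The usage text above is consistent with an `argparse` parser along the following lines. This is a sketch reconstructed from the help output only, not the program's actual source; the helper name `build_parser` is hypothetical.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the documented command-line options."""
    parser = argparse.ArgumentParser(prog="deidentify.py")
    parser.add_argument("input_file", help="text file to deidentify")
    # -r is shown without brackets in the usage line, so it is required.
    parser.add_argument("-r", "--replacement", required=True,
                        help="a word/phrase to replace identified names with")
    parser.add_argument("-o", "--output_file", help="output file")
    parser.add_argument("-H", "--html", action="store_true",
                        help="output in HTML format")
    return parser


# Parse the same arguments used in the Operation examples.
args = build_parser().parse_args(["-r", "PERSON", "-o", "output.txt", "input.txt"])
print(args.replacement, args.output_file, args.html, args.input_file)
```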

Operation

-- Windows 

cd deidentify
scripts\activate
python deidentify.py -r PERSON -o output.txt input.txt
diff input.txt output.txt

-- Linux/macOS

cd deidentify
source bin/activate
python deidentify.py -r PERSON -o output.txt input.txt
diff input.txt output.txt

-- HTML Output

python deidentify.py -H -r PERSON -o output.htm input.txt

Possible Misses

Possible misses are listed as possible_misses in an intermediate JSON file, which is named input--tokens.json when input.txt is the input file.
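
To review these entries after a run, something like the following could be used. It is a minimal sketch under two assumptions that the README does not confirm: that the tokens file sits next to the input file, and that `possible_misses` is a top-level key of a JSON object. The helper names `tokens_path` and `load_possible_misses` are hypothetical.

```python
import json
from pathlib import Path


def tokens_path(input_file):
    """Derive the tokens file name, e.g. input.txt -> input--tokens.json."""
    path = Path(input_file)
    return path.with_name(path.stem + "--tokens.json")


def load_possible_misses(input_file):
    """Return the possible_misses entry from the tokens file.

    Assumes (not confirmed by the README) that the file holds a JSON
    object with a top-level "possible_misses" key.
    """
    with open(tokens_path(input_file), encoding="utf-8") as handle:
        data = json.load(handle)
    return data.get("possible_misses", [])
```

Calling `load_possible_misses("input.txt")` after running the program would then return whatever was recorded under that key, or an empty list if the key is absent.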
