ProtDomRetriever is a simple Python tool for retrieving protein domain information from the InterPro database based on UniProtKB accessions and specified InterPro entries. It parses InterPro JSON data and retrieves domain positions from a protein dataset.
Created by Nicolas-Frédéric Lipp, PhD.
- Retrieve domain information for multiple UniProtKB accessions
- Filter domains based on specified InterPro entries
- Generate TSV output with domain ranges
- Create FASTA files for the retrieved protein domains
- User-friendly GUI for file selection
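The retrieval and filtering features above can be sketched as follows. This is a minimal illustration, not the tool's actual code: the function name `extract_domain_ranges` is hypothetical, and the JSON layout (`results`, `metadata.accession`, `proteins`, `entry_protein_locations`, `fragments`) is an assumption about how an InterPro response for one protein is shaped. The sample payload and its coordinates are made up for the example.

```python
from typing import Dict, Iterable, List, Tuple


def extract_domain_ranges(
    payload: dict, wanted_entries: Iterable[str]
) -> Dict[str, List[Tuple[int, int]]]:
    """Collect (start, end) ranges per InterPro entry from one protein's JSON.

    Assumes an InterPro-like layout; adjust the keys if the real data differs.
    """
    wanted = set(wanted_entries)
    ranges: Dict[str, List[Tuple[int, int]]] = {}
    for result in payload.get("results", []):
        entry = result.get("metadata", {}).get("accession")
        if entry not in wanted:
            continue  # keep only the entries the user asked for
        for protein in result.get("proteins", []):
            for location in protein.get("entry_protein_locations", []):
                for fragment in location.get("fragments", []):
                    ranges.setdefault(entry, []).append(
                        (fragment["start"], fragment["end"])
                    )
    for entry in ranges:
        ranges[entry].sort()  # report ranges in sequence order
    return ranges


# Illustrative payload (hypothetical values, not real InterPro output).
sample_payload = {
    "results": [
        {
            "metadata": {"accession": "IPR000648"},
            "proteins": [
                {"entry_protein_locations": [
                    {"fragments": [{"start": 10, "end": 120}]}
                ]}
            ],
        },
        {
            "metadata": {"accession": "IPR001849"},
            "proteins": [
                {"entry_protein_locations": [
                    {"fragments": [{"start": 200, "end": 300}]}
                ]}
            ],
        },
    ]
}

print(extract_domain_ranges(sample_payload, ["IPR000648"]))
```

Keeping the parsing step separate from the network request makes it easy to check against saved JSON files.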
- Python 3.6+
- Required Python packages:
  - tkinter
  - requests
- Clone this repository: git clone https://github.com/yourusername/ProtDomRetriever.git
- Navigate to the project directory: cd ProtDomRetriever
- Install required packages: pip install -r requirements.txt
Run the script using Python:
python ProtDomRetriever.py
Follow the on-screen prompts to:
- Select an input file containing UniProtKB accessions
- Enter InterPro entries for domain filtering
- Choose whether to fetch FASTA files for the protein domains
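The first two prompts can be pictured as simple input cleaning. The sketch below is an assumption about reasonable input handling, not the script's actual behaviour: `read_accessions` and `parse_interpro_entries` are hypothetical helper names, the input file is assumed to hold one accession per line, and InterPro entries are assumed to be typed as a comma- or space-separated list of identifiers of the form `IPR` followed by six digits.

```python
import re
from typing import List

# InterPro entry identifiers: "IPR" followed by six digits.
INTERPRO_ID = re.compile(r"^IPR\d{6}$")


def read_accessions(text: str) -> List[str]:
    """Return unique, non-empty accessions in their original order."""
    seen = set()
    accessions = []
    for line in text.splitlines():
        accession = line.strip()
        if accession and accession not in seen:
            seen.add(accession)
            accessions.append(accession)
    return accessions


def parse_interpro_entries(raw: str) -> List[str]:
    """Split user input on commas/whitespace and validate each identifier."""
    entries = [token.upper() for token in re.split(r"[,\s]+", raw.strip()) if token]
    invalid = [entry for entry in entries if not INTERPRO_ID.match(entry)]
    if invalid:
        raise ValueError(f"Not valid InterPro entries: {', '.join(invalid)}")
    return entries


print(read_accessions("P12345\n\nQ9Y6K9\nP12345\n"))
print(parse_interpro_entries("ipr000648, IPR001849"))
```

Rejecting malformed identifiers early avoids sending requests that can never match an entry.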
The script generates three main output files in a new directory:
- *_result_table.tsv: A tab-separated file containing protein accessions, InterPro entries, and domain ranges
- *_domain_ranges.txt: A text file listing the domain ranges for each protein
- *_output_domains.fasta: A FASTA file containing the sequences of the retrieved protein domains (if FASTA retrieval is selected)
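To make the TSV and FASTA outputs concrete, here is a small sketch of how such files could be produced. The column names, the `accession/start-end` FASTA header, and the helper names `result_table_tsv` and `domain_fasta_record` are assumptions chosen for illustration; the files written by the script may be laid out differently. Domain coordinates are treated as 1-based and inclusive.

```python
import csv
import io
from typing import Iterable, Tuple


def result_table_tsv(rows: Iterable[Tuple[str, str, int, int]]) -> str:
    """Render (accession, entry, start, end) rows as tab-separated text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["accession", "interpro_entry", "start", "end"])
    writer.writerows(rows)
    return buffer.getvalue()


def domain_fasta_record(
    accession: str, sequence: str, start: int, end: int, width: int = 60
) -> str:
    """Cut a 1-based inclusive range out of a sequence and format it as FASTA."""
    if not 1 <= start <= end <= len(sequence):
        raise ValueError(f"Range {start}-{end} is outside the sequence")
    domain = sequence[start - 1:end]  # convert to Python's 0-based slicing
    lines = [domain[i:i + width] for i in range(0, len(domain), width)]
    return f">{accession}/{start}-{end}\n" + "\n".join(lines) + "\n"


print(result_table_tsv([("P12345", "IPR000648", 3, 6)]), end="")
print(domain_fasta_record("P12345", "MKTAYIAKQR", 3, 6), end="")
```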
Two example datasets are provided in the examples directory:
- ORP dataset (example1)
- Spectrin dataset (example2)
Each example includes input files, suggested InterPro entries, and sample output files.
Users can also submit the content of the output file *_domain_ranges.txt at https://www.uniprot.org/id-mapping, mapping UniProtKB AC/ID to UniProtKB, to retrieve the sequences manually, for instance as a comprehensive Excel file.
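As an illustration of what a list of domain ranges for manual lookup might look like, the sketch below formats each range as `ACCESSION[start-end]`. Both the bracket notation and the helper name `format_range_lines` are assumptions made for this example; check an actual *_domain_ranges.txt file for the exact format the script writes.

```python
from typing import Iterable, List, Tuple


def format_range_lines(ranges: Iterable[Tuple[str, int, int]]) -> List[str]:
    """Format (accession, start, end) tuples as 'ACCESSION[start-end]' lines.

    The bracket notation is an assumed layout, not confirmed by the tool.
    """
    return [f"{accession}[{start}-{end}]" for accession, start, end in ranges]


lines = format_range_lines([("P12345", 3, 6), ("Q9Y6K9", 10, 120)])
print("\n".join(lines))
```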
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.
If you encounter any problems or have any questions, please open an issue on the GitHub repository.
Nicolas-Frédéric Lipp, PhD
https://github.com/NicoFrL
This project was developed with the assistance of AI language models, which provided guidance on code structure, best practices, and documentation. The core algorithm and scientific approach were designed and implemented by the author.