Search-query is a Python package for parsing, validating, simplifying, and serializing literature search queries. It currently supports PubMed and Web of Science, and can be extended to support other databases. By default, it relies on the JSON schema proposed by an expert panel (Haddaway et al., 2022). The package can be used programmatically or through the command line, has zero dependencies, and can therefore be integrated into a variety of environments. The heuristics, parsers, and linters are battle-tested on over 500 peer-reviewed queries registered at searchRxiv.
To install search-query, run:
pip install search-query
To create a query programmatically, run:
from search_query import OrQuery, AndQuery
# Typical building-blocks approach
digital_synonyms = OrQuery(["digital", "virtual", "online"], search_field="Abstract")
work_synonyms = OrQuery(["work", "labor", "service"], search_field="Abstract")
query = AndQuery([digital_synonyms, work_synonyms], search_field="Author Keywords")
Parameters:
- list of strings or queries: the search terms (strings) or nested queries to combine
- search field: the search field to which the query should be applied (available options: TODO: GIVE EXAMPLES AND LINK TO DOCS)
TODO: implement a user-friendly version of OrQuery / AndQuery, which accepts lists of strings/queries and search_fields as strings
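To make the building-blocks approach concrete, the sketch below models nested OR/AND groups with a hypothetical `BoolQuery` dataclass. It is not the package's `OrQuery`/`AndQuery` implementation and it ignores search fields; it only illustrates how nesting maps to a parenthesized Boolean string.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class BoolQuery:
    """Hypothetical stand-in for OrQuery/AndQuery (not the package's class)."""

    operator: str  # "OR" or "AND"
    operands: List[Union[str, "BoolQuery"]]  # search terms or nested queries

    def render(self) -> str:
        # Nested queries are wrapped in parentheses so precedence stays explicit.
        parts = [
            f"({op.render()})" if isinstance(op, BoolQuery) else op
            for op in self.operands
        ]
        return f" {self.operator} ".join(parts)


digital_synonyms = BoolQuery("OR", ["digital", "virtual", "online"])
work_synonyms = BoolQuery("OR", ["work", "labor", "service"])
query = BoolQuery("AND", [digital_synonyms, work_synonyms])
print(query.render())
```

Here `query.render()` yields `(digital OR virtual OR online) AND (work OR labor OR service)`. Parenthesizing every nested group avoids relying on each database's operator precedence.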
To load a JSON query file, run the parser:
from search_query.search_file import SearchFile
from search_query.parser import parse
search = SearchFile("search-file.json")
query = parse(search.search_string, syntax=search.platform)
Available platform identifiers are listed here.
To validate a JSON query file, run the linter:
from search_query.linter import run_linter
messages = run_linter(search.search_string, syntax=search.platform)
print(messages)
Linter messages are documented and explained here.
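As an illustration of the kind of check a linter performs, here is a minimal, hypothetical `lint_parentheses` function that reports unbalanced parentheses with their character positions. The package's linter covers far more issues; this sketch also does not skip parentheses inside quoted phrases.

```python
from typing import List


def lint_parentheses(search_string: str) -> List[str]:
    """Hypothetical check: report unbalanced parentheses with their positions."""
    messages = []
    open_positions = []  # stack of indices of "(" not yet closed
    for i, char in enumerate(search_string):
        if char == "(":
            open_positions.append(i)
        elif char == ")":
            if open_positions:
                open_positions.pop()  # this ")" closes the most recent "("
            else:
                messages.append(f"Unmatched ')' at position {i}")
    # Anything left on the stack was never closed.
    for i in open_positions:
        messages.append(f"Unmatched '(' at position {i}")
    return messages


print(lint_parentheses("(digital OR virtual AND work"))
```

For the example string, the function returns `["Unmatched '(' at position 0"]`.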
To simplify and format a query, run:
query.format()  # TBD: how to select/exclude rules?
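While the rule selection for `format()` is still open, two common simplifications can be sketched independently of the package: flattening nested groups that share an operator and dropping duplicate operands. The tuple representation and the `simplify` function below are hypothetical.

```python
from typing import List, Tuple, Union

# Hypothetical representation: a query is a term (str) or (operator, [operands]).
Query = Union[str, Tuple[str, list]]


def simplify(query: Query) -> Query:
    """Flatten nested groups with the same operator and drop duplicate operands."""
    if isinstance(query, str):
        return query
    operator, operands = query
    flat: List[Query] = []
    for op in (simplify(o) for o in operands):
        # (a OR (b OR c)) is equivalent to (a OR b OR c)
        if isinstance(op, tuple) and op[0] == operator:
            candidates = op[1]
        else:
            candidates = [op]
        for candidate in candidates:
            if candidate not in flat:  # (a OR a) is equivalent to a
                flat.append(candidate)
    # A group with a single operand is just that operand.
    return flat[0] if len(flat) == 1 else (operator, flat)


print(simplify(("OR", ["digital", ("OR", ["virtual", "digital"]), "online"])))
```

The example returns `('OR', ['digital', 'virtual', 'online'])`. Both rules preserve meaning because AND and OR are associative and idempotent.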
To translate a query to a particular database syntax and print it, run:
print(query.to_string(syntax="ebsco"))
print(query.to_string(syntax="pubmed"))
print(query.to_string(syntax="wos"))
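Much of a translation consists of rewriting search fields into each database's field tags and their placement. The sketch below uses a deliberately tiny, hypothetical lookup table (`FIELD_TAGS`) and helper functions; it is not the package's translation logic, which handles many more fields and syntax rules.

```python
from typing import Dict, List

# Hypothetical, deliberately tiny field-tag table (not the package's mapping).
FIELD_TAGS: Dict[str, Dict[str, str]] = {
    "pubmed": {"Title": "[ti]", "Title/Abstract": "[tiab]"},
    "wos": {"Title": "TI=", "Abstract": "AB=", "Author Keywords": "AK="},
}


def tag_term(term: str, search_field: str, syntax: str) -> str:
    """Attach the database-specific field tag to a single term."""
    tag = FIELD_TAGS[syntax][search_field]  # raises KeyError for unmapped fields
    if syntax == "pubmed":
        return f"{term}{tag}"  # PubMed appends the tag: digital[tiab]
    return f"{tag}{term}"  # Web of Science prefixes it: AB=digital


def or_group(terms: List[str], search_field: str, syntax: str) -> str:
    """Render an OR group with every term tagged for the target syntax."""
    return "(" + " OR ".join(tag_term(t, search_field, syntax) for t in terms) + ")"


print(or_group(["digital", "online"], "Title/Abstract", "pubmed"))
print(or_group(["digital", "online"], "Abstract", "wos"))
```

The two calls return `(digital[tiab] OR online[tiab])` and `(AB=digital OR AB=online)`, respectively.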
To write a query to a JSON file, run the serializer:
from search_query import save_file
save_file(
    filename="search-file.json",
    query_str=query.to_string(syntax="wos"),
    syntax="wos",
    authors=[{"name": "Tom Brady"}],
    record_info={},
    date={},
)
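For orientation, the round trip below writes and reads a search file with the standard `json` module. The key names are an assumption that mirrors the `save_file()` arguments and the `SearchFile` attributes used above (`search_string`, `platform`); consult the schema before relying on them.

```python
import json
import os
import tempfile

# Assumed key names, mirroring save_file() and SearchFile; not taken from the schema.
record = {
    "search_string": "TS=(digital OR virtual) AND TS=(work OR labor)",
    "platform": "wos",
    "authors": [{"name": "Tom Brady"}],
    "record_info": {},
    "date": {},
}

path = os.path.join(tempfile.mkdtemp(), "search-file.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump(record, f, indent=4)  # serialize

with open(path, encoding="utf-8") as f:
    loaded = json.load(f)  # parse it back into a dict

print(loaded["platform"], loaded["search_string"])
```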
The linter can also be run from the command line:
search-query lint search-file.json
Linters can be included as pre-commit hooks by adding the following to the `.pre-commit-config.yaml` file:
repos:
  - repo: local
    hooks:
      - id: search-file-lint
        name: Search-file linter
        entry: search-file-lint
        language: python
        files: \.json$
To activate and run:
pre-commit install
pre-commit run --all-files
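A pre-commit hook is a command that receives the files matching `files:` as arguments and fails the run when it exits with a non-zero status. The hypothetical `main` below sketches that contract for search files; it is not the package's `search-file-lint` entry point and only checks for an assumed `search_string` key.

```python
import json
import sys
from typing import List, Optional


def main(argv: Optional[List[str]] = None) -> int:
    """Hypothetical hook: pre-commit passes the matched file names as arguments."""
    paths = sys.argv[1:] if argv is None else argv
    failed = False
    for path in paths:
        with open(path, encoding="utf-8") as f:
            data = json.load(f)  # invalid JSON raises and also fails the hook
        if "search_string" not in data:  # assumed key name
            print(f"{path}: missing 'search_string'")
            failed = True
    return 1 if failed else 0  # non-zero exit makes pre-commit report a failure
```

Exposed as a console script, it would be invoked as `sys.exit(main())`.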
TODO: main citation
The package was developed as part of the following Bachelor's theses:
- Ernst, K. (2024). Towards more efficient literature search: Design of an open source query translator. Otto-Friedrich-University of Bamberg.
This Python package was developed with the purpose of integrating it into other literature management tools. If that is not your use case, the following related tools might be useful:
This project is distributed under the MIT License.