DrugBank 5.0: extract pubmed IDs for references #2

Open
AlexanderHauser opened this issue Aug 21, 2017 · 3 comments


AlexanderHauser commented Aug 21, 2017

ref_text = protein.findtext("{ns}references[@format='textile']".format(ns=ns))

doesn't seem to match anything in the latest DrugBank 5 release.

Is there a fix for this?

Owner

dhimmel commented Aug 23, 2017

Looks like DrugBank 5.0 uses a different schema for references. From https://www.drugbank.ca/releases/5-0-7/downloads/all-full-database, I see the following XML:

<references>
  <articles>
    <article>
      <pubmed-id>10505536</pubmed-id>
      <citation>Turpie AG: Anticoagulants in acute coronary syndromes. Am J Cardiol. 1999 Sep 2;84(5A):2M-6M.</citation>
    </article>
    <article>
      <pubmed-id>10912644</pubmed-id>
      <citation>Warkentin TE: Venous thromboembolism in heparin-induced thrombocytopenia. Curr Opin Pulm Med. 2000 Jul;6(4):343-51.</citation>
    </article>
    <article>
      <pubmed-id>11055889</pubmed-id>
      <citation>Eriksson BI: New therapeutic options in deep vein thrombosis prophylaxis. Semin Hematol. 2000 Jul;37(3 Suppl 5):7-9.</citation>
    </article>
    <article>
      <pubmed-id>11467439</pubmed-id>
      <citation>Fabrizio MC: Use of ecarin clotting time (ECT) with lepirudin therapy in heparin-induced thrombocytopenia and cardiopulmonary bypass. J Extra Corpor Technol. 2001 May;33(2):117-25.</citation>
    </article>
    <article>
      <pubmed-id>11807012</pubmed-id>
      <citation>Szaba FM, Smiley ST: Roles for thrombin and fibrin(ogen) in cytokine/chemokine production and macrophage adhesion in vivo. Blood. 2002 Feb 1;99(3):1053-9.</citation>
    </article>
    <article>
      <pubmed-id>11752352</pubmed-id>
      <citation>Chen X, Ji ZL, Chen YZ: TTD: Therapeutic Target Database. Nucleic Acids Res. 2002 Jan 1;30(1):412-5.</citation>
    </article>
  </articles>
  <textbooks/>
  <links/>
</references>

So you'll have to modify parse.ipynb. You could use an XPath query to find all pubmed-id subelements of references, perhaps something like this (untested):

pubmed_ids = protein.findall("{ns}references//{ns}pubmed-id".format(ns=ns))
row['pubmed_ids'] = '|'.join(x.text for x in pubmed_ids)

Let us know whether this works. Pull requests to upgrade this repo to DrugBank 5.0 would also be welcome.
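
As a quick check of the suggestion above, here is a minimal sketch that runs both the old and the new query against a cut-down copy of the XML. The `<target>` wrapper element and the namespace value `http://www.drugbank.ca` are assumptions made for illustration and are not taken from this thread, so adjust them to match your download.

```python
import xml.etree.ElementTree as ET

# Assumption: the DrugBank namespace, written in the {uri} form ElementTree expects.
ns = "{http://www.drugbank.ca}"

# Hypothetical <target> wrapper around two of the articles shown above.
xml_text = """
<target xmlns="http://www.drugbank.ca">
  <references>
    <articles>
      <article>
        <pubmed-id>10505536</pubmed-id>
        <citation>Turpie AG: Anticoagulants in acute coronary syndromes. Am J Cardiol. 1999 Sep 2;84(5A):2M-6M.</citation>
      </article>
      <article>
        <pubmed-id>10912644</pubmed-id>
        <citation>Warkentin TE: Venous thromboembolism in heparin-induced thrombocytopenia. Curr Opin Pulm Med. 2000 Jul;6(4):343-51.</citation>
      </article>
    </articles>
    <textbooks/>
    <links/>
  </references>
</target>
"""

protein = ET.fromstring(xml_text)

# Old query: <references> no longer carries a format attribute, so nothing matches.
old_ref_text = protein.findtext("{ns}references[@format='textile']".format(ns=ns))

# New query: every pubmed-id element at any depth below <references>.
pubmed_ids = protein.findall("{ns}references//{ns}pubmed-id".format(ns=ns))
joined = '|'.join(x.text for x in pubmed_ids)

print(old_ref_text)
print(joined)
```

On this sample the old query finds nothing, while the new one collects both IDs into a single pipe-delimited string.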

Author

AlexanderHauser commented Aug 23, 2017 via email

dhimmel changed the title from "find references" to "DrugBank 5.0: extract pubmed IDs for references" on Aug 23, 2017

khughitt commented Aug 20, 2019

In case it helps anyone else, the following changes (based on the suggestion above) fixed the issue for me:

pubmed_ids = protein.findall("{ns}references//{ns}pubmed-id".format(ns=ns))
row['pubmed_ids'] = '|'.join([x.text for x in pubmed_ids if x.text is not None])
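
The `is not None` filter matters because ElementTree reports the text of an empty element as `None`, which `str.join` rejects with a `TypeError`. The sketch below illustrates this, assuming that some `pubmed-id` elements in the release are empty. The helper name `join_pubmed_ids`, the `<target>` wrapper, the namespace value, and the second article entry are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: the DrugBank namespace, written in the {uri} form ElementTree expects.
ns = "{http://www.drugbank.ca}"

# Hypothetical sample: the second article is invented and has an empty pubmed-id.
xml_text = """
<target xmlns="http://www.drugbank.ca">
  <references>
    <articles>
      <article>
        <pubmed-id>10505536</pubmed-id>
        <citation>Turpie AG: Anticoagulants in acute coronary syndromes. Am J Cardiol. 1999 Sep 2;84(5A):2M-6M.</citation>
      </article>
      <article>
        <pubmed-id/>
        <citation>Made-up entry without a PubMed ID</citation>
      </article>
    </articles>
  </references>
</target>
"""


def join_pubmed_ids(protein, ns=ns):
    """Return the non-empty PubMed IDs under <references>, joined with '|'."""
    pubmed_ids = protein.findall("{ns}references//{ns}pubmed-id".format(ns=ns))
    return '|'.join(x.text for x in pubmed_ids if x.text is not None)


protein = ET.fromstring(xml_text)

# Without the filter, the None from the empty element makes join() fail.
try:
    '|'.join(x.text for x in protein.iter(ns + "pubmed-id"))
    unfiltered_ok = True
except TypeError:
    unfiltered_ok = False

print(unfiltered_ok)
print(join_pubmed_ids(protein))
```

Skipping empty elements keeps the row format unchanged; if you need to know which articles lack an ID, you would have to record that separately.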
