DrugBank 5.0: extract pubmed IDs for references #2

AlexanderHauser · 2017-08-21T21:34:07Z

ref_text = protein.findtext("{ns}references[@format='textile']".format(ns=ns))

doesn't seem to match anything in the latest DrugBank 5 release.

Any bugfix for this?
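For anyone reproducing this: `findtext` returns its default (`None`) when no element matches the path. A query that requires a `format='textile'` attribute on `<references>` therefore finds nothing once the element no longer carries that attribute. Below is a minimal sketch, assuming the DrugBank 5.0 default namespace `http://www.drugbank.ca` and a trimmed, hypothetical `<target>` record that uses the 5.0 layout:

```python
import xml.etree.ElementTree as ET

# Assumption: DrugBank 5.0 files declare this default namespace, and
# parse.ipynb uses the same "{...}" prefix when building paths.
ns = '{http://www.drugbank.ca}'

# Hypothetical, trimmed record following the 5.0 <references> layout.
xml = """
<target xmlns="http://www.drugbank.ca">
  <references>
    <articles>
      <article>
        <pubmed-id>10505536</pubmed-id>
        <citation>Turpie AG: Anticoagulants in acute coronary syndromes.</citation>
      </article>
    </articles>
    <textbooks/>
    <links/>
  </references>
</target>
"""
protein = ET.fromstring(xml)

# The old query only matches <references format="textile">. This
# <references> has no attributes, so nothing matches and findtext
# falls back to its default value, None.
ref_text = protein.findtext("{ns}references[@format='textile']".format(ns=ns))
print(ref_text)  # None
```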

dhimmel · 2017-08-23T15:49:30Z

Looks like DrugBank 5.0 uses a different schema for references. From https://www.drugbank.ca/releases/5-0-7/downloads/all-full-database, I see the following XML:

<references>
  <articles>
    <article>
      <pubmed-id>10505536</pubmed-id>
      <citation>Turpie AG: Anticoagulants in acute coronary syndromes. Am J Cardiol. 1999 Sep 2;84(5A):2M-6M.</citation>
    </article>
    <article>
      <pubmed-id>10912644</pubmed-id>
      <citation>Warkentin TE: Venous thromboembolism in heparin-induced thrombocytopenia. Curr Opin Pulm Med. 2000 Jul;6(4):343-51.</citation>
    </article>
    <article>
      <pubmed-id>11055889</pubmed-id>
      <citation>Eriksson BI: New therapeutic options in deep vein thrombosis prophylaxis. Semin Hematol. 2000 Jul;37(3 Suppl 5):7-9.</citation>
    </article>
    <article>
      <pubmed-id>11467439</pubmed-id>
      <citation>Fabrizio MC: Use of ecarin clotting time (ECT) with lepirudin therapy in heparin-induced thrombocytopenia and cardiopulmonary bypass. J Extra Corpor Technol. 2001 May;33(2):117-25.</citation>
    </article>
    <article>
      <pubmed-id>11807012</pubmed-id>
      <citation>Szaba FM, Smiley ST: Roles for thrombin and fibrin(ogen) in cytokine/chemokine production and macrophage adhesion in vivo. Blood. 2002 Feb 1;99(3):1053-9.</citation>
    </article>
    <article>
      <pubmed-id>11752352</pubmed-id>
      <citation>Chen X, Ji ZL, Chen YZ: TTD: Therapeutic Target Database. Nucleic Acids Res. 2002 Jan 1;30(1):412-5.</citation>
    </article>
  </articles>
  <textbooks/>
  <links/>
</references>
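If the citation text is wanted alongside each ID, the nesting above can be walked level by level (references, then articles, then article). This is a sketch, assuming the same `http://www.drugbank.ca` namespace and a hypothetical `<target>` wrapper that holds two of the articles shown:

```python
import xml.etree.ElementTree as ET

# Assumption: same default namespace as the DrugBank 5.0 download.
ns = '{http://www.drugbank.ca}'

xml = """
<target xmlns="http://www.drugbank.ca">
  <references>
    <articles>
      <article>
        <pubmed-id>10505536</pubmed-id>
        <citation>Turpie AG: Anticoagulants in acute coronary syndromes. Am J Cardiol. 1999 Sep 2;84(5A):2M-6M.</citation>
      </article>
      <article>
        <pubmed-id>10912644</pubmed-id>
        <citation>Warkentin TE: Venous thromboembolism in heparin-induced thrombocytopenia. Curr Opin Pulm Med. 2000 Jul;6(4):343-51.</citation>
      </article>
    </articles>
    <textbooks/>
    <links/>
  </references>
</target>
"""
protein = ET.fromstring(xml)

# Follow the explicit child path down to each <article> element.
path = '{ns}references/{ns}articles/{ns}article'.format(ns=ns)
articles = []
for article in protein.findall(path):
    # Each <article> holds one <pubmed-id> and one <citation> child.
    pmid = article.findtext(ns + 'pubmed-id')
    citation = article.findtext(ns + 'citation')
    articles.append((pmid, citation))

for pmid, citation in articles:
    print(pmid, citation)
```

Unlike a `//` descendant search, the explicit path ignores any `pubmed-id` elements that might appear under `<textbooks>` or `<links>`. Whether that matters depends on the full schema, which this sketch does not check.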

So parse.ipynb needs to be modified. You could use an XPath query to find all pubmed-id subelements of references, perhaps something like (untested):

pubmed_ids = protein.findall("{ns}references//{ns}pubmed-id".format(ns=ns))
row['pubmed_ids'] = '|'.join(x.text for x in pubmed_ids)

Let us know whether this works. Also, pull requests to upgrade this repo to DrugBank 5.0 would be welcome.

AlexanderHauser · 2017-08-23T20:15:35Z

Thanks for your quick response! Your suggested XPath query seems to work; only 3 entries returned *None*, which might be a database issue. I have no further upgrades to the repo for DrugBank 5.0 compatibility, so please go ahead with this (minor) change.

khughitt · 2019-08-20T23:09:29Z

In case it helps anyone else, the following changes (based on the suggestion above) fixed the issue for me:

pubmed_ids = protein.findall("{ns}references//{ns}pubmed-id".format(ns=ns))
row['pubmed_ids'] = '|'.join([x.text for x in pubmed_ids if x.text is not None])
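The `is not None` filter matters because `str.join` raises `TypeError` when any item is `None`, and an empty `<pubmed-id/>` element has `text` equal to `None` (possibly the source of the 3 *None* entries reported earlier in this thread). Here is a hypothetical helper wrapping the same query. It assumes the `http://www.drugbank.ca` namespace and additionally strips surrounding whitespace and skips blank IDs:

```python
import xml.etree.ElementTree as ET

NS = '{http://www.drugbank.ca}'  # assumption: DrugBank 5.0 default namespace


def extract_pubmed_ids(element, ns=NS):
    """Return a '|'-joined string of PubMed IDs found under element's <references>.

    Hypothetical helper: uses the same XPath as above, but also skips
    <pubmed-id> elements whose text is None or only whitespace.
    """
    nodes = element.findall('{ns}references//{ns}pubmed-id'.format(ns=ns))
    ids = [node.text.strip() for node in nodes if node.text and node.text.strip()]
    return '|'.join(ids)


xml = """
<target xmlns="http://www.drugbank.ca">
  <references>
    <articles>
      <article><pubmed-id>10505536</pubmed-id><citation>A</citation></article>
      <article><pubmed-id/><citation>B, no PubMed ID</citation></article>
      <article><pubmed-id>10912644</pubmed-id><citation>C</citation></article>
    </articles>
  </references>
</target>
"""
print(extract_pubmed_ids(ET.fromstring(xml)))  # 10505536|10912644
```

In a parsing loop this would be used as `row['pubmed_ids'] = extract_pubmed_ids(protein)`, where `row` and `protein` are the names from the snippets above.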

dhimmel changed the title from "find references" to "DrugBank 5.0: extract pubmed IDs for references" on Aug 23, 2017
