Huge description.csv file size? #11

Closed
kvbulusu opened this issue Feb 22, 2020 · 3 comments

Comments

@kvbulusu

Hi, a quick question. When we run generatelabels.py, I see that descriptions.csv is almost 350 GB. How can I reduce the file size? Does it depend on the UMLS configuration? Could you share yours?

@leanderme
Owner

Hi, sorry for your troubles. This project was written rather quickly and the code base is a mess. The cause is most likely your UMLS installation. I suspect you've included too many languages, so I'd recommend selecting only those you're interested in (for instance, this project uses only ENG and GER). I don't have access to my workstation at the moment, but the installation process and configuration settings are well documented.
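If reinstalling UMLS with fewer languages is inconvenient, one workaround is to filter the concept file after the fact. The sketch below assumes that generatelabels.py reads concept strings from the standard pipe-delimited MRCONSO.RRF file, where the second field (LAT) holds the language code such as ENG or GER. That assumption is not confirmed anywhere in this thread. The function names `filter_by_language` and `filter_file` are hypothetical helpers, not part of this project.

```python
from typing import Iterable, Iterator

# Assumption: MRCONSO.RRF rows are pipe-delimited and LAT is the second field.
LAT_FIELD = 1


def filter_by_language(
    lines: Iterable[str], keep: frozenset = frozenset({"ENG", "GER"})
) -> Iterator[str]:
    """Yield only the rows whose LAT (language) field is in `keep`."""
    for line in lines:
        fields = line.rstrip("\n").split("|")
        if len(fields) > LAT_FIELD and fields[LAT_FIELD] in keep:
            yield line


def filter_file(src_path: str, dst_path: str,
                keep: frozenset = frozenset({"ENG", "GER"})) -> None:
    """Stream src_path to dst_path line by line, keeping only the chosen languages."""
    with open(src_path, encoding="utf-8") as src, \
         open(dst_path, "w", encoding="utf-8") as dst:
        for line in filter_by_language(src, keep):
            dst.write(line)
```

The filter streams the file line by line, so it never holds the whole table in memory. The rows below are made-up placeholders, not real UMLS data:

```python
rows = ["C1|ENG|P|x\n", "C1|FRE|P|x\n", "C2|GER|P|x\n"]
print(list(filter_by_language(rows)))  # ['C1|ENG|P|x\n', 'C2|GER|P|x\n']
```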

@leanderme
Owner

Closing for now, let me know if the issue persists.

@Duxtie

Duxtie commented Mar 11, 2020

Also, which semantic types are required? My descriptions.csv is approaching 1 TB.
Thank you.


3 participants