Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

xena_dataset.py: Add logging functionality #45

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

ayan-b
Copy link
Collaborator

@ayan-b ayan-b commented Jun 15, 2019

No description provided.

@ayan-b ayan-b force-pushed the improvements branch 3 times, most recently from c17e851 to 30cb8aa Compare June 15, 2019 09:09
@ayan-b ayan-b changed the title xena_dataset.py: Raise exception if gdc.search() fails xena_dataset.py: Improvements Jun 15, 2019
Copy link
Collaborator

@yunhailuo yunhailuo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I never know for sure? Is there a buffer for log file so that you can't trace the progress in real time?

@@ -36,6 +37,8 @@
GDC_RELEASE_URL,
)

logging.basicConfig(format='%(asctime)s - %(message)s', level=logging.INFO)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, logger is configured in gdc2xena.py. I believe you should getLogger as self.logger when an object is instantiated: https://docs.python.org/3/howto/logging.html#advanced-logging-tutorial

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Difficulties would be the functions below. It's not quite normal to have a logger passed over to a function, is it? Maybe we can leave them along for now and just deal with the dataset classes.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One thing I was wondering how can I get the root_dir without making significant changes in the code.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if we use the logger in the module level, functions can be covered as well.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. You don't get root_dir. You get the instance. Example: yunhailuo@d72d5ad

I run xge etl -p TCGA-OV -t muse_snv & and get a log as:

2019-06-16 22:31:34 Xena-GDC-ETL [INFO]: Importing [1/1] projects: TCGA-OV
2019-06-16 22:31:34 root [INFO]: Starts to download...
2019-06-16 22:31:34 root [INFO]: This is from read_by_ext.
2019-06-16 22:31:34 root [ERROR]: Test error message!
2019-06-16 22:31:34 Xena-GDC-ETL [WARNING]: No muse_snv data for cohort TCGA-OV.
Traceback (most recent call last):
  File "/Users/yunhailuo/Github/xena-GDC-ETL/xena_gdc_etl/gdc2xena.py", line 87, in gdc2xena
    dataset.download().transform().metadata()
  File "/Users/yunhailuo/Github/xena-GDC-ETL/xena_gdc_etl/xena_dataset.py", line 652, in download
    raise ValueError(msg)
ValueError: Test error message!

I'm not sure if your code has something special or you haven't notice this. If I do what you've done here in my test branch. I end up with having the log going to stdout instead of a file. To me, this is because xena_database.py is imported before logging.basicConfig in gdc2xena.py which makes the config here take precedence because "This function does nothing if the root logger already has handlers configured for it."

The reason I don't like logging/logger in function is not because they can't be covered. Those function are in a way general functions. I don't want to have log there (which is equivalent to print when there is no log) to mess up other potential logs. But I guess we can use logger since it couldn't hurt that much for a log... And they are not really "general"...

Copy link
Collaborator Author

@ayan-b ayan-b Jun 17, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, another basicConfig is not required.

For 31b69b6 + basicConfig I end up with 2 logs.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you help me with this (yunhailuo@a447162) or share the code again? I couldn't get 2 logs and wondering how that works.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, two logs can be useful to accommodate your unfinished.json and long less useful info. I just don't know a simple clean way to do it.

Copy link
Collaborator Author

@ayan-b ayan-b Jun 17, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay, I used some FileHandler if I remember correctly. Sending the code in a while.
UPD: Something like:

log_format = '%(asctime)-15s %(name)s [%(levelname)s]: %(message)s'
logger = logging.getLogger('Xena-GDC-ETL')
logger.setLevel(logging.INFO)
formatter = logging.Formatter(log_format)
file_handler = logging.FileHandler('etl_' + time.strftime("%Y%m%d-%H%M%S") + '.err',)
file_handler.setFormatter(formatter)
logger.addHandler(file_handler)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worry. That's the way I don't prefer for now. I thought your initial code without handler will do.

@@ -475,7 +487,8 @@ def projects(self, projects):
elif isinstance(projects, list):
self._projects = projects
else:
raise ValueError('"projects" must be either str or list.')
logging.exception('"projects" must be either str or list.')
raise ValueError
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Errors raised should still have their message. Same below. You can do a msg string.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How is this resolved?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I must have misunderstood. Can you please clarify? I thought something like this: https://github.com/yunhailuo/xena-GDC-ETL/pull/45/files#diff-ff44a6f244700f099c71d2285046c32aR487 .

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry. I didn't notice that. You are right.

A side note: is it a right use of logging.exception? Should this function "only be called from an exception handler"?

Copy link
Collaborator Author

@ayan-b ayan-b Jun 17, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to work, we can replace this by logging.error(msg, exc_info=True).

xena_gdc_etl/xena_dataset.py Outdated Show resolved Hide resolved
xena_gdc_etl/xena_dataset.py Outdated Show resolved Hide resolved
xena_gdc_etl/xena_dataset.py Outdated Show resolved Hide resolved
xena_gdc_etl/xena_dataset.py Outdated Show resolved Hide resolved
@ayan-b
Copy link
Collaborator Author

ayan-b commented Jun 16, 2019

I think logging.infos in this PR maybe done using print. They are populating the log file too much.

@yunhailuo
Copy link
Collaborator

I think logging.infos in this PR maybe done using print. They are populating the log file too much.

We should trim it if that's reasonable. Wondering how the log currently looks like before we trimming it?

@ayan-b
Copy link
Collaborator Author

ayan-b commented Jun 17, 2019

I didn't do full run, but it looks like this:

2019-06-16 21:30:37 [INFO]: Starts to download...
2019-06-16 21:30:37 [INFO]: Searching for raw data ...
2019-06-16 21:30:40 [INFO]: 
499 files found for mirna data of ['TCGA-OV'].
2019-06-16 21:30:42 [INFO]: 
[1/499] Downloading to "/run/media/ayanb/Development/Open-Source/xena/GDC_data/TCGA-OV/Raw_Data/mirna/TCGA-13-1512-01A.bbc678f8-ce50-43ae-96dd-b7e1da51ebab.txt" ...
2019-06-16 21:30:43 [INFO]: 
[2/499] Downloading to "/run/media/ayanb/Development/Open-Source/xena/GDC_data/TCGA-OV/Raw_Data/mirna/TCGA-13-0912-01A.3c32784a-6a7b-4cb7-888a-387dc1bb66ee.txt" ...
2019-06-16 21:30:45 [INFO]: 
[3/499] Downloading to "/run/media/ayanb/Development/Open-Source/xena/GDC_data/TCGA-OV/Raw_Data/mirna/TCGA-61-2094-01A.2fabb2d2-ffd2-4fa0-accf-360cbcf0002b.txt" ...
2019-06-16 21:30:47 [INFO]: 
[4/499] Downloading to "/run/media/ayanb/Development/Open-Source/xena/GDC_data/TCGA-OV/Raw_Data/mirna/TCGA-13-0924-01A.461c4bf9-7207-457d-9113-8a0364a141c6.txt" ...
2019-06-16 21:30:49 [INFO]: 
[5/499] Downloading to "/run/media/ayanb/Development/Open-Source/xena/GDC_data/TCGA-OV/Raw_Data/mirna/TCGA-23-2072-01A.305f5852-36a9-4e9f-a1e6-e120799da4b8.txt" ...
2019-06-16 21:30:51 [INFO]: 
[6/499] Downloading to "/run/media/ayanb/Development/Open-Source/xena/GDC_data/TCGA-OV/Raw_Data/mirna/TCGA-13-A5FU-01A.acbe9b7c-259d-4a60-827e-5611764a03d2.txt" ...
2019-06-16 21:30:53 [INFO]: 
[7/499] Downloading to "/run/media/ayanb/Development/Open-Source/xena/GDC_data/TCGA-OV/Raw_Data/mirna/TCGA-13-0805-01A.60f18ee9-9203-43c0-b00e-5e99912ba7ff.txt" ...
2019-06-16 21:30:55 [INFO]: 
[8/499] Downloading to "/run/media/ayanb/Development/Open-Source/xena/GDC_data/TCGA-OV/Raw_Data/mirna/TCGA-24-1563-01A.08782f35-bac0-436a-8591-4a7e85d23fc9.txt" ...

@yunhailuo
Copy link
Collaborator

yunhailuo commented Jun 17, 2019

I didn't do full run, but it looks like this:

2019-06-16 21:30:37 [INFO]: Starts to download...
2019-06-16 21:30:37 [INFO]: Searching for raw data ...
2019-06-16 21:30:40 [INFO]: 
499 files found for mirna data of ['TCGA-OV'].
2019-06-16 21:30:42 [INFO]: 
[1/499] Downloading to "/run/media/ayanb/Development/Open-Source/xena/GDC_data/TCGA-OV/Raw_Data/mirna/TCGA-13-1512-01A.bbc678f8-ce50-43ae-96dd-b7e1da51ebab.txt" ...
2019-06-16 21:30:43 [INFO]: 
[2/499] Downloading to "/run/media/ayanb/Development/Open-Source/xena/GDC_data/TCGA-OV/Raw_Data/mirna/TCGA-13-0912-01A.3c32784a-6a7b-4cb7-888a-387dc1bb66ee.txt" ...
2019-06-16 21:30:45 [INFO]: 
[3/499] Downloading to "/run/media/ayanb/Development/Open-Source/xena/GDC_data/TCGA-OV/Raw_Data/mirna/TCGA-61-2094-01A.2fabb2d2-ffd2-4fa0-accf-360cbcf0002b.txt" ...
2019-06-16 21:30:47 [INFO]: 
[4/499] Downloading to "/run/media/ayanb/Development/Open-Source/xena/GDC_data/TCGA-OV/Raw_Data/mirna/TCGA-13-0924-01A.461c4bf9-7207-457d-9113-8a0364a141c6.txt" ...
2019-06-16 21:30:49 [INFO]: 
[5/499] Downloading to "/run/media/ayanb/Development/Open-Source/xena/GDC_data/TCGA-OV/Raw_Data/mirna/TCGA-23-2072-01A.305f5852-36a9-4e9f-a1e6-e120799da4b8.txt" ...
2019-06-16 21:30:51 [INFO]: 
[6/499] Downloading to "/run/media/ayanb/Development/Open-Source/xena/GDC_data/TCGA-OV/Raw_Data/mirna/TCGA-13-A5FU-01A.acbe9b7c-259d-4a60-827e-5611764a03d2.txt" ...
2019-06-16 21:30:53 [INFO]: 
[7/499] Downloading to "/run/media/ayanb/Development/Open-Source/xena/GDC_data/TCGA-OV/Raw_Data/mirna/TCGA-13-0805-01A.60f18ee9-9203-43c0-b00e-5e99912ba7ff.txt" ...
2019-06-16 21:30:55 [INFO]: 
[8/499] Downloading to "/run/media/ayanb/Development/Open-Source/xena/GDC_data/TCGA-OV/Raw_Data/mirna/TCGA-24-1563-01A.08782f35-bac0-436a-8591-4a7e85d23fc9.txt" ...

Did you say:

end does work for logs, \r and \n work.

?
Seems like that doesn't work? Or am I missing something?

@ayan-b
Copy link
Collaborator Author

ayan-b commented Jun 17, 2019

Sorry, I meant to say end doesn't work. 😅

@ayan-b ayan-b force-pushed the improvements branch 2 times, most recently from a9614ca to 0e20436 Compare June 17, 2019 07:27
@ayan-b ayan-b changed the title xena_dataset.py: Improvements xena_dataset.py: Add logging functionality Jun 17, 2019
@yunhailuo
Copy link
Collaborator

yunhailuo commented Jun 17, 2019

I'd say put this on hold and re-focus on data importing for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants