Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error handling when NCBI connection is lost #29

Closed
tiagofilipe12 opened this issue Apr 15, 2017 · 4 comments
Closed

Error handling when NCBI connection is lost #29

tiagofilipe12 opened this issue Apr 15, 2017 · 4 comments
Assignees
Milestone

Comments

@tiagofilipe12
Copy link
Member

tiagofilipe12 commented Apr 15, 2017

Often I lose connection to NCBI server (for some mysterious reason related with my router), but the interesting thing I found is that when connection is lost, bionode-ncbi returns an endless error message like the following:

try 21https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?&retmode=json&version=2.0&db=assembly&term=Stenotrophomonas%20ginsengisoli&usehistory=y
try 22https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?&retmode=json&version=2.0&db=assembly&term=Stenotrophomonas%20ginsengisoli&usehistory=y
try 23https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?&retmode=json&version=2.0&db=assembly&term=Stenotrophomonas%20ginsengisoli&usehistory=y
...

In this instances it would be better if bionode-ncbi returned a clearer error message. Like

"cannot download assembly Stenotrophomonas ginsengisoli because it was unable to establish a connection to NCBI link (https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?&retmode=json&version=2.0&db=assembly&term=Stenotrophomonas%20ginsengisoli&usehistory=y)"

The current error is not a NDJSON, thus it is not even "parsable" to other programs or scripts. But, I want to discuss if you think a "parsable" type of NDJSON would be useful in any circumstance, because from my experience if I cannot connect, I have to reset my router.
My idea is that we can convert the current string error message to a NDJSON, but would it be useful? In which circumstances?

Alternatively, if we get error messages related with the connection to NCBI, we could output a clearer error message in order to be more user friendly.

@thejmazz
Copy link
Member

I think a standard JSON schema for an "error chunk" could be beneficial, and we could use it across multiple bionode-* modules, maybe something like:

{
   module: 'bionode-ncbi',
   message: 'cannot download assembly Stenotrophomonas ginsengisoli because it was unable to establish a connection to NCBI link',
   sra_id: 12345,
   reason: 'network'
}

and it could be written to stderr while regular output sent to stdout. This would help greatly when handling errors in a big pipeline if each tool emits errors with a standard object schema. Or maybe each "error type" could declare what it's object schema is.

However, not sure how this plays with the paradigm where stderr is logging and stdout is data. Perhaps can emit from same stream (logging stream - stderr), but error object has isError: true or something.

Not sure what exactly this schema should contain (perhaps a simple { tool, message } is enough); might take some experimentation for us to see what is useful/required in a generic error logging utility. ndjson for errors makes perfect sense for reasons of consistency, parsability. Similar to the loading progress, there could be a "--pretty" option for transforming ndjson errors when running commands on CLI (e.g. formatting and red text).

A good place to test this out could be a pipeline that runs the same tasks on a bunch of SRA IDs, and some ideas are preemptively made incorrect (won't resolve in search), and so a bunch of reads for the pipeline won't get downloaded, but we should gracefully handle that case and give as much information to the user as possible.

There is also the issue of not "swallowing" the original errors from a module we imported (just something to remember and try to avoid), for example,

// `errorCb` is in scope

someModule.doStuff(function (err, data) {
   if (err) {
      // the true `err` is lost
      return errorCb(new Error(`Failed downloading ${sraId}`))
  }
 
  console.log(data)
})

@tiagofilipe12
Copy link
Member Author

tiagofilipe12 commented Apr 16, 2017

If we make this a general feature (error handling) I think the easier implementation would be something like {module, message}, since not all modules will have the same identifiers to output to json, however for this specific case, parsing the sra id might be important. For instance, one can be interested in storing

sra_id: 12345

in some text file for posterior checking. So maybe we could append an identifier to a standard NDJSON (global to all bionode) in this module (bionode-ncbi) that has the sra_id (which could be the input provided by the user, e.g., UID or species name) and the type of run that was being performed (download, fetch and so on...).

Then a --pretty option might be easily implemented for those that do not care less for the parsing of downloading error messages, like @thejmazz said.

However, I am concerned if we should move this to a bionode module thread, since it might be considered a general issue for all bionode, and then implement all specific error handling on each module.

@bmpvieira
Copy link
Member

Bioinformatic tools fail a lot for many reasons. Sometimes it's just because of one badly formatted input file out of many that we left running for the weekend while expecting to see some results by Monday. Most pipeline tools behave in a "if-error-crash" way and then wait for a human to fix and rerun them. Because of this, we want our pipelines to be robust and keep running like a web server instead of throwing errors that cause a crash.

So I think that having errors outputted as NDJSON to stderr would make it a lot easier to deal with them. While keeping the pipeline running, we could internally do things like try again, change parameters on the fly, email human, or use an external tool like a realtime GUI dashboard that reads those NDJSONs.

I think this reinforces that --pretty should just in fact be a bionode-pretty module that takes stderr and stdout NDJSON and print them in a human friendly format with colors. So bionode-pretty would be the only module that doesn't output NDJSON. We can still keep --pretty as a a shortcut for each module CLI by just require('bionode-pretty') and piping to it internally if needed, but this means having bionode-pretty as an extra dependency.

bmpvieira added a commit that referenced this issue Apr 22, 2017
Switch from minimist to yargs for CLI parsing
Add list of possible databases with yargs
Indent JSON output when --pretty
Allow piping JSON, even for fetch
bmpvieira added a commit that referenced this issue Apr 22, 2017
Switch from minimist to yargs for CLI parsing
Add list of possible databases with yargs
Indent JSON output when --pretty
Allow piping JSON, even for fetch
bmpvieira added a commit that referenced this issue Apr 22, 2017
Switch from minimist to yargs for CLI parsing
Add list of possible databases with yargs
Indent JSON output when `--pretty`
Allow piping JSON, even for fetch
@bmpvieira
Copy link
Member

In #32 I've changed the code so that in this case we just emit one error and then give a more user friendly message. Also, now --pretty indents the JSON output (similar to bionode-ncbi | json). However, I still think we need to improve a lot how errors are handled in general in all modules, so I think we should continue this discussion in bionode/bionode#41

bmpvieira added a commit that referenced this issue Apr 23, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants