Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HED Validator integration into bids schema based validator. #1648

Open
wants to merge 23 commits into
base: master
Choose a base branch
from

Conversation

rwblair
Copy link
Member

@rwblair rwblair commented Mar 27, 2023

For #1639.

bids-validator/src/deps/hed-validator dependency was generated by running npm run build on latest commit for hed-javascript:
hed-standard/hed-javascript@bba927b

Two main functions located in bids-validator/src/validators/hed.ts are hedAccumulator and hedValidate

hedAccumulator is called once per context like other checks (filename validation, schema checks) in bids-validator/src/validators/bids.ts. It catches any events.tsv files and json files and builds up an object to be passed to the actual validator call.

Instead of recreating the old data structures that were passed into bids-validator/validators/events/hed.js I tried to get everything in shape a bit closer to where the outgoing calls were made, not sure if I got it quite right. The contexts have the proper sidecar data in them already so no need to mess with potentialSidecars.

hedValidate is called once, after the context walk is complete.

  • copy over the hed-validator issue to bids-validator issue functions, modify to work with new issue objects.
  • Add new issues to bids-validator/src/issues/list.ts as needed.
  • Copy over detectHed function to skip calls when no HED data present.
  • Pin dependency to actual release of hed-validator.
  • Add tests

Currently am getting "ERROR: [HED_SCHEMA_LOAD_FAILED] The supplied schema specification is invalid. Specification: no sche..." when run on bids-examples/eeg_ds003645s_hed

…ion. Add hed validator dep built from current master.
@rwblair rwblair marked this pull request as draft March 27, 2023 22:10
@codecov
Copy link

codecov bot commented Mar 28, 2023

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 85.68%. Comparing base (cfc4664) to head (0ac20d3).
Report is 156 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1648      +/-   ##
==========================================
- Coverage   88.57%   85.68%   -2.90%     
==========================================
  Files          39       91      +52     
  Lines        2372     3792    +1420     
  Branches      273     1220     +947     
==========================================
+ Hits         2101     3249    +1148     
- Misses        265      457     +192     
- Partials        6       86      +80     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@VisLab
Copy link
Member

VisLab commented Aug 9, 2023

@rwblair --- should we be nervous that this is PR is still a draft? Is there anything that we need to prepare for? Thx! VisLab

@rwblair rwblair marked this pull request as ready for review September 28, 2023 20:30
@rwblair rwblair requested a review from nellh September 28, 2023 20:31
@VisLab
Copy link
Member

VisLab commented Oct 16, 2023

I know you have been swamped, but could you give us an update on this @nellh? Thx.

Copy link
Member

@nellh nellh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is ready to merge. I ran through some test cases and was able to match the failures to the legacy validator. ds004395 runs out of memory with the default heap size but does pass increasing the size to 8GB. There could be some memory optimization to be done because this doesn't happen without the HED patches but that dataset is an exception, every other test case passed with the default heap size.

@VisLab
Copy link
Member

VisLab commented Jan 11, 2024 via email

@rwblair
Copy link
Member Author

rwblair commented Jan 12, 2024

Looks like the memory issue is coming from code I wrote, via the hedAccumulator in the bids-validator. Commenting out the call to the hed-validator and to the file accumulator allows ds004395 to finish validation. Commenting out just the call to the hed-validator but leaving the accumulator active causes it to run out of memory on default heap size.

I should finish profiling before running my mouth.

…l hed-validator call commented out for testing memory usage
@happy5214
Copy link
Collaborator

I'm not sure what's going on with the Deno build. The lines in the errors do not appear in the version of that file in our code base, and we don't use Buffer objects directly (though some dependencies might).

@happy5214
Copy link
Collaborator

@rwblair Has there been any further progress on this?

@rwblair
Copy link
Member Author

rwblair commented Jun 12, 2024

@happy5214 Merged master to get it up to date. The out of memory issue is from keeping an object around with all the sideccar and tsv data in memory while the file tree is walked. I think I was mirroring how the old validator called the hed validator. I'm going to try to:

  1. Detect which files have hed data at the accumulator step, to keep the arguments for hedValidator.validator.BidsDataset leaner.
  2. Have the accumulator only track file names and then reload tsv and json data at hed validation time after hopefully all the per bids file context objects have been garbage collected.
  3. Generate a hedValidator.validator.BidsDataset for each events file and all of its associated sidecars. Are there any HED rules that require multiple tsv files to be loaded at the same time for validation? or is it only that we need to ensure all the sidecars are present for each separate tsv file?

@VisLab
Copy link
Member

VisLab commented Jun 12, 2024

Are there any HED rules that require multiple tsv files to be loaded at the same time for validation? or is it only that we need to ensure all the sidecars are present for each separate tsv file?

HED processes each tsv and its sidecars separately and does not need to have multiple event files at one time.

@happy5214 will need to address the details of the other issues. Thanks for looking at this -- we'd like to get this resolved.

@rwblair
Copy link
Member Author

rwblair commented Jun 13, 2024

I rewrote bids-validator/src/validators/hed.ts to work on a file per context basis in line with the 3rd option in my last post. This allows ds004395 to complete validation.

I need to rewrite/add tests for this way of invoking HED validation.

Copy link
Member

@VisLab VisLab left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thx! In the longer term, we'll probably want to build the HED schema object from the dataset description before validation and pass it in as it doesn't change. We'll get back to you on this. @happy5214 comment?

@happy5214
Copy link
Collaborator

Thx! In the longer term, we'll probably want to build the HED schema object from the dataset description before validation and pass it in as it doesn't change. We'll get back to you on this. @happy5214 comment?

To be honest, hearing that is a little annoying, because that's how it was originally implemented, and then we implemented a design philosophy to move as much of the code as possible to hed-validator. The schema-loading function is still exported (hedValidator.validator.buildSchemas), but there's nowhere to pass it right now in the external API in order to run BIDS validation. I'll discuss more with @VisLab when we meet today.

@rwblair
Copy link
Member Author

rwblair commented Jun 17, 2024

For now its still passing the dataset_description to each validation call for each file. Not sure how inefficient this is CPU wise, but the memory issue is gone for now.

@VisLab
Copy link
Member

VisLab commented Jun 17, 2024 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants