Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Word2Vec tests #173

Merged
merged 4 commits into from
Jun 27, 2018
Merged

Word2Vec tests #173

merged 4 commits into from
Jun 27, 2018

Conversation

meiamsome
Copy link
Contributor

@meiamsome meiamsome commented Jun 26, 2018

This tweaks the karma config to webpack the ml5 package similarly to the standard webpack outputs. This is because there is some weird behaviour in tfjs (I think, not 100%) that results in tensors being returned from functions like tf.add being disposed before being returned when there is more than one instance on the page. This means that the testing files can no longer import the files they are testing, instead they have to rely on the global ml5 variable. This is likely the issue seen in #151 where Tensors aren't being correctly type-checked in tfjs, as that's what I narrowed the weird behaviour that caused the premature disposal I saw here to as well.

I then added a basic test suite for Word2Vec, which flagged up the fact that there were a lot of leaking tensors, so I fixed those.

- Fixes a weird bug where tensors were spuriously disposed by tfjs in testing environment due to there being multiple tfjs instances present on the page from the different builds
- Improves stack traces to make debugging easier by disabling minification in karma webpack.
- Only webpack the main package, this means that the test files have to use the global ml5 object.
- validates that there are no leaked tensors
- validates basic functionality of nearest
- validates that add, subtract and average return things

- Fixes memory leaks in add, subtract, average, and addOrSubtract functions
- Adds a general dispose to the Word2Vec class
describe('subtract', () => {
it('returns a value', () => {
let word1 = word2vecInstance.getRandomWord();
let word2 = word2vecInstance.getRandomWord();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it technically possible the same word could be picked twice?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is, but the probability is 0.0001%

@shiffman
Copy link
Member

Thank you @meiamsome, amazing! ❤️

@shiffman shiffman added the tests Label for unit testing related items. label Jun 26, 2018
@shiffman
Copy link
Member

FYI I made a "tests" label which we can add to issues and pull requests related to unit testing!

'src/**/*_test.js',
],
preprocessors: {
'src/**/*_test.js': ['webpack'],
'src/index.js': ['webpack'],
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We had this config wrong the whole time? Why aren't the *_test.js files being processed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So what was happening is each *_test.js file was including all of the dependencies and webpacking that. That would probably be fine, but for some reason tfjs was struggling with that - somehow the Tensors from one instance were going through functions from the other instance and not being checked correctly, as they wouldn't be an instance of the Tensor class from that version of tfjs. This just builds the dependencies of all test suites (aka the whole library) and includes that first, so you only get one version of all the deps

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you pass a Tensor through multiple instances of TensorFlow.js, we fail when doing input validation for op calls because "instanceof Tensor" refers to the wrong Tensor instance (from the other instantiation of TensorFlow.js). This problem is artificial and not actually fundamental -- I will send an PR to fix this in TensorFlow.js core, however it's less pressing now that you've solved it here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

amazing, thanks for the detailed description @meiamsome!

@cvalenzuela
Copy link
Member

Thanks @meiamsome! this is great! This will help get a bunch of PR merged.
I think we should just we add a script in package.json to run a single test,

"test:single": "./node_modules/karma/bin/karma start karma.conf.js --single-run",

@nsthorat, I think this solves the issues we were discussing.

@nsthorat
Copy link
Contributor

--single-run is usually not what you want.

Two ways to just run one test:

  1. "it" => "fit"
  2. Use yarn test --grep myTestName to run the single test. There's a little setup you have to add: https://github.com/tensorflow/tfjs-core/blob/master/karma.conf.js#L72

@cvalenzuela
Copy link
Member

@nsthorat, I'm just curious why is it better to run single tests instead of a --single-run?

@nsthorat
Copy link
Contributor

You can run --single-run, but the test suite dies after running the single test. Totally up to you, but it slows things down to have to continually restart a browser.

@nsthorat
Copy link
Contributor

Also nice work on this PR! I was going to make the change in tfjs-core to allow multiple instances of tensorflowjs, but sharing a ML5 and transitively a tensorflowjs instance is what you want to do.

return Word2Vec.nearest(this.model, sum, inputs.length, inputs.length + max);
return tf.tidy(() => {
const sum = Word2Vec.addOrSubtract(this.model, inputs, 'ADD');
console.log(sum);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you forgot the remove this log

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch :)

@meiamsome
Copy link
Contributor Author

Re: --single-run this would be very helpful to me for development as often I'll make a small change and just want to verify it without keeping the browser open (Like when removing the console.log I left in this PR). Also, I would appreciate a lint command in the package.json so that I can lint without having to build, but that might just be me having a weird workflow

@nsthorat
Copy link
Contributor

nsthorat commented Jun 26, 2018

Sure, regarding --single-run, it's a tradeoff. We don't use --single-run anywhere in TensorFlow.js because usually when we develop we want to be able to quickly see whether unit tests are failing / passing as testing is a core part of our development loop (keeping the browser window open the whole time).

Totally your call.

@cvalenzuela
Copy link
Member

I think this is ready to merge! Let's just add the --single-run script, I think is better to have it.

@meiamsome
Copy link
Contributor Author

@cvalenzuela Ok, I added that for you

@cvalenzuela
Copy link
Member

awesome, thanks @meiamsome!

@cvalenzuela cvalenzuela merged commit 82843f8 into ml5js:master Jun 27, 2018
@shiffman
Copy link
Member

woweeee amazing work @meiamsome @cvalenzuela @nsthorat!

joeyklee pushed a commit that referenced this pull request Jun 2, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
tests Label for unit testing related items.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants