PR1: Add mkmmd loss #168
base: main
Changes from 180 commits
Very minor: the acronym capitalization we used for the loss function was MkMmd (vs. Mkmmd), since it's MK-MMD as an acronym and we treated the - as a space. We can do either, I don't really care, but maybe we can be consistent? We could also do the same for MR-MTL as either MrMtl or Mrmtl (I like the first a little better looking at it now).
I updated these, but MrMtlMkMmd looks so spiky :)
Yeah, it doesn't look great 😂. Still, it follows our convention I guess.
Setting a default empty dictionary here is dangerous (because Python is dumb), since dictionaries are mutable. Python evaluates the default value only once, at function definition time, so every instance constructed without an explicit argument ends up sharing the same dictionary object. Changes made through a1 then unexpectedly affect a2.
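For reference, here is a minimal sketch of the pitfall; the class and attribute names are hypothetical, not the ones in the PR:

```python
class FeatureStore:
    # BUG: the default dict is evaluated once, when the function is
    # defined, so every call that omits `features` shares the same object.
    def __init__(self, features={}):
        self.features = features


a1 = FeatureStore()
a2 = FeatureStore()
a1.features["layer1"] = [1, 2, 3]

# a2 is mutated too, because both instances hold the very same dict.
print(a2.features)                  # {'layer1': [1, 2, 3]}
print(a1.features is a2.features)   # True
```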
The standard way to implement a default value for these is to use an
Optional[Dict[...]] = None
annotation, and then in the
__init__
fall back to a fresh dictionary when the argument is None.
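A sketch of that pattern, again with hypothetical names:

```python
from typing import Dict, List, Optional


class FeatureStore:
    def __init__(self, features: Optional[Dict[str, List[float]]] = None) -> None:
        # A fresh dict is created per instance when the caller passes nothing,
        # so instances no longer share state through the default.
        self.features = features if features is not None else {}


a1 = FeatureStore()
a2 = FeatureStore()
a1.features["layer1"] = [1.0]
print(a2.features)  # {} -- unaffected by the mutation of a1
```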
Oh nooo that is really scary 😱
...initial global model of each round...
Maybe useful just to state that the keys here are the model layer names as extracted from the named_modules?
Very minor, but is there any reason we don't register the local hooks in setup_client where we create
self.local_feature_extractor
since that will only be run once anyway?
Ah, I see, if the hooks are removed for checkpointing, we need to put them back.
Maybe adding a note here would be good. That is, something like
I don't think you need this assertion, as its type is just
int
?
That is true.
If
beta_global_update_interval
is -1, it looks like we'll always skip the optimization of the betas, which, I think, would contradict the documentation above?
That is true. When
beta_global_update_interval
is -1, I update it in line 253 instead. The reason is that I don't want to fill the buffer with the whole dataset; I only pass in the computed features of the current batch.
Yeah I see what you mean there. Maybe we can just add a comment to make that obvious?
Rather than using the keys of
self.flatten_feature_extraction_layers
here, it seems cleaner to do
I may be misunderstanding the accumulating feature functionality here, it looks like we enable accumulation here, add the features and then turn it off immediately after and there are no other places where we flip these switches. Is there any reason we wouldn't just accumulate by default?
I disable it because of the memory issues we might hit if we always accumulate. Since the model hooks are applied and called whenever the model gets an input, if we don't disable accumulation, the features will keep accumulating.
Ah I think I see. If you don't disable accumulation, there will be accumulation during training as well. I was thinking that you clear the buffers immediately after filling them, so no need to worry about it, but that's not the case.
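To make the trade-off concrete, here is a torch-free sketch of the toggle being discussed (all names are hypothetical): the hook body fires on every forward pass, so accumulation is enabled only around the pass that fills the buffer and disabled again right after.

```python
class FeatureExtractorBuffer:
    """Hypothetical stand-in for the hook-owning extractor."""

    def __init__(self) -> None:
        self.accumulate_features = False
        self.features = []

    def forward_hook(self, output) -> None:
        # Called on every forward pass. If accumulation were never disabled,
        # ordinary training steps would keep appending and grow memory
        # without bound; by default we keep only the latest batch.
        if self.accumulate_features:
            self.features.append(output)
        else:
            self.features = [output]


extractor = FeatureExtractorBuffer()

# Ordinary training: hooks fire, but only the latest batch is retained.
for batch_output in ["b1", "b2", "b3"]:
    extractor.forward_hook(batch_output)
print(extractor.features)  # ['b3']

# Filling the buffer: flip the switch, run over the data, flip it back.
extractor.features = []
extractor.accumulate_features = True
for batch_output in ["b1", "b2", "b3"]:
    extractor.forward_hook(batch_output)
extractor.accumulate_features = False
print(extractor.features)  # ['b1', 'b2', 'b3']
```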
I sort of see what's happening when
beta_global_update_interval
is -1. If it's >0, then we use the whole training set to update the betas at the correct interval. If it's -1, it's updated only using the current batch statistics. I'm okay with leaving this option, but maybe we can be a bit more detailed in the documentation to explain this a bit more?
Exactly, sure I will try to add more documentation.
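One way to word the docstring addition is to pin down the cases explicitly. A hypothetical helper mirroring the control flow described above (the parameter name matches the one discussed, but the function itself is a sketch, not code from the PR):

```python
def beta_update_source(step: int, beta_global_update_interval: int) -> str:
    """Return where the betas are updated from at this training step."""
    if beta_global_update_interval == -1:
        # Update every step, using only the current batch's features.
        return "current_batch"
    if beta_global_update_interval > 0 and step % beta_global_update_interval == 0:
        # Update at the interval, using features from the whole training set.
        return "full_training_set"
    # Otherwise (off-interval steps): no beta update.
    return "no_update"


print(beta_update_source(5, -1))  # current_batch
print(beta_update_source(10, 5))  # full_training_set
print(beta_update_source(7, 5))   # no_update
```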
I think you can just add it to where you discuss what
-1
is above in the
__init__
docstring.