
Fix custom_objects for regularizers and other issues #5012

Merged (7 commits) on Jan 21, 2017

Conversation

@bstriner (Contributor)

Hi everybody!

Please let me know your thoughts regarding this quick fix and slight quality-of-life improvement for custom_objects.

  • Instead of copying custom objects into globals(), they are copied into a separate dictionary.
  • get_from_module checks that dictionary before checking the requested module.
  • Users can clear the custom objects or otherwise manipulate the dictionary if they want to.

This also means that custom_objects work for custom regularizers.
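The lookup order described above can be sketched roughly as follows. This is a minimal standalone sketch with illustrative names, not the PR's exact code:

```python
# Sketch of the lookup order this PR introduces: a shared custom-object
# registry consulted before the requested module's own namespace.
_CUSTOM_OBJECTS = {}


def get_custom_objects():
    # Expose the registry so users can inspect, clear, or extend it.
    return _CUSTOM_OBJECTS


def get_from_module(identifier, module_params, module_name):
    # 1. Custom objects take precedence over module members.
    if identifier in _CUSTOM_OBJECTS:
        return _CUSTOM_OBJECTS[identifier]
    # 2. Fall back to the requested module's namespace.
    if identifier in module_params:
        return module_params[identifier]
    raise ValueError('Invalid ' + module_name + ': ' + identifier)
```

With a lookup like this, registering a class under a string name makes that string resolvable even though the class is not defined in the keras module being searched.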

A side effect of the code is that you now can do something like:

get_custom_objects()["CustomRegularizer"] = CustomRegularizer
d = Dense(dim, activity_regularizer='CustomRegularizer')

The code below is currently broken, but works with this PR.

#4990

level_1_dim = 256
level_2_dim = 64
encoding_dim = 16

from keras import regularizers
from keras.layers import Input, Dense
from keras.models import Model, model_from_json

class CustomRegularizer(regularizers.ActivityRegularizer):
  def __call__ (self, x):
    # this is just for purposes of demonstrating a bug
    return super(CustomRegularizer, self).__call__(x)

input_img = Input(shape=(784,))
encoded = Dense(level_1_dim, activation='relu')(input_img)
encoded = Dense(level_2_dim, activation='relu')(encoded)
encoded = Dense(encoding_dim, activation='relu', activity_regularizer=CustomRegularizer(1e-6))(encoded)
decoded = Dense(level_2_dim, activation='relu')(encoded)
decoded = Dense(level_1_dim, activation='relu')(decoded)
decoded = Dense(784, activation='sigmoid')(decoded)

autoencoder = Model(input=input_img, output=decoded)

encoder = Model(input=input_img, output=encoded)

with open('auto.encoder.model', 'w') as outfile:
  outfile.write(encoder.to_json())

"""
# This was the workaround, no longer required
import keras.regularizers
keras.regularizers.CustomRegularizer=CustomRegularizer
"""

with open('auto.encoder.model', 'r') as infile:
  encoder = model_from_json(infile.read(), custom_objects={"CustomRegularizer":CustomRegularizer})

As a side note, I think

for cls_key in custom_objects:
    globals()[cls_key] = custom_objects[cls_key]

could be replaced by globals().update(custom_objects).

There are some places in the code that access globals directly instead of going through get_from_module. If we use get_from_module consistently, then there is a consistent place where we can deal with customizations. I didn't make those fixes yet, but I could, based on discussion.

Cheers,
Ben

@bstriner (Contributor Author)

Just realized this is also a fix for the contrib repository. #4944

@bstriner (Contributor Author)

The rest of the code should be checked to always use get_from_module instead of directly accessing globals. I can do that as part of this PR or separately.

@bstriner bstriner mentioned this pull request Jan 12, 2017
@@ -9,10 +9,21 @@
import marshal
import types as python_types

global_custom_objects = {}
@fchollet (Member)

This is unsafe, because it would be reused across different unrelated calls to layers_from_config.

I would suggest using a custom object scope inside layers_from_config:

with custom_objects_scope(custom_objects):
   layer = ...

Or something like that.

@bstriner (Contributor Author)

Thanks @fchollet! The code currently has that behavior, but it might be worth changing. It seems inconsistent with other usages, but I didn't want to change something in case someone was depending on it.

Current keras behavior:

config = ...
# This call passes custom_objects
layer_from_config({'class_name': 'Qwerty', 'config': config}, {'Qwerty': Dense})
# This call will use the custom_objects from the previous call
layer_from_config({'class_name': 'Qwerty', 'config': config})

It is odd behavior, but I don't know if anyone is currently depending on it. This PR keeps that behavior, but if someone wants to clear that global cache, they can. If the global custom objects dictionary is unsafe, is that something you would want to change?

The intent of this PR was not to change that unsafe behavior, but to be able to ameliorate it.

I like your idea of a clean RAII wrapper to help manage the scope.

So between these two directions, which would you prefer? Or is there another option entirely?

  • Keep current behavior with custom_objects leaking, but add some scope functions to control it
  • Change behavior so custom_objects don't leak, but add some scope functions if you want to share custom_objects across calls

RE: Contrib
Another thing to think about is the contrib repository. A lot of people are going to be writing objects in other modules and wanting those to load correctly and be created by name. One way to do that would be something like a global custom object registry that modules could add to. That could work with either the scope idea or the global idea.

Cheers,
Ben

@bstriner (Contributor Author)

@fchollet Just uploaded a version with a new scope function based on your suggestion.

So, the code still has the existing behavior that the custom_objects of one call leak into the next. This code works the same on both the old and new versions:

from keras.utils.layer_utils import layer_from_config
from keras.layers import Dense
layer_config = {'output_dim': 10}
config = {'class_name': 'Qwerty', 'config': layer_config}
custom_objects = {'Qwerty': Dense}
try:
    print(layer_from_config(config))
except ValueError as v:
    print("Fail because no custom_objects yet")
print(layer_from_config(config, {'Qwerty': Dense}))  # success
print(layer_from_config(config))  # success

Fail because no custom_objects yet
<keras.layers.core.Dense object at 0x00000000082E5E10>
<keras.layers.core.Dense object at 0x00000000082E5E10>

However, with this PR, either of the following two styles will not leak:

with custom_object_scope(custom_objects):
    print(layer_from_config(config))

with custom_object_scope():
    print(layer_from_config(config, custom_objects))

Also, with this PR, if something has leaked, you can just call get_custom_objects().clear(), which is a lot easier than trying to clean out the globals of layer_utils.

Cheers,
Ben

@bstriner (Contributor Author)

Rebased on current master. Also updated optimizers and Lambda to respect global_custom_objects.

That means custom_object_scope should work for all types of objects. Please let me know if there are any that I forgot.

@danmackinlay
@bstriner If I'm not mistaken, we also need this to allow custom classes (e.g. Layer subclasses) to re-use the keras test infrastructure as well as contrib, since keras.utils.test_utils.layer_test does not accept a custom_objects parameter either; I suppose the existing contrib layers currently use the global monkey-patching solution.

@danmackinlay
@bstriner @fchollet Is it also desirable to handle different custom objects in different custom_objects dictionaries? Or to check for clobbering other objects in the namespace?

Perhaps that is a matter for a separate pull request, but AFAICT we could easily have name collisions - would one notice that they had inadvertently masked some non-descriptive name such as ELU?

@fchollet (Member)

AFAICT we could easily have name collisions - would one notice that they had inadvertently masked some non-descriptive name such as ELU?

Absolutely, we need a way to separate different namespaces on a per-layer or per-model basis. A shared namespace is a recipe for trouble here.

@bstriner (Contributor Author)

bstriner commented Jan 19, 2017

@fchollet Contrib modules would only be added to custom_objects as required, instead of importing everything from contrib. custom_objects is just a string dictionary, so people could use some sort of fully qualified syntax, as long as it matches what they have in get_config.

I would imagine a contrib module keras_contrib.regularizers.mymodule like this:

class L3Regularizer(...):
    ...

custom_objects = {'regularizers.mymodule.L3Regularizer': L3Regularizer}

I could use it like so:

from keras_contrib.regularizers import mymodule

with custom_object_scope(mymodule.custom_objects):
    reg = regularizers.get('regularizers.mymodule.L3Regularizer')

Nothing persists between calls to custom_object_scope and you can nest calls to custom_object_scope, so you just import the custom modules you want when you need them.

@howthebodyworks it could be beneficial to be able to mask names, if you assume people aren't just doing it by accident. For example, I could write a custom module with some custom implementation of the Dense layer. You could wrap loading in custom_object_scope to swap the custom layer for the original layer.

Conflicts in different modules are OK as long as you don't load them both into custom_object_scope at the same time. Interestingly, this means different modules could provide different implementations of certain layers and you could switch between them.

I don't think name conflicts will be too much of an issue if you only import names into the registry when you want them. On the other hand, it would be much easier to write patches or swap out objects.

However, I'd be interested in any other ideas to fix current issues with custom objects that might be helpful for implementing contrib.

Cheers,
Ben

@bstriner (Contributor Author)

Just added varargs, which should make things cleaner. If you are using multiple contrib modules:

from keras_contrib.regularizers import module1
from keras_contrib.layers import module2

with custom_object_scope(module1.custom_objects, module2.custom_objects):
    model.load(...)

Cheers,
Ben

@danmackinlay
@bstriner I think that your patch is way better than the current system, which has obvious shortcomings (and your patch is in fact easing my pain in a current project to write a new keras layer).
And the custom_object_scope context manager is a cool idea! I like it way more than typical Python plugin systems, which make it hard to switch config and hide non-local configuration in some settings file somewhere.

I also agree that overwriting objects is potentially a feature; non-explicit substitution of implementations is, however, a little scary to me. I feel like you would more often want to just edit the code and do it explicitly.

Still, I think there is an underlying issue: the entire way the namespace is accessed in keras is a little strange if we want to allow contrib modules; presumably this underlying object-access system will change eventually?

Eventually, if we want contrib modules to work seamlessly, I imagine we might want to do something similar to other projects that have plugin systems, which is to use importlib to import objects from module path strings (e.g. Django), and possibly allow said modules to register their own names by module import path, e.g. module1.custom_object. This would make it unambiguous which namespace an object came from. However, keras internally relies on unqualified object names (Recurrent, not keras.layers.recurrent.Recurrent) to deserialize, so that would require significant internal changes.
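The importlib idea could look roughly like this. A hedged sketch of the Django-style approach with a hypothetical helper name, not anything keras currently does:

```python
# Sketch of resolving a fully qualified dotted path to an object via
# importlib, the way Django-style plugin systems do, instead of
# relying on a flat registry of unqualified names.
import importlib


def import_by_path(dotted_path):
    # Split 'pkg.module.Name' into module path and attribute name.
    module_path, _, name = dotted_path.rpartition('.')
    if not module_path:
        raise ValueError('Path must be fully qualified: ' + dotted_path)
    module = importlib.import_module(module_path)
    return getattr(module, name)
```

For example, import_by_path('collections.OrderedDict') resolves to collections.OrderedDict without the class having to be pre-registered anywhere.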

Hmm.

I guess the real question is what kind of short and medium term solutions keras wants to live with. How well does this need to be fixed for the road maps to versions 1.0 and 2.0, as per #4996? I presume that's @fchollet 's call.

@bstriner (Contributor Author)

@howthebodyworks All good points! This started as a bugfix but is becoming a roadmap issue, as sometimes happens. importlib is a good point. There are alternative solutions to discuss, like doing away with custom_objects entirely, referring to everything by fully qualified names, and automatically importing the correct modules. We would then have a little back-compatibility patch to search keras if the name is not qualified.

Maybe after 1.0 we need to reexamine saving and loading for 2.0. Custom objects are one major issue. The ability to save and load a group of related models without breaking shared weights is another item on my wishlist.

Anyways, this PR should keep existing functionality and code while adding some features to fix some bugs. If the decision comes to change the existing functionality, I assume that is going to be a much longer process.

Cheers

@fchollet (Member) left a comment

Let's go with your approach for now. We may refactor it later, e.g. in Keras 2. Please address my comments (just style issues).

global_custom_objects = {}


class CustomObjectScope(object):
@fchollet (Member):

Please add a docstring explaining the behavior and giving a usage example.


def custom_object_scope(*args):
"""
Provides a scope that changes to global_custom_objects cannot escape.
@fchollet (Member):

This line should be right after """. Put "`" around function names.

"""
Provides a scope that changes to global_custom_objects cannot escape.

# Returns
@fchollet (Member):

Also add a # Arguments section for *args


def get_from_module(identifier, module_params, module_name,
instantiate=False, kwargs=None):
"""Retrieves a class of function member of a module.
"""Retrieves a class of function member of a module. First checks global_custom_objects then the requested module.
@fchollet (Member):

Two line breaks after the first sentence.

@@ -9,10 +9,47 @@
import marshal
import types as python_types

global_custom_objects = {}
@fchollet (Member):

As a private global variable, this should be _GLOBAL_CUSTOM_OBJECTS.

@bstriner (Contributor Author)

@fchollet updated code with your suggestions. Also created a new unit test file for generic_utils and added generic_utils to generated documentation.

Does Travis run mkdocs? It would be useful to have it save the documentation as an artifact, so you could easily check what it looks like. It would also ensure no commits break documentation generation. If it isn't doing that already and it seems like a good idea, I can help put it together: #5112

Cheers,
Ben

@bstriner (Contributor Author)

@fchollet I can squash the commits after you review but I figured it would be harder to review after I squash. Let me know if you want any other changes and I'll do the squash.

@bstriner (Contributor Author)

I fully expect this to be tossed in v2 but hopefully this will put custom objects in a good place for now. Thanks!

@fchollet (Member)

LGTM, thanks a lot.

@fchollet fchollet merged commit d7e0621 into keras-team:master Jan 21, 2017