Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AVRO-2518: Ruby Avro doesn't validate enum fields with default values #618

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

kyleboyer-optum
Copy link

@kyleboyer-optum kyleboyer-optum commented Aug 22, 2019

Jira

Tests

  • My PR adds the following unit tests OR does not need testing for this extremely good reason:
    • test_enum_field_default

Commits

  • My commits all reference Jira issues in their subject lines. In addition, my commits follow the guidelines from "How to write a good git commit message":
    1. Subject is separated from body by a blank line
    2. Subject is limited to 50 characters (not including Jira issue reference)
    3. Subject does not end with a period
    4. Subject uses the imperative mood ("add", not "adding")
    5. Body wraps at 72 characters
    6. Body explains "what" and "why", not "how"

Documentation

  • In case of new functionality, my PR adds documentation that describes how to use it.
    • No new functionality - bug fix

@Fokko Fokko requested a review from tjwp August 26, 2019 11:47
Copy link
Contributor

@tjwp tjwp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the contribution.

I'm not surprised that the support for enum defaults is lacking in the Ruby implementation. The implementations in different languages do not move in lock-step.

I'm not convinced that this is the right change to make to the Ruby implementation. There is no other reference to defaults for any other field types in the SchemaValidator class. This class is used to verify that a datum is compatible with a writer's schema.

Defaults are used during resolution, and the default comes from the reader's schema if a value is not supplied, or in the case of an enum if the symbol is not part of the reader's schema.

I suspect there is a larger set of changes need to support enum defaults in the Ruby implementation.

@@ -95,7 +95,11 @@ def validate_recursive(expected_schema, logical_datum, path, result, options = {
fail TypeMismatchError unless datum.is_a?(Hash)
expected_schema.fields.each do |field|
deeper_path = deeper_path_for_hash(field.name, path)
validate_recursive(field.type, datum[field.name], deeper_path, result, options)
if field.type.type_sym == :enum then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is the right place in the code to special case the handling of enum. There is a line in validate_simple that handles enum. I think that any modification should be there:

https://github.com/apache/avro/pull/618/files#diff-a1c5bb746106005a7d7ef59a2b973cdbL137

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tjwp Thanks for the feedback! After debugging the (newly added in my PR) test, I traced it down to this point. The issue with validate_simple is that in the case of this test, the default is defined at the field level, which would get removed/stripped when recursively trying to validate the field.type in the next nested recursive iteration. I'm not 100% sure, but are you asking/would it be better/merge-able to refactor those two functions to call validate_simple against enums(maybe with an overloaded function)?

{
  "type": "record",
  "name": "enum_field_default",
  "fields": [
    {
      "name": "msg",
      "type": "string"
    },
    {
      "name": "logClass",
      "type": {
        "type": "enum",
        "name": "logClass",
        "symbols": [
          "UNCATEGORIZED",
          "E1",
          "E2",
          "E3",
          "E4",
          "E5",
          "E6",
          "E7",
          "E8",
          "E9",
          "E10"
        ]
      },
      "default": "UNCATEGORIZED"
    }
  ]
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
2 participants