Question about JSON serialization/deserialization #22

Open
ethindp opened this issue Mar 29, 2022 · 14 comments
@ethindp

ethindp commented Mar 29, 2022

I'm thinking about writing a schema crate for the Matrix specification. The problem is that the Matrix specification is quite dynamic: the JSON content contains a "content" object that changes depending on the event in question. For example, when an m.identity_server event is received, the JSON might look like this:

{
  "content": {
    "base_url": "https://example.org"
  },
  "type": "m.identity_server"
}

Whereas if the event type is m.room.member, the JSON is significantly different and has far more fields:

{
  "content": {
    "membership": "join"
  },
  "event_id": "$26RqwJMLw-yds1GAH_QxjHRC1Da9oasK0e5VLnck_45",
  "origin_server_ts": 1632489532305,
  "room_id": "!jEsUZKDJdhlrceRyVU:example.org",
  "sender": "@example:example.org",
  "state_key": "@user:example.org",
  "type": "m.room.member",
  "unsigned": {
    "age": 1567437,
    "redacted_because": {
      "content": {
        "reason": "spam"
      },
      "event_id": "$Nhl3rsgHMjk-DjMJANawr9HHAhLg4GcoTYrSiYYGqEE",
      "origin_server_ts": 1632491098485,
      "redacts": "$26RqwJMLw-yds1GAH_QxjHRC1Da9oasK0e5VLnck_45",
      "room_id": "!jEsUZKDJdhlrceRyVU:example.org",
      "sender": "@moderator:example.org",
      "type": "m.room.redaction",
      "unsigned": {
        "age": 1257
      }
    }
  }
}

As you can see, the JSON can contain multiple events within one event classification, and I'm wondering how I could handle that. I'd use a library like JWX, but I need to be able to both generate and parse JSON. (You can find many more examples of the Matrix JSON schema here.)

So, I'm wondering how I might define this using utilada? I like the idea of being able to strongly type my data -- it makes things a lot easier to debug later.

@ethindp
Author

ethindp commented Apr 26, 2022

cc @stcarrez

@stcarrez
Owner

Sorry for the delay, I've not seen your question...

For dynamic JSON, the library uses the Util.Beans.Objects.Object type, which is capable of holding various types such as integers, longs, dates, and strings while still being strongly typed.

I must admit it is not easy to use and I don't have many examples.

One example that uses this is the jsonobj.adb example:

https://github.com/stcarrez/ada-util/blob/master/samples/jsonobj.adb

@ethindp
Author

ethindp commented Apr 27, 2022

@stcarrez Thanks for the example! I really love the Ada Beans interface -- that's amazing! What I'm explicitly asking for, however, is parsing a dynamic JSON object. Another case where this would be useful is in parsing AWS IAM policies. An IAM policy in AWS must have a "statement" object, which is either a list of objects or a single object containing allow/deny rules, conditions, etc., for a given AWS resource such as Amazon S3 buckets. I'm not sure how I would declare this in a record mapping, since you can't dynamically set the discriminant of a variant record after it's already been set (unless there's something I'm missing). I suppose I could just store it in a vector of statements... Would you mind documenting the process for parsing nested objects? You currently only parse a single object, not objects within an object.

@stcarrez
Owner

Yes, this is what the OpenAPI Ada library is doing. I realize there is no example for that except if you look at the swagger Ada library.

Basically, the parsing is done with:

Parser   : Util.Serialize.IO.JSON.Parser;
Mapper   : Util.Beans.Objects.Readers.Reader;

   Parser.Parse_String (Client.Response.Get_Body, Mapper);
   Reply := Mapper.Get_Root;

Here, Parse_String receives the JSON content as a string, and the parser populates the Mapper, which builds the Ada Beans object containing the complete JSON object tree. Get_Root then returns the parsed JSON as an Ada Beans object.
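As a rough end-to-end sketch of the above, applied to the m.identity_server event from the first comment: the Parser, Reader, Parse_String and Get_Root names come from the snippet above, while the use of Get_Value with a member name and of To_String from Util.Beans.Objects is an assumption about the library rather than something shown in that snippet. The procedure name Parse_Event is purely illustrative.

```ada
with Ada.Text_IO;
with Util.Beans.Objects;
with Util.Beans.Objects.Readers;
with Util.Serialize.IO.JSON;

--  Hypothetical example: parse one Matrix event and branch on its "type".
procedure Parse_Event is
   package UBO renames Util.Beans.Objects;

   Content : constant String :=
     "{""content"": {""base_url"": ""https://example.org""},"
     & " ""type"": ""m.identity_server""}";

   Parser : Util.Serialize.IO.JSON.Parser;
   Mapper : UBO.Readers.Reader;
   Root   : UBO.Object;
begin
   --  Build the whole JSON tree in memory, then fetch its root object.
   Parser.Parse_String (Content, Mapper);
   Root := Mapper.Get_Root;

   declare
      --  Assumption: Get_Value (Object, String) looks up a member by name.
      Kind : constant String := UBO.To_String (UBO.Get_Value (Root, "type"));
      Data : constant UBO.Object := UBO.Get_Value (Root, "content");
   begin
      if Kind = "m.identity_server" then
         Ada.Text_IO.Put_Line
           ("base_url: " & UBO.To_String (UBO.Get_Value (Data, "base_url")));
      elsif Kind = "m.room.member" then
         Ada.Text_IO.Put_Line
           ("membership: "
            & UBO.To_String (UBO.Get_Value (Data, "membership")));
      else
         Ada.Text_IO.Put_Line ("unhandled event type: " & Kind);
      end if;
   end;
end Parse_Event;
```

The point of the sketch is that the shape of "content" is decided at run time from the "type" member, so no variant record or fixed record mapping is needed up front.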

@ethindp
Author

ethindp commented May 7, 2022

@stcarrez Thanks for that... Could you write a more concrete example and push it to the repo? One that demonstrates the serialization and deserialization of JSON data structures dynamically? It would help me understand things a bit better. In the meantime I'll dive into the code and try to work it out myself, but a good set of examples on how to do this would make things a lot easier to grok, to be honest.

@ethindp
Author

ethindp commented May 7, 2022

I know, I know... I should probably be able to figure it out from minute code snippets, but examples never hurt anyone, and you can (IMO) never have too many examples. I'll look at the swagger library too -- thanks for pointing me to that.

@ethindp
Author

ethindp commented Oct 8, 2022

@stcarrez One more question about JSON. I've done some digging and think I understand it a bit better: what's the process for deserializing arrays? I can get the length and iterate through each of its items, but is there a better way?

@stcarrez
Owner

stcarrez commented Oct 8, 2022

When the JSON array is deserialized, it is stored in a Util.Beans.Objects.Vectors.Vector_Bean instance, which contains a Util.Beans.Objects.Vectors.Vector (an instantiation of Ada.Containers.Vectors). It is possible, but not easy, to access that Vector instance and benefit from its other operations.

Now, to access a specific element of the array you can use the following operation:

package Util.Beans.Objects is ...
   --  Get the array element at the given position.
   function Get_Value (From     : in Object;
                       Position : in Positive) return Object;
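A small sketch of walking an array with that operation, assuming the array has already been parsed into an Object. Get_Count appears later in this thread; Get_Value by member name, Is_Null and To_String are assumed to come from Util.Beans.Objects, and Print_Item_Ids is a made-up name.

```ada
with Ada.Text_IO;
with Util.Beans.Objects;

--  Hypothetical helper: Items is assumed to hold a parsed JSON array
--  whose elements are objects that may carry an "id" member.
procedure Print_Item_Ids (Items : in Util.Beans.Objects.Object) is
   package UBO renames Util.Beans.Objects;
begin
   --  Array positions are 1-based, matching the Positive parameter.
   for Position in 1 .. UBO.Get_Count (Items) loop
      declare
         Item : constant UBO.Object := UBO.Get_Value (Items, Position);
         Id   : constant UBO.Object := UBO.Get_Value (Item, "id");
      begin
         if UBO.Is_Null (Id) then
            Ada.Text_IO.Put_Line ("(no id)");
         else
            Ada.Text_IO.Put_Line (UBO.To_String (Id));
         end if;
      end;
   end loop;
end Print_Item_Ids;
```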

@ethindp
Author

ethindp commented Oct 8, 2022

@stcarrez I know, but my question was more along the lines of: given an array like that found in the Google Discovery Document response, what is the most idiomatic way of deserializing it? This applies more generally to other objects. Take, for example, a snippet of the "items" array of the Google Discovery Document:

    "items": [
        {
            "kind": "discovery#directoryItem",
            "id": "abusiveexperiencereport:v1",
            "name": "abusiveexperiencereport",
            "version": "v1",
            "title": "Abusive Experience Report API",
            "description": "Views Abusive Experience Report data, and gets a list of sites that have a significant number of abusive experiences.",
            "discoveryRestUrl": "https://abusiveexperiencereport.googleapis.com/$discovery/rest?version=v1",
            "icons": {
                "x16": "https://www.gstatic.com/images/branding/product/1x/googleg_16dp.png",
                "x32": "https://www.gstatic.com/images/branding/product/1x/googleg_32dp.png"
            },
            "documentationLink": "https://developers.google.com/abusive-experience-report/",
            "preferred": true
        },
...
        {
            "kind": "discovery#directoryItem",
            "id": "youtubereporting:v1",
            "name": "youtubereporting",
            "version": "v1",
            "title": "YouTube Reporting API",
            "description": "Schedules reporting jobs containing your YouTube Analytics data and downloads the resulting bulk data reports in the form of CSV files.",
            "discoveryRestUrl": "https://youtubereporting.googleapis.com/$discovery/rest?version=v1",
            "icons": {
                "x16": "https://www.gstatic.com/images/branding/product/1x/googleg_16dp.png",
                "x32": "https://www.gstatic.com/images/branding/product/1x/googleg_32dp.png"
            },
            "documentationLink": "https://developers.google.com/youtube/reporting/v1/reports/",
            "preferred": true
        }
    ]

I can easily define the core structure of this entire document, based on the response documentation, in Ada, like so:

with Ada.Containers.Indefinite_Hashed_Sets;
with Ada.Strings.Wide_Wide_Unbounded;

-- ...
-- Record components must be definite, so use the unbounded string type
-- instead of Wide_Wide_String itself.
subtype WWString is Ada.Strings.Wide_Wide_Unbounded.Unbounded_Wide_Wide_String;
-- ...
type Discovery_Item_Icons is record
   X16 : WWString;
   X32 : WWString;
end record;

type Discovery_Item is record
   Kind, Id, Name, Version, Title, Description, REST_URL, Discovery_Link, Documentation_Link : WWString;
   Icons : Discovery_Item_Icons;
end record;

package Discovery_Item_Sets is new Ada.Containers.Indefinite_Hashed_Sets(Discovery_Item);
-- the remaining generic parameters (Hash and Equivalent_Elements) must be specified too

type Discovery_Document is record
   Kind, Version : WWString;
   Items : Discovery_Item_Sets.Set;
end record;

However, there are a few uncertainties about the discovery document:

  1. Not all of the fields in the response may be available.
  2. We don't precisely know how many items will be returned.

So, it follows that, for any generic JSON document, there must be a way of "recursively" deserializing it, regardless of what objects may be present. For example, if I'm deserializing the above snippet, I clearly don't want to have two case statements within one another: that just gets annoying and repetitive (especially for large documents). The solution then, at least for me, would be to naively create an instance of Discovery_Item, then call Set_Member(Item, Value) as I loop through the array. However, I'm just wondering: is there a better way of doing that than this "naive" solution?

@ethindp ethindp closed this as completed Oct 8, 2022
@ethindp ethindp reopened this Oct 8, 2022
@ethindp
Author

ethindp commented Oct 8, 2022

@stcarrez I've edited my comment. I probably over-described the problem or something. (I accidentally closed the issue and submitted the comment before I was done).

@stcarrez
Owner

stcarrez commented Oct 9, 2022

The OpenAPI generator generates some code that you may look at to be inspired or even re-use. From the Google Discovery OpenAPI description, it generates the following declaration (I only put extracts to illustrate):

package Discovery.Models is
   type DirectoryListItemsInnerIcons_Type is record
      X_16 : OpenAPI.Nullable_UString;
      X_32 : OpenAPI.Nullable_UString;
   end record;
   type DirectoryListItemsInner_Type is record
      Description        : OpenAPI.Nullable_UString;
      Discovery_Link     : OpenAPI.Nullable_UString;
      Discovery_Rest_Url : OpenAPI.Nullable_UString;
      Documentation_Link : OpenAPI.Nullable_UString;
      Icons              : Discovery.Models.DirectoryListItemsInnerIcons_Type;
      Id                 : OpenAPI.Nullable_UString;
      Kind               : OpenAPI.Nullable_UString;
      Labels             : OpenAPI.UString_Vectors.Vector;
      Name               : OpenAPI.Nullable_UString;
      Preferred          : OpenAPI.Nullable_Boolean;
      Title              : OpenAPI.Nullable_UString;
      Version            : OpenAPI.Nullable_UString;
   end record;
   package DirectoryListItemsInner_Type_Vectors is new Ada.Containers.Vectors
     (Index_Type   => Positive,
      Element_Type => Discovery.Models.DirectoryListItemsInner_Type);
   procedure Deserialize
     (From  : in     OpenAPI.Value_Type;
      Name  : in     String;
      Value :    out Discovery.Models.DirectoryListItemsInner_Type);
   procedure Deserialize
     (From  : in     OpenAPI.Value_Type;
      Name  : in     String;
      Value : in out DirectoryListItemsInner_Type_Vectors.Vector);

And the magic you are looking for is within the Deserialize procedures that are generated. The OpenAPI.Value_Type is a rename of Util.Beans.Objects.Object type which contains a JSON document or a JSON sub-tree (or a final value). The Deserialize is quite simple as it looks like:

   procedure Deserialize
     (From  : in     OpenAPI.Value_Type;
      Name  : in     String;
      Value :    out Discovery.Models.DirectoryListItemsInnerIcons_Type)
   is
      Object : OpenAPI.Value_Type;
   begin
      OpenAPI.Streams.Deserialize (From, Name, Object);
      OpenAPI.Streams.Deserialize (Object, "x16", Value.X_16);
      OpenAPI.Streams.Deserialize (Object, "x32", Value.X_32);
   end Deserialize;
   procedure Deserialize
     (From  : in     OpenAPI.Value_Type;
      Name  : in     String;
      Value :    out Discovery.Models.DirectoryListItemsInner_Type)
   is
      Object : OpenAPI.Value_Type;
   begin
      OpenAPI.Streams.Deserialize (From, Name, Object);
      OpenAPI.Streams.Deserialize (Object, "description", Value.Description);
      OpenAPI.Streams.Deserialize
        (Object, "discoveryLink", Value.Discovery_Link);
      OpenAPI.Streams.Deserialize
        (Object, "discoveryRestUrl", Value.Discovery_Rest_Url);
      OpenAPI.Streams.Deserialize
        (Object, "documentationLink", Value.Documentation_Link);
      Deserialize (Object, "icons", Value.Icons);
      OpenAPI.Streams.Deserialize (Object, "id", Value.Id);
      OpenAPI.Streams.Deserialize (Object, "kind", Value.Kind);
      OpenAPI.Streams.Deserialize (Object, "labels", Value.Labels);
      OpenAPI.Streams.Deserialize (Object, "name", Value.Name);
      OpenAPI.Streams.Deserialize (Object, "preferred", Value.Preferred);
      OpenAPI.Streams.Deserialize (Object, "title", Value.Title);
      OpenAPI.Streams.Deserialize (Object, "version", Value.Version);
   end Deserialize;
   procedure Deserialize
     (From  : in     OpenAPI.Value_Type;
      Name  : in     String;
      Value : in out DirectoryListItemsInner_Type_Vectors.Vector)
   is
      List : OpenAPI.Value_Array_Type;
      Item : Discovery.Models.DirectoryListItemsInner_Type;
   begin
      Value.Clear;
      OpenAPI.Streams.Deserialize (From, Name, List);
      for Data of List loop
         Deserialize (Data, "", Item);
         Value.Append (Item);
      end loop;
   end Deserialize;

Now, the OpenAPI library provides simple procedures that help with the deserialization. They look like:

   procedure Deserialize (From  : in OpenAPI.Value_Type;
                          Name  : in String;
                          Value : out Nullable_UString) is
      Item : OpenAPI.Value_Type;
   begin
      if Name = "" then
         Item := From;
      else
         Deserialize (From, Name, Item);
      end if;
      Value.Is_Null := Util.Beans.Objects.Is_Null (Item);
      if not Value.Is_Null then
         Value.Value := Util.Beans.Objects.To_Unbounded_String (Item);
      end if;
   end Deserialize;

The Nullable_XXX types are simple Ada records with a Boolean Is_Null and a Value. This makes it possible to handle optional values (check the Util.Nullables package).
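To illustrate the idea without depending on the library, here is a self-contained sketch of such a record. Nullable_Text and Describe are hypothetical stand-ins defined locally, not the actual Util.Nullables declarations, which may differ in detail.

```ada
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;
with Ada.Text_IO;

procedure Nullable_Demo is
   --  Local stand-in for a Nullable_XXX type: a value plus a flag.
   --  A default-initialized object represents an absent JSON member.
   type Nullable_Text is record
      Value   : Unbounded_String;
      Is_Null : Boolean := True;
   end record;

   --  Render the optional value, with a placeholder when it is absent.
   function Describe (Item : Nullable_Text) return String is
     (if Item.Is_Null then "<absent>" else To_String (Item.Value));

   Title   : constant Nullable_Text :=
     (Value   => To_Unbounded_String ("YouTube Reporting API"),
      Is_Null => False);
   Missing : Nullable_Text;
begin
   Ada.Text_IO.Put_Line (Describe (Title));
   Ada.Text_IO.Put_Line (Describe (Missing));
end Nullable_Demo;
```

With this shape, a record such as the discovery item can keep every field even when the response omits some of them, and the caller checks Is_Null before using Value.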

You will probably find useful operations in the OpenAPI Ada library. It is split into client and server support. You can use only the client support, which requires only the Ada Util Library. You can use it without the OpenAPI code generator. You can also generate some code, pick what you need, rename methods, and remove the ones you don't need.

@ethindp
Author

ethindp commented Oct 9, 2022

@stcarrez You misunderstand my question. I'm not talking about OpenAPI specifications in particular (though your example did give me an idea of what I need to do); I was referring to deserializing documents in general. I've translated the main discovery document into something like this:

	package Discovery_Item_Sets is new Ada.Containers.Indefinite_Hashed_Sets(Element_Type => Discovery_Item, Hash => Compute_Hash);
	package Discovery_Item_Label_Vectors is new Ada.Containers.Indefinite_Vectors(Index_Type => Positive, Element_Type => Wide_Wide_String);

	type Discovery_Document_Fields is (Field_Kind, Field_Discovery_Version, Field_Items);
	type Discovery_Item_Fields is (Field_Kind, Field_Id, Field_Name, Field_Version, Field_Title, Field_Description, Field_Discovery_Rest_Url, Field_Discovery_Link, Field_Icons, Field_Documentation_Link, Field_Labels, Field_Preferred);

	type Discovery_Document is record
		Kind: Wide_Wide_String;
		Version: Wide_Wide_String;
		Items: Discovery_Item_Sets.Set;
	end record Discovery_Document;

	type Discovery_Item is record
		Kind: Wide_Wide_String;
		Id: Wide_Wide_String;
		Name: Wide_Wide_String;
		Version: Wide_Wide_String;
		Title: Wide_Wide_String;
		Description: Wide_Wide_String;
		Discovery_Rest_Url: Wide_Wide_String;
		Discovery_Link: Wide_Wide_String;
		Icons: Discovery_Item_Icons;
		Documentation_Link: Wide_Wide_String;
		Labels: Discovery_Item_Label_Vectors.Vector;
		Preferred: Boolean;
	end record Discovery_Item;

	type Discovery_Item_Icons is record
		X16: Wide_Wide_String;
		X32: Wide_Wide_String;
	end record Discovery_Item_Icons;

	type Discovery_Document_Access is access all Discovery_Document;
	type Discovery_Item_Access is access all Discovery_Item;

	procedure Set_Member(Doc: in out Discovery_Document; Field: Discovery_Document_Fields; Value: Object) is
	begin
		case Field is
			when Field_Kind =>
			Doc.Kind := To_Wide_Wide_String(Value);
			Assert(Doc.Kind = "discovery#directoryList");
			when Field_Discovery_Version =>
			Doc.Version := To_Wide_Wide_String(Value);
			when Field_Items =>
			declare
				Obj: Object;
				Item: Discovery_Item;
			begin
				for API in 1 .. Get_Count(Value) loop
					Obj := Get_Value(Value, API);
					Set_Member(Item, Obj);
					Doc.Items.Insert(Item);
				end loop;
			end;
		end case;
	end Set_Member;

	procedure Set_Member(Item: in out Discovery_Item; Field: Discovery_Item_Fields; Value: Object) is
	begin
		case Field is
			when Field_Kind =>
			Item.Kind := To_Wide_Wide_String(Value);
			when Field_Id =>
			Item.Id := To_Wide_Wide_String(Value);
			when Field_Name =>
			Item.Name := To_Wide_Wide_String(Value);
			when Field_Version =>
			Item.Version := To_Wide_Wide_String(Value);
			when Field_Title =>
			Item.Title := To_Wide_Wide_String(Value);
			when Field_Description =>
			Item.Description := To_Wide_Wide_String(Value);
			when Field_Discovery_Rest_Url =>
			Item.Discovery_Rest_Url := To_Wide_Wide_String(Value);
			when Field_Discovery_Link =>
			Item.Discovery_Link := To_Wide_Wide_String(Value);
			when Field_Icons =>
			Item.Icons.X16 := To_Wide_Wide_String(Get_Value(Value, "x16"));
			Item.Icons.X32 := To_Wide_Wide_String(Get_Value(Value, "x32"));
			when Field_Documentation_Link =>
			Item.Documentation_Link := To_Wide_Wide_String(Value);
			when Field_Labels =>
			for Label in 1 .. Get_Count(Value) loop
				Item.Labels.Append(To_Wide_Wide_String(Get_Value(Value, Label)));
			end loop;
			when Field_Preferred =>
			Item.Preferred := To_Boolean(Value);
		end case;
	end Set_Member;

   package Discovery_Document_Mapper is
     new Util.Serialize.Mappers.Record_Mapper (Element_Type        => Discovery_Document,
                                               Element_Type_Access => Discovery_Document_Access,
                                               Fields              => Discovery_Document_Fields,
                                               Set_Member          => Set_Member);

   package Discovery_Item_Mapper is
     new Util.Serialize.Mappers.Record_Mapper (Element_Type        => Discovery_Item,
                                               Element_Type_Access => Discovery_Item_Access,
                                               Fields              => Discovery_Item_Fields,
                                               Set_Member          => Set_Member);

This should theoretically work, but I'm wondering whether it's correct. I need a mapper for each of the two objects, which means I (somehow) need to build a mapper hierarchy. But is there a better way to do this?

My other question, about dynamic deserialization, is still something I feel remains unanswered. There are numerous fields in each API's relevant discovery document. The core structure is something I could probably record-ize, but I would definitely need to use a hash map for the actual properties, schemas, etc., because the nested objects within those top-level objects could have any kind of structure, and trying to record-ize them would be a ridiculous notion to even contemplate. So how do you properly combine these two into something that's easy for the reader to read and understand?

Both of these questions were what mainly drove me to create this issue in the first place. I feel like I've almost grasped how this library works, and I want to teach a friend how to use it, but in order for me to do that I need to fully understand the library first.

@stcarrez
Owner

stcarrez commented Oct 9, 2022

Ok, I didn't know you were using the record mapper. This is not easy to use, although I'm using it extensively in Ada Servlet, Ada Server Faces and Dynamo. It is more efficient than the method used by OpenAPI because it allows you to read very long documents without mapping them completely in memory.

The most complete example is in Dynamo, which reads and maps UML XMI documents. It reads XML documents, but the mapping mechanism is the same. After you create the Set_Member procedure, you should define the mapping definitions. Set_Member is called when a mapping is matched, and you receive the value together with the field that tells you what was matched.

The value you get in Set_Member is always a single attribute value. You will never get a tree or an array of objects. If you have an array in the JSON/XML document, the Set_Member will be called several times.

When I created this record mapper 10 years ago, I wanted to be able to compose data structures, so there are the Record_Mapper and Vector_Mapper packages. You will find an example in samples/city_mapping.adb. I ran into compilation issues with gcc with the Vector_Mapper package and I stopped there. I've never really used the composition ability, and I use only one Record_Mapper package to read a complete document.

For each field you want to collect, you should define a path to access the field. For your example, I think it should look like:

"items"
"items/@kind"
"items/@description"
"items/@id"
"items/icons/@x16"
"items/icons/@x32"
"items/labels"

You associate each path with a field enum and it will be called with the value. If we look at your JSON document, it will match the @kind and the value will contain discovery#directoryItem, then it will match @id with the value abusiveexperiencereport:v1 and so on. When the first item is finished (after preferred), it will call Set_Member with the enum field associated with the "items" path. You know at this time, you have finished the current element and will start a new one. This is the place where you should insert the collected information in your array.
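The callback order described above can be simulated in plain Ada. The sketch below is only a stand-in and does not use the Record_Mapper API: the Collector record, the Field_Item_End literal (playing the role of the field bound to the "items" path) and the hand-written sequence of Set_Member calls are all assumptions made for illustration, and the value is passed as a String rather than a Util.Beans.Objects.Object.

```ada
with Ada.Containers.Vectors;
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;
with Ada.Text_IO;

procedure Collect_Items is
   --  One literal per mapped path; Field_Item_End stands for "items".
   type Item_Fields is (Field_Kind, Field_Id, Field_Item_End);

   type Item is record
      Kind : Unbounded_String;
      Id   : Unbounded_String;
   end record;

   package Item_Vectors is new Ada.Containers.Vectors
     (Index_Type => Positive, Element_Type => Item);

   --  Holds the element being filled in plus the finished elements.
   type Collector is record
      Current : Item;
      Items   : Item_Vectors.Vector;
   end record;

   --  Receives one scalar value per call, never a sub-tree or an array.
   procedure Set_Member (Into  : in out Collector;
                         Field : in Item_Fields;
                         Value : in String) is
   begin
      case Field is
         when Field_Kind =>
            Into.Current.Kind := To_Unbounded_String (Value);
         when Field_Id =>
            Into.Current.Id := To_Unbounded_String (Value);
         when Field_Item_End =>
            --  The current element is complete: store it and start over.
            Into.Items.Append (Into.Current);
            Into.Current := (others => <>);
      end case;
   end Set_Member;

   Result : Collector;
begin
   --  Stand-in for the parser: the calls a two-element array would trigger.
   Set_Member (Result, Field_Kind, "discovery#directoryItem");
   Set_Member (Result, Field_Id, "abusiveexperiencereport:v1");
   Set_Member (Result, Field_Item_End, "");
   Set_Member (Result, Field_Kind, "discovery#directoryItem");
   Set_Member (Result, Field_Id, "youtubereporting:v1");
   Set_Member (Result, Field_Item_End, "");

   for Element of Result.Items loop
      Ada.Text_IO.Put_Line (To_String (Element.Id));
   end loop;
end Collect_Items;
```

The design point is that the element under construction lives in the mapped record itself, so no nested case statements are needed: each path maps to one enum literal, and the "end of item" literal is the only place that touches the container.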

For the "items/labels", I'm not completely sure about the path. According to the OpenAPI description, it is an array of strings, so it will be called for each string in the array. The correct path could be "items/labels/@".

The most complete reader that I've written is in Dynamo to read an XMI file, and you can look at:

  • Set_Member which shows how to populate complex data structure,
  • Add_Mapping code block shows a series of path->field configuration with several complex mappings

@ethindp
Author

ethindp commented Oct 9, 2022

@stcarrez What is the easiest solution according to you? I've looked at the examples and they do seem complicated. I'll look at your Set_Member example that you linked to but if there's a better/significantly simpler/easier way of doing this I'd appreciate learning about it. I know of JSON-Ada, but that doesn't (yet) support Unicode. I know of JWX, but that library is kinda awkward to use. It would be nice if I could just stay within the abilities of utilada, but if that isn't possible that's okay too.
