Skip to content
This repository has been archived by the owner on Dec 11, 2021. It is now read-only.

Idea: use "packed ASTs" #8

Open
Araq opened this issue Sep 6, 2021 · 11 comments
Open

Idea: use "packed ASTs" #8

Araq opened this issue Sep 6, 2021 · 11 comments

Comments

@Araq
Copy link

Araq commented Sep 6, 2021

Have you tried this design:

type
  Node = distinct int32

template kind*(n: Node): JsonKind = JsonKind(n and KindMask)
template operand*(n: Node): int32 = n shr KindShift

type
  JsonTree = object
    kids: seq[Node]
    atoms: BiTable[string] # store numbers as strings too and parse them to numbers in the accessor procs

The crucial idea here is that atoms use 'operand' bits to index into the 'atoms' BiTable while compound nodes use it to store the total number of children. Via some clever arithmetic operations you can easily compute the ith-child anyway. See my PackedAST code in compiler/ic.

This should be close to the optimum design. The API on top should encourage efficient operations (don't use random access, iterate over the tree instead).

@planetis-m
Copy link
Owner

planetis-m commented Sep 8, 2021

What's the purpose of the BiTable though? Do JSON keys get stored there? Note currently every JNode object has a table containing the ids of its key nodes.

@Araq
Copy link
Author

Araq commented Sep 9, 2021

You need both the string -> ID and ID -> string lookups to be fast.

@planetis-m
Copy link
Owner

thanks for the explanation and the proposal, I will study the compiler source more and I will create a prototype eventually.

@Araq
Copy link
Author

Araq commented Sep 9, 2021

And yes, JSON keys get stored in there too, effectively compressing the key-space, every key is stored only once.

@Araq
Copy link
Author

Araq commented Sep 9, 2021

Oh and one more thing: What I'm proposing here is in the end less code than your current solution, there is really just a single seq for the complete tree structure plus the BiTable. A single JSON node would just be an index into the tree seq.

@planetis-m
Copy link
Owner

planetis-m commented Sep 9, 2021

Yes, I am aware, it sounds ideal. Regarding the API, previously I just grouped the type JsonNode = (nodeid, ref JsonTree) and provided an API nearly identical to std/json. I suppose you're not a fan of this? What could be an alternative?

@Araq
Copy link
Author

Araq commented Sep 10, 2021

I don't know exactly but my PackedJson distinguishes between JsonTree and JsonNode and it worked reasonably well. The API could also offer a JsonBuilder type.

@planetis-m
Copy link
Owner

New repo is here: https://github.com/planetis-m/packedjson2 its wip.

@Araq
Copy link
Author

Araq commented Sep 13, 2021

Awesome! Looks good!

@planetis-m
Copy link
Owner

planetis-m commented Sep 17, 2021

Sorry not much progress last week but I am planning on resuming the next. I am kind of clueless from this point onward. How does a JsonBuilder definition/api look like? Note I have made % procs that output directly to string, but they are unoptimized. Sure a proc signature of proc toJson(result: var string) might be faster but less convenient. If there is already a distinction of JsonNode/JsonTree would there still be a need for a JsonBuilder thing? Finally this definition of JsonNode makes sense to you?

type
  JsonNode* = object
    k: JsonNodeKind
    a: NodePos
    t: ref JsonTree

My worries are: 1. too much duplication of information, the kind field is also stored in the JsonTree.nodes seq, 2. a pretty hefty JsonNode type (sizeof is 24 bytes) and 3. too many indirections, calling getStr results in a pointer dereference of ref JsonTree, then JsonTree.nodes at NodePos and finally BiTable.vals[LitId x]. At least I was thinking of also fusing the kind with the NodePos field.

There is also the issue of taking a unsigned LitId and casting it to int32 then using the lowest 3 bit for the kind. I suppose bitabs needs some refactoring, first is there a reason for reserving 0 - 255? Then maybe changing LitId to int32 and overflow checks like making sure the len of the BiTable, doesn't exceed 2^29.

Hope it makes sense and sorry for the long text.

@Araq
Copy link
Author

Araq commented Sep 17, 2021

I am kind of clueless from this point onward. How does a JsonBuilder definition/api look like?

The JsonBuilder exists to keep a seq of PatchPos so that you can do more easily: beginArray/endArray and beginObject/endObject.

Finally this definition of JsonNode makes sense to you?

Nah, just use a distinct int that indexes into the JsonTree.

I suppose bitabs needs some refactoring, first is there a reason for reserving 0 - 255?

No, it's a legacy and not required. Cannot be too hard to remove it.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants