Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Recursive data types #623

Open
tomsmeding opened this issue Nov 9, 2019 · 16 comments
Open

Recursive data types #623

tomsmeding opened this issue Nov 9, 2019 · 16 comments
Labels
difficult nice-to-have under discussion Things we've not come to a conclusion about.

Comments

@tomsmeding
Copy link
Contributor

tomsmeding commented Nov 9, 2019

Currently, deftype doesn't seem to support recursive data types:

(deftype SExpr (Number [Int]) (List [(Array SExpr)]))

This makes sense, because in general recursive data types could make the C struct "infinitely large"; at least, without putting the recursive values on the heap explicitly.

However, since I don't think one can currently put a value on the heap explicitly in Carp (barring manual use of malloc/free), and since anyway it seems this fails in the parser already, I don't think it's possible to define an s-expression type like the above in Carp.

Is such a feature planned? Am I misunderstanding something?

@eriksvedang
Copy link
Collaborator

That's correct. Actually lists are heap allocated so they could allow for members referring to the data type itself (and you can trick the compiler by redefining the type). But it really isn't supported yet. I'm not totally convinced it's a feature the language need but if/when we try to support it, it will have to be through something like Rust's Box type.

@tomsmeding
Copy link
Contributor Author

Rust's Box is indeed also what I was thinking of. Also not sure whether you want it in the language proper, but I was wondering whether it could be implemented in the library without touching the compiler. Perhaps I'll have a try later this week.

@hellerve
Copy link
Member

I think for sumtypes it definitely makes sense. I thought of it before for data types like JSON (or really any kind of AST) that you define as:

(deftype JSON
  (Null [])
  (Num [Double])
  (Str [String])
  (Arr [(Array JSON)])
  (Obj [(Map String JSON)])
)

This is how most languages I know of define it (see Rust or Haskell). Without self-reference this becomes unnecessarily tedious.

I’ve needed it before for Mustache; without it the List constructor of the type, which contains its own type definition, is not expressable, and we can’t have feature parity with other Mustache libraries.

@scolsen
Copy link
Contributor

scolsen commented Nov 12, 2019

+1! I looked into recursive data types a bit in the past. I did play with some approaches to implementing them in Carp instead of the compiler, but if I recall correctly, the first few attempts I took turned out to be unfeasible.

I just pushed a repo containing some private experiments on implementing recursion schemes in Carp. Presumably, if it's possible to implement recursion schemes using the current compiler + vanilla carp we can fake recursive datatypes. If you want to check it out to see some dead ends or get some inspo, here it is: https://github.com/scolsen/morphisms

I think this lead to nothing. Here's a good reference on other recursion schemes: http:https://okmij.org/ftp/Computation/fixed-point-combinators.html -- it might be possible to use one of the approaches here. I believe the straight up Y combinator implementation is impossible w/ the type system, but I think the site even covers alternatives for this situation.

Some kind of macro-level trickery might be possible too.

@eriksvedang eriksvedang added difficult nice-to-have help wanted under discussion Things we've not come to a conclusion about. and removed help wanted labels Apr 30, 2020
@hellerve
Copy link
Member

hellerve commented May 9, 2020

Some musing about this problem, as it relates to the borrow checker, ownership, and hidden allocations:

A type won’t be able to unconditionally contain itself unless we wrap it in a pointer and do a hidden allocation, which is, as far as I am concerned, a non-option. The obvious solution for a type to contain and own itself would be using something like Rust’s Box, which also allocates, but is clear about it.

Apart from that, there are situations in which a type should be able to refer to itself. They are:

  • When it’s wrapped in a Ref.
  • When it’s wrapped in a Ptr.
  • When it’s wrapped in an Array (because that is a hidden indirection).
  • When it’s wrapped in a generic type that only destructures into something containing any of the above (like a Map, which will have an Array indirection).

The first three are fairly mechanical, but the fourth one is tricky, because it requires inspection of the child type, possibly recursively.

I think adding support in C should not be an issue, but getting the semantics right to make sure that all the cases that should be allowed are allowed will be somewhat hard.

@tomsmeding
Copy link
Contributor Author

@hellerve If you also want to support mutual recursion in types, then you need the fourth point anyway. The C compiler already does this (it allows pointers to self but not values of self in a struct), but it has the advantage of the language enforcing that types must be defined before they are used.

However, given that the compiler needs to know all relevant types to be able to compile a piece of code anyway, it doesn't sound very hard to me, at least in theory, to work this out recursively. There just needs to be a (small) set of built-in types that contain their argument indirectly: it sounds like that is Ref, Ptr and Array (is Array built-in?). These are then the types that "break" the recursive search for self in the check that implements this semantic validity.

@hellerve
Copy link
Member

hellerve commented May 9, 2020

We don’t currently support mutual recursion because Carp is currently a one-pass compiler, meaning that it actually goes through your code top-to-bottom trying to figure things out. So in order to make mutual recursion work we’d have to change that, which would be a medium-sized to big change; it would give us the opportunity to make other things work as well and is definitely worth it, but it is a big chunk of work.

The check itself shouldn’t really be hard to do per se, it might be a little intricate to get right, though, since with type variables a later specialization might introduce something that is not quite right. I’m not quite sure off the top of my head whether that could even ever happen, but type variables and concretization definitely make this a little more difficult to reason about (especially now that we have higher kinds, which abstract the constructors).

@scolsen
Copy link
Contributor

scolsen commented May 19, 2020

So, we've discussed adding something like Box, which could get us toward the right direction.

I don't have the full context yet, and it sounds like the one-pass restriction might prevent this, but perhaps we can support this by introducing a new type system rule that more or less says the following?

  • Given a type, T, and a constructor for T C, C may refer to T if and only if T is wrapped in a Ref

Such that:

(deftype (Foo a) (Bar [(Ref (Foo a))])) is a legal type but
(deftype (Foo a) (Bar [(Foo a)])) is not.

Which is more or less what @hellerve suggested above. But I think restricting the functionality to a single type simplifies things (we just need to make sure it's flexible enough to support all cases).

I think 🤔 ? that might work. But I'm not sure. Nods to @hellerve again, I suppose the issue becomes disambiguating between generic types...

@tomsmeding
Copy link
Contributor Author

Just thought about this, maybe you've already discussed this but I wanted to note it nonetheless: what if I write the following?

(deftype (Foo a) (Bar [(Ref (Foo (Maybe a)))]))

You then quickly land in the terrain of polymorphic recursion, which is a whole topic in itself.

@hellerve
Copy link
Member

@tomsmeding this is actually an infinite type, because (Foo a) contains a (Foo (Maybe a)) which would contain a Foo (Maybe (Maybe a))) and so on.

@scolsen this might work, but the ergonomics of having to hold onto the contained types becomes fairly questionable. Most importantly, you won’t be able to return that type from the function that owns the wrapped and wrapper type; that is more or less the problem that Box solves, as I understand it.

@tomsmeding
Copy link
Contributor Author

I know that's kind of an infinite type, but for example, Haskell supports that just fine, and you use polymorphic recursion in functions to traverse the type.

If you're worried about the data structure itself being infinite: sorry, assume there's a second data constructor, say Bar2, that doesn't recurse. :)

@hellerve
Copy link
Member

Oh, interesting! I didn’t know Haskell supported that! Yeah, I feel like that’s a fairly complicated beastie then. I’m not entirely sure how we would represent that, but I think it’s one of those problems where thinking about it a lot will help :)

Thanks for that example, definitely changes the perspective a little bit for me.

@scolsen
Copy link
Contributor

scolsen commented May 19, 2020

I know that's kind of an infinite type, but for example, Haskell supports that just fine, and you use polymorphic recursion in functions to traverse the type.

If you're worried about the data structure itself being infinite: sorry, assume there's a second data constructor, say Bar2, that doesn't recurse. :)

Haskell only supports polymorphic recursion when types are explicitly annotated though, no? We could just piggy back off of haskell and have the same restriction—polymorphic recursion is ok, but you lose inference in such cases.

@tomsmeding
Copy link
Contributor Author

tomsmeding commented May 19, 2020

It seems you're indeed right about that; consider the following Haskell example:

import Control.Monad (join)

data Foo a = Bar (Foo (Maybe a)) | The a
  deriving (Show)

joinFoo :: Foo a -> Maybe a
joinFoo (Bar foo) = join (joinFoo foo)
joinFoo (The x) = Just x

main :: IO ()
main = do
    print (joinFoo (Bar (Bar (Bar (Bar (The (Just (Just (Just (Just 42))))))))))

This compiles and runs fine, but if one removes the joinFoo :: Foo a -> Maybe a type signature, compilation fails because GHC infers two conflicting types for joinFoo: Foo a -> Maybe a and Foo (Maybe a) -> Maybe (Maybe a).

EDIT: For the non-Haskellers reading this: join has type Monad m => m (m a) -> m a, and in the case of Maybe it has the definition join (Just (Just x)) = x, join (Just Nothing) = join Nothing = Nothing.

@scolsen
Copy link
Contributor

scolsen commented May 19, 2020

Cool! I think for a start I'll experiment with:

  • Allowing some type T to refer to itself when its wrapped in a Ref.

and see how that goes. I suspect I'll quickly run into the ergonomic issues @hellerve called out; I also suspect cases that are polymorphic recursive like @tomsmeding mentioned won't be unifiable.

From there we can puzzle out a better way to encode this other that Refs (possibly just by introducing a Box type) and determine some appropriate warning/error messaging for the polymorphic recursion cases.

This exercise will at least confirm whether or not the simple change of allowing types to refer to themselves is in fact simple--from there we can figure out the more complex backend changes needed to make it work.

@eriksvedang
Copy link
Collaborator

One thing to note is that currently types created with deftype can't contain Ref:s. So maybe that's a restriction that needs to be lifted first, for this to work? (my intuition is that it should be possible anyways, though)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
difficult nice-to-have under discussion Things we've not come to a conclusion about.
Projects
None yet
Development

No branches or pull requests

4 participants