Skip to content
This repository has been archived by the owner on Aug 12, 2022. It is now read-only.

Register new package IteratorTraits.jl v0.0.1 #11692

Closed
wants to merge 1 commit into from

Conversation

attobot
Copy link
Contributor

@attobot attobot commented Oct 22, 2017

Repository: davidanthoff/IteratorTraits.jl
Release: v0.0.1
Travis: Travis Build Status
cc: @davidanthoff

Please make sure that:

  • CI passes for supported Julia versions (if applicable).
  • Version bounds reflect minimum requirements.

@davidanthoff This PR will remain open for 24 hours for feedback (which is optional). If you get feedback, please let us know if you are making changes, and we'll merge once you're done.

@davidanthoff
Copy link
Contributor

Bump, could this be merged? It had its 24 hours, right?

This is part of the big breakup of Query into smaller, self-contained parts.

@JeffBezanson
Copy link
Sponsor Member

What does getiterator(x) mean if x is already iterable? It seems to me such a function should have a name describing the sort of alternate iterator it returns.

@davidanthoff
Copy link
Contributor

The default implementation is just the identity function, so if x is already iterable it will just return x. (here). But you can add a custom method for your own type that returns something else. The main use-case for this is that you have a type like DataFrame that in itself doesn't have enough type information to implement the iterator interface in a type stable way, you can add a method to getiterator that returns an instance of another type that does have enough type info to implement the iterator protocol in a type stable way. For example, the implementation for DataFrames returns an instance of DataFrameIterator that has enough type info to implement type stable versions of start, next and done.

Both Query and IterableTables follow this convention: when they have to handle something that can be iterated (i.e. isiterable returns true) they will actually iterate whatever they get back from a call to getiterator.

I'm not super happy with this design. On one hand it kind of resembles the IEnumerable and IEnumerator distinction in .Net, but on the flipside we kind of have that with start already. But I couldn't think of something else that enables this kind of type-stable iteration of things like DataFrame and that fits with the whole Query and TableTraits design... I think for now it does the job, so I'd just like to keep it as is. Mainly because I think this whole aspect of Query and TableTraits will be redesigned in any case once traits or interfaces come to julia in some 1.x version.

I should say that at some point I thought this would be a good addition to base, and that it would be cool if for i in X would be lowered in such a way to hook into getiterator, but I'm not convinced anymore and think the best thing is to just wait for traits and interfaces and then see what a better design might be.

@quinnj
Copy link
Member

quinnj commented Oct 24, 2017

I've actually had the thought over the last week that maybe we shouldn't hold DataFrames hand so much w/ all the table-interops. We're going through all these indirections and extra hoops just for DataFrames. If DataFrames doesn't provide enough type information, then it won't have type-stable iterators. @nalimilan and I have discussed adding a TypedDataFrame type or making the default DataFrame retain type information and have a UnTypedDataFrame that could be used in a transition period.

@JeffBezanson
Copy link
Sponsor Member

If DataFrames doesn't provide enough type information, then it won't have type-stable iterators

💯

Agreed, this is a design decision in DataFrames. It would be great if DataFrames could just be fully-typed (ala https://github.com/JuliaComputing/IndexedTables.jl/blob/master/src/columns.jl), but that doesn't seem to be practical in all cases. So for the time being we probably need both DataFrames and TypedDataFrames.

But, if the real purpose of this function is to add type-stability, then it could be called something like typediterator, and/or could be implemented with something like

struct TypedIterator{T,I}
    itr::I
end

eltype(::Type{TypedIterator{T,I}}) where {T,I} = T

...

@quinnj
Copy link
Member

quinnj commented Oct 24, 2017

I've actually been playing around w/ that exact definition here

@davidanthoff
Copy link
Contributor

Well, for now DataFrame is around and this all works, so I don't really think there is a great need to redesign anything, right? In particular, given that I expect a redesign will occur once traits/interfaces land, so any redesign at this point is probably just extra work that wouldn't buy us any longterm solution.

I should also say that the same pattern is also used with a lot of the other iterable table sources, so it is useful far beyond just DataFrames. I think the pattern that there is a container type that doesn't have all the type info you need for type-stable iteration is pretty common, so it is nice to be able to support those scenarios.

Just now saw @JeffBezanson comment. I think I actually had the name as gettypediterator at some point (or at least thought about that). In the end I didn't use it because while type-stability is certainly the motivating story here, it is not necessarily the only use. At the end any custom type can just use this as a hook to inject some second type to be used for iteration, and who knows, there might be other uses for this as well.

I don't really feel strongly about the name, though. Having said that, I would like to merge this as is for now. Changing the name would be a major piece of work because there are just a lot of sinks and sources at this point that use this, spread over many packages. My idea for this PR is simply to move existing, working, functional stuff into a separate package, which is a very non-disruptive change to the ecosystem.

Once this is registered and I've moved TableTraits over to use it, we can think about a rename. I think there is a way to do that smoothly and slowly, but I'd like to do that as a second step once I have a bit more time.

@davidanthoff
Copy link
Contributor

I've actually been playing around w/ that exact definition here

Here is another version :).

@JeffBezanson
Copy link
Sponsor Member

I just don't understand why there would be a generic hook to get another kind of iterator for an iterator. How do you know when to call or define it? I also think the issue extends beyond just the name. Type-stable iteration shouldn't be required for things to work.

@nalimilan
Copy link
Member

DataFrame is currently not iterable at all, so you need a wrapper iterator anyway. I also don't see how we could make iterating over rows of a DataFrame type stable without a generic function defined somewhere. I agree getiterator isn't a great name, though.

@nalimilan and I have discussed adding a TypedDataFrame type or making the default DataFrame retain type information and have a UnTypedDataFrame that could be used in a transition period.

We've discussed that, but I don't think I said I was convinced encoding column types in the DataFrame type was a good idea in general. I'm still afraid it would force recompilation of lots of functions which won't benefit from specialization on the column types. That would mainly be useful in the context of by where the user-provided function would need that to be inferrable.

@davidanthoff
Copy link
Contributor

What is the next step here? I'd love if we found a better solution to this eventually, and even better, if that was a pattern in base, but in the meantime, can we merge this here? Query and IterableTables have used this for more than a year, it works and solves a problem and for now I just want to move the various bits and pieces from the original Query.jl into smaller packages so that I can enable new integration scenarios without forcing folks to take a dependency on Query.jl.

@JeffBezanson
Copy link
Sponsor Member

I don't think there's any need to add UnTypedDataFrame; I'm fine with DataFrame being untyped and having a separate type for the fully-typed version.

@JeffBezanson
Copy link
Sponsor Member

DataFrame is currently not iterable at all, so you need a wrapper iterator anyway

It sounds like this is the real problem. DataFrame should just adopt row iteration, which IIUC is the protocol for tabular data.

@nalimilan
Copy link
Member

It sounds like this is the real problem. DataFrame should just adopt row iteration, which IIUC is the protocol for tabular data.

Yes, we've discussed doing that. But in R iteration happens over columns, so leave me a bit of time to adapt mentally. There's also the issue that df[:a]/df[1] returns a column, not a row.

@davidanthoff
Copy link
Contributor

Bump, can this please be merged now? Its been four days since I opened this PR.

We can iterate over the precise naming of the things in this package as times goes on, but as far as I understand the package registration process, this should just be registered at this point because I don't intend to make any changes for version v0.0.1.

@KristofferC
Copy link
Sponsor Member

KristofferC commented Oct 25, 2017

This package has a very official sounding name. I have problems merging something with this name that is 3-4 lines of code that seems to mostly be to try something out. I would suggest using something less official sounding (see https://github.com/MikeInnes/TakingBroadcastSeriously.jl for inspiration) and then if this turns out to be useful, it would be simple to rename the package.

@davidanthoff
Copy link
Contributor

This is not trying something out. This is splitting things that have been working well for more than a year in a pretty widely used package (Query.jl) into smaller pieces so that things become more composable. It will allow folks to integrate more closely with Query.jl without having to take a dependency on Query.jl. It will also allow me to not have Query.jl depend on TableTraits.jl and friends anymore. There is literally zero question whether this is useful or not, it has been useful for a long while.

@KristofferC
Copy link
Sponsor Member

KristofferC commented Oct 25, 2017

This does not define any Traits for Iterators in the sense that a Trait is used throughout the Julia community. It also uses interactive features not meant to be called in actual code like method_exists.

A QueryOverloads, QueryBase etc or something where you can put whatever functions that you want to packages to overload seems like a much better idea than to pretend that this is somehow a package for traits for general iterators, that is useful outside the Query-verse.

@davidanthoff
Copy link
Contributor

The package name really shouldn't have Query in it because the stuff in it is useful far beyond Query.jl. For example, TableTraits.jl also uses it, and that can be used entirely independently from Query.jl. The whole goal of this exercise is to decouple TableTraits.jl and Query.jl from each other as much as possible.
The package really extends the iterator interface in base with a couple of things that can be useful in a variety of contexts, two of them being Query.jl and TableTraits.jl. It does that in a completely backwards compatible way. My plan is to add one more extension eventually, and then that should be it.

So what this package provides is 1) a way to test whether a type is iterable, 2) a way to get an iterator that is type stable and 3) it will provide JuliaLang/julia#22467 in a package. None of these are Query.jl specific, all of them might well find use in entirely different scenarios outside the Query-verse and all of them apply to general iterators.

Re the use of method_exists, it was my understanding of the process for registering packages that it would not involve a detailed discussion of implementation details of the package? I think METADATA PRs should check that packages don't do things that mess up other packages (type piracy etc.), that they don't do malicious things (delete user files) and initially that the name of the package follows the guidelines from the doc. Can we please keep the discussion here to those things and move forward with the registration? I'm thrilled to get feedback on the details of the package as issues, PRs etc., but please lets not do this as part of the package registration process.

Other potential names might be IteratorExtensions, IteratorInterfaceExtensions, IteratorInterface2 or something like that.

@JeffBezanson
Copy link
Sponsor Member

a way to get an iterator that is type stable

If that's really what this is for, great, but then it should be named appropriately and actually implement that functionality.

@davidanthoff
Copy link
Contributor

If that's really what this is for, great, but then it should be named appropriately

Are you suggesting a name change for the package or for the getiterator function? If the latter, didn't I respond to that one above already? I'd be happy to work on that in a second step, after this has been registered, because that is the less disruptive way to introduce this change into the pretty large ecosystem and jives better with my current project plan for all of this.

actually implement that functionality

I don't understand that part.

While I appreciate these design suggestions, at this point I would like to get clear guidance what I need to do to get this registered. I take it that @KristofferC wants a different name for the package, that seems fair game for the registration review here, so if I could get feedback on the alternative proposals I made, or some alternative suggestions, it would be great.

For now I'm interpreting all the other comments about the design of the package itself as suggestions. I really appreciate them, but my understanding of the policy for package registration is that those are suggestions and that a package author is free to incorporate them or not. In this case I plan to do that, but not at this point.

@JeffBezanson
Copy link
Sponsor Member

I second @KristofferC 's opinion on the name. IteratorTraits sounds pretty comprehensive and official. It also seems like this could stay within TableTraits, which is also a very small package, so depending on it shouldn't really be a problem.

@davidanthoff
Copy link
Contributor

I don't want to keep this stuff in TableTraits because I want to decouple QueryOperators from TableTraits by removing the dependency of QueryOperators on TableTraits (for a lot of reasons related to some future plans around TableTraits).

Perfectly happy to rename the package, though. Any thoughts about the alternative names I suggested? Any other naming ideas?

@davidanthoff
Copy link
Contributor

Would IteratorInterfaceExtensions be an acceptable name? It seems a) to properly describe what is in the package (a bunch of extensions to the iterator interface) and b) the name makes clear that these are extensions and not something official about iterators.

If folks don't like that, I would appreciate other name suggestions that a) don't include the term "query" and b) ideally include the term "iterator" in some form.

@KristofferC
Copy link
Sponsor Member

It seems a) to properly describe what is in the package (a bunch of extensions to the iterator interface)

I don't understand. What is in the package right now is a function that does what method_exists does, except it only looks for the function start, and an identity funcion.

@davidanthoff
Copy link
Contributor

What is in the package right now is a function that does what method_exists does, except it only looks for the function start, and an identity funcion.

Other packages are adding methods to both functions for their own types, i.e. this package here defines the functions that one can implement for ones own types. I guess a comparable situation would be a package that defines the base iterator interface: it would define the functions start, next and done (and a bunch of other ones), but not necessarily provide any interesting implementations, those really come with the types that implement the interface. Same story here, except that I provide fall-back standard implementations in the package for types that don't add their own methods.

@attobot
Copy link
Contributor Author

attobot commented Oct 30, 2017

Repository: davidanthoff/IteratorTraits.jl
Release: v0.0.1
Travis: Travis Build Status
cc: @davidanthoff

Please make sure that:

  • CI passes for supported Julia versions (if applicable).
  • Version bounds reflect minimum requirements.

@davidanthoff This PR will remain open for 24 hours for feedback (which is optional). If you get feedback, please let us know if you are making changes, and we'll merge once you're done.

@davidanthoff
Copy link
Contributor

I've added the iteratorsize2 stuff now and also added some documentation to the README.

@davidanthoff
Copy link
Contributor

Alright, this one can be closed, superseded by #11800, as suggested on slack.

@nalimilan nalimilan closed this Dec 26, 2017
@attobot attobot deleted the IteratorTraits/v0.0.1 branch December 26, 2017 14:28
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants