Skip to content

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Minimise the cube API #3429

Closed
pp-mo opened this issue Oct 1, 2019 · 13 comments
Closed

Minimise the cube API #3429

pp-mo opened this issue Oct 1, 2019 · 13 comments

Comments

@pp-mo
Copy link
Member

pp-mo commented Oct 1, 2019

In my opinion, there is far too much in cube.py, and particularly the Cube class.

I would like to see this minimised by removing purely "convenience" instance methods : By this, I mean things which don't rely (?much) on private class implementation details, so they could just as well be cast as independent operations, which would take the cube as an argument.
As 'utility' functions, these would then exist at the 'cube' package level or even in the generic iris.utils package.

In my mind, this means making a distinction between core operations which define what a cube "is" and are heavily concerned with the implementation, like add_dim_coord and __getitem__, and those which are secondary operations that just "use" cubes.
Certainly, if an operation can be rewritten to work entirely via the public Cube API, then it does not belong in the class. Surprisingly, even functions like 'transpose' may fall into this category.

Specifically, I think all of the following could potentially be moved outside of the Cube class:

  • aggregated_by
  • collapsed
  • interpolate
  • intersection
  • is_compatible
  • regrid
  • replace_coord
  • rolling_window
  • slices
  • slices_over
  • subset

There is another set which could also be addressed, but which are currently in-place operations (see also #3430). If they were rewritten as functions which always return a new cube, then these could also be "taken out" :

  • transpose :
  • rename
  • convert_units

There are some interesting parallels here with the numpy API.
For instance you could remove all those array instance methods, like array.astype, array.reshape, array.transpose etc, when you can instead just use np.astype(array, X) etc.
Most of those do simply redirect to static methods.
I have also seen a comment on the numpy project that they "wish" they had reduced the proliferation of unnecessary instance methods, but it is now too late !

@njsmith If we could go back in time then I'd probably argue for not having these methods on ndarray at all, and using functions exclusively ... Too bad we don't have a time machine :-)

@pp-mo pp-mo added this to the v3.0.0 milestone Oct 1, 2019
@pp-mo
Copy link
Member Author

pp-mo commented Oct 1, 2019

NOTE: I've labelled this for Iris 3.0, because it suggests a serious breaking change.
Of course, we won't be doing this in Iris 3. Maybe I should start up an Iris 4.0 milestone ??

@rcomer
Copy link
Member

rcomer commented Oct 1, 2019

😨

I think I can see where you're coming from, but from a pure user perspective this might just look like "making us change our code for no good reason". I note that the comment you link specifically says that removing the numpy.ndarray methods is "impossible because it would break everyone's existing code that uses methods".

One big difference between the numpy and iris APIs is that all the numpy function are available in the top level package, so I am only ever 3 keystrokes (np.) away from the function I want to use. Iris has a proliferation of modules, making it less obvious where to find things. Having methods attached to the cube (and cubelist) makes them easy to find!

@rcomer
Copy link
Member

rcomer commented Oct 1, 2019

Note that is has also previously been suggested that there is too much in iris.util.

@pp-mo
Copy link
Member Author

pp-mo commented Oct 1, 2019

I think I can see where you're coming from

TBH this is all about making the code tidier, which is basically for developers, not users !

A counter-example though is how hard it is to navigate the iris.cube docspage
That is pretty bloated, is it not ?!?

@pp-mo
Copy link
Member Author

pp-mo commented Oct 1, 2019

Though, re:

proliferation of modules

I do enjoy a good multilevel hierachy, don't you ? 😉

re:

there is too much in iris.util.

But that is really the same problem + I think supports my case, because that is just what's wrong with iris.util : it has no structure + is just a random collection of stuff.

@rcomer
Copy link
Member

rcomer commented Oct 2, 2019

I do enjoy a good multilevel hierachy, don't you ? 😉

I couldn’t possibly comment 😆

I think it’s fair to say that a lot of functionality is not where you would put it if you were starting from scratch. But if we’re going to move things around, we need to consider that an awful lot of code has been written on top of Iris, so it should be as easy as possible to update said code to work with new versions. Perhaps consider creating tools to automate this? Like how 2to3 automatically changes next methods to functions.

@rcomer
Copy link
Member

rcomer commented Oct 2, 2019

Or we could borrow the time machine from the numpy folks....

@rcomer
Copy link
Member

rcomer commented Oct 2, 2019

NOTE: I've labelled this for Iris 3.0, because it suggests a serious breaking change.

BTW, are we still doing the deprecation thing? https://scitools.org.uk/iris/docs/latest/developers_guide/deprecations.html

@pp-mo
Copy link
Member Author

pp-mo commented Oct 2, 2019

are we still doing the deprecation thing?

Well, I still think so.
But I also fear that we have often been over-cautious in the past 😨 ,
when rigid rules have made it unreasonably hard to make simple changes for the better

For specific instance : did this issue today introduce a breaking change,
and will that actually matter to anyone ? It's a judgement call.

@rcomer
Copy link
Member

rcomer commented Oct 3, 2019

did this issue today introduce a breaking change

that link isn't working for me 😕

@pp-mo
Copy link
Member Author

pp-mo commented Oct 3, 2019

link isn't working

Sorry, I fixed it, meant to link #3431.

The form [link text](#issue-num) does not work !

@pp-mo
Copy link
Member Author

pp-mo commented Oct 3, 2019

borrow the time machine from the numpy folks

😁
🐔 ⏲️ ⚙️ ⏳ ⚙️ 🐣

@vsherratt
Copy link
Contributor

vsherratt commented Jan 21, 2021

Can I just raise that this is a perfectly valid way to define a method:

def collapsed(self, coords, aggregator, **kwargs):
    pass  # actual contents of Cube.collapsed here...

class Cube:
    collapsed = collapsed

In fact, so is this (unless you've done something particularly egregious with metaclasses):

class Cube:
    pass

Cube.collapsed = collapsed

Basically there's nothing special about a def block inside a class block, or about naming a function argument self; the "magic" behind methods, ie automatically passing the caller instance in as the first argument, is inherent to the actual function type.

I think moving functions away from the class definition and into dedicated modules is a good idea regardless, 4000 lines is far too much for one file. So really I'd say the choices are:

  1. Completely drop these functions from the Cube API, instead exposing them as functions across various modules - probably mostly iris.cube.* submodules, but I suppose some functions may fit better in other existing modules. As a breaking change, this would force a major version, and ideally deprecation warnings well beforehand.

  2. Attach the functions as Cube methods, without putting the functions themselves in the public API. The change would purely be for developer convenience in dealing with more manageable files.

  3. Expose both function and method options, similar to numpy. This is contentious. Certainly I wouldn't support writing a lot of one-line wrapper methods - that would be terrible for maintainability - but reusing the same function object in both situations is a strong option.

    On one hand, it does break the "one obvious way to do something" guideline, and muddies the Cube namespace. Documentation could be more confusing, because sphinx will find and render exactly the same docstring in both cases; it will generate an appropriate function signature, but the hand-written argument lists and descriptions would either mention a cube argument that isn't there, or not mention one that is. Potentially addressable with a decorator that cleverly alters the docstring on a copy of the function, or just being more careful when writing them.

    On the other, it doesn't force the user into either "always use methods" or "always use functions", nor force them to check all the time which to use because the package itself is inconsistent - they can choose whichever they prefer or fits in best with the rest of their project. Ambivalence on this choice of styles is arguably a good thing and a reasonable exception to the guideline, especially for numpy as such a core package, but just as applicable to iris.

    A sub-option is to only do this temporarily, to provide an easier transition into the first option.

@bjlittle bjlittle modified the milestones: v3.0.0, v3.1.0 Feb 4, 2021
@bjlittle bjlittle modified the milestones: v3.1.0, v3.3.0 Nov 1, 2021
@SciTools SciTools locked and limited conversation to collaborators Sep 30, 2022
@bjlittle bjlittle converted this issue into discussion #5006 Sep 30, 2022
@bjlittle bjlittle removed this from the Candidate for next release milestone Sep 30, 2022

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

Projects
None yet
Development

No branches or pull requests

5 participants