This whole problem exists because in Python you cannot specify interfaces and say "I need a function called f that takes an x of type T and returns a U", so instead you encode that indirectly, like "I know version x.y.z works, so I'll just require that".

Any other way risks runtime errors. And to people about to mention types in Python: those are also checked at runtime.

People keep using these hyper-dynamic languages and then running into robustness issues and scaling limitations brought on by their very dynamism. It makes me mad and sad.




No. It's a problem because you can only have one version of any given package in your dependency tree. You can't have `foo 2` and `foo 3` in your dependency tree. Without that limitation, there is a release valve of sorts where you can incur two different semver-incompatible releases of the same package in your dependency tree in exchange for a working build. The hope is that it would be a transitory state until all of your dependencies migrate.

Rust, for example, has precisely this same problem, except that it is limited to public dependencies. For example, if `serde 2` were ever to be published, then there would likely be a period of immense pain where, effectively, everyone needs to migrate all at once. Even though `serde 1` and `serde 2` can both appear in the same dependency tree (unlike in Python), because it is a public dependency, everyone needs to be using the same version of the library or else the `Serialize` trait from `serde 1` will be considered distinct from the `Serialize` trait (or whatever) in `serde 2`.

But if I, say, published a `regex 2.0.0` tomorrow, then folks could migrate at their leisure. The only downside is that you'd have `regex 1` and `regex 2` in your dependency tree. Potentially for a long time until everyone migrated over. But your build would still work because it is uncommon for `regex` to be a public dependency.

(Rust does have the semver trick[1] available to it as another release valve of sorts.)

This problem is definitely not because of missing interfaces or whatever.

[1]: https://github.com/dtolnay/semver-trick


> It's a problem because you can only have one version of any given package in your dependency tree. You can't have `foo 2` and `foo 3` in your dependency tree.

That does seem to be the fundamental problem with the Python model of dependency management.

If your dependencies have transitive dependencies of their own but your dependency model is a tree and everything is clearly namespaced/versioned, you might end up with multiple versions of the same package installed, but at least they won’t conflict.

If your dependency model is flat but each dependency bakes in its own transitive dependencies so they’re hidden from the rest of the system, for example via static linking, again you might end up with multiple versions of the same package (or some part of it) installed, but again they won’t conflict.

But if your dependency model is flat and each dependency can require specific versions of its transitive dependencies to be installed as peers, you fundamentally can’t avoid the potential for unresolvable conflicts.

A pragmatic improvement in the third case is, as others have suggested, to replace the SemVer-following mypackage 1.x.y and mypackage 2.x.y with separate top-level packages mypackage1 x.y and mypackage2 x.y. Now you have reintroduced namespaces and you can install mypackage1 and mypackage2 together as peers without conflict. Moreover, if increasing x and y faithfully represent minor and point releases, always using the latest versions of mypackage1 and mypackage2 should normally satisfy any other packages that depend on them, however many there are.
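As a hypothetical requirements-file sketch (made-up names and versions), a flat environment can then hold both packages side by side where a single name could not:

    # before the rename: unsolvable, pip must pick exactly one "mypackage"
    #   packageA needs mypackage>=1,<2
    #   packageB needs mypackage>=2,<3
    # after the rename: two distinct distributions, no conflict
    mypackage1==1.7.2
    mypackage2==2.0.1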

Of course it doesn’t always work like that in practice. However, at least the problem is now reduced to manually adjusting versions to resolve conflicts where a package didn’t match its versions to its behaviour properly and/or Hyrum’s Law is relevant, which is probably much less work than before.


As the article explains, this is precisely why the social expectations around Python package versioning are very different from JS package versioning (i.e. you can't just break things willy-nilly even in major releases and cite semver as justification).

That aside, note the obvious problems here for any language that uses nominal typing - like, say, Python. Since types from dependencies can often surface in one's public API, having a tree of dependencies means that many libraries will end up referring to different (and thus ipso facto incompatible) versions of the same type.


> social expectations around Python package versioning are very different from JS package versioning

If anything, I’d say in my experience the Python community tends to be more willing to make big changes. After all, Python itself famously did so with the 2 to 3 transition, and to some extent we’re seeing a second round of big changes even now as optional typing spreads through the ecosystem.

Admittedly, the difference could also be because so few packages in JS world seem to last long enough for multiple major versions to become an issue. The Python ecosystem seems more willing to settle on a small number of de facto standard libraries for common tasks.

> Since types from dependencies can often surface in one's public API, having a tree of dependencies means that many libraries will end up referring to different (and thus ipso facto incompatible) versions of the same type.

Leaving aside the questionable practice of exposing details of internal dependencies directly through one’s own public interface, I don’t see how this is any different to any other potential naming conflict. Whatever dependency model you pick, you’re always going to have the possibility that two dependencies use the same name as part of their interface, and in Python you’re always going to have to disambiguate explicitly if you want to import both in the same place. However, once you’ve done so, there is no longer any naming clash to leak through your own interface either.


> After all, Python itself famously did so with the 2 to 3 transition

That transition has been so traumatic for the whole ecosystem that, if anything, it became an object lesson in why you don't do stuff like that. "Never again" is the current position of the PSF wrt any hypothetical future Python 3 -> 4 transition.

Major Python libraries pretty much never just remove things over the course of a single major release. Things get officially announced first, then deprecated for at least one release cycle but often longer (which is communicated via DeprecationWarning etc), then finally retired.
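For example, a typical cycle looks roughly like this (a sketch; the names are made up):

    import warnings

    def old_api(x):
        # announced as deprecated, kept working for at least one release
        # cycle with a warning, and only removed later
        warnings.warn(
            "old_api() is deprecated; use new_api() instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return new_api(x)

    def new_api(x):
        return x * 2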

> Leaving aside the questionable practice of exposing details of internal dependencies directly through one’s own public interface

Not all dependencies are internal. If library A exposes type X, and library B exposes type Y that by design extends X (so that instances of Y can be passed anywhere X is expected), that is very intentionally public.

Now imagine that library C exposes type Z that also by design extends X. If B and C each get their own copy of A, then there are two nominally identical types X that are not type-compatible.

Now suppose we have an app that depends on both B and C. Its author wants to write a generic function F that accepts an instance of X (or a subtype) and does something with it. How do they write a type signature for F such that it can accept both Y and Z?
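A minimal sketch of the trap, with made-up names (X_v1/X_v2 stand in for the two copies of A that B and C ended up with):

    class X_v1: ...          # library A's X, as seen by library B
    class X_v2: ...          # library A's X, as seen by library C

    class Y(X_v1): ...       # library B's public type
    class Z(X_v2): ...       # library C's public type

    def f(x: X_v1) -> None:  # accepts Y, but a type checker rejects Z
        ...

    print(isinstance(Z(), X_v1))  # False: nominally, the two X's are unrelated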


> Major Python libraries pretty much never just remove things over the course of a single major release. Things get officially announced first, then deprecated for at least one release cycle but often longer (which is communicated via DeprecationWarning etc), then finally retired.

I’m not sure that’s a realistic generalisation. To pick a few concrete examples, there were breaking changes in SQLAlchemy 2 and Pydantic 2, and, as an interesting example of the “rename the package instead of bumping the major version” idea mentioned elsewhere, the move from Psycopg2 to Psycopg (3). I think it’s fair to say all of those are significant packages within the Python ecosystem.

> Not all dependencies are internal. If library A exposes type X, and library B exposes type Y that by design extends X […] Now imagine that library C exposes type Z that also by design extends X

Yes, you can create some awkward situations with shared bases in Python, particularly once the relevant types are split across different libraries, and this isn’t a situation that Python’s object model (or those of many other OO languages) handles very gracefully.

Could you please clarify the main point you’d like to make here? The shared base/polymorphism complications seem to apply generally with Python’s object model, unless you have a set of external dependencies that are designed to share a common base type from a common transitive dependency and support code that is polymorphic as if each name refers to a single, consistent type, and yet the packages in question are not maintained and released in sync.

That seems like quite an unusual scenario. Even if it happens, it seems like the most that can safely be assumed by code importing from B and C — unless B and C explicitly depend on exactly the same version of A — is that Y extends (X from A v1.2.3) while Z extends (X from A v1.2.4). If B and C aren’t explicitly managed together, I’m not convinced it’s reasonable for code using them both to assume the base types that happen to share the same name X that they extend and expose through their respective interfaces are really the same type.


> And to people about to mention types in python: those are also checked at runtime.

They are not checked at runtime at all. Type declarations are only used by static analysis tools and not by the runtime.

So types are checked BEFORE runtime by the tooling, just like they would be in TypeScript or any other language that offers gradual typing.
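A quick illustration of what "checked before runtime" means in practice (CPython itself never looks at the annotations here):

    def double(x: int) -> int:
        return x * 2

    print(double("ab"))  # prints "abab": no runtime error despite the annotation
    # mypy/pyright/pytype would flag this call during static analysis instead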

Yes, the dynamic nature of Python does make type safety and certain performance optimizations very difficult but then again it is the dynamic nature that allows for the high productivity of the language. A static language would be far less ergonomic to use for the typical prototyping and explorative programming done in Python.


> A static language would be far less ergonomic to use for the typical prototyping and explorative programming done in Python.

A static language without type inference, sure. But that's not the only option.

OCaml, for example, will infer object types for you based on what methods are called with what kinds of arguments inside the body.


> They are not checked at runtime at all. Type declarations are only used for static analyzing tools and not by the runtime.

This is the common use case, but types certainly are used at runtime by many libraries. Frameworks like FastAPI use the type annotations to declare dependency injection, which is resolved during application startup. In other cases, like Pydantic, they are used to determine marshalling/unmarshalling strategies.
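The underlying mechanism is simply that annotations are ordinary objects available at runtime. A toy sketch of the idea (not FastAPI's or Pydantic's actual code):

    from typing import get_type_hints

    def handler(user_id: int, name: str) -> str:
        return f"{user_id}: {name}"

    hints = get_type_hints(handler)   # {'user_id': int, 'name': str, 'return': str}
    raw = {"user_id": "42", "name": "alice"}
    coerced = {k: hints[k](v) for k, v in raw.items()}  # convert using the hints
    print(handler(**coerced))         # "42: alice"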


It makes me pretty mad and sad that people think static languages solve this problem at all. If you do, I have a version of liblzma for you to install. And if you do, do you release versions of your libraries without version numbers, because the compiler will catch any mistakes?


Theoretically static languages don't solve this problem, but in practice, programmers writing packages in a static language don't gratuitously break their API every release or so, which seems far too common in Python-land.


I’m not entirely sure that’s true, and I’m not sure it makes sense to extrapolate all dynamic languages from Python.

Huge amounts of effort are expended on Linux distros ensuring that all the packages work together. Many, and maybe most, of those packages are written in static languages.

Many Python packages don’t have issues with things constantly breaking. I find NumPy, SciPy, the Scikits, and more to be rather stable. I can only think of making trivial fixes in the last few years. I have lots of exotic code using things like Numba that’s been long lived. I’m guessing Flask and Django are pretty stable at this point, but I don’t work on that side of things.

Packages still undergoing a lot of construction are less nice. I think that might be the nature of all new things, though. The example at the beginning of this article, TensorFlow, is a relatively new sort of package and is still seeing tons of new development.

Packaging in Python in 2024 still sucks, which is a uniquely Python issue. Python’s slowness, which necessitates wrapping lots of platform-specific binaries, doesn’t help. Seemingly even major Python projects like TensorFlow have only just started making a real attempt to version their dependencies. In one of the issues in the article, the problem was TF pinning things way too specifically in the main project; one of the satellite projects had the opposite problem, not even setting minimum bounds. The Wild West of unpinned deps makes it hard for upstream authors to even know they are breaking things.

Many people know Python packaging sucks, but I don’t think they know how bad it really is. The slowness is also special to Python. Other languages like Julia and Clojure seem to be much better with these difficulties, and I think in large part this is due to early investments preventing the problems from festering.

Rust vs C++ is a good comparison I think. Cargo is better than anything C++ has by far. In C++, it’s common to completely avoid dependencies altogether because the best you’ve had historically is the OS-specific package manager. The issue isn’t static vs dynamic. The issue is early investment in packaging and community uptake.


> TensorFlow, is still a relatively new sort of package and is seeing tons of new development still.

But I thought TensorFlow was already "dead" and everyone was moving to Torch...?

Even if it's not dead, tf has been around for almost a decade by now.

The landscape of ML is changing rapidly I'll grant you that, so I guess that might necessitate more visible changes esp. on API and dependencies...


It solves the issue of finding out at compile time, instead of at runtime, that a function signature changed, which is infinitely better. The real answer is that serious software developers don't just leave their packages on auto-update and rarely, if ever, update dependencies unless there's a good reason.


It doesn't solve the problem of the function body changing though.


It catches only the most trivial kind of mistake. It isn’t infinitely better, because static typing is far from free.


Since when has static typing eliminated backward-compatibility problems in interfaces? Not all visible runtime behavior can be encoded in the type system, unless you're using a proof assistant or formal-verification system.


Even formal verification misses visible behaviors, the line is just a little farther down. Correct runtime visible behavior can also include side effects like the amount of time taken (e.g. constant time cryptography), or even the amount of heat generated [0]. You're not going to encode everything you could possibly want in a type system without modeling the entire universe, so draw an opinionated line somewhere reasonable and be happy with it.

[0] https://news.ycombinator.com/item?id=39751509#39761349


I need a function which, given a graph, returns the graph diameter as an integer.

I need another function which, given a graph, uses a faster but approximate method to return the graph diameter as an integer.

I need a third function which, given a graph, returns the graph radius as an integer.

All three of these functions have an identical type signature.
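Sketched out (Graph is just a placeholder protocol here; bodies omitted):

    from typing import Protocol

    class Graph(Protocol): ...

    def diameter(g: Graph) -> int: ...          # exact
    def approx_diameter(g: Graph) -> int: ...   # faster, approximate
    def radius(g: Graph) -> int: ...            # a different quantity entirely

    # All three are (Graph) -> int; the types say nothing about which
    # contract you actually get.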

Oh, now I need something which takes a regex pattern string and a haystack string, and returns 1 if the pattern is found in the haystack, otherwise 0.

And the regex "|" pattern must match longest first, not left-right.

And it needs to support "verbose" mode, which allows comments.

And it supports backreference matches.

How do you express that type?

Now I need to numerically integrate some arbitrary function "f" over the range 0.0 to 1.0. Which of the many numerical integration methods should I use to prevent runtime issues like failing to converge?


Function interfaces are better, but not necessarily sufficient. Semantics of functions can also change in a breaking way.


What!? A function’s signature does not completely describe its behaviour. This doesn’t remotely address the problem. Have your preferences all you want, but this is blatantly a case of being blinded by some silly programming language culture war.


Dynamic typing can make importing libraries riskier, but the benefits can outweigh the costs. Also, it's not hard to just not break APIs (unless you're node-redis), and you should have tests anyway if you really care.


There are static type checkers, though: https://github.com/google/pytype

So you can specify interfaces (protocols) and check them in your installation process.
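For instance, the "function called f that takes an x of type T and returns a U" from the top comment can be written as a structural protocol (a sketch with made-up names) and then verified by pytype/mypy/pyright before anything runs:

    from typing import Protocol

    class HasF(Protocol):
        def f(self, x: int) -> str: ...   # "an f that takes a T and returns a U"

    def use(obj: HasF) -> str:
        return obj.f(42)

    class Impl:                           # satisfies HasF structurally, no inheritance
        def f(self, x: int) -> str:
            return str(x)

    print(use(Impl()))                    # prints "42"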


Just use TypeScript when possible.



