
Having played around with Julia recently, there are two main reasons I would not recommend using it for ML:

1-Plotting takes too long. Forget fast iterations. You'll be left waiting a non-trivial amount of time for your first plot. The latency between tests is simply not good enough.

2-Libraries are just not mature. Plotting anything that isn't fairly standard yields broken graphs, and you have to try a billion backends to find one that does not break (hello twinx). Matplotlib is slow for some graphs, but more often than not you get the right visualization (except for that bug where the suptitle gets cut out of the graph, which will never be fixed).

The requests library is not robust enough. This one is quite possibly my fault. I have received "bad headers" errors for some websites, but doing the exact same "get" with Python works just fine.

There's the impression that libraries die all the time. It could be just an impression, but it is quite common to find top Stack Overflow answers that point you to dead libraries (with the common "use this other library instead" hint).

I like Julia's speed and well-done memory management (especially from what I saw in the DataFrames.jl high-memory benchmarks). But right now I would not use it in a critical environment. (I also suck at reading and understanding the error messages thus far, but that's likely a lack of experience on my side.)




The Julia REPL starts in less than a second, importing the Plots module takes a couple of seconds, and the first plot appears in a few more seconds (on my weak computer). If you keep the REPL open, subsequent plots are quite speedy. This may indeed not be snappy enough for some workflows that involve starting a fresh Julia process for each iteration—in which case, it’s pretty simple to create a sysimage with the plotting functions and anything else you use routinely precompiled in: https://www.matecdev.com/posts/julia-package-compiler.html


For concrete numbers, here's my experience on a super-crappy laptop:

* `add Plots` on a new temp environment takes about four and a half minutes, most of which is precompilation.

* `using Plots` took 16 seconds, and

* a `plot` call after that is 7 seconds.

For comparison, on 1.6, the times are seven and a half minutes for precompilation, 17 seconds for `using` and 10 seconds for the first `plot`. I'm not patient/masochistic enough to try the same on 1.5 too, but the trend is definitely one of continuous improvement.
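
To put rough numbers on that trend, here's a quick Python sketch that computes the relative change between the two sets of timings above. It assumes the first set was measured on 1.7 (the comparison implies this but doesn't state it), and the names `timings_16`, `timings_17`, and `percent_faster` are made up for illustration.

```python
# Timings in seconds, taken from the approximate numbers quoted above.
# Assumption: the first set of numbers was measured on Julia 1.7.
timings_16 = {"precompile": 7.5 * 60, "using Plots": 17, "first plot": 10}
timings_17 = {"precompile": 4.5 * 60, "using Plots": 16, "first plot": 7}


def percent_faster(old: float, new: float) -> float:
    """Return how much less time `new` takes than `old`, in percent."""
    return (old - new) / old * 100


# Relative improvement for each step, keyed by step name.
improvements = {
    step: percent_faster(timings_16[step], timings_17[step])
    for step in timings_16
}

for step, pct in improvements.items():
    print(f"{step}: {pct:.0f}% less time on 1.7 than on 1.6")
```

By this measure, precompilation and the first plot improve the most (about 40% and 30%), while `using Plots` barely moves (about 6%).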

This is certainly not the sort of machine any production Julia code will be run in, but it is in the category of machines that a lot of students and many others trying out the language will have access to, so it gives an idea of their experience.

The times on 1.6 and 1.7 have been a qualitative change for me. plot being sub-10 seconds means that instead of it being "run it, switch to something else while I wait (potentially losing my mental context), come back and hope it's done", it's now "run it, stare at my dog for a few seconds, the plot is ready".


My dog loves Plots.jl


I think it depends on the version of Julia the OP was using. If you're using 1.6/1.7 then startup and time to first plot are vastly improved. Earlier versions are quite a bit slower. There have been a lot of improvements in these areas in recent releases.


Quite right. Since I started keeping track, at about v1.5, there has been a significant speedup with every release.


That's true, though they said "recently".

I don't think time to first plot is that bad anymore.

Time to first gradient can be bad in Zygote/Flux.


As a longtime user of Python for ML/AI, I love Julia... but agree that most people will find Julia challenging to use for everyday data munging, or for developing production code, say, in a business environment. The Python ecosystem (people, resources, libraries, frameworks, etc.) is superior for most common, plain-vanilla use cases.

However, if you are developing compute-intensive scientific software, I would recommend switching to Julia as soon as possible. IMHO, Julia is superior to the likes of Python, Fortran, C++, and C for those use cases.

--

PS. By the way, you can fix matplotlib's suptitle issue by calling this Figure method after the fact:

  fig.tight_layout(rect=(0, 0, 1, 0.95))  # leaves 5% of top space for suptitle
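
For context, here's a minimal, self-contained sketch of where that call goes. The figure contents are made up for illustration, and the `Agg` backend and the in-memory buffer are only assumptions so the snippet runs without a display or writing files.

```python
import io

import matplotlib

matplotlib.use("Agg")  # headless backend; assumption so this runs without a display
import matplotlib.pyplot as plt

# Hypothetical two-panel figure, just to have something to lay out.
fig, axes = plt.subplots(1, 2, figsize=(8, 3))
for i, ax in enumerate(axes):
    ax.plot([0, 1, 2], [0, i + 1, 2 * (i + 1)])
    ax.set_title(f"panel {i}")

fig.suptitle("Overall title")

# rect is (left, bottom, right, top) in figure coordinates:
# fit the subplots into the bottom 95% of the figure,
# leaving the top 5% free for the suptitle.
fig.tight_layout(rect=(0, 0, 1, 0.95))

# Render to an in-memory PNG instead of a file on disk.
buf = io.BytesIO()
fig.savefig(buf, format="png")
```

If the suptitle uses a larger font or wraps onto two lines, the 0.95 would probably need to come down a bit.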


For many compute-intensive tasks Julia beats the competition hands down. I got to use for loops again and it feels wonderful (in Python you either write it as some NumPy operation or meddle with Numba to get it to run with some speed).

But many of the "main" ML packages are written in actual fast languages and are well optimized, so Python's inefficiencies can mostly be ignored (PyTorch/TensorFlow/CatBoost are really fast, sklearn is decent, and statsmodels is dead slow).

But again, if you need some specific image processing algorithm, you either have to hope for it to be implemented in C++ with Python bindings (like OpenCV) or you are in for a world of pain.

With Julia you can just count on it to be fast and implement it yourself natively.


> you either have to hope for it to be implemented in C++ with Python bindings (like OpenCV) or you are in for a world of pain.

Yes, I agree -- Julia is superior for scientific research -- including developing or working with new, cutting-edge algorithms that are not yet implemented in any third-party library.

That said, if one is willing to do a bit of upfront work, in many and perhaps a majority of cases, it's possible to implement new algorithms/ideas in terms of tensor/matrix/vector operations, i.e., using fast library primitives instead of resorting to slow Python loops.
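
As a small illustration of that point, here's a sketch that computes pairwise squared Euclidean distances twice: once with plain Python loops and once with NumPy broadcasting. The function names and the toy data are hypothetical; the only library assumed is NumPy.

```python
import numpy as np


def pairwise_sq_dists_loops(x):
    """Squared Euclidean distance between every pair of rows, using Python loops."""
    n, d = x.shape
    out = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            for k in range(d):
                diff = x[i, k] - x[j, k]
                out[i, j] += diff * diff
    return out


def pairwise_sq_dists_vectorized(x):
    """Same computation expressed as array operations via broadcasting."""
    # x[:, None, :] has shape (n, 1, d) and x[None, :, :] has shape (1, n, d),
    # so the difference broadcasts to shape (n, n, d).
    diff = x[:, None, :] - x[None, :, :]
    return (diff ** 2).sum(axis=-1)


# Toy data: 50 random points in 3 dimensions.
rng = np.random.default_rng(0)
points = rng.normal(size=(50, 3))

slow = pairwise_sq_dists_loops(points)
fast = pairwise_sq_dists_vectorized(points)
print(np.allclose(slow, fast))
```

The broadcast version trades memory for speed, since it materializes an (n, n, d) intermediate array. Working around that kind of thing is part of the upfront work.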


Actually

  using PyCall
  plt = pyimport("matplotlib.pyplot")
and then just using it as in Python works very seamlessly as an alternative.


It’s worth noting that PyPlot.jl is a nearly seamless way to use matplotlib from Julia. JIT compilation of the middle layer does mean that it suffers from time to first plot problems though.



