I kinda gave up on Julia for deep learning since it’s so buggy. I am using PyTor...

barbarr · on Aug 30, 2024

I didn't find it too buggy personally, in fact it has an unexpected level of composability between libraries that I found exciting. Stuff "just works". But I felt it lacked performance in practical areas such as file I/O and one-off development in notebooks (e.g. plotting results), which is really important in the initial stages of model development.

(I also remember getting frustrated by frequent uninterruptible kernel hangs in Jupyter, but that might have been a skill issue on my part. But it was definitely a friction I don't encounter with python. When I was developing in Julia I remember feeling anxiety/dread about hitting enter on new cells, double and triple checking my code lest I initiate an uninterruptible error and have to restart my kernel and lose all my compilation progress, meaning I'll have to wait a long time again to run code and generate new graphs.)

adgjlsfhk1 · on Aug 30, 2024

Julia does definitely need some love from devs with a strong understanding of IO performance. That said, for interactive use the compiler has gotten a bunch faster and better at caching results in the past few years. On Julia 1.10 (released about 6 months ago) the time to load Plots.jl and display a plot from scratch is 1.6 seconds on my laptop compared to 7.3 seconds in Julia 1.8 (2022)

Tarrosion · on Aug 30, 2024

I'm curious what kind of slow IO is a pain point for you -- I was surprised to read this comment because I normally think of Julia IO being pretty fast. I don't doubt there are cases where the Julia experience is slower than in other languages, I'm just curious what you're encountering since my experience is the opposite.

Tiny example (which blends Julia-the-language and Julia-the-ecosystem, for better and worse): I just timed reading the most recent CSV I generated in real life, a relatively small 14k rows x 19 columns. 10ms in Julia+CSV+DataFrames, 37ms in Python+Pandas...ie much faster in Julia but also not a pain point either way.

barbarr · on Aug 30, 2024

My use case was a program involving many calls to an external program that generated an XYZ file format to read in (computational chemistry). It's likely I was doing something wrong or inefficient, but I remember the whole process was rate-limited by this step in a way that Python wasn't.

ChrisRackauckas · on Aug 31, 2024

IO is thread-safe by default, but that does slow it down. There's a keyword argument to turn that off (if you know you're running it single threaded) and right now it's a rather large overhead. It needs some GC work IIRC to reduce that overhead.

catgary · on Aug 30, 2024

I’m jealous of your experience with its autograd if it “just worked” for you. It was a huge pain for me to get it to do anything non-trivial.

Lyngbakr · on Aug 30, 2024

Did you ever try alternatives to Jupyter like Pluto.jl? I'm curious if there's the same sort of friction.

pkage · on Aug 30, 2024

Same here. I started my PhD with the full intention of doing most of my research with Julia (via Flux[0]), and while things worked well enough there were a few things which made it challenging:

- Lack of multi-GPU support,

- some other weird bugs related to autograd which i never fully figured out,

- and the killer one: none of my coauthors used Julia, so I decided to just go with PyTorch.

PyTorch has been just fine, and it's nice to not have to reinvent to wheel for every new model architecture.

[0] https://fluxml.ai/

samsartor · on Aug 31, 2024

This was exactly my experience too

xiaodai · on Sept 10, 2024

yeah. unless Julia has some killer framework with massive investment, it's hard to see 99.99% of cases moving to Julia. No point really.

ssivark · on Aug 30, 2024

I think it would be more useful to list concrete bugs/complaints that the Julia devs could address. Blanket/vague claims like "Julia for deep learning [...] so buggy" is unfalsifiable and un-addressable. It promotes gossip with tribal dynamics rather than helping ecosystems improve and helping people pick the right tools for their needs. This is even more so with pile-on second hand claims (though the above comment might be first-hand, but potentially out-of-date).

Also, it's now pretty easy to call Python from Julia (and vice versa) [1]. I haven't used it for deep learning, but I've been using it to implement my algorithms in Julia while making use of Jax-based libraries from Python so it's certainly quite smooth and ergonomic.

[1] https://juliapy.github.io/PythonCall.jl/

moelf · on Aug 30, 2024

what is so buggy, Julia the language or the deep learning libraries in Julia? in either case it would be good to have some examples.

currymj · on Aug 30, 2024

julia the language is really good. but a lot of core infrastructure julia libraries are maintained by some overworked grad student.

sometimes that grad student is a brilliantly productive programmer + the libraries reach escape velocity and build a community, and then you get areas where Julia is state of the art like in differential equation solving, or generally other areas of "classical" scientific computing.

in other cases the grad student is merely a very good programmer, and they just sort of float along being "almost but not quite there" for a long time, maybe abandoned depending on the maintainer's career path.

the latter case is pretty common in the machine learning ecosystem. a lot of people get excited about using a fast language for ML, see that Julia can do what they want in a really cool way, and then run into some breaking problem or missing feature ("will be fixed eventually") after investing some time in a project.

catgary · on Aug 30, 2024

This is an old-ish article about Julia, but from what I can tell the core issues with autograd were never fixed:

https://kidger.site/thoughts/jax-vs-julia/

xiaodai · on Sept 10, 2024

the deep learning libraries. can't figure out why one of my gradient didn't work so i switch implementation to pytorch and it worked perfectly fine.

catgary · on Aug 30, 2024

Yeah I was pretty enthusiastic about Julia for a year or two, even using it professionally. But honestly, JAX gives you (almost) everything Julia promises and its automatic differentiation is incredibly robust. As Python itself becomes a pretty reasonable language (the static typing improvements in 3.12, the promise of a JIT compiler) and JAX develops (it now has support for dynamic shape and AOT compilation) I can’t see why I’d ever go back.

The Julia repl is incredibly nice though, I do miss that.

adgjlsfhk1 · on Aug 30, 2024

IMO the python JIT support won't help very much. Python currently is ~50x slower than "fast" languages, so even if the JIT provides a 3x speedup, pure python will still be too slow for anything that needs performance. Sure it will help on the margins, but a JIT can't magically make python fast.

catgary · on Aug 30, 2024

I’m only really thinking about ML/scientific computing workflows where all the heavy lifting happens in jax/torch/polars.

adgjlsfhk1 · on Aug 30, 2024

right and in those cases, a python JIT will do nothing for your performance because all the computation is happening in C/CUDA anyway.

physicsguy · on Aug 31, 2024

It depends on whether you’re transforming data out of files or whatever to get it into these libraries to start with to be fair. Overall I wouldn’t expect that to have an effect on a long-running computation but when starting up a project it can be a bit slow.

mccoyb · on Aug 30, 2024

Can you link dynamic shape support? Big if true — but I haven’t been able to find anything on it.

Edit: I see — I think you mean exporting lowered StableHLO code in a shape polymorphic format —- from the docs: https://jax.readthedocs.io/en/latest/export/shape_poly.html

This is not the thing I usually think when someone says dynamic shape support.

In this model, you have to construct a static graph initially —- then you’re allowed to specify a restricted set of input shapes to be symbolic, to avoid the cost of lowering — but you’ll still incur the cost of compilation for any new shapes which the graph hasn’t been specialized for (because those shapes affect the array memory layouts, which XLA needs to know to be aggressive)

catgary · on Aug 30, 2024

It’s part of model export/serialization, it is documented here:

https://jax.readthedocs.io/en/latest/export/export.html#supp...

Edit: I think you need to look here as well, the Exported objects do in fact serialize a function and support shape polymorphism:

https://jax.readthedocs.io/en/latest/export/shape_poly.html#...

mccoyb · on Aug 30, 2024

Thanks! See above — I don’t think this is exactly dynamic shape support.

My definition might be wrong, but I often think of full dynamic shape support as implying something dynamic about the computation graph.

For instance, JAX supports a scan primitive — whose length must be statically known. With full dynamic shape support, this length might be unknown — which would mean one could express loops with shape dependent size.

As far as I can tell, shape polymorphic exports may sort of give you that — but you incur the cost of compilation, which will not be negligible with XLA.

catgary · on Aug 30, 2024

I think you’re right, so it is now as shape polymorphic as any framework that with an XLA backend can be.

I work with edge devices, so I have also been experimenting with IREE for deployment, that can handle dynamic shapes (at times, it stopped working for a version but may be working again in the development branch).

mccoyb · on Aug 30, 2024

I can’t comment on the lowest leaf of this thread, but thanks for update! I’ll read through this section and see if my intuitions are wrong or right.

enkursigilo · on Aug 30, 2024

Can you elaborate a bit more?