I don’t understand this logic. If someone has $1000 for an entry level m1 machin...

ipsum2 · on March 5, 2021

If someone is shelling out for a brand new, early adopter product, then they probably have a decent amount of money.

Even when TensorFlow and PyTorch implement training support on the M1, it will be useless for practically anything except training 2-3 layer models on MNIST.

So why should valuable engineering time be spent on this?

sumnuyungi · on March 5, 2021

This is just patently false. Most of the folks I know that have an M1 are students that were saving up to upgrade from a much older computer and got the M1 Air. I can assure you that they don't have a decent amount of money.

TensorFlow has a branch that's optimized for Metal with impressive performance. [1] It's fast enough to do transfer learning quickly on a large ResNet, which is a common use case for photo/video editing apps that have ML-powered workflows. It's best for everyone to do this locally: it maintains privacy for the user and eliminates cloud costs for the developer.

Also, not everyone has an ImageNet-sized dataset. A lot of applied ML uses small networks where prototyping is doable on a local machine.

[1] https://blog.tensorflow.org/2020/11/accelerating-tensorflow-...

tpetry · on March 5, 2021

Because with support for M1 you can prototype your network on your local machine with "good" performance. There are many cloud solutions etc., but for convenience nothing beats your local machine. You can use an IDE you like etc.

rfoo · on March 5, 2021

Because contrary to what you believe, M1 simply is not performant enough to be used to "prototype" your network. NNs can't simply be scaled up and down. It is *NOT* like those web apps that you can run on potatoes just fine as long as nobody is hitting them heavily.

jefft255 · on March 5, 2021

Not so sure about that. Here are two things you can do (assuming you’re not training huge transformers or something).

1. Test your code with a super low batch size. Bad for convergence, good for a sanity check before submitting your job to a supercomputer.

2. Post-training evaluation. I’m pretty sure the M1 has enough power to do inference for not-so-big models.
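
As a rough illustration of the first point, here is a minimal sketch of a tiny-batch smoke test. It uses a pure-Python linear model as a stand-in for a real network, and every name in it (`forward`, `train_step`, `smoke_test`) is hypothetical rather than taken from any framework. It only checks that one training step runs, that there is one prediction per example, and that the loss is finite.

```python
import math
import random


def forward(weights, batch):
    """Linear model: one scalar prediction per example in the batch."""
    return [sum(w * x for w, x in zip(weights, features)) for features in batch]


def mse_loss(predictions, targets):
    """Mean squared error over the batch."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


def train_step(weights, batch, targets, lr=0.01):
    """One gradient-descent step; returns (updated weights, loss before the update)."""
    predictions = forward(weights, batch)
    loss = mse_loss(predictions, targets)
    n = len(targets)
    grads = [
        sum(2 * (p - t) * features[j]
            for p, t, features in zip(predictions, targets, batch)) / n
        for j in range(len(weights))
    ]
    new_weights = [w - lr * g for w, g in zip(weights, grads)]
    return new_weights, loss


def smoke_test(batch_size=2, num_features=3, seed=0):
    """Run a single step on a tiny random batch and check the basics."""
    rng = random.Random(seed)
    weights = [rng.uniform(-1, 1) for _ in range(num_features)]
    batch = [[rng.uniform(-1, 1) for _ in range(num_features)]
             for _ in range(batch_size)]
    targets = [rng.uniform(-1, 1) for _ in range(batch_size)]

    predictions = forward(weights, batch)
    assert len(predictions) == batch_size, "expected one prediction per example"

    new_weights, loss = train_step(weights, batch, targets)
    assert math.isfinite(loss), "loss should be a finite number"
    assert len(new_weights) == num_features, "update must keep the parameter shape"
    return loss


if __name__ == "__main__":
    print("smoke test passed, loss =", smoke_test())
```

With a real model the same idea applies: swap in your own forward pass and optimizer step, keep the batch tiny, and only look for crashes, shape mismatches, and NaNs.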

These two reasons are why I’m sometimes running stuff on my own GTX 1060, even though it’s pretty anemic and I wouldn’t actually do a training run there.

There’s quite a bit of friction to training in the cloud, especially if it’s on a shared cluster (which is what I have access to). You have a quota, and there’s wait time when the supercomputer is under load. Sometimes you just need to quickly fire up something!

microtonal · on March 5, 2021

> 1. Test your code with a super low batch size. Bad for convergence, good for a sanity check before submitting your job to a supercomputer.

Or you can buy a desktop machine for the same price as an M1 MacBook with 32GB or 64GB RAM and an RTX2060 or RTX3060 (which support mixed-precision training) and you can actually finetune a reasonable transformer model with a reasonable batch size. E.g., I can finetune a multi-task XLM-RoBERTa base model just fine on an RTX2060, model distillation also works great.

Also, there are only so many sanity checks you can do on something that weak (when it comes to neural net training). Sure, you can check if your shapes are correct, loss is actually decreasing, etc. But once you get to the point where your model is working, you will have to do dozens of tweaks that you can't reasonably do on an M1 and still want to do locally.

tl;dr: why make your life hard with an M1 for deep learning, if you can buy a beefy machine with a reasonable NVIDIA GPU at the same price? Especially if it is for work, your employer should just buy such a machine (and an M1 MacBook for on the go ;)).

jefft255 · on March 5, 2021

Absolutely agree! My points were more about the benefits of running code on your own machine rather than in the cloud or on a cluster. I don’t own an M1, but if I did I wouldn’t want to use it to train models locally... When on my laptop I still deploy to my lab desktop; this adds little friction compared to a compute cluster, and as you mention we’re able to do interesting stuff with a regular gaming GPU. When everything works great and I now want to experiment at scale, I then deploy my working code to a supercomputer.

rsfern · on March 5, 2021

Sure, you probably don’t want to do full training runs locally, but there’s a lot you can do locally that has a lot of added friction on a GPU cluster or other remote compute resource.

I like to start a new project by prototyping and debugging my training and cunning config code, setting up the data loading and evaluation pipeline, hacking around with some baseline models and making sure they can overfit some small subset of my data
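
The "can it overfit a small subset" check might look roughly like the sketch below. It is a pure-Python logistic regression standing in for a real baseline model, and the names (`fit`, `can_overfit`, `TINY_SUBSET`) and the loss threshold are assumptions made for illustration, not anything from a specific framework.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(params, x):
    """Probability of the positive class for one example."""
    w, b = params
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def bce_loss(params, data):
    """Mean binary cross-entropy over (features, label) pairs."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for x, y in data:
        p = predict(params, x)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(data)


def fit(data, steps=3000, lr=0.5):
    """Full-batch gradient descent starting from all-zero parameters."""
    w = [0.0] * len(data[0][0])
    b = 0.0
    for _ in range(steps):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, y in data:
            err = predict((w, b), x) - y
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / len(data) for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / len(data)
    return w, b


def can_overfit(data, loss_threshold=0.05):
    """True if training drives the loss near zero and fits every example."""
    initial = bce_loss(([0.0] * len(data[0][0]), 0.0), data)
    params = fit(data)
    final = bce_loss(params, data)
    accuracy = sum((predict(params, x) > 0.5) == bool(y) for x, y in data) / len(data)
    return final < initial and final < loss_threshold and accuracy == 1.0


# Hypothetical tiny subset: the label simply equals the first feature.
TINY_SUBSET = [([0.0, 0.0], 0), ([0.0, 1.0], 0), ([1.0, 0.0], 1), ([1.0, 1.0], 1)]

if __name__ == "__main__":
    print("can overfit tiny subset:", can_overfit(TINY_SUBSET))
```

If a model can't memorize a handful of examples, the bug is usually in the data pipeline, the loss, or the update step, and none of that needs an accelerator to find.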

After all that’s done it’s finally time to scale out to the gpu cluster. But I still do a lot of debugging locally

Maybe this kind of workflow isn’t as necessary if you have a task that’s pretty plug and play like image classification, but for nonstandard tasks I think there’s lots of prototyping work that doesn’t require hardware acceleration

jefft255 · on March 5, 2021

Coding somewhat locally is a must for me too because the cluster I have access to has pretty serious wait times (up to a couple hours on busy days). Imagine only being able to run the code you’re writing a few times a day at most! Iterative development and making a lot of mistakes is how I code; I don’t want to go back to punch card days where you waited and waited before you ended up with a silly error.

sumnuyungi · on March 5, 2021

This is false. You can prototype a network on an M1 [1] and teacher-student models are a de facto standard for scaling down.

You can trivially run transfer learning on an M1 to prototype and see if a particular backbone fits a small dataset well, then kick off training on some cloud instance with the larger dataset for a few days.

[1] https://blog.tensorflow.org/2020/11/accelerating-tensorflow-...

tpetry · on March 5, 2021

You can just run your training with a lot less data.

sjwright · on March 5, 2021

I think you underestimate just how many MacBooks Apple sells. There are only so many affluent early adopters out there. And the M1 MacBooks are especially ideal for mainstream customers.