I cannot express how dirt cheap that price point is for what's on offer, especially when you compare it to rackmount servers. By the time you've shoehorned in an Nvidia GPU and all that RAM, you're easily looking at 5x the MSRP; sure, you get proper redundancy and expandable storage for that added cost, but now you also need redundant UPSes and have local storage to manage instead of centralized SANs or NASes.
For SMBs or edge deployments where redundancy isn't as critical or budgets aren't as large, this is an incredibly compelling offering... if Apple actually had a competent server OS to layer on top of that hardware, which it does not.
If they did, though...whew, I'd be quaking in my boots if I were the usual Enterprise hardware vendors. That's a damn frightening piece of competition.
> By the time you've shoehorned in an Nvidia GPU and all that RAM, you're easily looking at 5x the MSRP
That Nvidia GPU setup will actually have the compute grunt to make use of the RAM, though, which this M3 Ultra realistically probably doesn't. After all, if the only thing that mattered were RAM, then the 2TB you can shove into an Epyc or Xeon would already be dominating the AI industry. But they aren't, because it isn't. It certainly hits a unique combination of things, but whether that's maximally useful for the money is a completely different story.
You're forgetting what Apple's been baking into their silicon for (nearly? over?) a decade: the Neural Processing Unit (NPU), which they brand the "Neural Engine". That's the secret sauce that makes their kit more competitive for endpoint and edge inference than standard x86 CPUs. It's why I can get similarly satisfying performance on my old M1 Pro MacBook Pro with a scant 16GB of memory as I can on my 10900K with 64GB of RAM and an RTX 3090 under the hood. To put the two into context, I ran the latest version of LM Studio with the deepseek-r1-distill-llama-8b model @ Q8_0 on both machines, with the exact same prompt, everything maximally offloaded onto hardware acceleration and memory, and an entirely empty context window:
Write me an AWS CloudFormation file that does the following:
* Deploys an Amazon Kubernetes Cluster
* Deploys Busybox in the namespace "Test1", including creating that Namespace
* Deploys a second Busybox in the namespace "Test3", including creating that Namespace
* Creates a PVC for 60GB of storage
The M1 Pro laptop with 16GB of Unified Memory:
* 21.28 seconds for "thinking"
* 0.22s to the first token
* 18.65 tokens/second over 1484 tokens in its responses
* 1m:23s from sending the input to completion of the output
The 10900K desktop, with 64GB of RAM and a full-fat RTX 3090 GPU in it:
* 10.88 seconds for "thinking"
* 0.04s to first token
* 58.02 tokens/second over 1905 tokens in its responses
* 0m:34s from sending the input to completion of the output
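If anyone wants to run the same comparison on their own hardware, here's roughly how I'd script it. This is a minimal sketch, assuming LM Studio's OpenAI-compatible local server is running on its default port 1234 with the model already loaded; your endpoint and model identifier may differ, and the token count here is a crude estimate (LM Studio's UI reports the exact figures I quoted above):

```python
import json
import time

import requests

# Assumes LM Studio's OpenAI-compatible server is running locally;
# the default port is 1234.
URL = "http://localhost:1234/v1/chat/completions"
MODEL = "deepseek-r1-distill-llama-8b"  # must match the loaded model's identifier

PROMPT = """Write me an AWS CloudFormation file that does the following:
* Deploys an Amazon Kubernetes Cluster
* Deploys Busybox in the namespace "Test1", including creating that Namespace
* Deploys a second Busybox in the namespace "Test3", including creating that Namespace
* Creates a PVC for 60GB of storage"""

start = time.monotonic()
first_token_at = None
chunks = []

# Stream the response so time-to-first-token can be measured separately.
with requests.post(
    URL,
    json={
        "model": MODEL,
        "messages": [{"role": "user", "content": PROMPT}],
        "stream": True,
    },
    stream=True,
    timeout=600,
) as resp:
    resp.raise_for_status()
    for line in resp.iter_lines():
        if not line or not line.startswith(b"data: "):
            continue
        payload = line[len(b"data: "):]
        if payload == b"[DONE]":
            break
        delta = json.loads(payload)["choices"][0]["delta"]
        text = delta.get("content") or ""
        if text and first_token_at is None:
            first_token_at = time.monotonic()
        chunks.append(text)

elapsed = time.monotonic() - start
# Whitespace-split word count, a rough stand-in for real token counts.
approx_tokens = len("".join(chunks).split())
print(f"time to first token: {first_token_at - start:.2f}s")
print(f"total wall time:     {elapsed:.2f}s")
print(f"~{approx_tokens / elapsed:.1f} 'tokens'/s (whitespace-split estimate)")
```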
Same model, same loader, different architectures and resources. This is why a lot of the AI crowd are on Macs: their chip designs, especially the Neural Engine and GPUs, allow quite competent edge inference while sipping comparative thimbles of energy. It's why, if I were all-in on LLMs or leveraged them for work more often (which I intend to do, given how I'm currently selling my generalist expertise to potential employers), I'd be seriously eyeballing these little Mac Studios for their local inference capabilities.
Uh... I must be missing something here, because you're hyping up Apple's NPU only to show it getting absolutely obliterated by the equally old 3090? Your 10900K having 64GB of RAM is also irrelevant here...
You're missing the bigger picture by getting bogged down in technical details. To an end user, the difference between thirty seconds and ninety seconds is often irrelevant for things like AI, where they expect a delay while it "thinks". Taken in that context, you're now comparing a 14" laptop running off its battery to a desktop rig gulping down ~500W according to my UPS, for a roughly 60% reduction in runtime on a single query at the expense of 5x the power draw.
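To put rough numbers on that, here's the back-of-envelope arithmetic. The ~500W desktop figure is from my UPS; the laptop's wattage is not measured but inferred from the 5x power-draw claim (~100W), so treat both as approximations:

```python
# Back-of-envelope energy per query, using the timings above.
# 500W desktop draw is measured at the UPS; ~100W for the laptop is
# inferred from the "5x the power draw" claim, not measured.
desktop_watts, desktop_seconds = 500, 34
laptop_watts, laptop_seconds = 100, 83

desktop_joules = desktop_watts * desktop_seconds  # 17,000 J per query
laptop_joules = laptop_watts * laptop_seconds     #  8,300 J per query

print(f"desktop: {desktop_joules} J/query")
print(f"laptop:  {laptop_joules} J/query")
print(f"laptop uses ~{desktop_joules / laptop_joules:.1f}x less energy per query")
```

Even granting the desktop its speed win, the laptop finishes the same query on roughly half the energy.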
Sure, the desktop machine performs better, as would a datacenter server jam-packed full of Blackwell GPUs, but that's not what's exciting about Apple's implementation. It's the efficiency of it all, being able to handle modern models on comparatively "weaker" hardware most folks would dismiss outright. That's the point I was trying to make.
We're talking about the M3 Ultra here, which is also wall-powered and also expensive. Nobody is interested in dropping upwards of $10,000 on a Mac Studio to get "okay" performance just because an unrelated product is battery-powered. Similarly, saving a few bucks on electricity by tripling the much, much more expensive engineer time spent waiting on results is foolish.
Also Apple isn't unique in having an NPU in a laptop. Fucking everyone does at this point.
It almost feels like you're deliberately missing the forest for the trees, in order to fit some argument that I'm not quite able to suss out here.
The point is that, in terms of practical usage, the M3 Ultra is uniquely competent and highly affordable in a sea of enterprise technology that is decidedly not. I tried to demonstrate why I'm excited about it by pointing out the similar performance of a battery-powered, four-year-old laptop and a quite gargantuan gaming PC pulling over 500W from the wall, as an example of what several years of additional refinements and improvements to the architecture could be expected to bring.
The point is that it's affordable, more flexible in deployment, and more efficient than similarly-specced datacenter servers specifically designed for inference. For the cost of a single decked-out Dell or HP rackmount server, I can have five of these Mac Studios with M3 Ultra chips - and without the need for substantial cooling, noise isolation, or other datacenter necessities. If the marketing copy is even in the same ballpark as actual performance, that's easily enough inference to serve an office of fifty to a hundred people or more, depending on latency tolerances; if you don't mind "queuing" work (like CurrentCo does with their internal Agents), one of those is likely enough for a hundred users.
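To make the "queuing" idea concrete, here's a toy sketch of that pattern. Everything in it is hypothetical: the hostnames, the five-node pool, and the model identifier are made up, and it assumes each Studio exposes an OpenAI-compatible server (e.g. via LM Studio or llama.cpp's server):

```python
import asyncio

import httpx

# Hypothetical pool of Mac Studios, each running an OpenAI-compatible
# inference server on port 1234. Hostnames are invented for illustration.
BACKENDS = [f"http://studio-{i}.office.lan:1234/v1/chat/completions" for i in range(5)]

queue: asyncio.Queue[tuple[str, asyncio.Future]] = asyncio.Queue()

async def worker(backend_url: str) -> None:
    """Each Studio drains the shared queue one request at a time."""
    async with httpx.AsyncClient(timeout=600) as client:
        while True:
            prompt, future = await queue.get()
            try:
                resp = await client.post(backend_url, json={
                    "model": "local-model",  # placeholder identifier
                    "messages": [{"role": "user", "content": prompt}],
                })
                resp.raise_for_status()
                future.set_result(resp.json()["choices"][0]["message"]["content"])
            except Exception as exc:
                future.set_exception(exc)
            finally:
                queue.task_done()

async def submit(prompt: str) -> str:
    """Callers just enqueue work; whichever Studio frees up first takes it."""
    future: asyncio.Future = asyncio.get_running_loop().create_future()
    await queue.put((prompt, future))
    return await future

async def main() -> None:
    workers = [asyncio.create_task(worker(url)) for url in BACKENDS]
    answers = await asyncio.gather(*(submit(f"Question {n}") for n in range(20)))
    print(f"served {len(answers)} queued requests across {len(BACKENDS)} Studios")
    for w in workers:
        w.cancel()

asyncio.run(main())
```

Per-query latency stays whatever a single Studio delivers; the pool only buys throughput, which is exactly the trade-off of queuing work instead of chasing raw speed.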
That's the excitement. That's the point. It's not the fastest, it's not the cheapest, it's just the most balanced.
Apple defenders have some special sauce reasoning that makes no sense to anyone but them.
Are you a boomer?
I have Apple hardware, but it sucks for anything AI; buying it for that purpose is just extremely dumb, just like buying Macs for engineering CAD work or things of the sort.
If you are buying Macs and it's not for media-production-related reasons, you are doing something wrong.
> Apple defenders have some special sauce reasoning that makes no sense to anyone but them. Are you a boomer?
I continue to be in awe of the lengths some people will go just to fling insults and shake out some salt. We're, what, ten layers deep? With all the context above, the best you have to contribute to the discussion are baseless accusations and ageist insults?
Your finite time would have been better spent on literally anything else than actively seeking out a comment just to throw subjective, unsubstantiated shade around. C'mon, be better.
Make no mistake, it's not an insult. I'm saying that precisely because I have been there.
Apple is the master at creating desire and building a narrative in their customers' minds about the many things their devices would allow them to do. It's very aspirational, and in practice most Macs get used for things that could have been done with a much cheaper option.
It may not be obvious to you, but it's somewhat funny seeing you rationalise all kinds of dreams about what this machine could potentially be, when in practice the people who actually work on the kind of stuff you're talking about don't even consider them viable, for many good reasons.
It's not that those machines cannot potentially do it, it's just that they don't really fit the goal very well.
A lot like people buying a Cybertruck to "haul" stuff when there are a lot of options that are just plain better and make a lot more economic/practical sense.
It's OK to desire the thing and be excited about it, but it really doesn't serve anyone to rationalise it so hard; you are lying to yourself as much as to everyone else, and it's not healthy.
If that was not clear: people working on AI professionally really don't have to make do with a Mac Studio; they have access to better stuff. If you want to get one personally to experiment/toy around, that's OK, but it's not going to be this amazing thing for AI.
Had the M3 GPU been much wider, it would be constrained by memory bandwidth. It might still hold an advantage over Nvidia competitors in that it has 512GB directly accessible and needs to push less data across socket boundaries.
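A quick roofline estimate shows why width alone wouldn't help. This is a sketch assuming the commonly cited ~819GB/s memory bandwidth for the M3 Ultra, and the rule of thumb that decode speed is capped by bandwidth divided by the bytes of weights read per token:

```python
# Rough decode-speed ceiling: every generated token streams the active
# weights through memory once, so tokens/s <= bandwidth / bytes per token.
bandwidth_gb_s = 819  # commonly cited M3 Ultra figure; treat as approximate

for name, gb_per_token in [
    ("8B @ Q8_0 (~8GB)", 8),
    ("70B @ Q8_0 (~70GB)", 70),
    ("671B MoE, ~37B active @ Q4 (~19GB)", 19),
]:
    ceiling = bandwidth_gb_s / gb_per_token
    print(f"{name}: <= ~{ceiling:.0f} tokens/s")
```

The ceiling is set by bandwidth, not ALU count, which is why a much wider GPU fed by the same memory system wouldn't decode meaningfully faster.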
From my outsider perspective, it's pretty straightforward why they don't.
In Intel's case, there's ample coverage of the company's lack of direction and complacency on existing hardware, even as their competitors ate away at their moat, year after year. AMD with their EPYC chips taking datacenter share, Apple moving to in-house silicon for their entire product line, Qualcomm and Microsoft partnering with ongoing exploration of ARM solutions. A lack of competency in leadership over that time period has annihilated their lead in an industry they used to single-handedly dictate, and it's unlikely they'll recover that anytime soon. So in a sense, Intel cannot make a similar product, in a timely manner, that competes in this segment.
As for AMD, it's a bit more complicated. They're seeing pleasant success in their CPU lineup and have all but thrown in the towel on higher-end GPUs. The industry has broadly rallied around CUDA instead of OpenCL or other alternatives, especially in the datacenter, and AMD realizes it's a fool's errand to try and compete directly there when it's a monopoly in practice. Instead of squandering capital competing head-on, they can keep building their own moat in the areas they specialize in - mid-range GPUs for work and gaming, CPUs targeting consumers and datacenters, and APUs finding their way into game consoles, handhelds, and other consumer or edge compute devices.
And that's just getting into the specifics of those two companies. The reality is that any vendor who hasn't already unveiled their own chips or accelerators is coming in at what's perceived to be the "top" of the bubble or market. They'd lack the capital or moat to really build themselves up as a proper competitor, and are more likely to just be acquired in the current regulatory environment (or lack thereof) for a quick payout to shareholders. There's a reason why the persistent rumor of Qualcomm purchasing part or all of Intel just won't die: the x86 market is rather stagnant, churning out mediocre improvements YoY at growing price points, while ARM and RISC-V chips continue to innovate on modern manufacturing processes and chip designs. The growth is not in x86, but a juggernaut like Qualcomm would be an ideal buyer for a "dying" or "completed" business like Intel's, where the only thing left to do is constantly iterate for diminishing returns.
The bargain is the lower price in the UK compared to the US, once US sales tax is added. It's not like the pound is strong; it's just cheaper in the UK. And you're right, all Apple products are better value in the UK. I'm not used to any electronics being good value in the UK.