Do any of you use this as a replacement for Claude Code? For example, you might ...

Schekin · 2026-04-03T10:20:27 1775211627

This matches my experience.

The weights usually arrive before the runtime stack fully catches up.

I tried Gemma locally on Apple Silicon yesterday — promising model, but Ollama felt like more of a bottleneck than the model itself.

I had noticeably better raw performance with mistralrs (I found it on Reddit, then GitHub), but the coding/tool-use workflow felt weaker. So the tradeoff wasn’t really model quality; it was runtime speed vs workflow maturity.

FullyFunctional · 2026-04-03T04:36:12 1775190972

Ollama made it trivial for me to use Claude Code on my 48GB MacMini M4P with any model, including the Qwen3.5…nvfp4, which is so far the best I’ve tried. Once Ollama has a Mac-friendly version of Gemma4 I’ll jump right on board (and do educate me if I’m missing something).

ar_turnbull · 2026-04-02T19:01:17 1775156477

Following, as I also don’t love the idea of double-paying Anthropic for my usage plan and API credits to feed my pet lobster.

hacker_homie · 2026-04-03T02:33:37 1775183617

Honestly, for that, [Qwen3-Coder-Next-GGUF](https://huggingface.co/unsloth/Qwen3-Coder-Next-GGUF) still seems to be the best in class.

I am testing Gemma4 now; I will update this comment with what I find.

downrightmike · 2026-04-02T21:59:58 1775167198

Did you try it?

logicallee · 2026-04-03T07:29:20 1775201360

Yes, I've now tried both the 20 GB version (gemma4:31b), which is the largest on the page [1], and the ~10 GB version (gemma4:e4b). The 20 GB version was rather slow even when fully loaded and with some RAM still left free, and the 10 GB version was speedy. I installed openclaw but couldn't get it to act as an agent the way Claude Code does. If you'd like to see a video of how both of them perform with almost nothing else running, on a Mac Mini M4 with 24 GB of RAM, you can see one here (I just recorded it): [2]

[1] https://ollama.com/library/gemma4

[2] https://www.youtube.com/live/G5OVcKO70ns

tr33house · 2026-04-03T12:00:06 1775217606

Thank you for the video. It was super helpful. The 20 GB version was clearly struggling but the 10 GB version was flying by. I think the issue was probably virtual memory pages that had actually been swapped to disk. Perhaps that and the memory compression.

a96 · 2026-04-05T14:02:36 1775397756

The massive black borders are making the actual part of the video hard to see. Recording just the window and/or zooming the text as big as you can would make it work better.

Also, I think I can see some swap being used. The way to see if a model is loaded completely in Ollama is to run `ollama ps` and check the output. If it starts hitting limits, you'll see the CPU/GPU split there, and a unified memory box will start to swap. Along with the performance crashing down, of course.
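If you'd rather script that check, here is a rough sketch in Python. It assumes the PROCESSOR column of `ollama ps` reads like `100% GPU`, `100% CPU`, or `48%/52% CPU/GPU`, which is how the Ollama FAQ describes it; the exact table layout may differ between versions. The helper names (`gpu_share`, `report`, `running_models`) are made up for this example.

```python
import re
import subprocess

# Assumed PROCESSOR formats: "100% GPU", "100% CPU", or "48%/52% CPU/GPU".
_PROCESSOR = re.compile(r"(\d+)%(?:/(\d+)%)?\s+(CPU/GPU|CPU|GPU)")


def gpu_share(line):
    """Return the GPU percentage found in one `ollama ps` row, or None."""
    match = _PROCESSOR.search(line)
    if match is None:
        return None
    first, second, kind = match.groups()
    if kind == "CPU/GPU":
        # Split load: the second number is the GPU share.
        return int(second) if second is not None else None
    return int(first) if kind == "GPU" else 0


def report(ps_output):
    """Map each listed model name to its GPU share, skipping the header row."""
    shares = {}
    for line in ps_output.splitlines()[1:]:
        share = gpu_share(line)
        if line.strip() and share is not None:
            shares[line.split()[0]] = share
    return shares


def running_models():
    """Run `ollama ps` (needs the ollama CLI on PATH) and parse its output."""
    result = subprocess.run(
        ["ollama", "ps"], capture_output=True, text=True, check=True
    )
    return report(result.stdout)
```

Calling `running_models()` while a model is loaded should give something like a dict of model name to GPU percentage; anything under 100 means part of the model is running on the CPU side, which is where the slowdown usually starts.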

Thanks for the video and results, though. Just hopefully constructive tips.

logicallee · 2026-04-05T15:42:22 1775403742

Thanks for the feedback! I appreciate the tips. I'll look into zooming on the text more, seems like something worth learning in case I want to present anything in the future and I'll keep it in mind.

Regarding the black borders, I've cropped, re-encoded, and reuploaded it as 1080p (the resolution the headless Mac gave over VNC) so you can watch that version without any black borders if you want: https://www.youtube.com/watch?v=5VOiH2zjAss

(not sure how large your screen is but this should be full size if you maximize it I guess). It's a re-encoding so it doesn't look as good as the original but you should be able to read anything you were interested in seeing. Next time I'll be sure to zoom in on the text more.