On-device models are the future. Users prefer them. No privacy issues. No dealing with connectivity, tokens, or changes to vendors' implementations. I have an app using the Foundation Models framework, and it works great. I only wish I could backport it to pre-macOS 26 versions.
Users don’t care about “privacy”. If they did, Meta and Alphabet wouldn’t be worth $1T+.
Users really don't matter at all. The revenue for AI companies will be B2B, where the user is not the customer - including coding agents. Most people don't even use computers as their primary "computing device", and most people are buying crappy low-end Android phones - no, I'm not saying all Android phones are crappy, but that's what most people are buying, with the average selling price of an Android phone being $300.
I worked for a research focused AI startup that had a strict "no external LLM" policy for code touching our core research.
You're right that the average consumer doesn't care about privacy, but there are many, many users who do. The average consumer also doesn't have a desktop with a GPU or a high-end Mac Studio, but that doesn't mean there aren't many people working with AI who do have these things.
If we continue to see improvements in running local models, and RAM prices continue to fall as they have in the last month, then suddenly you don't have to worry about token counts any more and can be much more trusting of your agents since they are fully under your control.
Those users are addressed by being able to rent their own exclusive machines to run the model on. There will be some compromise that will be made to get access to the best intelligence available.
Different users. Many people care about privacy and aren’t using Meta products. And many businesses care about it too and have information policies to protect their IP.
> Different users. Many people care about privacy and aren’t using Meta products.
Yeah but if they can rake in 100x as much by making products for people who don't care about privacy, then why spend time developing stuff for people who care?
There is still a small market left, of course, but that market will not have the billions of R&D behind it.
It's largely out of Meta's hands now anyway. The risk here is not so much to privacy (it's Apple), but they'll turn the model space into a walled garden somehow, for sure.
70% of the world’s population use at least one Meta property at least once per day. How many of the other 30% are too poor/young/computer illiterate to be part of an addressable market?
Every company has dozens of SaaS products that store their business-critical information. Amazon installs Office and Slack on each computer (they were moving away from Chime when I left), and the sales department uses Salesforce, as do SAs and Professional Services (I'm a former employee).
The addressable market of even the companies that care about privacy is not large. How long will it be before computers that can run even GPT-4-level LLMs become cheap enough that companies will give them to all of their developers?
The banking industry absolutely does care about privacy of their business data btw.
We do use tools like Confluence but they're all hosted in our own data centers.
These are all great statistics, but how do you explain the ClawdBot explosion, even in lower-income countries like China? So much demand that Apple can't keep up production of Mac Minis. Why aren't these folks going towards cloud solutions? Is it cost, or is there some consideration for having more control over their data?
ClawBot doesn't generally run the model locally; it just talks to remote APIs. No different than any other agentic harness. You could run a local model on the same Mac Mini as your agent, but it wouldn't be very smart, and many agentic tasks around computer GUI/browser use, etc. would be out of reach.
> Why aren’t these folks going towards cloud solutions?
They are. The majority aren't doing inference on a Mac Mini, but instead using it as a local host for cloud-based inference. You could have the same general experience on a $200 Chromebook or $300 Windows box.
They are running cloud models in almost all cases. It's like saying it isn't cloud when you use the Facebook app on your phone (the app is ON your phone and running there).
I see it as a long-term tradeoff on user freedom.
You pay upfront for capable hardware and you get your services running locally (you don't pay subscriptions).
Or you buy cheap hardware and still need the same services "running in some cloud" for $X monthly, where X goes up depending on the corporate bottom line.
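To put rough numbers on it, here's a back-of-the-envelope sketch; every figure below is an assumption I picked purely for illustration, not a real price quote:

    # Rough break-even: how many months of paying for cloud AI services
    # would cover the upfront cost of capable local hardware.
    # All numbers are assumed/hypothetical for illustration.
    hardware_cost = 2000.0      # hypothetical capable local machine
    monthly_cloud_cost = 100.0  # hypothetical heavy hosted-AI usage per month
    monthly_power_cost = 10.0   # hypothetical extra electricity for local use

    months = hardware_cost / (monthly_cloud_cost - monthly_power_cost)
    print(f"Local hardware breaks even after ~{months:.0f} months")  # ~22 months

Whether that pencils out obviously depends on how heavy your usage is and where subscription prices go.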
In the history of cloud computing, prices have mostly only come down, especially as inference becomes a commodity. Realistically, just looking at Mac prices, the cost of a computer with decent local inference would be around $6000 per person.
> Realistically, just looking at Mac prices, the cost of a computer with decent local inference would be around $6000 per person.
As someone who has hardware in that price range and plays with local LLMs: The gap between Opus or GPT and the local models is still very large for work beyond simple queries.
Self-hosted also starts making my office hot due to all of the power consumption when I use it for anything more than short queries. If you haven't heard your Mac's fans spin up much yet, running local LLMs will get you acquainted with the sound of their cooling systems at full blast.
Your customers are an anecdote; now compare that to the publicly reported numbers from AWS, GCP, and Azure, where they all say the only thing keeping them from growing more is the chip shortage.
Oh I'm sure they'll continue to have some cloud services, no doubt. But look at VMware for example, even after the insane price increases. Nutanix also seems to be doing quite well. I'm seeing a fair amount of on-prem bare metal k8s too.
Again - anecdotes are not data. We have data. That would be about as silly as me citing my own experience as proof that "everyone is moving to AWS" when I work for a company that is exclusively an AWS partner consulting company.
You have data showing growth in cloud, which I expect and don't disagree with. The data I come across shows this too!
Where I disagree, based on my own experience and all the data I can seem to find online, is on repatriation: its growth rate is MUCH higher than the growth in cloud.
It has flipped over the last 3yr.
US enterprises, the Fortune 100 especially. Also a lot of public entities (government).
"In 2025, repatriation is still generally an upward trend. Data from the end of 2024 showed that 86% of CIOs planned to move some public cloud workloads back to private cloud or on-premises — the highest on record for the Barclays CIO Survey."
"Real examples of cloud repatriation include Dropbox, Adobe, and GEICO. All three companies moved a significant portion of their infrastructure onto public cloud before moving it to a combination of on-premises and hybrid cloud providers."
Noted: SaaS accounts for 46.10% of market revenue, while PaaS is the fastest-growing segment at 21.35% CAGR
Again, anecdotes. I have public-company quarterly statements - you have unsourced quotes. You can quote GEICO - I can quote Netflix. If on-prem were really growing, I wouldn't expect Intel to be in the shitter, and I would expect capex to be focused on colo centers, not cloud.
Also, when I searched for your quotation, the very next paragraph was:
“This trend does not represent a rejection of cloud computing. Organizations continue investing heavily in cloud services, with Gartner forecasting that global cloud spending will reach approximately $723 billion by the end of 2025.”
Have you done A/B tests to see if consumers prefer Facebook with or without privacy?
No? What? Oh, you can't?
Neither can consumers. Most consumers are very aware of the lack of privacy and the manipulation, and have very cynical feelings about Facebook and similar companies. But it's where their friends and family are.
For most people the web is a minefield where the basic things they want are compromised everywhere. And they are routinely creeped out by ads that reveal the advertisers know them far too personally.
You are mistaking network capture for preference.
Another telling example. Lots of privacy valuing technical people, who would never have a Facebook account, send unencrypted text emails.
Consumers proactively tell Facebook their age, sexual preference, race, relationship status, likes and dislikes; they check in to where they are and who they are there with…
Yes, they do. That's exactly the phenomenon my comment addressed.
But the way you wrote that implies an improbable motivation or choice framing.
Perhaps their real motive/choice is to share with other people on the site.
It is called a network effect.
If (1) Facebook had been the surveillance/manipulation capital of the world from inception, (2) an equally inviting privacy protecting site took off at the same time, and (3) everyone chose Facebook over E2EE anyway, then sure, we could throw up our hands! Those silly users!
The term I have for when people discuss choices involving many-dimensional criteria, as if the choice involved just one or two selected dimensions, is "dimension blindness". It happens in a lot of heated discussions about phone choices too.
If people cared about privacy and still wanted to use FB, wouldn't the most obvious way to protect it be not to proactively give them information? You don't have to share everything I mentioned just to be involved in a group.
They are explicitly adding their information to FB, so why do they need a button to not share the information? Would the button disable them from checking in and updating their profile?
An E2EE system (e.g. as offered by Apple iCloud). Or a terms of service guarantee. (e.g. Dropbox, Anthropic and 1000 other companies that partition sharable user content from non-support divisions.)
> Would the button disable them from checking in and updating their profile?
When you post a check-in, your relationship status, or your pictures without setting your sharing preferences, and when you update your profile, you are specifically doing so with the intention to share. WhatsApp is E2E encrypted.
You’re arguing that people care about their privacy when they are explicitly sharing private information above what is needed to participate in FB.
You are completely wrong and your argument is illogical. People may not know that FB is building a profile of them based on their behavior. But logically, if I add to my profile that my favorite site is “grandma-midget-porn.com” [1], it can't be that I care about people not knowing I like senior-citizen midgets.
Because it is irrelevant to whether people are purposefully and explicitly sharing their likes, dislikes, and other information to let FB know more about them.
"Users" is a large set of people. Many don't care about privacy, but some do. There's also a difference between where you post random social media stuff vs what you run with something like OpenClaw and give access to your machine.
Users care about privacy when they understand the threat and impact. The issue is most users don't understand this, especially when it comes to products like Meta's, where on the surface everything appears harmless.
It's not all or nothing; there are trade-offs. The fact that Apple still bothers to expend marketing effort on its privacy chops suggests significant numbers of people still do care.
“Users” here probably means corporations. I still don't see much use of LLMs in my personal life, other than one thing: Googling stuff in a foreign language.
You are missing a 'given a choice' disclaimer. Meta is pretty much a monopoly in the social space. So is Android. Given a choice, people will absolutely gravitate towards a not-always-snooping device - most people with resources, anyway, who matter for AI adoption.
Oh, and wait till ad companies start selling your healthcare data, and you will see how fast things turn 'given a choice'.
People don't have a choice between Facebook and not-Facebook-but-still-has-all-of-your-friends-and-family. Abstinence isn't a choice here any more than shutting off your cell phone service is a choice; true in the literal sense, but only if you don't mind being unreachable to everyone who still has a phone.
There’s been some success training models on top of differential privacy.
I imagine that with live requests it would be quite challenging but not impossible, assuming you could somehow sanitize all sorts of private data that people throw at these prompts.
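For readers unfamiliar with the technique: the usual recipe is DP-SGD style training, where each example's gradient is clipped to a fixed norm and Gaussian noise calibrated to that bound is added before the update. A minimal sketch (the function name and parameter values are mine, purely illustrative, not any particular library's API):

    import numpy as np

    # Sketch of one DP-SGD style update: clip per-example gradients,
    # add Gaussian noise scaled to the clip bound, then average and step.
    def dp_sgd_step(params, per_example_grads, lr=0.1, clip_norm=1.0, noise_multiplier=1.1):
        clipped = []
        for g in per_example_grads:
            norm = np.linalg.norm(g)
            clipped.append(g * min(1.0, clip_norm / (norm + 1e-12)))
        summed = np.sum(clipped, axis=0)
        noise = np.random.normal(0.0, noise_multiplier * clip_norm, size=summed.shape)
        noisy_mean = (summed + noise) / len(per_example_grads)
        return params - lr * noisy_mean

    # Toy usage with random gradients standing in for a real batch.
    params = np.zeros(4)
    grads = [np.random.randn(4) for _ in range(8)]
    params = dp_sgd_step(params, grads)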
I think two recent advances make your statement more true. The new Qwen 3.5 series has shown a relatively high intelligence density, and Google's new turboquant could result in dramatically smaller and more efficient models without the normal quantization accuracy tradeoff.
I would expect consumer inference ASIC chips will emerge when model developments start plateauing, and "baking" a highly capable and dense model to a chip makes economic sense.
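To make the size argument concrete, here's a weights-only back-of-the-envelope calculation (the model size is hypothetical, and real quantized formats carry some extra overhead for scales, plus KV cache and activations on top):

    # Rough memory footprint of model weights at different precisions.
    # 30B parameters is a hypothetical size chosen for illustration.
    params_billion = 30
    for bits, label in [(16, "fp16"), (8, "int8"), (4, "4-bit")]:
        gib = params_billion * 1e9 * bits / 8 / 2**30
        print(f"{label}: ~{gib:.0f} GiB")
    # fp16: ~56 GiB, int8: ~28 GiB, 4-bit: ~14 GiB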
Who will be funding state of the art local models going forward? AI models are never done or good enough. They will have to be trained on new data and eventually with new model architectures. It will remain an expensive exercise.
I could be wrong because I'm not following this too closely, but the open weights future of both Llama and Qwen looks tenuous to me. Yes, there are others, but I don't understand the business model.
Good points. What local models have you found work best for your use cases? I feel like if we get to opus 4.6 level intelligence running on local hardware, we’re in the clear for a lot of day to day use cases.
Most of the LLM tooling can handle different models. Ollama makes it easy to install and run different models locally. So you can configure aider or VS Code or whatever you're using to connect to ChatGPT to point to your local models instead.
None of them are as good as the big hosted models, but you might be surprised at how capable they are. I like running things locally when I can, and I also like not worrying about accidentally burning through tokens.
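As a concrete example of the pointing-at-local-models setup described above: Ollama exposes an OpenAI-compatible endpoint, so any client that speaks that API can be redirected at it (the model tag below is just an example; use whatever you have pulled locally):

    from openai import OpenAI

    # Point an OpenAI-compatible client at a local Ollama server instead of a
    # hosted API. Assumes Ollama is running on its default port and the model
    # has already been pulled; the API key just needs to be non-empty.
    client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

    resp = client.chat.completions.create(
        model="qwen2.5-coder:14b",  # example local model tag
        messages=[{"role": "user", "content": "Summarize what a LoRA adapter is."}],
    )
    print(resp.choices[0].message.content)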
I think the future is multiple locally run models that call out to hosted models when necessary. I can imagine every device coming with a base model and using LoRAs to learn about the user's needs, with companies and maybe even households having their own shared models that do heavier lifting, while companies like OpenAI and Anthropic continue to host the most powerful and expensive options.
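A minimal sketch of that local-first, escalate-when-needed routing; the heuristic and the stub model functions are placeholders for illustration, not any vendor's actual API:

    from typing import Callable

    # Route prompts to a local model by default, escalating to a hosted model
    # when a naive heuristic says the request is too big or explicitly "hard".
    def make_router(local: Callable[[str], str],
                    hosted: Callable[[str], str],
                    max_local_len: int = 2000) -> Callable[[str], str]:
        def route(prompt: str) -> str:
            # A real system might classify the prompt, check context length,
            # or let the local model signal low confidence instead.
            if len(prompt) > max_local_len or "[escalate]" in prompt:
                return hosted(prompt)
            return local(prompt)
        return route

    # Toy usage with stubs standing in for a local and a hosted LLM call.
    router = make_router(local=lambda p: f"(local) {p[:40]}",
                         hosted=lambda p: f"(hosted) {p[:40]}")
    print(router("What's the capital of France?"))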
What models have you found capable? I was recently recommended Qwen3 Coder Next and I did not find it very successful. I have a good amount of VRAM/RAM so would love to run something locally.
Qwen3.5 is like an old version of ChatGPT and I can use it the same way I used GPT4 — writing emails, reading documentation and answering questions about it, reviewing code, answering trivia, etc.
Yes, but so far do we have a working practice for this: for a given local model, what infra could we use, and what setup lets us leverage it well for local tasks?
Yes, but you don't always want the power/expense of these models for the task at hand. A hammer is good enough to drive a nail into a wall. Save the nail gun for when you are building a house.
They’re not far behind, unless you mean for “vibe coding”. And for probably 85% of queries that people use LLMs for, you can’t even really perceive the difference between frontier and local.