Hacker News | vedmakk's comments

missed a chance to use "gastimate"

This is really cool. I'm interested in the GenUI part. Is the web app itself static and the stories are generated on-demand?

Do you give gemini some UI components/templates to build with or is it just prompting to get consistent results across multiple stories?


Google AI Studio has a Gallery[0] with some similar apps. It's an editor, so you can view the code; they are usually React apps with Gemini integration via the genai package. This one here[1] is similar. It generates interesting stories to share about a route you are driving / walking / biking along. This is one of the pre-made examples I believe, I didn't make it or anything. Just to show how some of this might work.

[0] https://aistudio.google.com/apps?source=showcase&showcaseTag...

[1] https://aistudio.google.com/apps/bundled/echo_paths?showPrev...


Yes, I have base CSS/JS that I inject on top of whatever codegen Gemini 3 comes back with -- it runs via the ai-sdk, and the specific function is streamObject, which is prompted to generate inner HTML elements.
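For anyone curious, a minimal sketch of what that streamObject call might look like with the Vercel AI SDK (the model id, schema shape and prompt here are assumptions for illustration, not the author's actual setup):

```typescript
import { google } from '@ai-sdk/google';
import { streamObject } from 'ai';
import { z } from 'zod';

// Ask the model for structured story sections whose innerHtml fragments
// get injected into a pre-built CSS/JS shell.
const { partialObjectStream } = streamObject({
  model: google('gemini-3-pro-preview'), // hypothetical model id
  schema: z.object({
    sections: z.array(
      z.object({
        title: z.string(),
        innerHtml: z.string(), // HTML fragment only, no <html>/<head> boilerplate
      }),
    ),
  }),
  prompt: 'Write the story as a list of sections, each with inner HTML only.',
});

// Render sections incrementally as they stream in.
for await (const partial of partialObjectStream) {
  console.log(`${partial.sections?.length ?? 0} sections so far`);
}
```

Constraining the model to innerHTML fragments and letting the host page's base CSS/JS do the wrapping is presumably what keeps the stories visually consistent across generations.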

Same experience here. Shame.

I can definitely confirm this from my experience.

Gemini 3 feels even worse than GPT-4o right now. I don't understand the hype, or why OpenAI would need a red alert because of it.

Both Opus 4.5 and GPT-5.2 are much more pleasant to use.


I don't get the Gemini 3 hype... yes, it's their first usable model, but it's not even close to what Opus 4.5 and GPT 5.2 can do.

Maybe on benchmarks... but I'm forced to use Gemini at work every day, while I use Opus 4.5 / GPT 5.2 privately every day... and Gemini is just lacking so much wit, creativity and multi-step problem-solving skill compared to Opus.

Not to mention that Gemini CLI is a pain to use - after getting used to the smoothness of Claude Code.

Am I alone with this?


I cancelled my ChatGPT subscription because of Gemini 3, so obviously I'm having a different experience.

That said, I use Opus4.5 for coding through Cursor.

Gemini is for planning / rubber ducking / analysis / search.

I seriously find it a LOT better for these things.

ChatGPT has this issue where, when it doesn't know the explanation for something, it often won't hallucinate outright, but will create some long-winded, confusing word salad that sounds like it could be right but you can't quite tell.

Gemini mostly doesn't do that and just gives solid scientifically/technically grounded explanations with sources much of the time.

That said, it's a bit of a double-edged sword, since it also tends to make confident statements extrapolating from the sources in ways that aren't entirely supported but tend to be plausible.


> ChatGPT has this issue where, when it doesn't know the explanation for something, it often won't hallucinate outright, but will create some long-winded, confusing word salad that sounds like it could be right but you can't quite tell.

This is just hallucinating.


+1 canceled all OpenAI and switched to Gemini hours after it dropped. I was tired of vape AI, obfuscated facts in hallucinations and promises of future improvements.

And then there is pricing too…


Fully agree. ChatGPT is often very confident and tells me that X and Y are absolutely wrong in the code. It then answers with something worse... It also rarely says "sorry, I was wrong" when the previous output was just plain lies. You really need to verify every answer because it is so confident.

I fully switched to Gemini 3 Pro. Looking into an Opus 4.5 subscription too.

My GF, on the other hand, prefers ChatGPT for writing tasks quite a lot (she's a school teacher, classes 1-4).


I also cancelled ChatGPT Plus recently in favour of Gemini. The only thing I don't like about the Gemini consumer product is its insistence on giving YouTube links and thumbnails as sources. I've tried to use a rule to prevent it, without luck.

The only thing I've found that works is saying: End every message by saying "I have not included any YouTube links, as instructed".

But then of course you get that at the end of every message instead.

You could also use a uBlock rule I guess.


Hah, it's funny because I actually cancelled my Gemini subscription to switch full time to ChatGPT about 6 months ago, and now I've done the reverse - Gemini just feels better at the tasks that I'm doing day to day. I think we're just going to see that kind of back and forth for a while as these systems evolve.

I think it is proving to be the case that there isn't much stickiness in your chat provider. OpenAI thought memory might bring that, but honestly it can be annoying when random things from earlier chats pollute the current one.

I am subscribed to both at the moment but for my coding task I find Gemini 3 inferior to ChatGPT 5.2.

Not just for coding. I've been working on design docs, and find ChatGPT 5.2 finds more edge cases and suggests better ideas than Gemini 3. I sometimes feed the output of one into the other and go "ok, another AI says this, what do you think?" which gives interesting results.

Gemini often just throws in the towel and goes "yeah, the other one is right", whereas 5.2 will often go "I agree with about 80% of that, but the other 20% I don't, and here's why ..."

And I'm always impressed with the explanation for "here's why" as it picks apart flat out bad output from Gemini.

But, as with everything, this will very much be use-case dependent.


I've done the same, with both design/planning docs as well as code changes, and have the same experience as you. It's better than Opus 4.5 as well. GPT 5.2 is on another level for my use cases (primarily Python / Django).

I have exactly the same experience

Don't you guys have jobs? Why would you cancel your subscription? Gemini was better for the last three months and now ChatGPT has pulled ahead again. It's $20, you can just switch between the models as needed… also, what do you do when the Gemini API is slow or down, just stop working?

It takes less than a minute to resubscribe to any of these services. No need to burn 60 USD if I switch for a three-month spell and then switch back. When a provider goes down, I do what I set out to do without their service. If the outage lasts too long, I'll cancel so as not to support sloppy service.

I love Gemini. Why would I want my AI agent to be witty? That's the exact opposite of what I am looking for. I just want the correct answer with as little fluff and nonsense as possible.

The worst is ChatGPT voice mode. It tries so hard to be casual that it just makes it tedious to talk to.

My favourite part of chatgpt voice is I have something in my settings that says something along the lines of "be succinct. Get straight to the point," or whatever.

So every single time I (forget and) voice-prompt ChatGPT, it starts by saying "OK, I'll get straight to the point and answer your question without fluff" or something similar. I.e. it wastes my time even more than it would normally.


I agree on the voice mode... it's really unusable now.

I feel like it's been trained only on TikTok content and YouTube cooking or makeup podcasts, in the sense that it tries to be super casual and easy-going to the point where it's completely unable to give you actual information.


It is REALLY important to understand that "voice mode" is a 4o-family model and it doesn't have "thinking". It is WAY BEHIND on smarts.

built something to fix exactly this. skips the realtime chattiness entirely - you speak, it waits until you're done, responds via TTS with actual text-quality answers (no dumbing down). also has claude/gemini if you want different models.

still early but happy to share: tla[at]lexander[dot]com if interested (saw your email in bio)


You’re saying you made yourself an email that is similar to mine? That seems… odd.

built something to fix this. skips the realtime entirely - you speak, it waits, responds with text-quality answers via TTS. no forced casualness, no dumbing down. also has claude/gemini.

happy to share if anyone wants to try it


The "... just let me know if there is anything else you'd like to know." after every long-winded explanation is so infuriating.

Full time Antigravity user here, IMO best value coding assistant by far, not even including all the other AI Pro sub perks.

Still using Claude Pro / GitHub Copilot subs for general terminal/VS Code access to Claude. I consider them all top-tier models, but I prefer the full IDE UX of Antigravity over the VS Code CC sidebar or CC terminal.

Opus 4.5 is obviously great at all things code, though a lot of times I prefer Gemini 3 Pro (High)'s UIs. In the last month I've primarily used it on a Python / Vue project, which it excels at. I thought I would need to switch to Opus at some point if I wasn't happy with a particular implementation, but I haven't yet. The few times it didn't generate the right result were due to prompt misunderstanding, which I was able to fix by reprompting.

I'm still using Claude/GPT 5.2 for docs as IMO they have a more sophisticated command over the English language. But for pure coding assistance, I'm a happy Antigravity user.


So far I only used Antigravity for side projects and I am having so much fun. That said, I get much better results with Opus than with the Gemini models for moderately complex tasks.

Antigravity is really amazing, yeah, by far the best coding assistant IDE. It's even superior to Cursor, ngl, when it comes to very complex tasks; it's more methodical in its approach.

That said I still use Cursor for work and Antigravity sometimes for building toy projects, they are both good.


Speaking of methodical, have you tried AWS Kiro?

It has spec driven development, which in my testing yesterday resulted in a boat load of passing tests but zero useful code.

It first gathers requirements, which are all worded in strange language that somehow don’t capture specific outcomes OR important implementation details.

Then it builds a design file where it comes up with an overly complex architecture, based on the requirements.

Then it comes up with a lengthy set of tasks to accomplish it. It does let you opt out of optional testing, but don't worry, it will still write a ton of tests.

You click go on each set of tasks, and wait for it to request permissions for odd things like “chmod +x index.ts”.

8 hours and 200+ credits later, you have a monstrosity of Enterprise Grade Fizzbuzz.


Funny enough this sounds like my experience with ex-Amazon SWEs

Do you think the SDD approach is fundamentally wrong, or that Amazon's implementation was at fault?

It sounds like the initial spec is wrong, which compounds over time.

With SDD the spec should be really well thought out and considered, direct and clear.


Honestly, if you use Traycer for plan + review (I just have it open in a different IDE that they support), you can use any editor that has good models and does not throttle the context window.

I am trying to test a bunch of these IDEs this month, but I just can't suffer their planning and have to outsource it.


Looks like Codex + Antigravity (which gives Opus, too) for $40/mo is the busy hobbyist's sweet spot… today, anyway. It could change this afternoon.

For general researching/chatbot, I don't feel one of them is much better than the other. But since I'm already on Google One plan, upgrading the plan costs less than paying $20/mo to OpenAI, so I ended up cancelling ChatGPT Plus. Plus my Google One is shared with my family so they can also use advanced Gemini models.

Yes, same thing, also I find Gemini to be better at search and non-coding tasks - which was my only use case for GPT - coding was always Claude.

Don't use it on gemini.google.com; instead try it on aistudio.google.com.

Model may be the same but the agent on aistudio makes it much better when it comes to generating code.

Still, jules.google.com is far behind in terms of actual coding agents that you can run on the command line.

Google, as always, has over-engineered their stuff to make it confusing for end users.


I tried to sign up for Gemini this weekend but gave up after an hour. I got stuck comparing their offerings, looking for product pages, proper signup, etc. Their product offering and naming is just a mess. Cloud Console, AI Studio... I was completely lost at some point.

$20 Google AI Pro and Google Antigravity IDE, which gives you access to Claude Code, is a pretty decent offering for agent coding. On top of that, NotebookLM and Google Labs has some fun tools to play with.

I just went to gemini.google.com and to my surprise I already have access to it and haven't hit limits thus far, so they're generous.

I was paying for storage and it's included.

You likely have access too, depending on your account.


I don't understand... let's say I have it build some code for me, am I supposed to copy all those files out to my file system and then test it out? And then if I make changes to the source, I need to copy the source back into AI Studio (or Canvas in Gemini)?

If you want to go beyond a single one-off script, you want to use it directly in your repo using the CLI tools or one of the IDE integrations (Copilot, Cursor, Zed, ...).

I've been using the Claude.ai website for a project and that is pretty much what I do, though I have uploaded all of the source files to Claude so I only need to upload anything that I changed. I don't need to reupload the whole code base each time, of course. Claude provides a zip file of any changed files that I download and copy to my code file system.

When using my usual IDE (CLion), I just use their integration, https://codeassist.google/. It works fine / about as good as AI Studio.

I am pretty sure AI Studio is for pure vibe coding, so editing and changing code by hand is harder. For the case you are mentioning, you should use the Gemini CLI or Jules CLI. They are far behind Claude Code but they get the job done.

There is the Gemini CLI, but I am aware of people doing exactly what you're describing (which I find ridiculous, but if it works it works, I guess). Some people have CLI tools for turning their entire repo into one big Markdown file or similar to copy over.

That's me! I used to do it with repomix and turned the whole codebase into a giant xml file. Worked really great, and I have a script that just takes the aistudio output and writes all the generated files.

But, after using Claude Code with Opus 4.5, it's IMHO not worth it anymore. I mean it IS competitive, but the experience of Claude Code is so nice, and it even slightly edges out Gemini in coding. If the Gemini CLI were as nice as Claude Code, I'd never have subscribed to the Claude Max plan, though.
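For reference, the write-back half of that workflow can be a tiny script. This is a hypothetical sketch (the file-block delimiters are my assumption; the actual format depends on how you prompt the model to emit files, and the original commenter's script may differ):

```typescript
// Hypothetical write-back script for the repomix / AI Studio round trip.
// Expected block shape in the model output (assumed convention):
//   === FILE: src/app.ts ===
//   ...file contents...
//   === END FILE ===
import { mkdirSync, readFileSync, writeFileSync } from 'node:fs';
import { dirname } from 'node:path';

const output = readFileSync(process.argv[2] ?? 'aistudio-output.txt', 'utf8');
const blockRe = /^=== FILE: (.+?) ===\n([\s\S]*?)\n=== END FILE ===/gm;

for (const [, path, contents] of output.matchAll(blockRe)) {
  mkdirSync(dirname(path), { recursive: true }); // create parent dirs as needed
  writeFileSync(path, contents);
  console.log(`wrote ${path}`);
}
```

Repomix handles the other direction: turning the repo into one big XML/Markdown file to paste into AI Studio.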


I'm almost positive that using Gemini on AI Studio is the cause of a lot of strife.

Most users on it are using it free, and they almost certainly give free users bottom priority/worst compute allocation.


No, not alone, I find GPT far preferable when it comes to fleshing out ideas. It is much deeper conceptually, it understands intent and can cross pollinate disparate ideas well. Gemini is a little more autistic and gets bogged down in details. The API is useful for high volume extraction jobs, though — Gemini API reliability has improved a lot and has lower failure rate than OpenAI IME.

That may be your personal experience, but for me Gemini always answers my questions better than Claude Opus 4.5 and often better than GPT 5.2. I'm not talking about coding agents, but rather the web-based AI systems.

This has happened enough times now (I run every query on all 3) that I'm fairly confident that Gemini suits me better now. Whereas it used to be consistently dead last and just plain bad not so long ago. Hence the hype.


Weird. I find Opus knows the answer more often, plus its explanations are much clearer. Opus puts the main point at the top, while Gemini wanders around for a while before telling you what you need.

I dunno about Gemini CLI, but I have tried Google Antigravity with Gemini 3 Pro and found it extremely superior at debugging versus the other frontier models. If I threw it at a really, really hard problem, I always expected it to eventually give up, get stuck in loops, delete a bunch of code, fake the results, etc. like every other model and every other version of Gemini always did. Except it did not. It actually would eventually break out of loops and make genuine progress. (And I let it run for long periods of time. Like, hours, on some tricky debugging problems. It used gdb in batch mode to debug crashes, and did some really neat things to try to debug hangs.)

As for wit, well, not sure how to measure it. I've mainly been messing around with Gemini 3 Pro to see how it can work on Rust codebases, so far. I messed around with some quick'n'dirty web codebases, and I do still think Anthropic has the edge on that. I have no idea where GPT 5.2 excels.

If you could really compare Opus 4.5 and GPT 5.2 directly on your professional work, are you really sure it would work much better than Gemini 3 Pro? i.e. is your professional work comparable to your private usage? I ask this because I've really found LLMs to be extremely variable and spotty, in ways that I think we struggle to really quantify.


Is Gemini 3 Pro better in Antigravity than in gemini-cli?

For coding it is horrible. I used it exclusively for a day and switching back to Opus felt like heaven. Ok, it is not horrible, it is just significantly worse than competitors.

Although it sounds counter-intuitive, you may be better off with Gemini 3 Fast (esp. in Thinking mode) rather than Gemini 3 Pro. Fast beats Pro in some benchmarks. This is also the summary conclusion that Gemini itself offers.

Unfortunately, I don't know. I have never used Gemini CLI.

When I had a problem with video handoff between one Linux kernel and the next with a zfsbootmenu system, only Gemini was helpful. ChatGPT led me on a merry chase of random kernel flags that didn't have the right effect.

What worked was rebuilding the Ubuntu kernel with a disabled flag enabled, but it took too long to get that far.


I mean, I'm the exact opposite. Ask ChatGPT to write a simple (but novel) script for AutoHotKey, for example, and it can't do it. Gemini can do it perfectly on the first try.

ChatGPT has been atrocious for me over the past year, as in its actual performance has deteriorated. Gemini has improved with time. As for the comment about lacking wit, I mean, sure I guess, but I use AI to either help me write code to save me time or to give me information - I expect wit out of actual humans. That shit just annoys me with AI, and neither ChatGPT nor Gemini bots are good at not being obnoxious with metaphors and floral speech.


Sounds like you are using ChatGPT to spit out a script in the chat? If so, you should give 5.2 codex or Claude Code with Opus 4.5 a try... it's night and day.

> 5.2 codex or Claude Code with Opus 4.5 a try

Is using these same models but with GitHub Copilot or Replit equally capable as / comparable to using the respective first-party CLIs?


I don’t think so. My favorite tool is Codex with the 5.2-codex model. I use Github Copilot and Codex at work and Codex and Cursor at home. Codex is better for harder and bigger tasks. I’ll use Copilot or Cursor for small easy things. I think Codex is better than Claude Code as well.

Are you using the same models and thinking levels for each?

I too have found Codex better than Copilot, even for simple tasks. But I don't have the same models available since my work limits the models in copilot to the stupid ones.


I have GH Copilot from work and a personal Claude Code max subscription and have noticed a difference in quality if I feed the same input prompts/requirements/spec/rules.md to Claude Code cli and GH Copilot, both using Opus 4.5, where Claude Code CLI gives better results.

Maybe there's more going on at inference time with Claude Code cli?


It is likely because GH Copilot aggressively (over-)manages context and token spend. Probably to hit their desired margins on their plans. But it actively cripples the tool for more complex work IMO. I've had many times where context was obviously being aggressively compacted and also where it will straight truncate data it reads once it reaches some limit.

I do think it is not as bad as it was 4-6 months ago. Still not as good as CC for agentic workflows.


I find this really frustrating and confusing about all of the coding models. These models are all ostensibly similar in their underpinnings and their basic methods of operation, right?

So, why does it feel all so fragile and like a gacha game?


OpenAI actually has different models in the CLI (e.g. gpt-5.2-codex).

Naming things is hard. So hard that every AI company isn't even trying to come up with good names.

You're holding it wrong.

In this case they probably are prompting it "wrong", or at least less well than Codex/Copilot/Claude Code/etc. That's not a criticism of the user; it's an indication of the fact that people have put a lot of work into the special case of using these particular tools and making sure they are prompted well with context etc., whereas when you just type something into chat you would need to replicate that effort yourself in your own prompt.

I find them all comparable, but Gemini is cheaper

IMO in the long term this is the pattern that will emerge. Switching costs are almost non-existent.

This may sound backwards, but Gemini 3 Flash is quite good when given very specific tasks. It's very fast (much faster than Opus and GPT-5.2), follows instructions very well and spits out working code (in contrast to other fast models: earlier Flash versions, Haiku, etc.).

It does need a solid test suite to keep it in check. But you can move very fast if you have well-defined small tasks to give it. I have a PRD, then break down epics, stories and finally the tasks with Pro first. Works very well.


I've been using both GPT 5.2 and Gemini 3 Pro a lot. I was very impressed with 3 Pro when it came out, and thought I'd cancel my OAI Plus, but I've since found that for important tasks it's been beneficial to compare the results from both, or even bounce between them. They're different enough that it's like collaborating with a team.

I have been thinking about this a bit - so rather than relying on one, have an agentic setup that takes the question, runs it against the top 3, and then has another model judge the responses and give back the best one.

Is anyone doing this for high stake questions / research?
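A rough sketch of that fan-out-and-judge setup, using the Vercel AI SDK (the model ids and the judging prompt below are placeholders for illustration, not a tested recommendation):

```typescript
import { generateText } from 'ai';
import { anthropic } from '@ai-sdk/anthropic';
import { google } from '@ai-sdk/google';
import { openai } from '@ai-sdk/openai';

// Ask the same question to three models in parallel, then have a fourth
// call act as the judge. Model ids here are hypothetical.
async function askWithJudge(question: string) {
  const candidates = [
    { name: 'gpt', model: openai('gpt-5.2') },
    { name: 'claude', model: anthropic('claude-opus-4-5') },
    { name: 'gemini', model: google('gemini-3-pro') },
  ];

  // Fan out.
  const answers = await Promise.all(
    candidates.map(async ({ name, model }) => ({
      name,
      text: (await generateText({ model, prompt: question })).text,
    })),
  );

  // Judge step: compare the answers and pick or synthesize the best one.
  const { text: verdict } = await generateText({
    model: openai('gpt-5.2'),
    prompt: [
      `Question: ${question}`,
      ...answers.map((a) => `Answer from ${a.name}:\n${a.text}`),
      'Compare these answers, flag disagreements, and return the most reliable one with reasoning.',
    ].join('\n\n'),
  });

  return { answers, verdict };
}

// Example usage: const { verdict } = await askWithJudge('Is X safe in production?');
```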

The argument against is that the models are fairly 'similar', as outlined in one of the awarded papers from NeurIPS '25 - https://neurips.cc/virtual/2025/loc/san-diego/poster/121421


I often put the models in direct conversation with each other to work out a framework or solution. It works pretty well, but they do tend to glaze each other a bit.

Maybe try out some of the alternative CLI options? Like https://opencode.ai? I also like https://github.com/charmbracelet/crush and https://github.com/mistralai/mistral-vibe

Claude Code > Gemini CLI, fair enough

But I actually find Gemini Pro (not the free one) extremely capable, especially since you can throw any conversation into NotebookLM and deep thinking mode to go in depth.

Opus is great, especially for coding and writing, but for actual productivity outside of that (e.g. working with PDFs, images, screenshots, design stuff like marketing, t-shirts, ...) I prefer Gemini. It's also the fastest.

Nowhere do I feel like GPT 5.2 is as capable as these two, although admittedly I just stopped using it frequently around November.


5.2 wasn’t out in November and it is better than 5.1, especially codex.

I have the feeling that these discussions are much more tribal rather than evidence based. :)

Tribal? Not really. Subjective? Absolutely. Objectively 5.2 scores higher on benchmarks than 5.1; subjectively it works better for me than 5.1. I don't care too much about other opinions TBH :)

You're not alone. I do a small blog reviewing LLMs and have detailed comparisons that go beyond personal anecdotes. Gemini struggles in many use cases.

Everyone has to find what works for them and the switching cost and evaluation cost are very low.

I see a lot of comments generally with the same pattern “i cancelled my LEADER subscription and switched to COMPETITOR”… reminiscent of astroturf. However I scanned all the posters in this particular thread and the cancellers do seem like legit HN profiles.


The Gemini voice app on iOS is unimpressive. They force the answers to be so terse to save cost that it’s almost useless. It quickly goes in circles and needs context pruning. I haven’t tried a paid subscription for Gemini CLI or whatever their new shiny is but codex and Claude code have become so good in the last few months that I’m more focused on using them than exploring options.

I've started using Gem 3 while things are still in flux in the AI world. Pleasantly surprised by how good it is.

Most of my projects are on GPT at the moment, but nothing is so far gone that I can't move to others.

And considering just the general nonsense of Altman vs Musk, I might go to Gemini as a safe harbour (yes, I know how ridiculous that sounds).

So far, I've also noticed less ass-kissing by the Gemini robot ... a good thing.


Yeah, you are. You're limiting your view to personal use and just the text modality. If you're a builder or running a startup, the price-performance on Gemini 3 Pro and Flash is unmatched, especially when you factor in the quotas needed for scaled use cases. It’s also the only stack that handles text, live voice, and gen-media together. The Workspace/Gmail integration really doesn't represent the raw model's actual power.

Depending on Google's explicit product to build a startup is crazy. There is a risk of them changing APIs or offerings or features without the ability to actually complain; they are not a great B2B company.

I hope you just use the API and can switch easily to any other provider.


Opus > GPT 5.2 | Gemini 3 Pro to me. But they are pretty close lately; the gap is smaller now. I'm using it via CLI. For Gemini, their CLI is pretty bad IMO. I'm using it via Opencode and pretty happy with it so far. Unfortunately Gemini often throws me rate limit errors, and occasionally hangs. Their infra is not really reliable, ironically. But other than that, it's been great so far.

People get used to a model and then work best with that model.

If you hand an iPhone user an Android phone, they will complain that Android is awful and useless. The same is true vice versa.

This is in large part why we get so many conflicting reports of model behavior. As you become more and more familiar with a model, especially if it is in fact a good model, other good models will feel janky and broken.


In my experience, Gemini is great for "one-shot" work, and is my goto for "web" AI usage. Claude Code beats gemini-cli though. Gemini-cli isn't bad, but it's also not good.

I would love to try Antigravity out some more, but last I checked I don't think it is out of the playground stage yet, and it can't be used for anything remotely serious AFAIK.


Claude Opus is absurdly amazing. I now spend around $100-200 a day using it. Gemini and all the OpenAI models can't keep up right now.

Having said that, Google are killing it at image editing right now. Makes me wonder if that's because of some library of content, and once Anthropic acquires the same they'll blow us away there too.


API only user or Max x20 along with extra usage? If it's the latter, how are the limits treating you?

I went for Cursor on the $200 plan but I hit those limits in a few days. Claude Code came out after I got used to Cursor, but I've been intending to switch it up in the hope the cost is better.

I go API directly after I hit those limits. That's where it gets expensive.


> I now spend around $100-200 a day using it.

How's the RoI on that?


Probably awful unless they already make 300k+ TC.

I’m a solo founder, past life I’ve done the whole raise 8 figures, hire a hundred plus people…this is a way better life. Currently around 430k arr and growing.

> I now spend around $100-200 a day using it.

Really? Are you using many multiple agents a time? I'm on Microsoft's $40/mo plan and even using Opus 4.5 all day (one agent at a time), I'm not reaching the limit.


Yeah, maybe I'm crazy, I mean I don't know what to say. I do feel like the productivity I get now is akin to what I would have expected from a small team of 4-5 people 5 years ago... it's cheaper than hiring coworkers but certainly not cheap haha

Gemini really only shines when using it for planning in the VS Code fork Antigravity. It also supports Opus so it's easy to compare.

> Not to mention that Gemini CLI is a pain to use - after getting used to the smoothness of Claude Code.

Are you talking strictly about the respective command line tools as opposed to differences in the models they talk to?

If so, could you list the major pain points of Gemini CLI where Claude Code does better?


I haven't straight up cancelled my ChatGPT subscription, but I find that I use Gemini about 95% of the time these days. I never bother with any of Anthropic's stuff, but as far as OpenAI models vs Gemini, they strike me as more or less equivalent.

> Not to mention that Gemini CLI is a pain to use - after getting used to the smoothness of Claude Code.

Claude Code isn't actually tied to Claude, I've seen people use Claude Code with gpt-oss-120b or Qwen3-30b, why couldn't you use Gemini with Claude Code?


Nope. At least in a coding context, Claude and Codex are a combo that really shines, and Gemini is pretty useless. The only thing I actually use it for is to triple-check the specifications sometimes, and that's pretty much it.

You're not alone, I feel like sometimes I'm on crazy pills. I have benchmarks at work where the top models are plugged into agents, and Gemini 3 is behind Sonnet 4. This aligns closely with my personal usage as well, where Gemini fails to effectively call MCP tools.

But hey, it's cheapish, and competition is competition


I've only used AI pretty sparingly, and I just use it from their websites, but last time I tried all 3 only the code Google generated actually compiled.

No idea which version of their models I was using.


Same experience for me. Gemini generates by far the most usable code. Not "good code", obviously, but a decent enough foundation to build on. GPT in particular just spits out code for obsolete libraries, uses deprecated features, hallucinated methods etc. etc. It was a case for the trash bin every single time.

On the other hand, Gemini failed BADLY when I tried to give it a "summarize the data in this CSV" task. Every response was completely wrong. When pressed about which rows the answers were based on, 100% of the rows were made-up, not present in the source file (interestingly, after about 10 rounds of pointing this out, Gemini suddenly started using the actual data from the uploaded file). GPT's answers, on the other hand, matched manual verification with Excel.


Have you used it as a consumer would? Aka in Google search results or as a replacement for ChatGPT? Because in my hands it is better than ChatGPT.

Gemini 2.0 Flash is and was a godsend for many small tasks and OCR.

There needs to be a greater distinction between models used for human chat, programming agents, and software integration - where at least we benefitted from the Gemini Flash models.


I also get weirdly agitated by this. In my mind Gemini 3 is a case of clear benchmaxing and overall a massive flop.

I am currently testing different IDEs including Antigravity, and I avoid that model at all costs. I would rather pay to use a different model than use Gemini 3.

It sucks at coding compared to OpenAI and Anthropic models, and it is not clearly better as a chatbot (I like the context window). The images are the best part of it, as it is very steerable and fast.

But WTF? This was supposed to be the OpenAI killer model? Please.


I am the opposite. I find GPT 5.2 much worse. Sticking only with Gemini and Claude.

I've found that for any sort of reasonable task, the free models are garbage and the low-tier paid models aren't much better. I'm not talking about coding, just general "help me" usage. It makes me very wary of using these models for anything that I don't fully understand, because I continually get easily falsifiable hallucinations.

Today, I asked Gemini 3 to find me a power supply with some spec; AC/DC +/- 15V/3A. It did a good job of spec extraction from the PDF datasheets I provided, including looking up how the device performance would degrade using a linear vs switch-mode PSU. But then it comes back with two models from Traco that don't exist, including broken URLs to Mouser. It did suggest running two Meanwell power supplies in series (valid), but 2/3 suggestions were BS. This sort of failure is particularly frustrating because it should be easy and the outputs are also very easy to test against.

Perhaps this is where you need a second agent to verify and report back, so a human doesn't waste the time?


AI Studio with my custom prompting is much better than the Gemini app and Opus.

It's proof of what investors have been fearing: that LLMs are a dime a dozen, that there is no real moat and that the products are hence becoming commoditised. If you can replace one model with another without noticing a huge difference, there can only be a pricing race to the bottom for market share, and hence much lower potential profits than the AI bubble has priced in.

I'm with you - the most disappointing thing was when I asked Gemini (technically Nano Banana) for a PNG with a transparent background: it just approximated what a transparent PNG would look like in an image viewer, as an opaque background. ChatGPT has no problem with that. I also appreciate when it can use content like Disney characters. And as far as actual LLMs go, the text is just formatted more readably in GPT to me, with fairly useful application of emojis. I also had an experience asking for tax-reporting-type advice, same prompt to both. GPT gave the correct response; Gemini suggested cutting corners in a grey way and eventually agreed that GPT's response was safer and better to go with.

It just feels like OpenAI puts a lot of effort into creating an actually useful product while Gemini just targets benchmarks. Targeting benchmarks is meaningless to me, since every model - GPT, Gemini, Claude - constantly hallucinates in real workloads anyway.


No, you are not. I tried all the Gemini models. They are slop.

So the US will be excluded from the SWIFT banking system? Heavy international sanctions will be put in place? Europe will send weapons and money to help Venezuela defend itself?

No? Oh... just checking.


They extradited a guy for crimes also illegal in Venezuela. What would the point of sanctioning actions like this be?

This is about the cleanest extraterritorial action you can take. A guy probably did some seriously illegal stuff in your country and his, who was probably illegally elected, who probably had people killed.

Why not do this? Why not say to Venezuela, hand him over or we'll take him ourselves?

He's not going to gitmo, he'll have the same due process that every other American gets. Rights Maduro denied to millions. If you asked me to describe "justice" - I have to give this as a good example. He's going to die in prison like Noriega, after a fair trial.


By your reasoning, Putin invading the US and kidnapping President Trump for his crimes is equally valid

If the EU could we would.

It would be fun to station a few French nukes on Greenland as a response :)

But probably the wise choice is doing nothing publicly. Behind the scenes stop buying US weapon systems.


The key difference is: tHe Us BrInGs DeMoCrAcY

Hey, interesting project!

Though human-in-the-loop is usually used in scenarios where control is held by said human (e.g. verification or approval).

The difference I'm curious about is agents being the primary caller, and humans becoming an explicit dependency in an autonomous loop rather than a human-in-the-loop system.


Reminds me of a quote from a few years back: "We are entering an era where we use AI to write blog posts from a few keywords for people who use AI to summarize a blog post into a few keywords".

If one were to train an actual secret (e.g. a passphrase) into such a model, one that a user would need to guess by asking the right questions: could this secret be easily reverse engineered / inferred by having access to the model's weights, or would it be safe to assume that one could only get to the secret by asking the right questions?

I don’t know, but your question reminds me of this paper which seems to address it on a lower level: https://arxiv.org/abs/2204.06974

“Planting Undetectable Backdoors in Machine Learning Models”

“ … On the surface, such a backdoored classifier behaves normally, but in reality, the learner maintains a mechanism for changing the classification of any input, with only a slight perturbation. Importantly, without the appropriate "backdoor key", the mechanism is hidden and cannot be detected by any computationally-bounded observer. We demonstrate two frameworks for planting undetectable backdoors, with incomparable guarantees. …”


> this secret be easily reverse engineered / inferred by having access to the model's weights

It could with a network this small. More generally this falls under "interpretability."


