I use them as follows: o1-pro: anything important involving accuracy or reasonin...

rushingcreek · on April 14, 2025

Phind was fine-tuned specifically to produce inline Mermaid diagrams for technical questions (I'm the founder).

underlines · on April 15, 2025

I really loved Phind and always think of it as the OG Perplexity/RAG search engine.

Sadly, I stopped my subscription when you removed the ability to weight my own domains...

Otherwise, the fine-tuned output format for technical questions is great, with the options, the pros/cons, and the Mermaid diagrams. It's just way better for technical searches than what the generic services can provide.

bsenftner · on April 15, 2025

Have you been interviewed anywhere? Curious to read your story.

shortcord · on April 14, 2025

Gemini 2.5 Pro is quite good at code.

It has become my go-to in Cursor. Claude 3.7 needs to be restrained too much.

artdigital · on April 15, 2025

Same here, 2.5 Pro is very good at coding. But it’s also cocky and blames everything but itself when something isn't working. E.g. “the linter must be wrong you should reinstall it”, “looks to be a problem with the Go compiler”, “this function HAS to exist, that’s weird that we’re getting an error”.

And it often just stops like “ok this is still not working. You fix it and tell me when it’s done so I can continue”.

But for coding: Gemini 2.5 Pro > Sonnet 3.5 > Sonnet 3.7.

valenterry · on April 15, 2025

Weird. For me, Sonnet 3.7 is much more focused, and in particular it works much better at finding the places that need changing and at using other tooling. I guess the integration in Cursor is just much better and more mature.

behnamoh · on April 15, 2025

This. Sonnet 3.7 is a wild horse. Gemini 2.5 Pro is like a 33-year-old expert. o1 feels like a mature, senior colleague.

benhurmarcel · on April 15, 2025

I find that Gemini 2.5 Pro tends to produce working but over-complicated code more often than Claude 3.7.

torginus · on April 15, 2025

Which might be a side-effect of the reasoning.

In my experience, whenever these models solve a math or logic puzzle with reasoning, they generate extremely long and convoluted chains of thought, which show up in the solution.

In contrast, a human would come up with a solution in 2-3 steps. Perhaps something similar is going on here with the generated code.

motoboi · on April 14, 2025

You probably know this, but it can already generate accurate diagrams. Just ask for the output in a diagram language like Mermaid or Graphviz.
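To illustrate what "output in a diagram language" means: the model only has to emit plain text, and a renderer such as Graphviz handles the layout. The sketch below is a hypothetical helper (`to_dot` is an illustrative name, not from any library) that turns an edge list into a Graphviz DOT digraph, the same kind of text you would ask the model to produce.

```python
def to_dot(edges: list[tuple[str, str]], name: str = "G") -> str:
    """Render (source, target) edges as a Graphviz DOT digraph (hypothetical helper)."""

    def quote(node: str) -> str:
        # Quote node IDs and escape embedded quotes so names with spaces stay valid DOT.
        return '"' + node.replace('"', '\\"') + '"'

    lines = [f"digraph {name} {{"]
    for src, dst in edges:
        lines.append(f"    {quote(src)} -> {quote(dst)};")
    lines.append("}")
    return "\n".join(lines)


print(to_dot([("client", "api gateway"), ("api gateway", "db")]))
```

Saving the output to a file and running `dot -Tsvg file.dot -o file.svg` renders it; because Graphviz computes node positions itself, this sidesteps some of the layout problems of asking a model for absolute coordinates.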

bangaladore · on April 14, 2025

In my experience, it often produces terrible diagrams: things clearly overlap and lines make no sense. I'm not surprised; if you told me to lay out a diagram in XML/YAML, there would be obvious mistakes and layout issues.

I'm not really certain a text output model can ever do well here.

resters · on April 14, 2025

FWIW I think a multimodal model could be trained to do extremely well with it given sufficient training data. A combination of textual description of the system and/or diagram, source code (mermaid, SVG, etc.) for the diagram, and the resulting image, with training to translate between all three.
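One way to picture "training to translate between all three" is to expand each (description, source, image) triple into every directed translation task. This is a minimal sketch under that assumption; `DiagramExample` and `translation_pairs` are hypothetical names, not part of any real dataset or training library.

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class DiagramExample:
    description: str  # textual description of the system and/or diagram
    source: str       # diagram source code (Mermaid, SVG, etc.)
    image: bytes      # rendered image of the diagram


def translation_pairs(ex: DiagramExample) -> list[tuple[str, str, object, object]]:
    """Expand one example into all six directed (input -> target) translation tasks."""
    modalities = {"description": ex.description, "source": ex.source, "image": ex.image}
    # permutations of the three modality names yields the 3 * 2 = 6 ordered pairs.
    return [
        (src, dst, modalities[src], modalities[dst])
        for src, dst in permutations(modalities, 2)
    ]


ex = DiagramExample("A calls B", "graph LR; A-->B", b"<png bytes>")
for src, dst, _, _ in translation_pairs(ex):
    print(f"{src} -> {dst}")
```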

bangaladore · on April 14, 2025

Agreed. Even in a simple form, I'm sure a service like this already exists (or could easily exist) where the workflow is something like:

1. User provides information

2. LLM generates structured output for whatever modeling language

3. The same or another multimodal LLM reviews the generated graph for styling/positioning issues and ensures it matches the user's request.

4. LLM generates structured output based on the feedback.

5. etc...

But you could probably fine-tune a multimodal model to do it in one shot, or way more effectively.
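The generate/review/regenerate loop in the steps above can be sketched as follows. The model calls are passed in as plain callables because no specific LLM API is implied; `refine_diagram`, `generate`, and `review` are hypothetical names, and rendering the diagram to an image for the multimodal reviewer is assumed to happen inside `review`.

```python
from typing import Callable, Optional


def refine_diagram(
    request: str,
    generate: Callable[[str, Optional[str]], str],
    review: Callable[[str, str], Optional[str]],
    max_rounds: int = 5,
) -> str:
    """Iteratively generate diagram source until the reviewer accepts it.

    generate(request, feedback) -> diagram source (e.g. Mermaid text); feedback is None on the first call
    review(request, source)     -> feedback string, or None if the diagram is acceptable
    Returns the accepted source, or the latest candidate once max_rounds is exhausted.
    """
    source = generate(request, None)          # step 2: initial structured output
    for _ in range(max_rounds):
        feedback = review(request, source)    # step 3: check styling/positioning against the request
        if feedback is None:
            break                             # reviewer is satisfied
        source = generate(request, feedback)  # step 4: regenerate using the feedback
    return source
```

Capping the rounds keeps a reviewer that never approves from looping forever; the trade-off is that the final candidate may not have been reviewed.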

behnamoh · on April 15, 2025

I had a LaTeX TikZ diagram problem that Sonnet 3.7 couldn't handle even after 10 attempts. Gemini 2.5 Pro solved it on the second try.

gunalx · on April 15, 2025

Had the same experience. o3-mini failed miserably, Claude 3.7 as well, but Gemini 2.5 Pro solved it perfectly. (The task: turn an image of a diagram, with no source, into a TikZ diagram.)

resters · on April 14, 2025

I've had mixed and inconsistent results and it hasn't been able to iterate effectively when it gets close. Could be that I need to refine my approach to prompting. I've tried mermaid and SVG mostly, but will also try graphviz based on your suggestion.

antman · on April 15, 2025

PlantUML (action) diagrams are my go-to.

wavewrangler · on April 15, 2025

You probably know this and are looking for consistency, but a little trick I use is to feed it the original data of what I need as a diagram and ask it to re-imagine it as an image “ready for print”. It's not native, but it's still a time saver, and even starting from unstructured data it handles this surprisingly well. Again, not native… naive, yes. Native, not yet. Be sure to double-check and triple-check, as always; give it the ol’ OCD treatment.

barrkel · on April 15, 2025

Gemini 2.5 is very good. Since you have to wait for reasoning tokens, it takes longer to come back, but the responses are high quality IME.

czk · on April 15, 2025

Re: "grok-3 is r1 with mods": do you mean you believe they distilled DeepSeek R1? That was my assumption as well; I first thought it jokingly, but it would make a lot of sense. I actually enjoy Grok 3 quite a lot; it has some of the most entertaining thinking traces.