Hacker News: jiggawatts's comments

> Kudos to Lisa Su and the team.

They're a typical hardware maker unable to focus on software, which is why NVIDIA is now a multi-trillion dollar corporation and AMD is "just" a few hundred billion.

They've focused too much on CPUs and completely dropped the ball on AI and compute accelerators.

It's especially sad considering that the MI300 and related accelerators on paper are competitive with NVIDIA hardware, it's just that they have nowhere near the same software stack, so nobody cares.


Don’t really care.

We were stuck with Intel; it's nice that we have better CPUs.


Yeah, remember when 4-core/8-thread CPUs were the high end until AMD Ryzen came out? If AMD hadn't done its best work, I imagine we'd still be stuck with 4 cores as the norm for years to come.

How many synapses do you have right now in your brain?

You must be a stupid brain if you don’t even know that!

Similarly: you can’t use software to figure out the “process” used to manufacture the chip it is running on.


You can learn a lot from a model when you ask about its sizing, although not necessarily anything about the sizing itself.

For instance, you can learn how much introspection has been trained in during RL, and you can also learn (sometimes) if output from other models has been incorporated into the RL.

I think of the self-knowledge conversations with models as a nicety that's recent, and stand by my assessment that this model is not trained using modern frontier RL workflows.

> you can’t use software to figure out the “process” used to manufacture the chip it is running on.

This seems so incorrect that I don't even know where to start parsing it. All chips are designed and analyzed by software; all chip analysis, say of an unknown chip, starts with etching away layers and imaging them using software, then analyzing the layers, using software. But maybe another way to say that is "I don't understand your analogy."


> I don't even know where to start parsing it.

If it helps, the key part is: "that it is running on".

You can't use software to analyse images of disassembled chips that it is running on because disassembled chips can't run software!

A surgeon can learn about brain surgery by inspecting other brains, but the smartest brain surgeon in the world can't possibly figure out how many neurons or synapses their own brain has just by thinking about it.

Your meat substrate is inaccessible to your thoughts in the exact same manner that the number of weights, model architecture, runtime stack, CUDA driver version, etc, etc... are totally inaccessible to an LLM.

It can be told, after the fact, in the same manner that a surgeon might study how brains work in a series of lectures, but that is fundamentally distinct.

PS: Most ChatGPT models didn't know what they were called either, and tended to say the name and properties of their predecessor model, which was in their training set. OpenAI eventually got fed up with people thinking this was a fundamental flaw (it isn't), and baked this specific set of metadata into the system prompt and/or the post-training phase.


> For instance, you can learn how much introspection has been trained in during RL,

That's not introspection: that's a simulacrum of it. Introspection allows you to actually learn things about how your mind functions, if you do it right (which I can't do reliably, but I have done on occasion – and occasionally I discover something that's true for humans in general, which I can later find described in the academic literature), and that's something that language models are inherently incapable of. Though you probably could design a neural architecture that is capable of observing its own function, by altering its operation: perhaps a recurrent or spiking neural network might learn such a behaviour, under carefully-engineered circumstances, although all the training processes I know of would have the model ignore whatever signals it was getting from its own architecture.

> all chip analysis, say of an unknown chip, starts with etching away layers

Good luck running any software on that chip afterwards.


Introspection: point heard. As a practical matter, you can RL in or prompt-inject information about the model into context, and most major models do this, not least, I expect, because they'd like to be able to complain when that output is taken for RL by other model-training firms.

I agree that an intermediate non anthropomorphic but still looking at one’s own layers sort of situation isn’t in any architecture I’m aware of right now. I don’t imagine it would add much to a model.

Chip etching: yep. If you’ve never seen an unknown chip analyzed in anger, it’s pretty cool.


This feels like a bug in the SQL query optimizer rather than Dapper.

It ought to be smart enough to convert a constant parameter to the target column type in a predicate constraint and then check for the availability of a covering index.


There's a data type precedence that it uses to determine which value should be cast[0]. Nvarchar is higher precedence, therefore the varchar value is "lifted" to an nvarchar value first. This wouldn't be an issue if the types were reversed.

0: https://learn.microsoft.com/en-us/sql/t-sql/data-types/data-...


It's the optimizer caching the query plan as a parameterized query. It's not re-planning the index lookup on every execution.

The parameter type is part of the cache identity, nvarchar and varchar would have two cache entries with possibly different plans.

How do you safely convert a 2 byte character to a 1 byte character?

Easily! If it doesn't convert successfully because it includes characters outside of the range of the target codepage then the equality condition is necessarily false, and the engine should short-circuit and return an empty set.

Many similar incidents occurred in Ukraine, where Russia targeted apartment blocks that were built on the former site of some sort of military building that was demolished decades ago.

The ultimate hubris is launching a multi-million-dollar missile to kill civilians because you couldn't be bothered to check Google Street View (or whatever).


Russia has actively targeted hospitals, fire departments and schools for years, and you attribute it to "outdated info".

Shame on you.


It was quite obviously outdated info.

What people don’t seem to understand is the word “targeted”.

They see some obviously civilian target in ruins with screaming parents outside and they have an instant visceral emotional reaction: “What kind of monster would do something like this on purpose!?”

Practically nobody targets civilian buildings with expensive precision munitions! They’re expensive! There’s limited supply! Targets are chosen to maximise the military effect.

The problem is that the victims and journalists have “boots on the ground”. They’re right there and can clearly see the civilian nature of the target with their own eyes.

The person doing the targeting from some bunker thousands of miles away can see only blurry rectangles on an outdated map, has sparse intelligence reports, and targets coordinates. They’re not walking up to the missile like it’s some sort of intelligent war animal and whispering “kill civilians!” in its ear.

Similarly, they’re not on the ground standing outside the civilian target waving the missile in with light sticks like some airport tarmac staff.

I repeat: they’re thousands of miles away and have to target hundreds of buildings that all look the same-ish from space and aren’t magically labelled by God as “no longer valid under the Geneva conventions” or whatever.

I’m not saying that this makes war good or in any way ethical, but you can see how a mistake is made that doesn’t require cartoonish evil people to explain.


Terror bombing is a thing; you should look it up.

Oh sure, and the US did it against both Japan and Germany in WW2, but those were not even remotely the same scenario as precision strikes against the IRGC and Iranian leadership in general.

This was clearly a horrific mistake, especially obvious since the girls school used to be a military building.


I was talking about Russia, not US incompetence or malice, who have an explicit tactic to target civilians.

They target civilian infrastructure like power plants and the like, but again, that's "not the same" as purposefully targeting a school or an apartment block. The latter they do fairly clearly by accident, because I've seen at least four video clips of Ukrainians interviewed outside of a bombed civilian building saying something to the effect of "Oh yeah, back in 1990 there was a military training facility here but it was demolished in `91."

Note that 1991 was the year Ukraine and Russia split and Russia stopped getting a "direct feed" of things like urban planning information from Kiev.


> civilian infrastructure like power plants

The Russians have bombed multiple children’s hospitals.


Yes, well... the Russians seem especially unconcerned with checking targets for validity before mashing the fire button.

The logic they're presenting is largely the same as Israel's excuse for bombing hospitals in Gaza.

When there's a war in a civilian area, injured soldiers from the front line will be mostly treated at the nearest available hospital, which then overflows into regional hospitals further back, etc... A country under siege at the scales seen in Ukraine and Gaza doesn't get to pick and choose specific hospitals; they're all overflowing, so every available medical facility is used, including children's hospitals.

Worse, the convoys taking the wounded to these hospitals are more than likely military trucks and are driven by and/or escorted by military personnel in uniform.

On a blurry satellite picture or drone video the enemy will see a building frequently visited by the military.

"Legitimate target!"

Boom.

"Oops."


That's a lot of contortions to go through to avoid the clear Occam's Razor conclusion that these people are simply evil scumbags doing evil scumbag things. Bombing hospitals because they thought they contained wounded troops isn't a defense, that's a whole war crime of its own!

They have an extremely long track record of committing atrocities. You don't need to go out of your way to give them the benefit of the doubt, unless you're literally in Russia where they'll imprison you for telling the truth about what they're doing.


>Practically nobody targets civilian building with expensive precision munitions! They’re expensive! There’s limited supply! Targets are chosen to maximise the military effect.

We're not dealing with a rational or competent military chain of command. We're dealing with people who believe they're bringing about the Biblical Second Coming and that rules of engagement are "woke." These are literally cartoonishly evil people. They probably chose targets by asking Grok.


I'm going to confidently state that nobody in the US military chain of command gave the order to "mix some schools into the target list" for any reason, religious or not.

That's absurd on its face, and if you honestly believe that, then your mental model of how the world (and people in general) function is fundamentally broken.


>That's absurd on its face, and if you honestly believe that, then your mental model of how the world (and people in general) function is fundamentally broken.

I'm not talking about the world or people in general, I'm talking about the Commander in Chief Donald Trump and "Secretary of War" Pete Hegseth, the people who set the tone and make the decisions. And if you listen to either one of them, especially Hegseth, you'll realize it isn't absurd on its face at all.

Even if no one gave a specific order to "mix some schools into the target list" this administration clearly and explicitly - as in, has literally stated on the record - does not care about morality, ethics, rules of engagement or anything of the sort. It's not out of the question that they would intentionally target civilian infrastructure just as a show of force and aggression, or simply not care because their goal is and I'm quoting here "killing people and breaking things."


You forgot about churches and shopping malls.

Some multiplayer real-time strategy (RTS) games used deterministic fixed-point maths and incremental updates to keep the players in sync. Despite this, there would be the occasional random de-sync kicking someone out of a game, more than likely because of bit flips.
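The lockstep trick is worth sketching: simulation math stays in exact integers (fixed-point), so every client computes bit-identical state, and a per-tick checksum exposes any divergence. A minimal, illustrative Python version (function names and the Q16.16 format are my own choices, not any particular game's):

```python
FRAC_BITS = 16
ONE = 1 << FRAC_BITS  # Q16.16 fixed-point: 16 integer bits, 16 fractional bits


def to_fixed(x: float) -> int:
    # Quantize once at input time; all simulation math thereafter is integer-only.
    return int(round(x * ONE))


def fmul(a: int, b: int) -> int:
    # Fixed-point multiply: exact integer math, identical on every machine,
    # unlike float multiply which can vary with FPU modes and compiler flags.
    return (a * b) >> FRAC_BITS


def state_checksum(state: list[int]) -> int:
    # Clients exchange this small hash each tick instead of the whole state;
    # a mismatch means a desync (bug or, occasionally, a bit flip).
    h = 0
    for v in state:
        h = (h * 31 + (v & 0xFFFFFFFF)) & 0xFFFFFFFF
    return h
```

The point of the checksum is that it catches *any* single-bit divergence in the simulated state, which is exactly the failure mode that kicks a player out of the game.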

For RTS games I wish we could blame bit flips, but more typically it is uninitialized memory, incorrectly-not-reinitialized static variables, memory overwrites, use-after-free, non-deterministic functions (eg time), and pointer comparisons.

God I love C/C++. It’s like job security for engineers who fix bugs.


Some games are reliable enough. I found out the DRAM in my PC was going bad when Factorio started behaving weird. Did a memory test to confirm. Yep, bitflips.

> My belief based on personal experience is that in software engineering it wasn't until November/December 2025 that AI had enough impact to measurably accelerate delivery throughout the whole software development lifecycle.

Gemini 3 and Opus 4.6 were the "woah, they're actually useful now!" moment for me.

I keep saying to colleagues that it's like a rising tide. Initially the AIs were lapping around our ankles, now the level of capability is at waist height.

Many people have commented that 50% of developers think AI-generated code is "Great!" and 50% think it's trash. That's a sign that AI code quality is that of the median developer. This will likely improve to 60%-40%, then 70%-30%, etc...


I don’t see definitive evidence that there is some kind of Moore’s law for model improvement though. Just because this year’s model performs better than last year’s model doesn’t mean next year’s model will be another leap. Most of the big improvements this year seem to be around tooling - I still see Opus 4.6 (which is my daily driver at work) making lots of mistakes.

Things like the METR benchmark aren't sufficient?

I mean, Moore's law is just a rule of thumb, but the curve fits METR just as well.


Was that the benchmark that showed developers think they're 20% faster with AI, but are 20% slower?

Reminds me of the story of someone's woman working for a research lab to improve the computer-controlled automatic emergency landings of planes with total power failure.

... or so she was told.

She was unknowingly designing glide-bomb avionics.


“someone’s woman”?

lol I am guessing that was an autocorrect error.

I once saw the word nickel autocorrected incorrectly into something far worse. It was funny given the context (metals, not coins) but I wondered why someone would even have that word in their autocorrect dictionary.

My worst autocorrect story is a message to my mother in law referring to my sister in law. I told my mother in law that I'd give my wife's sister "a*al" when I got there. It was supposed to be "a call". I'm still traumatized decades later.

What's in the autocorrect dictionary usually has nothing to do with what you typically write. No reason to wonder (i.e., if the insinuation is that it's a word they'd typically use).

We could joke about the auto correct knowing your subconscious mind.

Except if Facebook has autocorrect, you can be sure it's driven by a personal dossier on each of us, correlated by AI with every other person on the planet.

They know you were thinking that word!

The neverending benefits of personalization.


I feel like these stories are apocryphal. I mean, I can't say for certain that no US DoD research program used subterfuge to trick the performers into working on The Most Racist Bomb. But I can say that in 20 years I've never seen a dearth of people ready, willing, able, and actively participating with full knowledge that they are creating The Fastest Bomb and The Sneakiest Bomb and The Biggest Bomb Without Actually Going Nuclear.

IDK, maybe it's different outside the National Capital Region. But here, you could probably shout "For The Empire" as a toast in the right bars and people wouldn't think you were joking.


> I feel like these stories are apocryphal.

They're not. But if it makes you feel better to believe that, everyone has their own coping mechanism.


What? I'm not questioning whether the weapons research actually happened. I'm questioning the sincerity of people claiming they didn't know what they were doing. I've seen plenty of weapons programs. They aren't a secret to the people working on them. My point is, the government doesn't need to lie to researchers or even pay them very well to get them to develop weapons because there are plenty of intelligent-enough people willing to do it almost for free.

I've worked as a contractor for a safety system that turned out to be for a foreign military. I was given a signal, and told to write software to fit it. The signal could plausibly be collected for a wide variety of civilian purposes.

What I realized later was that none of the civilian markets could possibly justify the cost of the project.

The particular type of signal fitting I was doing was only achievable by a few thousand expensive domain experts in the world, so, I think that addresses your other point.


Lots of people working on the Manhattan project did not know what they were working on. The core group of physicists did, but not many others.

I think you could get away with that excuse in 1945 when this whole system was first being created from scratch. It's been 80 years since then.

They knew the US was at war and they knew it was a government program for military purposes and they knew they were dealing with nuclear materials.

A journalist not involved at all figured it out just fine; at the very least, it was obviously going to be a weapon.

Frankly though I wonder what the various judgemental people in these comments think about say, the tens of thousands of people who at the time were just straight up making artillery ammo.


Because working on things that go boom is like working on fireworks. The fact that they end up on people is incidental.

If "This doesn't fit into my mental model, so everyone else must be lying" is how you deal with things you didn't personally experience, do what you have to.

The inability to accurately cite any story about this, and the "friend of a friend" structure is what implies it's garbage.

Not to mention it itself requires a conspiracy theory: "no one would do this work voluntarily" (or "all the smart people have to be tricked because they're so smart they obviously agree with me").

As though people don't just go and work at Boeing or Lockheed Martin.


It was posted on HN by the husband of the person involved. Find it yourself.

> "no one would do this work voluntarily"

The much more common reason is compartmentalisation. Employees are told as much as they need to know, no more.

If someone can design a glide bomb without knowing that it has an explosive payload, then they're not told.

The fear is not so much the employees themselves (they might be quite patriotic!) but that the information will leak out to the enemy, giving them a chance to counter the weapon or copy it.


That's a very different proposition to what the various parent posters are implying though. Like if you work for a defense contractor, you know what your work is for even if you wouldn't know exactly what the product or application was.

> Biden pardoned his son.

Yes, that's bad.

It's not even remotely the same.

Biden pardoned his son to protect him from being hounded for the rest of his life by rabid Republicans who still can't shut up about Hillary's emails (despite Trump doing 10x worse with the top secret files in his toilet), Benghazi (with 4 deaths, far fewer than the current Iran boondoggle), etc...

Trump has weaponised the pardon power, which was previously used by other presidents to pardon people who didn't deserve their punishment. Non-violent drug crimes like possession of a bit of weed, life imprisonment over a technicality, that kind of thing.

Trump instead enables rampant corruption coupled with blind obedience with the promise of a pardon as the get-out-of-jail-free card.

He's also made pardons pay-for-play, letting out crypto scammers, drug lords, and anyone else willing to pay him a few million each.

It's obscene. It's corrosive. It's destroying your democracy, so very very visibly that the rest of the world is staring with slack-jawed horror.

Seriously.

Over here on the other side of the little pond we call the Pacific, we're worried about you yanks.


Pacific, or Atlantic… I can’t tell if you’re a Brit or an Australian.

Does it matter?

Our British mates are equally appalled.


The president has full discretionary pardon powers with no check or balance. It was inevitable that the power would be abused. You have an old system bursting at the seams of the maximalist modern world. Trump took advantage of your inadequacies.

Nothing felt inevitable about having such an arch traitor to the US in power. With all of congress turning their backs on America & our constitution, and a Supreme Court rabidly helping him.

There were lots of checks built in. But all three branches of government are bolstering this worship of vice, death, chaos, disease, and suffering.


> Nothing felt inevitable about having such an arch traitor to the US in power.

I vehemently disagree. The US has been headed towards authoritarianism for close to a century.


That doesn't seem all that different to a MoE architecture.

It's the opposite of a MoE architecture in many ways. MoE splits every individual feed-forward layer into many tiny subnetworks, only a small number of which contribute to the layer output, and they get trained together to complement each other.

Ensembling makes multiple copies of the entire model, trains them independently on the same task, and then has every copy contribute to the output.

Reducing computation vs. increasing it; operating at per-layer granularity vs. whole model; specialization vs. redundancy.
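A toy numpy sketch of the contrast (illustrative only; real MoE layers use learned gating networks and vastly larger shapes). MoE routes each input through only its top-k expert sub-networks, while an ensemble runs every model and averages:

```python
import numpy as np

rng = np.random.default_rng(0)


def moe_layer(x, experts, gate_w, k=2):
    """Mixture-of-Experts: route the input to its top-k experts only.

    experts: list of (W, b) feed-forward sub-networks; gate_w scores them.
    Only k of the experts do any compute, which *reduces* work per token.
    """
    scores = x @ gate_w                    # one routing score per expert
    top = np.argsort(scores)[-k:]          # indices of the k best experts
    weights = np.exp(scores[top])
    weights /= weights.sum()               # softmax over the selected experts
    out = np.zeros(experts[0][1].shape)
    for w, e_idx in zip(weights, top):
        W, b = experts[e_idx]
        out += w * (x @ W + b)
    return out


def ensemble(x, models):
    """Ensembling: every independently-trained model runs; outputs averaged.

    All models compute, which *increases* work in exchange for redundancy.
    """
    return np.mean([x @ W + b for W, b in models], axis=0)
```

Same list of `(W, b)` pairs, opposite philosophies: the MoE touches 2 of them, the ensemble touches all of them.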


Yes.

Even the latest NVIDIA Blackwell GPUs are general purpose, albeit with negligible "graphics" capabilities. They can run fairly arbitrary C/C++ code with only some limitations, and the area of the chip dedicated to matrix products (the "tensor units") is relatively small: less than 20% of the area!

Conversely, the Google TPUs dedicate a large area of each chip to pure tensor ops, hence the name.

This is partly why Google's Gemini is 4x cheaper than OpenAI's GPT-5 models to serve.

Jensen Huang has said in recent interviews that he stands by the decision to keep the NVIDIA GPUs more general purpose, because this makes them flexible and able to be adapted to future AI designs, not just the current architectures.

That may or may not pan out.

I strongly suspect that the winning chip architecture will have about 80% of its area dedicated to tensor units, very little onboard cache, and model weights streamed in from High Bandwidth Flash (HBF). This would be dramatically lower power and cost compared to the current hardware that's typically used.

Something to consider is that as the size of matrices scales up in a model, the compute needed to perform matrix multiplications goes up as the cube of their size, but the other miscellaneous operations such as softmax, relu, etc.. scale up linearly with the size of the vectors being multiplied.

Hence, as models scale into the trillions of parameters, the matrix multiplications ("tensor" ops) dominate everything else.
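The scaling argument can be made concrete with a back-of-the-envelope FLOP count (a rough sketch; `ops_per_element` is an arbitrary illustrative constant for the softmax/ReLU/bias work per activation):

```python
def matmul_flops(n: int) -> int:
    # Multiplying two n x n matrices: n^2 output cells, each an n-term
    # dot product (n multiplies + n adds) -> 2 * n^3 FLOPs.
    return 2 * n ** 3


def elementwise_flops(n: int, ops_per_element: int = 4) -> int:
    # Softmax, ReLU, bias-add, etc. touch each of the n^2 activations a
    # constant number of times -> only O(n^2).
    return ops_per_element * n ** 2


def tensor_fraction(n: int) -> float:
    # Fraction of total work that is pure matrix multiplication.
    m, e = matmul_flops(n), elementwise_flops(n)
    return m / (m + e)
```

As `n` grows the tensor fraction approaches 1: already at n=8192 more than 99.9% of the FLOPs are matrix multiplies, which is why dedicating most of the die to tensor units looks attractive.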


I'm not following the whole LLM space, but

> the compute needed to perform matrix multiplications goes up as the cube of their size,

are they really not using even Strassen multiplication?


I'm not aware of any major BLAS library that uses Strassen's algorithm. There are a few reasons for this; one of the big ones is that Strassen has much worse numerical behavior than traditional matrix multiplication. Another big one is that for very large dense matrices, which use various flavors of parallel algorithms, Strassen vastly increases the communication overhead. Not to mention that the largest matrices are probably using sparse matrix arithmetic anyway, which is a whole different set of algorithms.

AFAIK the best practical matrix multiplication algorithms scale as roughly N^2.7 which is close enough to N^3 to not matter for the point that I'm trying to make.
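For reference, here's a minimal Strassen sketch for square power-of-two matrices (illustrative, not a production BLAS kernel; note the 18 extra matrix additions per level, which drive up the memory traffic and numerical error mentioned above):

```python
import numpy as np


def strassen(A, B, leaf=64):
    """Strassen multiplication: 7 recursive multiplies instead of 8,
    giving O(n^log2(7)) ~ O(n^2.807) instead of O(n^3)."""
    n = A.shape[0]
    if n <= leaf:  # below the crossover point, classical multiply wins
        return A @ B
    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    # The seven products (each recurses on half-size matrices).
    M1 = strassen(A11 + A22, B11 + B22, leaf)
    M2 = strassen(A21 + A22, B11, leaf)
    M3 = strassen(A11, B12 - B22, leaf)
    M4 = strassen(A22, B21 - B11, leaf)
    M5 = strassen(A11 + A12, B22, leaf)
    M6 = strassen(A21 - A11, B11 + B12, leaf)
    M7 = strassen(A12 - A22, B21 + B22, leaf)
    # Recombine into the four quadrants of the result.
    C = np.empty_like(A)
    C[:h, :h] = M1 + M4 - M5 + M7
    C[:h, h:] = M3 + M5
    C[h:, :h] = M2 + M4
    C[h:, h:] = M1 - M2 + M3 + M6
    return C
```

The many temporaries (`A11 + A22`, the seven `M` matrices, etc.) are exactly the extra communication and memory pressure that makes it unattractive for distributed dense kernels.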

The 100-class Nvidia chips are targeted at training. With Nvidia buying Groq, it will move further in that direction.
