Nvidia R&D chief on how AI is improving chip design

maxwells-daemon · on April 20, 2022

I work on this team! (Specifically: applied deep learning research, chip design).

It's a shame to see so many people dismissing this work as marketing. I see lots of clever people working hard on really novel and interesting stuff, and I really do think that ML has real potential to customize a design much more "deeply" than traditional automation tools.

marginalia_nu · on April 20, 2022

This is directed at AI marketing in general: "AI" has been used to market so much nonsense it's probably becoming a problem communicating actual interesting uses of AI. I very much get a dot com vibe off it, like nobody on the team knows how it works but we're sure we're gonna be rich somehow! In my head, I've begun substituting AI with "wizards" when I read it.

It's very much the sort of problems crypto is having. So many grifters actual interesting uses of the technology are very hard to identify and take seriously.

maxwells-daemon · on April 20, 2022

I guess so, but the fact of the matter is that ML/AI is actually, right now, doing useful things that would have been impossible 10 years ago. I don't think I could say the same about crypto (as distinct from cryptography).

q-big · on April 20, 2022

> "AI" has been used to market so much nonsense it's probably becoming a problem communicating actual interesting uses of AI.

On the other hand: if the people who do serious work in this area don't call out this nonsense, they must accept that their (serious) work becomes devalued.

> It's very much the sort of problems crypto is having. So many grifters actual interesting uses of the technology are very hard to identify and take seriously.

Here, the same holds.

maximus-decimus · on April 20, 2022

It's a lost cause, you have to pick a new word.

"Diet" was used for women's food product to the point that men didn't want to buy anything with "diet" on it, so CocaCola created coke zero just for men instead of trying to make them drink "diet" coke. They knew it was a lost battle.

PaulHoule · on April 20, 2022

I think it's funny how "the old AI" had combinatorical optimization as a major theme, for instance

https://en.wikipedia.org/wiki/Travelling_salesman_problem

which is closely related to the central operation of logic, the canonical NP problem

https://en.wikipedia.org/wiki/Boolean_satisfiability_problem

as well as the playing of games like Chess, Poker, etc.

Modern neural networks also have optimization as a theme even when the output is a classification or something that doesn't look like optimization... That is, the network itself is trained to minimize an error function. People used these kind of algorithms back in the 1980s to layout chips

https://en.wikipedia.org/wiki/A*_search_algorithm

and it's only natural that new techniques of optimization (both direct and through heuristics like the neural network used in AlphaGo) are used today for chips.

maxwells-daemon · on April 20, 2022

Yes, the way I see it, one of the major benefits of deep learning is that it lets you define functions (in the R^n -> R^m sense) that would be basically impossible to define with traditional programming techniques. I think this comes up a lot in subroutines of combinatorial optimization, like heuristics for guiding search on subsets of NP-complete problems. The fact that you can automatically evaluate the heuristic and train by RL is also very convenient.

xbmcuser · on April 20, 2022

It is the same with a lot of the machine learning stuff posted here the 2nd or 3rd comment is that how it could be achieved with normal algos etc. But slowly as more people start applying to different problems machine learning is solving many of them.

lvl102 · on April 20, 2022

To be fair, Nvidia does a lot of “selling” when they’re basically making money from crypto and CUDA monopoly.

hoosieree · on April 20, 2022

When I briefly used Cadence's stuff I always thought about how fixing DRC errors could be crowdsourced as an "idle game" because it's so puzzle-like. The other thing was how it's even slower than Vivado...

Using RL to automate DRC fixes, and modeling standard cells as graph/flow problems are things I'd love to learn more about. What papers would you recommend reading to get started (for a grad student already familiar with machine learning basics)?

kraussvonespy · on April 20, 2022

A quick tangent if you have time and can discuss it: some really interesting, effective and odd antenna designs came from AI:

https://ti.arc.nasa.gov/m/pub-archive/1244h/1244%20(Hornby)....

Have there been any odd, surprising or wildly efficient chip designs that have come out of the AI designs?

hardolaf · on April 20, 2022

I saw the in-depth presentation at DAC. Until your company is willing to actually release your work, it's marketing.

CreateAccntAgn · on April 20, 2022

Any word on how the accuracy/quality of final results compare to traditional flows? Are process variations handled differently (with regards to training or modelling) compared to IR? I assume traditional vendors (CDNS/SNPS/MENT) all have (or working on) AI driven tools as well. How do they compare?

maxwells-daemon · on April 20, 2022

In general, a function approximation solution like deep learning does worse on cases where exhaustively finding the exact optimum is possible (small combinatorial problems), but can be applied to much larger instances than the exact algorithms can.

selimthegrim · on April 20, 2022

Ha, this does sound awesome. Are you guys hiring?

rektide · on April 20, 2022

Did nvidia just promise us singularity? :)

Hard to read a talk like this from a pulpit & not see shout outs to the incredibly super-fantastic open-source innovative projects like OpenROAD which have been shipping amazingly well-routed-by-AI chips for a while now. There's papers you can cite, galore, many open source designs[1].

It's not like Nvidia is promising anyone else will benefit from this work. This seems to be very high level coverage their R&D department is looking at, perhaps/perhaps not using. The article makes it hard to find out what is available, what has been published or otherwise deeply discussed (which is I think the best we can hope from Nvidia not real participation). There's only one paper linked, on NVCell[2], described as:

> The first is a system we have called NVCell, which uses a combination of simulated annealing and reinforcement learning to basically design our standard cell library.

This just feels like so much else going on in computing. WSL coming to windows, the recent Unity vs Unreal topic[3]. It's hard to imagine refusing to participate with others. It's hard to imagine not being part of the open source community working shoulder to shoulder to push for better. NVidia patently doesn't get it, patently isn't participating, patently isn't there. It's cool we can hear what they are up to, but it's also extremely NVidia that they're doing it all on their own. Anyhow, Looking forward to more AI based chip power system design starting to emerge; that sounds like a good idea NV.

[1] https://theopenroadproject.org/

[2] https://research.nvidia.com/publication/2021-12_nvcell-stand...

[3] https://news.ycombinator.com/item?id=31064552 (412 points, 3 days ago, 311 comments)

nomel · on April 20, 2022

> but it's also extremely NVidia that they're doing it all on their own.

Having a lead in chip design is their literal bread and butter. I think it's extremely "publicly traded company" more than "NVidia". Do you have an example of a company releasing an open source version of their secret sauce (foundation of their profits)?

bawolff · on April 20, 2022

Netscape.

Worked out great for them /s (albeit writing was already on the wall for them by that point.)

rektide · on April 20, 2022

> Do you have an example of a company releasing an open source version of their secret sauce?

The chip design itself should be the secret sauce. Not the tools you make the chip with. Nvidia is resolutely not-contributing. Many other companies are starting to get onboard with open chip design. This doesn't mean the chips have to be open, but the tooling needs to be something shared & co-developable. If this is a little pet research project that's one thing, but there really needs to be ongoing workforce development, a strong advance. The NSF's TILOS, a strong alliance/nexus of researchers within & around the OpenROAD community, get this[1]:

> TILOS – The Institute for Learning-enabled Optimization at Scale – is an NSF National AI Research Institute for advances in optimization, partially supported by Intel Corporation. The institute began operations in November 2021 with a mission to "make impossible optimizations possible, at scale and in practice".

> There are six universities in TILOS: UCSD, MIT, National University, Penn, UT-Austin, and Yale. The institute seeks a new nexus of AI and machine learning, optimization, and use in practice. Figure 4 shows four virtuous cycles envisioned for the institute: 1. mutual advances of AI and optimization provide the foundations; 2. challenges of scale, along with breakthroughs from scaling, bind together foundations and the use domains of chip design, networks and robotics; 3. the cycle of translation and impact brings research and the leading edge of practice closer together; and 4. the cycle of research, education, and broadening participation grows the field and its workforce.

The virtues written here are self evident & obvious. Trying to just get good yourself without trying to help advance the field, not participating, not taking advantages of scale of many working together, not participating in open research, the risks of having isolated teams, and not participating in cycles of development: whatever the nvidia or "publicly traded company" worlds think they're doing, they're missing out, and hurting everyone and especially themselves for this oldschool zero-sum competitive thinking.

There are plenty of company's releasing the chips too. Google's OpenTitan[2] security chip. WD's Swerv RISC-V core for their driver controller ARM R-series replacement[3]. Open standards if not chips like UCI for chiplets or CXL for interconnect are again examples of literally everyone but NVidia playing well together, trying for better, standardizing a future for participation & healthy competition & growth. Nvidia again and again is the company which simply will not play with others.

I challenge you to answer your own question in reverse: are any companies other than Nvidia embarking up AI/ML chipmaking in a closed fashion? There probably are, let's follow & watch them.

[1] https://theopenroadproject.org/news/leveling-up-a-trajectory...

[2] https://opentitan.org/

[3] https://github.com/chipsalliance/Cores-SweRV

rrss · on April 20, 2022

> are any companies other than Nvidia embarking up AI/ML chipmaking in a closed fashion?

https://www.cadence.com/en_US/home/solutions/machine-learnin...

https://www.synopsys.com/implementation-and-signoff/ml-ai-de...

https://www.plm.automation.siemens.com/global/en/our-story/n...

I think you'll be hard pressed to find any company using open source tools to design a business-critical chip for a foundry that gives any information without an NDA.

AFAIK Google doesn't have a open-source ASIC design for OpenTitan, and WD doesn't have an open-source ASIC design for SweRV.

there are a lot of interesting initiatives, but in the semiconductor industry open source tooling and processes are only a tiny niche that a few companies are playing around with.

blihp · on April 20, 2022

> The chip design itself should be the secret sauce. Not the tools you make the chip with.

The secret sauce is generally whatever gives one a competitive advantage. Businesses typically open things up when the want to reduce the cost of something and/or cause pain for someone else (i.e. killing their cash cow), not because they're benevolent and want to share.

> I challenge you to answer your own question in reverse: are any companies other than Nvidia embarking up AI/ML chipmaking in a closed fashion? There probably are, let's follow & watch them.

Isn't Google's floorplanning work closed source? https://ai.googleblog.com/2020/04/chip-design-with-deep-rein...

andrepd · on April 20, 2022

The GP said nothing about "benevolence", he was arguing it is in the interest of NVidia

blihp · on April 21, 2022

I suspect they would disagree with what's in their best interest. Most large businesses (especially nVidia) operate with a zero-sum mindset: they're not winning unless someone else is losing. To them, sharing information when not absolutely necessary is losing.

xbmcuser · on April 20, 2022

So intel, TSMC and Samsung have open-sourced their chip manufacturing systems I did not know that. ASML is giving out lithography systems so anyone can make them.

Cyph0n · on April 20, 2022

> The chip design itself should be the secret sauce. Not the tools you make the chip with.

I’m sure Cadence, Synopsys, and Mentor would love to hear more about this.

rektide · on April 20, 2022

Everyone keeps hammering home how much of the process is proprietary. Whose interest is that in though? Is it in Nvidias & Intels & Qualcomm's interest to let these chip design software companies have extremely proprietary cake, that no one can advance or enhance, that has no machine-learning capabilities surrounding it?

To me it feels like so many are missing the picture here. Chip designers ought to cooperate on tooling, to burst exactly this batch of crooks you've just cited's game. Designers should stop being held back by limited, small minded, heavily controlled proprietary software, & collaborate on making a better tooled world we can all openly advance.

Some day I hope we have similar overthrows of ASML & other layers of the stack, as what's happening in chip design now (for basically everyone except nvidia and apple, the two behemoths). Competition & cooperation mixing at various levels is good, is healthy, keeps the world from ossifying.

Edit: oh look, a comment full of these same proprietary chip-design-software companies (not chip-designers) trying to make ML software! https://news.ycombinator.com/item?id=31092673

_lqaf · on April 20, 2022

Let the tooling classes tremble at an open source revolution. The chip designers have nothing to lose but their chains. They have a world to win.

Fabbing Folks of All Countries, Unite!

PhaseLockk · on April 20, 2022

The skills needed to create a chip and the skills needed to create chip design software are fundamentally different. Of all the engineers I've met who work on the physical implementation and timing closure of digital chips, only a very limited number would have any hope of creating some sort of place and route tool, and it would be rudimentary and inefficient. They are not expert programmers.

rektide · on April 20, 2022

Huge part of why OpenROAD (and as this article.indicates, nvidia) are so focused on machine learning! Because the nitty gritty of chip design has abundant gnarly problems requiring deep deep expertise. Deploying software engineers is hard. But building ml is kind of our bag!

There's another nice upstart opensource project with even fancier ml placememt systems that spawned recently out of the openroad world, dreamplace, https://github.com/limbo018/DREAMPlace

This is just gonna get more & more biased against a couple super smart engineers who we've deeply entrusted to divine inner the workings of the chips on, & become increasingly a set of better modelled problems that we can machine learningly optimize.

areskay3 · on April 20, 2022

"The NVIDIA Deep Learning Accelerator (NVDLA) is a free and open architecture that promotes a standard way to design deep learning inference accelerators. With its modular architecture, NVDLA is scalable, highly configurable, and designed to simplify integration and portability. The hardware supports a wide range of IoT devices. Delivered as an open source project under the NVIDIA Open NVDLA License, all of the software, hardware, and documentation will be available on GitHub. Contributions are welcome." http://nvdla.org/

nomel · on April 20, 2022

> The chip design itself should be the secret sauce.

The chip comes from the chip design, and the chip design is made with tools. None can exist alone.

> Many other companies are starting to get onboard with open chip design.

> There are plenty of company's releasing the chips too. Google's OpenTitan[2] security chip. WD's Swerv RISC-V core for their driver controller ARM R-series replacement[3].

These chips aren't the foundation of Google or WD's revenue stream. You won't see them significantly affecting a line item in their quarterly reports.

> are any companies other than Nvidia embarking up AI/ML chipmaking in a closed fashion?

Nvidia is in a unique position where the foundation of their profits (chips) happens to be what makes practical AI possible. They're literally running the vast majority of the show. If something falls into the "foundation of existence" circle in their Venn diagram of concerns, they're going to be less open about it. Improving the ability to design chips is at the exact center of that "foundation of existence" circle.

TomVDB · on April 20, 2022

Designing a modern ASIC requires experts across the whole spectrum, from top level architects who increase performance and reduce power from first principles, to RTL designers who have a feel for what kind of code will result in less area or less toggling wires, to standard cell designers who optimize a cell library for an optimal speed vs power vs area trade-off, to floor planning for the area density and speed while not running into IR drop and congestion issues, to DFT to make sure testing is as fast as possible with a high coverage, to DFM engineers who come up with strategies for optimal yield.

All these aspects are part of the chip design, and being bad at one can significantly compromise the competitiveness of the final piece of silicon.

So this statement is hopelessly naive and ignorant:

> The chip design itself should be the secret sauce. Not the tools you make the chip with.

Because all the steps that I listed above are done with tools. And in many cases, having better tools is the secret sauce that makes your design better than the competition.

The article mentions a runtime of minutes instead of a day to do IR drop checking: that's the kind of acceleration that allows trying out multiple configurations for an optimal solution instead of settling for good enough. A lower amount of IR drop allows for a more aggressive, less conservative power curve. End result: a chip that can be clocked at a higher speed without needing to increase the voltage. A major competitive advantage.

> are any companies other than Nvidia embarking up AI/ML chipmaking in a closed fashion?

Of course there are. AI can be used for almost anything where large amount of data is already available, and where there's a clear cost function that must be optimized. AI is a natural for many steps in the ASIC design flow. You could have figured this out by yourself: Nvidia is talking about it. If it were such a big novelty, they'd keep it under wraps.

> WD's Swerv RISC-V core for their driver controller ARM R-series replacement [snip snip] everyone but NVidia playing well together, trying for better, standardizing a future for participation & healthy competition & growth.

Let's talk Swerv: a piece of IP that's definitely useful to Western Digital. Useful to general world too. But not something that scores particularly high on the list of the secret sauce ingredients that makes or breaks their products. Does Nvidia have similar open source IP offerings? Yes, they do! Check out NvDLA: Nvidia's open source DL accelerator. Your day must be a whole lot better now, knowing that, just like WD, Nvidia also open sources some non-critical IP.

I'm sure that you're aware that AMD uses a neural network in their CPU branch predictor. Do you think that AMD should release the tool that was used to figure out the optimal weights? After all, the tool itself is not part of the actual CPU design...

JustLurking2022 · on April 20, 2022

If the tools to make it would more easily allow others to achieve similarly efficient designs, then that is the secret sauce...

hackernewds · on April 20, 2022

They are getting onboard with open chip design, because they need to get onboard with open chip design, and borrow to even be competitive and survive. You only get to use the black rocket when you're not in the lead.

root_axis · on April 20, 2022

> Having a lead in chip design is their literal bread and butter

Sounds tasty, I'll have to take a trip to the nvidia cafe some time =)

mdp2021 · on April 20, 2022

(I see it was meant to be indirectly expressed violent censorship against the use of 'literal' in rhetoric speech... That without a carefully respectful use of 'literal' we will lose the irreplaceable "safe word" out of the bondage of figurative dungeons. Very considerate. Root_axis, rushed writing and reading defies a subtlety hidden in "dad jokes"...)

wil421 · on April 20, 2022

The meaning of the word has changed. If everyone uses it incorrectly is it really incorrect?

root_axis · on April 21, 2022

I don't think it has changed, and it's not true that everyone uses it incorrectly.

jwlake · on April 20, 2022

That article doesn't actually have enough data to really know. The anecdotes are all just design assist, make things faster from 20 mins to 3 seconds. Not singularity. There are teasers but nothing clear singularity. I think the big problem is they are just using it for fuzzy algorithm optimization, which is clearly not self learning.

areskay3 · on April 20, 2022

All of the work discussed is published to some extent. References are in the slides

https://arxiv.org/pdf/2012.10597.pdf

https://research.nvidia.com/publication/2020-07_grannite-gra...

https://research.nvidia.com/sites/default/files/pubs/2020-07...

https://ieeexplore.ieee.org/document/8920342

rrss · on April 20, 2022

which openroad tools use "AI" rather than normal optimization like traditional EDA tools have used for decades?

ImportOllie · on April 20, 2022

I don't understand the backlash here. The jist seemed to be traditional tools that are exact take a long time to process complex designs. Deep learning offers a statistical approach that can give a 'coarse' prediction and they're using this to reduce development time. That seems to make sense to me, especially in the earlier verification phases of the hardware design lifecycle.

To me this sounds like a good use-case of AI and Neural Nets. It doesn't appear to be looking to replace the traditional tools, just augment.

jonnycomputer · on April 20, 2022

I seem to recall that the original title of the post was more sensationalist; something about replacing human designers.

b20000 · on April 20, 2022

the last time I checked autorouters were still not capable of doing all the routing on a multi layer PCB properly, and manual work was still required to produce a decent design.

TomVDB · on April 20, 2022

How is that a relevant comment in a discussion about ASIC design?

I hope you don't have the idea that chip routing is done manually.

beambot · on April 20, 2022

IIRC, Place & Route is a known NP complete problem. In this regard, autorouters (whether IC or PCB) can benefit from "better" heuristics -- i.e. it's an optimization problem where AI can help.

TomVDB · on April 20, 2022

PCB routing is generally considered a much harder problem. There are a bunch of reasons that add up, but one of them is almost certainly that a PCB is supposed to look good too. The routing on an IC is total chaos (which actually reduces crosstalk issues), but nobody will ever notice.

raverbashing · on April 20, 2022

That's an interesting point

PCBs "look good" because of different production contraints (and I guess because you're mostly connecting tight-timed busses together)

In ICs you're doing the above but also pulling signals from 10 different places together.

Taniwha · on April 20, 2022

So here's my autorouter first-guess analog computer design:

On a basketball court create a pile of metal plates with bar codes and hooks, each one represents a gate, robot uses rubber bands of sizes vaguely representing timing budget from synthesis to hook the gates together. Robot picks up the whole thing using rubber bands representing the external ports and gives the whole thing a few shakes. Puts it all back down, goes over each metal plate and reads the bar code and resting position - that's your initial routing guess - anneal from there

atq2119 · on April 20, 2022

What you've described is actually for placement, not routing, and is in fact a good analogy for the first step in many placement algorithms.

Interestingly, placement is much harder than routing from a complexity theory point of view (specifically, there are fairly strong inapproximability results for placement-style problems).

Taniwha · on April 20, 2022

true - most chip tools I've used for layout do both place and route (and iterate, moving std cells apart to create routing resources) so I've always sort of thought of place&route as a single thing (though I've also interrupted it before initial routing to do scan insertion)

tboerstad · on April 20, 2022

Are the analog parts (current nitrist etc) autorouted now?

I worked on MCU layout around 2011, and only the digital logic was autorouted/placed.

TomVDB · on April 20, 2022

My comment is in the context of this topic: large digital ASICs with billions of wires, where the only analog parts are PLLs, DLLs, and digital IOs.

bsder · on April 20, 2022

What is extremely telling is what is missing ... Design Rule Checking (DRC) and Layout Vs Schematic (LVS).

These require:

1) Longer bit length arithmetic

32-bit float simply isn't enough. 64-bit float is close, but limited. You really want 128-bit integer. And nVidia isn't delivering that.

2) Real algorithmic improvements

We're still stuck with computational geometry algorithms that don't parallelize. It would be awfully useful if nVidia would actually research some new algorithms instead of just waving around the ML/AI marketing wand.

But, then, this is the company that built itself on benchmarketing, so ...

jiggawatts · on April 20, 2022

Can you explain why such large numbers are required?

Back-of-the-napkin maths is that a chip that is 3cm on each side -- which is huge -- can be subdivided into 0.007 nanometre increments using 32 bit integers. That's 1/7th of the diameter of a hydrogen atom!

The resolution with 64-bit floats (let alone integers) would be absurd, roughly a million times finer-grained still. That's probably enough to simulate individual electrons zipping around in their orbitals with acceptable precision.

Even if the simulation codes did something silly like simply assigning 1.0 = 1cm, a 64-bit float still allows resolutions of something like a billionth of a nanometre...

bsder · on April 20, 2022

> Can you explain why such large numbers are required?

Absolutely.

Even if you start with 32 bits, you often have polygons with many sides. In the worst case, you are modeling a "circle" and have to increase your precision to enough level to be accurate (please note that nobody in the right mind in VLSI would ever draw a "circle"--however, you wind up with an "implied" one due to DRC, more down below ...)

The problem is that line sweep intersection checks in DRC require approximately 3n+a couple bits to differentiate intersections that may be close to degenerate or have multple intersections near to each other. So, if you start with 32-bit numbers, you require approximately 96 bits plus a little for your intermediate calculations. (See: Hobby -- "Practical segment intersection with finite precision output" -- I'll let people find their own copy of the paper so HN doesn't splatter some poor site that I link)

You would think that doesn't matter since VLSI tends to limit itself to rectilinear and 45 degree angles. Unfortunately life isn't that simple.

If you take a simple rectangle and say "Nothing can be within distance x", you get a slightly larger rectangle parallel to the sides. Easy. The problem is that you also wind up with an implied quarter circle (told you this would come back) near each corner. Not so easy.

Put those circles such that they overlap only very slightly and you may have segments that are pretty close to tangent. Super not easy. Unfortunately, VLSI design often consists of putting those metals such that they are riiiight at the limit of spacing. Consequently, your super-not-easy case also becomes a very common case. Ouch.

Of course, you could just move the rectangle completely outward so that you have squares at the corners. However, that gives up a non-trivial amount of area that most places aren't willing to concede.

There is a reason why Siemens (nee Mentor) Calibre is so egregiously expensive.

jiggawatts · on April 20, 2022

Disclaimer: I have zero silicon design experience.

However, I have designed computer game engines that use 32-bit floats throughout and encountered rounding errors in practice.

I’ve found that there’s always a solution that avoids the need to go past 64 bits, and even that is a last resort.

So for example the circle could be approximated with a polygon. Or fixed-point arithmetic can be used. Or simply use a quad-tree or related space partitioning algorithms to check for intersections.

There are literally thousands of algorithms that sidestep these issues and are used extensively in computer games, typically at 32-bit precision exclusively.

For example “back in the day” you would often see shimmering due to “z-fighting”. You would also often see white pixels due to “cracks” between adjacent polygons.

All of these are largely gone now. The problems have been solved without the enormous performance hit of 64-bit doubles, let alone 128!

Meanwhile contemporary CAD programs would insist on using 64-bit doubles, even through the OpenGL pipeline out to the screen.

But if you sit down for a second and just divide your screen (or wafer mask) by the range of your chosen numbers you’ll instantly see that you have thousands of steps per pixel (or atom!) to work with.

Any visible noise is your fault for choosing the wrong algorithm, not the fault of the hardware for not providing enough bits.

q3k · on April 20, 2022

Please keep in mind that even the slightest error in a silicon mask is likely to cause hundreds of millions of dollars of losses and months of delay in time to market for a modern chip.

With that in mind, does it make more sense to come up with new, experimental, untested algorithms... or just use wider numbers and slowly iterate on well known algorithms? Especially with LVS/DRC you really want the dumbest, easiest to reason about thing that is most likely to catch design issues no matter what. Even if it's excruciatingly slow, it's your last line of defense against writing off a set of masks as a hundreds of millions of dollars loss.

EDA / silicon CAD is a totally different world of design requirements compared to video games or even MCAD software.

EDIT: and just for context, here's a DRC set for the (very much not modern) SKY130 process: https://github.com/RTimothyEdwards/open_pdks/blob/master/sky...

jiggawatts · on April 20, 2022

The exact same arguments were made by CAD people insisting on 64-bit maths for OpenGL. They were wrong. They too were working on projects worth billions of dollars, over decades, where mistakes were very costly.

Your link to a "DRC set" doesn't mean much to me out of context. I see some basic looking code with small-ish numeric constants in it. So what? This is not that different to the input to a simple physics simulation or a computer game.

hardolaf · on April 20, 2022

So let's get this straight. You know nothing about this area and you assume the experts in it are wrong? Do you know what happens if you accidentally couple lines during one of the manufacturing steps? The wafer can, in the absolute worst case scenario, explode from super heating destroying not just the wafer but potentially the entire chamber it is in (any defect beyond what was designated as allowable by the design engineers means the chamber and everything in it is now scrap).

For somewhat obvious reasons, we have a vested interest in this never occurring. So we default to safety over speed. Meanwhile in the CAD world with 64-bit math not making it into OpenGL, they just wrote a library to do 64-bit math anyways on-top of or in parallel to OpenGL. They didn't switch away from 64-bit math, they just reduced its use where it isn't needed and kept it where it is needed. The semiconductor industry is full of absolutely brilliant engineers who know far too much about all of the problems and if they could use 64-bit instead of 128-bits for a data structure, they'd switch in a heartbeat to save massive amounts of compute time (and thus money).

jiggawatts · on April 21, 2022

> Do you know what happens if you accidentally couple lines during one of the manufacturing steps?

I understand the consequences. I also understand both both physics and computer science. A 32-bit integer is sufficient to subdivide something the size of a wafer mask to well under the wavelength of the light used for photolithography. There is literally no way for additional precision to matter for things like "coupling lines". It is impossible.

See for yourself: https://www.wolframalpha.com/input?i=3+cm++*+2%5E-32

Iterated algorithms are a different beast entirely, but there are fixed-point or integer algorithms that sidestep these issues.

You cannot imagine the volume of computer science research that has been written on shape-shape intersections in both 2D and 3D! Literal textbooks worth. Hundreds if not thousands of PhD-level papers. The sheer intellectual effort that has gone into optimisations in this space is staggering.

Hence my incredulity. I've worked with 128-bit numbers and even arbitrary-precision numbers, but only in the context of computational mathematics. There are no "physics constraints" in mathematics to limit the benefit of additional range or precision.

Also, the financial argument doesn't hold water either. Modern chips have tens of billions of features. The data volume can exceed the size of main memory of even the largest computers. Data representation efficiency and simulation speed absolutely would have tangible business benefits: faster iteration cycles, lower simulation cost, better optimisation solutions, etc...

This is literally the point of the article -- being able to do things in GPUs using their native 32-bit maths capabilities is a huge benefit to the chip design workflow. This requires clever algorithms and data structure design. You can't be wasteful because "it feels safer" if you have a budget of 24 GB (or whatever) to squeeze the mask data into.

> assume the experts in it are wrong?

Yes! Something I've noticed is that there is surprisingly little "cross pollination" between fields. You can have very smart people in one industry blithely unaware that another industry has solved their "very hard problem". I've seen this with biology, physics, medicine, etc...

How many chip design automation experts have also done low-level game engine programming? Maybe half a dozen in the whole world? Less?

jiggawatts · on April 21, 2022

Replying to myself to add:

Here's a Design Rule Checking (DRC) paper that mentions using 32-bit floats for things like checking distances between polygonal traces: https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.49...

If 32-bit is sufficient, then 64-bit definitely is, even for the latest & greatest silicon processes...

gchadwick · on April 20, 2022

Of course when you're doing such intersection calculations you know the things you're intersecting are very close. You don't need a general method that can test arbitrarily sized and spaced polygons against each other. You need a method to determine what is sufficiently close to each other to be worthy of a more detailed check. Then a more specific method to do this check.

You could use 32 bit integers with all shapes specified vs say a 0.1 nm grid giving you around a maximum 0.4m x 0.4m chip size which seems ample. Then when you want to check for rules violations in the cases like you specify with very fine precision use a dedicated check that can assume the relevant geometry is within a small number of grid points of each other. For example the check could work using relative coordinates rather than absolute so say switch to a grid on a 0.00001nm basis (to pull an arbitrary precision out of a hat) and convert the 32-bit absolute 0.1nm coords to relative 32-bit 0.00001nm coords.

Easier said then done to be sure (as you say the tools are egregiously expensive!) but just saying I need a 64-bit or a 128-bit float isn't trying to get to the grips with the problem, just hoping you wave it away with more bits.

bsder · on April 21, 2022

> You need a method to determine what is sufficiently close to each other to be worthy of a more detailed check.

Your line sweep data structure effectively already does that. I recommend reading the Hobby paper and thinking about how it works. And then you should think about how you differentiate inside from outside when you union/difference your polygons.

Any segments in the line sweep data structure simultaneously have already demonstrated that they need the detailed check.

If you want to argue this, you're going to need to study up on about 40 years of prior art. Given how much money this is worth and how many really smart people went after it (it basically drove the field of Computational Geometry for decades), the probability of you contributing something new to the current algorithms is basically zero.

However, the probability of you contributing something new to parallel DRC algorithms is really quite decent. Nobody I know of has yet come up with "good" parallel algorithms for DRC--most of them are hacks that quite often break down when they hit even common cases.

Being able to handle DRC on a billion VLSI polygons/10 billion line segments in a parallel fashion would be quite an advance, and the field is waiting for it.

qayxc · on April 20, 2022

> The resolution with 64-bit floats (let alone integers) would be absurd, roughly a million times finer-grained still.

Careful there! Floating point numbers do not form a proper field, not even a semi-group. Due to the uneven distribution of elements, the field axioms don't hold (e.g. both commutativity and distributivity can be violated) and great care has to be taken to assure the numeric stability of computations.

jiggawatts · on April 20, 2022

My physics professor had some good examples of how numeric precision vastly outstrips reality: if modelling a 1m iron bar using 32-bit numbers, the error in length is substantially less than a dust mote landing on the end of it. It's about the same as a virus (not a bacterium) on the end... or not. The oil from a fingerprint is thicker. The mere presence of a human in the room will warm up the iron rod enough to cause it to expand more than this.

You only get physically significant errors when using iterated algorithms where the errors accumulate, or when doing what amounts to equality comparisons, which is almost always an error.

Note that 64-bit numbers aren't "twice" as precise.[1] They're four billion times more precise. Going to 128 bits is absurd beyond belief. Numbers like these would allow the entire visible universe to be modelled, down to the width of a proton. You do not need 128 bit numbers for anything made on Earth, by humans, ever. If you think you do, you've made a mistake. It's as simple as that.

[1] floating point numbers and integers are obviously different, but the concepts are the same. A 64-bit double is "just" 536 million times more precise that a 32-bit float, but that is still an awful lot of precision for anything made of matter...

qayxc · on April 20, 2022

> [1] floating point numbers and integers are obviously different, but the concepts are the same.

No they're not. That's the entire point. 32 bit IEEE floats get you 6 to 9 significant digits, whereas 64 bit IEEE floats get you 15 to 17 significant digits. Loss of significance and catastrophic cancellation are real problems in numerical analysis.

Physics in particular doesn't often have closed form solutions, so you're forced to use iterative approximations. Same goes for large matrix operations, which is why you actually have to be very careful with the algorithm you choose and the order of operations.

If you're still not convinced, feel free to try it yourself:

  #include <iostream>
  #include <iomanip>
  #include <cmath>
  #include <tuple>

  // solve ax^2+bx+c=0
  template<typename T> auto
  solve_quadratic(T const& a, T const& b, T const& c) -> std::tuple<T, T> {
    auto t = std::sqrt(b*b - T(4)*a*c);
    return {(-b + t) / (T(2)*a), (-b - t) / (T(2)*a)};
  }

  int main() {
    std::cout << "Expected x1=0.000000075 x2=-200.000000075\n"
    std::cout << "Single precision:\n";
    {
      float a=1.0f, b=200.0f, c=-0.000015f;
      auto [x1, x2] = solve_quadratic<float>(a, b, c);
      std::cout << "x1=" << x1 << ", x2=" << x2 << "\n";
      std::cout << "ax1^2+bx1+c=" << (a*x1*x1 + b*x1 + c) << "\n";
      std::cout << "ax2^2+bx2+c=" << (a*x2*x2 + b*x2 + c) << "\n";
    }
    std::cout << "\nExpected x1=1.000000028975958 x2=1.000000000000000\n";
    std::cout << "Double precision:\n";
    {
      double a=94906265.625, b=-189812534, c=94906268.375;
      auto [x1, x2] = solve_quadratic<double>(a, b, c);
      std::cout << std::setprecision(16);
      std::cout << "x1=" << x1 << ", x2=" << x2 << "\n";
      std::cout << "ax1^2+bx1+c=" << (a*x1*x1 + b*x1 + c) << "\n";
      std::cout << "ax2^2+bx2+c=" << (a*x2*x2 + b*x2 + c) << "\n";
    }
  }

Even 64-bit IEEE floats don't help if you use the standard quadratic formula. Note that you can improve the 32-bit case above significantly by reformulating the solution to

  template<typename T> auto
  solve_quadratic(T const& a, T const& b, T const& c) -> std::tuple<T, T> {
    auto sign_b = b < 0.0 ? -1.0 : 1.0;
    auto t = -b - sign_b*std::sqrt(b*b - T(4)*a*c);
    return {(T(2)*c) / t, t / (T(2)*a)};
  }

This simple change will yield the precise result (within the fp32 precision) for x1 and x2 with a=1, b=200, c=-0.000015 (i.e. the error of ax²+bx+c will be lower than the 9 max significant digits).

However this won't help with the second (64-bit) example, which will still be wrong after the 8th digit (i.e. well within the supposed 12 to 15 significant digits of a 64-bit IEEE float).

Also note that in both cases all numbers involved have less significant digits than supported by the respective FP format. Just to give you a little insight on why available bits ≠ precision in floating point maths.

Dylan16807 · on April 20, 2022

> You really want 128-bit integer. And nVidia isn't delivering that.

How much slower (per unit area) is that to do in software, compared to a full 128-bit hardware unit?

erwincoumans · on April 20, 2022

"as of 11.5, CUDA and nvcc support __int128_t in device code when the host compiler supports it (e.g., clang/gcc, but not MSVC). 11.6 added support for debug tools with __int128_t."

See:

https://developer.nvidia.com/blog/cuda-11-6-toolkit-new-rele... https://developer.nvidia.com/blog/implementing-high-precisio...

tboerstad · on April 20, 2022

DRC and LVS are just logical checks right?

“Is the minimal distance between all metal routing > 10 nm” etc.

Can you explain why high precision is needed for that?

atq2119 · on April 20, 2022

Having worked in EDA myself, though not on these final signoff steps, I agree that purely geometry-based checks really don't need doubles. Most of it should just be done as 32-bit fixed point. Both because it's better for performance, and because it drives home the point that you need to think carefully about precision issues for correctness reasons. Using doubles is just a band-aid.

I'm less confident about it when it comes to anything that involves calculating anything electromagnetic because I just don't know that subfield.

FutureZeitgeist · on April 20, 2022

Can you explain why you need greater precision/range?

productceo · on April 20, 2022

Please keep up the processing power progress!

Economics of the software industry (or at least the products that I work on) depend on the assumption that cost of computing (including storage) diminish exponentially over time! <3

kevincox · on April 20, 2022

You say this like it is a good thing. It seems to me that if a whole industry is dependant on exponential growth of another than the former is being quite reckless.

Of course exponential growth will help, but relying on it seems like a bit too much risk.

hurflmurfl · on April 20, 2022

I think that's the point GP is making with his sarcastic remark :)

W-Stool · on April 20, 2022

I've got a whole "HAL9000" feeling going here right now.

"Sorry Dave - I can't quite do that ..."

_blz2 · on April 20, 2022

I remember him from the vlsi text dally and poulton.

orangebeet · on April 20, 2022

I really hope that they can apply some of these AI approaches on the driver situation on Linux as well. I will never buy an Nvidia product after the nightmares they've put me through.