I've probably worked on 70 chips over the last 30 years.
Tape out time always sucks. I'm in physical design, which means fixing all the timing violations, DRC violations, and LVS errors, and dealing with late design changes.
Working 80 to 100 hours a week for a month really sucks and makes you wonder why you didn't go into software.
When you combine it with a fixed shuttle date like in the article, it is even worse: if you miss that date, it might be another 1-2 months until the next shuttle, instead of just a day-for-day slip when you control all the masks.
Don’t worry, we have those 80-hour weeks in software too. I can think of a few examples. Mobile App Store review used to be kind of like that: you submitted your app, waited a few business days, and prayed there wasn’t an obscure rejection that led to an appeal, which could take even longer. Very stressful when you are cueing up a launch and press releases for a certain date. You had to make sure you were done a few weeks in advance to account for everything.
I don’t work much on apps anymore but I hear it’s somewhat better now.
Another big area is compliance; those processes can take forever.
Can I ask how often you guys end up doing gate-level netlist ECOs, instead of re-running synthesis when you're close to a deadline? Also, post-fabrication, if a mistake is found, have you been able to fix it just with a new M1 or M2 mask, instead of paying for a full new mask set?
If the change is under 1000 logic cells and there are no new flip flops, then we do it as an ECO. If there are tons of new flip flops, we resynthesize and start over.
Lots of chips have metal spins to fix errors. The blank areas of the chips are filled with filler cells but most of them are special "ECOFILLER" cells that are basically generic pairs of N/P transistors like a gate array. These can then be turned into any kind of cell just by using metal. They are a little slower but work fine.
I've worked at one huge company where they planned 3 full base layer mask sets and 1-2 metal spins for each full base layer set. This was when doing a chip on a brand new process node where you couldn't always trust the models the fab gave you so you wanted more post silicon characterization to recalibrate models.
> The blank areas of the chips are filled with filler cells but most of them are special "ECOFILLER" cells that are basically generic pairs of N/P transistors like a gate array. These can then be turned into any kind of cell just by using metal. They are a little slower but work fine.
The other alternative is that you sprinkle spare gates around the chip. If the chip is 10mm x 10mm then every 100 microns you put a group of cells that just have their inputs tied to 0 and the outputs go nowhere. You put in a good mix of flip flops, and combinational logic cells. Then when you need to do a metal ECO the RTL team says "We need 2 AND gates, 1 OR gate, 1 mux, and they are connected to these 5 cells." So you highlight those 5 cells and find the closest spare logic group and use those.
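To make the "find the closest spare logic group" step concrete, here is a rough Python sketch. It is not how any particular P&R tool does it. The `SpareGroup` class, the cell type names (`AND2`, `OR2`, `MUX2`), the coordinates, and the choice of Manhattan distance from the centroid of the connected cells are all assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class SpareGroup:
    """A hypothetical preplaced group of spare cells at one die location."""
    name: str
    x_um: float
    y_um: float
    cells: Counter  # cell type -> number of unused spares in this group


def closest_spare_group(targets, needed, groups):
    """Return the nearest group that has every needed cell, or None.

    targets: (x, y) locations in microns of the cells the fix connects to.
    needed:  Counter of cell types the RTL team asked for.
    Distance is Manhattan distance from the centroid of the targets,
    which is an assumed stand-in for real routing cost.
    """
    cx = sum(x for x, _ in targets) / len(targets)
    cy = sum(y for _, y in targets) / len(targets)

    def has_all(group):
        return all(group.cells[kind] >= count for kind, count in needed.items())

    candidates = [g for g in groups if has_all(g)]
    if not candidates:
        return None
    return min(candidates, key=lambda g: abs(g.x_um - cx) + abs(g.y_um - cy))


# Example request: "2 AND gates, 1 OR gate, 1 mux, connected to these 5 cells."
needed = Counter({"AND2": 2, "OR2": 1, "MUX2": 1})
targets = [(110, 100), (120, 110), (130, 120), (115, 105), (125, 115)]
groups = [
    SpareGroup("SPARE_1", 100, 100, Counter({"AND2": 1, "OR2": 1, "MUX2": 1})),
    SpareGroup("SPARE_2", 200, 100, Counter({"AND2": 2, "OR2": 2, "MUX2": 1})),
    SpareGroup("SPARE_3", 900, 900, Counter({"AND2": 4, "OR2": 4, "MUX2": 2})),
]
chosen = closest_spare_group(targets, needed, groups)
print(chosen.name if chosen else "no single group has enough spares")
```

In this example `SPARE_1` is the closest group but has only one spare AND gate, so the search falls through to `SPARE_2`. A real flow would also have to consider timing, routing congestion, and splitting a fix across several groups.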
The ECOFILLER gate array style cells are easier to use.
Then during the DRC check process in Calibre we run a check to make sure that the base layers stayed the same and only the metal layers changed. Since we have 18 metal layers in a leading edge node, hopefully only metal layers 1 to 3 changed for the metal ECO, so you only have to pay for new masks for those layers.
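The idea behind that "base layers unchanged" check can be sketched in a few lines of Python. This is only an illustration of the comparison, not Calibre's rule syntax or API. The layer names, the rectangle-tuple representation of geometry, and the `BASE_LAYERS` set are all assumed for the example.

```python
import hashlib

# Hypothetical base (non-metal) layer names; real names depend on the PDK.
BASE_LAYERS = {"NWELL", "DIFF", "POLY", "CONTACT"}


def layer_digest(shapes):
    """Hash a layer's rectangles in sorted order so ordering does not matter."""
    h = hashlib.sha256()
    for rect in sorted(shapes):
        h.update(repr(rect).encode())
    return h.hexdigest()


def check_metal_only_eco(old, new, base_layers=BASE_LAYERS):
    """Compare two layouts given as {layer: [(x1, y1, x2, y2), ...]}.

    Returns (changed layers, changed base layers). A metal-only ECO
    should leave the second list empty.
    """
    layers = set(old) | set(new)
    changed = {
        layer for layer in layers
        if layer_digest(old.get(layer, [])) != layer_digest(new.get(layer, []))
    }
    return sorted(changed), sorted(changed & base_layers)


# Toy example: only the M1 routing differs between the two versions.
old_layout = {
    "POLY": [(0, 0, 1, 5)],
    "M1": [(0, 0, 10, 1)],
    "M2": [(0, 0, 1, 10)],
}
new_layout = {
    "POLY": [(0, 0, 1, 5)],
    "M1": [(0, 0, 10, 1), (0, 3, 10, 4)],
    "M2": [(0, 0, 1, 10)],
}
changed, base_violations = check_metal_only_eco(old_layout, new_layout)
print("changed layers:", changed)
print("base layer violations:", base_violations)
```

The list of changed layers is also, roughly, the list of masks you would have to pay to remake.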
A full mask set in 3nm can be over $30 million. Just a new set of metal masks is around $20 million.
A full mask run takes about 4 months in the fab. Normally you tell the fab to keep a few wafers after the base layers and don't manufacture the metal layers. Then when you do a metal respin they get those out of storage and save a month.
Blocks are never 100% full. If they were, you would never be able to route the design. High utilization may be 70%, but I've worked on blocks with tons of IO that are only 25% utilized. For various manufacturing and yield purposes the empty spaces need filler cells.
Sometimes we put in decoupling cap cells. But the ecofiller cells go in everywhere else.
About 25 years ago we were using spare gates that we had preplaced on the die.
About 5 years ago we started using spare gates preplaced and ALSO the ecofiller cells. The reason I was told was to save money because the ecofiller cells require some other mask layer to change. I think that was in the $500K range but it's still money.
In general I hate doing ECOs with the preplaced spare gates, as it is manual and time consuming to find the best cells to use.
Wow, awesome thanks for the details! I have once or twice on projects added extra gates as fillers in some 28nm mixed-signal designs for metal layer re-work, but I had no idea that in larger digital teams there was also the practice of adding these types of individual transistor arrays. Super clever!