My entire career in computers spans the 40 years in that graph. The constant leaps in fundamental speed were exhilarating and kind of addictive for technologists like myself. As the rate of progress has fallen off over the past decade it's been sad to see the end of an era.
I'm sure speeds and capabilities will continue to increase, albeit much more gradually, but significant gains are going to come slower, harder and at greater cost. The burden will have to be shouldered by system architects and programmers in finding clever ways to squeeze out net gains under increasingly severe fundamental constraints (density, leakage, thermals, etc).
Back when I started programming as a teen in 1980 with 4K of RAM and ~1 MHz 8-bit CPUs, knowledge of the hardware underneath the code and low-level assembly language skills were highly valuable. Over the years, the ability to think in instruction cycles and register addressing modes grew anachronistically quaint. Now I suspect those kinds of specialized 'down-to-the-metal' optimization skills may see a resurgence in value.
I think it is the opposite. I have almost as much experience as you; I started a little later and didn't get serious until my teens, with 68k assembly language and custom-chip programming on the Amiga 500. This isn't all nostalgia; some of it is germane context.
I think it is important to have a mental model of the hardware so that the architecture of the program has some mechanical sympathy. But the ability to think abstractly is more important; that is what allows Moore's law to be realized. Our compute topology is changing, and if the perf curve is to continue to be exponential, our code, and more importantly the expression of our ideas, has to be able to exercise 30B transistors today and 150B in 8 years. Knowing how to compose neural networks is one of the new skills, akin to knowing how to shave off cycles in the '80s. Mod playback, Doom, Quake, mp3 decompression, emulation: all of these redefined our relationship with computing.
The Amiga had custom hardware for bit blits and sprite compositing; it could do these trippy multi-layered backgrounds that used parallax to give games an arcade-like 2.5D look. That hardware exposed a bunch of registers you had to muck with, and I only ever called them from assembly; I knew C, but it just felt more natural to do it in asm files. My point is, you can do the same things today in high-level, garbage-collected code. In Python or JS you could implement Quake with naive algorithms: no asm, no custom memory-copying and compositing hardware, just assignment statements in a dynamic, GC'd language.
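As a sketch of that claim, here is a hypothetical three-layer parallax compositor in plain numpy: the layer data is random stand-in content, and there is no blitter or copper, just boolean-indexed assignment doing the "blit".

```python
import numpy as np

# Three hypothetical background layers (height x width), ordered far to near.
H, W = 64, 256
rng = np.random.default_rng(0)
layers = [rng.integers(0, 255, (H, W), dtype=np.uint8) for _ in range(3)]
# Per-layer transparency: treat dark pixels as see-through.
masks = [layer > 128 for layer in layers]

def compose(scroll_x):
    """Composite layers back to front; nearer layers scroll faster (parallax)."""
    frame = np.zeros((H, W), dtype=np.uint8)
    for depth, (layer, mask) in enumerate(zip(layers, masks)):
        shift = -scroll_x * (depth + 1)          # nearer layer, bigger shift
        shifted = np.roll(layer, shift, axis=1)
        shifted_mask = np.roll(mask, shift, axis=1)
        # The "blit": boolean-indexed assignment instead of blitter registers.
        frame[shifted_mask] = shifted[shifted_mask]
    return frame

frame = compose(scroll_x=5)
```

The per-pixel masked copy is the same operation the Amiga's blitter did in dedicated silicon; here it is one line of array indexing.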
The programmer who can code an awesome parallax demo using numpy arrays is not going to be the next Carmack. The programmer who can compose three AI models to make something we have never thought of is going to make Quake, or some other piece of software that changes our relationship with computing and Moore's law. Abstraction gets us there.
I agree with the parent of your post. I work in a field where Moore's law gets artificially arrested for often a decade at a time (console games), and we are no strangers to being critically aware of how much memory we are copying around: we will reach for hand-coded SIMD math and stare at our shader assembly looking for more performance. You should see what some do to get top-line performance in collision detection; it even leaves me a bit sweaty... I'm not discounting your conjecture about the next Carmack being in the machine learning arena (that's how I feel too), but I still strongly believe we will see more demand for programming that can eke out performance from what we have.
Physical simulation is unique because of its latency requirements. Not being able to push the work out to a data center is the common denominator in high-performance programming.
In my field (Spark, functional programming for data parallelism), few if any of the problems from Moore's law ending ever truly eventuate.
"Compute bottlenecks" are surprisingly uncommon. On Databricks, almost no Scala gets written; SQL and Python are "fast enough." Commoditization wins: "good enough" libraries, packaged behind SQL/Python for the lowest common denominator.
Carmack's genius with the fast inverse square root misses the point.
Carmack's genius was the video game Quake itself.
The mathematical brilliance, the high performance programming, was genius applied to overcome a bottleneck.
(And what a temporary genius it was. Contrast Carmack with Unity.)
Originality and usefulness, imagination meeting relevance: that is the engine that powers software.
But within reason, these are areas where huge returns can be made with higher-performance programming as opposed to speed of development: a 10% performance increase can save a stupid amount of money on hardware, and with hardware lasting longer I think there will be an increasing focus on that.
When I play console games on my Xbox 360 the biggest annoyance by far is the loading times. You run around in Skyrim and you enter a house so you have to wait 30 seconds for the content to load. Then you leave the house and have to wait 30 seconds again. My point is that the relevant performance metric isn't speed of number crunching anymore - it is speed of transporting data from one part of the system to another.
I believe a critical difference between the high performance of now vs yesteryear is the degree to which it's a design problem vs an implementation problem.
When writing 6502 assembly, you have "tricks" galore. You do have a design trade-off to make (memory vs. CPU cycles), and when you look at algorithms in really old programs, they often dispensed with even basic caching to save a few bytes. But a lot of the savings came from gradually making the program as a whole a tighter specimen: doing initializations and creating reports with a few fewer instructions. The "middle" of the program was of similar importance to the design and the inner loops, which popularized ideas like "a program with shorter variable names will run faster" or "a program with the inner-loop subroutines at the top of the listing will run faster" (both true of many interpreters). An engineer of this period worked out a lot of stuff on paper, because the machine itself wasn't in a position to give much help. And so the literal "coding" was of import: you had to polish it all throughout.
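That interpreter folklore is easy to demonstrate with a toy model. The sketch below is a hypothetical BASIC-style variable lookup: a linear scan comparing names character by character, which is roughly how many 8-bit interpreters resolved names.

```python
# Toy model of an 8-bit-era interpreter's symbol table: a linear scan
# over (name, value) pairs, comparing names character by character.
def make_table(names):
    return [(name, 0) for name in names]

def lookup(table, target):
    """Return (value, total character comparisons) for a naive linear scan."""
    comparisons = 0
    for name, value in table:
        # Compare until the first mismatch (or the end of the shorter name).
        for a, b in zip(name, target):
            comparisons += 1
            if a != b:
                break
        else:
            if len(name) == len(target):
                return value, comparisons
    return None, comparisons

# Names declared earlier in the program sit earlier in the table.
table = make_table(["i", "total", "counter_for_report"])
_, fast = lookup(table, "i")                    # found almost immediately
_, slow = lookup(table, "counter_for_report")   # scans past everything else
```

Entries earlier in the table are found with fewer comparisons, and long names cost more per probe; that is the mechanism behind both folk rules in the paragraph above.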
Today, the assumption is that the middle is always automated: a goop of glue that hopefully gets compiled down to something acceptable. Performance is really weighted towards the extremes of either finding a clever data layout or hammering the inner loop, and to get the most impactful results you usually have a little of both involved.
The hardware is in a similar position to the software: the masks aren't being laid out by hand, and they increasingly rely on automation of the details. But they still need a tight overall design to get the outcome of "doing more with less."
And the justifications for getting the performance generally have little to do with symbolic computation now. We aren't concerned about simply having a lot of live assets tracked in a game scene (a problem that was still interesting in the 90s, but more or less solved by the time we started having hundreds of megabytes of RAM available); we're concerned about having a lot of heavy assets being actively pushed through the pipeline to do something specific. That leans towards approaches that see the world in less symbolic or analytical terms and more as a continuous space sampled to some approximation. Which digital computing can do, but it isn't the obvious win it once was.
The video game industry has downloaded more memory leaks onto personal machines than all the other domains of software combined. So many lines of terrible C++ have been written...
The importance of Moore's law falls flat in the face of good old "bugger good code, Morrowind is rebooting the Xbox."
I love your comment. I can only imagine how thrilling it would have been in the early days to see order of magnitude improvements in generalised single threaded computer performance every couple of years.
Today, as it happens with all fields that become more complex over time, excitement is found in more nuanced areas.
Hardware has become task specific and that makes it exciting to different niches for different reasons.
You mention the idea of thinking in cycles, and that concept is quite appealing to me. I believe the lack of focus on squeezing out performance is a symptom of the accessibility of modern application development, combined with the fact that most commercial products wouldn't see a financial benefit from delivering computationally efficient applications.
I do wish modern applications were more efficient, but that's a fool's errand as I don't see companies like Spotify rewriting their desktop client in 5 or 6 different native UI kits. Vendors like Microsoft and Apple will never collaborate on a common UI specification outside of web standards, so we are forced to suffer through Electron apps. Heck, Microsoft can't even figure out what UI API it wants to offer for Windows.
That said, if you're interested in computer science, we are only just uncovering novel approaches to how languages can let engineers ergonomically leverage parallel computation. We see this in languages like Rust and Go; neither is perfect, but so many lessons are being learned there.
To me, the software engineering and language design world is unbelievably thrilling right now.
I do think, and wish, that the large companies who own the platforms would work together more to avoid the standards mishmash application developers must contend with in today's landscape. It would make it far more accessible to write efficient cross-platform client applications that aren't built on web technologies.
These days cache is more important than registers. For typical n, a linear search beats the pants off a binary search simply because linear search is cache-friendly.
Modern optimizing compilers almost always do a much better job of micro-optimization. Humans are much better at attacking the big picture: making code fast with changes the compiler cannot safely make, because the new algorithm isn't equivalent in all cases.
Even in 1980 programmers knew that optimization was best done at a high level. The low level stuff just had more value when compilers were not good.
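To make the trade-off concrete, here is a sketch of the two searches side by side. In interpreted Python the cache effect won't show up (and `bisect` is C-accelerated anyway), so no timings are claimed here; the point about linear search winning applies to compiled code scanning flat arrays.

```python
import bisect

def linear_search(arr, target):
    """Scan front to back: a predictable, prefetch-friendly access pattern."""
    for i, v in enumerate(arr):
        if v >= target:
            return i if v == target else -1
    return -1

def binary_search(arr, target):
    """Halve the range each step: fewer comparisons, but the probes jump
    around the array, defeating the prefetcher and missing cache."""
    i = bisect.bisect_left(arr, target)
    return i if i < len(arr) and arr[i] == target else -1

data = list(range(0, 200, 2))  # small sorted array
assert linear_search(data, 42) == binary_search(data, 42) == 21
```

The linear scan touches memory in exactly the order the hardware prefetcher expects, which is why, for small to moderate n, the "worse" O(n) algorithm often wins on real machines.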
High performance computing will drive demand for faster hardware, for example in machine learning. It is extremely computationally intensive and expensive to train large NLP models. The big companies in this game have a lot of money to invest in bringing those costs down, and in turn train better models.
That said, I don't see a reason why speeds will increase significantly on personal devices. We're seeing a situation now where personal devices are really 'fast enough' for normal use cases. Instead the focus is more on improving efficiency and battery life.
It depends. I dream of a world where your Smartphone is also your personal computer and you can just project everything from it using AR wherever you are. In that case they have to improve on both.
Apple seems to be latching onto the idea that users need to run ML on their consumer devices, as opposed to the cloud, and I don't believe it. I think you agree. Yet in my opinion, if anything, they want the appearance of that necessity, expressed as a loss of efficiency and battery life on older devices to sell new ones.
I don't understand this comment. ANNs are being used everywhere - image recognition, voice recognition, document classification... I can only see this use increasing for the foreseeable future.
Google kills tons of very expensive projects. Facebook spends a lot on their Metaverse, but that doesn’t make it good. Tons of companies spend on terrible ideas.
The only difference with Google or Facebook is that they're big enough to absorb the losses.
This isn't to say that ML is a dead end, but instead to point out that just because they are investing a lot doesn't make it good.
I'm just a few years younger than you and have had similar experiences. This is off topic, but when was the last "magical" new-computer experience for you? For me, it was the M1: after seeing how dominant Intel had been for so long, everything they had vanquished, and then AMD's recent run, I just couldn't see a non-x86-64 part really performing outside of some IBM systems in special cases. That little M1 SoC blew me away with its consistently great performance and power use. I'm not sure it'll be the same with the M3 and beyond, but it was a taste of that old-school new-computer feeling.
The first computer I used was a 233 MHz Pentium Pro, and I remember how fast things were moving every year for at least a decade before it slowed to irrelevance. The M1 was a long time coming. Back in 2013, when the iPhone 5S came out, AnandTech showed how it matched Atom's performance in a few web benchmarks at much lower power. Combine that with milliwatt-level idle power, and it was obvious they would be very competitive in the PC space. That was also the year Apple called their chip "desktop level." I remember thinking back then how amazing it was that I could FaceTime for hours on a passively cooled phone, yet could barely Skype for thirty seconds before the fans spun up on my Mac. I always thought it was the smaller screen; I never made the connection that the SoC was the key difference.
For me it was the upgrade from spinning platters to a SSD. I was giggling as I restarted my computer a few times just to watch it almost instantly get to the login screen.
I am very very doubtful that people will once again start caring.
Even a decade ago, it was known that hardware gains wouldn't be as spectacular as before. It was predicted that this would lead to rise of specialized programming models such as GPGPU, DSPs, more focus on optimization, with a particular eye to hardware architecture, memory access patterns etc.
What actually happened?
Everything runs in the browser buried under six layers of Javascript and talks to a bazillion servers running microservices and passing JSON over HTTP to each other.
People care about optimization even less today than they did a decade ago.
Dude, in 1998 we at Intel had a running 64-core system.
But it's the shrink in circuit size, from microns to nanometers, that really proved out, not macro cores... except that once that scaling ran its course, scaling cores is what gave the way forward.
Maybe some critical code paths will be assembly-optimized (cf. dav1d) for speed and efficiency, but the real issues now are mostly at the software level, where toxic planned obsolescence runs rampant, fueled by the big tech companies steered by Vanguard and BlackRock (Apple/Microsoft/Google/etc.).
The only shield against that, some would think, is open source, but actually it is "lean" open source, SDK included. Kludge, bloat, and planned obsolescence are no better in the current open-source world than in the closed-source world.
I am an "everything in RISC-V assembly" (with a _simple_, dumb macro preprocessor only) kind of guy, including for python/lua/js/ruby/etc. interpreters. The main reason is not to be "faster" but to remove those abominations, the main compilers, from SDK stacks. Some sort of "write assembly once, run everywhere" (and you don't need a c++7483947394 compiler).
I agree, but I also think we need a fundamentally new paradigm.
It's very important that we as programmers have a good mental model for how the machine works. Abstractions are cool, but it is important to be aware of how your data lives in memory and how the CPU acts on your code. Yet much of what we've been taught in the last few decades is almost irrelevant now.
Almost all of us think and write code sequentially. Even with multithreading, your program is generally sequential, and the CPU just doesn't work that way anymore. With all the fancy whizbang branch prediction, superscalar execution, and whatever other black magic, the CPU is fundamentally not sequential.
As a result, compilers are becoming enormous hulking beasts with millions of lines of code trying to translate sequential programs into parallel ones. This kind of defeats the purpose of us having that mental model of the machine. The machine we think we know is not the machine that actually exists.
We need a new set of inherently parallel languages. Similar to the way we program GPUs these days.
The modern cpu is orders of magnitude more complex than anything we've seen before. We need new mental models and new programming paradigms to extract performance the way we used to on sequential processors.
Even in embedded applications, microcontrollers increasingly feature things like multiple instructions per cycle and branch prediction, and multiple cores are much more common these days.
I think we're stuck in a shitty place in between two wildly different worlds of computing. We aren't willing to make the leap to the new, so we live in this rapidly crumbling ecosystem trying to adapt 50 year old code to superscalar hyperthreading gigacore x86 processors.
The amount of wasteful code and technical debt in every one of the systems underpinning our society is truly unimaginable in its scale. There is no path forward from here except to burn it all down and begin again with a fundamentally new way of looking at things. Otherwise, it's all going to come crashing down sooner or later.
I don't quite feel that. On one side, my current computers cover my needs well enough, but it's still quite impressive how much more instantaneous the boot of a new computer is compared to my daily drivers. For the rest, computers have been "fast enough" for me for some time now.
Maybe I should move to big data and machine learning...
> Back when I started programming as a teen in 1980 with 4K of RAM and ~1 MHz 8-bit CPUs
I really miss those days. OTOH, just like my modern laptop, my Apple II could cold-start (from disk!) in 2-ish seconds.
This graph shows transistors basically maintaining pace and completely disregards multi-core performance. Of course single core perf will rise more slowly when a chip now has 8-64x as many cores.
> This graph shows transistors basically maintaining pace...
I'm no expert in silicon scaling but from reading technical papers, my (naive) understanding is that transistor density has almost kept up but now that scaling comes with increasingly stringent design constraints which architects must make trade-offs over. Broadly speaking, things like "You can have 2x last gen's density but they can't all be fully powered on for very long." That's a greatly simplified example but much of what I've seen has been far "thornier" in terms of interacting constraints along multiple dimensions.
My sense is that in the 90s we usually got "denser, faster AND cheaper" with every generation. Now we're lucky to get one, and even that comes with implementation requirements which can be increasingly arcane. My understanding is that different fabs are having to roll more of their own design libraries, which embody their chosen sets of trade-offs per node. In addition to limiting overall performance and being harder to design, this apparently makes reusing or migrating designs more challenging. While certain headline metrics like node density may appear to be scaling as usual, the reality under the hood is more complex and far less rosy.
You made me think that maybe computing is a deflationary force (I am not a libertarian; this isn't some free-market-bro idea, I think). The more that can be subsumed by computation, the more things can get cheaper over time rather than more expensive, even in the face of rising material costs.
The relative price of steel has remained flat, while the steel performance has greatly increased.
Between material science and cheaper compute, we can build higher tech parts and techniques.
The cycles-consumed-per-person-per-year curve is an exponential. What are some important points on that curve? When the computation needed to design something takes on the same order of energy as creating it?
You could buy a Honda Civic new in 1980 for $5,000, which would be just under $10k in today's dollars. What 1980-Honda-Civic-quality car can you buy today for $10k? Or am I being nostalgic?
And look at the bump in car prices during the recession: https://blog.cheapism.com/average-car-price-by-year/#slide=6... Was the 2008 recession triggered by excessively inflated car prices? Like a bubble in a pipeline, an economic embolism.
The current average price has dropped $10k, from $35k to $25k, in the years since 2008.
Could you please try to explain what you want to say with less snark? I'm a bit confused.
Paying people to do nothing gives you nothing.
Full employment isn't an end in itself, but it's useful because it is typically related to things we do care about. Employing people to do nothing is like fiddling with the speedometer of your car in order to 'go faster'. Or relabeling your amplifiers to go to 11.
You can sort-of turn atmospheric carbon into cheese. Have grass capture the carbon, and a cow eat the grass. That's totally doable, just not viable or efficient if your goal is to capture carbon at scale.
(If your goal was to go carbon-negative at all costs, you could instate a whopping big carbon tax and let the economy figure it out.)
Right now our economy basically runs on carbon at its core. We make stuff and move stuff, and emitting carbon is necessary to do it. If we switched our economy to owning and moving information, we could still have full employment and keep money moving through the ecosystem while, from a materialist's viewpoint, just moving useless bits around.
I think we already have a lot of high paying jobs in the economy that don't do much and pay people to do nothing (of value). We should absolutely spread that around.
Which is great if you have a traditional server application servicing a lot of independent requests, or giant linear equations that can be solved in parallel.
OTOH, the graph has an Amdahl's law section, which for many tasks has pretty much run out of steam (desktop web browsing, JavaScript JITs, etc.).
I'm not going to be so stupid as to say 8 cores should be enough for anyone (while attached to a machine with 128), but you have to wonder whether the Stable Diffusion-style apps running on your desktop are going to be mainstream, or isolated to the few who choose to _need_ them as a hobby plus a smaller part of the public that uses them for commercial success. AKA, I can utilize just about every core I'm given with parallel compiles or rendering 4K video, but I'm pretty sure I'm the only one in my immediate family who needs that. My wife might have done some simulation work in the past, but these days the heaviest thing she runs on her PC is office products.
This really gets back to the Arm big.LITTLE thing, where you really want 99% of your application usage to run on the big cores. The little cores only exist for background/latency-insensitive tasks, and for the odd case where the problem actually can utilize a large number of parallel cores and needs to maximize efficiency within the power envelope. AKA, throw a lot of lower-power transistors at the people rendering video/etc., and leave them powered off most of the time.
Put another way, the common use case is a few big powerful cores for normal use, playing games, whatever, with one or two high-efficiency cores for everything else and a pile of dark silicon for the rare application that actually can utilize dozens of cores because it's trivial to parallelize and doesn't work better offloaded to a GPU. I suspect that long term, Intel was probably right with Larrabee; they were just a decade or two early.
So, economically, I don't see people buying machines with a couple hundred cores that sit dark most of the time, which will drive the price up even more and make them less popular.
> I'm not going to be so stupid as to say 8 cores should be enough for anyone (while attached to a machine with 128) but you have to wonder if the stable diffusion style apps running on your desktop are going to be mainstream, or isolated to the few who choose to _need_ them as a hobby or a smaller part of the public that uses them for commercial success. AKA, I can utilize just about every core i'm given with parallel compiles, or rendering a 4K video, but I'm pretty sure i'm the only one in my immediate family that needs that. My wife in the past might have done some simulation work, but these days the heaviest thing she runs on her PC is office products.
Cause and effect is backwards there. Designers only went to multicore because single core performance improvement was leveling off. It's not that people wanted multicore systems and were willing to sacrifice single core performance to get it.
Well, we wanted multicore, but mostly because Windows loved to become unresponsive on a single core. From a consumer's point of view, 2 cores circa 2006 were enough; 4 is probably the absolute maximum.
How does it disregard multi-core performance? As you said, it's showing the transistor counts going up, and it's also showing the rise in the number of logical cores.
The missing thing that's critical for most multi-core performance use cases is memory bandwidth. Maybe not easy to summarize on a graph like this, but for any workload that can't fit within L1 cache, you're unlikely to get close to linear performance scaling with cores. Sometimes a single core can fully saturate the available memory bandwidth.
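A rough way to see this on your own machine, sketched with numpy (the buffer size and the resulting "bandwidth" figure are illustrative; real numbers vary a lot by hardware):

```python
import time
import numpy as np

# A buffer far larger than any cache level, so a streaming sum is memory-bound.
N = 20_000_000                      # ~160 MB of float64; shrink to taste
buf = np.zeros(N, dtype=np.float64)

start = time.perf_counter()
total = buf.sum()                   # one sequential pass over the buffer
elapsed = time.perf_counter() - start

gb_per_s = buf.nbytes / elapsed / 1e9
print(f"single-core streaming read: ~{gb_per_s:.1f} GB/s")
```

If that single-core figure is already a large fraction of your DIMMs' rated bandwidth, adding more cores to a memory-bound loop like this buys very little.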