Here is the thing though, and I'm going to phrase this very carefully, as I have stepped on some toes before and been downvoted.
Imagine a naive young scientist who has been "trained" to code in Matlab. He is not a computer scientist, of course, but he benefits if his program runs fast, even before actual computer scientists get involved to make the thing proper. Indeed, that speed may make the difference between the project getting off the ground or not. Such cases have been known to exist.
Now, he comes across some code that cannot be written in pre-optimized matrix routines, but nevertheless populates deterministic addresses in a container without any further side effects. It could be waiting for some IO, or perhaps it just does something that takes a while on the CPU core (like iterations of a computation with results getting saved somewhere).
In any case, it's gotta go into a for loop, and that's slow in most languages of that sort.
Given that he has a good number of cores, threads, etc. available, he figures it would be pretty cool if the thing ran in parallel.
So he changes the "for" statement into "parfor". Matlab then starts a parallel worker pool and runs the loop to completion across threads and cores without further issues. The whole thing just runs N times faster.
My point here is this: the whole thing in Python is complicated. It is so because other languages, like Matlab, make it laughably easy. The Pythonic way would have been to offer an equally easy option. However, if you are not well-versed in serious coding, then concurrent anything in Python is hella complicated. If you disagree, you likely know more about multi-x than the average person typing things into a computer.
I do not doubt that such a simple approach has its limitations. But it seems to me that in practice - and in particular when I look at most of the examples presented - such a simple way would have been completely sufficient. I do readily accept that I may be completely wrong.
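For what it's worth, the closest Python gets out of the box is `multiprocessing.Pool`. Here's a minimal sketch of the parfor pattern for the side-effect-free case described above (the `simulate` function and its inputs are made up for illustration):

```python
# Sketch: parallelizing a deterministic, side-effect-free loop body,
# roughly what Matlab's parfor does automatically.
from multiprocessing import Pool

def simulate(i):
    # Stand-in for the slow loop body; only depends on its argument.
    return i * i

if __name__ == "__main__":  # guard required for process-based pools
    with Pool() as pool:    # defaults to one worker per CPU core
        # map preserves the input order, like writing into slot i of a container
        results = pool.map(simulate, range(10))
    print(results)
```

Even this "easy" version comes with footguns (the `__main__` guard, picklable functions only, process startup cost), which is rather the point being made.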
edit: And now I realize that your verb was "debug" and not "write". Sorry.
Absolutely the worst. Ten years ago I encountered a bug in a wireless networking stack that would crash the whole network of nodes. It was in the worst possible location, highly dependent on timed sending where every nanosecond mattered. This meant I could only use LEDs to indicate what was going on, as a print statement or anything like it would skew the timers enough to break the network. Can't actually remember what caused it in the end.
It took me 6 weeks to debug and fix. I did nothing but debugging at the most primitive level. Probably could have done it more efficiently, but I was young and inexperienced.
Worst and most boring 6 weeks of my life caused by concurrent debugging.
About 10 years ago there was a bug in the driver for the Intel wireless card: it crashed when it received an 802.11n packet. That bug I remember well, because it was responsible for me shunning Ubuntu. They knew about the bug (it was reported while they were getting ready to release) and still went forward with the release. I'm wondering if that was related.
Coupling can be a bitch. I worked in a shop once that had GPFS (a shared filesystem) everywhere. Their rule was "no swap space on any node, ever". Weird. Why not? Because apparently slow thrashing on one node will crush GPFS performance on all nodes, without ejecting the failing node. Ugh.