Hacker News

To me, this is a big part of what makes machine learning exciting: it's so challenging to implement well. As a result, machine learning touches a lot of computer science, from high-level languages and formal verification to low-level languages and systems concerns (GPU programming, operating systems).

This difficulty is also a reason why machine learning programmers, once validated, tend to get a lot of trust from the business that CommodityScrumDrones don't get (and that's why most good programmers want to redefine themselves as "data scientists"; it's the promise of autonomy and interesting work). No one tells a machine learning engineer to "go into the backlog and complete 7 Scrum tickets by the end of the sprint". Of course, the downside of all this is that true machine learning positions (which are R&D-heavy) are rare, and there are a lot more so-called "data scientists" who spend most of their time benchmarking off-the-shelf products without the freedom to get insight into how they work.

I actually think that the latter approach is more fragile, even if it seems to be the low-risk option (and that's why mediocre tech managers like it). When your development process is glue-heavy, the bulk of your people will never have, or take, the time to understand what's going on, and even though operational interruptions in the software will be rarer, getting the wrong answer (because of misinterpretation of the systems) will be more common. Of course, sometimes using the off-the-shelf solution is the absolute right answer, especially for non-core work (e.g. full-text search for an app that doesn't need to innovate in search, but just needs the search function to work). But if your environment only allows programmers to play the glue game, you're going to have a gradual loss of talent, of insight into the problem and how the systems work, and of interest in the outcomes. Reducing employee autonomy is, in truth, the worst kind of technical debt, because it drains not only the software but also the people who'll have to work with it.

At any rate, I'd say that while this seems to be a problem associated with machine learning, it's just an issue surrounding complex functionality in general. Machine learning, quite often, is something we do to avoid an unmaintainable hand-written program. A "black box" image classifier, even though we can only reason about it empirically (i.e. throw inputs at it and see what comes out), is going to be, at the least, more trustworthy than a hand-written program that has evolved over a decade and has had hundreds of special cases written into it, coming from aged business requirements that no longer apply and from programmers of a wide spectrum of ability. All in all, I'd say that ML reduces total technical debt; it's just that it allows us to reach higher levels of complexity in functionality, and to get to places where even small amounts of technical debt can cause major pain.
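To make the "reason about it empirically" idea concrete, here's a minimal sketch of black-box probing: we can't inspect the model's internals, so we throw labeled inputs at it and tally what comes out. The `black_box_classifier` here is a hypothetical stand-in (a toy brightness rule), not any real model; only the probing pattern matters.

```python
def black_box_classifier(pixels):
    # Stand-in for an opaque model: labels an "image" (a list of
    # brightness values in [0, 1]) as "light" or "dark" by its mean.
    avg = sum(pixels) / len(pixels)
    return "light" if avg >= 0.5 else "dark"

def probe(model, test_cases):
    """Empirical reasoning only: feed inputs in, compare outputs
    against expected labels, and report the agreement rate."""
    hits = sum(1 for x, expected in test_cases if model(x) == expected)
    return hits / len(test_cases)

test_cases = [
    ([0.9, 0.8, 0.95], "light"),
    ([0.1, 0.05, 0.2], "dark"),
    ([0.6, 0.7, 0.55], "light"),
    ([0.3, 0.2, 0.1], "dark"),
]
accuracy = probe(black_box_classifier, test_cases)
print(accuracy)  # 1.0 on this tiny probe set
```

The trust we place in the model comes entirely from the probe set's coverage, which is exactly why this style of reasoning scales better than auditing a decade of hand-written special cases.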


