The big language design problem I think this post highlights is the flip side of Julia's composability: composing generic code with types that implement abstractions can easily expose bugs when the caller and the callee don't agree on exactly what the abstraction is.
Several of the bugs that Yuri reported are a very specific case of this: there's a lot of generic code that assumes that array indexing always starts at one, but that's not always the case since OffsetArrays allow indexing to start anywhere. The older code in the stats ecosystem is particularly badly hit by this because it often predates the existence of OffsetArrays and the APIs that were developed to allow writing efficient generic code that works with arrays that don't start at the typical index (or which might even want to be iterated in a different order).
Fixing these specific OffsetArray bugs is a fairly straightforward matter of searching for `1:length(a)` and replacing it with `eachindex(a)`. But there's a bigger issue that this general problem raises: How does one, in general, check whether an implementation of an abstraction is correct? And how can one test if generic code for an abstraction uses the abstraction correctly?
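To make the failure mode concrete, here is a minimal sketch. The `ZeroBased` type is made up purely for illustration (the real OffsetArrays package is far more general); everything else is plain Base Julia:

```julia
# Hypothetical minimal zero-based vector, standing in for OffsetArrays
# to keep the sketch self-contained:
struct ZeroBased{T} <: AbstractVector{T}
    data::Vector{T}
end
Base.size(v::ZeroBased) = size(v.data)
Base.axes(v::ZeroBased) = (0:length(v.data) - 1,)
Base.IndexStyle(::Type{<:ZeroBased}) = IndexLinear()
Base.getindex(v::ZeroBased, i::Int) = v.data[i + 1]

# Buggy generic code: assumes the indices are 1:length(a).
function sum_buggy(a)
    s = zero(eltype(a))
    for i in 1:length(a)
        s += a[i]          # wrong elements, or a BoundsError
    end
    return s
end

# Fixed: iterate over the array's actual indices.
function sum_fixed(a)
    s = zero(eltype(a))
    for i in eachindex(a)  # 0:2 here, 1:3 for a plain Vector
        s += a[i]
    end
    return s
end

v = ZeroBased([10, 20, 30])
sum_fixed(v)               # 60, and the same code works for Vector too
```

Here `sum_buggy(v)` reads `a[1]`, `a[2]`, `a[3]`: the first two silently return the wrong elements, and the last one throws, since the valid indices are 0, 1 and 2.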
Many people have mentioned interfaces and seem to believe that they would solve this problem. I don't believe that they do, although they do help. Why not? Consider the OffsetArray example: nothing about `for i in 1:length(a)` violates anything about a hypothetical interface for AbstractArrays. Yes, an interface can tell you what methods you're supposed to implement. There are a couple of issues with that: 1) you might not actually need to implement all of them—some code doesn't actually use all of an interface; 2) you can find out what methods you need to implement just by running the code that uses the implementation and seeing what fails. What the interface would guarantee is that if you've implemented these methods, then no user of your implementation will hit a missing method error. But all that tells you is that you've implemented the entire surface area of the abstraction, not that you've implemented the abstraction at all correctly. And I think that covering the entire surface area of an abstraction when implementing it is the least hard part.
What you really want is a way to generically express behaviors of an abstraction in a way that can be automatically tested. I think that Clojure's spec is much closer to what's needed than statically checked interfaces. The idea is that when someone implements an abstraction, they can automatically get tests that check their implementation implements the abstraction correctly and fully, including how it behaves. If you've implemented an AbstractArray, one of the tests might be that indexing the array with each index value returned by `eachindex(a)` works and doesn't produce a bounds error.
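As a sketch of what such behavioral tests could look like in Julia (the function name `test_array_behavior` is made up; this is not an existing API):

```julia
using Test

# Hypothetical reusable behavioral checks for the AbstractArray
# abstraction; an implementer would call this on their own type.
function test_array_behavior(a::AbstractArray)
    @testset "AbstractArray behavior" begin
        # every index produced by eachindex must be in bounds
        for i in eachindex(a)
            @test checkbounds(Bool, a, i)
        end
        # length must agree with the number of indices
        @test length(a) == length(eachindex(a))
        # iteration order must match index order
        isempty(a) || @test first(a) == a[first(eachindex(a))]
    end
end

test_array_behavior(rand(3, 4))   # a plain Array passes
```

A package defining a new array type would run this alongside its own tests, instead of discovering behavioral mismatches downstream.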
On the other end, you also want some way of generating mock instances of an abstraction for testing generic code. We do a bit of this in Julia's test suite: there are GenericString and GenericSet types, which implement the minimal string/set abstraction, and use these to test generic code to verify that it doesn't assume more than it should about the string and set abstractions. For a GenericArray type, you'd want it to start at an arbitrary index and do other weird stuff that exotic array types are technically allowed to do, so that any generic code that makes invalid assumptions will get caught. You could call this type AdversarialArray or something like that.
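The existing Test stdlib types can be used like this (a small illustration; `GenericString` wraps a `String` but only promises the minimal `AbstractString` API):

```julia
using Test

# GenericString is a String in disguise that only exposes the
# generic AbstractString interface:
s = Test.GenericString("hello")

s isa AbstractString   # true
s isa String           # false: code dispatching on String won't fire
uppercase(s)           # generic string code still works
```

A hypothetical AdversarialArray would apply the same idea to arrays, additionally choosing weird-but-legal axes so that `1:length(a)`-style assumptions fail loudly in tests.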
I've personally thought quite a bit about these issues, but as Keno has said, there hasn't been time to tackle these problems in the last couple of years. But they certainly are important and worth solving.
On a personal note, Yuri, thanks for all the code and I'm sorry to see you go.
It seems to me that much of the difficulty with interfaces, whether they are made explicit or kept implicit, lies in defining the semantics that the functions are supposed to have.
As we expand the types our generic code can handle, we have to refine the semantics it relies on. For a long time, Base.length(::AbstractArray) could mean “the largest one-based index of the array”, but then we started using the same code that handles regular Arrays for OffsetArrays and this interpretation was no longer valid. I guess the alternative would have been to leave length(::OffsetArray) unimplemented and block the valid use of OffsetArrays for all generic code that understands Base.length as “the number of values”.
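Assuming the OffsetArrays.jl package is available, the two readings of `length` can be seen side by side:

```julia
using OffsetArrays   # third-party package; must be installed

a = OffsetArray([1.0, 2.0, 3.0], 0:2)  # valid indices are 0, 1, 2

length(a)            # 3: "the number of values" still holds
last(eachindex(a))   # 2: "the largest one-based index" reading breaks
firstindex(a)        # 0
```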
It can still be difficult to tell what a function like Base.length should mean if I implement it for my types. For example, should it return the number of local values or the global length for an array that is distributed between multiple processes (e.g. in an MPI program)? Perhaps some generic code will use it to allocate a buffer for intermediate values, in which case it should be the local length. Or some generic code computes an average by dividing the (global) sum by the global length.
It seems impossible to come up with a precise definition of all the semantics your generic code assumes a priori, so we can either restrict our usage of generics to a small number of concrete types that were considered when the code was written, or we have to accept that we occasionally run into these sorts of issues while we refine the semantics.
Anecdotally, it has been my experience that packages that have been made to work in many generic contexts (such as the ODE packages) are likely to work flawlessly with my custom types, while packages that have seen less such effort (e.g. iterative solvers) are more likely to cause issues. This makes me hopeful that it is possible to converge towards very general generic implementations.
It is also worth mentioning that it is very possible to use Julia without ambitious use of cross-package generic functionality, and use it “merely” as a better Fortran or Matlab.
To expand on the "interfaces are not enough" part: Defining an interface on an abstract type only tells you that an implementation exists, not that it is correct, i.e. that the specific implementation for a subtype guarantees the same properties the interface specifies.
On top of this, you really want to be alerted to when you expect more of an interface than the interface guarantees - this is what happened in the case of `1:length(A)` being assumed to give the indices into `A`, when the `AbstractArray` interface really only guarantees that a given set of methods exists.
I feel like these sorts of issues more or less require more formal models being provided & checked by the compiler. Luckily for us, nothing in this space has been implemented or attempted in & for Julia yet, while there are a lot of experiments with formal methods and proof systems being researched right now (TLA+, Coq, ...). There are of course a lot of footguns[1], but the space is moving fast and I'd love to see something that makes use of this integrated into Julia at some point.
> Defining an interface on an abstract type only gives you that a implementation exists, not that it is correct
Pretty far off topic for Julia, but the definition of Rust's Traits over semantics rather than syntax (even though of course the compiler will only really check your syntax) gives me a lot of this.
The fact that this Bunch<Doodad> claims to be IntoIterator<Item=Doodad> tells me that the person who implemented that explicitly intends that I can iterate over the Doodads. They can't accidentally be IntoIterator<Item=Doodad>; the author has to literally write the implementation naming the Trait to be implemented.
But that comes at a heavy price of course: if the author of Bunch never expected me to iterate over it, the best I can do is newtype MyBunch and implement IntoIterator using whatever ingredients are provided on the surface of Bunch. This raises the price of composition considerably :/
> you really want to be alerted to when you expect more of an interface than the interface guarantees
In the case alluded to (AbstractArray) I feel like the correct thing was not to implement the existing interface. That might have been disruptive at the time, but people adopting a new interface which explicitly warns them not to 1:length(A) are not likely to screw this up, and by now perhaps everything still popular would have upgraded.
Re-purposing existing interfaces is probably always a bad idea, even if you can persuade yourself it never specifically said it was OK to use it the way you suspect everybody was in practice using it, Hyrum's Law very much applies. That interface is frozen in place, make a new one.
I think the OP specifically complains about the use of @inbounds and that the documentation was advocating an invalid use of it. Some libraries may not have been updated to handle AbstractArray: that's normal SW rot. But the out of bound access being unreported is the actual grief of the OP.
> What you really want is a way to generically express behaviors of an abstraction in a way that can be automatically tested.
The pure FP ecosystems in Scala often accomplish this in the form of "laws", which are essentially bundles of pre-made unit tests that they ship alongside their core abstraction libraries.
Invenia's approach to interface testing ("Development with Interface Packages" on our blog) does some of the things you suggest as a standard of practice, by providing tools to check correctness that implementers can use as part of package tests. ChainRulesTestUtils.jl is a decent example (although this one doesn't come with fake test types). I think this is typically good enough, and only struggles with interface implementations with significant side effects.
One little win could be publishing interface tests like these for Base interfaces in the Test stdlib. I appreciate that the Generic* types are already exposed in the Test stdlib!
That's why interfaces are useful—they save you from that. But they don't actually solve the problem of checking that an abstraction has been implemented correctly, just that you've implemented the entire API surface area, possibly incorrectly. Note, however, that if you have a way of automatically testing the behavioral correctness of an implementation, then those tests presumably cover the entire API, so automatic testing would subsume the benefit that static interface checking provides—just run the automatic tests and it tells you what you haven't implemented as well as what you may have implemented incorrectly.
Indeed, static types don't save you from this issue; at least, structural ones don't: exactly the same issue would occur as in Julia (see: C++ templates). Static structural types have the same problem as Julia here: you gain a lot of compositional power at the expense of potential correctness issues.
However, nominal types do "solve" the problem somewhat, as there's a clear statement of intent when you do "X implements Y" that the compiler enforces. If that promise is missing, the compiler will not let you use an X where a Y is expected. And if you do say X implements Y, then you probably tested that you did it correctly.
But this would also fail at the OffsetArray problem. The only way I can see of protecting against it (statically or dynamically) is to have an "offset-index" type, different from an integer, that you must use to index an OffsetArray. That makes a[x] incompatible between regular Arrays and OffsetArrays.
I don't think anyone wants that mess though. So if your language has OffsetArrays, and they're supposed to be compatible with Arrays, and you can index both the same way, no amount of static types will help (save for dependent/refinement types but those are their own mess).
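A sketch of what that separate offset-index type could look like (all names here are made up):

```julia
# Hypothetical wrapper: the only "key" that can index an OffsetVec.
struct OffsetIndex
    i::Int
end

struct OffsetVec{T}
    data::Vector{T}
    origin::Int        # the first logical index
end

# No getindex method for plain Int: v[0] is a MethodError by design.
Base.getindex(v::OffsetVec, x::OffsetIndex) = v.data[x.i - v.origin + 1]

v = OffsetVec([10, 20, 30], -1)   # logical indices -1, 0, 1
v[OffsetIndex(0)]                 # 20
# v[0]                            # MethodError: Int and OffsetIndex
                                  # deliberately don't unify
```

This is exactly the incompatibility described above: safe, but generic code can no longer treat both array flavors uniformly.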
EDIT: I seem to have replied to the wrong comment, but the right person, so hey, no issue in the end :)
I think I agree with your claim that writing tests would be a superset in terms of covering the use cases of interfaces. But
1) Testing is such a PAIN in Julia. You HAVE to run all the tests every time. You HAVE to write tests in a separate folder. Multiple dispatch and composability prevent having confidence that your tests cover all the cases.
2) In a lot of cases, interfaces improve readability of the code. Being able to just look at code in a large code base and know which interfaces are implemented + need to be implemented is such an advantage
3) Static analysis tooling can provide linting. This doesn't even have to be implemented by the core team at the moment. The lack of interface limits any kind of tooling to be developed.
All in all, when I write Julia code, I know I have to write EXTENSIVE tests. Even more tests than I have to write with Python (with mypy). Almost an order of magnitude more tests than I have to write with Rust.
Sometimes it feels like I spend more time writing / running tests than adding features to our packages. And that is honestly just not fun for me, let alone for other scientists and researchers on my team.
Interfaces provide correctness guarantees because implementing them is a conscious decision. If your array implements GenericArray, you know about that interface, and presumably what it is used for. Its methods can also contain documentation.
The point is a common point of… trust may be the word? Two developers that don’t even know each other can use each other’s code correctly by programming against a third, hypothetical implementation that they both agree on. Here OffsetArray would simply not implement the GenericArray interface if the latter expects 1-based indexing.
In this specific case the solution would be to move the indexing question into the interface itself - it is not only an implementation detail. Make the UltraGenericArray interface have an offset() method as well, and perhaps make [] do 1-based indexing always (with auto-offsetting for offset arrays) plus a separate index-aware get() method, so that downstream usage must explicitly opt in to different indexing.
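A rough sketch of that proposal (all names here, including `getat`, are hypothetical):

```julia
# Hypothetical: `[]` is always 1-based; offset-aware access is a
# separate, explicit operation.
struct UltraVec{T}
    data::Vector{T}
    origin::Int                  # first native index
end
offset(v::UltraVec) = v.origin

# v[i] auto-offsets, so 1-based generic code keeps working:
Base.getindex(v::UltraVec, i::Int) = v.data[i]

# getat opts in to the native, offset-aware indexing:
getat(v::UltraVec, i::Int) = v.data[i - v.origin + 1]

v = UltraVec([10, 20, 30], -1)
v[1]           # 10: 1-based view, safe for legacy generic code
getat(v, -1)   # 10: explicit opt-in to offset indexing
```

The design choice is that code which never heard of offsets silently gets the safe 1-based view, while offset-aware code must say so in its source.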
I remember reading a long time ago about the 1-based array and the offset-array 'kludge'.
My first thought was that they should have replicated Ada's design instead; my second thought was to hope they have a good linter, because putting an arbitrary-offset implementation in a library is a minefield.
I don't claim to be especially smart: this is/was obvious. Unfortunately, what isn't obvious is how to fix this issue, and especially how to fix the culture which produces this kind of issue.
Offset arrays aren't a kludge and the package would exist regardless of whether zero or one has been chosen as a default base for indexing. Having arbitrary index ranges in different dimensions is extremely useful in many application domains. When working with FFTs, for example, it's natural for the indices to be symmetrical around zero. Or when doing coordinate transforms like in this example from the excellent Images.jl package: https://juliaimages.org/stable/tutorials/indexing/.
Offset arrays are a wonderful source of footguns, and this should have been obvious from the start. Either you train everyone to write loops in an official way that avoids the problem, or you are going to have bugs. And if you choose training, you have to train EVERYONE, because anyone coming in from another language will have the wrong habits.
Even Perl realized that changing the indexing of arrays was a bad idea. Which is why the special variable $[ has been deprecated every which way possible because it caused too many bugs.
Several of the most serious scientifically minded programming languages have also had arrays that can be indexed starting at different offsets, including Fortran, Chapel and Fortress. If Julia were zero-based, OffsetArrays would still exist and be highly useful. OffsetArrays is not some "kludge" just to allow indexing from zero. Frankly, indexing from zero isn't important enough for that. What really makes OffsetArrays useful is being able to do things like have indices go from -k:k, where k may depend on the particular dimension. That way the point A[0, 0, 0] is the center of the array and you can navigate up/down/back from there naturally. Of course you can simulate that with arrays that start at zero or one, but it's a major pain.
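For example, a stencil or kernel centered on zero is natural to express this way (assuming the OffsetArrays.jl package is installed):

```julia
using OffsetArrays   # third-party package

k = 2
A = OffsetArray(zeros(2k + 1, 2k + 1), -k:k, -k:k)
A[0, 0] = 1.0        # the center, with no (k + 1, k + 1) arithmetic
A[-k, -k]            # top-left corner
axes(A)              # both dimensions run over -2:2
```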