I find the whole focus on abstraction as a solution to flexibility very troublesome. Taking the declaration for a C++ function and whacking an abstract interface only goes a small way to making the system more flexible. All your interacting components still depend intimately on each other's behavior. The behavior is not fully captured by the interface - if it was, it wouldn't be an interface, it would be an implementation. So abstractions leak by their very nature, and the success of our attempt to enable flexibility through those interfaces depends completely on how well we choose what the interfaces are and how much of the behavior specification we decide should live in the interface definition vs the implementation. That subtlety of the design process is completely lost by this love of abstraction where it is assumed that anything with an abstract interface is completely flexible whereas anything without one is not.
> The behavior is not fully captured by the interface - if it was, it wouldn't be an interface, it would be an implementation.
I think your argument is weakened by the reality that even "behaviour" has plenty of abstraction in it... particularly if you are using a compiler. You've got the compiler, library linkages (and to different libraries), the OS's abstraction, and of course these days there's usually a hypervisor in there. Then there's the abstractions in the hardware... You just can't get away from it.
Good interfaces allow encapsulation, which tends to be very helpful, although there is a point where it does more harm than good.
> That subtlety of the design process is completely lost by this love of abstraction where it is assumed that anything with an abstract interface is completely flexible whereas anything without one is not.
It's all abstract. Design is concerned with finding the right abstraction. The wrong abstraction gives you no flexibility, or no flexibility where you need it, or just a ton of unneeded complexity. Abstraction for abstraction's sake suggests a lack of design.
> Abstraction for abstraction's sake suggests a lack of design.
And yet a surprisingly large number of people think "good" design is entirely about adding abstractions until no more can be added. The classic example of this is enterprise Java, where designs are almost always overly abstract and flexible, but only in narrow directions where it was thought extreme flexibility would be required in the future; and inevitably, the flexibility that's eventually needed turns out to be in a completely different direction.
In their defense mathematicians have gained huge advances in algebra (and other areas) precisely by abstracting away all unnecessary parts (most notably by Emmy Noether).
Of course working out which parts are necessary and which aren't is a very subtle process, and one of the areas of programming where a mathematical background can be a huge advantage.
The real problem is abstraction leaks. When an abstraction leaks, the user of the abstraction is forced to think both about the abstraction and its internal implementation, which results in a net increase in the cognitive burden of understanding the code.
However, Java doesn't lend itself to designing very tight abstractions, since all but the most elementary arithmetic operations can fail in utterly miserable ways. If you read a line of code like:
someMap.get(someList.length()).someMethod()
In how many ways can it throw an exception? Using temporary variables doesn't help: it only makes fewer things go wrong per expression or statement, at the price of making each individual expression or statement do less work for you.
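To make the count concrete, here is a minimal sketch; the names mirror the hypothetical someMap above, and the absent-key and null-value cases in particular are indistinguishable at the call site:

```java
import java.util.HashMap;
import java.util.Map;

public class ChainedCallFailures {
    public static void main(String[] args) {
        Map<Integer, String> someMap = new HashMap<>();
        someMap.put(0, null);

        // someMap.get(k).someMethod() can throw because:
        //   1. someMap is null                        -> NullPointerException
        //   2. the argument expression itself throws (e.g. someList is null)
        //   3. the key is absent: get() returns null  -> NPE on someMethod()
        //   4. the key maps to null                   -> NPE on someMethod()
        //   5. someMethod() itself throws

        // Cases 3 and 4 look identical from the outside:
        assert someMap.get(0) == null; // key present, value null
        assert someMap.get(1) == null; // key absent
    }
}
```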
Fortunately, there exist languages that offer less room for things to go wrong.
Much as with generics in C++, the correct approach generally tends to be to assume exceptions can be thrown at any time. Java's try-with-resources statement makes that a bit more natural than it used to be.
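A sketch of that point, using a made-up Res resource: try-with-resources guarantees close() runs no matter where in the body an exception is thrown, before any catch clause sees it.

```java
import java.util.ArrayList;
import java.util.List;

public class TryWithResourcesSketch {
    static final List<String> log = new ArrayList<>();

    // Hypothetical resource that records when it is closed.
    static class Res implements AutoCloseable {
        @Override public void close() { log.add("closed"); }
    }

    public static void main(String[] args) {
        try (Res r = new Res()) {
            // "Exceptions can be thrown at any time":
            throw new RuntimeException("boom");
        } catch (RuntimeException e) {
            log.add("caught");
        }
        // close() ran before the catch clause:
        assert log.equals(List.of("closed", "caught"));
    }
}
```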
Ironically, I think a lot of Java's abstraction pain comes from having much more fixed behavior. The static type system, the late introduction of generics, the large base class hierarchy, and the concern with backwards compatibility means you've got a ton of behavior and interfaces that are locked down with specific (often poor) abstractions.
> Ironically, I think a lot of Java's abstraction pain comes from having much more fixed behavior.
Fixed behavior isn't necessarily a bad thing. What is always a bad thing is when types don't accurately capture what your program really means. Java's own collections documentation says:
“To keep the number of core interfaces small, the interfaces do not attempt to capture such subtle distinctions as mutability, modifiability, and resizability. Instead, certain calls in the core interfaces are optional, enabling implementations to throw an `UnsupportedOperationException` to indicate that they do not support a specified optional operation. Collection implementers must clearly document which optional operations are supported by an implementation.”
In Java, this API design style is the norm rather than the exception, which validates the criticism coming from dynamic language advocates: “You still need to use tests, so why bother with types?” The only way forward is to use types in a way that actually eliminates the need for (some) nontrivial tests.
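The quoted design is easy to demonstrate: an unmodifiable view has the same static type List<Integer> as a plain ArrayList, and nothing in that type warns you that add() is one of the “optional” operations.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class OptionalOperations {
    public static void main(String[] args) {
        List<Integer> plain  = new ArrayList<>(List.of(1, 2));
        List<Integer> frozen = Collections.unmodifiableList(plain);

        plain.add(3); // fine

        boolean threw = false;
        try {
            frozen.add(4); // compiles fine; same static type as plain
        } catch (UnsupportedOperationException e) {
            threw = true;
        }
        assert threw;
    }
}
```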
At the core, the fundamental problems with Java are:
(0) Its type system is too nominal. Every notion, primitive or derived, needs a separate name. The result is that making crisp distinctions takes more effort than programmers are willing to make. In a structurally typed system, a derived notion is just a combination of more basic ones, whether you give the combination a name or not.
(1) It doesn't offer a principled way to express that an operation can naturally have multiple kinds of results. And such operations abound in practice! For example, if you try to get the value associated with a key in a container, the key might or might not be there. Or, if you try to get the i-th element of a list (assume 0-based indexing and nonnegative i for the sake of the example), then i is either a valid index or some number of positions past the end of the list. None of these possible results is intrinsically “wrong”, but Java can only express operations with one kind of correct result.
> Fixed behavior isn't necessarily a bad thing. What is always a bad thing is when types don't accurately capture what your program really means.
That's kind of what I meant by having too much fixed behaviour.
> In Java, this API design style is the norm rather than the exception, which validates the criticism coming from dynamic language advocates: “You still need to use tests, so why bother with types?”
The obvious answer is that while you can't solve everything with types, you can still succinctly represent what would otherwise be a ton of tests with a single type name.
> The only way forward is to use types in a way that actually eliminates the need for (some) nontrivial tests.
Well, one man's trivial is another man's non-trivial, but I'm happy having a machine automate all my trivial tests.
> Its type system is too nominal
Yes. It has become more sophisticated over time, but it is still pretty weak, and more importantly the built-in behaviour carries a lot of the consequences of the ridiculously limited nature of the type system in the first place.
> It doesn't offer a principled way to express that an operation can naturally have multiple kinds of results.
You could make that charge against most of the BCPL languages. C++ gets around it with templates, but even then has all kinds of limitations on how the output type is a function of the input type.
> For example, if you try to get the value associated with a key in a container, the key might or might not be there.
Java has Optional<T> which is exactly the way to express that case.
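A minimal sketch, with a hypothetical ages map; the “key might or might not be there” outcome becomes visible in the return type:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class OptionalLookup {
    public static void main(String[] args) {
        Map<String, Integer> ages = new HashMap<>();
        ages.put("alice", 30);

        Optional<Integer> hit  = Optional.ofNullable(ages.get("alice"));
        Optional<Integer> miss = Optional.ofNullable(ages.get("bob"));

        assert hit.isPresent() && hit.get() == 30;
        assert !miss.isPresent();
        assert miss.orElse(-1) == -1; // default for the missing case, no null in sight
    }
}
```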
> None of these possible results is intrinsically “wrong”, but Java can only express operations with one kind of correct result.
Exceptions are a reasonable way of expressing alternative results, as is Optional<T>, as is returning a common supertype/interface, as is null (much as I despise it), as is having an object that serves as a discriminated union.
That may not all sound great to you, but those are actually very common strategies found in a variety of other languages.
> The obvious answer is that while you can't solve everything with types, you can still succinctly represent what would otherwise be a ton of tests with a single type name.
Except a name is just a declaration of intentions. A Python class name is just as good a declaration of intentions as a Java class name. But why should I trust an unverified declaration?
> Well, one man's trivial is another man's non-trivial, but I'm happy having a machine automate all my trivial tests.
Finite discriminated unions are pretty trivial by any reasonable standard, and Java can only handle them by very cumbersome means (visitors), which people who value their time reasonably avoid.
> You could make that charge against most of the BCPL languages. C++ gets around it with templates, but even then has all kinds of limitations on how the output type is a function of the input type.
Somehow Rust and Swift manage it. Even C++ and D fare better than Java, since finite discriminated unions can be implemented using templates, although pattern matching using variant visitors is a truly maddening experience.
> Java has Optional<T> which is exactly the way to express that case.
But `null` still exists, and it's used in several parts of the Java standard library.
> Exceptions are a reasonable way of expressing alternative results, as is Optional<T>, as is returning a common supertype/interface, as is null (much as I despise it), as is having an object that serves as a discriminated union.
(0) No, exceptions aren't reasonable. They don't always show up in types, so you don't know what kinds of exceptions a particular procedure might throw.
(1) Java's `Optional<T>` is hardly more satisfying: how do I simultaneously pattern-match over two or more optionals?
(2) In any case, there exist many more interesting discriminated unions than can be expressed using only optionals and recursive types.
> A Python class name is just as good a declaration of intentions as a Java class name.
Yes, it is just as good a declaration of intentions.
> But why should I trust an unverified declaration?
Wait a second though...
def foo(bar):
    if not isinstance(bar, Baz):
        raise WrongTypeException
THAT is somewhat equivalent to having:
public void foo(Baz bar);
The Java one gets verified without test cases (there are literally steps in the process with "Verifier" in the name), but the Python one does not.
Most Python code doesn't have all of its parameters laden with type checks, whereas Java code tends to have type names attached to its parameters. More importantly though, that Python type check doesn't actually get verified until runtime, and only on a case by case basis. The static type checks in Java will be done first at compile time, and then again during bind/load time. You still have to write code to verify that you never get a non-Baz "bar" for the Python function, and that's trickier than it looks.
> Finite discriminated unions are pretty trivial by any reasonable standard, and Java can only handle them by very cumbersome means (visitors), which people who value their time reasonably avoid.
You value your time, so because of cases where you'd like to return more than one type without a common parent type, you would throw out type checking entirely?...
> But `null` still exists, and it's used in several parts of the Java standard library.
So, Rust at least still has null as well (they say they don't, but in practice they have std::ptr::null and they also have None... now tell me that doesn't create more problems than it solves ;-), as there's this problem that reality is already filled with nulls.
In practice, what you are getting at is exactly what I pointed out earlier: Java is locked in to certain behaviours and interfaces that predate the current somewhat more sophisticated state of the type system.
> No, exceptions aren't reasonable. They don't show up in types, so you don't know what kinds of exceptions a particular procedure might throw.
They do for checked exceptions:
final java.sql.ResultSet foo = query.execute();
foo.getInt(1);
That's going to get you an int, a SQLException, a RuntimeException, or an Error depending on circumstance. While the Error & RuntimeException aren't checked by the compiler, the SQLException is, and you've got a fairly natural way of writing logic for it too. That's a very reasonable way of declaring that your function will return either an int or a SQLException. Without any test cases, the compiler will tell me if I'm not handling that SQLException case (much to my chagrin).
> Java's `Optional<T>` is hardly more satisfying: how do I simultaneously pattern-match over two or more optionals?
Generally, I haven't needed to. Usually the optional comes in a different stream or in a different stage of processing a stream.
However, for cases where you really have multiple optionals, you typically either pass them as parameters to a lambda or create a named type that contains your Optional<T> types. Honestly, given the complexity of those cases, it isn't that great a burden compared to the logic you're going to bind to it...
> In any case, there exist many more interesting discriminated unions than can be expressed using only optionals and recursive types.
Might be more interesting yes... but for your typical coding context, not that important.
> Most Python code doesn't have all of its parameters laden with type checks, whereas Java code tends to have type names attached to its parameters.
Why should I care about the name of the type, though?
> More importantly though, that Python type check doesn't actually get verified until runtime, and only on a case by case basis. The static type checks in Java will be done first at compile time, and then again during bind/load time.
My point is that Java's static checks don't buy you much, because types aren't very precise about what objects are supposed to be in the first place, as my quote from Oracle's own documentation above shows. I'm all for more static checking, but only when it actually buys you something. Java is effectively a dynamically typed language with an integrated linter.
> You still have to write code to verify that you never get a non-Baz "bar" for the Python function, and that's trickier than it looks.
Python programmers tend to care more about the methods a concrete object has, rather than the name of its class. Which is IMO pretty sensible - as well as doable with a static type system, see OCaml.
> You value your time, so because of cases where you'd like to return more than one type without a common parent type, you would throw out type checking entirely?...
No, I use sum types. The problem is that Java doesn't have them!
> So, Rust at least still has null as well (they say they don't, but in practice they have std::ptr::null
You can't do anything with raw pointers in safe code. Unsafe code is, well, unsafe.
> and they also have None...
`None` is the right way to handle optional values. It doesn't have type `T`, but rather `Option<T>`.
> now tell me that doesn't create more problems than it solves ;-)
I tell you.
> as there's this problem that reality is already filled with nulls.
No, reality doesn't have nulls. Where's null in the laws of physics?
So, you are suggesting then that if a static type system doesn't have sum types, it provides no value and you are better off without it, and I'm pointing out that as annoying as that might be, there is still plenty of value without them.
> The problem is that Java doesn't have them!
Java actually has sum types for the simple cases of representing two types (Pair<K,V>) or for representing one type or null (Optional<T>).
For the rest of the cases, you'd either have to recursively use Optional<> and/or Pair<> or come up with your own type name, but considering you have to name everything else anyway, and you have to handle the various cases involved, it's not that bad to create a static inner class for some new combination of values. It's not elegant, but it really doesn't add much work. If you feel it does, use an annotation processor to make it easier to generate one on your behalf.
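For concreteness, here is roughly what that hand-rolled static-inner-class encoding looks like (Shape, Circle, and Rect are hypothetical names); note that nothing checks the instanceof chain for exhaustiveness:

```java
public class HandRolledSum {
    // A hand-rolled "Shape = Circle | Rect" sum type via static inner classes.
    static abstract class Shape {
        private Shape() {} // only the nested subclasses below can extend Shape
        static final class Circle extends Shape { final double r; Circle(double r) { this.r = r; } }
        static final class Rect extends Shape { final double w, h; Rect(double w, double h) { this.w = w; this.h = h; } }
    }

    // Case analysis by hand; the compiler never checks that this chain is exhaustive.
    static double area(Shape s) {
        if (s instanceof Shape.Circle) return Math.PI * ((Shape.Circle) s).r * ((Shape.Circle) s).r;
        if (s instanceof Shape.Rect)   return ((Shape.Rect) s).w * ((Shape.Rect) s).h;
        throw new IllegalStateException("missed a case");
    }

    public static void main(String[] args) {
        assert area(new Shape.Rect(2, 3)) == 6.0;
        assert Math.abs(area(new Shape.Circle(1)) - Math.PI) < 1e-9;
    }
}
```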
> It doesn't have type `T`, but rather `Option<T>`.
Yes, we're all very proud of the correctness of having a distinct type for null/nil/none/whatever. It's nice, but from a practical standpoint you can achieve the same thing in Java. You just can't fix the existing class library.
> No, reality doesn't have nulls. Where's null in the laws of physics?
Well, in physics, "no magnetic field" != "no gravitational field", so...
> So, you are suggesting then that if a static type system doesn't have sum types, it provides no value and you are better off without it,
There are alternatives to sums, like union and intersection types, with different tradeoffs. Nominal sums are what you could call “proven technology”: they integrate well with other desirable language features like type inference and ML-style modules with abstract types. Unions and intersections, OTOH, can be flexibly sliced or aggregated without defining new nominal types.
But there has to be some way to perform case analysis in a principled way.
> and I'm pointing out that as annoying as that might be, there is still plenty of value without them.
I don't see any value in a type system in which I can't accurately model whatever problem domain they throw at me, and case analysis is basically omnipresent in programming: Without loops you can't write a nontrivial program, and, without case analysis, you can't have non-infinite loops!
Type system - Accurate modeling = Annoying bureaucracy.
> Java actually has sum types for the simple cases of representing two types (Pair<K,V>) or for representing one type or null (Optional<T>).
> For the rest of the cases, you'd either have to recursively use Optional<> and/or Pair<> or come up with your own type name, but considering you have to name everything else anyway, and you have to handle the various cases involved, it's not that bad to create a static inner class for some new combination of values. It's not elegant, but it really doesn't add much work.
Just because I can make a type isomorphic to the one I want (assuming I don't peek into the internal representation, which reflection always lets me do), it doesn't mean the one I get is equally convenient to work with. Can you imagine pattern matching on abstract syntax trees or other complicated mutually recursive data structures in this way? Even the visitor pattern sounds less painful, and that says a lot.
> If you feel it does, use an annotation processor to make it easier to generate one on your behalf.
Then I can't understand my code in terms of itself. I'd have to know what the annotation processor transforms it into.
> Yes, we're all very proud of the correctness of having a distinct type for null/nil/none/whatever. It's nice, but from a practical standpoint you can achieve the same thing in Java.
From a practical standpoint, in a language with actual sums, I can match arbitrarily nested patterns in a single `case` or `match` block. I can't achieve that in Java. Trust me, it can cut the size of case-analyzing code by a factor of more than 3.
> You just can't fix the existing class library.
If the class library were the only problem, then the solution would be to make a new one. But even a new library couldn't give me the niceties I described above.
> Well, in physics, "no magnetic field" != "no gravitational field", so...
A zero potential field isn't a null reference to anything. It's just a normal potential field whose value at every point is zero. Do you have an example where:
(0) the identities of two objects with the same physical properties matters, that is, where swapping two objects with the same physical properties can make the system behave differently, and
(1) the laws of physics are formulated in terms of nullable references to such objects?
> Without loops you can't write a nontrivial program, and, without case analysis, you can't have non-infinite loops!
...and yet there are Java programs that have completed their loops time & again! ;-)
> Type system - Accurate modeling = Annoying bureaucracy.
Oh sure, lots of annoyance, but few languages have a good type system, and those that do have a limited pool of programmers to draw upon who can reason about them.
> Can you imagine pattern matching on abstract syntax trees or other complicated mutually recursive data structures in this way?
Or god forbid, using an Enum and an EnumMap!!!
> Even the visitor pattern sounds less painful, and that says a lot.
Double dispatch really isn't the end of the world. You'll make it. I promise.
> Then I can't understand my code in terms of itself. I'd have to know what the annotation processor transforms it into.
You can use the annotation processor to do the reasoning. It becomes an extension of the language.
> Trust me, it can cut the size of case-analyzing code by a factor of more than 3.
Yes, it cuts code size dramatically. Out of curiosity, how much more compact is that case-analyzing code with a dynamic type system? ;-)
> If the class library were the only problem, then the solution would be to make a new one.
Hey, if you are just going to write an entirely new class library, then problem solved: you can just as easily write a new language parser and code generator.
> A zero potential field isn't a null reference to anything.
I didn't say zero potential, I said, "no magnetic field", as in, your magnetic field slot is non-existent/unknown. You've got notions of dimensions that may or may not exist, etc. The point is, you've got nothings that aren't just a magnitude of zero, but which have a type. You have that at least as much as you have a distinct type for nothing.
> ...and yet there are Java programs that have completed their loops time & again! ;-)
Sure, by branching on booleans. Unfortunately, branching on raw booleans is problematic for a variety of reasons, which are lucidly explained here: https://existentialtype.wordpress.com/2011/03/15/boolean-bli... . The tl;dr is that when you compute a boolean, your actual intention is to learn something about the state of other variables, but the type system doesn't keep track of this additional knowledge. So you're on your own, just as if you were using a dynamic language.
> Oh sure, lots of annoyance, but few languages have a good type system, and those that do have a limited pool of programmers to draw upon who can reason about them.
There's another possibility: using a language with (gasp!) no static type system.
> Double dispatch really isn't the end of the world. You'll make it. I promise.
I'll spare myself the pain.
> You can use the annotation processor to do the reasoning. It becomes an extension of the language.
Will I get error messages in terms of the abstraction itself, or in terms of what it elaborates into in the base language?
> Out of curiosity, how much more compact is that case-analyzing code with a dynamic type system? ;-)
`isinstance` in Python is roughly as verbose as Java's `instanceof`, which is the type-unsafe solution to this problem. It's still more verbose than proper pattern matching, but a lot less verbose than visitors.
> I didn't say zero potential, I said, "no magnetic field", as in, your magnetic field slot is non-existent/unknown.
Sorry for misunderstanding in my first attempt. But what you describe doesn't correspond to Java-style `null` either. Dimensional analysis corresponds to parameterizing a scalar type by the exponent of each primitive dimension, say:
template <class Magnitude, int time, int length, int mass, int charge /* , ... */>
struct scalar { Magnitude magnitude; };
And then overloading all arithmetic operators in a dimensionally consistent way. Which again Java is incapable of expressing.
> The tl;dr is that when you compute a boolean, your actual intention is to learn something about the state of other variables, but the type system doesn't keep track of this additional knowledge.
That's right. Sometimes the type system doesn't have all the answers. I have yet to see a type system that solves the halting problem. It's a tragedy, but amazingly the type system can still be useful. Hence... why we have type systems.
> > Double dispatch really isn't the end of the world. You'll make it. I promise.
> I'll spare myself the pain.
...not if it is by using a dynamic type system.
> Will I get error messages in terms of the abstraction itself, or in terms of what it elaborates into in the base language?
I guess that is up to you.
> `isinstance` in Python is roughly as verbose as Java's `instanceof`, which is the type-unsafe solution to this problem. It's still more verbose than proper pattern matching, but a lot less verbose than visitors.
The vast majority of Java code wants for neither instanceof nor visitors. That's the win.
...and if isinstance really is preferable to you over the visitor pattern, then you've got no reason to prefer dynamic dispatch. You can muddle through your multi-type scenarios using Object and instanceof tests... and you still get lots of help for the single-type scenarios.
> Dimensional analysis corresponds to parameterizing a scalar type by the exponent of each primitive dimension..
That's one way of modeling it I guess, but that's a choice you are making about how it is modeled. Most people's mental models include moments where you know the type of something, but you don't actually have anything there. The type of your unfilled referee slot in your league planning application is not the same as a missing fifth dimension in someone else's physics simulator.
> That's right. Sometimes the type system doesn't have all the answers. (...) It's a tragedy, but amazingly the type system can still be useful. Hence... why we have type systems.
I'm not saying that any single type system can have the answer to every question, or that every question is worth having a compile-time answer for. But a useful type system must have some class of questions that it can answer clearly, accurately and with minimal fuss:
(0) ML and Haskell can answer questions of the form “how many qualitatively different outcomes can an operation have?”
(1) C++ can answer questions of the form “who's responsible for cleaning up this resource?”
(2) Rust can answer both (0) and (1), as well as “for how long can I hold a non-owning reference?”
(3) Java hardly ever gives any useful answers to anything.
The right language for a given task is the one that maximizes how efficiently you can get answers for the kind of questions you need to ask to construct a correct solution.
> ...and if isinstance really is preferable to you over the visitor pattern, then you've got no reason to prefer dynamic dispatch. You can muddle through your multi-type scenarios using Object and instanceof tests... and you still get lots of help for the single-type scenarios.
Yes, but if every single variable will have type `Object`, I might as well ditch the static type system, since it isn't providing any value.
> That's one way of modeling it I guess, but that's a choice you are making about how it is modeled.
My model is consistent with how people actually perform dimensional analysis with pencil and paper.
> Most people's mental models include moments where you know the type of something, but you don't actually have anything there.
Don't conflate “most programmers” with “most people”. Most people would be horrified if they found out how programmers model things.
> The type of your unfilled referee slot in your league planning application
Of course, the referee slot doesn't contain a `Referee`. It contains a `Maybe Referee`.
> is not the same as a missing fifth dimension in someone else's physics simulator.
Sure, I don't perform arithmetic on `Referee`s.
> C++ can answer questions of the form “who's responsible for cleaning up this resource?”
It can't. You have to follow explicit conventions in order for it to do that, and an untrusted shared library can completely subvert the model. On the same basis you are judging Java, that makes its entire type system a waste of time.
> Java hardly ever gives any useful answers to anything.
BS. Java's type system gives you an ability to know the type of a given parameter with a high degree of confidence. For the majority of cases, you only have one possible type, and Java does a great job of validating those cases. Sure it's a simple thing, but I'd rather have a tool that at least does the simple things for me, rather than one that doesn't. If nothing else, IDE developers have demonstrated that a lot of value can be unlocked from these simple assurances.
> Of course, the referee slot doesn't contain a `Referee`. It contains a `Maybe Referee`.
No, it contains a referee slot. Maybe in your type model, it has a maybe referee, but you're artificially throwing in the notion of None as a crucial bit of the type system (and I agree it is important, but so are a lot of other things). Java's type system basically makes all reference variable types "maybe" types. Python's type system basically makes all variables "maybe" for any type.
> Yes, but if every single variable will have type `Object`, I might as well ditch the static type system, since it isn't providing any value.
No, not every single variable will have type Object; only the ones involving multiple types. Amazingly, if you look at Java code, the vast majority of the time, programmers are able to find more specific types to tag their variables, and rarely need to use instanceof or the visitor pattern.
> Sure, I don't perform arithmetic on `Referee`s.
That's great, and I'm sure from your vantage point all that matters is arithmetic, but it turns out a lot of other programs and programmers write code that largely has nothing to do with arithmetic.
> It [C++] can't. You have to follow explicit conventions in order for it to do that, and an untrusted shared library can completely subvert the model.
Point taken.
> BS. Java's type system gives you an ability to know the type of a given parameter with a high degree of confidence. For the majority of cases, you only have one possible type, and Java does a great job of validating those cases.
If that's the entire point, why have complicated features like wildcards and F-bounded polymorphism? ML and Haskell (no GHC extensions) give you stronger guarantees than that, with far less complicated type systems: in fact, their type systems are simple enough that global type inference is decidable!
> Sure it's a simple thing, but I'd rather have a tool that at least does the simple things for me, rather than one that doesn't.
Well, for me, everything is ultimately subject to a benefit/cost analysis. Ceteris paribus, more static guarantees are better, but marginally more static guarantees than Python in exchange for a lot of additional ceremony isn't a tradeoff I find attractive enough.
> If nothing else, IDE developers have demonstrated that a lot of value can be unlocked from these simple assurances.
By heroically overcoming expressiveness limitations that shouldn't be there in the first place?
> Maybe in your type model, it has a maybe referee, but you're artificially throwing in the notion of None as a crucial bit of the type system
It's not artificial if it reflects what's actually going on. An incomplete league schedule doesn't always contain a `Referee` in every referee slot, and I have to model this fact.
> Java's type system basically makes all reference variable types "maybe" types.
This isn't true. `Maybe`s compose. You can have a `Maybe (Maybe Foo)`, which isn't the same thing as a `Maybe Foo`, and the type checker will make you very aware of the difference. OTOH, `null` can't be layered in this way, and that's why, for example, `TreeMap.get()` returns `null` in two qualitatively different circumstances.
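Both halves of that claim can be demonstrated in a few lines (the map contents here are made up):

```java
import java.util.Optional;
import java.util.TreeMap;

public class NullDoesNotCompose {
    public static void main(String[] args) {
        TreeMap<String, String> m = new TreeMap<>();
        m.put("k", null);

        // Two qualitatively different situations, one indistinguishable answer:
        assert m.get("k") == null;       // key present, mapped to null
        assert m.get("missing") == null; // key absent
        assert m.containsKey("k");       // a second call is needed to tell them apart

        // Optionals layer, so the analogous situations stay distinct:
        Optional<Optional<String>> presentButEmpty = Optional.of(Optional.empty());
        Optional<Optional<String>> absent = Optional.empty();
        assert !presentButEmpty.equals(absent);
    }
}
```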
> Python's type system basically makes all variables "maybe" for any type.
Again, this isn't true. Python has exactly one static type, and it is the type of every Python expression.
> Amazingly, if you look at Java code, the vast majority of the time, programmers are able to find more specific types to tag their variables, and rarely need to use instanceof or the visitor pattern.
At the price of never knowing whether their case analyses are exhaustive.
> That's great, and I'm sure from your vantage point all that matters is arithmetic, but it turns out a lot of other programs and programmers write code that largely has nothing to do with arithmetic.
What matters to me, as I've made explicit above, is accurate modeling. Sometimes it involves arithmetic, other times it doesn't. Sometimes it doesn't even involve types! But, when types don't help, I prefer them to get out of the way.
> ML and Haskell (no GHC extensions) give you stronger guarantees than that, with far less complicated type systems: in fact, their type systems are simple enough that global type inference is decidable!
Java's type system is terrible because it has been introduced incrementally and with an eye towards backward compatibility. That's the "why". I'd point out, though, that while ML & Haskell's type systems might be simpler from a theoretical standpoint, the ugly Java model appears to be easier for humans to reason about. There's a whole ton of programmers who can't figure out what to do with ML & Haskell, but get along well enough with Java.
> Ceteris paribus, more static guarantees is better, but marginally more static guarantees than Python in exchange for a lot of additional ceremony isn't a tradeoff I find attractive enough.
You make it sound like it is trivial to validate those static guarantees in Python. Have you built some trace framework for Python that validates all the types that can be passed to a particular function? That'd be super handy.
> By heroically overcoming expressiveness limitations that shouldn't be there in the first place?
I wasn't talking about code generation. There's a whole bunch of code analysis/interpretation logic that is much harder to achieve in Python. For the most part, the IDEs that support both just provide a watered-down Python feature set.
> It's not artificial if it reflects what's actually going on. An incomplete league schedule doesn't always contain a `Referee` in every referee slot, and I have to model this fact.
You don't have to model that fact as a distinct type in the type system... and people often don't. Often, people implement business logic in code, rather than in the type system... else you couldn't get anything done with JavaScript (which I'm told is not actually the case, despite appearances ;-).
> OTOH, `null` can't be layered in this way, and that's why, for example, `TreeMap.get()` returns `null` in two qualitatively different circumstances.
Yup. No composition. Then again, you can handle that logically in other ways. In the case of a TreeMap, the right way to handle it would be to have a way to access the Map.Entry associated with a key. You sort of have that, but it is all around floor/ceiling/range type queries, rather than the get because... well, you rarely need the feature and you can get by with using the containsKey() test. The nice thing is that if I can't remember if get() returns the Map.Entry or just the value, the type system can catch me on it, because the return types are indeed different. In Python, I'd actually have to test it or look at the documentation.
> Again, this isn't true. Python has a single static type: the type of every Python expression.
That's just another way of saying the variable can point to any type. That "single static type" can be tied to any Python type, making all variables intrinsically maybes... just like all of Java's reference types.
> At the price of never knowing whether their case analyses are exhaustive.
You can totally know whether you've properly handled all the null cases. There are lots of tools that do exactly that, even taking advantage of the @NotNull attribute to help winnow down the possibilities. For more complex case analyses, you can use Enums & EnumMaps to know if you have complete coverage. It can be done through static analysis... just not purely from the type system.
> What matters to me, as I've made explicit above, is accurate modeling. Sometimes it involves arithmetic, other times it doesn't. Sometimes it doesn't even involve types! But, when types don't help, I prefer them to get out of the way.
You're ignoring all the other cases where types do help, because the type system doesn't help with your pet case. In short, when it comes to static typing, you throw out the baby with the bath water. I get it, you don't like it, and it is incredibly annoying looking at that narrow uncanny valley to the thing you do like, but let's not rationalize it by asserting it doesn't help.
> I'd point out, though, that while ML & Haskell's type systems might be simpler from a theoretical standpoint, the ugly Java model appears to be easier for humans to reason about.
It's easy to reason sloppily about anything. It isn't too hard to write ML programs where types are just as uninformative as in Java, but I don't do this because it isn't helpful.
> You make it sound like it is trivial to validate those static guarantees in Python. Have you built some trace framework for Python that validates all the types that can be passed to a particular function? That'd be super handy.
When I write Python programs, I only care which operations can be used on an object, hopefully without raising an exception. I'm perfectly aware that it's unreasonable to expect more than this. The situation isn't much better in Java.
> I wasn't talking about code generation.
Me either.
> There's a whole bunch of code analysis/interpretation logic that is much harder to achieve in Python. For the most part, the IDEs that support both just provide a watered-down Python feature set.
I don't care about the questions a language or IDE can answer. I care about the questions I need to ask, which are driven by the problem domain, rather than the capabilities of the language. If they happen to coincide, good for the language and good for the IDE.
> You don't have to model that fact as a distinct type in the type system... and people often don't. Often, people implement business logic in code, rather than in the type system... else you couldn't get anything done with JavaScript (which I'm told is not actually the case, despite appearances ;-).
You're right, I don't need to model it with types. But if I'm using types, they better be useful for modeling purposes! And if they aren't useful for modeling purposes, they better stay out of the way.
> Yup. No composition. Then again, you can handle that logically in other ways. In the case of a TreeMap, the right to handle it would be to have a way to access the Map.Entry associated with a key.
Yep, this sounds like the Right Way (tm) to do it, given the constraints. Unfortunately, it's not the path of least resistance. It's the language designer's job to make these two align.
> You sort of have that, but it is all around floor/ceiling/range type queries, rather than the get because... well, you rarely need the feature and you can get by with using the containsKey() test.
Which is an admission that types aren't helping.
> The nice thing is that if I can't remember if get() returns the Map.Entry or just the value, the type system can catch me on it, because the return types are indeed different. In Python, I'd actually have to test it or look at the documentation.
Yep, types catch quite a few things in Java. But they don't catch enough things for it to be worth the annotation burden. Crucially, they don't catch `UnsupportedOperationException`.
> That's semantics for the same thing as saying the variable can point to any type. That "single static type" can be tied to any Python type, making all variables intrinsically maybes... just like all of Java's reference types.
Python variables aren't maybes. Maybe has only two constructors: Just and Nothing. Python expressions are classified by a gigantic sum type that can even be extended with new constructors at runtime.
> You can totally know whether you've properly handled all the null cases.
Most interesting case analyses aren't of the form null vs. not null.
> There's lots of tools that do exactly that, even taking advantage of the @NotNull attribute to help winnow down the possibilities.
These tools are cumbersome, workflow-disrupting, and even then, sometimes they still fall short. For example...
> For more complex case analyses, you can use Enum's & EnumMap's to know if you have complete coverage. It can be done through static analysis... just not purely from the type system.
... how would I say “list of lists of animals, where each sublist only contains animals of the same species”? (Assuming each species is a separate class.) In Haskell (plus existential quantification, a fairly uncontroversial extension) or OCaml (with GADTs, which subsume existential quantification), this is a breeze.
> You're ignoring all the other cases where types do help, because the type system doesn't help with your pet case.
Given the ubiquity of case analysis in programming, supporting it in a disciplined way isn't a “pet case”, but rather an everyday necessity. Even dynamic languages offer good facilities for it (Racket, Erlang).
> In short, when it comes to static typing, you throw out the baby with the bath water.
Nah. I know how to settle for less than what I want. For example, safety leaves a lot to be desired in C++, but it has other nice things to make up for it, like the previously mentioned compile-time dimensional analysis example.
> I get it, you don't like it, and it is incredibly annoying looking at that narrow uncanny valley to the thing you do like, but let's not rationalize it by asserting it doesn't help.
I love static type systems. That's why I feel insulted when they try to sell me a glorified linter as if it were a type system.
> Yep, this sounds like the Right Way (tm) to do it, given the constraints. Unfortunately, it's not the path of least resistance. It's the language designer's job to make these two align.
See, I'd argue that in this particular case, it actually is the more correct model of the solution. There are two questions you are looking to answer: is there an entry, and what is the value associated with that entry. One way to solve it is to ask two questions. The other is to get the entry, and if there is one, ask the entry for its value. Using a Maybe of Maybe type creates an unnecessary anonymous type when the entry type is already the natural and correct type. So I'm not sure how it isn't the path of least resistance, but I'll accept that there are better solutions. I find it hard to accept that it is a bad one.
I'll completely disagree with the notion that it is the language designer's job to make the two align. Aside from efforts like Brainfuck, programming language design is a series of trade-offs, with the general guiding principle of "easy things should be easy, and hard things should be possible". There's plenty of empirical evidence that even getting that right is sufficiently challenging that nobody has really succeeded, but in some few cases they have hit somewhat close approximations.
> Yep, types catch quite a few things in Java. But they don't catch enough things for it to be worth the annotation burden. Crucially, they don't catch `UnsupportedOperationException`.
They don't catch it, because static type systems like Java's resolve at compile & bind time. You get a "cannot find symbol" error. UnsupportedOperationException can represent failures in circumstances that may literally be unknown/undecidable at compile or bind time.
...and let's be clear about the annotation burden here. It's not that huge of a burden. Typing in type annotations is a very small amount of work, and significantly less work (even if all you counted were keystrokes, it's an easy order of magnitude win) than writing the equivalent validation logic as unit tests.
> Python variables aren't maybes. Maybe has only two constructors: Just and Nothing. Python expressions are classified by a gigantic sum type that can even be extended with new constructors at runtime.
One can have a language that through inferencing will construct "Just" automagically from any expression that matches the parameterized type (e.g. C++'s optional type); whether you need an explicit constructor is a matter of syntactic sugar and doesn't change the type. Of course variable assignment needn't involve construction at all (e.g. "let a = b").
However, I'd agree that for the most part Python doesn't have polymorphic types. Maybe types are polymorphic types that represent a value or a non-value. However, what I said was that Python variables (which are not types) are intrinsically maybes, which is true. They can be bound to an object of any type, including the NoneType.
> ... how would I say “list of lists of animals, where each sublist only contains animals of the same species”? (Assuming each species is a separate class.)
I can't imagine a context where I'd create a separate class for each species. Maybe you'd do something like it so you could metaprogram the class system to use an eDSL for defining species.
That said, with a few assumptions, there are multiple ways to make what you are describing work. One is with a runtime check:
Another would be to create a Species annotation (you might want to check out type annotations in Java; they let you extend the reasoning in the type system quite extensively, and can even survive the dreaded type erasure), which would have the merit of being something that would be well enforced at compile time, bind time, and execution time, whilst providing that eDSL for defining new species.
For small lists of species, it would be practical to use CRTP to structure lists with no more than N species, but for longer lists it'd get tedious super fast.
Somewhat more realistically, I'd probably do something subtly different from what you are suggesting, using a map of lists rather than your list of lists. I'd be strongly tempted to use an enum (and therefore an enum map) for species instead of classes, but a LinkedHashMap<Class, Species> would have all the useful behaviours of your List of Lists that I would expect one would be looking for. This of course would have some issues (e.g. enforcing that species be leaf classes, allowing two separate lists of the same species), but it'd have the advantage of actually being useful.
Far more realistically, you'd implement the whole thing as a database (possibly a graph database) capturing all the various taxonomies around animals & types as well as relationships between animals. Then you'd actually have something useful.
> In Haskell (plus existential quantification, a fairly uncontroversial extension) or OCaml (with GADTs, which subsume existential quantification), this is a breeze.
Well, I'd say that breeze is mostly hot air. ;-)
I have yet to see someone do class per species (for a non-fixed list of species) with Haskell or OCaml (or really any language) in a way that was at all useful. At best you might do something like this to jerry-rig an eDSL for defining new Species/Classes. Either way, the type system certainly wasn't capturing the complexities of species taxonomies. I find it noteworthy that even in the cases of Haskell & OCaml, you had to go to language extensions, both of which are noted for how they make reasoning about types much more difficult (type inferencing with GADTs is pretty much something you don't do). Even with those extensions, the species logic is way more complex than any computer type system. I have yet to see a Haskell or OCaml program that would use the type system to magically reorganize the lists every time species assignments change or someone finds a new facet to the species problem (of which I believe there are well over twenty to date). Mathematical category theory models don't even accurately represent one of humanity's most significant taxonomic efforts! Such is life.
> Given the ubiquity of case analysis in programming, supporting it in a disciplined way isn't a “pet case”, but rather an everyday necessity.
There are a specific set of languages with type systems designed for doing case analysis. The entirety of works in those languages doesn't even come close to representing a majority of software based on lines of code, amount of use, people actually writing in them today, or any other reasonable quantification I can imagine. I think that speaks pretty thoroughly about how ubiquitous that particular solution is. There are other practical approaches to case based reasoning that work well enough (in a lot of cases, simple enums and switch statements allow for straightforward static analysis), while allowing the type system to provide all kinds of other useful assistance.
> I love static type systems. That's why I feel insulted when they try to sell me a glorified linter as if it were a type system.
I'm annoyed by bad type systems too, but I'd not pretend that a linter, let alone a glorified linter, doesn't save me a lot more trouble than it causes.
Choosing the wrong abstraction is an occupational hazard with software development. ;-)
In a lot of cases in enterprise Java, it felt to me like the problem was exacerbated by commercial interests that had an abstraction to sell, and needed to find use cases for it.
> All your interacting components still depend intimately on each other's behavior. The behavior is not fully captured by the interface - if it was, it wouldn't be an interface, it would be an implementation.
Many interfaces come with an assumed "contract" that all instances must satisfy, which limits the possible implementations but does not completely define them.
Interfaces for equality (must represent an equivalence relation) are like that.
So is the Monad interface (the monad laws are basically associative laws).
This is an interesting write-up, but it leaves out some specifics in designing C++. A few are:
1. Concurrency
In garbage collected languages, the runtime can generally add objects to a free list and clean up memory when it's more-or-less convenient. Since C++ uses deterministic destruction and freeing, responsibilities have to be clearly defined. In particular, it has to be clear which thread is ultimately responsible for running the destructor. Point being, writing code that would otherwise be "abstract" in Java-land implies (but rarely explicitly defines!) some rigidity in design with respect to how long objects need to be around for different threads to use them.
Maybe that's too abstract. Here's some code to illustrate:
    class IHandler {
    public:
        virtual ~IHandler() = default;
        virtual Status handleAsync(const Msg & msg) = 0;
    };

    class CopyingHandler : public IHandler {
    private:
        WorkerThread m_worker;
    public:
        Status handleAsync(const Msg & msg) override {
            // capture the msg by value, taking a copy
            return m_worker.push_back([msg]() {
                handle(msg);
            });
        }
    };

    class RefHandler : public IHandler {
    private:
        WorkerThread m_worker;
    public:
        Status handleAsync(const Msg & msg) override {
            // Capture the msg by reference. Assumes the msg
            // sticks around at least as long as the
            // background thread does.
            return m_worker.push_back([&msg]() {
                handle(msg);
            });
        }
    };
Anyway, I left some details out, but you can see that both implementations adhere to the interface provided by IHandler, but each makes drastically different assumptions about what's OK. In some problem domains, copying the message might create a performance bug. In most programs, you could easily (likely) have undefined behavior by capturing the message by reference, letting the message be destroyed, and then trying to use the message afterwards.
Point being, rigidity in C++ designs is easy to create subtly and accidentally, and even generic programming won't necessarily help you here.
2. Polymorphism
C++ actually allows for quite a range of polymorphic behaviors. You can't really rank them from highest rigidity to the least, since there are pros and cons to each approach, but generally inheritance is more rigid than the other approaches. In contrast, you can also have:
2a. Opaque Pointers
Not necessarily very type safe, but if you need to wrap something up in a black box, pass it around, then unwrap it and use it later (maybe in a message queue, a mailbox, a private implementation, or in certain kinds of IOC patterns), you can completely lose your type and recover it later. Typically there is a small performance hit as you use RTTI (https://en.wikipedia.org/wiki/Run-time_type_information) or a discriminator (an enum maybe) to decide how to recover your type when you're ready for it. See <experimental/any> (http://en.cppreference.com/w/cpp/experimental/any) for an example of this.
2b. Type Erasure
This one is more involved but clever use of generic programming lets us automatically provide the adaptor boilerplate to get types to play nice with other code. Sean Parent does an excellent walkthrough of this technique in his GoingNative talk in 2013 (https://channel9.msdn.com/Events/GoingNative/2013/Inheritanc...). Point being that type erasure lets you have a different flavor of flexibility in a way not seen in Java-land.
2c. Unions
This is harder to do properly in practice, as evidenced by the difficulty in standardizing something like boost::variant, but from time to time it might be worth playing around with various types of unions. They're a bit awkward, but you might try boost::variant if you like the tradeoffs it provides.
3. Value Semantics
Since objects are values in C++, you need to know the size of things much more often. This causes dependencies you might not think about in other languages, depending on how their standard libraries are designed.
> Since C++ uses deterministic destruction and freeing, there has to be clear responsibilities defined. In particular, it has to be clear which thread is ultimately responsible for running the destructor.
Ownership itself is an abstraction that is very useful for manipulating ephemeral resources, and which is very difficult to express in languages where garbage collection is the mandated universal solution for resource reclamation.
> Point being, writing code that would otherwise be "abstract" in Java-land implies (but rarely explicitly defines!) some rigidity in design with respect to how long objects need to be around for different threads to use them.
When you care about ownership, “who frees this object” isn't an implementation detail or an abstraction leak. It's a part of the interface.
> Since objects are values in C++, you need to know the size of things much more often.
Pretty sure everything is a value in ML and Haskell, and I don't recall ever having to worry about the size of anything when using those languages. C++ programmers tend to wrongly conflate “value” with “a physical copy of the value”. It's better than what you get in Java (no user-defined values at all!), but it still leaves a lot to be desired.
> Pretty sure everything is a value in ML and Haskell,
That's a good point. I guess it's the combination of value semantics and manual memory management. I'll try to keep that in mind in the future.
> When you care about ownership, “who frees this object” isn't an implementation detail or an abstraction leak. It's a part of the interface.
Exactly. But even more than that, references that don't extend the lifetimes of objects (like native pointers and references) are simple to create and are in many ways the default type of reference in C++.
At any rate, 'this message queue only works with types that are easy to copy' is not generally a prominent part of the documentation in C++, if it's present at all.
> But even more than that, references that don't extend the lifetimes of objects (like native pointers and references) are simple to create and are in many ways the default type of reference in C++.
There's nothing wrong with references that don't extend the lifetime of objects per se. Rust gets it right: the compiler checks that non-owning references don't outlive what they refer to.
> At any rate, 'this message queue only works with types that are easy to copy' is not generally a prominent part of the documentation in C++, if it's present at all.
I would expect it to be in the documentation. It could be phrased as follows: “This message queue only works with types with trivial copy constructors both for them and the types of their member variables, transitively”. Again, Rust gets it right: a generic can be parameterized by a type implementing the `Copy` trait. `MsgQueue<T: Copy>` tells you, in less than 20 characters, the same information as the preceding English tl;dr.
> But even more than that, references that don't extend the lifetimes of objects (like native pointers and references) are simple to create and are in many ways the default type of reference in C++.
C++ has a lot of cases of defaults that one tends not to use all that much (because it tends to default to maximal efficiency). Raw pointers would be a good example. Those are generally frowned upon.
On the other hand, references are encouraged and have pretty clear semantics around ownership.
> At any rate, 'this message queue only works with types that are easy to copy' is not generally a prominent part of the documentation in C++, if it's present at all.
Really? That's a very prominent part of say... the STL's documentation.
Even the documentation of vector describes what it does and a few implications. It doesn't warn you that vectors of raw pointers are bad parameter types, for example. It does talk about iterator invalidation, but it doesn't really educate us about how that affects our designs.
That's not to criticize vector, but to point out that C++ had especially nuanced implications if loosely coupled design is a concern.
> It doesn't warn you that vectors of raw pointers are bad parameter types, for example.
Vectors of raw pointers function very effectively, they just have all the problems intrinsic with the use of raw pointers. I'm not sure I'd blame the vector documentation for that...
> It does talk about iterator invalidation, but it doesn't really educate us about how that affects our designs.
Design implications are a subtle thing that you can write entire books about. I think failing to fully capture those implications in the API documentation is more than understandable and far from unusual. For example, Java's containers have similar issues with iterator invalidation, and similarly don't have documentation on all the design implications.
In C++, a well designed generic component will document the concept requirements of its type parameters. No idea what 'easy to copy' means, but if it just means that T has a copy constructor, then the requirement would simply be stated as "requires: T models CopyConstructible" (which are well defined terms of art). Hopefully one day C++ will allow defining these contracts in code.
Note that Concepts carry both syntactic and semantic requirements.
"Easy to copy" means it makes sense to copy as needed in a given context. Large vectors of strings adhere to the CopyConstructible concept, but we don't generally recommend copying those around.
Which brings up a good point. The concepts proposal exists because there are implicit (though hopefully documented) contracts that are either unenforced or enforced in arcane ways (large amounts of error spew obscuring fairly simple violations, such as a missing copy constructor).
Anyway, I was trying to point out that there are many language-legal designs we can conceive that force design decisions and other entanglements on other pieces of code. It just so happens that C++ has much more flexibility than many languages, so it gives new opportunities to make those mistakes. I was pointing out that otherwise helpful authors aren't aware of the extra pitfalls, so they tend not to point them out very clearly.
As a result, for example, we see a lot of Java-like C++ that basically assumes the rest of the code can and should be structured similarly. Even something as seemingly innocuous as using shared_ptr as a method parameter can cause unintended issues in other code.
> Point being, writing code that would otherwise be "abstract" in Java-land implies (but rarely explicitly defines!) some rigidity in design with respect to how long objects need to be around for different threads to use them.
C++ allows you to have much more tightly defined contracts, which tends to be helpful, but you can have very loosely defined contracts (such as with shared_ptr). It just turns out that such loose semantics aren't often found to be helpful in the C++ world.
My point was that you can easily stumble into a tight contract (at least in certain respects) in C++. At any rate, it's something that Java-school teachers like Bob Martin don't tend to cover. And I'm not aware of very comprehensive books on C++ design principles.
Robert C Martin - Agile Software Development: Principles, Patterns, and Practices
There's now an updated version of this book for C# but the first one deals with C++. This book explains, in detail, how to architect your application around use cases.
"Game Coding Complete 5" also talks about moderately large scale C++ high-level design. Of course, it's focused on games, but unlike other game programming books, it deals with things like dependency inversion, abstracting away the framework, data-driven architectures, etc.
It's surprisingly hard to find books talking about high-level software architecture; Amazon is flooded with technology-specific books (things like "Learn <insert framework name> in 7 days", whose lifetime is so short you wonder why write a book about it in the first place).
It may surprise you to know Uncle Bob has written C++ design books as well. ;-)
There are lots of books on C++ design that focus particularly on the semantics of contracts that are comparatively unique to C++ (Sutter, Meyers, Alexandrescu, and Abrahams all have a pretty established following). Indeed, I'd argue C++ is one of the few languages where you really need to read a lot of the supporting literature before you are much of a useful coder.
> Since objects are values in C++, you need to know the size of things much more often.
It's not like C++ prevents reference semantics. If you need the flexibility/abstraction, that's what you do. That said, value semantics actually provide clearer abstractions in interfaces (for example, no question about aliasing or whether other code is referencing the value), and generics can do a lot to abstract the extra size exposure from value semantics.
> no question about aliasing or whether other code is referencing the value
Well, actually, there's quite a bit of convention here. The 'could cause undefined behavior' code I hashed out as an example uses a const reference, which is the widely accepted best practice for copy semantics. There's sort of an implied 'and the concrete implementation will copy this value if it needs it for longer than the duration of the method call', but that's a convention; it's not enforced by the compiler. We could pass these sorts of parameters by value, but there are reasons people like to avoid passing by value as a default.
Engineers could define interfaces with only pass-by-value arguments, but there are reasons (avoiding unnecessary copies among them) that this isn't the prevailing convention.
People do use the reference to const as a way to imply something similar to value semantics... but it's a mistake to say that it implies a copy of the value. Ever since rvalue references were introduced, the rationale for having a const ref interface to avoid an unnecessary copy kind of became an anachronism.