It should be noted that Paul Pedriana, the original developer of EASTL and many other things, just recently passed away unexpectedly. He was an EA programming legend.
Oh, that's terrible news :( We had some good conversations with him about the EASTL design early on, when designing the Rust standard library. Very smart and articulate guy.
Sad news. It sounds like he must have been a really influential and inspiring person to work with. We could all hope to have the fortune of having people like this in our cohort of colleagues around us! :)
I love this bit of the 1993 bio on Mobygames: "I do this computer work on the side as a hobby"
Looks interesting, but the README is thin on espousing the benefits. Questions that come to mind:
- What stands out about it?
- Benchmarks?
- What platforms are supported?
- How is it differentiated today vs. 15 years ago? Qt wound up with its own 'standard library'-alike types mainly because the STL (and its implementations across platforms) were not really up to snuff when it started out. I suspect this may be similar.
- Does anyone have direct experience to share? :-)
Especially interesting for me as Linux dev: "[...] EASTL is significantly more efficient than Dinkumware STL, and Microsoft Windows STL [...] EASTL is roughly equal in efficiency to STLPort and GCC 3.x+ STL, though EASTL has some optimizations that these do not."
It's been a long time but from what I remember the selling point for EASTL is the ability to specify your own allocator instance for every container. EA writes portable code across lots of different architectures, many of which share a lot more with embedded systems than with PCs. They have constrained memory or special dedicated memory areas. Also, iirc EA uses tagging allocators extensively letting them trace against memory budgets down to the team level.
They benchmark against the Windows STL from Visual C++ 7.1, which came out in 2003! Not sure how it compares to more modern versions, much less something like Abseil or Folly.
I used EASTL on a few projects and it was 10x faster in debug mode. VS had a lot of checks in debug mode and it was almost impossible to play games in debug mode. I haven't used it since 2012.
For Azure Sphere OS’s user-land components we opted to use EASTL b/c of its small footprint. I don’t recall any major issues arising in the two years I worked there.
> I am surprised compilers don't magically optimize the code and remove the size-remembering variable if it is not needed.
the problem is that technically anyone can come and do
auto lib = dlopen("my_lib.so", 0);
auto sym = dlsym(lib, "_ZNKSt7__cxx114listIiSaIiEE4sizeEv"); // std::list<int>::size()
auto func = static_cast<void(*)(std::list<int>*)>(sym);
std::list<int> some_local_list = ...;
(*func)(some_local_list);
and that is expected to not crash - template symbols are generally part of your shared library API (at least that's been the default, however bad it is, on Linux, for a very long time).
If compilers were optimizing the layout of individual list instances, then the above wouldn't work anymore (unless the compiler would inline / create new symbols for each individual case in your code, which would make the object sizes go through the roof).
There's no requirement or even expectation that your snippet of code in principle works (even after working out the issues in the snippet you posted, such as needing to use the .* operator, the fact that member functions are not the same size as void*, and other quirky details).
The only way something somewhat similar to what you posted could possibly work would be to have some_local_list allocated/constructed dynamically from the same shared object that contains the member function, and to expose the member function using extern "C".
For example something like:
auto some_local_list = create_list(...);
list_size(some_local_list);
Where create_list is a function loaded from the same dlopen and returns a dynamically allocated and opaque handle to a list, and list_size is also a function loaded from dlopen that is exported using extern "C".
With this approach it's certainly possible for a compiler to optimize out unused member variables. Any other approach is undefined behavior and may or may not work.
> There's no requirement or even expectation that your snippet of code in principle works
"in principle" goes away as soon as we use dlopen, as it implies a lot of things about the way C++ will be supported on that given platform; no one cares about C++ in a vacuum.
In practice, different .dll / .so / .dylib files communicate through C++ APIs all the time; all relevant platforms have to support that at some level (which can sometimes cause strong headaches, for sure: https://www.codesynthesis.com/~boris/blog/2010/01/18/dll-exp... ).
When I say in principle, I don't mean according to a strict interpretation of the standard in a vacuum. I mean that even if I were to extend you a great deal of liberty and operate at the level of what you're attempting to accomplish, your approach is fundamentally invalid and results in buggy code that will break, and is entirely unnecessary.
The way you accomplish dynamically loading member functions is by exporting a plain C function using extern "C" that takes an opaque handle to the object you wish to operate on, and whose implementation wraps the member function whose operation you wish to expose.
Pointers to member functions are fundamentally not compatible with void* and hence may not reliably be returned using dlsym. Only once the member function is bound to an object (using the .* operator, i.e. object.*member) is the resulting pointer compatible with a void* (in C++11 it's implementation defined). Until then, not only do they have different sizes, their size may even differ between translation units of the same application!
In practice you are right that DLLs and shared objects communicate through C++ APIs all the time, and the reliable way that they do so is by using extern "C". The article you linked to is exactly the kind of pain, undefined behavior, and buggy problems you will encounter when you try to use any other mechanism than the plain and straightforward mechanism that exists precisely for the purpose of facilitating this kind of communication.
The reason my point is worth making, as opposed to just being a pedantic technicality, is because this approach is precisely what allows compilers to make various optimizations that continue to work safely even in situations where objects, functions, and member functions are used across dynamic boundaries. If you don't follow this approach, then the compiler will make certain optimizations that will result in disastrous behavior.
If you don't want to take my word for it, hopefully you'll take the advice of the ISO CPP [1]:
"do not attempt to “cast” a pointer-to-member-function into a pointer-to-function; the result is undefined and probably disastrous. E.g., a pointer-to-member-function is not required to contain the machine address of the appropriate function."
I have seen implementations of type erasure/delegates that store an instance pointer and pointer-to-member-function as an instance and pointer-to-member-function of a dummy type with the same calling convention. It's interesting how many different member function pointer sizes Windows has. Undefined behavior and compiler dependent, but consistent.
These sort of layout optimizations are extremely hard in C-like languages as the compiler needs to prove that the program can't tell the difference. At the very least you need whole program optimizations.
Somewhat unrelated, but I came across the Embedded Template Library (ETL) recently, aimed mostly at use on very small CPUs. Not exactly the same functionality, but it might be useful elsewhere as well.
It would be interesting to see how well it performs on the libc++ test suite. I know Microsoft recently explored running their C++ lib against this test suite and I think libstdc++ has been run against it too.
EA were shipping games on PS3 and Xbox 360 until 2017. A modern game likely has code that dates back to the early days of those consoles, so it's a choice between maintaining EASTL or porting all of their games to the standard library, along with the tooling they've built around it (as someone else mentioned, tagged allocators are a big thing that aren't supported by the standard library).
Also, just because something is x64 doesn't mean its standard library is up to snuff, or that it has the same features. You might be using a 5-year-old compiler with a standard library to match.
Is this really necessary? How many implementations of the standard library do we need? I get the feeling that engineers love reinventing the wheel so much that they forget to ask themselves what's wrong with the current wheel. My last company had its own implementation of the stl and it was a piece of crap, productivity would've gone way up if we just used the normal implementation and boost threading. Maybe I'm just projecting, but I get the feeling this company wasn't the only one that has made that mistake.
Full disclosure: I used to work at EA and while I was there I contributed a tiny amount of code to EASTL.
EASTL provided a bunch of value that the standard STL would not. The biggest benefit was a unified implementation across all platforms. Standard library STLs all had their own idiosyncrasies and code that worked on one platform might not compile, or worse, have a bug on another.
At the time, EASTL was of equal or higher quality than standard implementations. Performance was better, code quality was better, and it broke from the standard in some key ways that were important for performance. It also had some key upgrades that allowed usage patterns and data structures that the standard STL simply didn't allow:
Vectors supported "trivial relocation" before the existence of move constructors. While move constructors have ameliorated the problem, I argue that feature is still missing from c++. Please support P1144! http://open-std.org/JTC1/SC22/WG21/docs/papers/2020/p1144r5....
Intrusive containers (in particular, intrusive linked lists in eastl::intrusive_list) embed the container overhead into objects themselves. This allows non-movable objects to be stored in these lists without requiring an extra pointer dereference on access. It also lets you convert an object reference into an iterator over the list that contains it. There are tons of uses for this. It also allows polymorphic lists (e.g. intrusive_list<BaseClass> that actually holds instances of various subclasses, again without an extra pointer dereference).
1. The STL bakes in a bunch of assumptions about how the data structures you would want should work.
These assumptions are often wrong. It's possible they were not wrong when the STL was first conceived, but we don't care about that because we're writing programs now, not then.
For example the STL unordered_map thinks your hash map has buckets. After all, if you learned how to make such a data structure in a typical college course in like 1995 the hash map had buckets. So the mandatory API for the STL's unordered_map has buckets like in that college course.
But today in many cases you don't want buckets, you've got a single unbucketed structure. How do these APIs work with your better structure? They don't. The STL is incompatible with your better structure.
2. The STL bakes in a bunch of assumptions about how C++ works.
Those assumptions were undoubtedly correct when the STL was first conceived, but since then there have been major changes to the language and all it can do is bolt on more and more, and more boilerplate to try to cope.
Take emplace(). This looks like it's a better choice in a bunch of cases than say, insert() and then you dig into your STL implementation and you discover it had no choice but to construct your expensive object and then throw it away when it wasn't needed just as you might have with insert(). That's just how the class is defined, too bad.
If EASTL is in fact just an STL, then it might be no better than a modern STL you got with your compiler. But some people choose to have something better instead of the STL. Abseil's Swiss Tables, for example, offer a faster unordered map; it's just that it isn't, and can't be, a std::unordered_map.
The STL interfaces work just fine with your alternative hash map, for example, those from Abseil. But to get the speed that they got, Abseil had to impose restrictions that make it less general. So yes, it can't be an std::unordered_map. That's OK. The algorithm APIs work just fine on any class that looks like a standard container.
This is all messier because C++ didn't have Concepts at the outset and when it got Concepts those had to have duck-typing instead of having the programmer write down which types implement a Concept.
Both existing templates and any C++ Concept can accidentally implicate something as a duck (or a container) when it actually isn't. Meanwhile for an implementer, the only way to be sure you've written a working duck (or container) is to try it and see. You can't just assert "This is a duck" and have the compiler explain why it isn't AFAICT.
Even if Abseil's SwissTables weren't containers from the point of view of STL algorithms, the only way for Abseil to stop STL algorithms assuming they are anyway would be to purposefully sabotage the API and annoy users who don't need any such guarantee. So that sucks pretty badly IMNSHO.
When this is discussed there are usually two examples in play. One is ludicrous like Stroustrup's "CowboyWindow" from the 2nd edition of "The C++ Programming Language" which needs draw() for Window and draw() for Cowboy. This will be dismissed as a corner case that isn't going to have a real impact. It's hard to imagine some class that "accidentally" offers all the method signatures from your non-trivial concept when it's actually something quite different.
The other is the "backward compatibility" example. You have a better_map your company used for decades, and now some asshole came along and said it isn't a container because you didn't write that down? Who does he think he is?
But actually the problem you run into isn't CowboyWindow or better_map it's faster_map, which has exactly the same method signatures as better_map and so can be dropped in as far as the linker and compiler are concerned, but alas, for performance faster_map behaves a little differently and it must not be used as an STL container. Oops.
> but alas, for performance faster_map behaves a little differently and it must not be used as an STL container. Oops.
No one sane expects that every single implementation of an API must match the performance of its canonical implementation. Hell, I'm pretty sure that MS's STL in debug mode does not satisfy the STL performance requirements due to all the added checks, some being O(n) iirc.
Who said it was merely a performance difference? I said it was a behaviour difference for performance.
Both C++ concepts and templates have cat rules, "If it fits, I sits". Your insert() method has different semantics? I don't care, the function signature matched so I'm calling it anyway.
The STL is meant to solve a fairly general class of problems. It isn’t optimized for a number of use cases, and frankly some of the design and nomenclature is terrible.
This is now a classic reading:
(tldr: it is faster to call a PHP process from C++ code and do the regex in PHP than to use std::regex, and slow std::regex will never change because the Committee doesn't want to break ABI in favor of speed):
Note of his passing: https://www.facebook.com/groups/20296764839/posts/1015929527...
EASTL paper: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2007/n227...
Partial list of Paul's programming credits: https://www.mobygames.com/developer/sheet/view/developerId,2...