i still fail to see the enormous fuss about hash tables as if they are some revelatory data structure.
the special cases where you can construct perfect hashes are normally amenable to assigning a fixed index 'hash' to each item and storing the items in an array, then never referring to their values for lookups in code, but always using the indices.
in heavily compiled languages like C and C++, when you know everything at compile-time the run-time cost should be zero. anything more is a failure to use the language to sufficiently inform the compiler of what is happening.
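a minimal sketch of that compile-time-index idea (names here are hypothetical, just for illustration): each "key" gets a fixed enum index, flat arrays are keyed by that index, and code only ever passes the index around.

```c
/* hypothetical sketch: when every "key" is known at compile time, give each
 * one a fixed index via an enum and key flat arrays by that index; code
 * passes the index around and never hashes or compares the original value */
enum colour { COLOUR_RED, COLOUR_GREEN, COLOUR_BLUE, COLOUR_COUNT };

static const char *colour_names[COLOUR_COUNT] = { "red", "green", "blue" };
static const unsigned colour_rgb[COLOUR_COUNT] = { 0xFF0000u, 0x00FF00u, 0x0000FFu };

/* the "lookup" is a single indexed load -- nothing is computed at run time */
unsigned rgb_of(enum colour c) { return colour_rgb[c]; }
```

the point being that there is no hash function at all, perfect or otherwise: the compiler resolves everything, which is the O(0) claim above.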
i suspect this is why perfect hashes do not see widespread use: there is a solution for the most common use cases with the exceptional run-time complexity of O(0).
this is enormously faster than the necessarily more complicated perfect hash functions, and only falls down when you have a run-time requirement to do the lookup from the value instead of the key... which in my experience goes hand in hand with not knowing your data until run-time, which rules out a perfect hashing function.
in most run-time cases a binary tree is good enough, it's very rare that you need better, and even when you do, the optimisation is usually found by tuning that structure, e.g. pool-allocating its nodes. the usual array-of-linked-lists approach can be better for small data sets too... but a perfect hash function isn't even applicable.
all that being said, it is the right solution in those exceptionally rare cases where you know your data ahead of time, but need to look it up by value, rather than hash, at run-time.
Constrained input that is processed according to what the program knows about each "key" it contains, with keys defined in advance and effectively specifying a data structure or file format, is actually very common.
Example: HTML, XML and similar kinds of text markup; tagged file formats, e.g. PNG; the traditional application to programming language keywords.
right, that's a good use case, although i'm not sure it's terribly important. the slow part of processing a png file is not identifying its contents... although this may be different with XML and HTML parsers. i'm still not sure i'd call it common, or even desirable.
for lots of chunk-based formats, using a switch statement on the chunk ids is clean and hands the optimisation task to the compiler, which tends to do a pretty good job of these things... even if it generates a gigantic conditional statement. since these ids are 4 or 8 bytes long, the 'memcmp is slow' argument falls apart.
in fact this sort of switch statement usage is such a good optimisation here that it is used by this perfect hash mechanism itself to solve its own problem, effectively offloading some of the cleverness requirement to the compiler...
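a rough sketch of what that looks like for a png-style format (the enum names are made up, the four-byte ids are the real png critical chunks): pack the 4-byte id into a uint32_t and switch on integer constants, letting the compiler pick the dispatch strategy.

```c
#include <stdint.h>

/* build a 32-bit key from four bytes at compile time; because the bytes are
 * packed the same way at run time below, the comparison is endian-neutral */
#define FOURCC(a, b, c, d) \
    ((uint32_t)(a) | ((uint32_t)(b) << 8) | ((uint32_t)(c) << 16) | ((uint32_t)(d) << 24))

/* hypothetical classification; the chunk ids themselves are png's */
enum chunk_kind { CHUNK_HEADER, CHUNK_PALETTE, CHUNK_DATA, CHUNK_END, CHUNK_UNKNOWN };

enum chunk_kind classify_chunk(const unsigned char id[4])
{
    /* one 32-bit key, one switch -- no memcmp, no hash function */
    uint32_t key = (uint32_t)id[0] | ((uint32_t)id[1] << 8)
                 | ((uint32_t)id[2] << 16) | ((uint32_t)id[3] << 24);
    switch (key) {
    case FOURCC('I', 'H', 'D', 'R'): return CHUNK_HEADER;
    case FOURCC('P', 'L', 'T', 'E'): return CHUNK_PALETTE;
    case FOURCC('I', 'D', 'A', 'T'): return CHUNK_DATA;
    case FOURCC('I', 'E', 'N', 'D'): return CHUNK_END;
    default:                         return CHUNK_UNKNOWN;
    }
}
```

the compiler is free to lower that switch into a jump table, a comparison tree, or whatever else it judges fastest, which is exactly the "offload the cleverness" point.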
also, in the context of programming languages this problem falls into the category i described of being beatable at compile-time. usually this is implemented by tokenising (hashing pieces of) all the string input then only ever using the tokens (hashes, indices) outside of that context. whatever lookups need doing can then become indexing into a nice flat array defined at compile-time.
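a toy sketch of the tokenise-once idea (all names hypothetical): the keyword strings are compared exactly once, at tokenisation time; everything downstream sees only the token index and any per-keyword data is a flat compile-time array indexed by it.

```c
#include <string.h>

/* hypothetical token set; real lexers are bigger but identical in shape */
enum token { TOK_IF, TOK_ELSE, TOK_WHILE, TOK_RETURN, TOK_IDENT, TOK_COUNT };

static const char *keywords[] = { "if", "else", "while", "return" };

/* the only place strings are ever compared; a linear scan here for brevity */
enum token tokenise(const char *lexeme)
{
    for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++)
        if (strcmp(lexeme, keywords[i]) == 0)
            return (enum token)i;
    return TOK_IDENT;
}

/* per-token attributes become flat arrays indexed by the token -- the
 * "lookup" after tokenisation is just an indexed load */
static const int starts_statement[TOK_COUNT] = { 1, 0, 1, 1, 0 };
```

after `tokenise` runs, `starts_statement[tok]` and friends are the only kind of lookup left, which is the compile-time-defined flat array described above.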
in the other cases where lookups are needed, they are for things like variables and constants, which are not known at compile-time, and the perfect hash cannot help...
you might also deliberately want hash collisions in cases where languages have aliases for keywords, so that every spelling maps to the same token
sure, you could lift some of the performance cost of the tokenisation itself outside of run-time with that, but tokenising programming language input is not the dominant cost of compiling code.