Are there any plans for allowing hash indexes in uniqueness constraints such as the ones created for primary keys? It seems like a good fit for an index that is specialized for equality checks.
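For context, a hash index can already back plain equality lookups today; it just can't enforce uniqueness. A minimal sketch (table and index names made up):

  CREATE TABLE accounts (id bigint, payload text);
  CREATE INDEX accounts_id_hash ON accounts USING hash (id);
  EXPLAIN SELECT * FROM accounts WHERE id = 42;  -- equality lookups can use the hash index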
Maybe it would be easier just to remove the "A foreign key must reference columns that either are a primary key or form a unique constraint" restriction, stating that any unique index is enough.
> Maybe it would be easier just to remove the "A foreign key must reference columns that either are a primary key or form a unique constraint" restriction, stating that any unique index is enough.
The problem is that only btree indexes support uniqueness at the moment. That's the relevant unsupported feature, not the ability to have constraints (which essentially just requires uniqueness support in the underlying index):
postgres[27716][1]# SELECT amname, pg_indexam_has_property(oid, 'can_unique') FROM pg_am;
┌────────┬─────────────────────────┐
│ amname │ pg_indexam_has_property │
├────────┼─────────────────────────┤
│ btree  │ t                       │
│ hash   │ f                       │
│ gist   │ f                       │
│ gin    │ f                       │
│ spgist │ f                       │
│ brin   │ f                       │
└────────┴─────────────────────────┘
(6 rows)
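You can see the same limitation by trying to create such an index directly (table name made up):

  CREATE TABLE t (id int);
  CREATE UNIQUE INDEX ON t USING hash (id);
  ERROR:  access method "hash" does not support unique indexes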
Edit: the different uses of the word "constrain" (to constrain, and a constraint) seemed too confusing.
> Is the hash guaranteed to be unique though? There is always a possibility of a hash collision
My point is that hash indexes do not support uniqueness right now, but hash collisions wouldn't be the problem there. It mainly requires some tricky concurrency-aware code (consider cases like where one transaction has just deleted a conflicting row but is still in progress, and a new row with that value is inserted, etc.).
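To illustrate the kind of interlocking needed, this is roughly the behavior btree unique indexes already provide (a sketch, assuming a table t with a unique btree index on id):

  -- session 1
  BEGIN;
  DELETE FROM t WHERE id = 1;     -- conflicting row deleted, but not yet committed

  -- session 2, concurrently
  INSERT INTO t (id) VALUES (1);  -- must wait until session 1 commits or aborts

A unique hash index would need equivalent logic for waiting on in-progress transactions.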
Perhaps I'm missing something, but wouldn't the most efficient possible hash function for a set of unique integers be the identity function? So I don't see how an ID column would benefit.
You can use such a hash function, but it wouldn't be a good one for general-purpose use. Remember that hash tables, and also our hash indexes, use buckets / ranges of hash values to keep the hash table at a reasonable size. If you don't use a hash function with good bit perturbation, you can end up with a lot of values in the same bucket.
Consider e.g. the common implementation where buckets are determined by masking out either the lowest or the highest bits of the hash value. If you mask out the highest bits and your values aren't sequential (pretty common), you end up with a lot of collisions. More extremely, if you mask the low bits and shift, and you only have small values, everything ends up in the first bucket. Therefore what you want is a hash function where a one-bit change at "one side" of the input value is likely to affect most of the remaining bits.
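You can see the difference between an identity "hash" and a perturbing one directly in SQL, using PostgreSQL's built-in hashint4 (a quick sketch; masking the low three bits stands in for bucket selection, and stepping by 8 makes all inputs share those bits):

  SELECT i,
         i & 7           AS identity_bucket,  -- identity hash: every value lands in bucket 0 here
         hashint4(i) & 7 AS perturbed_bucket  -- perturbed hash: values spread across buckets
  FROM generate_series(0, 64, 8) AS s(i);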