>It is possible, with some proper insight and approaches, to sort general datast...

kraghen · on Aug 24, 2017

Discrimination runs in linear time not in the number of items but in the total size of the data. If you have n items each of size k it takes O(kn). Conventional sorting often assumes that you can compare keys of size k in constant time and therefore gets O(n lg n) but a more honest analysis would yield O(kn log n) for (say) merge sort.

kwillets · on Aug 24, 2017

The bound on string sorting is typically written as O(n log n + D), where D is the sum of distinguishing prefixes, ie input length minus some fluff. Since D >= n log n we already have linearity on input length.

kraghen · on Aug 24, 2017

Are you saying that you can sort strings in O(n log n + D) using a conventional sorting algorithm such as merge sort? If so, I don't understand why D would be an additive factor implying that each string is only involved in a constant number of comparisons.

(I wasn't, by the way, only considering strings when discussing variable sized keys -- the beauty of discrimination is that we essentially reduce all sorting problems to string sorting.)

kwillets · on Aug 25, 2017

A conventional sorting algorithm such as Multikey Quicksort.

http://people.mpi-inf.mpg.de/~sanders/courses/algdat03/salgd...

KirinDave · on Aug 25, 2017

When people say "conventional" sorting algorithms, they're usually talking about sorting algorithms based around pairwise comparison functions.

I note on slide 14 of this presentation, it looks like this is sort of the discriminator for selecting a better partitioning scheme. So it looks to me like this actually leverages a similar principle?

As we've seen in this thread and others, there are some other ways to measure and/or process that have different characteristics. Surely all of these deserve attention! So, thanks very much for sharing this.

kraghen · on Aug 25, 2017

Sorry, I was being imprecise. By conventional I meant comparison-based sorting functions that are polymorphic in the element type and thus not allowed to examine individual bytes.

Multikey Quicksort indeed looks like a special case of discrimination, exploiting some of the same principles.

nokcha · on Aug 24, 2017

Thank you, that cleared up my misunderstanding.

joatmon-snoo · on Aug 25, 2017

> There is a well-known proof from information theory [...]

Remember that this proof is talking in terms of the number of comparisons necessary to sort n items. The moment you stop comparing data, like in radix sort (or more generally, bucket sort), that all flies out the window.

NewEntryHN · on Aug 24, 2017

> Are they saying O(n) operations where each operation operates on O(log n) bits

Yes, as in any other claimed complexity about an algorithm. According to your proof finding the maximum in a list is Ω(n log n), which isn't the commonly used measure.