
I think the root of your issue is that the %text% pattern with a leading % is horribly slow.


Addressed in the article:

> The pattern matching query itself is taking only 5 seconds independently. Matching millions of unique URLs is clearly not a problem.


Assuming it only runs once in the query, yes, that's 5 seconds. But (and I'm guessing here) having it in the WHERE clause directly leads to something like an N+1 issue with full table scans, since a right-anchored or unanchored LIKE/ILIKE quickly degrades to that without the proper index.

What I mean to say is that the issue is not solved, it is simply worked around by narrowing down the queried set. Not really Postgres magic, but use of your own knowledge of the data.

I believe a trigram index would solve the issue at its root.
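For reference, a minimal sketch of the trigram approach (the table and column names are made up for illustration, substitute the real ones):

    -- pg_trgm ships in Postgres contrib; its operator classes let a
    -- GIN index serve unanchored LIKE/ILIKE patterns such as '%text%'.
    CREATE EXTENSION IF NOT EXISTS pg_trgm;

    CREATE INDEX requests_url_trgm_idx
        ON requests USING gin (url gin_trgm_ops);

    -- The planner can then use the index for e.g.:
    --   SELECT * FROM requests WHERE url ILIKE '%example.com%';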


Thanks, I'll take a look at this approach.


As thomaswang says, that assumes that the query planner recognizes that it can hoist that query part out of the loop (https://en.wikipedia.org/wiki/Loop-invariant_code_motion)

If it doesn’t, and runs it again and again, time will add up (possibly by less than 5 seconds for each time it gets run because the data could stay in memory)

That theory was only disproved in the “Move the sub-query into a CTE” section.

Also, I’m not familiar with Postgres, so I don’t know whether it could significantly affect timing, but the query benchmarked at 5231.765 ms isn’t identical to that in the larger query. The latter has ::text[] added.

I also would try replacing

    AND r_time > to_timestamp(1547585600)
    AND r_time < to_timestamp(1549177599)

with

    AND r_time BETWEEN to_timestamp(1547585600+1) AND to_timestamp(1549177599-1)

because the query optimizer might not detect that in this query.


BETWEEN is something that is on the "Don't do this" page of Postgres. :)

https://wiki.postgresql.org/wiki/Don%27t_Do_This#Don.27t_use...
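For what it's worth, that page's recommendation for timestamp ranges is a half-open comparison with >= and <. With the values from upthread (assuming the usual half-open intent), roughly:

    -- Half-open range: no BETWEEN endpoint surprises, and no +1/-1
    -- second arithmetic (timestamps carry microsecond precision).
    AND r_time >= to_timestamp(1547585600)
    AND r_time <  to_timestamp(1549177599)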

The ::text[] addition had no impact on perf. I actually constructed this post from my notes at the time, so I might have missed a few things here and there.

The bit about the query planner recognizing things is exactly what the challenge is, to be honest. Like I said at the end, a good mental model of SQL helps a lot, and mine is decent-ish I hope, but I can still learn a lot. :)


That’s why I added the +1 and -1 fragments :-) I thought those computed the smallest larger/largest smaller timestamp.

I didn’t realize that time stamps weren’t integers in PostgreSQL, though, so that is a bug.

Are you sure your query, which uses an interval that is open at both ends, is correct?


> As thomaswang says, that assumes that [...]

In a subsequent comment, yes; but in the original comment I responded to, there was nothing about the surrounding context, only (basically) "left % is slow".


Sorry to say, but I stand my ground. I believe it is the leading % that is the root issue, as I said without further explanation in my initial comment. (It's the same issue on almost every database; I have never seen it not be the main culprit, and the solution that never seems to fail is an index.)


With Oracle 9i, the query “name like ?” was never optimized to use an index. Even when the parameter was “pref%”, the query took minutes. But with the query “name like ‘pref%’” it used the index and was lightning fast. So I had to rewrite my code from using prepared statements to manually escaping and inlining values. Felt weird!
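On the Postgres side, the catch with anchored prefixes is usually the collation: under a non-C collation a default btree index can't serve LIKE 'pref%' either, and you need the pattern operator class. A sketch with hypothetical names:

    -- text_pattern_ops makes a btree index usable for
    -- right-anchored LIKE patterns under non-C collations.
    CREATE INDEX users_name_pattern_idx
        ON users (name text_pattern_ops);

    -- Now the planner can use the index for:
    --   SELECT * FROM users WHERE name LIKE 'pref%';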




