Great post, I really like the approach of adding additional context for WAL CDC. I imagine this could be implemented with something like Hasura pretty easily (you can access hasura session data in the db).
I use both the trigger + audit table approaches in my SaaS (for user-facing activity feeds) and subscribe to CDC WAL changes for callback-like logic (e.g., send a registration email, clear a cache).
I'm not a fan of the pg_notify approach due to the requirement of adding triggers (a performance penalty) and the ~8,000-byte limit per notification payload (you can lose data for anything larger than that).
Application-level tracking or callbacks make the stack dependent on the application for data integrity; I'd rather the database be the source of truth on all things data, especially in the age of microservices.
For something turn-key, Bemi looks like a really good option - especially if you need persistence of the changes!
For something very lightweight, check out WalEx (I'm the maintainer): https://github.com/cpursley/walex
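For context on the payload limit mentioned above: a common workaround is to notify with just the primary key and operation, and let the listener fetch the full row itself. A minimal sketch (table, column, and channel names are invented for illustration):

```sql
-- Send only the key, not the whole row, so the payload stays well
-- under the NOTIFY size limit regardless of row width.
CREATE OR REPLACE FUNCTION notify_order_change() RETURNS trigger AS $$
DECLARE
  row_id bigint;
BEGIN
  IF TG_OP = 'DELETE' THEN
    row_id := OLD.id;
  ELSE
    row_id := NEW.id;
  END IF;
  PERFORM pg_notify(
    'order_changes',
    json_build_object('op', TG_OP, 'id', row_id)::text
  );
  RETURN NULL;  -- AFTER trigger; return value is ignored
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER orders_notify
AFTER INSERT OR UPDATE OR DELETE ON orders
FOR EACH ROW EXECUTE FUNCTION notify_order_change();
```

The trigger cost still applies, but at least the payload limit stops being a correctness issue.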
Has anyone ever implemented a schema where you define a view to return the latest instance of each resource, and it maps INSERT, UPDATE and DELETE to an append only table underneath?
I’ve done some of this with an event-sourcing system, but not the query rewriting. All my reads use the “latest view”, but the app uses functions to write to the underlying schema rather than doing “simple” DML.
I wondered if keeping the storage as append only under the hood would keep the app layer simple.
The end result is almost the same as an audit table except there's much more work to do at read time, and some queries simply cannot scale as they need to with this approach.
In general for most apps you'd rather have 2 writes (write to table and write to audit table) than 1 write to the event log and hard to scale reads.
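For anyone curious what the “latest view over an append-only table” shape looks like, here is a rough sketch (table and column names invented). Note the subquery: filtering out tombstones has to happen *after* picking the newest version, or older versions of deleted rows would resurface.

```sql
-- Append-only history: every write is an INSERT; deletes are tombstones.
CREATE TABLE customer_events (
  customer_id bigint  NOT NULL,
  version     bigint  NOT NULL,
  deleted     boolean NOT NULL DEFAULT false,
  name        text,
  email       text,
  recorded_at timestamptz NOT NULL DEFAULT now(),
  PRIMARY KEY (customer_id, version)
);

-- "Latest instance" view: one row per customer, newest version wins.
CREATE VIEW customers AS
SELECT *
FROM (
  SELECT DISTINCT ON (customer_id) *
  FROM customer_events
  ORDER BY customer_id, version DESC
) latest
WHERE NOT deleted;
```

This also makes the read-time cost concrete: every query through the view pays for the DISTINCT ON sort over the full history.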
This article raises a very good question. Let's say I have a stock news app and want to issue an alert every time a news item is added to the database for AAPL. There could be 1,000 symbols. What is the best way to implement this alerting system in Postgres?
It depends on your reliability and performance constraints. I'd recommend just doing this via hooks on the application layer if it's a simple application. If you're building a production-scale application that requires reliability guarantees, you can do this with https://github.com/BemiHQ/bemi too (I'm one of the maintainers from the article). I saw you mentioned you're using Node.js; check out the Bemi GitHub, there are a few Node.js-compatible libraries too. Feel free to ping me any q's!
Assuming by “issue an alert” you mean calling an external service, I’d go with LISTEN/NOTIFY and use something like graphile-worker to pick it up and process it. Or you could stay in pg and use an FDW with a trigger.
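A rough sketch of the NOTIFY half, assuming a hypothetical news table with symbol/headline columns. Using one channel per symbol means a listener can subscribe only to the symbols it cares about:

```sql
-- Notify on a per-symbol channel, e.g. 'news_aapl', with a small payload.
CREATE OR REPLACE FUNCTION notify_news() RETURNS trigger AS $$
BEGIN
  PERFORM pg_notify(
    'news_' || lower(NEW.symbol),
    json_build_object('id', NEW.id, 'headline', NEW.headline)::text
  );
  RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER news_alert
AFTER INSERT ON news
FOR EACH ROW EXECUTE FUNCTION notify_news();

-- A worker interested in AAPL then runs:
--   LISTEN news_aapl;
```

Keep in mind NOTIFY is fire-and-forget: if no listener is connected when the insert commits, the alert is lost, which is why a queue table or CDC is the safer choice when delivery must be guaranteed.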
Agreed. Triggers always feel really clever and powerful, but they live outside your regular application code and are therefore more or less invisible. This will eventually break the law of least surprise.
I generally use triggers for these kinds of things.
I’d create a message queue table that would be inserted with jobs according to changes in other tables. Makes it easier to handle the message queue asynchronously.
Doing it on the application side in a reliable way needs some kind of distributed-transaction mechanism, which might not be feasible. I don’t want to fire an insert, then fail to send an alert, and find myself inside an if statement thinking “what the fuck do I do now?”
You could insert the job on the application code too. I just like triggers and abuse them which might not be a good idea in all cases.
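The queue-table-plus-trigger pattern described above might look something like this (table and column names are illustrative):

```sql
CREATE TABLE job_queue (
  id           bigserial PRIMARY KEY,
  job_type     text  NOT NULL,
  payload      jsonb NOT NULL,
  created_at   timestamptz NOT NULL DEFAULT now(),
  processed_at timestamptz
);

CREATE OR REPLACE FUNCTION enqueue_alert() RETURNS trigger AS $$
BEGIN
  -- Runs in the same transaction as the INSERT, so the job exists
  -- iff the row does -- no distributed-transaction problem.
  INSERT INTO job_queue (job_type, payload)
  VALUES ('send_alert', to_jsonb(NEW));
  RETURN NULL;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER news_enqueue
AFTER INSERT ON news
FOR EACH ROW EXECUTE FUNCTION enqueue_alert();

-- Workers poll with SKIP LOCKED so concurrent workers don't collide:
--   SELECT * FROM job_queue
--   WHERE processed_at IS NULL
--   ORDER BY id
--   FOR UPDATE SKIP LOCKED
--   LIMIT 10;
```

The async handler then marks `processed_at` once the alert is actually sent, so failures just get retried on the next poll.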
I'm using Django and Celery, so the tasks (e.g. send notifications) are scheduled on DB transaction commit.
The business logic is clear to read, easy to change.
I've enjoyed writing complex SQL, but when it comes to app development I want my engineers to just write Python. Triggers are mostly invisible in the codebase, and I would end up having to do all the maintenance myself.
There is of course more than one way to skin a cat, but I want our codebase to have just one single way.
Another technique not discussed is to create an MD5 hash of each record and to keep track of added and deleted hashes. The hash would exist as an additional column on each table being tracked.
If I understand you right, this seems like a very coarse-grained way to track changes. You can record that a change was made, but not the specific change. It seems like it'd help facilitate something like, say, an etag, but I don't think you could get any auditable data using this alone, could you? Keeping track of added/deleted hashes would probably be best handled by a trigger, unless you really want that in your app code, so this seems a lot like an audit trigger, with very little audit data. Have I misunderstood?
You'd add a hash column to the table you want to track. This would be used in a data warehouse where you want to track what has changed after a truncate-and-load. You'd keep additional tables on the side to track added and deleted hashes for a delta copy to downstream application tables.
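Sketched in SQL, assuming hypothetical staging and snapshot tables, the delta detection comes down to two EXCEPT queries:

```sql
-- Compute a row hash at load time (the column list is illustrative;
-- concat_ws with a delimiter avoids 'ab'+'c' colliding with 'a'+'bc').
UPDATE staging_orders
SET row_hash = md5(concat_ws('|', customer_id, status, total));

-- Added rows: hashes present in the new load but not the last snapshot.
SELECT row_hash FROM staging_orders
EXCEPT
SELECT row_hash FROM previous_hashes;

-- Deleted rows: hashes that disappeared since the last load.
SELECT row_hash FROM previous_hashes
EXCEPT
SELECT row_hash FROM staging_orders;
```

An in-place update of a row shows up as one added and one deleted hash, which is enough for a delta copy even though, as noted above, it doesn't tell you *what* changed inside the row.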
For tables that don't need to scale infinitely, and to trade off in simplicity and keeping logic in the application, I use a version column.
It's the "brute force" type approach. It doesn't scale, has performance penalties, but it's incredibly easy to use and understand, and time walking is trivial.
Relies heavily on DISTINCT ON.
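Roughly like this, with an invented schema — current state and point-in-time state are the same query, just with an extra predicate:

```sql
-- Every UPDATE is written as a new row with version + 1.
CREATE TABLE products (
  id         bigint  NOT NULL,
  version    int     NOT NULL,
  name       text,
  price      numeric,
  updated_at timestamptz NOT NULL DEFAULT now(),
  PRIMARY KEY (id, version)
);

-- Current state: newest version of each product.
SELECT DISTINCT ON (id) *
FROM products
ORDER BY id, version DESC;

-- Time walking: state as of a point in time.
SELECT DISTINCT ON (id) *
FROM products
WHERE updated_at <= '2024-01-01'
ORDER BY id, version DESC;
```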
On truly old tables I've added "maintenance tasks" that push old rows into an archive/audit table.
Edit/note - I realise if you suggest this at any of the big players, you'll probably be fired on the spot.