Hacker News

As much as I detest MongoDB's immaturity in many respects, I found a lot of features that actually make life easier when you design pretty large scale applications (mine was typically serving 2 GB/s of data out of the database, which I like to think is pretty large).

One feature I like is the change event stream, which you can subscribe to. It is fast and reliable for a good reason: the same mechanism is used to replicate MongoDB nodes.

I found you can use it as a handy notification/queueing mechanism (closer to Kafka topics than RabbitMQ). I would not recommend it as any kind of interface between components, but within an application, for its internal workings, I think it is a pretty viable option.
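With the pymongo driver, subscribing looks roughly like this — a minimal sketch, not the poster's actual code; the URI, database, and collection names are invented, and change streams require a replica set:

```python
def follow_notifications(uri="mongodb://localhost:27017"):
    """Yield inserted documents from a hypothetical 'notifications'
    collection as they arrive. Blocks on the server's change stream."""
    from pymongo import MongoClient  # third-party driver, imported lazily

    client = MongoClient(uri)
    coll = client["app"]["notifications"]
    # Only react to inserts; updates/deletes are filtered server-side.
    pipeline = [{"$match": {"operationType": "insert"}}]
    with coll.watch(pipeline) as stream:
        for event in stream:
            yield event["fullDocument"]
```

Each consumer just iterates the generator; the server pushes events, so there is no polling loop to write.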



Funnily enough, we designed one subsystem to use RabbitMQ to enforce linearly committed records into MongoDB to avoid indices. I.e., the routes in RabbitMQ would ensure a GUID-tagged record was spatially localized with other user data on the same host (so the inter-host shovel traffic is minimized).

Depends on the use case, but the original article smells like FUD. The AMQP C client library lets you select how the envelopes are bound/ack'ed on the queue/dead-letter route in the client consumer (you don't usually camp on the connection). Also, the expected runtime constraint should always be included when designing a job queue regardless of the underlying method (again, default expiry routing is built into RabbitMQ)...
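As a sketch of that default expiry routing: per-queue TTL and a dead-letter route are plain queue arguments in RabbitMQ's AMQP 0-9-1 model. The exchange/queue names below are invented, and `declare_job_queue` assumes an already-open pika channel:

```python
# Queue arguments for a job queue with built-in expiry routing.
# Expired or rejected jobs are re-routed to a dead-letter exchange
# instead of silently disappearing.
JOB_QUEUE_ARGS = {
    "x-message-ttl": 60_000,               # expire jobs after 60 s
    "x-dead-letter-exchange": "jobs.dlx",  # route expired/nacked jobs here
    "x-dead-letter-routing-key": "jobs.dead",
}

def declare_job_queue(channel):
    """Declare the job queue on an open pika channel (sketch)."""
    channel.queue_declare(queue="jobs", durable=True,
                          arguments=JOB_QUEUE_ARGS)
```

A consumer bound to `jobs.dlx` then handles the expired work explicitly, which is the runtime constraint the comment is talking about.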

Cheers =)


That's quite interesting: I wonder if someone has done something similar with Postgres WAL streaming?


If I understand correctly, then yes, they have. PostgreSQL supports "logical decoding", which is where you parse the WAL into logical events:

https://www.postgresql.org/docs/15/logicaldecoding.html

Quite a lot of software uses that to do things with these events. For example:

https://debezium.io/documentation/reference/stable/connector...
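A minimal sketch of what logical decoding looks like from a client (assumes `wal_level=logical` on the server and the built-in `test_decoding` output plugin; the slot name and DSN are invented):

```python
# Create a logical replication slot once, then poll it for decoded
# WAL changes. Both functions are standard PostgreSQL built-ins.
CREATE_SLOT_SQL = (
    "SELECT pg_create_logical_replication_slot"
    "('demo_slot', 'test_decoding');"
)
FETCH_CHANGES_SQL = (
    "SELECT lsn, xid, data "
    "FROM pg_logical_slot_get_changes('demo_slot', NULL, NULL);"
)

def fetch_changes(dsn="dbname=app"):
    """Return decoded change rows from the slot (consumes them)."""
    import psycopg2  # third-party driver, imported lazily
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute(FETCH_CHANGES_SQL)
        return cur.fetchall()
```

Tools like Debezium do the same thing continuously over the streaming replication protocol rather than by polling.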


MongoDB's change stream is surprisingly simple to use. You just call the database and get a continuous stream of the documents you are interested in. If you need to restart, you can resume processing from a chosen point. It is not a global WAL or anything like that; it is just a stream of documents with some metadata.


> If you need to restart, you can restart processing from the chosen point

One caveat to this is that you can only start from wherever the beginning of your oplog window is. So for large deployments, and/or situations where your oplog's on-disk size simply isn't tuned properly, you're SOL unless you build a separate mechanism for catching up.
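That catch-up fallback can be sketched with pymongo like this; `full_resync` is a hypothetical application-level routine, and error code 286 is MongoDB's ChangeStreamHistoryLost:

```python
def consume(coll, saved_token=None, full_resync=lambda: None):
    """Resume a change stream from a saved token; if the token has
    aged out of the oplog window, run a separate catch-up and start
    fresh. `coll` is a pymongo collection object (sketch)."""
    from pymongo.errors import OperationFailure  # third-party driver

    try:
        return coll.watch(resume_after=saved_token)
    except OperationFailure as exc:
        if exc.code == 286:  # ChangeStreamHistoryLost: token fell out of oplog
            full_resync()          # rebuild state by other means
            return coll.watch()    # then stream from "now"
        raise
```

Without something like `full_resync`, events between the lost token and "now" are simply gone.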


Which is fine; queueing systems can't store an infinite number of messages either. In the end, messages are stored somewhere, so there is always some limit.


Yep, absolutely. But the side effect I am referring to (and probably wasn't clear enough about) is that the oplog is globally shared across the replica set. So even if your queue collection tops out at like 10k documents max, if you have another collection in the same deployment that's getting 10mm docs/min, your queue window is also gonna be artificially limited.

Putting the queue in its own deployment is a good insulation against this (assuming you don't need to use aggregate() with the queue across collections obviously).


I do agree, but listen... this is supposed to be a handy solution. You know, my app already uses MongoDB, so why do I need another component if I can run my notifications with a collection?

Also, I am a firm believer that you should not put actual data through notifications. Notifications are meant to wake other systems up, not carry gigabytes of data. You can put your data into other storage and notify: "Hey, here is data for 10k new clients that needs to be processed. Cheers!"

The message is meant to ensure a correct processing flow (the message has been received and processed; if it fails, somebody else will process it, etc.), but it does not have to carry all the data.

I have fixed at least one platform that "reached limits of Kafka" (their words not mine) and "was looking for expert help" to manage the problem.

My solution? I got the component that publishes to upload the data as compressed JSON to S3 and post a notification with some metadata and a link to the JSON, and got the client to parse the JSON. Bam, suddenly everything works fine, no bottlenecks anymore. For the cost of maybe three pages of code.
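This pattern (sometimes called a claim check) can be sketched like so; `upload` and `notify` stand in for an S3 client and a message producer, and all names are invented:

```python
import gzip
import json
import uuid

def publish_batch(records, upload, notify):
    """Store the payload out of band and send only a small pointer.

    `upload(key, data)` might wrap an S3 put_object call;
    `notify(message)` might publish to Kafka or RabbitMQ.
    """
    key = f"batches/{uuid.uuid4()}.json.gz"
    body = gzip.compress(json.dumps(records).encode("utf-8"))
    upload(key, body)
    # The notification carries metadata and a link, never the data.
    notify({"type": "batch_ready", "count": len(records), "location": key})
    return key
```

The consumer fetches `location`, decompresses, and parses; the broker only ever sees messages a few hundred bytes long.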

There are few situations where you absolutely need to track so many individual objects that you have to start caring whether hard drives are large enough. And I have managed some pretty large systems.


> I do agree, but listen... this is supposed to be a handy solution. You know, my app already uses MongoDB, so why do I need another component if I can run my notifications with a collection?

We're in agreement, I think we may be talking past each other. I use mongo for the exact use case you're describing (messages as signals, not payloads of data).

I'm just sharing a footgun, for others who may be reading, that bit me fairly recently in a 13 TB replica set dealing with 40mm docs/min ingress.

(It's a high-resolution RF telemetry service, but the queue mechanism is only a minor portion of it, which never gets larger than maybe 50-100 MB. Its oplog window got starved because of the unrelated ingress.)


You have a single Mongo cluster that's writing 40M docs a minute? Can you explain how? I don't think I've ever seen a benchmark for any DB that's gotten above 30k writes/sec.


Sorry for the late reply here, just noticed this. You're correct, that figure was wrong; that metric was supposed to be per day, not per minute. It's actually closer to 47mm per day now, so roughly 33k docs/min.

> I don't think I've ever seen a benchmark for any DB that's gotten above 30k writes/sec

Mongo's own published benchmarks note that a balanced YCSB workload of 50/50 read/write can hit 160k ops/sec on dual 12-core Westmere Xeons with 96 GB RAM [1].

Notably, that figure was optimized for throughput, and the journal was not flushed to disk regularly (all data since the last WiredTiger checkpoint would be lost in the event of a failure). Even in the durability-optimized scenario, though, Mongo still hit 31k ops/sec.

Moving beyond just MongoDB, though, CockroachDB has seen 118k inserts/sec in an OLTP workload [2].

[1] https://www.mongodb.com/scale/mongodb-benchmark [2] https://www.cockroachlabs.com/docs/stable/performance.html#t...


I've used that as a cache flush mechanism when some cached records were updated/deleted; the simplicity was the key.



