A moral hazard with all cloud providers is that their PaaS services are typically billed on consumption.
So what incentive do you imagine they have to make those services efficient?
My favourite example is Log Analytics. It can easily cost 20% as much as the virtual machines it is monitoring! If you have a very heavily loaded website and you're logging every HTTP request, it can exceed the cost of the service it is monitoring.
They charge you a ludicrous $2,380 per terabyte of ingested data. This is 20x the cost of the underlying storage, even if it's Premium SSD! For comparison, AWS charges just $500, which is still overpriced.
Now consider: If you're an Azure software developer and you find a way to reduce the bloat in the log data stream format, what do you think the chances are of getting that approved with management?
They have a firehose spewing money in their cloud. I can't imagine them ever saying: "I think it's a good idea to turn that down to a mere trickle!"
As others have pointed out, all of their other services have similar moral hazards: Bastion, NAT Gateways, Private Endpoints, Backup, Snapshots, etc...
Even though it takes some ops skill to set up, this
> They charge you a ludicrous $2,380 per terabyte of ingested data. This is 20x the cost of the underlying storage, even if it's Premium SSD! For comparison, AWS charges just $500, which is still overpriced.
just makes me happy about our monitoring cluster at the German hoster Hetzner. The old systems are at 40 euros/month for 900 GB of storage, and the upgraded ones are 40 euros/month for 1.3 TB. There's some manpower per month in there, and some egress costs, but it's still very cheap.
It's funny this is coming up now. Just yesterday in GCP I was trying to figure out our billing and looking at what was costing such a huge amount. I couldn't find any way to map the actual service being used to the price, either in the normal reporting or in the billing cost table export. The only way I could figure out how to do it was to enable log export. They used to have an option to download that as a file; they disabled that a while ago, and now it's only available as a BigQuery export, which is exported once a day. I was like, "Why would they do that?" Oh, because now I have to set up BigQuery and pay for all that. So I have to pay extra JUST TO SEE my detailed billing information. Pretty ridiculous.
We really should revolt against this. I should be able to have a view of all of my billing without having to pay extra. It also shouldn't be hidden behind a BigQuery export; it should be easy to view what is being spent and what is causing it.
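For anyone else stuck with the export, a query along these lines is roughly what it takes to get a per-service/per-SKU breakdown. The project, dataset, and table names below are placeholders for the auto-generated table from your own export:

```python
# Rough sketch: query the standard billing export table and group cost by
# service/SKU. The column names match the standard billing export schema;
# the project/dataset/table names are placeholders.
from google.cloud import bigquery

client = bigquery.Client(project="my-billing-project")  # hypothetical project id

sql = """
SELECT
  service.description AS service,
  sku.description     AS sku,
  SUM(cost)           AS total_cost
FROM `my-billing-project.billing.gcp_billing_export_v1_XXXXXX_XXXXXX_XXXXXX`
WHERE usage_start_time >= TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 30 DAY)
GROUP BY service, sku
ORDER BY total_cost DESC
LIMIT 50
"""

# Iterating the query job runs the query and waits for the results.
for row in client.query(sql):
    print(f"{row.service} / {row.sku}: {row.total_cost:.2f}")
```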
I used to work in GCP. The billing report UI in the billing account section shows per-SKU usage, though a detailed breakdown can be a nontrivial monster: some customers launch thousands of VMs or data-processing jobs per day.
Right, that's correct. It shows SKU usage, but not mapped to an actual instance ID. And we have exactly the scenario you're describing: lots of the same SKU with variable cost, and no way to correlate it without using BigQuery, it seems.
For all of the faults of Azure, they let you do this reporting directly in the Portal. You can slice and dice the data without having to spin up infrastructure.
Same story with Cosmos. On an IoT pipeline I set up with Event Hubs, Cosmos needed $15k/month to keep up with the flow without generating 429s (i.e., causing the ingestion function to drop events). The RUs shouldn't have needed to be that high, but due to upstream providers there was a regular spike every five minutes that would exceed the average RU need. RUs are a hard per-second cap; there's no averaging or windowing. I had to set the provisioned RUs at a level that guaranteed 40% unused capacity.
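To make the shape of the problem concrete, here's a toy calculation. The numbers are invented, not my actual traffic, but they show why a hard per-second cap forces you to pay for the worst second rather than the average:

```python
# Toy numbers (not real traffic) showing why a hard per-second RU cap
# forces overprovisioning when ingest is spiky.
BASELINE_RU = 6_000     # steady-state RU/s the pipeline actually needs
SPIKE_RU = 10_000       # RU/s during the upstream providers' spike
SPIKE_SECONDS = 10      # spike length, recurring every five minutes
PERIOD_SECONDS = 300

# Average RU/s over a full period -- what you'd pay if Cosmos averaged.
average = (SPIKE_RU * SPIKE_SECONDS
           + BASELINE_RU * (PERIOD_SECONDS - SPIKE_SECONDS)) / PERIOD_SECONDS

# What you must actually provision: the peak second, or you get 429s.
provisioned = SPIKE_RU

print(f"average need:  {average:,.0f} RU/s")   # ~6,133 RU/s
print(f"provisioned:   {provisioned:,} RU/s")
print(f"idle capacity: {1 - average / provisioned:.0%}")  # ~40% wasted
```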
So I tried setting up the same ingestion database with Kafka Connect and Mongo on a $200/month VM. It worked flawlessly, and Azure helpfully suggested I downsize that VM because it was underutilized based on CPU statistics.
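The replacement wasn't exotic, either. This is roughly the shape of the sink connector setup, posted to the Kafka Connect REST API; the names, topic, and connection URI here are placeholders rather than the real config:

```python
# Sketch of a MongoDB sink connector registration via the Kafka Connect
# REST API. All names/URIs are placeholders, not the actual deployment.
import requests

connector = {
    "name": "iot-mongo-sink",  # hypothetical connector name
    "config": {
        "connector.class": "com.mongodb.kafka.connect.MongoSinkConnector",
        "connection.uri": "mongodb://localhost:27017",
        "database": "iot",
        "collection": "events",
        "topics": "iot-events",
        "key.converter": "org.apache.kafka.connect.storage.StringConverter",
        "value.converter": "org.apache.kafka.connect.json.JsonConverter",
        "value.converter.schemas.enable": "false",
    },
}

# Kafka Connect's REST endpoint, default port 8083 on the VM.
resp = requests.post("http://localhost:8083/connectors", json=connector)
resp.raise_for_status()
```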
What incentive do the Cosmos engineers have to make it more efficient, or to make the RU pricing model more reflective of actual usage? Zero. It's a money hose. Why would you turn that off?
I saw CosmosDB turn up in some recommended multi-region designs, and I had a customer with DR requirements so I looked into it.
I started by spinning up a small one in my lab but when I saw the pricing I back-pedalled very, very fast. Deleted the whole Resource Group and never looked into it again.
Would it not be possible for you to stream the data to a data lake and then, from there, either do bulk inserts or smooth out the inserts to a predictable rate to remove the peaks?
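As a sketch of what I mean by smoothing, assuming a worker you control sits between the lake and the database, something like a fixed-rate drain from a buffer:

```python
# Minimal sketch of the "smooth out the inserts" idea: drain a buffer at a
# fixed rate so the database only ever sees the average load, not the spikes.
# Assumes a worker you control between the data lake and the database.
import time
from queue import Queue, Empty

buffer: Queue = Queue()   # filled by whatever reads from the data lake
TARGET_RATE = 1_000       # inserts/second the DB is provisioned for
BATCH = 100               # inserts per iteration

def drain(insert_batch):
    """Call insert_batch(items) at a steady pace regardless of inflow spikes."""
    while True:
        items = []
        try:
            while len(items) < BATCH:
                items.append(buffer.get_nowait())
        except Empty:
            pass  # buffer ran dry; insert whatever we have
        if items:
            insert_batch(items)
        time.sleep(BATCH / TARGET_RATE)  # fixed-rate pacing
```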
There were a variety of ways for me to do so by invoking more Azure services, but I stopped trying at that point, because even after smoothing it out, Cosmos would still be 50x as expensive as a basic VM running Mongo.
But also, every time I start stringing together cloud services, I experience two things: first, exploding complexity, because now I'm adding points of failure, integrations, and transformations to keep it all running; and second, a sense that the whole point of the cloud is to simplify things, to offer canned services and features that save me the trouble of doing this in code for myself. Once I'm using cloud features to work around cloud limitations, I bail out, because if I'm going to spend that time (and money), I'm going to get the benefits of something much more direct.
Internally, it will probably be approved. For example, I am sure that Google Drive applies basic compression and deduplication to uploaded files, but if I upload 10 files of 10 GB of zeros, they're going to count as 100 GB, not the few MB they actually write to disk.
(There are good reasons for this, but still: declared consumption is different from internal consumption.)
Log Analytics uses a columnar compression format on-disk, so ingested data is likely compressed by anywhere between 10:1 and 100:1, maybe even higher.
However, the wire format is super verbose JSON.
They bill per GB of the latter, not the former.
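You can get a feel for the gap with a few lines of Python. The record shape below is made up, but it's representative of the kind of verbose JSON these agents emit:

```python
# Back-of-the-envelope demo: verbose JSON log records vs. their compressed
# size. The record layout is invented, but representative of agent output.
# Identical records overstate the ratio; real columnar stores on real data
# land in the 10:1 to 100:1 range mentioned above.
import json
import zlib

record = {
    "TimeGenerated": "2024-01-01T00:00:00.0000000Z",
    "Computer": "vm-web-prod-001",
    "SourceSystem": "OpsManager",
    "Category": "W3CIISLog",
    "csMethod": "GET",
    "csUriStem": "/api/v1/orders",
    "scStatus": "200",
    "TimeTaken": "12",
}

# 100,000 near-identical records, as they'd cross the wire.
wire = "\n".join(json.dumps(record) for _ in range(100_000)).encode()
packed = zlib.compress(wire, level=6)

print(f"wire size:       {len(wire) / 1e6:,.1f} MB")
print(f"compressed size: {len(packed) / 1e6:,.2f} MB")
print(f"ratio:           {len(wire) / len(packed):,.0f}:1")
```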
To put things in perspective: How many $ of CPU time do you imagine it takes to column-compress 1 TB of data? I would estimate that a single modern CPU core could do this in a minute or so. Factor in various inefficiencies and make it a super generous 1 hour. At spot pricing, that's about $0.01! One cent!!!
The larger cost would be bandwidth. Azure charges a huge markup for traffic (just like AWS), so for example zone-to-zone data costs $10 per terabyte at retail pricing (not internal costing).
They store that data for 30 days "for free" (lol). Assume a worst-case compression ratio of 10:1; that means they have to retain 100 GB for 30 days, which is about $9.43 on a Premium SSD at retail pricing.
So their hosting cost for Log Analytics is something like $20 per TB ingested, but they charge well over $2,000 for it.
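Tallying that up, using the rough retail figures from above, per TB of ingested log data:

```python
# Tallying the estimates above -- all figures are the rough retail numbers
# from this thread, per TB of ingested log data.
cpu_compress = 0.01        # ~1 core-hour at spot pricing, per the estimate above
bandwidth = 10.00          # zone-to-zone transfer at $10/TB retail
storage_30_days = 9.43     # 100 GB (10:1 compression) on Premium SSD for a month

cost = cpu_compress + bandwidth + storage_30_days
price = 2_380.00           # what Log Analytics bills per ingested TB

print(f"estimated cost: ${cost:,.2f} / TB")     # ~$19.44
print(f"billed price:   ${price:,.2f} / TB")
print(f"markup:         {price / cost:,.0f}x")  # ~120x
```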
That 100:1 markup is pretty sweet if your KPIs are based on recurring revenue.
There is no way in hell they will ever "optimise" this. Any accidental improvement will be rolled back or "adjusted" to ensure the revenue stream doesn't fall off a cliff.
Have you not wondered why it's taken them so long -- over ten years -- to enable any feature to filter logs at the source?
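(For what it's worth, the source-side filtering that finally exists is the data collection rule transformation. The fragment below, sketched as a Python dict rather than the full ARM template, is roughly the shape of it; the stream and destination names are placeholders, and the KQL is just a trivial severity filter.)

```python
# Rough shape of the data-collection-rule fragment for source-side
# filtering (transformKql), sketched as a Python dict rather than the
# full ARM template. Stream/destination names are placeholders.
data_flow = {
    "streams": ["Microsoft-Syslog"],
    "destinations": ["myWorkspace"],  # hypothetical workspace destination name
    # KQL applied at ingestion time: drop debug-level noise before it's billed.
    "transformKql": "source | where SeverityLevel != 'debug'",
}
```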