I have a question about the new tagging feature in InfluxDB 0.9 - hopefully you can clear up my confusion.
My understanding is that tags are great, as they are indexed and hence quick to search.
However, they are not suitable for situations where you have very high cardinality (> 100,000). Assuming this isn't the case, where else would you use fields over tags?
For example - apart from the indexing and cardinality issue - what are the pros/cons of
Or say you have a logline, and you're parsing various attributes out of it (meaning all the values are quite tightly associated with each other) - would you split them into separate series, each with their own set of tags, or would you store them in a single series with multiple fields?
For fields, you generally only want to put two pieces of data together in a single point (thus in two fields) if you're always going to be querying them together, either by pulling the values out or by filtering on them in a WHERE clause.
Unrelated to this, we've created a line protocol to write data in and it's a much more compact way to show a point:
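As a rough sketch of the general shape of a line (measurement name, comma-separated tags, a space, the fields, then an optional timestamp), here is a small Python helper. The function name `format_point` and the sample measurement, tag and field names are made up for illustration and are not part of any InfluxDB client. To sidestep version-specific integer syntax, it assumes numeric fields are written as floats.

```python
def _escape(text, chars):
    """Backslash-escape each character in `chars` that occurs in `text`."""
    for ch in chars:
        text = text.replace(ch, "\\" + ch)
    return text


def format_point(measurement, tags, fields, timestamp_ns=None):
    """Render one point as a single line-protocol string (illustrative helper)."""
    # Measurement name, followed by tags sorted by key.
    head = _escape(measurement, ", ")
    for key in sorted(tags):
        head += "," + _escape(key, ",= ") + "=" + _escape(str(tags[key]), ",= ")

    # Fields: strings are double-quoted, everything else is written as a float
    # (an assumption made here to keep the sketch simple).
    parts = []
    for key in sorted(fields):
        value = fields[key]
        if isinstance(value, str):
            rendered = '"' + value.replace('"', '\\"') + '"'
        else:
            rendered = repr(float(value))
        parts.append(_escape(key, ",= ") + "=" + rendered)

    line = head + " " + ",".join(parts)
    if timestamp_ns is not None:
        line += " " + str(timestamp_ns)  # nanoseconds since the epoch
    return line


print(format_point(
    "query",                               # hypothetical measurement
    {"host": "web01", "db": "orders"},     # tags: indexed, low cardinality
    {"duration_ms": 12.5},                 # field: the measured value
    1434055562000000000,
))
```

The whole point fits on one line of text, which is what makes it compact compared with a nested JSON body.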
Aha, fair enough - so if you only sometimes query the values together, you should separate them out into discrete series =).
For a logline, the sort of metrics we'd get would be things like query duration, database, client ID etc. You'd often query them together, but you'd also query them separately, so I guess that also makes sense as multiple separate series.
Interesting, that line protocol looks cool and seems like it'd be more efficient on the wire.
Will the drivers (e.g. Python, Go) be updated to take advantage of this new endpoint?
Finally, I see there's also a ticket open for binary protocols:
The Go client has already been updated. We'll be updating the Ruby and front-end JS clients. Hopefully the community will jump on updating the other clients once we release 0.9.0. It's a super simple protocol so I don't imagine it'll take much work.
The binary protocol is probably much less useful now. HTTP + GZip of the line protocol will already saturate what our storage engine can do at this point. In fact, I'm going to close that out right now...
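To give a feel for why gzip over a text protocol goes a long way, here is a minimal Python sketch that joins a batch of lines and compresses the batch with the standard `gzip` module. The helper name `compress_batch` and the sample lines are invented for illustration. The `Content-Encoding: gzip` header is the standard HTTP way to mark a compressed body, and using it for writes here is an assumption. Nothing is sent over the network.

```python
import gzip


def compress_batch(lines):
    """Join line-protocol lines with newlines and gzip the result.

    Returns (raw_bytes, gzipped_bytes) so the two sizes can be compared.
    """
    raw = "\n".join(lines).encode("utf-8")
    return raw, gzip.compress(raw)


# Hypothetical batch: many points that share a measurement and tag set,
# which is exactly the kind of repetition gzip handles well.
lines = [
    "query,db=orders,host=web%02d duration_ms=%s %d"
    % (i % 4, i * 0.5, 1434055562000000000 + i)
    for i in range(1000)
]

raw, body = compress_batch(lines)

# Headers a client would attach to the HTTP write (assumed, not sent here).
headers = {"Content-Encoding": "gzip"}

print(len(raw), "bytes raw ->", len(body), "bytes gzipped")
```

Because measurement names and tag sets repeat from line to line, most of the text overhead compresses away, which is consistent with the point that a separate binary format would add little.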