Base 2048

wongarsu · on May 7, 2022

For comparison:

A sha512 hash in hex: 309ecc489c12d6eb4cc40f50c902f2b4d0ed77ee511a7c7a9bcd3ca86d4cd86f989dd35bc5ff499670da34255b45b0cfd830e81f605dcf7dc5542e93ae9cd76f

Same in this Base2048: ЗཟǷњϳݫЬߦՏԈ௰ڿƫ௪தͶޡഺཀވࡌੳٿ༲৩ত༥၄ঙџڸࠑحϷгଘƩƴߢய߅ϚƐγ๓ۑఞ

The encoding also seems to be HN-safe, using no emojis or other things HN filters. The font used here is lacking some of the characters, but that shouldn't matter if you just copy-paste.

Beltalowda · on May 8, 2022

That base2048 string is 114 bytes (vs 129 for hex), so it's only significantly "shorter" in what you see, and what you see isn't exactly human-readable. It's actually less readable than the hex string: for a quick visual inspection, comparing just the first and last 2 or 3 characters is enough, but that's a lot harder to do with all those unfamiliar Cyrillic, Tibetan, Tamil, Armenian, Arabic, Samaritan, Greek, etc. characters in there.

jbverschoor · on May 8, 2022

  Base2048: ЗཟǷњϳݫЬߦՏԈ௰ڿƫ௪தͶޡഺཀވࡌੳٿ༲৩ত༥၄ঙџڸࠑحϷгଘƩƴߢய߅ϚƐγ๓ۑఞ (47 characters, 113 bytes)
  Base64:   MJ7MSJwS1utMxA9QyQLytNDtd+5RGnx6m808qG1M2G+YndNbxf9JlnDaNCVbRbDP2DDoH2Bdz33FVC6TrpzXbw== (88 bytes)
  Base58:   yP4cqy7jmaRDzC2bmcGNZkuQb3VdftMk6YH7ynQ2Qw4zktKsyA9fk52xghNQNAdkpF9iFmFkKh2bNVG4kDWhsok (87 bytes)
  Original:  64 bytes, 128 bytes in hex.

I'd prefer base58. Maybe compress it first.
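The hex and Base64 lengths above can be reproduced with the standard library. This is only a sketch: the Base2048 character count is estimated as ceil(bits / 11) rather than produced by a real Base2048 encoder, and Base58 is left out because Python ships no encoder for it.

```python
import base64
import math

# The SHA-512 digest quoted at the top of the thread, split in two for width.
HEX_DIGEST = (
    "309ecc489c12d6eb4cc40f50c902f2b4d0ed77ee511a7c7a9bcd3ca86d4cd86f"
    "989dd35bc5ff499670da34255b45b0cfd830e81f605dcf7dc5542e93ae9cd76f"
)

raw = bytes.fromhex(HEX_DIGEST)
hex_len = len(HEX_DIGEST)
b64 = base64.b64encode(raw).decode("ascii")

# Assumption: every Base2048 character carries 11 bits, so the character
# count is the bit length divided by 11, rounded up.
base2048_chars = math.ceil(len(raw) * 8 / 11)

print("raw bytes:     ", len(raw))
print("hex chars:     ", hex_len)
print("base64 chars:  ", len(b64))
print("base2048 chars:", base2048_chars)
```

The UTF-8 byte count of the Base2048 form depends on which code points the encoder picks (two or three bytes each), so it cannot be derived from the character count alone.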

vonwoodson · on May 7, 2022

That’s 2,598 tweets to transmit a single megabyte.
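A quick sketch of where that figure comes from, assuming a 280-character tweet, 11 bits per Base2048 character, and a megabyte of 1,000,000 bytes (the constant and function names are illustrative only):

```python
import math

TWEET_CHARS = 280    # assumed per-tweet character limit
BITS_PER_CHAR = 11   # log2(2048)


def bytes_per_tweet(chars: int = TWEET_CHARS, bits: int = BITS_PER_CHAR) -> int:
    """Whole bytes that fit in one tweet."""
    return chars * bits // 8


def tweets_needed(n_bytes: int) -> int:
    """Tweets required to carry n_bytes, rounding up to a whole tweet."""
    return math.ceil(n_bytes / bytes_per_tweet())


print(bytes_per_tweet())         # 385
print(tweets_needed(1_000_000))  # 2598
```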

So, if I wanted to tweet the movie Spaceballs, encoded at 4K UHD (20MB/s), I estimate that I will need to make 20,945,455 tweets.

Twitter limits the posts made by nobodies (like me) to 2,400 tweets per day… so, it is going to take me just under 24 years to complete my task.

I’d better get started.

rpastuszak · on May 7, 2022

You can fit much more in a single tweet!

You can do that with base64 + gzip + (and that’s the important one) _wrapping the content in a url_.
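A minimal sketch of the gzip + base64 + URL idea. The base URL below is a placeholder, and putting the payload in the URL fragment is an assumption; the exact scheme used in the linked tweet isn't shown in this thread. As far as I know, the trick pays off because Twitter counts a link as a fixed number of characters regardless of its real length.

```python
import base64
import gzip

BASE_URL = "https://example.com/#"  # hypothetical host; payload rides in the fragment


def pack(content: bytes) -> str:
    """Compress the content and append it to the URL as URL-safe base64."""
    compressed = gzip.compress(content)
    token = base64.urlsafe_b64encode(compressed).decode("ascii")
    return BASE_URL + token


def unpack(url: str) -> bytes:
    """Reverse of pack(): strip the prefix, decode, decompress."""
    token = url[len(BASE_URL):]
    return gzip.decompress(base64.urlsafe_b64decode(token))


page = b"<canvas></canvas>" * 200  # stand-in for a small, repetitive web page
url = pack(page)
print(len(page), "bytes of content ->", len(url), "characters of URL")
```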

Here’s pong (3.5kb) stored in a single tweet: https://twitter.com/rafalpast/status/1316836397903474688?s=2...

Source: I was bored, curious if I could turn twitter into a CDN

beepbooptheory · on May 7, 2022

If/when we get an edit button, I feel like twitter could be a CDN and a CMS. Could make a "serverless" blog that is just hijacking Twitter's server. Comments and media hosting built in!

inportb · on May 7, 2022

There's no need to edit or delete. Treat Twitter as a changes feed. Encode your edits and deletions as new tweets.
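As a rough sketch of the changes-feed idea: each tweet becomes one entry in an append-only log, and readers rebuild the current state by replaying it. The (op, key, value) entry format is invented here for illustration.

```python
def replay(feed):
    """Fold an append-only list of (op, key, value) entries into current state."""
    state = {}
    for op, key, value in feed:
        if op == "put":
            state[key] = value      # create or overwrite (an "edit")
        elif op == "delete":
            state.pop(key, None)    # deleting a missing key is a no-op
        else:
            raise ValueError(f"unknown op: {op!r}")
    return state


# Each tuple stands in for one tweet, oldest first.
feed = [
    ("put", "post-1", "Hello wrld"),
    ("put", "post-2", "Second post"),
    ("put", "post-1", "Hello world"),  # an edit, expressed as a new entry
    ("delete", "post-2", None),        # a deletion, expressed as a new entry
]

print(replay(feed))  # {'post-1': 'Hello world'}
```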

teaearlgraycold · on May 7, 2022

Git on Twitter then

btown · on May 7, 2022

A Twit-chain, if you will

teaearlgraycold · on May 7, 2022

Delete this comment now before you have a dozen offers from various VCs.

Retr0id · on May 7, 2022

You can fit much more into a tweet if you consider image data, too.

tonnydourado · on May 7, 2022

Problem is that there's no guarantee that twitter won't apply lossy transformations to your data (in fact, it's guaranteed that it does). So either you would have to encode the data with lots of redundancy and/or error correction, or you have to encode it in, like, a QR code or something similar, and rely on image recognition to extract it.

Inside the range of characters supported by Twitter, your data is "guaranteed" to not change.

mananaysiempre · on May 7, 2022

In 2018 people discovered that Twitter would recompress images but leave the embedded ICC profile, if present, intact, and used that to make a Twitter-surviving JPEG+ZIP polyglot[1], although that got patched out once someone used it as a C&C channel[2]. Apparently that still worked (and was utilized for the same purpose) on Steam user profiles in 2021[3].

[1] https://twitter.com/David3141593/status/1057042085029822464

[2] https://www.trendmicro.com/en_us/research/18/l/cybercriminal...

[3] https://twitter.com/miltinh0c/status/1392944896760238080

Retr0id · on May 7, 2022

This technique is still fully functional: https://github.com/DavidBuchanan314/tweetable-polyglot-png

dhosek · on May 7, 2022

Or the 1000 characters of the Alt tag. Since you can have multiple images per tweet, your limit just jumped without having to worry about images getting edited.

_8j50 · on May 7, 2022

Or just transmit a URL to the media on some site, IPFS, .onion, etc.

I don't know their use case but I was thinking more for malware command-and-control and red teaming.

zamadatix · on May 7, 2022

One of the funniest encoding things I've ever seen is Shrek encoded to just under 8 MB (128x72, 8 FPS, with audio! Actually understandable to watch, if only just barely). With this that would look to be 21,655 tweets or, using the API to post base65536 tweets, 14,888 tweets.

At 2,400 tweets per day that's just over 9 days for base2048 or just under a week for base65536. Doable!
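Those numbers hang together under a couple of assumptions: 280 characters per tweet, 11 bits per Base2048 character and 16 per Base65536 character, and no per-tweet framing overhead. A small sketch working backwards from the tweet counts quoted above:

```python
TWEETS_PER_DAY = 2_400

# Whole bytes per 280-character tweet for each encoding (assumed limits).
BYTES_PER_TWEET = {
    "base2048": 280 * 11 // 8,
    "base65536": 280 * 16 // 8,
}

# Tweet counts quoted in the comment above.
TWEET_COUNTS = {"base2048": 21_655, "base65536": 14_888}

for name, count in TWEET_COUNTS.items():
    capacity = count * BYTES_PER_TWEET[name]  # upper bound on the file size
    days = count / TWEETS_PER_DAY
    print(f"{name}: {BYTES_PER_TWEET[name]} B/tweet, "
          f"up to {capacity:,} bytes, {days:.2f} days")
```

Both counts imply a file of roughly 8.34 million bytes, i.e. just under 8 MiB.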

Dylan16807 · on May 7, 2022

Well you don't have to use such a decadent encoding! If you aim to upload 10 seconds a day you could be done in a year and a half. 900KB can handle 10 seconds just fine, especially with a cutting-edge codec.

_Algernon_ · on May 7, 2022

Inb4 some masochist makes a streaming platform that uses twitter as a backend.

codetrotter · on May 7, 2022

Related but opposite, a tool I made a couple of years ago.

https://github.com/ctsrc/Base256

> Encode and decode data in base 256

> […]

> You might expect data encoded in base 256 to be more space efficient than data encoded in base 16, but with this particular set of symbols, that is not the case! Likewise, you have to type more, not less, than you would if you use my base 256 instead of base 16. So why?

> The purpose […] is to make manual input of binary data onto a computer less error-prone compared to typing in the base 16 or base 64 encoding of said data. Whereas manually typing out base 64 is painful, and base 16 makes it easy to lose track of where you are while typing, [this program] attempts to remedy both of these problems by using 256 different words from the EFF autocomplete-friendly wordlist.

Disclaimer: I am not using this base 256 program myself, even though I authored it. It just serves as a fun little experiment.
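To illustrate the one-word-per-byte idea, here is a sketch that does not use the EFF wordlist or the linked tool's actual word set. It fabricates 256 pronounceable tokens from 16 onsets and 16 endings purely as a stand-in.

```python
# Stand-in vocabulary: 16 onsets x 16 endings = 256 unique tokens.
# This is NOT the EFF wordlist; it only keeps the example self-contained.
ONSETS = ["b", "d", "f", "g", "h", "j", "k", "l",
          "m", "n", "p", "r", "s", "t", "v", "z"]
ENDINGS = ["a", "e", "i", "o", "u", "an", "en", "in",
           "on", "un", "ar", "er", "ir", "or", "ur", "ay"]
WORDS = [onset + ending for onset in ONSETS for ending in ENDINGS]
INDEX = {word: value for value, word in enumerate(WORDS)}


def encode(data: bytes) -> str:
    """One word per byte, separated by spaces."""
    return " ".join(WORDS[b] for b in data)


def decode(text: str) -> bytes:
    """Look each word up again; unknown words raise KeyError."""
    return bytes(INDEX[word] for word in text.split())


print(encode(bytes([0, 1, 255])))  # ba be zay
```

With a real wordlist the output is longer than hex, as the README says; the gain is that a mistyped word is usually not a valid word, so decoding fails loudly instead of silently.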

cylinder714 · on May 7, 2022

...This is brilliant, actually.

rpastuszak · on May 7, 2022

Hehe, this is both wonderfully useless and impressive.

I was experimenting with using Twitter as a CDN, here’s pong (3.5kb) in a single Tweet:

https://twitter.com/rafalpast/status/1316836397903474688?s=2...

James-Livesey · on May 7, 2022

This is something I used for the "tweet your BASIC code" functionality in atto (https://jamesl.me/atto)... Trying to fit an advanced BASIC program in a single Tweet isn't too easy, but at least encoding it in Base 2048 beforehand is!

Example: https://twitter.com/jthecoder/status/1412848719737851905

gpderetta · on May 7, 2022

Thought I recognized the author! That is qntm of "There Is No Antimemetics Division" fame.

losvedir · on May 7, 2022

Heh, well that's one way to work around the inane Twitter character limit...!

My personal Musk dream is that he'll abolish that. If I never have to see tweets of pictures of text or tweets ending in `/n` again, it will be too soon.

Though, I sort of remember that Twitter was architected initially in a way that relied on the short tweets and it was a surprisingly complex change to even bump it up the little that they did recently, so who knows.

Someone · on May 7, 2022

Twitter started as “an individual using an SMS service to communicate with a small group”, and SMS messages are 140 bytes (that may store 160 characters when using a 7-bit encoding such as https://en.wikipedia.org/wiki/GSM_03.38, but they didn’t go there)

I guess bumping the limit up while keeping the ability to interact with all kinds of SMS systems and ancient phones was a challenge, or just required waiting a few years for some of the weirder systems to die out.

opencl · on May 7, 2022

The 140 byte limit actually didn't last very long; it has been 140 Unicode code points since at least a decade ago. You could use 140 CJK (Chinese/Japanese/Korean) characters or 140 emojis, and if it didn't fit in an SMS they just sent a link to the tweet.

Interestingly when they doubled the character limit they also started double counting CJK characters so the limit is still effectively 140 for those languages.
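A simplified sketch of that double counting. The code point ranges below are my recollection of the open-source twitter-text configuration and should be treated as an assumption; real counting also handles URLs, emoji sequences, and normalization, all ignored here.

```python
MAX_WEIGHTED_LENGTH = 280

# Assumed "light" ranges (inclusive) whose characters count once;
# everything else, including CJK, counts twice.
LIGHT_RANGES = [
    (0x0000, 0x10FF),
    (0x2000, 0x200D),
    (0x2010, 0x201F),
    (0x2032, 0x2037),
]


def char_weight(ch: str) -> int:
    cp = ord(ch)
    return 1 if any(lo <= cp <= hi for lo, hi in LIGHT_RANGES) else 2


def weighted_length(text: str) -> int:
    return sum(char_weight(ch) for ch in text)


def fits(text: str) -> bool:
    return weighted_length(text) <= MAX_WEIGHTED_LENGTH


print(fits("a" * 280))       # 280 Latin letters fit
print(fits("\u6f22" * 140))  # 140 CJK characters fit
print(fits("\u6f22" * 141))  # 141 do not
```

Under these assumed ranges, a character such as U+0F5F (Tibetan) counts once, which is the kind of code point a Base2048 alphabet would need to draw from.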

lifthrasiir · on May 7, 2022

> Interestingly when they doubled the character limit they also started double counting CJK characters so the limit is still effectively 140 for those languages.

CJK languages do tend to be relatively dense when written and counted by their code points.

wongarsu · on May 7, 2022

If they abolish the character limit, do they also rebrand the platform from twitter to blogger? /s

I think there are some interesting observations in how both twitter and tiktok set out with short maximum lengths, establishing a culture of short, easy to digest messages, before relaxing the limit a bit. I'm not sure how much you can relax the limit before you turn the platform into something else entirely. But on the other hand there is a pattern of users circumventing the limit anyways. It will be interesting to watch how it develops over the years.

evanb · on May 7, 2022

Twitter’s limit used to be 140 characters. Then it became 280. By simple arithmetic progression the next limit will be…

mike_hock · on May 7, 2022

280. Then the next one will be 187 (186.6666...).

    n[0] = 140

    n[k] = 2/k * n[k-1]
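Evaluating that recurrence with exact fractions, as a quick sketch (the function name is made up):

```python
from fractions import Fraction


def limits(count: int, start: int = 140):
    """First `count` terms of n[0] = start, n[k] = 2/k * n[k-1]."""
    n = Fraction(start)
    terms = [n]
    for k in range(1, count):
        n = Fraction(2, k) * n
        terms.append(n)
    return terms


for k, n in enumerate(limits(4)):
    print(k, float(n))
```

The first four terms are 140, 280, 280, and 560/3 (about 186.67), matching the numbers in the comment.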

evanb · on May 7, 2022

That is not an arithmetic progression; I chose my words carefully :)

https://en.wikipedia.org/wiki/Arithmetic_progression

droidist2 · on May 7, 2022

Knowing Musk he'd totally be down for changing it to 420.

ant6n · on May 7, 2022

Or perhaps 5420.

jcims · on May 7, 2022

The prophecy will be fulfilled.

divbzero · on May 7, 2022

What about encoding data for image tweets? QR codes are limited to a few kB of storage [1] because they are designed for visual scans, but using higher resolution could get you more capacity.

[1]: https://en.wikipedia.org/wiki/QR_code#Storage

hinkley · on May 8, 2022

For those playing at home, you need at least base 4096 to equal the data density of base64, because utf8 uses two bytes for anything over 7 bits. It is only because of a somewhat arbitrary Twitter rule that this works. On Twitter.
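A sketch of that density arithmetic, measured in payload bits per UTF-8 byte. The function name is illustrative. One caveat worth checking: two-byte UTF-8 only covers U+0080 through U+07FF, so a 4096-symbol alphabet could not consist entirely of two-byte characters.

```python
import math


def bits_per_byte(alphabet_size: int, bytes_per_char: int) -> float:
    """Payload bits carried per encoded UTF-8 byte."""
    return math.log2(alphabet_size) / bytes_per_char


print(bits_per_byte(64, 1))    # base64 in ASCII: 6.0
print(bits_per_byte(2048, 2))  # base2048 if every char took 2 bytes: 5.5
print(bits_per_byte(4096, 2))  # hypothetical all-2-byte base4096: 6.0

# How many code points actually encode to exactly two UTF-8 bytes?
TWO_BYTE_CODE_POINTS = 0x07FF - 0x0080 + 1
print(TWO_BYTE_CODE_POINTS)    # 1920

# Measured from the example earlier in the thread: 64 bytes in 113 UTF-8 bytes.
print(round(64 * 8 / 113, 2))  # 4.53
```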

graderjs · on May 7, 2022

Encode the text as an image, Tweet it, then use OCR to pull the text out. No more bit fiddling

themerone · on May 7, 2022

I don't think Base-N names should be given to encodings that haven't been IETF standardized. This could lead to incompatible encodings with the same name.