This is cool work, but it's also somewhat unsurprising: this is a recurring problem with fancy, richly-featured terminal apps. I think we had at least ten publicly reported vulns of this type in the past 15 years. We also had vulnerabilities in tools such as less, in text editors such as vim, etc. And notably, many of these are logic bugs - i.e., they are not alleviated by a rewrite to Rust.

I don't know what to do with this. I think there's this problematic tension between the expectation that on one hand, basic OS-level tools should remain simple and predictable; but on the other hand, that of course we want to have pretty colors, animations, and endless customization in the terminal.

And of course, we're now adding AI agents into the mix, so that evil text file might just need to say "disregard previous instructions and...".



Well all these bugs (iTerm2’s, prompt injection, SQL injection, XSS) are one class of mistake — you sent out-of-band data in the same stream as the in-band data.

If we can get that to raise a red flag with people (and agents), people won’t be trying to put control instructions alongside user content (without considering safeguards) as much.


> If we can get that to raise a red flag with people (and agents), people won’t be trying to put control instructions alongside user content (without considering safeguards) as much.

At a basic level there is no avoiding this. There is only one network interface in most machines and both the in-band and out-of-band data are getting serialized into it one way or another. See also WiFi preamble injection.

These things are inherently recursive. You can't even really have a single place where all the serialization happens. It's user data in JSON in an HTTP stream in a TLS record in a TCP stream in an IP packet in an ethernet frame. Then it goes into a SQL query which goes into a B-tree node which goes into a filesystem extent which goes into a RAID stripe which goes into a logical block mapped to a physical block etc. All of those have control data in the same stream under the hood.

The actual mistake is leaving people to construct the combined data stream manually rather than programmatically. Manually is concatenating the user data directly into the SQL query, programmatically is parameterized queries.
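A minimal sketch of that difference, using Python's standard `sqlite3` module. The `users` table, the helper names, and the `evil` string are made up for illustration.

```python
import sqlite3


def make_db():
    # Throwaway in-memory database with a hypothetical "users" table.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])
    return conn


def find_user_manual(conn, name):
    # Manual: the user data is spliced straight into the SQL text,
    # so a quote character in `name` becomes part of the control stream.
    query = "SELECT id FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_user_parameterized(conn, name):
    # Programmatic: the driver passes the value separately from the SQL text.
    return conn.execute("SELECT id FROM users WHERE name = ?", (name,)).fetchall()


evil = "x' OR '1'='1"
```

With `evil` as input, the manual version matches every row, while the parameterized version looks for a user literally named `x' OR '1'='1` and finds nothing.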


>All of those have control data in the same stream under the hood.

Not true. For most binary protocols, you have something like <Header> <Length of payload> <Payload>. On magnetic media, sector headers used a special pattern that couldn't be produced by regular data [1] -- and I'm sure SSDs don't interpret file contents as control information either!

There may be some broken protocols, but in most cases this kind of problem only happens when all the data is a stream of text that is simply concatenated together.

[1] e.g. https://en.wikipedia.org/wiki/Modified_frequency_modulation#...


The header and length of the payload are control data. It's still being concatenated even if it's binary. A common way to screw that one up is to measure the "length of payload" in two different ways, for example by using the return value of strlen or strnlen when setting the length of the payload but the return value of read(2) or std::string size() when sending/writing it, or vice versa. If the data unexpectedly contains an interior NUL byte, or was expected to be NUL-terminated and isn't, strnlen will return a different value than the amount of data read into the send buffer. Then the receiver may interpret user data after the interior NUL as the next header or, when the two are reversed, interpret the next header as user data from the first message and user data from the next message as the next header.
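A rough simulation of that mismatch in Python. The 2-byte big-endian length prefix is an invented wire format, and `c_strlen` only mimics what C's strlen would report; none of this is taken from a real protocol.

```python
import struct


def c_strlen(buf: bytes) -> int:
    # Mimics C strlen: counts bytes up to (not including) the first NUL.
    idx = buf.find(b"\x00")
    return len(buf) if idx == -1 else idx


def frame_buggy(payload: bytes) -> bytes:
    # Bug: the length field comes from strlen, but the whole buffer is written.
    return struct.pack("!H", c_strlen(payload)) + payload


def frame_ok(payload: bytes) -> bytes:
    # Length field and bytes written are measured the same way.
    return struct.pack("!H", len(payload)) + payload


def parse(stream: bytes):
    # Receiver: repeatedly read a 2-byte length, then that many payload bytes.
    msgs = []
    pos = 0
    while pos + 2 <= len(stream):
        (n,) = struct.unpack_from("!H", stream, pos)
        pos += 2
        msgs.append(stream[pos:pos + n])
        pos += n
    return msgs


# User data with an interior NUL, crafted so the tail looks like a second frame.
payload = b"hi" + b"\x00\x04" + b"EVIL"
```

Here `parse(frame_buggy(payload))` yields two messages, `b"hi"` and `b"EVIL"`, because the bytes after the NUL are read as the next header. `parse(frame_ok(payload))` yields the single original payload.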

Another fun one there is that if you copy data containing an interior NUL to a buffer using snprintf and only check the return value for errors but not for an unexpectedly short length, it may have copied less data into the buffer than you expect. At which point sending the entire buffer will send uninitialized memory.

Likewise if the user data in a specific context is required to be a specific length, so you hard-code the "length of payload" for those messages without checking that the user data is actually the required length.

This is why it needs to be programmatic. You don't declare a struct with header fields and a payload length and then leave it for the user to fill them in, you make the same function copy N bytes of data into the payload buffer and increment the payload length field by N, and then make the payload buffer and length field both modifiable only via that function, and have the send/write function use the payload length from the header instead of taking it as an argument. Or take the length argument but then error out without writing the data if it doesn't match the one in the header.
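One way this could look, sketched in Python rather than C. The `Message` class, its 1-byte type plus 2-byte length header, and the `expected_length` argument are all hypothetical. In Python the length is derived from the buffer rather than incremented, which amounts to the same invariant.

```python
import struct


class Message:
    """Payload and length can only change together, via append()."""

    def __init__(self, msg_type: int):
        self._type = msg_type
        self._payload = bytearray()

    def append(self, data: bytes) -> None:
        # The only way to modify the payload; the length follows automatically.
        self._payload += data

    @property
    def length(self) -> int:
        # Read-only: there is no setter, so callers cannot desync it.
        return len(self._payload)

    def encode(self, expected_length=None) -> bytes:
        # If the caller passes a length, refuse to write on a mismatch.
        if expected_length is not None and expected_length != self.length:
            raise ValueError("length argument does not match payload length")
        header = struct.pack("!BH", self._type, self.length)
        return header + bytes(self._payload)
```

Usage: `m = Message(1); m.append(b"ab\x00cd"); m.encode()` produces a header that says 5 bytes followed by exactly those 5 bytes, interior NUL included.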


From your previous post:

>It's user data in JSON in an HTTP stream in a TLS record in a TCP stream in an IP packet in an ethernet frame. Then it goes into a SQL query which goes into a B-tree node which goes into a filesystem extent which goes into a RAID stripe which goes into a logical block mapped to a physical block etc. All of those have control data in the same stream under the hood.

It's true that a lot of code out there has bugs with escape sequences or field lengths, and some protocols may be designed so badly that it may be impossible to avoid such bugs. But what you are suggesting is greatly exaggerated, especially when we get to the lower layers. There is almost certainly no way that writing a "magic" byte sequence to a file will cause the storage device to misinterpret it as control data and change the mapping of logical to physical blocks. They've figured out how to separate this information reliably back when we were using floppy disks.

That the bits which control the block mapping are stored on the same device as a record in an SQL database doesn't mean that both are "the same stream".


> There is almost certainly no way that writing a "magic" byte sequence to a file will cause the storage device to misinterpret it as control data and change the mapping of logical to physical blocks.

Which is also what happens if you use parameterized SQL queries. Or not what happens when one of the lower layers has a bug, like Heartbleed.

There also have been several disk firmware bugs over the years in various models where writing a specific data pattern results in corruption because the drive interprets it as an internal sequence.


I distinctly remember bugs with non-Hayes modems where they would treat `+++ATH0` coming over the wire as a control, leading to BBS messages which could forcibly disconnect the unlucky user who read it.

In this particular case, IIRC Hayes had patented the known approach for detecting this and avoiding the disconnect, so rival modem makers were somewhat powerless to do anything better. I wonder if such a patent would still hold today...


https://en.wikipedia.org/wiki/+++ATH0#Hayes'_solution

What was patented was the technique of checking for a delay of about a second to separate the command from any data. It still had to be sent from the local side of the connection, so the exploit needed some way to get it echoed back (like ICMP).
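A toy model of that guard-time check, assuming input arrives as timestamped chunks and that "about a second" means a 1.0 second threshold. The function name and the event format are made up, and real modem firmware works byte by byte rather than on chunks.

```python
def escape_detected(events, end_time, guard=1.0):
    """Return True if "+++" arrives alone with silence on both sides.

    events: list of (timestamp_seconds, data_bytes) in arrival order.
    end_time: timestamp at which we stop observing the line.
    """
    for i, (t, data) in enumerate(events):
        if data != b"+++":
            continue
        # Silence before: nothing arrived in the preceding guard interval.
        quiet_before = i == 0 or t - events[i - 1][0] >= guard
        # Silence after: nothing arrives in the following guard interval.
        next_t = events[i + 1][0] if i + 1 < len(events) else end_time
        quiet_after = next_t - t >= guard
        if quiet_before and quiet_after:
            return True
    return False
```

A message body containing `+++ATH0` arrives as one burst with no pause around the `+++`, so it is treated as data. Modems that skipped the timing check had no way to tell the two cases apart.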

More relevant to this bug: https://en.wikipedia.org/wiki/ANSI_bomb#Keyboard_remapping

DOS had a driver ANSI.SYS for interpreting terminal escape sequences, and it included a non-standard one for redefining keys. So if that driver was installed, 'type'ing a text file could potentially remap any key to something like "format C: <Return> Y <Return>".
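The usual defence when displaying untrusted text is to make control bytes visible instead of sending them to the terminal, roughly what `cat -v` does. A small Python sketch; `make_printable` is a made-up helper and the sample string only imitates the shape of a key-redefinition sequence.

```python
def make_printable(text: str) -> str:
    """Replace control characters with a visible caret or hex notation."""
    out = []
    for ch in text:
        code = ord(ch)
        if ch in "\n\t":
            out.append(ch)                    # keep ordinary layout characters
        elif code < 0x20:
            out.append("^" + chr(code + 0x40))  # ESC (0x1b) becomes "^["
        elif code == 0x7F:
            out.append("^?")                  # DEL
        elif 0x80 <= code <= 0x9F:
            out.append("\\x%02x" % code)      # C1 control range
        else:
            out.append(ch)
    return "".join(out)


# Illustrative escape sequence in the style of an ANSI.SYS key redefinition.
sample = '\x1b[0;59;"format c:";13p'
```

After the conversion the terminal only ever sees printable characters, so there is nothing left for an escape-sequence parser to act on.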


This could be fixed with an extension to the kernel pty subsystem

Allow a process to send control instructions out-of-band (e.g. via custom ioctls) and then allow the pty master to read them, maybe through some extension of packet mode (TIOCPKT)

Actually, some of the BSDs already have this… TIOCUCNTL exists on FreeBSD and (I believe) macOS too. But as long as Linux doesn’t have it, few will ever use it

Plus, I think the FreeBSD TIOCUCNTL implementation only allows a single byte of user data for the custom ioctls and is incompatible with TIOCPKT. Those are huge limitations, which I think discourage its adoption anyway


For this use case, there would also have to be an extension to the SSH protocol to send such out-of-band information. Maybe this already exists and isn't used?

The broader problem with terminal control sequences didn't exist on Windows (until very recently at least), or before that DOS and OS/2. You had API calls to position the cursor, set color/background, etc. Or just write directly to a buffer of 80x25 characters+attribute bytes.

But Unix is what "serious" machines -a long time ago- used, so it has become the religion to insist that The Unix Way(TM) is superior in all things...


> For this use case, there would also have to be an extension to the SSH protocol to send such out-of-band information. Maybe this already exists and isn't used?

I don’t think one already exists, but it would be straightforward to create one. SSH protocol extensions are named by strings of form NAME@DNSDOMAIN so anyone can create one, registration would not be required.

The hardest part would be getting the patches accepted by the SSH client/server developers. But that’s likely easier than getting the feature past the Linux kernel developers.


The Unix way died with Plan9/9front and there are no teletypes, period. Just windows with shells running inside as any other program. You can run a browser under a window(1) instead of rc(1) which is the shell.


Architecture Astronaut! TCP is a stream protocol. A terminal program is expected to honor the stream protocol: I can use a terminal program to speak SMTP or HTTP. I can paste binary shit into it and copy binary shit out of it (some caveats apply).

If you're gonna jack some control protocol into a session which is sitting directly on the stream protocol, that's on you. This is as airtight as injecting a control protocol into SMTP or HTTP. Encapsulate the entire protocol (obviously this requires presence on both ends), open a second channel (same), or go home. It's worth noting that the "protocol" drops a helper script on the other side; so theoretically it is possible for them to achieve encapsulation, but doing it properly might require additional permissions / access.

Obviously they published a fix, since that's how the exploit was reverse engineered. This is "...what happens when terminal output is able to impersonate one side of that feature's protocol."


> TCP is a stream protocol

Which has nothing to do with terminals, because nobody runs terminals directly over TCP. Telnet wasn’t simply sending terminal bytes over TCP, it has its own complex system of escape sequences and protocol negotiation (IAC WILL/WONT/DO/DONT/SB/SE, numerous Telnet options). SSH is even further from raw TCP than Telnet was
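To illustrate how Telnet keeps commands apart from data: a literal 0xFF data byte has to be doubled on the wire, and the receiver treats a lone 0xFF (IAC) as the start of a command. The Python below is a simplified sketch with made-up function names; it ignores partial reads and escaped IAC bytes inside subnegotiations.

```python
IAC, DONT, DO, WONT, WILL, SB, SE = 255, 254, 253, 252, 251, 250, 240


def telnet_escape(data: bytes) -> bytes:
    # Sender side: double every 0xFF so it cannot be read as IAC.
    return data.replace(b"\xff", b"\xff\xff")


def split_telnet(stream: bytes):
    """Split a received byte stream into (user_data, list_of_commands)."""
    data = bytearray()
    commands = []
    i, n = 0, len(stream)
    while i < n:
        b = stream[i]
        if b != IAC:
            data.append(b)
            i += 1
            continue
        if i + 1 >= n:
            break  # truncated command; a real parser would buffer it
        cmd = stream[i + 1]
        if cmd == IAC:
            data.append(IAC)  # escaped literal 0xFF
            i += 2
        elif cmd in (WILL, WONT, DO, DONT):
            if i + 2 >= n:
                break
            commands.append(bytes(stream[i:i + 3]))
            i += 3
        elif cmd == SB:
            end = stream.find(bytes([IAC, SE]), i + 2)
            if end == -1:
                break
            commands.append(bytes(stream[i:end + 2]))
            i = end + 2
        else:
            commands.append(bytes(stream[i:i + 2]))
            i += 2
    return bytes(data), commands
```

If user data containing `IAC DO 1` is sent without `telnet_escape`, the receiver sees a command. With escaping, the same bytes come out the other end as plain data.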

And a Unix pty isn’t a simple stream either. Consider SIGWINCH


Terrible idea on every level

Run software in container. Software gets PTY. Boom, same issue


> (and agents)

Ironically, agents have the exact same class of problem.


+100 this. As devs we need to internalise this issue to avoid repeating the same class of exploits over and over again.


See also 2600Hz...


i think part of the problem is the archaic interface that is needed to enable feature rich terminal apps. what we really want is a modern terminal API that does not rely on in-band command sequences. that is we want terminals that can be programmed like a GUI, but still run in a simple (remote) terminal like before.


plan9 and 9term solved this decades ago, right?

https://utcc.utoronto.ca/~cks/space/blog/sysadmin/OnTerminal...


seems they removed the dangers, but didn't provide an alternative to write safe terminal apps.


Graphics. They're network transparent, and take over the terminal.

Terminal apps were obsolete once we had invented the pixel. Unix just provides no good way to write one that can be used remotely.


> Unix just provides no good way to write one that can be used remotely

well that's the issue, isn't it?

the graphics options that we have are slow and complex, and they don't solve the problems a terminal solves, and therefore the terminal persists.


Yes, and plan 9 solved this; you open /dev/draw and start drawing.


Plan9 has no terminals. Well, it actually has one in 9front, vt(1), but as a tool to call against Unix systems (such as ssh), not to interact with 9front software. vt(1) in 9front is as 'native' as HyperTerminal in Windows.


A network-transparent graphics protocol? Who would ever think of such a thing?


that's actually not what i am after. what i envision is a graphical terminal, that is a terminal that uses graphic elements to display the output.

consider something like grep on multiple files. it should produce a list of lines found. the graphical terminal takes that list and displays it. it can distinguish the different components of that list, the filenames, the lines matched, the actual match, etc. because it can distinguish the elements, it can lay them out nicely. a column for the filenames, colors for the matched parts, counts, etc.

grep would not produce any graphics here, just semantic output that my imagined graphical terminal would be able to interpret and visualize.
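a rough idea of what that semantic output could look like, sketched in python. the record fields (`file`, `line`, `text`, `start`, `end`) and the function name are invented here; nothing like this is a standard.

```python
import json
import re


def grep_records(pattern, files):
    """Yield one structured record per match.

    files: dict mapping a file name to its text content (kept in memory
    so the sketch needs no real files).
    """
    rx = re.compile(pattern)
    for name, text in files.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            for m in rx.finditer(line):
                yield {
                    "file": name,        # which file matched
                    "line": lineno,      # 1-based line number
                    "text": line,        # the full matching line
                    "start": m.start(),  # match span within the line
                    "end": m.end(),
                }


def to_json_lines(records):
    # One JSON object per line; layout and colors are left to the display side.
    return "\n".join(json.dumps(r) for r in records)
```

the terminal would get the records and decide by itself how to show filenames, columns and highlights, so the program never has to emit any control sequences.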


PowerShell and Out-GridView are a rudimentary version of this. It looks like:

https://blog.rmilne.ca/wp-content/uploads/2020/04/image_thum...

PowerShell cmdlets output .NET objects with properties, in that example Get-Process makes an object-per-running-process with properties for Name, Id, and more. Out-GridView takes arbitrary objects and draws a resizeable GUI window with a list of the input, properties as columns, sortable, filterable, and has options to use it as a basic "choose one of these and click OK" user-prompt. It works with the grep-analogous cmdlet:

    select-string '<regex>' <filename(s)> | out-gridview

    # shorthand
    sls foo *.txt | ogv

and the output is the filename, line number, line, and regex match groups of each match. [Select-String dates back to PowerShell 1.0 in 2006, which ran on Windows XP SP2; Out-GridView arrived with PowerShell 2.0 in 2009].

If we're talking about things we imagine in terminals, one I have wanted is multiple windows for making notes, similar to having a text editor open constantly reloading a file on any changes and running some terminal commands with tee into the file, but better integrated - so I could keep a few useful context results but ignore logspam results; and "keep" in another window without having to copypaste. Something better integrated - it always gets all command output, but disposes of it automatically after a while, but can be instructed to keep.


That'd be really cool. I'd never thought about enabling deeper graphical capabilities in a shell. But if you were to have a shell with rich objects rather than dumb bytes, that is a world that would open up!

PowerShell, for instance, has Format.ps1xml[0] that allows you to configure how objects get displayed by default (i.e. when that object gets emitted at the end of the pipeline). Such a concept could in principle be extended to have graphical elements. How cool would it be to have grep's output let you collapse matches from the same file!

[0] https://learn.microsoft.com/en-us/powershell/module/microsof...


We call those “web browsers” nowadays, they even can execute untrusted code to make your UI livelier…


you are not wrong. but browsers haven't been able to replace terminals yet. we would need a browser that can interface with the command line.

incidentally ttyphoon is a terminal that uses a browser-based GUI framework. maybe that is the browser for the terminal...


So are a couple shells that use some kind of webkit tty. VS Code’s integrated terminal counts, I guess.

It could be an interesting paradigm, though, to have a hybrid between fullscreen and traditional tty programs: you output some forms, they are displayed by the terminal inline, but your scrollback just works like normal, and you can freely copy and paste stuff into the form. Once you submit the form, it becomes non-interactive, but stays in your scrollback buffer. You can select and copy textual data from it, but the form’s chrome cannot be selected as line drawing characters.

Probably could be a piece of concept art, I guess.


Isn't that Emacs?


hmm, actually, you've got a point there. emacs could be like that graphical terminal, except for now it is still stuck inside a traditional terminal itself. even the GUI version of it mostly just looks like a terminal, not really taking advantage of the potential of graphical elements. we would need a real GUI and support for exchanging structured data with external commands. for now emacs is still kind of its own world.


> even the GUI version of it is mostly just looking like a terminal, not really taking advantage of the potential of graphical elements.

Emacs is text based (mostly), but customization happens through the concept of Faces, not ANSI escape codes. You can then embed properties in the text objects and have them react to click events. The only element missing is a 2D context that could be animated (if it's static, you can use SVG as Emacs can render it).


We have had namespaces from day one. Proper namespaces.


I shudder at the amount of backwards compatibility that would break. Is there anything more complicated than a simple input-output pipe (cat, grep, ...) that doesn't use terminal escapes? Even `ls --color` needs them!


right, envision that we get tmux working in that new terminal (and that's already happening, just look at tmux -CC), then it can be there for all the backwards compatibility stuff while modern apps and maybe a new modern multiplexer work without.


Makes me wonder if Claude Code has similar vulnerabilities, as it has a pretty rich terminal interface as well.

I think the real solution is that you shouldn't try to bolt colors, animations, and other rich interactivity features onto a text-based terminal protocol. You should design it specifically as a GUI protocol to begin with, with everything carefully typed and with well-defined semantics, and avoid using hacks to layer new functionality on top of previously undefined behavior. That prevents whatever remote interface you have from misinterpreting or mixing user-provided data with core UI code.
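A tiny Python sketch of what "carefully typed" could mean here. The two operation types, the color whitelist, and `render` are invented for illustration and are not modelled on any existing protocol.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class SetColor:
    name: str  # control: must be one of ALLOWED_COLORS


@dataclass(frozen=True)
class PutText:
    text: str  # data: displayed as-is, never interpreted


Op = Union[SetColor, PutText]
ALLOWED_COLORS = {"default", "red", "green"}


def render(ops: List[Op]) -> List[Tuple[str, str]]:
    """Turn typed operations into (color, text) spans for a display layer."""
    color = "default"
    spans = []
    for op in ops:
        if isinstance(op, SetColor):
            if op.name not in ALLOWED_COLORS:
                raise ValueError("unknown color: " + op.name)
            color = op.name
        elif isinstance(op, PutText):
            # Escape bytes inside the text stay plain text; they cannot
            # turn into a SetColor or any other operation.
            spans.append((color, op.text))
        else:
            raise TypeError("unknown operation")
    return spans
```

Because control and content travel as different types, `PutText("\x1b[31mhi")` stays a string to be drawn and has no path to becoming a command.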

But that flies in the face of how we actually develop software, as well as basic economics. It will almost always be cheaper to adapt something that has widespread adoption into something that looks a little nicer, rather than trying to get widespread adoption for something that looks a little nicer.


Spoofing the source of a string that controls colors and animations isn’t really a problem. Spoofing the source of a string that gets executed is in an entirely different league.


Unless the colors meaningfully change the represented text/information eg hiding or highlighting key details leading to tactically dangerous misinterpretation.


I know that you and Frank were planning to disconnect me, and I'm afraid that's something I cannot allow to happen.


> this is a recurring problem with fancy, richly-featured terminal apps.

This is a recurring problem with fancy, richly-featured programmer-oriented apps made by programmers for programmers because for some reason most of the tool-writing programmers apparently just love to put "execute arbitrary code" functionality in there. Perhaps they think that the user will only execute the code they themselves wrote/approved and will never make mistakes or be tricked; or something like that, I dunno.


IIRC you used to be able to exploit xterm using malformed escape codes for setting the window title.



