In addition to missing FIPS, I wonder if LibreSSL supports kTLS. That could be a...

avhon1 · on Nov 18, 2021

Given that LibreSSL is natively developed for OpenBSD (and that searches turned up nothing), I strongly doubt it does.

edit: Maybe the developers would be amenable to a patch, but it sounds like a substantial undertaking.

Seirdy · on Nov 18, 2021

Also worth noting that Nginx recently got support for kTLS in OpenSSL 3.x, including the quictls fork:

https://mailman.nginx.org/pipermail/nginx/2021-November/0611...

I'm happily using BoringSSL with nginx-quic but this might be enough for me to give quictls a brief look; I'm a sucker for seeing just how much perf I can squeeze out of a static file server without compromising too much on security, and this could be neat.

The only other TLS lib I know of with kTLS 1.3 support is wolfSSL.

anarazel · on Nov 18, 2021

The last time I checked - maybe 3-4 months ago - the kTLS implementation in OpenSSL still seemed, um, somewhat fragile. Undocumented behaviour changes, missing error handling, ...

e12e · on Nov 18, 2021

Isn't the main reason for kTLS support for dedicated TLS hardware? Is that still relevant today?

drewg123 · on Nov 19, 2021

No, no dedicated hardware needed. kTLS is super beneficial if you're serving static content even when using software based kernel TLS. And it only gets better when using a NIC that can do inline HW kTLS (like the Mellanox CX6-DX).

For static content (videos, large images, etc.) that lives on disk, sendfile() will bring it into the kernel page cache and then send it on to the network. This involves 2 memory accesses: DMA DISK->RAM and DMA RAM->NIC.
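To make the plaintext sendfile() path concrete, here is a minimal Python sketch. It is only an illustration, not how any particular server does it: `serve_file_plaintext` and `demo` are hypothetical names, and a local socket pair stands in for a real client connection. `socket.sendfile()` uses `os.sendfile()` where the platform supports it and falls back to plain `send()` otherwise.

```python
import os
import socket
import tempfile


def serve_file_plaintext(sock: socket.socket, path: str) -> int:
    """Send a whole file over a connected stream socket; return bytes sent."""
    with open(path, "rb") as f:
        # Where os.sendfile() is available, the payload goes from the page
        # cache to the socket without passing through a userspace buffer.
        return sock.sendfile(f)


def demo() -> bytes:
    # Assumption: a socket pair is used in place of a real TCP client.
    server, client = socket.socketpair()
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"static content" * 100)
        path = tmp.name
    try:
        sent = serve_file_plaintext(server, path)
        server.close()
        received = b""
        while len(received) < sent:
            chunk = client.recv(4096)
            if not chunk:
                break
            received += chunk
        return received
    finally:
        client.close()
        os.unlink(path)
```

Note that this only covers the unencrypted case; with a userspace TLS library the file contents have to be read into the process, which is the extra copying described next.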

To serve that content with a userspace SSL library, you now have to copy the content to userspace, encrypt in userspace, and then copy the content back into the kernel. So you add copy KERN->USER, encrypt USER->USER, copy USER->KERN. So you now have essentially 3 more memory accesses.

By using kTLS, you eliminate the KERN->USER and USER->KERN copies by encrypting KERN->KERN.

With SW kTLS, we see something like a 2.5x to 3.x speedup for static content on the Netflix CDN.

For inline HW kTLS, you're back to the 2 memory access case, since inline HW kTLS NICs encrypt the TLS records as the traffic is sent on the wire. So it boils down to what's essentially the unencrypted case. That's another 2x speedup, roughly.
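The bookkeeping above can be written down as a small counting model. This is just a sketch of the steps named in this comment: the path labels and the `memory_accesses` helper are made up for illustration, and a raw count like this does not by itself predict the measured speedups.

```python
# Memory-touching steps per data path, as listed in the comment above.
PATHS = {
    "plain sendfile": ["DMA DISK->RAM", "DMA RAM->NIC"],
    "userspace TLS": [
        "DMA DISK->RAM",
        "copy KERN->USER",
        "encrypt USER->USER",
        "copy USER->KERN",
        "DMA RAM->NIC",
    ],
    "software kTLS": ["DMA DISK->RAM", "encrypt KERN->KERN", "DMA RAM->NIC"],
    # Inline HW kTLS: the NIC encrypts records as they go out on the wire.
    "inline HW kTLS": ["DMA DISK->RAM", "DMA RAM->NIC"],
}


def memory_accesses(path: str) -> int:
    """Return how many memory-touching steps a given path involves."""
    return len(PATHS[path])


if __name__ == "__main__":
    for name in PATHS:
        print(f"{name}: {memory_accesses(name)} accesses")
```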

e12e · on Nov 19, 2021

Oh, interesting. It's been quite a few years since I considered the speed of static serving anything beyond an academic exercise (not helped by the fact that my "play" servers only have a 1 Gbps uplink).

I suppose the other way to do it would be a user space network stack?

Edit: I know Alpine is popular for (Docker) container images, but those typically don't include the kernel. I suppose in theory a Docker container might leverage kTLS via the host kernel (but not without library support). How many people run Alpine on bare metal with 10GbE+ networking?

drewg123 · on Nov 19, 2021

Yes, a userspace network stack and maybe a userspace storage stack. Or just use the (FreeBSD) kernel where all this works great. :)

namibj · on Nov 19, 2021

If you buy a >=40Gbit/s NIC, chances are high it has an offloading engine suitable for kTLS. Much of the benefit comes from saving memory bandwidth, AFAIK.

drewg123 · on Nov 19, 2021

That's not really true. Most >= 100GbE NICs don't have offload engines. The only major >= 100GbE NICs that do are the Mellanox CX6-DX and the Chelsio T6. There are some smaller-market smart NICs that do, but the larger players like Intel and Broadcom do not offer kTLS acceleration on their >= 100GbE NICs.