So none of this is particularly novel, but I like to self-host most of the services I use on a daily basis (think email, photos, music, etc.), and I was getting a little fed up with having to type in a password to decrypt the boot disk on the dozen or so machines I run every time one of them has to be rebooted or the power goes out. I'll admit that a few times, when I didn't have easy access to a monitor, I've plugged a Bluetooth keyboard into a headless machine, hit Enter a few times, typed in the LUKS passphrase (carefully reading the 30-50 characters from my password manager), and prayed that what I was typing into was indeed the pre-boot LUKS prompt. After doing this for the last few months, I got to thinking that there has to be a better way that doesn't compromise on security. For these purposes, we'll assume our threat model is someone who has physical access to these machines while they're powered off, as well as access to the broader LAN. Of course, the status quo of Full Disk Encryption via LUKS with manual key entry handles this threat model right off the bat, but it's also highly inconvenient. So I got to researching a few alternatives, which I'll discuss briefly:
A lot of self-hosters online recommend adding a Dropbear SSH server to the initramfs so that it runs in the pre-boot environment. That way you can SSH into the machine before it boots and enter the passphrase remotely. This method has the added advantage of allowing copy/paste from a password manager into the SSH session (no more typing 30+ character passphrases by hand), but it's still manual and doesn't scale well to a whole fleet of machines. I don't want to have to SSH into a dozen machines and paste a dozen encryption keys... and that's assuming I know the static IP address of each machine. If addresses are assigned via DHCP (keyed on MAC address or DUID, or however your implementation does it), or even worse, SLAAC, forget about it... how do you find the IP address of a device that configured itself via SLAAC in a /64 subnet without access to it? You don't.
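To put a number on that last point, here is a back-of-the-envelope Python sketch of how long a blind sweep of a single /64 would take. The probe rate of one million addresses per second is a made-up assumption, not a measurement, so plug in your own.

```python
# Back-of-the-envelope: worst-case time to find one SLAAC host by probing
# every address in a single /64. The probe rate is a hypothetical assumption.
PREFIX_LEN = 64                       # standard SLAAC subnet size
HOST_BITS = 128 - PREFIX_LEN          # bits left for the interface identifier
PROBES_PER_SECOND = 1_000_000         # assumed scan rate (hypothetical)
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def years_to_sweep(host_bits: int, probes_per_second: float) -> float:
    """Years needed to probe all 2**host_bits addresses at the given rate."""
    return 2 ** host_bits / probes_per_second / SECONDS_PER_YEAR


if __name__ == "__main__":
    years = years_to_sweep(HOST_BITS, PROBES_PER_SECOND)
    print(f"{2 ** HOST_BITS:,} addresses, ~{years:,.0f} years at {PROBES_PER_SECOND:,}/s")
```

Even at that generous rate, the sweep comes out to roughly 585,000 years. This assumes the host picked an effectively random interface identifier (privacy or stable-privacy addresses). An EUI-64 identifier derived from a known MAC address would shrink the search space dramatically.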
The TPM on a modern computer is (usually) a hardware chip that is specifically designed to store keys securely. This is the solution used by most devices that ship with Full Disk Encryption. During the boot process, each stage measures the next (hashes of the UEFI firmware and its configuration, the bootloader, the kernel, and so on) and records those measurements in the TPM's Platform Configuration Registers (PCRs). The key can additionally be protected by a user-provided PIN. If the measurements match the values the key was sealed against, the TPM releases the key and away the boot process goes. In other words, the TPM is key storage indexed to the local (i.e., on-device) hardware/software environment. Unfortunately, this doesn't help us very much, because what does an attacker need to do to decrypt the machine? Absolutely nothing, assuming there's no PIN required and they have access to the entire machine (and not just the boot disk)... and we've assumed they do.
TPMs also have other weaknesses, some of them theoretical, others not. For one, on older implementations with a discrete TPM chip, it's possible to sniff the bus between the TPM and the CPU with a logic analyzer or oscilloscope while the key is being transmitted. On newer implementations this is far, far more difficult, and TPMs are actually a pretty robust solution. I don't want you to come away from reading this thinking a TPM is bad at its job. However, the TPM is still on-device, and all it does is deter an attacker, not make decryption impossible. (Not to mention that TPMs are nearly all black-box/closed-source, so you're placing some degree of trust in the manufacturer.) So let's take this idea of key storage indexed on the local device environment and extend it out to the local network.
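To make the "attacker needs to do nothing" point concrete, here is a simplified Python model of measured boot. It mimics the TPM 2.0 extend operation on a single SHA-256 PCR (new value = SHA256(old value || digest)) plus a seal/unseal check. The boot components are hypothetical byte strings, and nothing here talks to a real TPM.

```python
import hashlib
from typing import List, Optional

PCR_SIZE = 32  # a SHA-256 PCR bank holds 32-byte values, zeroed at reset


def extend(pcr: bytes, component: bytes) -> bytes:
    """Simplified TPM 2.0 extend: new = SHA256(old || SHA256(component)).

    On real hardware the firmware/bootloader hashes the component and the
    TPM only performs the outer hash; both steps are folded together here.
    """
    digest = hashlib.sha256(component).digest()
    return hashlib.sha256(pcr + digest).digest()


def measure_boot_chain(components: List[bytes]) -> bytes:
    """Fold each boot component into one PCR, in load order."""
    pcr = bytes(PCR_SIZE)
    for component in components:
        pcr = extend(pcr, component)
    return pcr


def unseal(sealed_to: bytes, current_pcr: bytes, secret: bytes) -> Optional[bytes]:
    """Release the secret only if the current PCR matches the sealed value.

    A real TPM enforces this policy inside the chip; this is just the logic.
    """
    return secret if current_pcr == sealed_to else None


if __name__ == "__main__":
    # Hypothetical boot components standing in for real firmware/kernel images.
    good_chain = [b"uefi-firmware", b"bootloader", b"kernel-6.1"]
    sealed_to = measure_boot_chain(good_chain)
    disk_key = b"example-disk-key"

    # The owner and a thief booting the same untouched machine look identical.
    print(unseal(sealed_to, measure_boot_chain(good_chain), disk_key))
    # A modified kernel changes the PCR, so the key stays sealed.
    evil_chain = [b"uefi-firmware", b"bootloader", b"kernel-evil"]
    print(unseal(sealed_to, measure_boot_chain(evil_chain), disk_key))
```

A thief who simply powers on the unmodified machine reproduces exactly the same measurements, so the sealed key comes back. Only a change to the boot chain (or a PIN) stops them.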
Tang + Clevis does just that, separating key management from the device and distributing it across the network. Tang is a simple HTTP-based server that holds a private key and uses public key cryptography (the McCallum-Relyea exchange) to help the client reconstruct its decryption key from metadata the client stored when it was bound. Tang is NOT a key escrow, because it never stores or even sees the key itself. It only performs the exchange the client needs to reconstruct it. Clevis, on the other hand, is the client: a pluggable framework for automated decryption that integrates with LUKS and lets you mix & match key retrieval methods (for example, combining a TPM-sealed key with any number of remote Tang servers using Shamir secret sharing). So now it's not enough for an attacker to steal the machine. They would also need to steal enough of the Tang servers on the local or wide area network to meet the threshold. The setup process is remarkably simple, too. You just install the Tang server on each machine you want to run a key server on, bind it to an address/port, and then configure Clevis on each client. Configuring Clevis to require two key servers is as simple as running:
clevis luks bind -d <encrypted block device> sss '{"t":2,"pins":{"tang":[{"url":"http://<server 1>:<port>"},{"url":"http://<server 2>:<port>"}]}}'
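The "Tang never sees the key" property comes from the McCallum-Relyea exchange. Below is a toy Python version of it in a multiplicative group modulo a prime. It assumes made-up parameters (P = 2**127 - 1, G = 3) that are NOT secure, and the class and function names are hypothetical. Real Tang runs the exchange over an elliptic curve and wraps it in JOSE.

```python
import secrets
from typing import Tuple

# Toy group: integers modulo a Mersenne prime. NOT secure; illustration only.
P = 2**127 - 1
G = 3


def random_exponent() -> int:
    return secrets.randbelow(P - 2) + 1


class ToyTangServer:
    """Hypothetical stand-in for a Tang server: holds only its private key."""

    def __init__(self) -> None:
        self._s = random_exponent()
        self.public = pow(G, self._s, P)              # advertised S = g^s

    def recover(self, x: int) -> int:
        """Answer a recovery request: Y = X^s. Never sees the disk key."""
        return pow(x, self._s, P)


def provision(server_public: int) -> Tuple[int, int]:
    """Client binding step: return (disk_key, value stored in the header)."""
    c = random_exponent()
    disk_key = pow(server_public, c, P)               # K = S^c = g^(s*c)
    stored = pow(G, c, P)                             # C = g^c; c and K are then discarded
    return disk_key, stored


def recover(server: ToyTangServer, server_public: int, stored: int) -> int:
    """Client unlock step using a fresh blinding value e."""
    e = random_exponent()
    x = (stored * pow(G, e, P)) % P                   # X = C * g^e hides C from the server
    y = server.recover(x)                             # Y = g^(s*c) * g^(s*e)
    unblind = pow(pow(server_public, e, P), -1, P)    # (S^e)^-1 = g^(-s*e)
    return (y * unblind) % P                          # K = g^(s*c)


if __name__ == "__main__":
    server = ToyTangServer()
    key, header_value = provision(server.public)
    print(recover(server, server.public, header_value) == key)
```

The server only ever sees the blinded value X, and the client stores only C = g^c. Neither the Tang server nor the LUKS header alone holds the disk key.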
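The "t":2 in the clevis command above is a Shamir secret sharing threshold: the key is split into one share per pin, and any t shares reconstruct it. Here is a minimal Python sketch of the technique over a prime field, assuming a 127-bit modulus. It illustrates the math only and is not Clevis's own sss implementation.

```python
import secrets
from typing import List, Tuple

PRIME = 2**127 - 1  # field modulus (assumed); the secret must be smaller than this


def split(secret: int, threshold: int, shares: int) -> List[Tuple[int, int]]:
    """Split `secret` into `shares` points on a random polynomial of degree
    threshold - 1; any `threshold` of the points recover it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret out of range")
    if not 1 <= threshold <= shares:
        raise ValueError("need 1 <= threshold <= shares")
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    points = []
    for x in range(1, shares + 1):
        y = 0
        for c in reversed(coeffs):        # Horner's rule, evaluated mod PRIME
            y = (y * x + c) % PRIME
        points.append((x, y))
    return points


def combine(points: List[Tuple[int, int]]) -> int:
    """Recover the secret by Lagrange interpolation at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


if __name__ == "__main__":
    shares = split(123456789, threshold=2, shares=3)
    print(combine(shares[:2]), combine([shares[0], shares[2]]))
```

With two Tang servers and t = 2, as in the command above, both servers must be reachable to unlock. Adding a third server while keeping t = 2 would tolerate one of them being offline.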
So it would seem we have a pretty good solution to our problem: on-demand, automated decryption, bound to the state of the local network. However, this solution still isn't ideal, because an attacker still knows where to go if they want to collect all the infinity stones needed to decrypt your device. Presumably the key servers are themselves not using full disk encryption, and we've already assumed an attacker has full access to the devices on the local network. To solve this, we could push one of the key servers out onto the internet, say, into AWS. That, coupled with secret sharing between several different key servers in conjunction with a TPM, is actually a pretty darn great solution, suitable for just about every use case... unless (and from here on we're pushing our threat model just for fun; we're already nigh into tinfoil-hat land) your attacker has the power of subpoena and can access not only devices on the local network but also, given time, any *known* device whose address or location it can obtain. Enter The Onion Router (TOR) and its location-hidden services.
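Before going down that road, here is a Python sketch of the TPM + LAN + cloud setup just described. It builds an sss policy that mixes a tpm2 pin with one LAN and one cloud Tang server, renders the matching clevis luks bind command, and lists which sets of share sources an attacker would need. The URLs, device path, and "pcr_ids" value are hypothetical placeholders. Placing a tpm2 pin alongside tang inside sss is an assumption to check against clevis-encrypt-sss(1) for your version.

```python
import itertools
import json
import shlex
from typing import Dict, List, Optional, Set

# Hypothetical placeholders: substitute your own servers, device, and PCRs.
LAN_TANG = "http://tang.lan:7500"
CLOUD_TANG = "http://tang.example.com:7500"
DEVICE = "/dev/sda2"


def sss_config(threshold: int, tang_urls: List[str],
               tpm_pcr_ids: Optional[str] = None) -> Dict:
    """Build an sss policy: one share per Tang URL, plus one for tpm2.

    Assumption: tpm2 can sit alongside tang in one sss policy.
    """
    pins: Dict = {"tang": [{"url": url} for url in tang_urls]}
    if tpm_pcr_ids is not None:
        pins["tpm2"] = {"pcr_ids": tpm_pcr_ids}
    return {"t": threshold, "pins": pins}


def bind_command(device: str, config: Dict) -> str:
    """Render the clevis luks bind invocation, quoted for a POSIX shell."""
    policy = json.dumps(config, separators=(",", ":"))
    return f"clevis luks bind -d {shlex.quote(device)} sss {shlex.quote(policy)}"


def attacker_needs(share_sources: List[str], threshold: int) -> List[Set[str]]:
    """Every minimal combination of share sources that rebuilds the key."""
    return [set(combo) for combo in itertools.combinations(share_sources, threshold)]


if __name__ == "__main__":
    config = sss_config(2, [LAN_TANG, CLOUD_TANG], tpm_pcr_ids="7")
    print(bind_command(DEVICE, config))
    for combo in attacker_needs(["tpm2", LAN_TANG, CLOUD_TANG], 2):
        print(sorted(combo))
```

With t = 2 over three sources, stealing the machine (and thus the TPM) is no longer enough. The attacker also has to reach one of the Tang servers, and the cloud one sits outside the LAN.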
Note: there is a reason to go this route beyond just wearing a tinfoil hat. TOR actually simplifies network configuration to some degree, allowing you to host key servers on cellphones and laptops anywhere in the world, in network environments (e.g., cellular networks, NAT, satellite) that are typically very difficult to reach into. TOR does this with a "meet in the (unknown) middle" approach. Both the client and the server make only outbound connections into the TOR network, where they are introduced to each other anonymously through a rendezvous point, so no port forwarding or NAT hole punching is required.
So, as above, we configure Clevis on all our clients and run a few Tang servers on the local network or spread out in the cloud. The only difference this time is that we also run a key server as a TOR hidden (onion) service, which uses the TOR network to anonymize its location and IP address. A TOR service is not, generally, discoverable unless you know its onion address ahead of time (e.g., duckduckgogg42xjoc72x3sjasowoarfbgcmvfimaftt6twagswzczad.onion), and access to the service can be restricted to authorized clients using TOR's client authorization. So now our attacker has no idea where this key server is even located. Even if they had the money and resources to attempt complex recovery methods (like trying to extract the key from a modern TPM), the best they can hope for is to impersonate a client and query the TOR service endpoint with information stolen from that client.
So in this use case, ironically, the weakness becomes the high availability of the key server. The solution is easy: make it available only on demand, say, by running the TOR service and Tang server on a laptop or phone, and only starting the TOR service when you know a client device needs to boot (as simple as 'systemctl start tor' on systemd-based Linux machines). In theory, this method could also be used in reverse, for when you're traveling with sensitive data and don't want to be coerced into giving up the password that unlocks it. I would just add that it's unclear to me whether, and to what degree, this method is susceptible to replay attacks. In such an attack, the attacker steals the keys from the local Tang servers and the client (plus the TOR hidden service address, which the client needs to know), waits for the TOR/Tang server to come online, and impersonates the client with the stolen information to reconstruct the key. Care will need to be taken that the TOR service is only turned on when the attacker is not waiting for it.
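Part of why an onion service can't be stumbled upon is that a v3 onion address is the service's 32-byte ed25519 public key plus a 2-byte checksum and a version byte (0x03), base32-encoded. Short of guessing a 256-bit key, you have to be told the address. Here is a minimal Python sketch that encodes and validates such addresses, following the layout in TOR's v3 rendezvous specification. The function names and the demo key are hypothetical.

```python
import base64
import hashlib

VERSION = b"\x03"           # v3 onion services
SUFFIX = ".onion"


def _checksum(pubkey: bytes) -> bytes:
    """CHECKSUM = SHA3-256(".onion checksum" || PUBKEY || VERSION)[:2]."""
    return hashlib.sha3_256(b".onion checksum" + pubkey + VERSION).digest()[:2]


def onion_address(pubkey: bytes) -> str:
    """Encode a 32-byte ed25519 public key as a v3 onion address."""
    if len(pubkey) != 32:
        raise ValueError("v3 onion service keys are 32 bytes")
    raw = pubkey + _checksum(pubkey) + VERSION      # 35 bytes -> 56 base32 chars
    return base64.b32encode(raw).decode("ascii").lower() + SUFFIX


def parse_onion_address(address: str) -> bytes:
    """Return the embedded public key, or raise ValueError if malformed."""
    label = address.lower()
    if label.endswith(SUFFIX):
        label = label[: -len(SUFFIX)]
    if len(label) != 56:
        raise ValueError("v3 onion addresses have 56 base32 characters")
    raw = base64.b32decode(label.upper())           # binascii.Error is a ValueError
    pubkey, checksum, version = raw[:32], raw[32:34], raw[34:]
    if version != VERSION:
        raise ValueError("not a v3 onion address")
    if checksum != _checksum(pubkey):
        raise ValueError("checksum mismatch")
    return pubkey


if __name__ == "__main__":
    demo_key = bytes(range(32))                     # hypothetical key, not a real service
    address = onion_address(demo_key)
    print(address, parse_onion_address(address) == demo_key)
```

Because the address is just the public key, knowing it lets a client authenticate the service but doesn't reveal where the service is hosted. TOR's client authorization then adds a second gate on top.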