FHE Hardware
Advances in specialized hardware (ASICs)
While algorithmic improvements boost FHE speed by about 8x per year, hardware developments compound this effect. There is growing interest in designing FHE-specific hardware.
ASIC stands for Application-Specific Integrated Circuit. ASICs are specialized, custom-designed chips built to perform a well-defined set of operations. Mainstream CPUs, by contrast, are designed to be reasonably performant across many types of computation. If you know beforehand which math operations you will perform (and in what sequence), you can tailor a chip to be very performant on exactly those.
GPUs play a similar role relative to CPUs: they are designed to be very performant on matrix multiplications (matmuls). That makes GPUs advantageous for FHE computations too, since these involve lots of matmuls. But in addition to matmuls, FHE also relies on the following operations:
Modular Arithmetic
Chinese Remainder Theorem (CRT) based operations
Polynomial Arithmetic
Therefore, there are opportunities for FHE ASICs that perform better than generic GPUs by optimizing the operations above.
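To make the modular arithmetic and CRT items above concrete, here is a minimal Python sketch of a residue number system (RNS). The moduli are tiny, hypothetical values chosen for readability; as far as I understand, real FHE libraries use several word-sized primes, but the idea is the same: one big multiplication becomes several small, independent ones.

```python
from math import prod

# Hypothetical small pairwise-coprime moduli, chosen only for illustration.
MODULI = [97, 101, 103]
Q = prod(MODULI)  # the composite modulus the residues jointly represent


def to_rns(x):
    """Split an integer mod Q into one residue ("limb") per modulus."""
    return [x % m for m in MODULI]


def rns_mul(a, b):
    """Multiply limb by limb; no limb depends on any other limb."""
    return [(x * y) % m for x, y, m in zip(a, b, MODULI)]


def from_rns(residues):
    """Recombine the residues with the Chinese Remainder Theorem."""
    total = 0
    for r, m in zip(residues, MODULI):
        others = Q // m
        # pow(x, -1, m) computes a modular inverse (Python 3.8+).
        total += r * others * pow(others, -1, m)
    return total % Q


a, b = 123456, 654321
product = from_rns(rns_mul(to_rns(a), to_rns(b)))
assert product == (a * b) % Q
```

Because each limb is processed independently, this is the kind of work that dedicated modular multipliers can run side by side.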
An example accelerator design (HE-Booster on GPU) is below. Notice that it includes modules for CRT, the number-theoretic transform (NTT), and some other HE-specific operations.
FHE ciphertexts are 40-1000x larger than the corresponding plaintexts, so ASICs need lots of memory. Additionally, FHE is embarrassingly parallel (a feature of lattice-based math), which creates further opportunities for specialized hardware.
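A back-of-envelope sketch of where the memory pressure comes from. The parameters below (ring dimension 2^15, an 800-bit ciphertext modulus, CKKS-style packing of half as many 8-byte values as the ring dimension) are illustrative assumptions, not numbers from any of the sources linked here; other parameter choices land elsewhere in the 40-1000x range.

```python
def rlwe_ciphertext_bytes(ring_dim, log_q_bits):
    """Two polynomials, ring_dim coefficients each, log_q_bits bits per coefficient."""
    return 2 * ring_dim * log_q_bits // 8


def packed_plaintext_bytes(ring_dim, bytes_per_value=8):
    """Assumes ring_dim / 2 values are packed into one ciphertext."""
    return (ring_dim // 2) * bytes_per_value


# Hypothetical parameter set, for illustration only.
RING_DIM = 2**15
LOG_Q_BITS = 800

ct = rlwe_ciphertext_bytes(RING_DIM, LOG_Q_BITS)
pt = packed_plaintext_bytes(RING_DIM)
expansion = ct / pt

print(f"ciphertext: {ct / 2**20:.2f} MiB, expansion: {expansion:.0f}x")
# prints "ciphertext: 6.25 MiB, expansion: 50x"
```

Key material (relinearization and rotation keys) is typically much larger than a single ciphertext, which is not modeled here.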
Hardware acceleration
One major aspect of FHE research is the work on designing custom hardware to accelerate FHE operations. There are many projects, but let me summarize a few I am more familiar with.
The most prominent group of projects are part of a DARPA program called DPRIVE. In short, DARPA is funding FHE hardware designers with a challenge to perform logistic regression, CNN inference, and CNN training as quickly as possible in FHE. There are currently four participants:
Intel, whose accelerator is called HERACLES.
Duality, whose accelerator is called TREBUCHET.
SRI International, whose accelerator is called CraterLake.
Niobium Microsystems, whose accelerator is called BASALISC.
All DPRIVE participants are working on ASICs that accelerate arithmetic FHE schemes (BFV/BGV and CKKS), and they are at various points along the path toward fabrication. Their initial performance claims are based on simulation, but to toot the horn of the underdog, Niobium: their initial paper claims a 5,000x speedup over CPU, with logistic regression on a 1,024-sample, 10-feature dataset estimated to take 40 seconds versus 60 hours on CPU. To me this is a lower bound on what’s possible with hardware acceleration. At the core of most of these accelerators are accelerations of number-theoretic transforms (NTTs) and other polynomial operations in the relevant polynomial rings. To my understanding, the hard part of these accelerators is packing in enough RAM to store all of the ciphertexts and auxiliary key material while still getting good memory locality.
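For a sense of what an NTT unit computes, here is a toy Python sketch of polynomial multiplication in the ring Z_q[x]/(x^n + 1) using an NTT. The parameters (n = 8, q = 17, 2n-th root of unity 3) are tiny hypothetical values picked so the arithmetic can be checked by hand; real schemes use ring dimensions in the thousands and much larger primes, and none of this is taken from the DPRIVE designs themselves.

```python
N = 8     # ring dimension (toy value)
Q = 17    # toy prime with Q ≡ 1 (mod 2N)
PSI = 3   # primitive 2N-th root of unity mod Q
assert pow(PSI, N, Q) == Q - 1  # PSI^N = -1 (mod Q)


def ntt(a, omega):
    """Recursive radix-2 NTT: evaluate a at the powers of omega, mod Q."""
    n = len(a)
    if n == 1:
        return a[:]
    even = ntt(a[0::2], omega * omega % Q)
    odd = ntt(a[1::2], omega * omega % Q)
    out = [0] * n
    w = 1
    for k in range(n // 2):
        t = w * odd[k] % Q
        out[k] = (even[k] + t) % Q
        out[k + n // 2] = (even[k] - t) % Q
        w = w * omega % Q
    return out


def negacyclic_mul_ntt(a, b):
    """Multiply a and b modulo (x^N + 1, Q) via twist, NTT, pointwise product."""
    omega = PSI * PSI % Q
    # Twisting by powers of PSI turns the negacyclic product into a cyclic one.
    fa = ntt([x * pow(PSI, i, Q) % Q for i, x in enumerate(a)], omega)
    fb = ntt([x * pow(PSI, i, Q) % Q for i, x in enumerate(b)], omega)
    fc = [x * y % Q for x, y in zip(fa, fb)]
    c = ntt(fc, pow(omega, -1, Q))  # inverse transform, still unscaled
    n_inv, psi_inv = pow(N, -1, Q), pow(PSI, -1, Q)
    return [x * n_inv * pow(psi_inv, i, Q) % Q for i, x in enumerate(c)]


def negacyclic_mul_schoolbook(a, b):
    """O(N^2) reference: x^N wraps around to -1."""
    c = [0] * N
    for i in range(N):
        for j in range(N):
            if i + j < N:
                c[i + j] = (c[i + j] + a[i] * b[j]) % Q
            else:
                c[i + j - N] = (c[i + j - N] - a[i] * b[j]) % Q
    return c


a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [8, 7, 6, 5, 4, 3, 2, 1]
assert negacyclic_mul_ntt(a, b) == negacyclic_mul_schoolbook(a, b)
```

The pointwise product in the middle is the cheap part; the transforms move data across the whole polynomial, which is one way to see why memory locality matters so much.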
Another project I’m familiar with is an FPGA-based approach to accelerating CGGI, which goes by the name FPT, out of the COSIC research lab at KU Leuven. They use an Alveo U280, and they functionally bootstrap pipelined batches of 16 ciphertexts at a time to achieve a throughput of 1 bootstrap / 35 microseconds. I’ve seen their live demo, in which they run Conway’s Game of Life in CGGI, and the animation is effectively in real time. Unlike the NTT-crunching machines from the DPRIVE program, FPT is an FFT-crunching machine. Naturally, this project starts from the TFHE-rs API for CGGI.
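To illustrate the FFT side of that contrast, here is a small Python sketch of negacyclic polynomial multiplication using a floating-point Fourier transform and rounding. It uses a naive O(n^2) DFT from the standard library for clarity, and it is only a sketch of the general technique, not FPT's design or the TFHE-rs API; the function names are my own.

```python
import cmath


def dft(values, sign):
    """Naive O(n^2) discrete Fourier transform; sign=-1 forward, +1 inverse (unscaled)."""
    n = len(values)
    return [
        sum(v * cmath.exp(sign * 2 * cmath.pi * 1j * t * k / n)
            for t, v in enumerate(values))
        for k in range(n)
    ]


def negacyclic_mul_fft(a, b):
    """Multiply integer polynomials a and b modulo x^n + 1 using complex arithmetic."""
    n = len(a)
    # Twist by powers of exp(i*pi/n), a complex 2n-th root of unity.
    twist = [cmath.exp(1j * cmath.pi * t / n) for t in range(n)]
    fa = dft([x * w for x, w in zip(a, twist)], -1)
    fb = dft([x * w for x, w in zip(b, twist)], -1)
    fc = [x * y for x, y in zip(fa, fb)]
    c = dft(fc, +1)
    # Undo the scaling and the twist, then round away floating-point error.
    return [round((x / n / w).real) for x, w in zip(c, twist)]


# (1 + x) * x^3 = x^3 + x^4 = x^3 - 1 modulo x^4 + 1
assert negacyclic_mul_fft([1, 1, 0, 0], [0, 0, 0, 1]) == [-1, 0, 0, 1]
```

Unlike the NTT, the result here is only approximate before rounding, so the precision of the arithmetic has to be large enough for the coefficients involved.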
Then there are approaches I’m less familiar with. The folks at Intel have a HEXL project that focuses on targeting Intel CPUs using AVX and similar modern CPU fanciness. There are also folks at NVIDIA working on GPU acceleration, and the HEaaN library (CKKS) also supports GPU acceleration. There is also a company called Optalysys that is building an optical computing chip for FHE. The idea there is that, by using interference patterns of light passing through lenses (or rather, nanoscale equivalents), one can compute Fourier transforms “at the speed of light,” and in doing so accelerate bootstrapping.
And finally, I’m working on my own hardware acceleration approach: CGGI on TPUs. This is in an open source library called jaxite (named so because it’s written in JAX). The performance is nothing to write home about yet, but my hope is that if I can get performance to be 10-100x faster than CPU, then I can use the fact that Google already has TPUs deployed at scale to ship some FHE products before more intense hardware acceleration is ready at scale.
For some more details on these and other accelerators I know less about, see this paper, “SoK: Fully Homomorphic Encryption Accelerators”.
FHE Hardware Startup Ecosystem
A new wave of companies is building FHE-specific acceleration:
Niobium Microsystems
- Raised $5.5M in 2024 for FHE accelerator chips
- Focus: Custom ASICs optimized for lattice-based cryptography
- Potential: 1000-10,000× speedup for FHE operations
Fabric Cryptography
- Building custom silicon for cryptographic workloads
- FHE-optimized processors with specialized arithmetic units
- Targeting cloud deployment and edge computing
Agita Labs
- Hardware acceleration for privacy-preserving computation
- Focus on making FHE practical for real-time applications
- Software-hardware co-design approach
Duality Technologies
- Real deployments in healthcare (patient privacy) and finance (fraud detection)
- Notable: CTO founded the Palisade Library, chief cryptographer developed BGV scheme
From Fully Homomorphic Encryption to Silicon - What is Microsoft's HEAX?
https://x.com/i/grok/share/zUMo4KqWBszye3FbM4Qv8ozjW
https://www.jeremykun.com/2024/05/04/fhe-overview/#hardware-acceleration
Niobium - https://www.biometricupdate.com/202405/niobium-raises-5-5m-to-develop-fully-homomorphic-encryption-accelerator-chip
Chips to Compute With Encrypted Data Are Coming
FHE Hardware-accelerator startups:
https://niobiummicrosystems.com/
https://www.fabriccryptography.com/
https://agitalabs.com/
https://dualitytech.com/
Duality use cases: preserving patient privacy in healthcare, and helping financial firms check for fraud.
The CTO of Duality is also the founder of the Palisade Library, while their chief cryptographer is a co-developer of BGV, a leading FHE scheme.
https://cornami.com/
https://x.com/Dod_2206/status/1943312631227650410
TFHE-rs (github)
- https://docs.zama.ai/tfhe-rs
Outgoing Web References (13)
- growing interest: spectrum.ieee.org/homomorphic-encryption
- Chinese Remainder Theorem (CRT): chatgpt.com/share/6877859e-2af0-8010-8b36-6ef2a19b1f6b
- Number-Theoretic Transforms (NTT): chatgpt.com/share/687684d7-3a38-8010-b3b7-8ac410919862
- ASIC design (HE-Booster on GPU): www.computer.org/csdl/journal/td/2023/04/10012383/1JNmPsHAw2Q
- 40-1000x larger: www.youtube.com/watch?v=PfSZL9LsMCg&t=620s
- Niobium Microsystems: niobiummicrosystems.com
- Fabric Cryptography: www.fabriccryptography.com
- Agita Labs: agitalabs.com
- Duality Technologies: dualitytech.com
- From Fully Homomorphic Encryption to Silicon - What is Microsoft's HEAX?: openmined.org/blog/from-fully-homomorphic-encryption-to-silicon
- Chips to Compute With Encrypted Data Are Coming: spectrum.ieee.org/homomorphic-encryption
- Apheris AI: www.apheris.com
- github: github.com/zama-ai/tfhe-rs