This has gone a bit under the radar, but Apple went and built a private search engine that doesn’t know what you’re searching for. It’s called Wally, and it powers queries like caller ID and landmark recognition in photos.
* Homomorphic encryption is nothing short of dark magic that makes it impossible for the server to know what you are searching for or what it’s responding with (at considerably greater computational expense).
* To make it scale, they added differential privacy techniques that generate fake queries from clients to hide which shards are being queried. So rather than always querying every shard (private but expensive), clients precompute which shards are most likely to hold the nearest neighbor match for their query. By itself this would leak the nature of your query, so they bury these queries in a sea of noise across the fleet.
* They also slightly randomize and batch the queries into epochs so that the server can’t even identify time-based traffic patterns.
* It’s all routed through an OHTTP-like private relay that anonymizes requests by stripping IP addresses. It does this by having the client connect to a relay that’s sort of like a VPN except the client passes it an encrypted destination address. The relay then hands this to a second relay run by another company which holds the decryption key, but doesn’t know your IP. So no one party can associate you with your destination. (This is actually the least groundbreaking part of the system.)
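To make the fake-query idea a bit more concrete, here’s a rough Python sketch of what a client might do in one epoch: pick the shard whose centroid is nearest to its query, then mix in fake requests to other shards. Everything here is an assumption for illustration, not the paper’s actual scheme: the names (`nearest_shard`, `client_requests`, `server_view`), the shard count, and the simple per-shard coin flip used to decide on fake queries are all made up, and the homomorphically encrypted payload is left out entirely.

```python
import random
from collections import Counter

NUM_SHARDS = 8     # hypothetical shard count
FAKE_PROB = 0.25   # hypothetical per-shard chance of sending a fake query


def nearest_shard(query, centroids):
    """Return the index of the centroid closest to the query (squared L2)."""
    def dist(centroid):
        return sum((q - c) ** 2 for q, c in zip(query, centroid))
    return min(range(len(centroids)), key=lambda i: dist(centroids[i]))


def client_requests(query, centroids, rng):
    """Build one real shard request plus random fake ones, shuffled together.

    Each request is (shard_id, is_real). The is_real flag exists only for
    this simulation; a real client would never reveal it.
    """
    requests = [(nearest_shard(query, centroids), True)]
    for shard in range(len(centroids)):
        # Assumed noise model: an independent coin flip per shard.
        if rng.random() < FAKE_PROB:
            requests.append((shard, False))
    rng.shuffle(requests)
    return requests


def server_view(epoch_requests):
    """Per-shard counts for an epoch: all the server gets to tally."""
    return Counter(shard for shard, _is_real in epoch_requests)


if __name__ == "__main__":
    rng = random.Random(0)
    centroids = [[float(i), float(i)] for i in range(NUM_SHARDS)]
    epoch = []
    for _ in range(1000):  # 1000 simulated clients in one epoch
        query = [rng.uniform(0, NUM_SHARDS - 1), rng.uniform(0, NUM_SHARDS - 1)]
        epoch.extend(client_requests(query, centroids, rng))
    rng.shuffle(epoch)  # batching: arrival order within the epoch carries no signal
    print(server_view(epoch))
```

The point of the sketch is just that the server’s per-shard tallies mix real and fake traffic from the whole fleet, so a single real request doesn’t stand out. How much noise is needed for a formal differential privacy guarantee is the part the paper actually works out.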
Academic paper here with all the details: https://arxiv.org/pdf/2406.06761