After a developer reverse-engineered a portion of Apple's CSAM detection system, researchers were able to fool it into flagging an innocent picture. Apple, however, says it has additional safeguards in place to prevent this from happening in real-world use.
The latest development came when the NeuralHash algorithm was published on the open-source developer site GitHub, allowing anybody to experiment with it…
All CSAM detection systems rely on a database of known child sexual abuse material maintained by organizations such as the National Center for Missing and Exploited Children (NCMEC). That database consists of hashes, or digital fingerprints, derived from the pictures.
While most tech giants scan uploaded photos in the cloud, Apple runs its NeuralHash algorithm on the customer's iPhone to generate hashes of the stored photos and then compares them against a downloaded copy of the CSAM hash database.
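The on-device matching described above can be sketched as follows. This is a heavily simplified illustration, not Apple's actual implementation: NeuralHash is a neural perceptual hash, whereas this sketch uses SHA-256 as a stand-in, and the database entries are placeholder values.

```python
import hashlib

# Stand-in fingerprint function. Apple's real NeuralHash is a neural
# perceptual hash designed to survive resizing and re-encoding;
# SHA-256 is used here only to show the matching flow.
def fingerprint(image_bytes: bytes) -> str:
    """Reduce an image to a short digital fingerprint (hash)."""
    return hashlib.sha256(image_bytes).hexdigest()

# Downloaded copy of the known-CSAM hash database (placeholder entries).
known_hashes = {
    fingerprint(b"known-bad-image-1"),
    fingerprint(b"known-bad-image-2"),
}

def scan_library(photos: list[bytes]) -> int:
    """Count on-device matches against the downloaded hash database."""
    return sum(fingerprint(p) in known_hashes for p in photos)

matches = scan_library([b"vacation.jpg", b"known-bad-image-1"])
```

Only hashes, not images, are compared, which is why the whole scheme hinges on how reliably the fingerprint function distinguishes different pictures.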
Apple’s CSAM system was deceived.
Within hours of the GitHub upload, researchers used the algorithm to construct an intentional false positive: two entirely different pictures that generated the same hash. This is referred to as a collision.
Collisions are inevitable in such systems, since a hash is a greatly reduced representation of a picture, but it was surprising that someone could produce one so quickly.
This collision was deliberately constructed as a proof of concept. Developers do not have access to the CSAM hash database, which would be needed to produce a false positive against the live system, but the demonstration shows that collision attacks are comparatively easy in principle.
Apple claims to have two safeguards in place to prevent this.
Although Apple effectively acknowledged that the algorithm formed the foundation of its own system, it told Motherboard that this was not the final version. The company also said the algorithm was never meant to be kept secret.
Apple told Motherboard in an email that the version analyzed by users on GitHub is a generic version, not the final one that will be used for iCloud Photos CSAM detection. Apple added that it had made the algorithm public deliberately.
“The NeuralHash algorithm [… is] included as part of the code of the signed operating system [and] security researchers can verify that it behaves as described,” one piece of Apple’s documentation reads.
It said, “There are two further steps: a secondary (secret) matching system that runs on its own servers, and a manual review.”
Apple also said that after a user passes the 30-match threshold, a second non-public algorithm that runs on Apple’s servers will check the results.
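The two-stage design Apple describes can be sketched as below. Everything here is a simplified stand-in under stated assumptions: the 30-match threshold comes from Apple's description, but the hash functions, database contents, and the `review_account` helper are hypothetical, using keyed SHA-256 digests to play the role of two independent perceptual hashes.

```python
import hashlib

THRESHOLD = 30  # match threshold from Apple's stated design

def on_device_hash(img: bytes) -> str:
    # Stand-in for the public, on-device NeuralHash.
    return hashlib.sha256(b"device:" + img).hexdigest()

def server_hash(img: bytes) -> str:
    # Stand-in for the independent, non-public server-side hash.
    return hashlib.sha256(b"server:" + img).hexdigest()

# Placeholder databases built from the same illustrative images.
device_db = {on_device_hash(b"bad-%d" % i) for i in range(50)}
server_db = {server_hash(b"bad-%d" % i) for i in range(50)}

def review_account(flagged: list[bytes]) -> bool:
    """Escalate to human review only if both stages agree past threshold."""
    if len(flagged) <= THRESHOLD:
        return False
    # The second, independent hash rejects adversarial images that were
    # perturbed to collide only under the first hash.
    confirmed = [img for img in flagged if server_hash(img) in server_db]
    return len(confirmed) > THRESHOLD
```

The key property is that an attacker who forces collisions against the public on-device hash still fails the server-side check, because crafting a simultaneous collision against an unknown second hash is far harder.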
“This independent hash is chosen to reject the unlikely possibility that the match threshold was exceeded due to non-CSAM images that were adversarially perturbed to cause false NeuralHash matches against the on-device encrypted CSAM database.”
Finally, as noted above, flagged pictures are reviewed by a person to confirm that they are CSAM.
According to one security expert, the only real risk is that someone trying to sabotage Apple could swamp the human reviewers with false positives.
“Apple actually designed this system so the hash function doesn’t need to remain secret, as the only thing you can do with ‘non-CSAM that hashes as CSAM’ is annoy Apple’s response team with some garbage images until they implement a filter to eliminate those garbage false positives in their analysis pipeline,” Nicholas Weaver, senior researcher at the International Computer Science Institute at UC Berkeley, told Motherboard in an online chat.