Introduction

Read the news about “digital fingerprints,” and you come across several concepts:

  1. A literal “digital” version of your actual fingerprint, used in smartphones, for example
  2. Some kind of unique digital “marker” or “tell” that identifies some data or code
  3. A hash or message digest, a unique identifier of data generated by a one-way cryptographic function
  4. A unique identifier of a device, commonly used in IoT infrastructure
  5. A copyright-enforcement method for identifying copyrighted digital materials

There are countless other use cases for digital fingerprints, so we ask: How do digital fingerprints work? What role do they play in securing data, devices, and our own privacy?

From Analog to Digital

Crime scene investigators have long “dusted for fingerprints” to identify who may have been present at the scene. A supercharged equivalent is DNA: opaque, but a unique fingerprint nonetheless. People leave behind traces, and because our fingerprints are patterns unique to us, they can be used to identify us. Fundamentally, digital fingerprints are similar: they are patterns or identifiers that uniquely identify data. These so-called “digital fingerprints” present themselves both directly and indirectly.

In part 1 of this FAQ, we’ll discuss direct fingerprints, and present examples of how this works. In part 2, we’ll discuss indirect or derived digital fingerprints.

Direct Fingerprints

Direct digital fingerprints are patterns in data that can be assessed directly, without processing the data in any way. They are the equivalent of a digital “tell” that somewhat uniquely identifies data.
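One simple illustration of a direct fingerprint is the “magic bytes” that many file formats place at the very start of a file, which can be read straight off the data. The Python sketch below checks a handful of well-known signatures. The function name identify_format and the short signature table are assumptions made for this example; it is not a complete detector.

```python
# A few well-known file signatures ("magic bytes"); real tools know hundreds.
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "PNG image",
    b"%PDF-": "PDF document",
    b"PK\x03\x04": "ZIP archive",
    b"\x7fELF": "ELF executable",
    b"MZ": "Windows executable",
}


def identify_format(data: bytes) -> str:
    """Return a format name if the data starts with a known signature."""
    for magic, name in SIGNATURES.items():
        if data.startswith(magic):
            return name
    return "unknown"


if __name__ == "__main__":
    print(identify_format(b"%PDF-1.7 rest of the file"))
    print(identify_format(b"just some plain text"))
```

Nothing is computed here: the pattern is simply looked for at the start of the data, which is what makes the fingerprint “direct.”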

Attack Fingerprints

For example, in his book The Perfect Weapon, David Sanger explains that one of the ways security researchers were able to identify who was behind the Stuxnet hack was through digital “tells” in the code: “Stuxnet was filled with digital fingerprints and other clues about where and when it had been created.” If we knew of code that was produced by Israeli intelligence, and Stuxnet code contained exactly the same types of patterns and idiosyncrasies, then we’d have a pretty good indication that those portions of the Stuxnet code were created by Israeli intelligence. As it happens, the code also had US intelligence fingerprints all over it.

In fact, this kind of analysis, digital fingerprinting by finding patterns in source code, is one of the most common ways we identify malware provenance: who created it and where it originated.
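To make the idea of searching code for “tells” concrete, here is a minimal Python sketch that scans a sample for a list of known byte patterns. Everything in KNOWN_TELLS is invented for illustration (these are not real indicators from Stuxnet or any other malware), and find_tells is a hypothetical helper name. Real analysts rely on much richer rules and tooling.

```python
from typing import Dict, List

# Hypothetical "tells" for illustration only; these are NOT real indicators
# taken from Stuxnet or any other malware.
KNOWN_TELLS: Dict[str, bytes] = {
    "debug path left by a build machine": b"C:\\build\\projectx\\",
    "distinctive mutex name": b"Global\\example_mutex_42",
    "reused XOR key": bytes.fromhex("deadbeef1337"),
}


def find_tells(sample: bytes, tells: Dict[str, bytes] = KNOWN_TELLS) -> List[str]:
    """Return the description of every known pattern present in the sample."""
    return [label for label, pattern in tells.items() if pattern in sample]


if __name__ == "__main__":
    sample = b"\x00\x01Global\\example_mutex_42\x00" + bytes.fromhex("deadbeef1337")
    for label in find_tells(sample):
        print("found:", label)
```

A match is only a hint. Patterns can be copied or planted, so a hit like this would normally be weighed alongside other evidence.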

In 2017, a devastating cyberattack struck computers across Ukraine. While masquerading as ransomware, the malware was in fact intended to inflict maximum damage across many sectors of Ukraine’s economy. By 11:30 a.m. on June 27, computers across the country had suddenly stopped working: ATMs failed, bank operations halted, radiation monitoring systems at Ukraine’s Chernobyl Nuclear Power Plant went offline, and computers across the entire electrical grid collapsed.

Through code analysis, security researchers were able to identify this brutal malware as a derivative of the “Petya” ransomware, and named this new code “NotPetya” to distinguish it from the original strain. Unique fingerprints in the code indicated that it was indeed a modification of prior malware, not a complete rewrite. As Russell Brandom explains, these fingerprints included the ransomware mechanism from earlier Petya code, specific exploits used, and other idiosyncrasies that enabled security experts to identify not only where the code had come from, but also how it worked (and thus to better understand its impact).
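One rough way to see whether a new sample is a modification of older code rather than a rewrite is to measure how many short byte sequences the two share. The Python sketch below compares the sets of 4-byte n-grams in two samples using the Jaccard index. The function names, the n-gram length, and the toy “samples” are assumptions for illustration; this is not the method the researchers describe, only a simple stand-in for the idea of overlapping fingerprints.

```python
def byte_ngrams(data: bytes, n: int = 4) -> set:
    """Return the set of all n-byte sequences that occur in the data."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def jaccard_similarity(a: bytes, b: bytes, n: int = 4) -> float:
    """Jaccard index of the two n-gram sets: shared n-grams / all n-grams."""
    grams_a, grams_b = byte_ngrams(a, n), byte_ngrams(b, n)
    if not grams_a and not grams_b:
        return 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


if __name__ == "__main__":
    # Made-up stand-ins for code samples, not real malware.
    original = b"encrypt_mft();show_ransom_note();reboot();"
    variant = b"spread_via_smb();encrypt_mft();show_ransom_note();wipe();"
    unrelated = b"The quick brown fox jumps over the lazy dog"
    print(jaccard_similarity(original, variant))
    print(jaccard_similarity(original, unrelated))
```

A score near 1 suggests heavy reuse and a score near 0 suggests little in common. Where to draw the line is a judgment call that this sketch does not attempt to make.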

By using fingerprints, patterns, and “tells” within data and source code, security experts can better understand the history, scope, and attribution of newly emerging malware and other damaging code.

Your Digital Fingerprints

You also leave digital fingerprints all over the web whenever you are browsing almost any website. Your web browser provides a surprising amount of information to any website that asks for it, including:

  • information about your computer
  • your location
  • details about your web browser
  • your browsing history
  • every mouse movement you make on the page
  • when you bring a website’s page into focus
  • when you switch away from or close the page
  • miscellaneous other information

If you have scripting enabled, which it is by default, websites can pull an incredible amount of invasive information out of your browser and can act on your machine: they can scan your home network, collect additional device information, store malware and other data on your computer, and of course, embed tracking cookies in your browser.

Once tracking and cross-site cookies are stored in your browser, these unique identifiers can follow you across every website that embeds the same tracker and record what you do there: which products you look at, what you hovered over, what you put into your shopping carts, what you removed, how long those items remained there, what you purchased, much of your web browsing history, how often you check your bank balance, and any other activity you perform.
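The mechanics can be sketched with a toy model in Python. The class ThirdPartyTracker, its record method, and the example site names are all hypothetical; the sketch only shows how one identifier, issued once and sent back on every visit, lets a tracker stitch activity on different sites into a single profile.

```python
import uuid
from collections import defaultdict
from typing import Dict, List, Optional, Tuple


class ThirdPartyTracker:
    """Toy model of an ad network whose script is embedded on many sites."""

    def __init__(self) -> None:
        # Maps each cookie ID to the (site, action) events seen for it.
        self.profiles: Dict[str, List[Tuple[str, str]]] = defaultdict(list)

    def record(self, cookie_id: Optional[str], site: str, action: str) -> str:
        """Log one event and return the cookie the browser should store."""
        if cookie_id is None:
            # First contact: issue a random, unique identifier.
            cookie_id = uuid.uuid4().hex
        self.profiles[cookie_id].append((site, action))
        return cookie_id


if __name__ == "__main__":
    tracker = ThirdPartyTracker()
    cookie = tracker.record(None, "shop.example", "added shoes to cart")
    cookie = tracker.record(cookie, "news.example", "read an article")
    print(tracker.profiles[cookie])
```

Because the same cookie comes back from both sites, the tracker ends up with one combined history even though neither site shared anything with the other directly.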

How do websites know it’s “you” when you’re doing all this stuff on the web? They use a kind of fingerprint. By taking all the freely available configuration information your browser provides, plus unique tracking cookies that any website using ads will insert into your browser (without your permission), you become a unique individual on the web. Websites, ad networks, and social media sites associate anything you do with you.
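As a rough sketch of how separate configuration details can be folded into one identifier, the Python example below serializes a set of browser attributes in a fixed order and hashes the result. The attribute names, the sample values, the function name browser_fingerprint, and the choice of SHA-256 are all assumptions for illustration. Real fingerprinting scripts collect many more signals and may combine them differently.

```python
import hashlib
import json
from typing import Dict


def browser_fingerprint(attributes: Dict[str, str]) -> str:
    """Hash a set of browser attributes into one stable identifier."""
    # Sort keys so the same attributes always serialize identically.
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # Hypothetical values of the kind a browser exposes to scripts.
    visitor = {
        "user_agent": "ExampleBrowser/1.0 (ExampleOS 12)",
        "screen": "2560x1440x24",
        "timezone": "Europe/Kyiv",
        "language": "en-US",
        "fonts": "Arial,Helvetica,Comic Sans MS",
    }
    print(browser_fingerprint(visitor))
```

No single value here identifies anyone, but the combination is often rare enough to single out one browser, and it does so without storing anything on your machine.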

You can see your own digital fingerprint using a free tool from the Electronic Frontier Foundation, Panopticlick (since renamed Cover Your Tracks). Another interesting site to see how browsers track your movements is Click.

There is another, perhaps more common, approach to digital fingerprinting, and that is to derive a digital fingerprint from data. That will be the subject of Part 2.