Personally Identifiable Information (PII)
Personally Identifiable Information (PII) is any data that can be used to identify a specific individual. Social Security numbers, mailing or email address, and phone numbers have most commonly been considered PII, but technology has expanded the scope of PII considerably. It can include an IP address, login IDs, social media posts, or digital images. Geolocation, biometric, and behavioral data can also be classified as PII. This broad definition of PII creates security and privacy challenges, especially when specific and stringent safeguards for it are spelled out in regulations such as the European Union’s (EU’s) General Data Protection Regulation (GDPR).
Conceptions of Personally Identifiable Information (PII)
The U.S. government used the term "personally identifiable" in 2007 in a memorandum from the Executive Office of the President, Office of Management and Budget (OMB), and that usage now appears in US standards such as the NIST Guide to Protecting the Confidentiality of Personally Identifiable Information (SP 800-122). The OMB memorandum defines PII as follows:
Information which can be used to distinguish or trace an individual's identity, such as their name, social security number, biometric records, etc. alone, or when combined with other personal or identifying information which is linked or linkable to a specific individual, such as date and place of birth, mother’s maiden name, etc.
A term similar to PII, "personal data" is defined in EU directive 95/46/EC, for the purposes of the directive:
Article 2a: 'personal data' shall mean any information relating to an identified or identifiable natural person ('data subject'); an identifiable person is one who can be identified, directly or indirectly, in particular by reference to an identification number or to one or more factors specific to his physical, physiological, mental, economic, cultural or social identity;
However, in the EU rules, there has been a clearer notion that the data subject can potentially be identified through additional processing of other attributes—quasi- or pseudo-identifiers. In the GDPR Personal Data is defined as:
Any information relating to an identified or identifiable natural person ('data subject'); an identifiable natural person is one who can be identified, directly or indirectly, in particular by reference to an identifier such as a name, an identification number, location data, an online identifier or to one or more factors specific to the physical, physiological, genetic, mental, economic, cultural or social identity of that natural person
Another term similar to PII, "personal information" is defined in a section of the California data breach notification law, SB1386:
(e) For purposes of this section, "personal information" means an individual's first name or first initial and last name in combination with any one or more of the following data elements, when either the name or the data elements are not encrypted: (1) Social security number. (2) Driver's license number or California Identification Card number. (3) Account number, credit or debit card number, in combination with any required security code, access code, or password that would permit access to an individual's financial account. (f) For purposes of this section, "personal information" does not include publicly available information that is lawfully made available to the general public from federal, state, or local government records.
The concept of information combination given in the SB1386 definition is key to correctly distinguishing PII, as defined by OMB, from "personal information", as defined by SB1386. Information, such as a name, that lacks context cannot be said to be SB1386 "personal information", but it must be said to be PII as defined by OMB. For example, the name John Smith has no meaning in the current context and is therefore not SB1386 "personal information", but it is PII. A Social Security Number (SSN) without a name or some other associated identity or context information is not SB1386 "personal information", but it is PII. For example, the SSN 078-05-1120 by itself is PII, but it is not SB1386 "personal information". However the combination of a valid name with the correct SSN is SB1386 "personal information".
The combination of a name with a context may also be considered PII; for example, if a person's name is on a list of patients for an HIV clinic. However, it is not necessary for the name to be combined with a context in order for it to be PII. The reason for this distinction is that bits of information such as names, although they may not be sufficient by themselves to make an identification, may later be combined with other information to identify persons and expose them to harm.
According to the OMB, it is not always the case that PII is "sensitive", and context may be taken into account in deciding whether certain PII is or is not sensitive.
Sensitive Vs. Non-Sensitive Personally Identifiable Information(PII)
Personally identifiable information (PII) can be sensitive or non-sensitive. Sensitive personal information includes legal statistics such as:
- Full name
- Social Security Number (SSN)
- Driver’s license
- Mailing address
- Credit card information
- Passport information
- Financial information
- Medical records
The above list is by no means exhaustive. Companies that share data about their clients normally use anonymization techniques to encrypt and obfuscate the PII, so it is received in a non-personally identifiable form. An insurance company that shares its clients’ information with a marketing company will mask the sensitive PII included in the data and leave only information related to the marketing company’s goal.
Non-sensitive or indirect PII is easily accessible from public sources like phonebooks, the Internet, and corporate directories. Examples of non-sensitive or indirect PII include:
- Date of birth
- Place of birth
The above list contains quasi-identifiers and examples of non-sensitive information that can be released to the public. This type of information cannot be used alone to determine an individual’s identity.
However, non-sensitive information, although not delicate, is linkable. This means that non-sensitive data, when used with other personal linkable information, can reveal the identity of an individual. De-anonymization and re-identification techniques tend to be successful when multiple sets of quasi-identifiers are pieced together and can be used to distinguish one person from another.
Multiple data protection laws have been adopted by various countries to create guidelines for companies that gather, store, and share the personal information of clients. Some of the basic principles outlined by these laws state that some sensitive information should not be collected unless for extreme situations.
Also, regulatory guidelines stipulate that data should be deleted if no longer needed for its stated purpose, and personal information should not be shared with sources that cannot guarantee its protection.
Cybercriminals breach data systems to access PII, which is then sold to willing buyers in underground digital marketplaces. For example, in 2015, the IRS suffered a data breach leading to the theft of more than a hundred thousand taxpayers’ PII. Using quasi-information stolen from multiple sources, the perpetrators were able to access an IRS website application by answering personal verification questions that should have been privy to the taxpayers only.
PII Security Controls
The Data Privacy Framework should define which security controls the organization needs to have in place to prevent data loss or data leak:
- Change Management: tracking and auditing changes to configuration on IT systems which might have security implications, such as adding/removing user accounts.
- Data Loss Prevention: implementing systems that can track sensitive data transferred within the organization or outside it, and identify unnatural patterns that might suggest a breach.
- Data masking: ensuring that data is stored or transmitted with the minimal required details for the specific transaction, with other details masked or omitted.
- Ethical walls: implementing screening mechanisms to prevent certain departments or individuals within an organization from viewing PII that is not relevant to their work, or that might create a conflict of interest.
- Privileged user monitoring: monitoring all privileged access to files and databases, user creation and newly granted privileges, blocking and alerting when suspicious activity is detected.
- Sensitive data access auditing: in parallel to monitoring activities by privileged users, monitoring and auditing all access to sensitive data, blocking and alerting on suspicious or anomalous activity.
- Secure audit trail archiving: ensuring that any activity conducted on or in relation to PII is audited and retained for a period of 1-7 years, for legal or compliance purposes, and also to enable forensic investigation of security incidents.
- User rights management: identifying excessive, inappropriate, or unused user privileges and taking corrective action, such as removing user accounts that have not been used for several months.
- User tracking: implementing ways of tracking user activity, online and while using organizational systems, to identify negligent exposure of sensitive data, compromise of user accounts, or malicious insiders.