Frequently Asked Questions
Accessing the Full Dataset
Can I download the Data Breach Chronology Database?
Yes! The complete dataset is available for purchase here. This data includes detailed breach notifications, classification data, and full notification letter text where available.
We offer tiered pricing with substantial discounts for academic researchers. If you're conducting academic research, working with a nonprofit, or are a media outlet operating on a limited budget, please contact us at databreachchronology@privacyrights.org to request a complimentary download and describe your proposed use and affiliation. We prioritize requests that align with our mission of advancing public understanding of privacy issues and consumer privacy protections. In your message, we encourage you to explain how your work has the potential to advance consumer privacy.
Our team will respond to you as soon as possible; however, we cannot guarantee that we will be able to respond in time for any deadline.
Is there a way to preview the data before purchasing?
Yes! We offer sample files to help you explore the structure and content of our data:
- Download SQLite sample (.db file, coming soon)
- Download Excel sample (.xlsx file)
- Download CSV sample (.csv file)
While the full database is available in all three formats, we recommend the SQLite version for most analysis work. The Excel and CSV versions have been specially processed for compatibility with spreadsheet software. As a result, the formatting of the extracted data breach notification letters is collapsed and some fields are modified to prevent parsing issues.
The full database includes complete notification letter text and is available in all three formats upon purchase.
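For readers exploring a SQLite sample, the following is a minimal Python sketch that lists the tables and columns in a database file. It makes no assumption about the actual table or column names and simply reports whatever the file contains. The function name `describe_sqlite` is illustrative and not part of any official tooling.

```python
import sqlite3


def describe_sqlite(path):
    """Return {table_name: [column names]} for every table in a SQLite file."""
    conn = sqlite3.connect(path)
    try:
        tables = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        schema = {}
        for table in tables:
            # Quote the table name so unusual names do not break the PRAGMA.
            quoted = '"' + table.replace('"', '""') + '"'
            # PRAGMA table_info yields one row per column; index 1 is the name.
            columns = conn.execute(f"PRAGMA table_info({quoted})").fetchall()
            schema[table] = [col[1] for col in columns]
        return schema
    finally:
        conn.close()
```

Calling `describe_sqlite("sample.db")` would return the schema of a downloaded sample, where `sample.db` is a placeholder for wherever the file was saved.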
About the Project
What is the Data Breach Chronology?
The Data Breach Chronology helps advocates, policymakers, journalists, and researchers understand reported data breaches in the United States. Launched in 2005 following the ChoicePoint incident, it has evolved from a manually maintained list into a comprehensive database combining breach notifications from state and federal agencies with detailed analysis of notification letters.
The project tracks breaches across all sectors - from small businesses to major corporations, from K-12 schools to universities, and from local medical practices to national healthcare systems. Through a combination of human expertise and carefully validated machine learning techniques, we extract and structure information previously locked away in thousands of notification letters.
How is this project funded?
This project was funded in large part by The Rose Foundation for Communities and the Environment Consumer Products Fund. We have also received funds for this project from cy pres awards and the Consumer Federation of America.
If you are interested in supporting this project, please reach out to us at support@privacyrights.org.
Data Sources, Coverage and Project Methodology
Is this a complete record of every data breach in the United States?
The data consists of publicly available information on reported breaches and should not be considered a complete and accurate representation of every data breach in the United States. It reflects breaches reported in the United States that are made publicly available by government entities.
What data makes up the Data Breach Chronology?
The Data Breach Chronology draws from fifteen U.S. government agencies that maintain public records of data breach notifications. These include the U.S. Department of Health and Human Services and various state Attorneys General who require organizations to report breaches affecting their residents.
Each state has unique reporting thresholds and requirements. For example, some states require reporting of any breach affecting state residents, while others set minimum thresholds. Some states make notification letters public, while others provide only summary data.
When a breach affects residents of multiple states, it may be reported to several agencies. We track these related reports and group them together to provide a more complete picture of each incident.
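The grouping described above matters when counting incidents, because one breach can appear as several rows. The following is a minimal Python sketch of collapsing per-agency reports into one summary per group. The field names `group_id`, `source_agency`, and `total_affected` are hypothetical stand-ins rather than the database's actual column names, and taking the largest reported total is an illustrative assumption, not the project's own method.

```python
from collections import defaultdict


def summarize_incidents(reports):
    """Collapse per-agency reports into one summary per group identifier."""
    grouped = defaultdict(list)
    for report in reports:
        grouped[report["group_id"]].append(report)

    summaries = {}
    for group_id, rows in grouped.items():
        totals = [r["total_affected"] for r in rows if r["total_affected"] is not None]
        summaries[group_id] = {
            "agencies": sorted({r["source_agency"] for r in rows}),
            "report_count": len(rows),
            # Use the largest reported total rather than a sum: each agency may
            # restate the same overall figure, so summing could double count.
            "max_total_affected": max(totals) if totals else None,
        }
    return summaries


# Fictional example rows; agencies and numbers are made up for illustration.
reports = [
    {"group_id": "G1", "source_agency": "Agency A", "total_affected": 1200},
    {"group_id": "G1", "source_agency": "Agency B", "total_affected": 1500},
    {"group_id": "G2", "source_agency": "Agency A", "total_affected": None},
]
summaries = summarize_incidents(reports)
print(len(reports), "reports describe", len(summaries), "incidents")
```

Counting groups instead of rows avoids treating a single breach reported to several agencies as several breaches.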
What fields do you track for each breach?
We collect and structure detailed information about each breach across several categories:
Core Information:
- Organization name and alternative names
- Organization type classification
- Source agency that reported the breach
- Unique identifiers for tracking
Incident Details:
- Description of what occurred
- Type of breach
- Types of information exposed
- Dates (when reported, when occurred, when ended)
- Number of individuals affected (total and state residents)
Location Information:
- Street address
- City, state, and ZIP code
- Country
Related Incidents:
- Group identifier for related breaches
- Standardized organization name
- Common breach classification
- Common organization type
Source Documentation:
- Agency report URL
- Notification letter URL
- Full text of notification letter
Each field includes explanatory notes documenting how we determined the values and any relevant context.
How do you analyze and classify breaches?
Each breach notification is analyzed across multiple dimensions, including the type of organization affected, the method of breach, what information was exposed, geographic information, and the scale of impact. We use a consistent classification system that has evolved with our understanding of data breaches:
Organization Types include:
- BSF (Financial Services Business): Banks, credit unions, investment firms, insurance carriers
- BSO (Other Business): Technology companies, manufacturers, utilities, professional services
- BSR (Retail Business): Physical and online retail merchants
- EDU (Educational Institutions): Schools, universities, educational services
- GOV (Government and Military): Public administration, government agencies
- MED (Healthcare Providers): Hospitals, clinics, HIPAA-covered entities
- NGO (Nonprofits): Charities, advocacy groups, religious organizations
Breach Types include:
- CARD: Physical payment card compromises (skimming devices, POS tampering)
- HACK: External cyber attacks (malware, ransomware, network intrusions)
- INSD: Internal threats from authorized users
- PHYS: Physical document theft or loss
- PORT: Portable device breaches (laptops, phones, tablets)
- STAT: Stationary device breaches (desktops, servers)
- DISC: Unintended disclosures (misconfiguration, accidents)
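When working with the data, the codes listed above can be translated into readable labels with a simple lookup. The following is a minimal Python sketch. The dictionaries copy the codes and labels from the lists above, while the function name `describe_codes` and the fallback text for unlisted codes are illustrative assumptions.

```python
ORGANIZATION_TYPES = {
    "BSF": "Financial Services Business",
    "BSO": "Other Business",
    "BSR": "Retail Business",
    "EDU": "Educational Institutions",
    "GOV": "Government and Military",
    "MED": "Healthcare Providers",
    "NGO": "Nonprofits",
}

BREACH_TYPES = {
    "CARD": "Physical payment card compromises",
    "HACK": "External cyber attacks",
    "INSD": "Internal threats from authorized users",
    "PHYS": "Physical document theft or loss",
    "PORT": "Portable device breaches",
    "STAT": "Stationary device breaches",
    "DISC": "Unintended disclosures",
}


def describe_codes(org_code, breach_code):
    """Return a readable label for an organization type and breach type pair."""
    # Normalize case and whitespace; codes not listed above get a fallback
    # label instead of raising, since the data may contain other values.
    org = ORGANIZATION_TYPES.get(org_code.strip().upper(), "Unrecognized organization type")
    breach = BREACH_TYPES.get(breach_code.strip().upper(), "Unrecognized breach type")
    return f"{org} / {breach}"


print(describe_codes("MED", "HACK"))
```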
How do you build and maintain the database?
The Data Breach Chronology combines regular monitoring of government notification portals with careful analysis of breach notification letters. In addition to collecting every government-published breach report, we download any available notification letters and process them with optical character recognition and text extraction tools to make their contents searchable and analyzable.
Key steps in our process include:
- Collection of breach reports from fifteen government sources
- Extraction of text from notification letters where available
- Standardization of organization names and dates
- Analysis and classification using consistent taxonomies
- Grouping of related notifications across multiple states
How do you use artificial intelligence in this work?
As a privacy and consumer advocacy organization, we approach artificial intelligence with both careful consideration and concern. We recognize AI's profound implications for civil liberties, environmental justice, economic equity, and the concentration of power in the technology sector. These issues are at the core of our mission and shape our approach to using AI in our work.
The scope of data breach reporting, with thousands of notifications across multiple agencies, creates a significant challenge for a small nonprofit organization. While we previously maintained this database through manual entry, the volume of notifications has grown beyond what we can process without technological assistance. AI tools help us continue this important work while maintaining consistent standards.
Our approach balances efficiency with accuracy:
- We use AI to analyze notification text and extract factual information, as well as to help determine breach and organization classifications
- Our AI processing is strictly limited to analyzing the actual content of notifications, not making broader inferences
- Multiple automated validation checks help identify potential errors or inconsistencies
- We regularly review system output and monitor for systematic errors or biases
- While the processing is largely automated, we maintain oversight of the final staging and publication process
While we work to minimize issues like confabulation or incorrect inferences through careful system design and validation steps, we acknowledge that complete elimination of these problems isn't currently possible. We continue to explore ways to improve our process, including the potential development of dedicated tools that would allow for local processing and reduce dependency on large technology platforms.
Contributing to the Project
How can I get involved with this project?
Thank you for your interest – there is no shortage of work that can be done to continue to improve this project, and there are many ways to join us in that endeavor!
- Donate your time and expertise as a data science or Tableau volunteer to help us collect, clean, process, maintain, and present this resource. Contact us at databreachchronology@privacyrights.org with the subject line “VOLUNTEER”.
- Apply for a legal internship to help us stay up to date on changing data security and breach notification laws.
- Apply to join our Data Breach Chronology advisory committee to help drive future project decisions and new features. Contact us at databreachchronology@privacyrights.org with the subject line “ADVISORY COMMITTEE”.
- Donate to sustain the project.
If you are interested in getting updates on this project, join our email list here.
I have an idea for a new visualization. Can I provide input for this project?
Yes! We welcome suggestions for improving how we present this data. Please let us know at databreachchronology@privacyrights.org and include “SUGGESTION” in the subject line.
I believe a breach has incorrect information associated with it. How can I alert you?
Please email us at databreachcorrections@privacyrights.org and include “CORRECTION” in the subject line followed by the name of the breached organization. Include any documentation that supports the correction so we can review and update our records.
What's the difference between the current database and the historical archive (2005-2018)?
While our current database includes breaches from 2005 onward, our approach to collecting and analyzing breach notifications changed significantly in 2019. Before 2019, we maintained a manually curated list of breaches. This historical dataset, representing our original data collection effort, is available through a separate interactive dashboard. Though it uses similar classification systems, it reflects our earlier, manual approach to tracking data breaches.
Acknowledgements
The Data Breach Chronology began in 2005 under the leadership of Beth Givens, Privacy Rights Clearinghouse's founder and former Executive Director. The current version was developed and is maintained by Emory Roane, Associate Director of Policy at Privacy Rights Clearinghouse.
We are also thankful for the contributions of Coleman Research Lab, Ahmed Eissa, The Rose Foundation for Communities and the Environment, Consumer Federation of America, and everyone else who has supported the project in its various forms over the years.
Limitations and Disclaimers
The Data Breach Chronology is based on publicly available information and should not be considered a complete and accurate representation of every data breach in the United States. Rather, it reflects the data breaches that have been reported and made publicly available in the United States.
Users should pay careful attention to the issue of duplicate reporting when making use of this data or making assertions based on this data. While we work to identify when a single breach has been reported to multiple state Attorneys General, this process is not perfect.
Additionally, though we collect the contents of breach notification letters where possible, we do not host these letters locally, and source URLs may no longer be active.
Privacy Rights Clearinghouse makes no representations as to the accuracy of the information included in the Data Breach Chronology.