Data Integrity for Spreadsheets

Since the FDA’s draft guidance on data integrity came out in April 2016, there has been a lot of interest in the topic. Enforcement has followed: the number of drug GMP warning letters increased by 12% in FY2017 compared to the previous year, and 65% of these warning letters cited data integrity violations.

In a world awash with data, where does one start to improve data integrity? From an IT perspective, data is generally classified as:

  • Structured data, which normally resides in a relational database such as Microsoft SQL Server or Oracle. Enterprise applications such as ERP, MES and LIMS typically store their data in such relational databases, and their records are examples of structured data.
  • Unstructured data, which refers to files residing on users’ local machines or network drives. Common examples of unstructured data include spreadsheets, Word/PDF documents, drawings, and laboratory instrument output or other raw data files in CSV or TXT format.

Data integrity assurance requires a number of components that include, but are not limited to, security, audit trails and electronic signatures. Applying a risk-based approach to data leads to the following general conclusions:

  • Structured data usually meets at least the security requirements for data integrity, because most enterprise applications feature access controls; it may meet other requirements as well.
  • Unstructured data often lacks even the most basic data integrity controls, such as security and access protection, not to mention the other required elements.

Among unstructured data, one particular file format poses the highest data integrity risk. While all other file formats contain content, this file format also contains some combination of calculations, logic, code, external links and queries, making it a full-blown, complex application that masquerades as an innocuous file.

There are no prizes for guessing this file format: it is, of course, the spreadsheet! Compounding the risk from spreadsheets is their easy availability on everyone’s desktop, their familiarity, users’ varying levels of expertise, their lack of security, and their increasing complexity. Where Excel 2003 was limited to 65,536 rows per worksheet, Excel 2007 and later versions allow 1,048,576.
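To make the "application masquerading as a file" point concrete, a reviewer triaging a spreadsheet inventory could check which application-like features each workbook contains. The sketch below is a minimal illustration, assuming Excel 2007+ files (.xlsx/.xlsm), which are ZIP packages of XML parts. The function name `inventory_xlsx` and the specific checks are hypothetical and not part of any product mentioned in this article. It counts cells containing formulas, external workbook links, an embedded VBA project (macros) and data connection definitions; it is a heuristic based on standard part names, does not detect every feature, and does not handle legacy binary .xls files.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# SpreadsheetML main namespace used in Excel 2007+ worksheet XML
NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"


def inventory_xlsx(data: bytes) -> dict:
    """Hypothetical helper: report application-like features of an OOXML workbook.

    Heuristic only: assumes the standard part names Excel writes
    (xl/worksheets/, xl/externalLinks/, xl/vbaProject.bin, xl/connections.xml).
    """
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        names = zf.namelist()
        formula_cells = 0
        for name in names:
            # Worksheet parts look like xl/worksheets/sheet1.xml (.rels files are skipped)
            if name.startswith("xl/worksheets/") and name.endswith(".xml"):
                root = ET.fromstring(zf.read(name))
                # Each <f> element holds the formula for one cell
                formula_cells += sum(1 for _ in root.iter(NS + "f"))
        return {
            "formulas": formula_cells,
            # One externalLinkN.xml part per linked external workbook
            "external_links": sum(
                1 for n in names
                if n.startswith("xl/externalLinks/") and n.endswith(".xml")
            ),
            # Present in macro-enabled workbooks (.xlsm)
            "has_macros": "xl/vbaProject.bin" in names,
            # Holds definitions of external data connections / queries
            "has_data_connections": "xl/connections.xml" in names,
        }
```

For example, `with open("batch_record.xlsx", "rb") as fh: print(inventory_xlsx(fh.read()))` (the filename is hypothetical) prints a dictionary; any non-zero formula count, external link, macro project or data connection indicates a file that behaves like an application rather than passive content. Passing a legacy .xls file raises `zipfile.BadZipFile`.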

Spreadsheets are commonly used in manufacturing, laboratory, clinical, and training processes. More broadly, they can support GxP operations as an easy substitute for paper any time data needs to be recorded, in virtually any environment or setting and regardless of company size. For any quality, compliance or risk professional looking to improve data integrity at their firm, spreadsheets are “low-hanging fruit” that should be addressed with the highest priority. Not only do they pose the highest risk, but they can also be made compliant with 21 CFR Part 11 and FDA’s data integrity guidance at relatively low cost and effort.

The eInfotree Excel Desktop software from CIMINFO automatically makes an existing Excel spreadsheet compliant in less than a minute, and can also quickly convert a large number of Excel files with automated settings. Security, audit trails and electronic signatures become embedded in an eInfotree-controlled spreadsheet, meeting all of the data integrity requirements. The XLValidator software allows for visual error-checking, logic analysis, design reviews and automatic generation of validation documentation, saving hours of manual effort. For more information on both of these time-saving solutions, visit