Data Integrity for Spreadsheets

Since the FDA’s Data Integrity guidance came out in April 2016, there has been a lot of interest in the topic. Indeed, the number of warning letters for drug GMPs increased by 12% in FY2017 as compared to the previous year. Of these warning letters, 65% cited data integrity violations.

In a world awash with data, where does one start to improve data integrity? From an IT perspective, data is generally classified as:

  • Structured data, which normally resides in a relational database such as SQL Server or Oracle. Enterprise applications such as ERP, MES and LIMS typically store their data in such relational databases, and that data is a typical example of structured data.
  • Unstructured data, which refers to files residing on users’ local machines or network drives. Common examples of unstructured data include spreadsheets, Word/PDF documents, drawings, and instrument, lab or other raw data files in CSV or TXT format.

Data integrity assurance requires a number of components that include, but are not limited to, security, audit trails and electronic signatures. Applying a risk-based approach to data would lead to the following general conclusions:

  • Structured data, at a minimum, usually meets the security requirements for data integrity, as most enterprise applications feature access controls. It may meet other requirements as well.
  • Unstructured data often lacks even the most basic data integrity controls, such as security and access protection, not to mention the other required elements.

Amongst unstructured data, there is one particular file format that poses the highest data integrity risk. While all other file formats contain only content, this file format also contains some combination of calculations, logic, code, external links and queries, making it a full-blown, complex application that masquerades as an innocuous file.

There are no prizes for guessing what this file format is: it is indeed a spreadsheet! Compounding the risk from spreadsheets are their easy availability on everyone’s desktop, their familiarity, users’ varying levels of expertise, their lack of security, and their increasing complexity. Where Excel 2003 allowed 65,536 rows per worksheet, current versions allow 1,048,576.
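To make the "application masquerading as a file" point concrete, the following is a minimal sketch, using only the Python standard library, that counts the application-like features inside a workbook. It relies on the fact that .xlsx and .xlsm files are ZIP packages of XML parts, and it assumes the conventional part names that Excel writes (xl/worksheets/, xl/vbaProject.bin, xl/externalLinks/, xl/connections.xml). The function name and the keys of the returned dictionary are illustrative only, and legacy binary .xls files are not covered.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by worksheet parts in an Office Open XML workbook.
MAIN_NS = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"


def profile_workbook(data: bytes) -> dict:
    """Summarise application-like features of an .xlsx/.xlsm workbook.

    `data` holds the raw bytes of the file. The part names checked below are
    the conventional ones written by Excel (an assumption of this sketch).
    """
    with zipfile.ZipFile(io.BytesIO(data)) as package:
        names = package.namelist()
        formula_cells = 0
        for name in names:
            if name.startswith("xl/worksheets/") and name.endswith(".xml"):
                root = ET.fromstring(package.read(name))
                # Every <f> element marks a cell that holds a formula.
                formula_cells += sum(1 for _ in root.iter(MAIN_NS + "f"))
    return {
        "formula_cells": formula_cells,
        # Macro-enabled workbooks carry their VBA project in this part.
        "has_vba_macros": "xl/vbaProject.bin" in names,
        # One XML part per linked external workbook (.rels files excluded).
        "external_links": sum(
            1
            for n in names
            if n.startswith("xl/externalLinks/") and n.endswith(".xml")
        ),
        # Database and web queries are described in the connections part.
        "has_data_connections": "xl/connections.xml" in names,
    }
```

Reading a file from disk and passing its bytes to `profile_workbook` gives a quick indication of how much logic a given spreadsheet carries beyond its visible content; a workbook with many formula cells, macros or external links deserves closer scrutiny than one that only stores values.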

Spreadsheets are commonly used in manufacturing, laboratory, clinical, and training processes. More broadly, they can be used in support of GxP operations as an easy substitute for paper any time data needs to be stored, in virtually any environment or setting and regardless of company size. For any quality, compliance or risk professional looking to improve data integrity at their firm, spreadsheets are “low-hanging fruit” that should be addressed with the highest priority. Not only do they pose the highest risk, but making them compliant with 21 CFR Part 11 and FDA’s data integrity guidance can be accomplished with relatively low cost and effort.

The eInfotree Excel Desktop software from CIMINFO automatically makes an existing Excel spreadsheet compliant in less than a minute, and can also quickly convert a large number of Excel files with automated settings. Security, audit trails and electronic signatures become embedded in an eInfotree-controlled spreadsheet, meeting all of the data integrity requirements. The XLValidator software allows for visual error-checking, logic analysis, design reviews and automatic generation of validation documentation, saving hours of manual effort. For more information on both of these time-saving solutions, visit


Sustainable Business Practice leading to Quality and Compliance Enhancements

Companies that use spreadsheets for business processes that involve GCP, GLP or GMP outcomes often look to tighten up good practice by focusing on the most immediate problem(s) at hand. Typically that involves one or more of the following processes:

1. Validate the spreadsheet
2. Verify that the use of the spreadsheet complies with predicate rules and 21 CFR Part 11 requirements for security, audit trail and e-signature sign-offs
3. Validate the application providing the controls

All three processes are important and play a critical role in the proper use of spreadsheets in labs, clinical trials, research or manufacturing and in meeting requirements of 21 CFR Part 11.
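As an illustration of what an audit trail control involves, the sketch below records each cell change as a time-stamped entry and chains the entries together with SHA-256 hashes, so that any later alteration of a recorded entry can be detected. It is a generic, minimal example and not a description of how any particular product works; the function names and field names (`user`, `cell`, `old`, `new`, `reason`) are assumptions made for illustration, and a production control would also need secure storage, authenticated users and electronic signatures.

```python
import hashlib
import json
from datetime import datetime, timezone

# Placeholder "previous hash" for the first entry in a trail.
GENESIS = "0" * 64


def _digest(entry: dict) -> str:
    """Return the SHA-256 hex digest of an entry in canonical JSON form."""
    payload = json.dumps(entry, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_entry(trail, user, cell, old, new, reason, timestamp=None):
    """Append a change record that is linked to the previous record's hash."""
    entry = {
        "timestamp": timestamp or datetime.now(timezone.utc).isoformat(),
        "user": user,
        "cell": cell,
        "old": old,
        "new": new,
        "reason": reason,
        "prev_hash": trail[-1]["hash"] if trail else GENESIS,
    }
    # The hash covers every field above, including the link to the prior entry.
    entry["hash"] = _digest(entry)
    trail.append(entry)
    return entry


def verify_trail(trail) -> bool:
    """Return True only if no entry was altered, removed or reordered."""
    prev = GENESIS
    for entry in trail:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if entry["prev_hash"] != prev or _digest(body) != entry["hash"]:
            return False
        prev = entry["hash"]
    return True
```

Each call to `append_entry` captures who changed which cell, the old and new values, and the reason for the change; `verify_trail` then confirms that the recorded history is intact. Chaining the hashes is a design choice that makes silent edits to earlier entries detectable, since changing any one record invalidates every hash that follows it.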

CIMINFO believes that a sustainable compliance approach to Life Cycle Management of Spreadsheets should consist of repeatable business processes. Whether management is looking for help with inventory and risk assessment, validation/re-validation or deploying a 21 CFR Part 11 control environment, there should be solutions in place that can meet these needs today, tomorrow and into the future. Everyone understands the time, effort and challenges of spreadsheet validation, but is enough attention devoted to inventory and risk assessment? Or to the final controlled environment that provides file-level security and an audit trail for change management? By using our Life Cycle Management toolbox, all of these necessary processes can be addressed and managed in a way that complies with all GxP concerns while improving business intelligence and performance and sustaining compliance.

CIMINFO’s Life Cycle Management Approach has been developed over many years, coupling our extensive domain understanding of spreadsheets and databases with close working relationships with Life Science companies, helping them to address information, validation or control gaps in their unique processes. Besides the knowledge base that has grown from these relationships, CIMINFO has developed three independent but closely aligned software applications that can be applied to these ongoing business and regulatory concerns.

They are:

  • XLRisk – Spreadsheet Inventory and Risk Assessment
  • XLValidator – Streamline Validation Process
  • eInfotree Excel Module – Control Spreadsheets in a 21 CFR Part 11 Environment

In Summary

Throughout the Life Science industry, as in other business verticals, there is pressure to cut costs and increase production while keeping quality and safety high. The use of smart software toolsets can aid in your 21 CFR Part 11 compliance initiatives, add quality enhancements to a number of business processes and provide peace of mind to management that all necessary aspects of spreadsheet/database management are being met. Furthermore, once this sound approach to spreadsheet management is understood and implemented, every part of the process, whether information gathering, validation or production use, is enhanced and moves along faster, and the output is far more secure. A Life Cycle Management approach leads to repeatable, sustainable processes that aid business requirements and all quality and compliance activities.