During the Data Science Plenary Panel at Bio-IT World 2018, Tanya Cashorali of TCB Analytics highlighted some innovative use cases for data science in the pharmaceutical and biotech industries. These use cases span drug manufacturing, clinical enrollment, and bioinformatics.
As an example, here’s an in-depth look at how we helped Biogen, a Cambridge-based biotech company, use data science to implement a genealogy visualization and inspection solution for drug manufacturing quality control.
Background
Biogen is a pioneer in biotechnology, with a portfolio of pharmaceuticals for neurological and neurodegenerative conditions. Led by world-class research and development, Biogen uses novel science and leading-edge technologies to create, commercialize, and manufacture transformative therapies. In keeping with this commitment to incorporating innovative technologies into their operations, Biogen identified an area ripe for optimization: raw materials genealogy in drug manufacturing.
Drug manufacturing is a costly, time-consuming, and intricate process. Final drug substances are produced from many raw materials combined in specific sequences. Even in the strictest environments complying with Good Manufacturing Practice (GMP) standards, major deviations can occur. When a deviation or discrepancy is identified, it is critical to halt manufacturing of all batches that include the problematic substance. However, identifying the incident’s root cause is a complex endeavor, because manufacturing process data is typically not structured or stored in a way that supports this type of analysis.
In addition to the considerable cost of batches lost during an investigation, compliance is a major concern for the pharmaceutical industry. Over the past few years, failure to thoroughly review discrepancies and deviations has consistently ranked among the top citations in FDA inspectional observations. The problem is widespread, with over 100 citations issued for this observation each year (FDA, Summary of Inspectional Observations by Fiscal Year).
While Biogen had a process in place to identify the root cause of deviations, the effort was manual and time-consuming. Manufacturing information lived in siloed electronic systems, some of it was incomplete, and the systems were too rigid to allow flexible genealogy mapping. Information submitted by raw material vendors was typically paper-based, making data extraction tedious. Beyond these data constraints, the analysis itself was also manual, with a single user running potentially hundreds of SQL queries to identify the root cause of a deviation.
Solution
As an industry leader in pioneering technology solutions, Biogen proactively addressed the limitations in their process for identifying deviations in raw material manufacturing. Partnering with TCB Analytics, they implemented a genealogy visualization and inspection solution for drug manufacturing quality control. Through centralized data mapping that links inputs to outputs end to end across production, scientists can track the usage of any material throughout the product lifecycle.
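Conceptually, linking inputs to outputs turns the manufacturing record into a directed graph of lots, and tracking a material’s usage becomes a graph traversal: follow edges forward to find every downstream lot, or backward to find every upstream source. The sketch below is not the Biogen/TCB Analytics implementation (which was built in R); it is a minimal Python illustration that assumes genealogy is stored as hypothetical `(input_lot, output_lot)` pairs. The lot IDs and the names `build_index` and `trace` are made up for this example.

```python
from collections import defaultdict, deque

# Hypothetical genealogy records: (input_lot, output_lot) pairs.
# Lot IDs are illustrative only, not a real data model.
EDGES = [
    ("RM-001", "SOL-10"),  # raw material lot -> solution
    ("RM-002", "SOL-10"),
    ("RM-002", "SOL-11"),
    ("SOL-10", "INT-20"),  # solution -> intermediate
    ("SOL-11", "INT-21"),
    ("INT-20", "DS-30"),   # intermediate -> final drug substance
    ("INT-21", "DS-30"),
    ("INT-21", "DS-31"),
]


def build_index(edges):
    """Index edges in both directions so each hop is a direct lookup."""
    downstream = defaultdict(set)
    upstream = defaultdict(set)
    for src, dst in edges:
        downstream[src].add(dst)
        upstream[dst].add(src)
    return downstream, upstream


def trace(start, index):
    """Breadth-first search returning every lot reachable from `start`.

    Pass the downstream index to find where a lot was used, or the
    upstream index to find which lots went into it.
    """
    seen = set()
    queue = deque([start])
    while queue:
        lot = queue.popleft()
        for nxt in index.get(lot, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


if __name__ == "__main__":
    downstream, upstream = build_index(EDGES)
    # Every lot that a suspect raw material lot ended up in.
    print(sorted(trace("RM-002", downstream)))
    # → ['DS-30', 'DS-31', 'INT-20', 'INT-21', 'SOL-10', 'SOL-11']
    # Every lot that fed into a given drug substance lot.
    print(sorted(trace("DS-31", upstream)))
    # → ['INT-21', 'RM-002', 'SOL-11']
```

Indexing both directions up front keeps "where used" and "where from" queries symmetric; in practice the edge list would come from the consolidated database rather than a literal.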
This solution enables detailed monitoring of raw material inputs, intermediates, outputs, and final drug substances. The ability to quickly query for a specific drug recipe is critical to understanding trends and to improving at-line response time to potential negative deviations. With an intuitive node-based, drill-down interface, scientists and analysts can quickly identify the root lot associated with a deviation, the raw materials used in the solution, and any downstream substances or outputs that include the root lot. Hovering over a node displays additional details, including mixed lot proportionality, to assist with the investigation.
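The article does not say how mixed lot proportionality is calculated. One plausible approach, assuming each mixing step records what fraction of the output lot came from each input lot, is to multiply the fractions along every path from the root lot to the lot of interest and sum over all paths. The minimal Python sketch below illustrates that idea; the `MIXES` records, the fractions, and the `contribution` function are hypothetical.

```python
from functools import lru_cache

# Hypothetical mixing records: (input_lot, output_lot, fraction), where
# fraction is the assumed share of the output lot drawn from the input lot.
# The fractions feeding each output lot are assumed to sum to 1.0.
MIXES = [
    ("RM-A", "SOL-1", 0.25),
    ("RM-B", "SOL-1", 0.75),
    ("SOL-1", "INT-1", 0.5),
    ("RM-A", "INT-1", 0.5),   # RM-A also enters the intermediate directly
    ("INT-1", "DS-1", 1.0),
]

# Map each output lot to its list of (input_lot, fraction) pairs.
INPUTS = {}
for src, dst, frac in MIXES:
    INPUTS.setdefault(dst, []).append((src, frac))


@lru_cache(maxsize=None)
def contribution(root, lot):
    """Fraction of `lot` that ultimately derives from `root`.

    Recursively sums, over every input of `lot`, the mixing fraction times
    that input's own share of `root`. Equivalent to summing the products of
    fractions along every path. Assumes the genealogy graph is acyclic.
    """
    if lot == root:
        return 1.0
    return sum((frac * contribution(root, src) for src, frac in INPUTS.get(lot, [])), 0.0)


if __name__ == "__main__":
    print(contribution("RM-A", "DS-1"))  # → 0.625 (0.5 * 0.25 + 0.5 * 1.0)
    print(contribution("RM-B", "DS-1"))  # → 0.375 (0.5 * 0.75)
```

Caching with `lru_cache` means shared intermediates are evaluated once per root lot, which matters when a single raw material lot fans out across a deep genealogy.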
Data from raw material vendors is now stored automatically in SQL Server, streamlining its import. Data that was previously kept in separate silos, such as raw material vendor testing, solution testing, and drug substance release testing, is now consolidated for analysis. The visualization solution pulls raw material data from that repository to give a complete, automated picture of the manufacturing pipeline.
Beyond supporting the investigation process with visualizations, the application will also be used to quickly export genealogical data for building predictive models. These models can predict whether specific raw materials are likely to negatively affect drug substance product quality attributes. When they are, the raw materials are flagged and put on hold before they are used in an actual manufacturing process. This foresight offers a chance to save entire batches that might otherwise be rejected by QC, either by tweaking process parameters to improve predicted product quality or by switching to new raw material lots.
This framework was developed using the following technologies:
- R: Open source statistical programming language, used to query data on raw materials, manufacturing pipelines, and transactions
- R Shiny: Interactive web application
- d3.js: Custom charting and visualizations that were embedded in R Shiny applications
Benefits
Biogen realized several time- and cost-saving benefits after implementing this solution for visualizing raw material genealogy.
- Time Savings: Whereas investigations previously could run for months and required a large team of investigators, one user can now identify a problematic substance within a few days.
- Investigator Expertise: Mining this genealogy data is typically a task reserved for database users with advanced SQL skills. This solution eliminates the reliance on IT, enabling scientists, who have deeper knowledge of the manufacturing pipeline, to investigate the raw material process quickly.
- Actionable Intelligence: Ultimately, this quality control application will be used to provide scientists with deeper insights into the manufacturing process, and the ability to track trends and report on large-scale production.