Pistoia Alliance Launches New DataFAIRy: Bioassay Project to Make More Data Machine-Readable and to Drive AI Adoption

The Pistoia Alliance, a global, not-for-profit alliance that advocates for greater collaboration in life sciences R&D, launches the second phase of its DataFAIRy: Bioassay project, which aims to convert bioassay data into machine-readable formats that adhere to the FAIR guiding principles of Findable, Accessible, Interoperable and Reusable. The current pilot phase has been sponsored by AstraZeneca, Bristol Myers Squibb, Novartis and Roche, and has successfully annotated 496 assays using a Natural Language Processing model that has been custom-built to recognize life sciences language. This second phase aims to scale the annotation process by 10- to 100-fold, and eventually to promote the data model as the industry standard.

Biological assays are analytical methods that are crucial for testing compounds being considered for new drugs, as well as for monitoring environmental toxicity. More than 1.3 million biological assay protocols currently exist only in plain-text formats, such as published papers or vendor notes. Selecting and validating assays currently requires a labor-intensive search, taking scientists up to 12 weeks per assay. Adhering to the DataFAIRy model will reduce the time scientists spend searching and planning assay experiments. In addition, assay metadata is a popular data type for post-hoc data mining, but most of these published data and metadata are not in a form suitable for automated mining. They are partially annotated in public data banks, but the volume, depth and quality of these annotations are inadequate for addressing many current and future business questions. Gartner predicts that 85 percent of AI projects will deliver erroneous outcomes due to data issues, such as information not being machine-readable. Projects such as DataFAIRy are therefore crucial to the successful adoption of AI in the life sciences.

“For the duration of my career, which has spanned the last thirty years, unstructured data has been a major problem for scientists. As the volume, variety and complexity of assay information continue to increase, organizations must manage their data more effectively, so that researchers can make the most out of their time and organizations can fully realize the benefits of digital transformation,” explains Dr Vladimir Makarov, Project Manager of The Pistoia Alliance AI and ML Centre of Excellence. “The DataFAIRy model we have developed will not only reduce the time bench scientists spend searching for assay information. It may also allow them to skip experiments known to have failed in the past. In turn, this will decrease the costs for companies and accelerate vital research.”

Although digitalization has made companies more aware of the importance of robust data management, the lack of industry standards is still a barrier to successful annotation and management of protocols, including assays. Adopting the FAIR principles is the first step towards enabling greater data sharing between organizations and helping scientists cope with the growing volume and complexity of data generated. Additionally, current data models are not built to recognize scientific language, so a new model must be created to automate the annotation of these valuable resources. The second stage of the DataFAIRy project will further develop a model of this kind through community-wide collaboration.

“AI and Natural Language Processing tools need to be built with scientific terminology in mind in order to be successful,” continues Dr Makarov. “The DataFAIRy model we have built will automate the annotation process so that assays are searchable and reusable, speeding up valuable research. We hope that this model will become the community standard for the publication of new assays and for the management of existing assays across vendors, regulatory agencies, and publishers, in addition to pharma and biotech.”

If you are interested in supporting the next stage of the DataFAIRy project, please contact projectteamdatafairy@pistoiaalliance.org. For assistance adopting FAIR in your organization, you can download the Pistoia Alliance’s free FAIR Toolkit, which contains method tools, training and use cases, allowing organizations to learn from industry successes.

About The Pistoia Alliance:

The Pistoia Alliance is a global, not-for-profit members’ organization made up of life science companies, technology and service providers, publishers, and academic groups working to lower barriers to innovation in life science and healthcare R&D. It was conceived in 2007 and incorporated in 2009 by representatives of AstraZeneca, GSK, Novartis and Pfizer who met at a conference in Pistoia, Italy. Its projects transform R&D through pre-competitive collaboration. It overcomes common R&D obstacles by identifying the root causes, developing standards and best practices, sharing pre-competitive data and knowledge, and implementing technology pilots. There are currently over 150 member companies; members collaborate on projects that generate significant value for the worldwide life sciences R&D community, using The Pistoia Alliance’s proven framework for open innovation.