Accelerating Synthetic Biology Discovery through Integrated Curation

Genetic Design Automation

Systems designed with synthetic biology have applications in areas including the environment, manufacturing, sensor development, defense, and medicine. However, progress in synthetic biology is currently impeded by the time required for literature studies and for replicating existing but poorly documented work. The Synthetic Biology Knowledge System (SBKS) project endeavored to address these challenges by integrating data from parts repositories with information extracted from the literature into a unified knowledge system. This form of post-hoc curation, however, requires curators separate from the original authors to extract knowledge from manuscript and supplemental text files after publication. To handle large amounts of data, machines are used to scour free text, recognize key words, and infer their meaning from context, which tests the limits of named entity recognition and entity classification. It also leaves ambiguous entities that only the original authors could disambiguate; for example, "yeast" may refer to many different strains. The SBKS project also extracted sequences provided as supplemental information in publications. Even when provided, however, these sequences are typically poorly annotated, incomplete, and in non-machine-readable formats. Taken together, the SBKS project demonstrated that reconstructing this important design information through post-hoc curation is extremely noisy and error-prone.

This project is funded by National Science Foundation Grant No. 2231864.

Publications