SeqImprove: Machine-Learning-Assisted Curation of Genetic Circuit Sequence Information

Abstract

The progress and utility of synthetic biology are currently hindered by the lengthy process of studying the literature and replicating poorly documented work. Reconstructing crucial design information through post hoc curation is noisy and error-prone, so author participation during the curation process is essential. To encourage that participation without overburdening authors, an ML-assisted curation tool called SeqImprove has been developed. Using named entity recognition, named entity normalization, and sequence matching, SeqImprove creates machine-accessible sequence data and metadata annotations, which authors can then review and edit before submitting a final sequence file. SeqImprove makes it easier for authors to submit sequence data that is FAIR (findable, accessible, interoperable, and reusable).
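
As a rough illustration of the sequence-matching step mentioned above, the following minimal sketch finds exact occurrences of known part sequences in a design, on either strand. It is not SeqImprove's implementation, which is not described here: the function names, the part names, and the toy sequences are all hypothetical, and a real tool would likely need a curated part library and tolerance for mismatches.

```python
# Hypothetical sketch of exact sequence matching; not SeqImprove's actual code.

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C", "N": "N"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def find_part_matches(target, parts):
    """Find exact matches of each part in the target sequence.

    Returns a list of (part_name, start, end, strand) tuples, using
    0-based start and exclusive end, sorted by start position.
    """
    target = target.upper()
    hits = []
    for name, part_seq in parts.items():
        part_seq = part_seq.upper()
        if not part_seq:
            continue  # skip empty entries rather than matching everywhere
        candidates = [("+", part_seq)]
        rc = reverse_complement(part_seq)
        if rc != part_seq:  # avoid reporting palindromic parts twice
            candidates.append(("-", rc))
        for strand, query in candidates:
            start = target.find(query)
            while start != -1:
                hits.append((name, start, start + len(query), strand))
                start = target.find(query, start + 1)
    return sorted(hits, key=lambda hit: (hit[1], hit[0]))


# Toy data, invented for illustration only.
toy_parts = {"toy_promoter": "TTGACA", "toy_rbs": "AGGAGG"}
toy_design = "CCTTGACAGGCCTCCTAA"
matches = find_part_matches(toy_design, toy_parts)
```

Each hit could then be offered to the author as a suggested annotation to accept, edit, or reject, which is the review step the abstract describes.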

Publication
ACS Synthetic Biology
Zachary Sents
Coursewand, Chief Technology Officer
Duncan Britt
Undergraduate Researcher
William Mo
Graduate Researcher, Ph.D.
Ryan Greer
Undergraduate Researcher