DNA can store petabytes of information per gram and can remain intact for tens of thousands of years, which makes it an appealing medium for long-term archival storage. However, DNA synthesis, sequencing, and replication are all error-prone, and those errors limit its potential as a storage medium. They can be controlled with the tools of information theory by treating DNA storage as a coding problem for a noisy channel. Several coding schemes for DNA storage have been proposed that address the interrelated issues of error avoidance, error correction, and redundancy, but no current scheme addresses all of them at once.

Researchers at UC Berkeley have combined some of these ideas, and introduced new ones, using a modular strategy for code design. With this method, codes can be assembled to meet requirements including error avoidance, error correction (resistance to corruption of the information by substitutions, insertions, duplications, or deletions introduced during sequencing or replication of the DNA), and demarcation of metadata. The DNA generated by the codes is free of short local repeats and other (foldback) structure.

The codes are flexible because they arise by systematic combination of state machines, each of which formally represents a particular transformation of the input sequence. For example, one state machine might introduce a "watermark" signal that helps protect against insertion/deletion errors; another could convert the binary sequence into a ternary (or mixed-radix) sequence; another could convert the ternary or mixed-radix sequence into a non-repeating DNA sequence; and yet another could model the errors introduced during sequencing.
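To make the idea of chaining state machines concrete, here is a minimal Python sketch of two of the stages described above: binary to ternary, then ternary to a DNA sequence with no two identical adjacent bases. It is an illustration under stated assumptions, not the researchers' actual construction. The function names, the 3-bits-to-2-trits block mapping, and the virtual start base "A" are all hypothetical choices. The sketch only rules out adjacent identical bases; it does not avoid longer repeats or foldback structure, and it includes no watermark or error model.

```python
NUCLEOTIDES = "ACGT"


def bits_to_trits(bits):
    """Stage 1 (assumed mapping): each 3-bit block becomes 2 ternary digits."""
    if len(bits) % 3 != 0:
        raise ValueError("bit length must be a multiple of 3")
    trits = []
    for i in range(0, len(bits), 3):
        value = int(bits[i:i + 3], 2)      # 0..7
        trits.extend(divmod(value, 3))     # (high trit, low trit)
    return trits


def trits_to_bits(trits):
    """Inverse of bits_to_trits."""
    if len(trits) % 2 != 0:
        raise ValueError("trit count must be even")
    blocks = []
    for i in range(0, len(trits), 2):
        value = trits[i] * 3 + trits[i + 1]
        if value > 7:
            raise ValueError("trit pair (2, 2) is not a valid codeword")
        blocks.append(format(value, "03b"))
    return "".join(blocks)


def trits_to_dna(trits, start="A"):
    """Stage 2: the state is the previous base; each trit selects one of
    the three bases that differ from it, so no base repeats immediately."""
    prev = start                           # virtual start state, not emitted
    out = []
    for t in trits:
        choices = [n for n in NUCLEOTIDES if n != prev]
        prev = choices[t]
        out.append(prev)
    return "".join(out)


def dna_to_trits(dna, start="A"):
    """Inverse of trits_to_dna, tracking the same state."""
    prev = start
    trits = []
    for base in dna:
        choices = [n for n in NUCLEOTIDES if n != prev]
        trits.append(choices.index(base))  # ValueError if a base repeats
        prev = base
    return trits


def encode(bits):
    """Compose the two stages: bits -> trits -> DNA."""
    return trits_to_dna(bits_to_trits(bits))


def decode(dna):
    """Run the inverse stages in reverse order: DNA -> trits -> bits."""
    return trits_to_bits(dna_to_trits(dna))


if __name__ == "__main__":
    message = "011010111000101"
    strand = encode(message)
    print(strand, decode(strand) == message)
```

Each stage is a small, independently testable transducer, and the full code is just their composition. Under the same assumptions, further stages (for instance a watermark or an error model) would be added as additional functions in the chain rather than by rewriting the existing ones.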