Patent number: 8812243

Transmission and compression of genetic data

Original Assignee: International Business Machines Corporation

Field of technology: Biotech, Computer Software

Patent granted on: Tue, 19 Aug 2014

Abstract

A method, computer product and computer system for transmitting a compressed genome of an organism: a computer at a source reading an uncompressed genetic sequence of the organism and a reference genome from a repository; the computer comparing nucleotides of the genetic sequence of the organism to nucleotides of the reference genome, to find differences where nucleotides of the genetic sequence of the organism differ from the nucleotides of the reference genome; the computer using the differences to create surprisal data, the surprisal data comprising a starting location of the differences within the reference genome and the nucleotides from the genetic sequence of the organism that differ from the nucleotides of the reference genome; and the computer transmitting, to a destination, a compressed genome comprising the surprisal data and an indication of the reference genome, while discarding sequences of nucleotides that are the same in the genetic sequence of the organism and the reference genome.
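
The comparison and transmission steps described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation. It assumes the organism's sequence and the reference genome are already aligned, have equal length, and differ only by substitutions (no insertions or deletions). The names `make_surprisal`, `compress`, `decompress` and `reference_id` are hypothetical and chosen for illustration only.

```python
from typing import Dict, List, Tuple

# One surprisal entry: (starting location in the reference, differing nucleotides)
Surprisal = Tuple[int, str]


def make_surprisal(sequence: str, reference: str) -> List[Surprisal]:
    """Collect runs of nucleotides where the sequence differs from the reference."""
    if len(sequence) != len(reference):
        # Assumption of this sketch: pre-aligned, equal-length inputs.
        raise ValueError("sequence and reference must have the same length")
    runs: List[Surprisal] = []
    start = None  # start index of the current run of differences, if any
    for i, (s, r) in enumerate(zip(sequence, reference)):
        if s != r:
            if start is None:
                start = i
        elif start is not None:
            runs.append((start, sequence[start:i]))
            start = None
    if start is not None:
        runs.append((start, sequence[start:]))
    return runs


def compress(sequence: str, reference: str, reference_id: str) -> Dict:
    """Keep only the surprisal data and an indication of the reference genome."""
    return {"reference_id": reference_id,
            "surprisal": make_surprisal(sequence, reference)}


def decompress(compressed: Dict, reference: str) -> str:
    """Rebuild the full sequence at the destination from the reference genome."""
    out = list(reference)
    for start, nucleotides in compressed["surprisal"]:
        out[start:start + len(nucleotides)] = nucleotides
    return "".join(out)


reference = "ACGTACGTAC"
sequence = "ACGAACGTTT"
packet = compress(sequence, reference, reference_id="example-reference")
restored = decompress(packet, reference)
```

In this sketch the destination needs its own copy of the reference genome identified by `reference_id`. Only the differing runs and that identifier travel over the wire, and the matching nucleotides are discarded at the source.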