Patent number: 8751166

Parallelization of surprisal data reduction and genome construction from genetic data for transmission, storage, and analysis

Original Assignee: International Business Machines Corporation

Field of technology: Biotech, Computer Software

Patent granted on: Tue, 10 Jun 2014


Abstract

A method, computer product, and computer system for reducing the amount of data representing a genetic sequence of an organism. A computer divides a reference genome and the genetic sequence of the organism into parts and assigns each part to one of a plurality of computer processing elements. Within each computer processing element, nucleotides of the organism's genetic sequence are compared with nucleotides from a part of the reference genome to find differences, that is, positions where the organism's nucleotides differ from those of the reference genome. These differences are stored as surprisal data in a repository. The parts of the surprisal data in the repository are then combined to form a complete set of surprisal data representing the differences between the genetic sequence of the organism and the reference genome, and this complete set is also stored in the repository.
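The divide, compare, store, and combine steps in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation. It assumes that the two sequences are already aligned and of equal length, so only substitutions are detected (not insertions or deletions). It also assumes that surprisal data can be modeled as (position, organism nucleotide) pairs, that the repository is a plain dictionary, and that a thread pool stands in for the plurality of processing elements. The names `split_into_parts`, `surprisal_for_part`, and `parallel_surprisal` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def split_into_parts(seq, num_parts):
    """Divide seq into at most num_parts contiguous (offset, chunk) parts."""
    size = max(1, -(-len(seq) // num_parts))  # ceiling division, never 0
    return [(start, seq[start:start + size]) for start in range(0, len(seq), size)]

def surprisal_for_part(task):
    """Work done on one processing element: compare one part of the organism's
    sequence with the matching part of the reference genome and return
    (genome_position, organism_nucleotide) for every mismatch."""
    offset, ref_part, org_part = task
    return [(offset + i, o) for i, (r, o) in enumerate(zip(ref_part, org_part)) if r != o]

def parallel_surprisal(reference, organism, repository, num_workers=4):
    """Reduce organism to surprisal data relative to reference (hypothetical API)."""
    if len(reference) != len(organism):
        raise ValueError("this sketch assumes pre-aligned sequences of equal length")
    # Divide both sequences at the same offsets so the parts line up pairwise.
    tasks = [
        (offset, ref_part, org_part)
        for (offset, ref_part), (_, org_part) in zip(
            split_into_parts(reference, num_workers),
            split_into_parts(organism, num_workers),
        )
    ]
    # Assign parts to workers (threads stand in for processing elements here)
    # and store each part's surprisal data in the repository, keyed by part index.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        repository["parts"] = dict(enumerate(pool.map(surprisal_for_part, tasks)))
    # Combine the parts in genome order into the complete set and store it too.
    complete = [d for idx in sorted(repository["parts"]) for d in repository["parts"][idx]]
    repository["complete"] = complete
    return complete

repo = {}
print(parallel_surprisal("ACGTACGTAC", "ACGAACGTTC", repo))  # [(3, 'A'), (8, 'T')]
```

Because each part is compared independently, the parts can finish in any order, and keying the stored results by part index lets the combine step restore genome order. Threads keep the sketch simple, but in CPython they share the global interpreter lock, so real CPU parallelism for this comparison would call for `ProcessPoolExecutor` or separate machines.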