Patent number: 8855938

Minimization of surprisal data through application of hierarchy of reference genomes

Original Assignee: International Business Machines Corporation

Field of technology: Biotech, Computer Software

Patent granted on: Tue, 07 Oct 2014

Abstract

A method, computer product, and computer system of minimizing surprisal data comprising: at a source, reading and identifying characteristics of a genetic sequence of an organism; receiving an input of rank of at least two identified characteristics of the genetic sequence of the organism; generating a hierarchy of ranked, identified characteristics based on the rank of the at least two identified characteristics of the genetic sequence of the organism; comparing the hierarchy of ranked, identified characteristics to a repository of reference genomes; and if at least one reference genome from the repository matches the hierarchy of ranked, identified characteristics, comparing nucleotides of the genetic sequence of the organism to nucleotides from the at least one matched reference genome, to obtain differences and create surprisal data.
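
The steps in the abstract can be illustrated with a minimal sketch. It is not the patented implementation: the names `ReferenceGenome`, `build_hierarchy`, `match_reference`, and `surprisal_data` are hypothetical, the sequences are toy strings assumed to be pre-aligned and of equal length, and the matching rule (try the full hierarchy first, then drop the lowest-ranked characteristics until some reference matches) is an assumption about how a ranked hierarchy might be compared against a repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReferenceGenome:
    # Hypothetical record for one entry in the repository of reference genomes.
    name: str
    characteristics: frozenset
    sequence: str


def build_hierarchy(ranks):
    """Order characteristics by rank, most important (lowest number) first."""
    return [name for name, _ in sorted(ranks.items(), key=lambda kv: kv[1])]


def match_reference(hierarchy, repository):
    """Return the first reference covering the longest top-ranked prefix.

    Assumption: if no reference has every characteristic, the lowest-ranked
    ones are dropped one at a time. Returns None when nothing matches.
    """
    for depth in range(len(hierarchy), 0, -1):
        wanted = set(hierarchy[:depth])
        for ref in repository:
            if wanted <= ref.characteristics:
                return ref
    return None


def surprisal_data(sequence, reference_sequence):
    """List (position, nucleotide) pairs where the sequence differs.

    Assumes both strings are already aligned and have equal length.
    """
    return [
        (i, s)
        for i, (s, r) in enumerate(zip(sequence, reference_sequence))
        if s != r
    ]


# Toy repository and sample; all values are invented for illustration.
repository = [
    ReferenceGenome("generic", frozenset({"human"}), "ACGTACGTAC"),
    ReferenceGenome(
        "specific", frozenset({"human", "female", "asthma"}), "ACGTTCGTAC"
    ),
]
ranks = {"human": 1, "female": 2, "asthma": 3}
sample = "ACGTTCGAAC"

hierarchy = build_hierarchy(ranks)
best = match_reference(hierarchy, repository)
diffs = surprisal_data(sample, best.sequence)
print(best.name, diffs)
```

In this toy case the more specific reference yields one difference, whereas comparing the same sample to the generic reference would yield two, which is the sense in which a better-matched reference genome reduces the surprisal data.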