Wednesday, December 23, 2015

Comparing DNA - What's the Process?

     Actually, for those who don't know the process, it's quite simple.  DNA sequences for many species are publicly available through the National Institutes of Health (NIH), by way of their affiliate, the National Center for Biotechnology Information (NCBI).  The NCBI maintains an ever-growing genomic database representing the vast diversity of life forms on earth. You can simply select a species and then download its genome.
     Any given genome is only an individual genetic representation of that species.  A collection of ten genomes of Pan troglodytes (the common chimpanzee), for example, will show some level of variation.  Therefore, mapping ten chimp genomes, and taking into account all the genetic variation of those ten individual chimps, provides a better comparison between species than simply having access to one sample. There is always some measure of genetic overlap between closely related species.
    This is the biological complexity of which we are a part.  Life is on a genetic continuum.  And genetics largely expresses itself in physiology.  Chimps are physiologically very similar to humans because our DNA sequences are very closely matched to theirs.
     Accurate methods for comparing the similarity of chimp and human DNA will account for the variation within each species.  Fortunately, through www.phylotree.org, we have a publicly available database of 20,600 human mtDNA genomes from which to draw information about this variation on the human side of our puzzle.  We have no such luck on the chimp side.  There are only an estimated 170,000 to 300,000 chimps left in the wild, reduced from a population of over 2 million that existed in 1995.
    My study includes three diverse samples: one common chimp (Pan troglodytes) from Gabon, one western chimp (Pan troglodytes verus) from Senegal, and a bonobo (Pan paniscus) from Congo.  The bonobo, also known as the pygmy chimpanzee, was first sequenced in 2012.  My study lacks variation on the chimp side.  My results show 97.02% similarity across 1,008 base pairs.  We are at least that similar.  If I had more chimp samples to work with, the potential overlap could be slightly higher.
    Genomes are stored in a format known as FASTA, basically a list of the nucleotides A, G, C, T, in the order that they appear in the DNA sequence.  Mitochondrial DNA is a circular molecule, on average about 16,569 nucleotides long in a human, and (on average) about a dozen fewer in a chimpanzee.  The FASTA data can be downloaded into a spreadsheet and then compared base by base.  Being circular, mtDNA has no naturally defined beginning or end position.  The NCBI data begins with the first mtDNA gene, what geneticists define as the "coding region." For family history DNA researchers, this is position 577. The NCBI data then wraps around so that the last nucleotide is position 576.  I follow the numbering system established by Bryan Sykes at Oxford in the 1990s. If you have your mitochondrial DNA mapped through a testing service such as FamilyTreeDNA, the results will follow the same numbering system as my report.
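    For readers who would rather script this step than use a spreadsheet, here is a minimal Python sketch of reading a single-record FASTA file and re-indexing a circular sequence so that it begins at a chosen position.  The function names (parse_fasta, rotate_circular) and the toy record are hypothetical, made up for illustration; this is not the procedure I used in my study, and the start position you pass in depends on which numbering system your data follows.

```python
def parse_fasta(text):
    """Return (header, sequence) from a single-record FASTA string."""
    header = ""
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]  # description line, without the ">"
        else:
            chunks.append(line.upper())  # sequence lines are simply concatenated
    return header, "".join(chunks)


def rotate_circular(sequence, start_position):
    """Re-index a circular sequence so it begins at the 1-based start_position.

    Bases before start_position wrap around to the end, which is what
    "no natural beginning or end" means for a circular molecule.
    """
    offset = (start_position - 1) % len(sequence)
    return sequence[offset:] + sequence[:offset]


# Toy record (hypothetical, far shorter than a real mtDNA genome).
toy_record = ">toy_sample hypothetical record\nACGT\nTGCA\n"
header, seq = parse_fasta(toy_record)
print(header)                   # toy_sample hypothetical record
print(seq)                      # ACGTTGCA
print(rotate_circular(seq, 3))  # GTTGCAAC
```

    Rotating two records to the same starting position is only a first step; real genomes of different lengths also contain insertions and deletions, which a simple rotation does not handle.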
    I first align the sequences so they start from the same position, and then I compare them base by base.  When I encounter a distinction between any of the samples in the study, I flag it in yellow and provide a notation.  In my current study, found here, the first 28 positions are exactly the same. A distinction is found at position 29, as two of the chimp genomes exhibit G (guanine), while all the human samples (including myself and two other anonymous individuals) exhibit C (cytosine).  In addition, the Neanderthal and Denisovan samples in the study exhibit C at position 29.  At first glance this looks like a marker distinct between chimps and humans.  However, when the bonobo sample is assessed, we find the bonobo has a C at position 29, the same as the human samples and opposed to the G shared between the western chimps and common (Eastern) chimps.  Therefore position 29 is not a marker distinct between chimp and human populations.
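    The same base-by-base logic can be sketched in a few lines of Python.  The toy sequences below are invented: they are three bases long, and positions 2 and 3 merely imitate the patterns I describe at positions 29 and 40.  The helper names (variable_positions, is_group_marker) are hypothetical, and the sketch assumes the sequences are already aligned and of equal length.

```python
def variable_positions(samples):
    """Return the 1-based positions where the aligned samples do not all agree."""
    sequences = list(samples.values())
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to the same length")
    return [i + 1 for i in range(length)
            if len({s[i] for s in sequences}) > 1]


def is_group_marker(samples, group_a, group_b, position):
    """True if no base seen in group_a at this position also appears in group_b."""
    bases_a = {samples[name][position - 1] for name in group_a}
    bases_b = {samples[name][position - 1] for name in group_b}
    return bases_a.isdisjoint(bases_b)


# Invented toy data, not real mtDNA.
samples = {
    "human_1":       "ACT",
    "human_2":       "ACT",
    "common_chimp":  "AGC",
    "western_chimp": "AGC",
    "bonobo":        "ACG",
}
humans = ["human_1", "human_2"]
chimps = ["common_chimp", "western_chimp", "bonobo"]

print(variable_positions(samples))                  # [2, 3]
print(is_group_marker(samples, humans, chimps, 2))  # False: the bonobo shares C with humans
print(is_group_marker(samples, humans, chimps, 3))  # True: no chimp sample has the human T
```

    A position counts as a distinct marker here only when the two groups share no base at all, which is why the toy position 2 is flagged as variable but not as a marker.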
    Based on this marker alone we would conclude that bonobos are more closely related to humans than western or common chimps are.  As we move down the sequence, we will find this confirmed, in part.  As we move to position 40, we find the chimp samples again don't match the human ones.  The bonobos have a G where the humans have a T, and the western and common chimps have a C.  Here is our first distinct marker between chimps and humans.  The T to C mutation (or vice versa) is called a transition, and is about three times more likely to occur than a transversion (T to G), simply based on the chemistry of the molecules as they replicate.
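    The transition/transversion distinction comes down to whether the two bases belong to the same chemical class: A and G are purines, C and T are pyrimidines.  A swap within a class is a transition, and a swap across classes is a transversion.  Here is a small Python sketch of that rule; the function name substitution_type is my own label for illustration.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_type(base_a, base_b):
    """Classify a pair of bases as 'match', 'transition', or 'transversion'."""
    a, b = base_a.upper(), base_b.upper()
    valid = PURINES | PYRIMIDINES
    if a not in valid or b not in valid:
        raise ValueError("expected one of A, C, G, T")
    if a == b:
        return "match"
    # Same chemical class (purine-purine or pyrimidine-pyrimidine) is a transition.
    same_class = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


print(substitution_type("T", "C"))  # transition   (human T vs western/common chimp C)
print(substitution_type("T", "G"))  # transversion (human T vs bonobo G)
```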
    I continue this comparison through the first 1,008 bases and find 30 markers distinct in chimp populations compared to human populations. I'll point out more of these distinctions in the next few days.
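    As a quick arithmetic check, 30 differing positions out of 1,008 compared bases works out to the 97.02% similarity I reported above.  The sketch below assumes that similarity is simply the share of compared positions that are not distinct markers; the function name percent_identity is hypothetical.

```python
def percent_identity(compared_bases, differing_positions):
    """Percentage of compared positions that are the same in both groups."""
    return 100.0 * (compared_bases - differing_positions) / compared_bases


# 1,008 bases compared, 30 distinct markers found.
print(round(percent_identity(1008, 30), 2))  # 97.02
```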
