Thursday, December 24, 2015

At locus #930 I have the chimp marker instead of the human one!

     Yes, as strange as it sounds, I encountered something a bit crazy in my own mitochondrial DNA at location #930.  I've been comparing my genome with other humans to determine variation, but also with chimps and bonobos found in the databases at NCBI, and the Neanderthal and Denisovan samples extracted under the direction of Svante Pääbo at the Max Planck Institute for Evolutionary Anthropology in Leipzig, Germany.  Step by step, base by base, I'm comparing the data, and so far I have found 30 distinctive markers separating chimps and humans among the first 1,008 base pairs analyzed.

Jane Goodall spent her life exploring the psychology and sociology of chimpanzees
    But at marker #930 something strange occurs.  Humans have a G (guanine) in this position.  Chimps and bonobos have an A (adenine).  However, my own DNA has an A!  I have the chimp marker, while humans (or to be clear, most humans) have a distinctive human marker.  How could this be? 
    It so happens that my mitochondrial DNA signature belongs within a specific haplotype known as T2b.  The mutations in my DNA can be arranged chronologically and traced back to a founding population in Africa, ultimately connecting to every human on Earth--if you trace back far enough. The same is true for your own haplotype. My haplotype T2b descends from T2, which descends from T, which descends from JT, which descends from R, which descends from N, which descends from L3 all the way back to Africa. 
    So how did I end up with a supposedly distinctive chimp marker?  Well, apparently my value 930A was not original to my ancestry.  Haplotype T2b is defined by a mutation G930A (meaning the "G" mutated to an "A" in position 930).  It so happens that the "A" is the chimp value, seemingly by coincidence.  This doesn't mean I have more chimp ancestry than you.  It only means that I have a specific marker that mutated back to the value found in chimps today.  Back before chimps and humans existed, we shared a common ancestor.  I'm uncertain if that ancestor had a "G" or an "A" at position 930.  Therefore, I place G/A in red in that position on my report found here.  As I expand my study to include other diverse primates, I may be able to determine an original value.  After all, there are only two choices.  I have never found a "C" or "T" in this position. Perhaps the mutation occurred in the chimps from an original "G."  Perhaps the mutation occurred in humans from an original "A."  Perhaps this marker has switched back and forth in both populations in prehistory.
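    A label like G930A can be read mechanically: the ancestral base, the position, and the derived base.  Here is a minimal Python sketch of that reading.  It assumes a simple substitution written in exactly that three-part form, and the function names are my own, chosen only for illustration.

```python
import re

# Matches simple substitutions such as "G930A": ancestral base, position, derived base.
# Insertions, deletions and other notations are deliberately not handled in this sketch.
MUTATION = re.compile(r"^([ACGT])(\d+)([ACGT])$")

def parse_mutation(label):
    """Split a label like 'G930A' into (ancestral, position, derived)."""
    match = MUTATION.match(label)
    if match is None:
        raise ValueError(f"not a simple substitution: {label!r}")
    ancestral, position, derived = match.groups()
    return ancestral, int(position), derived

def apply_mutation(bases, label):
    """Return a copy of a {position: base} mapping with the mutation applied."""
    ancestral, position, derived = parse_mutation(label)
    if bases.get(position) != ancestral:
        raise ValueError(f"expected {ancestral} at {position}, found {bases.get(position)}")
    updated = dict(bases)
    updated[position] = derived
    return updated

before = {930: "G"}                      # the value most humans carry
after = apply_mutation(before, "G930A")  # the T2b-defining change
print(after[930])                        # A, the same base seen in the chimp samples
```

    The check inside apply_mutation is the useful part: a mutation label only makes sense if the starting sequence actually carries the ancestral base at that position.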
    I don't have many chimp samples to work with, but so far all chimps I've studied have an "A" at position 930.  On the human side I have 20,600 samples, and I found this "back mutation" to the chimp value in seven lineages, in haplotypes L1c6, L5a1, L2a1c1, M1a1b, M44a, D4h3a1a, and T2b. It includes less than 3% of the world's human population.  Haplotype T2b is the most populous of these seven lineages that share G930A.  It is concentrated in central Europe.
    I think I can confidently state that marker 930 has shifted from the "G" to the "A" throughout human history--at least in seven instances.  Compared to most mitochondrial DNA markers, this position mutates rapidly.  Part of the outcome of my study is to determine which markers mutate rapidly and which markers are especially stable.  In comparative genetics, a difference in a fast-mutating marker is not as significant as a difference in a slow-mutating marker.  I have access to 20,600 human genomes from which to determine which markers mutate more often than others.  That's quite a lot of data.  More data allows for better conclusions.  What was thought of as a distinctive chimp marker is actually part of the DNA that overlaps between humans and chimps.  There are other markers like this as well.  I note them all, base by base, in my report.


   

Wednesday, December 23, 2015

Comparing DNA - What's the Process?

     Actually, for those who don't know the process, it's quite simple.  DNA sequences for many species are publicly available through the National Institutes of Health (NIH), by way of their affiliate, the National Center for Biotechnology Information (NCBI).  The NCBI maintains an ever-growing genomic database representing the vast diversity of life forms on Earth. You can simply select a species and then download its genome.
     Any given genome is only an individual genetic representation of that species.  A collection of ten genomes of Pan troglodytes (the common chimpanzee), for example, will show some level of variation.  Therefore, mapping ten chimp genomes, and taking into account all the genetic variation of those ten individual chimps, provides a better comparison between species than simply having access to one sample. There is always some measure of genetic overlap between closely related species.
    This is the biological complexity of which we are a part.  Life is on a genetic continuum.  And genetics largely expresses itself in physiology.  The reason chimps are physiologically very similar to humans is based in the reality that our DNA sequences are very closely matched to theirs.
     Accurate methods that compare the similarity of chimp and human DNA will account for the variation within each species.  Fortunately, through www.phylotree.org, we have a publicly available database of 20,600 human mtDNA genomes from which to draw information concerning this variation on the human side of our puzzle.  We have no such luck on the chimp side.  There are only an estimated 170,000 to 300,000 chimps left in the wild, reduced from a population of over 2 million that existed in 1995.
    My study includes three diverse samples: one common chimp (Pan troglodytes) from Gabon, one western chimp (Pan troglodytes verus) from Senegal, and a bonobo (Pan paniscus) from Congo.  The bonobo, also known as the pygmy chimpanzee, was first sequenced in 2012.  My study lacks variation on the chimp side.  My results show 97.02% similarity across 1,008 base pairs.  We are at least that similar.  If I had more chimp samples to work with, the potential overlap could be slightly higher.
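    The 97.02% figure is simple arithmetic: the share of compared positions that were not flagged as distinct.  A small Python sketch, with a function name of my own choosing:

```python
def percent_similarity(compared, distinct):
    """Share of compared positions that are NOT flagged as distinct, in percent."""
    if compared <= 0:
        raise ValueError("compared must be positive")
    return 100.0 * (compared - distinct) / compared

# Figures from this study: 30 distinct markers in the first 1,008 positions.
print(round(percent_similarity(1008, 30), 2))   # 97.02
```

    Setting insertions and deletions aside, this number can only go up as samples are added.  A marker counts as distinct only while no chimp base matches any human base at that position, so a new sample can remove a marker from the list but cannot add one.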
    Genomes are stored in a format known as FASTA, basically the list of the nucleotides A, G, C, T, in the order that they appear in the DNA sequence.  Mitochondrial DNA is a circular molecule, on average about 16,569 nucleotides long in a human, and (on average) about a dozen fewer in a chimpanzee.  The FASTA data can be downloaded into a spreadsheet and then compared base by base.  Being circular, mtDNA has no naturally defined beginning position or end position.  The NCBI data begins with the first mtDNA gene, what geneticists define as the "coding region." For family history DNA researchers, this is position 577. The NCBI data then wraps around so that the last nucleotide is position 576.  I follow the numbering system established by Bryan Sykes at Oxford in the 1990s. If you have your mitochondrial DNA mapped through a testing service such as FamilyTreeDNA, the results will follow the same numbering system as my report.
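    Because the molecule is circular, converting between "place in the file" and "position number" is modular arithmetic.  The Python sketch below assumes a sequence exactly 16,569 bases long whose first base is position 577.  The function names are hypothetical, and the arithmetic ignores insertions and deletions, which a real alignment has to handle.

```python
def conventional_position(index, start=577, length=16569):
    """Map a 0-based index in a file whose first base is `start` to a 1-based position."""
    return (start - 1 + index) % length + 1

def rotate_to_position_one(sequence, start):
    """Rotate a circular sequence, whose first base sits at `start`, to begin at position 1."""
    offset = (len(sequence) - (start - 1)) % len(sequence)
    return sequence[offset:] + sequence[:offset]

print(conventional_position(0))        # 577, the first base in the file
print(conventional_position(16568))    # 576, the last base in the file

# Toy example: an 8-letter "genome" stored starting from its 5th letter.
print(rotate_to_position_one("EFGHABCD", start=5))   # ABCDEFGH
```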
    I first align the sequences so they are starting from the same position, and then I compare them base by base.  When I encounter a distinction between any of the samples in the study, I flag it in yellow and provide a notation.  In my current study, found here, the first 28 positions are exactly the same. A distinction is found at position 29, as two of the chimp genomes exhibit G (guanine), while all the human samples (including myself and two other anonymous individuals) exhibit C (cytosine).  In addition, the Neanderthal and Denisovan samples in the study exhibit C at position 29.  At first glance this looks like a marker distinct between chimps and humans.  However, when the bonobo sample is assessed, we find the bonobo shares a C at position 29, the same as the human samples and opposed to the G shared between the western chimps and common (Eastern) chimps.  Therefore position 29 is not a marker distinct between chimp and human populations. 
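    The rule I am applying at each position can be written down in a few lines of Python.  This is only a sketch of the logic, not the spreadsheet itself.  The sample labels are my own shorthand, the Neanderthal and Denisovan samples are left out, and the bases are the ones reported in these posts for positions 29 and 40.

```python
# Chimp-side sample labels (shorthand for this sketch only).
PAN = {"common chimp", "western chimp", "bonobo"}

# Bases observed at two aligned positions, as reported in the posts.
observed = {
    29: {"common chimp": "G", "western chimp": "G", "bonobo": "C",
         "human 1": "C", "human 2": "C", "human 3": "C"},
    40: {"common chimp": "C", "western chimp": "C", "bonobo": "G",
         "human 1": "T", "human 2": "T", "human 3": "T"},
}

def classify(column):
    """Label one aligned position as 'identical', 'shared' or 'distinct'."""
    pan = {base for name, base in column.items() if name in PAN}
    human = {base for name, base in column.items() if name not in PAN}
    if len(pan | human) == 1:
        return "identical"      # every sample carries the same base
    if pan & human:
        return "shared"         # variation exists, but the two groups overlap
    return "distinct"           # no base is common to both groups

for position, column in observed.items():
    print(position, classify(column))
# 29 shared
# 40 distinct
```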
    Based on this marker alone we would conclude that bonobos are more closely related to humans than western or common chimps are.  As we move down the sequence, we will find this confirmed, in part.  As we move to position 40, we find the chimp samples again don't match the human ones.  The bonobo has a G where the humans have a T, and the western and common chimps have a C.  Here is our first distinct marker between chimps and humans.  The T to C mutation (or vice versa) is called a transition, and is about three times more likely to occur than a transversion (such as T to G), simply based on the chemistry of the molecules as they replicate. 
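    The transition/transversion distinction comes down to chemical class: A and G are purines, C and T are pyrimidines.  A change within a class is a transition, and a change across classes is a transversion.  A short Python sketch (the function name is mine):

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

def substitution_type(old, new):
    """Classify a single-base change as a 'transition' or a 'transversion'."""
    pair = {old, new}
    if not pair <= PURINES | PYRIMIDINES:
        raise ValueError(f"unexpected base in {old!r} -> {new!r}")
    if old == new:
        raise ValueError("no change: both bases are the same")
    # Same chemical class on both sides means a transition.
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"
    return "transversion"

print(substitution_type("T", "C"))   # transition
print(substitution_type("T", "G"))   # transversion
print(substitution_type("G", "A"))   # transition (the change at position 930)
```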
    I continue this comparison through the first 1,008 bases and find 30 markers distinct in chimp populations compared to human populations... I'll point out more of these distinctions in the next few days...

Tuesday, December 22, 2015

A base by base comparison of chimp and human DNA

     It's been a sideline interest of mine to determine, genetically, how similar we really are to our closest cousins on the tree of life.  Since 2008 I've been using DNA to trace family history, revealing ethnic ancestry and the close relationships between seemingly unrelated surnames.  DNA analysis has traced our lineages back further, and yielded all kinds of diverse information unknown through paper records alone. 
     Of course the same methodology can be used to trace back further, beyond (and before) the time of anatomically modern humans.  We now have several complete Neanderthal genomes, and DNA from the recently discovered Denisovan fossils in southern Siberia.  Genetic anthropologists also have the reconstructed genome of "Mitochondrial Eve," the maternal ancestor to all living humans.  We know this DNA sequence not because we have identified a fossil of such an ancestor, but rather because we have traced back the nested mutations of all humans tested to date, and have reconstructed what this "original" genome looked like--at least in reference to mitochondrial DNA.
    Genetic anthropologists define "Mitochondrial Eve" as the most recent human ancestor of all living human populations today, and therefore automatically exclude Neanderthal, Denisovan, and all other archaic human populations that are now extinct.  Now we have genomes from these ancient ancestors.  Doesn't it make sense to determine genetically how they (and we) fit into a primate tree?  There are plenty of samples of chimp and bonobo DNA available as well. All the DNA required to construct a base by base comparison is available to the public.  Someone just needs to begin the painstaking work of documenting the comparison piece by piece, aligning the sequences, discovering the insertions and deletions, recording every deviation and accounting for every variation within the populations tested.
     This work is far too important to be overlooked. If the differences between chimps and humans are based in genetics, then these sequence differences between our species are especially significant. Once compiled, we'll better determine what exactly defines us as human genetically.  As we learn what every difference means, there will certainly be surprises, I suspect, with regard to both our similarities and our differences.
    Human mitochondrial DNA is composed of 16,569 base pairs, a manageable number, easily contained within a standard Excel spreadsheet.  So far, I've compared the first 1,008 base pairs, and found 30 absolute base-pair distinctions between "us" and "them."  That's 97.02% similarity.  The 3% tagged as distinct must cause all the difference.
    Actually, 25 of those absolute base-pair distinctions were found within the first 576 bases, what geneticists call the Hyper-Variable Region (HVR).  The other five distinct bases were found in the region 577-1008, a coding region where DNA expresses proteins, and is therefore less susceptible to mutations.  Most of the DNA moving forward in the sequence (beyond marker 1008) is also coding. I suspect the current 97% similarity will increase, perhaps to 98% or 98.5%, as I complete the comparison of the full mitochondrial genome of 16,569 bases. 
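    To see why I expect the overall figure to rise, it helps to split the count by region.  The Python sketch below uses only the numbers given here.  The last step is a deliberately naive projection, not a result: it assumes every remaining position differs at the same rate as the 577-1008 stretch, which will not hold exactly because part of the remaining sequence is not coding.

```python
GENOME_LENGTH = 16569

# (number of positions, distinct markers found) for the two regions compared so far.
regions = {
    "HVR 1-576": (576, 25),
    "coding 577-1008": (1008 - 577 + 1, 5),
}

def similarity(length, distinct):
    """Percent of positions in a region that are not flagged as distinct."""
    return 100.0 * (1 - distinct / length)

for name, (length, distinct) in regions.items():
    print(name, round(similarity(length, distinct), 2))
# HVR 1-576 95.66
# coding 577-1008 98.84

# Naive projection (an assumption, not a measurement): apply the 577-1008 rate
# to every position not yet compared.
compared = sum(length for length, _ in regions.values())
found = sum(distinct for _, distinct in regions.values())
coding_length, coding_distinct = regions["coding 577-1008"]
remaining = GENOME_LENGTH - compared
projected_distinct = found + remaining * coding_distinct / coding_length
projected = similarity(GENOME_LENGTH, projected_distinct)
print(round(projected, 2))
```

    The point of the split is that the hypervariable stretch pulls the running average down.  Once the much longer coding portion is included, the overall percentage should drift toward the coding-region rate.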
    A chimp/human comparison of the Y-chromosome should also reveal some interesting results; however, our Y-DNA is something on the order of 59 million base pairs long, so for now I'll focus on the mitochondria.  You can download a pdf of my work in progress, a base by base comparison (57 pages) of the first 1,008 nucleotides, here.