HAM DNA Group 2 Spectral Reconstruction

(from about 1740 to about 1850)
Phylogenetic Chart

April, 2006

This is my attempt to reconstruct the ancestral lines in Group #2 of the HAM DNA Project. I call it a "Spectral" reconstruction for lack of a better term: I am reconstructing by adding hypothetical ancestral lines where no data currently exist for them. For the most part, the hypothetical information comes from running the current data through the LAMARC* software, which performs a Bayesian analysis using Metropolis-Hastings Markov chain Monte Carlo (MCMC) sampling. That output was then analyzed for ancestral nodes. At the time of this writing, geneticists have no accepted scientific method for reconstructing ancestral nodes for use in genealogy.
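LAMARC's sampler explores whole genealogies, which is far beyond a short example. As a rough illustration of the Metropolis-Hastings accept/reject step only, here is a toy Python sketch that samples a single parameter, the TMRCA in generations, for two haplotypes. The model is an assumption for illustration and is not LAMARC's: mismatches are treated as Poisson with mean 2 × mu × markers × t, the prior on t is flat, and mu = 0.002 mutations per marker per generation is a placeholder rate. The helpers `log_posterior` and `metropolis_hastings` are hypothetical names.

```python
import math
import random


def log_posterior(t, mismatches, markers, mu):
    """Unnormalized log posterior of the TMRCA t (in generations).

    Toy model (assumption): mismatches ~ Poisson(2 * mu * markers * t),
    with a flat prior on t > 0.
    """
    if t <= 0:
        return -math.inf
    rate = 2.0 * mu * markers * t
    return mismatches * math.log(rate) - rate


def metropolis_hastings(n_samples, mismatches, markers, mu=0.002, step=5.0, seed=1):
    """Random-walk Metropolis-Hastings sampler for the toy posterior."""
    rng = random.Random(seed)
    t = 10.0  # arbitrary starting point
    current = log_posterior(t, mismatches, markers, mu)
    samples = []
    for _ in range(n_samples):
        # Symmetric Gaussian proposal, so the Hastings correction term is 1.
        proposal = t + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, mismatches, markers, mu)
        # Accept with probability min(1, posterior ratio).
        if candidate >= current or rng.random() < math.exp(candidate - current):
            t, current = proposal, candidate
        samples.append(t)
    return samples


samples = metropolis_hastings(30_000, mismatches=1, markers=37)
kept = samples[3_000:]  # discard burn-in
print(f"posterior mean TMRCA: {sum(kept) / len(kept):.1f} generations")
```

With a flat prior, the exact posterior of this toy model is a Gamma distribution with mean (mismatches + 1) / (2 × mu × markers), about 13.5 generations for these inputs, so the sampler can be checked against it. MCMC earns its keep in problems like LAMARC's, where no closed form exists.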

The goal is to provide a reasonable estimate of how and when the ancestral lines came to be, so that genealogists can better judge where to concentrate their research. If DNA analysis yields better estimates of dates and time spans, then the data will have delivered more than was originally expected.

What follows are HAM Surname DNA Project phylogenetic charts, generated from the DNA results of the HAM DNA Project. Unless otherwise indicated, all charts are based on Time to Most Recent Common Ancestor (TMRCA) calculations, which combine Genetic Distance with a Mutation Rate.
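For readers who want to see the arithmetic, here is a minimal Python sketch of a TMRCA point estimate from genetic distance and mutation rate. It assumes a simple model in which mutations accumulate independently along both lines of descent, so the expected genetic distance after t generations is 2 × mu × markers × t. The mutation rate (0.002 per marker per generation) and the 25-year generation length are placeholder assumptions, not necessarily the values behind these charts, and the function names are hypothetical.

```python
def tmrca_generations(genetic_distance, markers, mu=0.002):
    """Point estimate of the TMRCA, in generations, for two haplotypes.

    Assumes the expected distance after t generations is 2 * mu * markers * t
    (mutations accumulate along both lines of descent).
    """
    if markers <= 0 or mu <= 0:
        raise ValueError("markers and mu must be positive")
    return genetic_distance / (2.0 * mu * markers)


def tmrca_years(genetic_distance, markers, mu=0.002, years_per_generation=25.0):
    """Convert the generation estimate to years (generation length is an assumption)."""
    return tmrca_generations(genetic_distance, markers, mu) * years_per_generation


# One mismatch in 37 markers, as for kits 41641 and 46118:
print(round(tmrca_years(1, 37), 1))  # 168.9
```

The charts report 162.5 years for that pair, which comes from a different calculation; the sketch only shows how distance and rate combine.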

First, let me provide a brief overview of the methodology behind this study.

For this hypothetical reconstruction, three participants in HAM DNA Project Group #2 were chosen, mainly because they are the only ones in Group02 with 37-marker results to date.

The marker repeat data was translated into "ATGC" format with FT2DNA, converted into LAMARC format, and then run through LAMARC*, which performed a Bayesian analysis using Metropolis-Hastings Markov chain Monte Carlo sampling (the final number of samples was set to 30,000).
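The exact encoding FT2DNA uses is not described here, so the following Python sketch substitutes a simple hypothetical scheme: each marker becomes a fixed-width block with one "A" per repeat, padded with "G", so a one-repeat change shows up as a single nucleotide difference. It then writes the sequences as a sequential PHYLIP alignment, a format that LAMARC's file converter should be able to import (check the LAMARC documentation for your version). The helpers `encode_repeats` and `to_phylip`, the marker widths, and the repeat values are all made up for illustration.

```python
def encode_repeats(repeats, markers, width):
    """Hypothetical unary encoding of STR repeat counts as a DNA string.

    Each marker becomes width[marker] bases: one 'A' per repeat, padded with 'G'.
    """
    blocks = []
    for marker in markers:
        count, size = repeats[marker], width[marker]
        if not 0 <= count <= size:
            raise ValueError(f"{marker}: {count} repeats does not fit in {size} bases")
        blocks.append("A" * count + "G" * (size - count))
    return "".join(blocks)


def to_phylip(sequences):
    """Sequential PHYLIP alignment: 'taxa sites' header, names padded to 10 characters."""
    lengths = {len(seq) for seq in sequences.values()}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same length")
    lines = [f"{len(sequences)} {lengths.pop()}"]
    for name, seq in sequences.items():
        lines.append(f"{name[:10]:<10}{seq}")
    return "\n".join(lines)


# Made-up example values, not the Group 2 results.
MARKERS = ["DYS390", "DYS426"]
WIDTH = {"DYS390": 26, "DYS426": 13}
kits = {
    "KIT_A": {"DYS390": 24, "DYS426": 12},
    "KIT_B": {"DYS390": 25, "DYS426": 12},
}
alignment = {name: encode_repeats(r, MARKERS, WIDTH) for name, r in kits.items()}
print(to_phylip(alignment))
```

Under this encoding, the number of differing sites between two sequences equals the step-wise repeat distance between the original haplotypes, which is the property a DNA-based tool needs to see.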

Overall, LAMARC found a Most Probable Estimate (MPE) of Theta for the three individuals of 0.0000306.
The DYS markers that the LAMARC Bayesian analysis determined to be most significant were:

DYS390    with a Theta (MPE) of  0.119850
DYS426    with a Theta (MPE) of  0.00012 *        (not yet shown to mutate for Group 2)
DYS449    with a Theta (MPE) of  0.128910
DYS464d  with a Theta (MPE) of  0.000109 *     (not yet shown to mutate for Group 2)
GATA-H4  with a Theta (MPE) of  0.056399

Each of the three participants' data was then modified by plus or minus one repeat at DYS426 and DYS464d in order to produce hypothetical results from future DNA tests.

These hypothetical modifications at the two markers were recorded, labeled below as MACK01-MACK18, and then run through Dean McGee's Y-DNA Comparison Utility.
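The variant-generation step can be sketched in Python as follows. The selection rule (take the markers flagged above as not yet shown to mutate in Group 2) is an interpretation of the text, and the choice of six variants per participant (each marker alone up or down one repeat, plus both markers together up or down) is an assumption. It happens to give 3 × 6 = 18 variants, matching MACK01-MACK18, but the actual variants may have been built differently. The function name and the repeat values are placeholders.

```python
from itertools import product

# Theta (MPE) values reported above from the LAMARC Bayesian run.
THETA_MPE = {
    "DYS390": 0.119850,
    "DYS426": 0.00012,
    "DYS449": 0.128910,
    "DYS464d": 0.000109,
    "GATA-H4": 0.056399,
}
NOT_YET_MUTATED = {"DYS426", "DYS464d"}  # flagged "not yet shown to mutate for Group 2"
TARGETS = [m for m in THETA_MPE if m in NOT_YET_MUTATED]


def hypothetical_variants(participants, markers, prefix="MACK"):
    """Shift the target markers by one repeat in every same-direction combination.

    For two markers this yields six variants per participant (assumed scheme).
    """
    variants = {}
    for kit, repeats in participants.items():
        for shifts in product((-1, 0, 1), repeat=len(markers)):
            directions = {s for s in shifts if s != 0}
            if len(directions) != 1:  # skip "no change" and mixed up/down combinations
                continue
            new = dict(repeats)
            for marker, shift in zip(markers, shifts):
                new[marker] += shift
            variants[f"{prefix}{len(variants) + 1:02d}"] = (kit, new)
    return variants


# Placeholder repeat values, not the real kit results.
participants = {
    "KIT_A": {"DYS426": 12, "DYS464d": 17},
    "KIT_B": {"DYS426": 12, "DYS464d": 17},
    "KIT_C": {"DYS426": 12, "DYS464d": 16},
}
variants = hypothetical_variants(participants, TARGETS)
print(len(variants), TARGETS)  # 18 ['DYS426', 'DYS464d']
```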

After conversion to ATGC format and a run through LAMARC, the calculations were performed with Dean McGee's Y-DNA Comparison Utility, and the resulting output can be found here. The output was then converted to graphic format with the KITSCH program from the PHYLIP package. The branch-length view was produced with the MEGA software. Instructions for the graphing are given in the HAM Country Tools area.
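As a sketch of the hand-off to KITSCH, the Python below computes pairwise genetic distances under a simple step-wise model (the sum of absolute repeat differences) and writes them as a square PHYLIP distance matrix, the layout KITSCH reads. The Y-DNA Comparison Utility performs its own distance and TMRCA calculations, and the charts here are TMRCA-based, so treat `genetic_distance` as a hypothetical stand-in; the same matrix layout applies whichever values fill the cells. The haplotype values are made up.

```python
def genetic_distance(a, b):
    """Step-wise mutation model: sum of absolute repeat differences."""
    if a.keys() != b.keys():
        raise ValueError("haplotypes must cover the same markers")
    return sum(abs(a[m] - b[m]) for m in a)


def phylip_distance_matrix(haplotypes):
    """Square PHYLIP distance matrix: taxon count, then one row per taxon."""
    names = list(haplotypes)
    lines = [str(len(names))]
    for row_name in names:
        row = " ".join(
            f"{genetic_distance(haplotypes[row_name], haplotypes[col]):.4f}"
            for col in names
        )
        # PHYLIP expects the taxon name in a 10-character field.
        lines.append(f"{row_name[:10]:<10}{row}")
    return "\n".join(lines)


# Made-up haplotypes for illustration only.
haplotypes = {
    "KIT_A": {"DYS390": 24, "DYS449": 29, "GATA-H4": 11},
    "KIT_B": {"DYS390": 25, "DYS449": 29, "GATA-H4": 11},
    "KIT_C": {"DYS390": 24, "DYS449": 30, "GATA-H4": 12},
}
print(phylip_distance_matrix(haplotypes))
```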

Reconstruction of Ancestral nodes was originally inspired by the work of Charles Kerchner.

HAM DNA Group 2 time-based phylogenetic tree:

[Figure: HAM DNA Group 2 Reconstruction Phylogenetic tree]

Only participants in HAM DNA Group #2 with 37-marker results were included in this study. Currently (April 2006), that is three individuals, represented by kits 41641, 46118, and 48988.

Participant 41641 descends from Joseph HAM of Monroe County, VA. Participants 41641 and 46118 share a common ancestor within the last 325 years (at 95% probability), with only one mismatch in 37 markers. Participant 46118 descends from Levi HAM, who was born in South Carolina.
So far, participants 46118 and 48988 have a TMRCA of 400 years (at 95% probability). New participant 48988 descends from Obed Jones HAM of South Carolina.

From previous observations of other lines that I have charted, I would have expected 46118 and 41641 to have a common ancestor from about 1844, since they have a branch-length (or median) tMRCA value of 162.5 years (2006 - 162 = 1844) on my normal phylogenetic charts.** On the chart above, which uses simulated data, the branch lengths still show 162.5 years, but the path to the TMRCA has picked up a significant difference. This difference implies that future DNA data may add fifty-odd years to the branch lengths, following the paths inferred from the Most Probable Estimates in the LAMARC output.
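The date arithmetic can be written as a small Python helper that converts a tMRCA in years to a calendar year, with a band of plus or minus one generation. The 2006 reference year comes from the text; the 25-year generation length and the helper name `calendar_estimate` are assumptions.

```python
REFERENCE_YEAR = 2006  # year of this analysis


def calendar_estimate(tmrca_years, generation_years=25.0, reference_year=REFERENCE_YEAR):
    """Return (earliest, central, latest) calendar years, +/- one generation.

    The default generation length is an assumption.
    """
    central = reference_year - tmrca_years
    return central - generation_years, central, central + generation_years


print(calendar_estimate(162.5))       # (1818.5, 1843.5, 1868.5)
print(calendar_estimate(162.5 + 50))  # (1768.5, 1793.5, 1818.5)
```

Adding roughly fifty years to the 162.5-year figure moves the central estimate from the 1840s back to the 1790s.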

That is, the chart above represents my best estimate to date of how these three individuals might share an MRCA on a phylogenetic chart of the DNA data. Whether the DNA evidence will develop into what is shown above remains to be seen. It will be interesting to find out whether we can obtain more detail from the DNA results than is currently understood.

What is obvious in the above graph is that Levi (kit #46118) does not have a clear path to the other kits as represented here. His path to the TMRCA does not make a whole lot of sense, at least back to his grandfather. As shown here, he has three separate branch lengths of 12.38, 13.77, and 3.58 years. It is not clear to me what this short section is trying to suggest, but we can take it as a clue for reconstructive purposes.
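To make "path to TMRCA" concrete: the time from a kit back to an ancestral node is the sum of the branch lengths along the path. The Python sketch below stores a tree as child-to-parent links. Only the three branch lengths (12.38, 13.77, and 3.58 years) come from the chart; the node names, the order of the branches, and the helper `path_length` are hypothetical.

```python
# child -> (parent, branch length in years); node names n1..n3 are hypothetical.
PARENT = {
    "46118": ("n1", 12.38),
    "n1": ("n2", 13.77),
    "n2": ("n3", 3.58),
}


def path_length(node, ancestor, parent):
    """Sum branch lengths walking from `node` up to `ancestor`."""
    total = 0.0
    while node != ancestor:
        if node not in parent:
            raise KeyError(f"{ancestor!r} is not an ancestor of the starting node")
        node, length = parent[node]
        total += length
    return total


print(round(path_length("46118", "n3", PARENT), 2))  # 29.73
```

Those three short segments add up to just under 30 years in total.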

Since LAMARC gave different Theta values for the Maximum Likelihood Estimate (MLE) run than for the Bayesian run, it occurs to me that DYS456 may play a part in reconstructing the ancestral line of #46118. For the MLE run, LAMARC gave:

DYS456    with a Theta (MLE) of  0.000509        (not yet shown to have mutated within Group02)

That is, the line of 46118 may have mutated at DYS456. Thinking that through, it seems logical to me that the MACK values above most likely to need modification at DYS456 are MACK05, MACK06, and MACK14. Reconstructing a DYS456 mutation for MACK05, MACK06, and MACK14 and graphing the result produces the following chart:
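Applying the DYS456 change to the selected variants can be sketched in Python as below. The text does not say whether the reconstructed mutation was up or down one repeat, so the step is a parameter. The helper `apply_mutation` is hypothetical, and the DYS456 values are placeholders, not the actual MACK data.

```python
def apply_mutation(variants, labels, marker, step):
    """Return a copy of `variants` with `marker` shifted by `step` repeats in `labels`."""
    missing = set(labels) - set(variants)
    if missing:
        raise KeyError(f"unknown variant labels: {sorted(missing)}")
    updated = {}
    for label, repeats in variants.items():
        new = dict(repeats)  # copy so the input is left unchanged
        if label in labels:
            new[marker] += step
        updated[label] = new
    return updated


# Placeholder DYS456 value of 15 for all eighteen MACK variants.
mack = {f"MACK{i:02d}": {"DYS456": 15} for i in range(1, 19)}
reconstructed = apply_mutation(mack, {"MACK05", "MACK06", "MACK14"}, "DYS456", step=1)
print(sorted(k for k, v in reconstructed.items() if v["DYS456"] != 15))
# ['MACK05', 'MACK06', 'MACK14']
```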
[Figure: HAM Group 2 Spectral Reconstruction #2]

I think it very likely that 48988, 46118, and 41641 share a common immigrant ancestor. The DNA evidence tells me that this group has a very interesting area to focus on. My reconstruction is not exactly scientifically rigorous, but it does demonstrate two things: a) more DNA evidence should help clear up details about relationships, and b) better reconstruction tools should improve our estimates of ancestral nodes.


Lacking more DNA participants, I am trying to see what the DNA tells us anyway. Using the best of the latest genetic software that I have found so far, my tentative conclusions are:

1) Group 2 needs more DNA participants if they want more information. Participants carefully chosen from lines that branched off in the mid-1700s and around 1800 would help the DNA effort.
2) The DNA tells me that members #41641 (Thomas) and #46118 (Bill) should connect in about 1807, plus or minus a generation.
3) The DNA tells me that those two should connect to #48988 (John Jeffrey) in about 1742, plus or minus a generation.

And I should probably add a fourth: 4) I could be terribly wrong. Nobody has yet published a scientifically accepted method of reconstructing ancestral nodes. If I am wrong about how to do this, I will just have to do better next time.

Of course, what is implied here is that I am interpreting the LAMARC results as predicting that DYS426, DYS456, and DYS464d will show mutations in future DNA sampling for Group02. That assumption could be wrong because of the small size of Group02. Another explanation would be that LAMARC is delivering erroneous information because of the lack of data.

What else could be expected from the reconstruction? Knowing approximately when ancestors might connect would give genealogists an idea of when, and perhaps even where, to do the normal genealogy research for that connection.

My disclaimer would be that these DNA estimates are no substitute for good old-fashioned, solid genealogy research, or for more DNA evidence. I am simply trying to get more out of the DNA information. I think there is more information in those DNA numbers, and I am trying to figure out what it might tell us.

Genealogical information about Group 2 can be found here.  The original 37 marker data for Group 2 can be found here.

*  LAMARC is not intended for small populations.

** [It is fairly common to see the tMRCA fall at the midpoint of the 95% probability figure, especially when 37 markers are returned. The observed error is usually plus or minus one generation on those charts. For participants 46118 and 48988, we have a branch-length (or median) tMRCA value of 219.5 years and, separately, known early ancestors in South Carolina in about 1800.]

