In 2013 AncestryDNA updated their Ethnicity Estimates to include a detailed breakdown of West African DNA, which was pioneering when compared with other DNA testing companies. Soon afterwards I started collecting AncestryDNA results in an online spreadsheet in order to conduct a survey of the African regional scores being reported by AncestryDNA. At first only for people of the Afro-Diaspora and later on also among Africans. My main research goal has always been to establish how well the AncestryDNA results on an aggregated group level can already (despite limitations of sample size and other shortcomings) be correlated with whatever is known about the documented regional African roots for each nationality, as well as to improve the correct interpretation of personal results.
In May 2016 I published my first summary of my Afro-Diasporan survey findings based on 707 results for 7 nationalities (see this blog post). My survey has been ongoing ever since. Right now an update of AncestryDNA’s Ethnicity Estimates seems even more imminent than it was in 2016 (when it was canceled in the beta phase). So that’s why I will yet again provide a “final” overview of my survey findings 😉 . Mainly based on 1,264 results for people from 8 nationalities. Although the total number of results and nationalities in my survey is even greater.
A major addition is the inclusion of 45 Brazilian results. Their predominant Central African profiles (as measured by both “Southeastern Bantu” and “Cameroon/Congo”) are quite striking when compared with my other sample groups. This outcome reinforces how the African breakdown on AncestryDNA has been reasonably in alignment with historically documented origins of the Afro-Diaspora. Unlike any other DNA testing platform I’m aware of and therefore not to be lightly dismissed despite inherent imperfections.
In the second part of this blog series I will also provide an overview of the non-African regions (Amerindian, Asian, Pacific etc.) being reported for Afro-Diasporans, as well as a more detailed analysis of their European breakdown.
“This frequency of regions being ranked #1 (regions with the highest amount in the African breakdown) is perhaps the best indicator of which distinct African lineages may have been preserved the most among my sample groups.”
Chart 1
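The "ranked #1" frequency behind chart 1 can be sketched in a few lines of Python. This is only a minimal illustration, assuming each result is stored as a mapping from region name to its score within the African breakdown. The function names and the sample values below are made up for the example and are not taken from the survey spreadsheet.

```python
from collections import Counter

def top_region(breakdown):
    """Return the region with the highest score in one African breakdown.

    If two regions tie, the one listed first wins (an arbitrary choice
    made for this sketch).
    """
    return max(breakdown, key=breakdown.get)

def top_region_frequency(results):
    """Percentage of results in which each region is ranked #1."""
    counts = Counter(top_region(r) for r in results)
    total = len(results)
    return {region: 100 * n / total for region, n in counts.items()}

# Made-up illustrative breakdowns (NOT survey data)
sample = [
    {"Nigeria": 40, "Benin/Togo": 35, "Cameroon/Congo": 25},
    {"Nigeria": 20, "Benin/Togo": 30, "Cameroon/Congo": 50},
    {"Nigeria": 45, "Benin/Togo": 30, "Cameroon/Congo": 25},
    {"Nigeria": 10, "Benin/Togo": 60, "Cameroon/Congo": 30},
]

freq = top_region_frequency(sample)
for region, pct in sorted(freq.items(), key=lambda kv: -kv[1]):
    print(f"{region}: {pct:.0f}% of results ranked #1")
```

In this toy sample "Nigeria" is the top region in two out of four results, so it gets a frequency of 50%.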
“This chart features an additional breakdown of my own making into 3 greater African zones: “Upper Guinea”, “Lower Guinea” and “Central Africa”. Bantu speaking ancestry from Southeast Africa also to be included in “Central Africa”. […] I find this distinction useful because it allows certain regional patterns to show up more clearly and it conforms with common nomenclature in slave trade literature.” For ethnolinguistic and historical maps from these 3 main regions of provenance see:
Chart 2
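The grouping into three greater zones used for chart 2 can be sketched as a simple lookup. Please treat the mapping below as an assumption made for illustration: “Senegal” and “Mali” as Upper Guinea and “Cameroon/Congo” plus “Southeastern Bantu” as Central Africa follow from the text, while placing “Ivory Coast/Ghana”, “Benin/Togo” and “Nigeria” under Lower Guinea is my reading and should be checked against the source spreadsheet. The names `ZONE_OF_REGION` and `zone_totals` are hypothetical.

```python
# Assumed region-to-zone mapping (see the caveats in the paragraph above)
ZONE_OF_REGION = {
    "Senegal": "Upper Guinea",
    "Mali": "Upper Guinea",
    "Ivory Coast/Ghana": "Lower Guinea",   # assumption
    "Benin/Togo": "Lower Guinea",          # assumption
    "Nigeria": "Lower Guinea",             # assumption
    "Cameroon/Congo": "Central Africa",
    "Southeastern Bantu": "Central Africa",
}

def zone_totals(breakdown):
    """Sum regional scores into the three greater African zones.

    Regions that are not in the mapping are ignored in this sketch.
    """
    totals = {"Upper Guinea": 0.0, "Lower Guinea": 0.0, "Central Africa": 0.0}
    for region, score in breakdown.items():
        zone = ZONE_OF_REGION.get(region)
        if zone is not None:
            totals[zone] += score
    return totals

# Made-up illustrative breakdown (NOT survey data), scaled to 100%
example = {
    "Senegal": 10, "Mali": 5,
    "Benin/Togo": 20, "Nigeria": 15,
    "Cameroon/Congo": 30, "Southeastern Bantu": 20,
}
zones = zone_totals(example)
print(zones)
```

For this made-up profile the zone totals come out as 15 for Upper Guinea, 35 for Lower Guinea and 50 for Central Africa.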
Compared with my previous survey findings (see this blog post) from two years ago my main outcomes currently (as shown in charts 1 & 2) are actually quite consistent, save for some small variance. Given the substantial increase in sample size (especially for Haiti, Jamaica and Cape Verde) this would seem to signal that these regional patterns are quite robust already, corroborating my previous findings. For my African American sample group I have utilized two separate but partially overlapping datasets. One consisting of 200 unscaled AncestryDNA results (incl. non-African scores; 165 results are entirely new). As well as my previous dataset (n=350) which focused only on the African breakdown (scaled to 100%). In chart 1 the sample size n=515 consists of 165 results from the new dataset and 350 results from my older dataset. In chart 2 I maintained my previous findings from the older dataset, n=350.
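To make the difference between unscaled results and an African breakdown “scaled to 100%” concrete, here is a minimal sketch of the rescaling. It assumes a result is a mapping from region name to percentage and that the set of African region names is known. The function name and the “Europe (placeholder)” label are hypothetical and only serve the example.

```python
def scale_african_breakdown(raw, african_regions):
    """Rescale the African regions of one result so they sum to 100%.

    raw: mapping of region name -> unscaled percentage (may include
         non-African regions, which are dropped).
    african_regions: collection of region names counted as African.
    """
    african = {r: s for r, s in raw.items() if r in african_regions}
    total = sum(african.values())
    if total == 0:
        raise ValueError("result contains no African scores to scale")
    return {r: 100 * s / total for r, s in african.items()}

# Made-up unscaled result (NOT survey data): 80% African, 20% non-African
raw_result = {
    "Nigeria": 32,
    "Cameroon/Congo": 24,
    "Benin/Togo": 24,
    "Europe (placeholder)": 20,
}
african_regions = {"Nigeria", "Cameroon/Congo", "Benin/Togo"}

scaled = scale_african_breakdown(raw_result, african_regions)
print(scaled)
```

In this toy case “Nigeria” goes from 32% of the total result to 40% of the African breakdown, which is what allows profiles with different amounts of non-African admixture to be compared on the same footing.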
A major addition to my previous report is the inclusion of 45 Brazilian samples. In 2016 I had lamented that I wasn’t able to compare with Brazilian results (due to the lack of Brazilian testers at that time), as I already foresaw how insightful such a comparison would be. Their predominantly Central African profiles (as measured by both “Southeastern Bantu” and “Cameroon/Congo”) are quite striking when contrasted with my other sample groups. This is in accordance with historically documented origins from especially Angola but also Mozambique and Congo. Especially given that the majority of my Brazilian survey participants are from southeast Brazil, where such ancestral connections are even more pronounced (people from Bahia might show more Bight of Benin origins, and profiles from Amazonia/Maranhão could show a tendency towards a greater Upper Guinean imprint according to documented slave trade records: see this chart).
Very tellingly my other sample groups are not showing any predominance (>50%) of Central African origins (on average). However, when I rank them from highest to lowest Central African contribution, the historically documented strong connections to Angola and Congo for resp. Mexico and Haiti, but also the USA, are again confirmed. See also:
Chart 3
Chart 4
One of the most fascinating aspects of my survey findings is that so-called substructure is now also slowly starting to be revealed. Genetic substructure basically refers to subgroups within greater populations. To be defined along geographical, social, cultural, or even “racial” lines. Despite commonalities, various localized factors may still have caused differentiation between various subgroups within a given population. In particular pointing towards a distinctive mix of African regional origins. Showing overlap to be sure but still recognizable due to deviating proportions.
In 2016 I already pointed out the limitations of my survey with regard to how its findings might correlate with any hypothetical national group averages. In particular I mentioned the various cases of possible sampling bias (see the introduction section of this blog page). Even though I still fully acknowledge these limitations, at the same time I do feel more confident about the representativeness of my survey findings. Given not only the general increase in sample size but also a greater variety in backgrounds within each nationality. Covering a wider span of geographical locations within one particular country as well as including people from various social & “racial” backgrounds (e.g. black Hispanics). Probably thanks to the increasing popularity of DNA testing. Even though the distinctive migration patterns to the USA are still impacting the general composition for many of my survey groups.
Contrary to what is commonly assumed, you do not need to sample entire populations to obtain informational value with wider implications. Naturally greater sample size does (usually) help matters. Right now for each of my 8 sample groups I have collected nearly 100 results. Going up to n=515 for African Americans. Only exception being Brazil. However I suspect that even their sample size (n=45) is pretty robust already and will roughly correspond to what is to be found within the Brazilian gene pool (especially the southeast). See also:
- Representative Samples: Does Sample Size Really Matter? (SurveyGizmo)
Returning to the substructure theme I am very pleased to have collected an admittedly very minimal but still insightful number of samples seemingly showcasing substructure for the Dominican Republic as well as for Haiti (see charts 3 & 4). In upcoming blog posts I will discuss these preliminary outcomes in more detail. However I can already say that for Haiti the relevant context would seem to be the known differences in slave trade patterns between the North, West/Center and South (see this link). Summarizing:
- Northern Haiti might possibly have the greatest degree of Central African origins (as measured especially by “Cameroon/Congo”)
- Western Haiti might possibly have the greatest degree of Bight of Benin origins (as measured especially by “Benin/Togo”)
- Southern Haiti might possibly have the greatest degree of Bight of Biafra origins (as measured especially by “Nigeria”)
For Dominicans the main underlying cause might be relative endogamy after initial admixture. Taking place mostly in the early colonial period (1500’s/1600’s) when the nucleus of a (tri-racially) mixed Dominican population was being formed. Ensuring that certain regional African origins show up more pronounced. Because additional African admixture was not occurring afterwards (or to a much lesser degree) for certain socially/racially defined population segments. For Dominicans with Africa <25% an Upper Guinean founder effect (as measured by “Senegal” and “Mali”) seems to be especially apparent. And to a lesser degree possibly also an Angolan founder effect (as measured by “Southeastern Bantu”). While for Dominicans with Africa >50% it seems reasonable to assume that they may have more diverse but also more recent African origins, on average. Mostly reflecting regions of provenance from the 1700’s (as measured by especially “Benin/Togo” but also “Nigeria” and “Cameroon/Congo”) rather than the 1500’s/1600’s. See also section 2 of this blog post.
In previous blog posts published in 2015 I have already demonstrated the likelihood of similar substructure for African Americans along state origins (South Carolina, Virginia & Louisiana; see section 5 of this blog post). And also for Puerto Ricans according to either low or high African admixture rates. Very similar to my Dominican findings in fact (see section 2 of this blog post). It will be instructive to uncover similar cases of substructure for my other survey groups. But due to lack of needed sample variation I have not been able to establish this yet. Although I am already picking up on distinctive regional patterns for my Brazilian samples from either the southeast or northeast of that country. Also for Cape Verdeans a (very subtle) subdivision along island lines might exist. Even when arguably they remain my most homogeneous Afro-Diasporan sample group.
“In order to research the least diluted regional lineages across the Diaspora I am focusing however on the maximum scores for each region. Results which feature one region in extra pronounced degree.”
Chart 5
Chart 6
Chart 7
Determining the largest regional components within the African breakdown, on average, for each of my sample groups has been a primary research effort during my AncestryDNA survey. After all these most prominent regional scores can be considered to have the highest reliability at this stage and might also be confirmed independently by historical sources. Establishing where each African region is relatively more pronounced or instead more subdued might therefore provide insightful clues into localized ethnogenesis across the Diaspora.
In the charts shown directly above I am singling out the individual maximum scores I have encountered in my survey. Therefore not representing the most typical profiles. A much greater variation and a usually more regionally balanced outcome can be observed if you closely examine other individual results within my survey. Still I do believe that these results are meaningful in illustrating the most characteristic top regions for each of my survey groups (even when my survey groups might have more than one characteristic top region). These maximum scores are mostly in line with my other findings such as frequency of top regions (see chart 1), group averages (see chart 2), as well as slave trade statistics (see this page).
Another feature of chart 5 is the frequency of “100% African” profiles among my survey groups. Non-African admixture is both widespread and greatly variable across the Afro-Diaspora. The popularity of DNA testing may have conveyed the impression that Afro-Diasporans of “pure” African lineage are some kind of “unicorns”. But this is clearly a misconception. Within my sample groups especially Haitians and Jamaicans show a noticeable (even if still minor) proportion of people who are “100% African”. However also among African Americans I did manage to find several people without any non-African admixture. Contrary to some popular media reports (see this link for example). Ten profiles out of 515 results turned out to be 100% African, genetically speaking. Which makes about 2% of my sample group (10/515). For all of these people I have verified to the best of my capabilities that they were indeed multi-generationally African American on all lines, without any West Indian or African parents/grandparents.
These statistics are still quite preliminary of course. And it may be assumed for various reasons that people with such “100% African” profiles will generally be under-represented among AncestryDNA testers. However I suspect that these findings are already suggesting wider tendencies. As an extra note I should mention that also among 21 Guyanese AncestryDNA results in my survey I came across 4 profiles which showed a 100% African score. Therefore other parts of the Afro-Diaspora might show equally pronounced retention of African DNA as Haiti and Jamaica have been showing in my survey.
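As a rough illustration of why these percentages are preliminary, the two shares mentioned above (10 out of 515 African American results and 4 out of 21 Guyanese results) can be recomputed together with an approximate uncertainty range. The Wilson score interval used below is a standard textbook technique that I am adding purely as a sketch. It is not part of the survey methodology, and it assumes a simple random sample, which a self-selected group of DNA testers is not.

```python
import math

def share_pct(count, total):
    """Share of `count` within `total`, as a percentage."""
    return 100 * count / total

def wilson_interval(count, total, z=1.96):
    """Approximate 95% Wilson score interval for a proportion.

    Returns (low, high) as fractions between 0 and 1.
    Assumes a simple random sample, which is an idealization here.
    """
    p = count / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return center - half, center + half

# Counts quoted in the text
groups = {"African American": (10, 515), "Guyanese": (4, 21)}

for name, (count, total) in groups.items():
    low, high = wilson_interval(count, total)
    print(f"{name}: {share_pct(count, total):.1f}% "
          f"(rough range {100 * low:.1f}% to {100 * high:.1f}%)")
```

The smaller Guyanese group produces a much wider range than the African American group, which is one way to see why a minimal sample can only hint at wider tendencies.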
“Mere estimates” or as comprehensive as it can get?
Chart 8
Chart 8 is showing a comprehensive overview of my survey findings among 34 nationalities and based on 1,377 results. I have also included a few African sample groups to provide a benchmark so to speak. This might be helpful to learn what to expect more or less and get a basic idea of how my sample groups from across the Afro-Diaspora fit in the bigger scheme. The range of my survey has become quite extensive over the years. Despite the limited sample size for most of the separate nationalities this wide array does still seem to contribute to the robustness and coherence of my overall dataset. Although I wish I could have included more results from especially Guinea and Mali to make more sense of the “Mali” region. As well as from Angola and Mozambique to arrive at a better understanding of the so-called “Southeastern Bantu” region. I also find it regrettable that the number of samples from Cuba and the Dutch and French Caribbean has been rather minimal so far. I hope to eventually improve the coverage of these significant parts of the Trans-Atlantic Afro-Diaspora. As well as extend my survey further into the Indian Ocean Diaspora.
As I have been repeating continuously from the start of my survey, the labeling of AncestryDNA’s regions is not intended to be taken as gospel! Rather consider the AncestryDNA regions to be proxies of ancestral components which have become more frequent in certain loosely defined areas but still show a wide dispersal in neighbouring areas as well, due to ancient migrations and overlapping genetics. See also these blog posts:
- “Cameroon/Congo” = moreso Angola/Congo for Diasporans?
- “Benin/Togo” also describes DNA from Ghana & Nigeria
- “Ivory Coast/Ghana” also describes Liberian DNA
Admixture analysis such as provided by AncestryDNA often draws criticism for not being in line with unrealistic expectations. Specifically with regard to how ancestral categories should conform exactly to a person’s family tree and all the known ethnic lineage it may contain. Disregarding how such over-specified information is simply not to be found in our DNA. At least not given the current state of knowledge. There are many other shortcomings to keep in consideration as well. However I myself do strongly believe that AncestryDNA’s Ethnicity Estimates can still be of great informational value as long as you know how to correctly interpret them, educate yourself about inherent restrictions, and combine them with other research findings.
Instead of taking a generalizing and dismissive stance I would argue that each aspect of admixture analysis should be judged on its own strengths and weaknesses. Precisely because of its still unrivaled framework for describing both West African and Central African DNA I find AncestryDNA to be more insightful than anything presently on offer by other commercial DNA testing companies as well as any third-party analysis such as available on GEDmatch, DNA Land etc. My assessment is based on the more than 300 AncestryDNA results of native Africans I have seen by now. These were usually in alignment (broadly) with their verifiable background (see this overview). Also my survey of Afro-Diaspora results has largely been a confirmation of historically documented African origins for each nationality. In this current review of my findings this has been demonstrated most clearly perhaps by the inclusion of 45 Brazilian results and their predominantly Central African profiles. Again such potentially profound information is not something to carelessly brush aside when wanting to Trace African Roots!
***(click to enlarge)
It is sometimes said that your DNA results are only as good as the next update. So it’s best not to get too attached to them 😉 Given scientific advancements and a greater number of relevant African reference samples, hopefully a greater degree of accuracy may be obtained in the near future. But naturally no guarantees are given that this will indeed be the case. As shown above, one major change with regard to AncestryDNA’s African breakdown might be the combining of the “Cameroon/Congo” and “Southeastern Bantu” regions into one single region. With Eastern Africa appearing as a new region. At the moment of writing this blog post there is still quite some uncertainty whether AncestryDNA’s intended update will indeed be implemented or remain stuck in beta phase (as happened in 2016). I will therefore refrain from any detailed judgement for now. For more details about the update:
- Updated “ethnicity” estimates at AncestryDNA (Cruwys News)
- AncestryDNA New 150+ Regions (YouTube)
- Coming Down the Ethnicity Admixture Pike (Roots & Recombinant DNA, 2016)
My survey has been based on the current version of AncestryDNA’s Ethnicity Estimates containing 9 African regions. It remains to be seen how well my present findings will correspond with any newly calculated AncestryDNA results. Will they be rendered completely obsolete or may they still contain lasting insights about the approximate composition of African regional origins for my survey groups? Judging from several updated results I have seen already I do suspect that there will be some regional shifting. Seemingly not per se consistent with previous results although usually involving neighbouring regions with overlapping genetics.
To make more sense of my survey findings I have been using an additional more basic regional framework based on Upper Guinea, Lower Guinea and Central Africa (see chart 2). I believe it may prove its added value once more when trying to determine how any newly updated African breakdown compares with the current one. Either way I intend to once again contrast Ancestry’s updated results with historical plausibility as well as the results of actual Africans. As I aim for combining insights from various fields. Always looking for correct interpretation. Critical but also staying open-minded and careful not to be dismissive when informational value can still be obtained.
Links to source data & methodology
The survey findings featured in this blog post merely represent my personal attempt at identifying generalized, preliminary and indicative patterns on a group level in spite of individual variation. First of all, everyone has a unique family tree of course. For a deeper understanding of your personal results my advice therefore is to perform follow-up research (DNA matches, genealogy, relevant historical context etc.) and aim for complementarity of your findings (see also this blog post). I would like to thank again all my survey participants for sharing their results with me. I am truly grateful for it!
For a detailed discussion of my methodology & research read these blog sections:
- Afro-Diaspora AncestryDNA results: A Comparison (Tracing African Roots, 2016)
- Survey of AncestryDNA results for Africans & Afro-Diasporans (main overview)
For the direct links to the source data follow these links (all of which are tabs of the same online spreadsheet; calculations can be checked by verifying the formulas):
- African breakdown for 34 nationalities (n=1,377)
- African AncestryDNA results
- African American AncestryDNA results (n=350)
- African American AncestryDNA results (n=200, incl. non-African scores)
- Brazilian AncestryDNA results
- Cape Verdean AncestryDNA results
- Dominican AncestryDNA results
- Haitian AncestryDNA results
- Jamaican AncestryDNA results
- Mexican AncestryDNA results
- Puerto Rican AncestryDNA results
- Maximum Scores for African AncestryDNA Regions across the Diaspora
- Regional diversity/uniformity of African breakdown