The examples on this page are designed to demonstrate some of Bonaparte's capabilities. For most cases, an Excel file containing profile data is provided so that you can import the case quickly into Bonaparte; only the pedigree remains to be created. If you import example files, please use the Prefix option (located on the import window tab pane) to make sure the names of the individuals you import are unique. The profile data in the Excel files were created by hand. Where relevant, the results are given so that you can compare your results with ours.
The following cases are demonstrated
In all computations below, the following parameters were used
Bonaparte uses the uniform mutation model. Basically this model specifies the probability of an allele mutation during transmission as μ, and thus the probability of no mutation as 1-μ:
$$
P_\mu(\alpha \mid \beta) =
\begin{cases}
1-\mu & \text{if } \alpha = \beta \\
\mu/(k-1) & \text{if } \alpha \neq \beta
\end{cases}
$$
where k is the total number of alleles.
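The definition can be made concrete with a minimal Python sketch. The function name `mutation_probability`, the allele labels and the values μ = 0.001 and k = 10 are illustrative assumptions, not Bonaparte settings.

```python
def mutation_probability(alpha: str, beta: str, mu: float, k: int) -> float:
    """P_mu(alpha | beta): probability that parental allele beta is passed on as alpha.

    mu is the total mutation probability; k is the total number of alleles (k >= 2).
    """
    if k < 2:
        raise ValueError("the uniform mutation model needs at least two alleles")
    if alpha == beta:
        return 1.0 - mu      # no mutation
    return mu / (k - 1)      # mutation, spread evenly over the other k-1 alleles


# Illustrative values only (assumptions, not Bonaparte defaults).
alleles = [str(a) for a in range(10, 20)]   # k = 10 hypothetical allele labels
mu = 0.001
row = [mutation_probability(a, "12", mu, len(alleles)) for a in alleles]
print(sum(row))   # each conditional distribution sums to 1, up to rounding
```

Because the mutation mass μ is divided equally over the k−1 other alleles, every conditional distribution P_μ(· | β) is properly normalised.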
The λ values are positive numbers that control the frequency of alleles. Bonaparte distinguishes three classes of alleles
Let k_c, k_r and k_n be the number of alleles in the common, rare and new allele classes, respectively. Let Δ(a) be a classifier for allele a, i.e. it determines the class (common, rare or new) of allele a. Bonaparte calculates the frequency of allele α as
$$
P(\alpha) = \frac{n_\alpha + \lambda_{\Delta(\alpha)}}{N + \Lambda}
$$
where n_α is the count of allele α in the population database, N is the total count of all alleles in the database, and Λ = k_c λ_c + k_r λ_r + k_n λ_n is a factor that ensures the new probabilities sum to 1.
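The definitions suggest a pseudo-count estimate, in which each allele's database count is increased by the λ of its class before normalising. The sketch below assumes the form (n_α + λ_Δ(α)) / (N + Λ), which is the form for which Λ as defined makes the frequencies sum to 1. The function name, the class assignments, the counts and the λ values are all hypothetical.

```python
def allele_frequencies(counts, classes, lambdas):
    """Pseudo-count allele frequencies (assumed form, see the text above).

    counts:  allele -> count n_alpha in the population database
    classes: allele -> class label, playing the role of the classifier Delta
    lambdas: class label -> lambda (> 0) for that class
    """
    n_total = sum(counts.values())                           # N
    # Lambda = k_c*lambda_c + k_r*lambda_r + k_n*lambda_n, summed allele by allele
    big_lambda = sum(lambdas[classes[a]] for a in counts)
    return {
        a: (counts[a] + lambdas[classes[a]]) / (n_total + big_lambda)
        for a in counts
    }


# Hypothetical database counts and class labels for one locus.
counts = {"12": 120, "13": 75, "14": 4, "15.2": 0}
classes = {"12": "common", "13": "common", "14": "rare", "15.2": "new"}
lambdas = {"common": 1.0, "rare": 0.5, "new": 0.1}

freqs = allele_frequencies(counts, classes, lambdas)
print(freqs)
```

Under this assumption an allele that was never observed (count 0) still receives a small non-zero frequency, controlled by the λ of the new-allele class.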
The paternity case is depicted in Figure 1. The father is the missing person (MP); both mother and child are typed. The profiles contain 'F' alleles, the so-called drop-out alleles (see for example [1]). There is also one unidentified individual (UI); the task is to compute the likelihood ratio
$$
LR = \frac{P(E \mid H_p)}{P(E \mid H_d)}
$$
where E denotes the DNA evidence, H_p is the hypothesis that the UI is related to the MP and H_d is the hypothesis that the two are unrelated [2], [3]. The data for this computation are in the Microsoft Excel file below, which can be imported directly into Bonaparte. The pedigree should be constructed using the pedigree editor. Once completed, the case can be matched.
Bonaparte returns the following results
| Locus | log₁₀(LR) | Mut |
|---|---|---|
| D3S1358 | 0.493 | 0 |
| VWA | 0.562 | 0 |
| FGA | 0.805 | 0 |
| D8S1179 | 0.689 | 0 |
| D21S11 | -0.058 | 0 |
| D18S51 | 0.559 | 0 |
| D5S818 | 0.376 | 0 |
| D13S317 | 0.000† | 0 |
| D7S820 | 0.165 | 0 |
| D16S539 | 0.607 | 0 |
| TH01 | -0.300 | 0 |
| TPOX | 0.128 | 0 |
| CSF1PO | 0.000† | 0 |
| D2S1338 | 0.724 | 0 |
| D19S433 | 0.915 | 0 |
| Total | 5.665 | 0 |
† For loci D13S317 and CSF1PO Bonaparte actually returns values of order 10⁻¹⁶. These values should be interpreted as 0.000
because they are close to the machine precision limit, which is 2⁻⁵³ ≈ 1.11×10⁻¹⁶
for double-precision arithmetic on IEEE 754 machines.
These results indicate that it is about 460,000 times more likely that the UI is the MP than that the UI is a random person from the same population.
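The total in the table is the sum of the per-locus values (the loci are treated as independent, so the per-locus LRs multiply), and the figure of roughly 460,000 follows from raising 10 to that total. A short Python check using the values from the table above:

```python
# Per-locus log10(LR) values copied from the paternity case table.
per_locus_log10_lr = {
    "D3S1358": 0.493, "VWA": 0.562, "FGA": 0.805, "D8S1179": 0.689,
    "D21S11": -0.058, "D18S51": 0.559, "D5S818": 0.376, "D13S317": 0.000,
    "D7S820": 0.165, "D16S539": 0.607, "TH01": -0.300, "TPOX": 0.128,
    "CSF1PO": 0.000, "D2S1338": 0.724, "D19S433": 0.915,
}

# Independent loci: LRs multiply, so their logarithms add.
total_log10_lr = sum(per_locus_log10_lr.values())
lr = 10 ** total_log10_lr

print(f"total log10(LR) = {total_log10_lr:.3f}")
print(f"LR = {lr:,.0f}")
```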
The six sibling case is somewhat different from the other cases described here. The goal
is not so much to calculate the likelihood ratio, but to determine the consistency or coherence
of the pedigree.
This case consists of two untyped individuals who presumably have six children
together (the pedigree is shown in Figure 2).
Of these six children, one is the missing person, four are true siblings and one is a half sibling, i.e.
the pedigree as depicted in Figure 2 is not the true pedigree. All siblings, excluding the
MP, are typed, so there are five reference profiles.
The idea is that from the genotypes of the father and mother, four unique child genotypes per locus can be created. The data file contains such profiles for siblings EX-s1 to EX-s4. The profile for sibling EX-s5 is a completely different (i.e. non-fitting) profile. The profile of the MP is unknown. After importing the data into a project, the pedigree editor can be started. When the pedigree is drawn, save it and select Pedigree → Validate Pedigree from the pedigree editor menu bar. Bonaparte quickly validates the pedigree (i.e. checks whether the pedigree is biologically sound and whether it contains at least one missing person) and counts the number of mutations. To do the mutation count, a profile of 'F/F' is substituted at all loci for all MPs in the pedigree. A popup box displays the number of mutations (in this case four; the profiles consist of only two loci).
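The "four unique child genotypes" can be illustrated with a small Python sketch that enumerates, for one locus, every genotype a child can inherit without mutation. The function name and the allele labels are hypothetical and are not taken from the example data file.

```python
from itertools import product


def child_genotypes(father, mother):
    """All genotypes a child can inherit at one locus without mutation.

    father, mother: 2-tuples of allele labels. A genotype is an unordered
    pair, represented here as a sorted tuple.
    """
    return {tuple(sorted(pair)) for pair in product(father, mother)}


# Hypothetical parents, both heterozygous and sharing no alleles.
possible = child_genotypes(("12", "14"), ("15", "17"))
print(sorted(possible))
# [('12', '15'), ('12', '17'), ('14', '15'), ('14', '17')]

# A sibling genotype outside this set cannot be explained without a mutation.
print(("13", "16") in possible)   # False
```

Four distinct child genotypes arise only when the parental genotypes differ enough; two parents with the same heterozygous genotype, for instance, yield only three.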
The incestuous case is important because it is the simplest case that creates loops in the pedigree. Such loopy pedigrees usually slow down calculations or break them (as happens, for example, with the Elston-Stewart algorithm [4]). This test case demonstrates that Bonaparte handles pedigrees with loops easily. Consider the case of two untyped individuals who have a daughter. The man also has a son with his daughter, so he is both the father and the maternal grandfather of the MP (i.e. the MP shares on average 75% of his DNA with his father). The situation is displayed in Figure 3.
| Locus | log₁₀(LR) | Mut |
|---|---|---|
| D3S1358 | -0.598 | 0 |
| VWA | -0.064 | 0 |
| FGA | -0.203 | 0 |
| D8S1179 | 0.039 | 0 |
| D21S11 | -0.605 | 0 |
| D18S51 | -2.885 | 1 |
| D5S818 | 0.178 | 0 |
| D13S317 | 5.418×10⁻⁵ | 0 |
| D7S820 | -0.185 | 0 |
| D16S539 | -2.699 | 1 |
| TH01 | -0.597 | 0 |
| TPOX | 0.252 | 0 |
| CSF1PO | 0.000† | 0 |
| D2S1338 | 0.222 | 0 |
| D19S433 | -0.441 | 0 |
| Total | -7.586 | 2 |
† Indicates a rounded down value, see the Paternity Case.
Another problem arises when the provided family relations are not true, but this cannot be seen from the reference profiles. Consider the pedigrees shown in Figures 4a and 4b. The case consists of three typed individuals that serve as references. Two of the siblings share the same parents, but the MP is not a full sibling but a half-sibling.
If the pedigree displayed in Figure 4a were used to find the MP, very low likelihood ratios would be expected, since about 50% of the MP's profile does not naturally fit the given pedigree and has to be explained by mutation. Bonaparte can handle this problem differently: it allows its users to specify the confidence in the family relations. An example is given in Figure 4c. Note that the numbers just above the missing person symbol read '80' and '100'. This means 80% confidence in the father-child relation (and 100% − 80% = 20% confidence that another non-typed individual from the same population is the father) and 100% confidence (certainty) in the mother-child relation.
Per-locus LRs are not available when using uncertain pedigrees, since the loci are no longer independent in this case, so we only report the total log likelihood ratios here. We calculated these LLRs (log likelihood ratios) for four cases: the wrong pedigree (Figure 4a), the wrong pedigree with 90% confidence (Figure 4c), the wrong pedigree with 80% confidence (also Figure 4c) and the actual pedigree (Figure 4b). Note that when pedigree 4a is used, only three mutations can be detected from the data. The results are in the table below.
| Pedigree | log₁₀(LR) | LR |
|---|---|---|
| Pedigree 4a (100%) | -4.776 | 1.676×10⁻⁵ |
| Pedigree 4c (90%) | 2.615 | 411.746 |
| Pedigree 4c (80%) | 2.916 | 823.493 |
| Pedigree 4b (actual pedigree) | 3.615 | 4.117×10³ |
The table clearly shows that the MP will not be found using pedigree 4a. Especially in cases that return many matches, results are usually filtered on LLR > 0, i.e. the first one is never seen. The LRs of both variants of pedigree 4c are, although not impressive, much better.
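The reported values are consistent with a simple mixture reading of the confidence setting: with confidence c in the father-child relation, the LR equals c times the LR under pedigree 4a plus (1 − c) times the LR under pedigree 4b. This is an observation about the numbers in the table, not a description of Bonaparte's internal computation, and the function name below is hypothetical.

```python
import math

LR_4A = 1.676e-5   # pedigree 4a: father-child relation taken as certain (from the table)
LR_4B = 4.117e3    # pedigree 4b: the actual pedigree (from the table)


def mixture_lr(confidence):
    """LR when the father-child relation holds with the given confidence (0..1)."""
    return confidence * LR_4A + (1.0 - confidence) * LR_4B


for c in (0.9, 0.8):
    lr = mixture_lr(c)
    print(f"{c:.0%}: LR = {lr:.1f}, log10(LR) = {math.log10(lr):.3f}")
```

With the rounded table inputs this reproduces the log₁₀(LR) values 2.615 and 2.916 to three decimals. Lowering the confidence from 90% to 80% doubles the weight of the alternative pedigree and therefore roughly doubles the LR.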
A complex problem is given by the pedigree depicted in Figure 5. In this case the MP is married to a woman with whom he shares a grandfather. Again there is an unidentified individual, and the task is to compute the likelihood ratio. This case demonstrates that Bonaparte is capable of handling pedigrees with a large number of untyped founders, large loops and several generations.
Bonaparte computed the results displayed in Table 4 for the 15 loci in about a second.
| Locus | log₁₀(LR) | Mut |
|---|---|---|
| D3S1358 | -0.439 | 1 |
| VWA | 0.279 | 0 |
| FGA | -0.521 | 1 |
| D8S1179 | -2.443 | 1 |
| D21S11 | 0.670 | 0 |
| D18S51 | -2.058 | 1 |
| D5S818 | 0.300 | 0 |
| D13S317 | -0.379 | 1 |
| D7S820 | -1.882 | 1 |
| D16S539 | 0.833 | 0 |
| TH01 | -0.159 | 1 |
| TPOX | -0.021 | 0 |
| CSF1PO | 0.000† | 0 |
| D2S1338 | -2.521 | 1 |
| D19S433 | -1.248 | 1 |
| Total | -9.588 | 10 |
† Indicates a rounded down value, see the Paternity Case.
As a final example we consider a set of disaster victim identification (DVI) cases. These cases consist of 25 to 1,000 UIs and an equal number of pedigrees. The pedigrees are of the basic father-mother-child format, where the child is the MP. All profiles consist of 10 loci.
The purpose of this example is to demonstrate Bonaparte's system performance, i.e. the time it takes to compute all possible matches in a case. The datasets consist of n UIs and n pedigrees. The profiles have been independently sampled from the Caucasian population statistics. The number of pairs, and therefore the number of matches, in each case is n². The number of individuals in each case is listed in the table below.
| Case | #UIs | #Pedigrees | #Matches |
|---|---|---|---|
| 1 | 25 | 25 | 625 |
| 2 | 50 | 50 | 2,500 |
| 3 | 75 | 75 | 5,625 |
| 4 | 100 | 100 | 10,000 |
| 5 | 150 | 150 | 22,500 |
| 6 | 200 | 200 | 40,000 |
| 7 | 300 | 300 | 90,000 |
| 8 | 400 | 400 | 160,000 |
| 9 | 500 | 500 | 250,000 |
| 10 | 1,000 | 1,000 | 1,000,000 |
The processing time T(n) can be approximated by
$$
T(n) \approx \alpha n^2 + \beta
$$
for n ≥ 25.
The α term is linear in the number of matches, n². The coefficient α expresses the
time it takes to calculate one match plus the time it takes to write it to the database. For the match type
used here (simple pedigrees) this is approximately 3.7 ms per match.
The constant β term is the system startup time, for example the time
the database needs to parse SQL statements, allocate memory, start processes, etc.
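As a rough illustration, the sketch below evaluates the approximation in Python, assuming it takes the form αn² + β, as the description of the two terms suggests. Only α ≈ 3.7 ms is given in the text; the value of β is not, so the default of 0 is a placeholder assumption, and the function name is hypothetical.

```python
ALPHA_S = 3.7e-3   # seconds per match for simple pedigrees (from the text)


def estimated_seconds(n, beta_s=0.0):
    """Approximate processing time for n UIs matched against n pedigrees.

    beta_s is the constant startup time. Its value is not given in the text,
    so the default of 0 is only a placeholder.
    """
    return ALPHA_S * n * n + beta_s


for n in (25, 100, 1000, 10000):
    seconds = estimated_seconds(n)
    print(f"n = {n:>6}: {n * n:>11,} matches, about {seconds:,.0f} s "
          f"({seconds / 86400:.2f} days)")
```

For n = 1,000 this gives about 3,700 s, roughly an hour, and for n = 10,000 a little over four days.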
N.B.: For this case we do not provide input data, since this would require you to construct all pedigrees by hand. It is not possible to import pedigree data via the user interface, only via the system XML interface.
Direct matching, i.e. the comparison of two profiles to see if they originate from the same individual,
is also an important aspect of a DVI system. This kind of matching is used to collapse profiles (group profiles
belonging to the same UI together), or to test for contamination (by matching against a list of
DNA profiles from people working in the DNA laboratory, for instance).
Bonaparte also calculates a likelihood ratio for direct matches. This way, population statistics are incorporated into the result. Let g_1 = {a, b} and g_2 = {c, d} be two genotypes for a certain locus that we want to compare. The direct match likelihood for this pair is given by
The conditional term is given by
where δ is the Kronecker delta and φ ≪ 1 is a penalty factor.
Direct matching of 1,000 UIs against the same 1,000 UIs generated 499,500 results. Bonaparte calculated these matches in 268 seconds (for 10-locus profiles), i.e. about 5.4×10⁻⁵ seconds per locus.
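Both numbers follow from simple arithmetic: matching a set of n profiles against itself gives n(n−1)/2 distinct pairs, and the time per locus is the total time divided by the number of pairs times the number of loci. A short Python check (the function name is illustrative):

```python
def unordered_pairs(n):
    """Number of distinct pairs among n profiles: n(n-1)/2."""
    return n * (n - 1) // 2


pairs = unordered_pairs(1000)
seconds_per_locus = 268 / (pairs * 10)   # 268 s in total, 10 loci per profile

print(pairs)                         # 499500
print(f"{seconds_per_locus:.1e} s per locus")
```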
The cases presented here demonstrate that Bonaparte can be used for a variety of identification tasks, from relatively simple paternity cases to identification cases spanning multiple generations and loopy pedigrees. A very important aspect of a modern DVI system is performance.
The computation time was investigated for four of the cases that contain single pedigrees. For each case the pedigree was matched against 100 UIs, yielding 100 matches. The time it took to obtain those 100 results was divided by 100 to obtain an estimate of the computation time for a single match. All of these tests used the single-threaded computation algorithm. Under normal operation, however, Bonaparte uses as many processor cores as are available and distributes the matching processes over these cores (which reduces the processing time by roughly a factor equal to the number of cores).
| Case | Time (s) |
|---|---|
| Paternity | ‡ |
| Incestuous | ‡ |
| False Pedigree | 0.15 |
| Complex Inbred Pedigree | 1.33 |
‡ Below measurable threshold; processing times ≪ 10⁻² s.
In section 7 the performance in DVI cases was investigated using the manual matching mode. This means that all results are always computed
and stored. In contrast, Bonaparte also has an automatic matching mode. In this mode, UI-MP pairs are first screened. If
too many mutations are detected, the computation is skipped. If the match is calculated but the outcome is too low (for example
a log likelihood ratio < 0), the match is not stored. This matching mode can speed up matching several-fold.
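The control flow of such a screen-then-compute scheme can be sketched in Python as follows. This is only an illustration of the idea described above, not Bonaparte's implementation: the function name, its parameters, the mutation threshold of 2 and the toy data are all hypothetical.

```python
from typing import Callable, Iterable, List, Tuple


def automatic_matching(
    pairs: Iterable[Tuple[str, str]],
    count_mutations: Callable[[str, str], int],
    log10_lr: Callable[[str, str], float],
    max_mutations: int = 2,      # hypothetical screening threshold
    min_llr: float = 0.0,        # results below this are not stored
) -> List[Tuple[str, str, float]]:
    """Screen UI-MP pairs, compute only promising ones, keep only good results."""
    stored = []
    for ui, mp in pairs:
        if count_mutations(ui, mp) > max_mutations:
            continue             # screening: skip the expensive computation
        llr = log10_lr(ui, mp)
        if llr < min_llr:
            continue             # computed, but too low to store
        stored.append((ui, mp, llr))
    return stored


# Toy data standing in for real screening and matching routines.
mutations = {("UI-1", "MP-1"): 0, ("UI-2", "MP-1"): 5, ("UI-3", "MP-1"): 1}
llrs = {("UI-1", "MP-1"): 5.7, ("UI-3", "MP-1"): -1.2}

result = automatic_matching(
    mutations.keys(),
    lambda ui, mp: mutations[(ui, mp)],
    lambda ui, mp: llrs[(ui, mp)],
)
print(result)   # [('UI-1', 'MP-1', 5.7)]
```

In the toy run, UI-2 is rejected by the screen before any likelihood is computed, and UI-3 is computed but not stored because its log likelihood ratio is negative.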
The DVI performance results clearly show that computation time is of the order n²,
the number of UI-MP pairs (as expected), and that the system does not suffer from performance issues in very large cases.
For DVI cases in the range of 1 to 10,000 UIs and the same number of pedigrees
(i.e. up to 100 million matches) a single server running Bonaparte will suffice. The computation time for cases of approximately
10,000 UIs and 10,000 MPs is on the order of days.
The Bonaparte system used in the tests is a dual Intel Xeon 5520 processor server with 16GB DDR3 system memory.
The server runs the 64-bit variant of FreeBSD UNIX as its operating
system and hosts both Bonaparte and the database management system.