Some Cases to Demonstrate the Capabilities of the Bonaparte DVI System

W. Burgers, W. Wiegerinck, version 1.0, June 2010

1. Introduction

The examples on this page have been designed to demonstrate some of Bonaparte's capabilities. For most cases, an Excel file containing profile data is provided so that you can import the case quickly into Bonaparte; only the pedigree remains to be created. If you import example files, please use the Prefix option (located on the import window tab pane) to make sure the names of the individuals you import are unique. The profile data in the Excel files was created by hand. Where relevant, the results are given so you can compare them with your own.

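The following cases are demonstrated:

  • Paternity Case (section 2)
  • Six Siblings Case (section 3)
  • Incestuous Case (section 4)
  • False Pedigree (section 5)
  • Complex Inbred Pedigree (section 6)
  • DVI Cases (section 7)
  • Direct Matching (section 8)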

1.1 Parameters

In all computations below, the following parameters were used:

  • Mutation rate μ = 5.0×10⁻³
  • Mutation model = Uniform
  • λ_c = 1
  • λ_n = 1
  • λ_r = 1
  • Population statistics = Caucasian

1.1.1 Mutation Models

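Bonaparte uses the uniform mutation model. This model sets the probability that an allele mutates during transmission to μ, and thus the probability of no mutation to 1-μ. A mutated allele is equally likely to become any of the other alleles:

$$
P_\mu(\alpha \mid \beta) =
\begin{cases}
1 - \mu & \text{if } \alpha = \beta \\
\mu / (k - 1) & \text{if } \alpha \neq \beta
\end{cases}
$$

where k is the total number of alleles.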
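
The uniform model is small enough to write out directly. The following is a minimal sketch, not Bonaparte's implementation; the function name and the example locus with four alleles are assumptions made for illustration.

```python
def uniform_mutation_prob(alpha, beta, mu, k):
    """P_mu(alpha | beta) under the uniform mutation model.

    alpha: transmitted allele, beta: parental allele,
    mu: mutation rate, k: total number of alleles at the locus.
    """
    if k < 2:
        raise ValueError("the uniform model needs at least two alleles")
    if alpha == beta:
        return 1.0 - mu          # no mutation
    return mu / (k - 1)          # mutation to one of the k - 1 other alleles


# Hypothetical locus with k = 4 alleles; mu is the value from section 1.1.
alleles = [12, 13, 14, 15]
mu = 5.0e-3
row = [uniform_mutation_prob(a, 13, mu, len(alleles)) for a in alleles]
total = sum(row)  # probabilities over all transmitted alleles add up to 1
```

Dividing μ by k-1 is what makes each row of the transition table sum to 1, which the last line checks for parental allele 13.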

1.1.2 Allele Frequencies

The λ's are positive numbers that control the frequencies of alleles. Bonaparte distinguishes three classes of alleles:

  • Common alleles: alleles present in the population statistics with a count ≠ 0
  • Rare alleles: alleles present in the population statistics with a count = 0
  • New alleles: alleles not present in the population statistics at all

Let k_c, k_r and k_n be the numbers of alleles in the common, rare and new classes respectively. Let Δ(α) be a classifier for allele α, i.e. it determines the class (common, rare or new) of allele α. Bonaparte calculates the frequency of allele α as

$$
p_\alpha = \frac{n_\alpha + \lambda_{\Delta(\alpha)}}{N + \Lambda}
$$

where n_α is the count of allele α in the population database (0 for rare and new alleles), N is the total count of all alleles in the database, and Λ = k_c λ_c + k_r λ_r + k_n λ_n is a normalisation term that ensures the adjusted frequencies sum to 1.
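
As an illustration, the sketch below computes adjusted frequencies for a hypothetical locus. It assumes the frequency is p_α = (n_α + λ_Δ(α)) / (N + Λ), which is the reading under which Λ = k_c λ_c + k_r λ_r + k_n λ_n makes the frequencies sum to 1. The function name, the allele labels and the counts are invented for the example.

```python
def allele_frequencies(counts, new_alleles, lam_c=1.0, lam_r=1.0, lam_n=1.0):
    """Adjusted allele frequencies p = (n + lambda_class) / (N + Lambda).

    counts: {allele: count} from the population statistics (count may be 0).
    new_alleles: alleles seen in a case but absent from the statistics.
    """
    lam = {}
    for allele, n in counts.items():
        lam[allele] = lam_c if n > 0 else lam_r    # common vs. rare
    for allele in new_alleles:
        if allele in counts:
            raise ValueError(f"{allele} is already in the population statistics")
        lam[allele] = lam_n                        # new
    total_count = sum(counts.values())             # N
    big_lambda = sum(lam.values())                 # k_c*lam_c + k_r*lam_r + k_n*lam_n
    return {a: (counts.get(a, 0) + lam[a]) / (total_count + big_lambda) for a in lam}


# Hypothetical data: three common alleles, one rare allele, one new allele.
counts = {"12": 30, "13": 50, "14": 20, "15": 0}
freqs = allele_frequencies(counts, new_alleles=["16"])
```

With all λ's equal to 1, as in section 1.1, every allele receives one pseudo-count, so rare and new alleles get a small but non-zero frequency (here 1/105 each).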

2. Paternity Case

2.1 Case Description and Data

The paternity case is depicted in Figure 1. The father is the missing person (MP); both mother and child are typed. The profiles contain 'F' alleles, the so-called drop-out alleles (see for example [1]). There is also one unidentified individual (UI). The task is to compute the likelihood ratio

$$
LR = \frac{P(E \mid H_p)}{P(E \mid H_d)}
$$

where E denotes the evidence (the observed DNA profiles), H_p is the hypothesis that the UI is related to the MP and H_d is the hypothesis that both are unrelated [2], [3]. The data for this computation is in the Microsoft Excel file below, which can be imported directly into Bonaparte. The pedigree should be constructed using the pedigree editor. Once the pedigree is complete, the case can be matched.

Figure 1: Pedigree for the Paternity case.


Profile data

2.2 Results

Bonaparte returns the following results

Locus       log10(LR)   Mut
D3S1358      0.493      0
VWA          0.562      0
FGA          0.805      0
D8S1179      0.689      0
D21S11      -0.058      0
D18S51       0.559      1
D5S818       0.376      0
D13S317      0.000†     0
D7S820       0.165      0
D16S539      0.607      0
TH01        -0.300      1
TPOX         0.128      0
CSF1PO       0.000†     0
D2S1338      0.724      0
D19S433      0.915      0
Total        5.665      1
Table 1: Results for the Paternity Case.

† For loci D13S317 and CSF1PO Bonaparte actually returns values of order 10⁻¹⁶. These values should be interpreted as 0.000 because they are close to the limit of machine precision, which is 2⁻⁵³ ≈ 1.11×10⁻¹⁶ for double-precision arithmetic on IEEE 754 machines.
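
The precision limit mentioned in the footnote can be checked in a few lines. This is only a sketch: the helper `clean_log_lr` and its tolerance of 1e-12 are assumptions for illustration, not part of Bonaparte.

```python
import sys

unit_roundoff = 2.0 ** -53             # about 1.11e-16
machine_eps = sys.float_info.epsilon   # 2**-52 for IEEE 754 doubles

# Adding the unit roundoff to 1.0 is lost in rounding.
absorbed = (1.0 + unit_roundoff == 1.0)


def clean_log_lr(value, tol=1e-12):
    """Report values within tol of zero as exactly 0.0 (tol is an assumed cut-off)."""
    return 0.0 if abs(value) < tol else value
```

A value such as 3×10⁻¹⁶ is indistinguishable from rounding noise at this scale, so reporting it as 0.000 loses no information.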

These results indicate that the observed profiles are about 460,000 times more likely if the UI is the MP than if the UI is a random person from the same population (10^5.665 ≈ 4.6×10⁵). The system also detected two loci where the profiles did not match. These are explained by introducing two mutations, one at D18S51 and one at TH01.
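
The step from the per-locus values to the overall likelihood ratio can be reproduced as follows. The sketch assumes the per-locus log10(LR) values of Table 1, read as three-decimal numbers, and treats the loci as independent, so that the logarithms add up.

```python
# Per-locus log10(LR) values as read from Table 1.
log10_lr = {
    "D3S1358": 0.493, "VWA": 0.562, "FGA": 0.805, "D8S1179": 0.689,
    "D21S11": -0.058, "D18S51": 0.559, "D5S818": 0.376, "D13S317": 0.000,
    "D7S820": 0.165, "D16S539": 0.607, "TH01": -0.300, "TPOX": 0.128,
    "CSF1PO": 0.000, "D2S1338": 0.724, "D19S433": 0.915,
}

total_log10 = sum(log10_lr.values())   # sum of logs = log of the product of LRs
total_lr = 10.0 ** total_log10         # back to a plain likelihood ratio

print(f"total log10(LR) = {total_log10:.3f}, LR = {total_lr:,.0f}")
```

The sum reproduces the total of 5.665 in the table, which corresponds to a likelihood ratio of roughly 4.6×10⁵.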

3. Six Siblings Case

3.1 Case Description and Data

The six siblings case is somewhat different from the other cases described here. The goal is not so much to calculate the likelihood ratio as to determine the consistency or coherence of the pedigree. This case consists of two untyped individuals who presumably have six children together (the pedigree is shown in Figure 2). Of these six children, one is the missing person, four are true siblings and one is a half sibling, i.e. the pedigree as depicted in Figure 2 is not the true pedigree. All siblings, excluding the MP, are typed, so there are five reference profiles.

The idea is that, from the genotypes of the father and mother, four unique child genotypes per locus can be created. The data file contains such profiles for siblings EX-s1 to EX-s4. The profile for sibling EX-s5 is a completely different (i.e. non-fitting) profile. The profile of the MP is unknown. After importing the data into a project, the pedigree editor can be started. When the pedigree is drawn, save it and select Pedigree → Validate Pedigree from the pedigree editor menu bar. Bonaparte quickly validates the pedigree (i.e. checks whether the pedigree is biologically sound and whether it contains at least one missing person) and counts the number of mutations. To do the mutation count, a profile of 'F/F' is substituted at all loci for all MPs in the pedigree. A popup box displays the number of mutations (in this case four; the profiles consist of only two loci).
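
The construction of the four sibling genotypes per locus can be sketched as below. The parental genotypes are hypothetical (the parents in this case are untyped), mutation is ignored, and the function names are made up for the example.

```python
from itertools import product


def child_genotypes(father, mother):
    """All genotypes a child can inherit without mutation: one allele per parent."""
    return {tuple(sorted((f, m))) for f, m in product(father, mother)}


def fits(child, father, mother):
    """True if the child's genotype can be explained without a mutation."""
    return tuple(sorted(child)) in child_genotypes(father, mother)


# Hypothetical parental genotypes at one locus.
father = (14, 15)
mother = (16, 17)
possible = child_genotypes(father, mother)
```

Four distinct genotypes arise only when both parents are heterozygous and share no alleles, as in this example; a homozygous parent reduces the number of distinct child genotypes. A genotype outside this set, like the profile of EX-s5 in this case, requires at least one mutation to fit the pedigree.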

Figure 2: Pedigree for the Six Siblings case.

Profile data

4. Incestuous Case

4.1 Case Description and Data

The incestuous case is important because it is the simplest case that creates loops in the pedigree. Such loopy pedigrees usually slow down calculations or break them (as with the Elston-Stewart algorithm [4], for example). This test case demonstrates that Bonaparte handles pedigrees with loops easily. Consider the case of two untyped individuals who have a daughter. The man also has a son with his daughter, so he is both the father and the maternal grandfather of the MP (i.e. the MP shares on average 75% of his DNA with his father). The situation is displayed in Figure 3.

Figure 3: Pedigree for the Incestuous case.

Profile data

4.2 Results

Locus       log10(LR)     Mut
D3S1358     -0.598        1
VWA         -0.064        0
FGA         -0.203        0
D8S1179      0.039        0
D21S11      -0.605        0
D18S51      -2.885        1
D5S818       0.178        0
D13S317      5.418×10⁻⁵   0
D7S820      -0.185        0
D16S539     -2.699        1
TH01        -0.597        1
TPOX         0.252        0
CSF1PO       0.000†       0
D2S1338      0.222        0
D19S433     -0.441        0
Total       -7.586        4
Table 2: Results for the Incestuous Case.

† Indicates a rounded down value, see the Paternity Case.

5. False Pedigree

5.1 Case Description and Data

Another problem arises when the provided family relations are not true but this cannot be seen from the reference profiles. Consider the pedigrees shown in Figures 4a and 4b. The case consists of three typed individuals that serve as references. Two of the siblings share the same parents, but the MP is a half-sibling rather than a full sibling.

Figure 4a/b: Pedigrees for the False Pedigree case. a) Left: the family relations as provided; b) right: the actual relations (the true pedigree).

If the pedigree displayed in Figure 4a were used to find the MP, very low likelihood ratios would be expected, since about 50% of the MP's profile does not naturally fit the given pedigree and has to be explained by mutation. Bonaparte can handle this problem differently: it allows its users to specify the confidence in the family relations. An example is given in Figure 4c. Note that the numbers just above the missing person symbol read '80' and '100'. This means 80% confidence in the father-child relation (and 100% - 80% = 20% confidence that another, non-typed individual from the same population is the father) and 100% confidence (certainty) in the mother-child relation.

Figure 4c: Pedigree with adjusted confidence in family relations.


Profile data

5.2 Results

Per-locus LRs are not available when using uncertain pedigrees, since the loci are no longer independent in this case, so we only report the total log likelihood ratios here. We calculated these LLRs (log likelihood ratios) for four cases: the wrong pedigree (Figure 4a), the wrong pedigree with 90% confidence (Figure 4c), the wrong pedigree with 80% confidence (also Figure 4c) and the actual pedigree (Figure 4b). Note that when using pedigree 4a and looking at the data, only three mutations can be detected. The results are in the table below.

Pedigree                         log10(LR)   LR
Pedigree 4a (100%)               -4.776      1.676×10⁻⁵
Pedigree 4c (90%)                 2.615      411.746
Pedigree 4c (80%)                 2.916      823.493
Pedigree 4b (actual pedigree)     3.615      4.117×10³
Table 3: Likelihood ratios for the False Pedigree case.

The table clearly shows that the MP will not be found using pedigree 4a. Especially in cases that return many matches, results are usually filtered on LLR > 0, i.e. the first result is never seen. The LRs of both variants of pedigree 4c are much better, although not impressive.
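
The effect of the LLR > 0 filter on these four results can be sketched in a few lines. The values are the rounded log10(LR) figures from Table 3, so the recomputed LRs differ slightly from the LR column, which was presumably derived from unrounded values.

```python
# (pedigree, log10(LR)) pairs as read from Table 3.
results = [
    ("Pedigree 4a (100%)", -4.776),
    ("Pedigree 4c (90%)", 2.615),
    ("Pedigree 4c (80%)", 2.916),
    ("Pedigree 4b (actual pedigree)", 3.615),
]


def to_lr(log10_lr):
    """Convert a log10 likelihood ratio back to a plain likelihood ratio."""
    return 10.0 ** log10_lr


# Keep only results with a positive log likelihood ratio.
kept = [(name, llr, to_lr(llr)) for name, llr in results if llr > 0]
for name, llr, lr in kept:
    print(f"{name}: log10(LR) = {llr:.3f}, LR = {lr:.4g}")
```

Pedigree 4a drops out of the list, while both variants of pedigree 4c survive the filter.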

6. Complex Inbred Pedigree

6.1 Case Description

A complex problem is given by the pedigree depicted in Figure 5. In this case the MP is married to a woman with whom he shares a grandfather. Again there is an unidentified individual, and the task is to compute the likelihood ratio. This case demonstrates that Bonaparte can handle pedigrees with a large number of untyped founders, large loops and several generations.

Figure 5: Pedigree for the Complex Inbred Pedigree case.

Profile data

6.2 Results

Bonaparte computed the results displayed in Table 4 for the 15 loci in about a second.

Locus       log10(LR)   Mut
D3S1358     -0.439      1
VWA          0.279      0
FGA         -0.521      1
D8S1179     -2.443      1
D21S11       0.670      0
D18S51      -2.058      1
D5S818       0.300      0
D13S317     -0.379      1
D7S820      -1.882      1
D16S539      0.833      0
TH01        -0.159      1
TPOX        -0.021      0
CSF1PO       0.000†     0
D2S1338     -2.521      1
D19S433     -1.248      1
Total       -9.588      10
Table 4: Results for the Complex Inbred Pedigree Case.

† Indicates a rounded down value, see the Paternity Case.

7. DVI Cases

As a final example we consider a set of disaster victim identification (DVI) cases. These cases consist of 25 to 1,000 UIs and an equal number of pedigrees. The pedigrees are of the basic father-mother-child format, where the child is the MP. All profiles consist of 10 loci.

The purpose of this example is to demonstrate Bonaparte's system performance, i.e. the time it takes to compute all possible matches in a case. The datasets consist of n UIs and n pedigrees. The profiles have been independently sampled from the Caucasian population statistics. The number of pairs, and therefore the number of matches, in each case is n². The number of individuals in each case is listed in the table below.

Case   #UIs    #Pedigrees   #Matches
1      25      25           625
2      50      50           2,500
3      75      75           5,625
4      100     100          10,000
5      150     150          22,500
6      200     200          40,000
7      300     300          90,000
8      400     400          160,000
9      500     500          250,000
10     1,000   1,000        1,000,000
Table 5: Contents of the different cases and the number of matches.

Figure 6: DVI case processing times. Horizontal axis: the number of UIs (equal to the number of pedigrees) n; the number of resulting matches is n². Vertical axis: the processing time in seconds for the whole case.

The processing time can be approximated by

$$
t(n) = \alpha n^2 + \beta
$$

for n ≥ 25. The α term is linear in the number of matches n². The coefficient α expresses the time it takes to calculate one match plus the time it takes to write it to the database. For the match type used here (simple pedigrees) this is approximately 3.7 ms. The constant term β is the system startup time, for example the time the database needs to parse SQL statements, allocate memory, start processes, etc.
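
To get a feeling for the scaling, the model t(n) = αn² + β can be evaluated for the case sizes of Table 5. This is a rough sketch: α = 3.7 ms is the value quoted above, while β is not given in the text and is set to 0 here as a placeholder, so the output is an estimate and not a measured time.

```python
ALPHA = 3.7e-3   # seconds per match (value quoted in the text)


def n_matches(n):
    """Number of UI-pedigree pairs for n UIs and n pedigrees."""
    return n * n


def processing_time(n, alpha=ALPHA, beta=0.0):
    """t(n) = alpha * n**2 + beta; beta = 0.0 is a placeholder, not a measurement."""
    return alpha * n_matches(n) + beta


case_sizes = [25, 50, 75, 100, 150, 200, 300, 400, 500, 1000]
for n in case_sizes:
    print(f"n = {n:>5}  matches = {n_matches(n):>9,}  t ~ {processing_time(n):8.1f} s")
```

Under these assumptions the largest case (n = 1,000, one million matches) comes out at about 3,700 seconds, i.e. roughly an hour.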

N.B.: For this case we do not provide input data, since this would require you to construct all pedigrees by hand. It is not possible to import pedigree data via the user interface, only via the system XML interface.

8. Direct Matching

Direct matching, i.e. the comparison of two profiles to see if they originate from the same individual, is also an important aspect of a DVI system. This kind of matching is used to collapse profiles (group profiles belonging to the same UI together) or to test for contamination (for instance by matching against a list of DNA profiles of people working in the DNA laboratory).

8.1 Direct Match LR

Bonaparte also calculates a likelihood ratio for direct matches. This way population statistics information is incorporated into the result. Let g_1 = {a, b} and g_2 = {c, d} be two genotypes at a certain locus that we want to compare. The direct match likelihood ratio is given by

$$
LR = \frac{P(g_1 \mid g_2)\, P(g_2)}{P(a)\, P(b)\, P(c)\, P(d)}.
$$

The conditional term is given by

$$
P(g_1 \mid g_2) = P(g_1)\,\phi + (1 - \phi)\,\delta_{g_1, g_2},
$$

where δ is the Kronecker delta and φ ≪ 1 is a penalty factor.
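
The two formulas can be combined into a short single-locus sketch. Several things here are assumptions and not taken from the text: genotype probabilities P(g) are computed under Hardy-Weinberg proportions, the penalty factor is set to φ = 10⁻³, and the allele frequencies are invented.

```python
def genotype_prob(g, p):
    """P(g) under Hardy-Weinberg proportions (an assumption for this sketch)."""
    a, b = g
    return p[a] ** 2 if a == b else 2.0 * p[a] * p[b]


def direct_match_lr(g1, g2, p, phi=1e-3):
    """Single-locus direct match LR; phi = 1e-3 is a hypothetical penalty factor."""
    delta = 1.0 if sorted(g1) == sorted(g2) else 0.0       # Kronecker delta
    cond = genotype_prob(g1, p) * phi + (1.0 - phi) * delta  # P(g1 | g2)
    a, b = g1
    c, d = g2
    return cond * genotype_prob(g2, p) / (p[a] * p[b] * p[c] * p[d])


# Hypothetical allele frequencies at one locus.
p = {"12": 0.2, "13": 0.3, "14": 0.5}
lr_match = direct_match_lr(("12", "13"), ("13", "12"), p)
lr_mismatch = direct_match_lr(("12", "13"), ("14", "14"), p)
```

Identical genotypes give an LR well above 1, and rarer genotypes give a larger LR. Differing genotypes are penalised by the factor φ but not ruled out entirely.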

Matching 1,000 UIs directly against the same 1,000 UIs generated 499,500 results (one per unordered pair of distinct profiles). Bonaparte calculated these matches in 268 seconds (for profiles of 10 loci), so it takes about 5.4×10⁻⁵ seconds per locus.
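
The numbers in this paragraph can be reproduced with a few lines of arithmetic, using only the figures quoted above:

```python
from math import comb

n_profiles = 1000
pairs = comb(n_profiles, 2)      # unordered pairs of distinct profiles

seconds_total = 268.0            # total time reported in the text
n_loci = 10

per_match = seconds_total / pairs
per_locus = per_match / n_loci

print(f"{pairs:,} pairs, {per_match:.2e} s per match, {per_locus:.2e} s per locus")
```

The pair count is n(n-1)/2 because each profile is compared once with every other profile and never with itself.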

9. Discussion

The cases presented here demonstrate that Bonaparte can be used for a variety of identification tasks, from relatively simple paternity cases to identification cases spanning multiple generations and loopy pedigrees. Performance is a very important aspect of a modern DVI system.

9.1 Performance

The computation time was investigated for the four cases that contain a single pedigree. For each case the pedigree was matched against 100 UIs, which yielded 100 matches. The time it took to obtain those 100 results was divided by 100 to obtain an estimate of the computation time for a single match. All of these tests used the single-threaded computation algorithm. Under normal operation, however, Bonaparte uses as many processor cores as are available and distributes the matching processes over these cores (which reduces the processing time by a factor of up to the number of cores).

Case                       Time (s)
Paternity                  ‡
Incestuous                 ‡
False Pedigree             0.15
Complex Inbred Pedigree    1.33
Table 6: Processing times for four of the cases.

‡ Below measurable threshold; processing times ≪ 10⁻² s.

In section 7 the performance in DVI cases was investigated using the manual matching mode, in which all results are always computed and stored. In contrast, Bonaparte also has an automatic matching mode. In this mode, UI-MP pairs are first screened. If too many mutations are detected, the computation is skipped. If the match is calculated but the outcome is too low (for example a log likelihood ratio < 0), the match is not stored. This matching mode can speed up processing several-fold.
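
The control flow of the automatic matching mode can be sketched as follows. This only illustrates the screening logic described above and is not Bonaparte's code: the function names, the mutation threshold of 2 and the toy data are all hypothetical.

```python
def automatic_match(pairs, count_mutations, compute_log10_lr,
                    max_mutations=2, min_log10_lr=0.0):
    """Screen UI-MP pairs, compute the survivors, store only useful results.

    count_mutations and compute_log10_lr are caller-supplied stand-ins for the
    cheap screening step and the full likelihood ratio computation.
    """
    stored = []
    for ui, mp in pairs:
        if count_mutations(ui, mp) > max_mutations:
            continue                      # screened out: LR is never computed
        llr = compute_log10_lr(ui, mp)
        if llr < min_log10_lr:
            continue                      # computed, but too low to store
        stored.append((ui, mp, llr))
    return stored


# Toy stand-ins for the screening and matching steps.
toy_mutations = {("UI-1", "MP-1"): 0, ("UI-2", "MP-1"): 5, ("UI-3", "MP-1"): 1}
toy_llr = {("UI-1", "MP-1"): 5.7, ("UI-3", "MP-1"): -1.2}

stored = automatic_match(
    list(toy_mutations),
    lambda ui, mp: toy_mutations[(ui, mp)],
    lambda ui, mp: toy_llr[(ui, mp)],
)
```

The saving comes from two places: the expensive computation is skipped for pairs that fail the screen (UI-2 here), and the database write is skipped for low-scoring pairs (UI-3).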

The DVI performance results clearly show that the computation time is of order n², the number of UI-MP pairs (as expected), and that the system does not suffer from performance issues in very large cases. For DVI cases with 1 to 10,000 UIs and the same number of pedigrees (i.e. up to 100 million matches) a single server running Bonaparte will suffice. The computation time for cases of approximately 10,000 UIs and 10,000 MPs is on the order of days.

The Bonaparte system used in the tests runs on a server with two Intel Xeon 5520 processors and 16 GB of DDR3 system memory. The server runs the 64-bit variant of FreeBSD UNIX as its operating system and hosts both Bonaparte and the database management system.

References

  1. Butler, J.M., Forensic DNA Typing, 2nd Edition, Elsevier Academic Press, 2005
  2. Slooten, K., Validation of DVI Software 'Bonaparte' by Computation of Pedigree Likelihood Ratios, to be published
  3. Balding, D.J., Weight-of-evidence for Forensic DNA Profiles, Wiley, 2005
  4. Elston, R.C., Stewart, J., A General Model for the Genetic Analysis of Pedigree Data, Hum Hered 1971;21:523-542