Andrew Collins, Genetic epidemiology and Bioinformatics Research Group, University of Southampton.

LDMAP is a program for constructing linkage disequilibrium (LD) maps. LD maps are scaled in linkage disequilibrium units (LDUs) and show, when plotted against the physical map, a pattern of plateaus (reflecting regions of low haplotype diversity or 'LD blocks') and steps (which represent recombination hot-spots or recombination events). The LD map has a number of compelling advantages for disease mapping over both the sequence-based physical map and the often low-resolution genetic linkage map. Unlike on the physical map, SNP markers should be spaced evenly on the LD map, and one LD unit is equivalent to 1 swept radius (representing the extent of useful LD, see below). LD maps can be constructed automatically from both haplotype and genotype (diplotype) data.

It is possible that different populations show the same PATTERN of LD even if the overall LENGTH of the LD map differs (reflecting different population histories). This appears to be true because recombination and the time over which recombination events have taken place are the major determinants of the LDU pattern. Recent evidence suggests a close relationship between the pattern of LD and recombination. The LD map may be useful for increasing the resolution of the linkage map, which will permit more detailed studies of the sequence determinants of recombination pattern.
References.
LDMAP has been written to extend a body of work in this general area going back several years. Readers should refer to a number of papers for more detailed explanation of key terms. The main reference for LDMAP is Maniatis et al (2002) given in the following list which includes other relevant papers: references.
The software.
LDMAP is written in C for a SUN workstation; some modification may be required for use under Linux. This program is under active development and the version supplied here may not be the latest. Users should contact Andy Collins at arc@soton.ac.uk for updated information. The files supplied may be extracted using:
uncompress ldmap.tar.Z
tar xvf ldmap.tar
The executable program is controlled by a simple shell script called ldmap. If the command './ldmap' is given the script will run and give the following output: script. I will review, with examples, the process of constructing an intermediate file from both haplotype and diplotype data (options 1 and 2) and using this file for constructing LD maps (options 3 and 4).
Haplotypes and diplotypes with SNPs - construct intermediate file (OPTIONS 1 and 2).
DATA FILE
An example data file with haplotypes is given here: CF data and an example file with diplotypes is: diplotype data. I will describe the haplotype example in the most detail but the comments largely apply to both types of data.

All LDMAP haplotype/genotype data files have three sections delimited by dashed lines. The first section usually contains a reference, as here to the Kerem et al., Science paper. The second section defines the columns given in the data section. There are 23 marker loci (R1...R23) and a count N defining the number of haplotypes of that particular type. Each line of the third section then contains either the haplotype data for one individual or a haplotype together with the number of times it was found in the sample.

Column descriptions are given in brackets: each locus name is followed by a comma, the column position in the file for the allele(s) (e.g. 1-4) and a further comma followed by a location. Typically the location is given in kilobases (kb). Note that these are disease case and control haplotypes but we are ignoring phenotype for this demonstration of LD map construction. Alleles are coded 1, 2 and missing data is indicated by blank or zero. Note that column positions must be specified in the order they appear in the file, but map locations need not be in order.
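The column description and allele coding described above can be illustrated with a small Python sketch. This is not LDMAP code: the exact punctuation of a descriptor, written here as "(R1,1-4,0.0)", is an assumption based only on the prose description, as is the reading of the column range as 1-based and inclusive. The function names parse_descriptor and read_allele are hypothetical. The example CF data file remains the authority on the real layout.

```python
import re

# Assumed descriptor layout: "(name,start-end,location)".
# The real punctuation in LDMAP data files may differ.
DESCRIPTOR = re.compile(
    r"\(\s*(\w+)\s*,\s*(\d+)\s*-\s*(\d+)\s*,\s*([-+]?\d*\.?\d+)\s*\)"
)


def parse_descriptor(text):
    """Return (locus, start, end, location) from one bracketed descriptor."""
    match = DESCRIPTOR.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unrecognised descriptor: {text!r}")
    name, start, end, location = match.groups()
    return name, int(start), int(end), float(location)


def read_allele(line, start, end):
    """Read one allele from 1-based inclusive columns start..end.

    Alleles are coded 1 or 2; a blank or zero field is treated as missing
    and returned as None.
    """
    field = line[start - 1:end].strip()
    if field in ("", "0"):
        return None
    return int(field)


if __name__ == "__main__":
    locus, start, end, location = parse_descriptor("(R1,1-4,0.0)")
    data_line = "   2   1   0"  # made-up data line with three 4-column fields
    print(locus, location, read_allele(data_line, start, end))
```

Returning None for missing data keeps a missing allele distinct from a real allele code, so later frequency counts can skip it explicitly.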
JOB FILE
The job file defines the models to be tested by the program. A very simple example is given here which estimates parameters M and E in the Malecot model. Job files contain a PA 'control' followed by one or more IT controls. PA stands for parameter and defines the starting values of the model parameters. The IT control defines which parameters are to be iterated. The three Malecot parameters (M, E and L) describe the pattern of linkage disequilibrium in the region (see references). Multiple PA and IT controls may be placed in the job file. For constructing LD maps the third Malecot model parameter (L) is best left out of this file, so that the program uses the predicted L (computed internally) as defined by Morton et al (2001). CC always terminates a job file. The job file follows here: JOB FILE.
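For orientation, the Malecot model in the cited papers has the form rho = (1 - L) M exp(-E d) + L, where d is the distance between the two SNPs in a pair. The Python sketch below evaluates that expression and the swept radius 1/E. It is only an illustration: the function names and the parameter values are made up, and nothing here reproduces how LDMAP estimates the parameters.

```python
import math


def malecot_rho(d, M, E, L=0.0):
    """Expected association rho at distance d under the Malecot model."""
    return (1.0 - L) * M * math.exp(-E * d) + L


def swept_radius(E):
    """Distance 1/E over which the exponential term declines to 1/e."""
    if E <= 0:
        raise ValueError("E must be positive")
    return 1.0 / E


if __name__ == "__main__":
    # Hypothetical starting values of the kind a PA control would supply.
    M, E, L = 0.9, 2.0, 0.1
    for d in (0.0, swept_radius(E), 5.0):
        print(d, malecot_rho(d, M, E, L))
```

At d = 0 the expected association is (1 - L) M + L, and at large distances it approaches L, which is why L is described as a separate third parameter.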
INTERMEDIATE (output) FILE
The main output at this point is the 'intermediate' file: INTERMEDIATE FILE. These files always have the same basic format whether derived from haplotype or diplotype data: diplotype int file. The Malecot model may be fitted directly to intermediate file data (still under 'job' file control). The fields in the file are given below and include the names of both loci in a pair, their map locations, the association rho and its information (Krho), chi square, sample size n, allele frequencies Q and R and disequilibrium, D.
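The numbers in the example intermediate file are consistent with two simple relationships among these fields: rho = D / (Q (1 - R)) and chiSq = rho^2 x Krho. Both are inferred here from the example values, not taken from the LDMAP source, so treat them as an observation about these rows. The Python sketch below parses the first two rows of the example (copied verbatim) and checks both relationships; the names FIELDS and parse_row are illustrative.

```python
# First two data rows of the example intermediate file, copied verbatim.
ROWS = """\
R1 R2 0.000 0.009 0.9065040650 22.188 18.23 184 0.1304347826 0.5543478261 0.0526937618
R1 R3 0.000 0.024 0.8778778779 98.660 76.03 183 0.4043715847 0.5573770492 0.1571262206
"""

FIELDS = ("locus1", "locus2", "location1", "location2",
          "rho", "Krho", "chiSq", "n", "Q", "R", "D")


def parse_row(line):
    """Split one whitespace-delimited row into a dict keyed by field name."""
    parts = line.split()
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    record = dict(zip(FIELDS, parts))
    for key in FIELDS[2:]:          # everything after the two locus names
        record[key] = float(record[key])
    record["n"] = int(record["n"])  # sample size is a whole number
    return record


records = [parse_row(line) for line in ROWS.splitlines()]

if __name__ == "__main__":
    for r in records:
        rho_from_d = r["D"] / (r["Q"] * (1.0 - r["R"]))
        chisq_from_rho = r["rho"] ** 2 * r["Krho"]
        print(r["locus1"], r["locus2"], rho_from_d, chisq_from_rho)
```

The chi square comparison only agrees to about two decimal places because chiSq and Krho are printed with limited precision in the file.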
Column definitions:
locus1 locus2 location1 location2          rho    Krho chiSq   n            Q            R            D
R1     R2         0.000     0.009 0.9065040650  22.188 18.23 184 0.1304347826 0.5543478261 0.0526937618
R1     R3         0.000     0.024 0.8778778779  98.660 76.03 183 0.4043715847 0.5573770492 0.1571262206
R1     R4         0.000     0.524 0.1525498891  64.631  1.50 182 0.3021978022 0.5494505495 0.0207704384
R1     R5         0.000     0.534 0.1676829268  66.329  1.87 182 0.3076923077 0.5494505495 0.0232459848
R1     R6         0.000     0.554 0.3657575758 126.283 16.89 182 0.3626373626 0.4505494505 0.0728776718
R1     R7         0.000     0.569 0.3576470588 132.392 16.93 182 0.3736263736 0.4505494505 0.0734210844
R1     R8         0.000     0.594 0.5492063492  52.534 15.85 142 0.3169014085 0.5563380282 0.0772168221
R1     R9         0.000     0.614 0.2819512195 101.781  8.09 184 0.4076086957 0.5543478261 0.0512169187
R1     R10        0.000     0.619 0.3006018372 106.448  9.62 184 0.4184782609 0.5543478261 0.0560609641
R1     R11        0.000     0.654 0.2913992298 104.093  8.84 184 0.4130434783 0.5543478261 0.0536389414
R1     R12        0.000     0.684 0.2550779405  69.977  4.55 166 0.3493975904 0.5602409639 0.0391929162
R1     R13        0.000     0.709 0.3212022081  88.184  9.10 166 0.4036144578 0.5602409639 0.0570111772
R1     R14        0.000     0.744 0.2550779405  69.977  4.55 166 0.3493975904 0.5602409639 0.0391929162
R1     R15        0.000     0.779 0.1862567812  63.584  2.21 180 0.3111111111 0.5611111111 0.0254320988

CONSTRUCTING THE LD MAP FROM THE INTERMEDIATE FILE.
Once constructed from either haplotype or diplotype data, the intermediate file can be used to construct an LD map (OPTIONS 3 and 4). Option 3 allows very large maps to be constructed in sections, while option 4 assembles the map in one piece. The same (or a different) job file may be used. The ("analysis") output file containing the final LD map is given here: LDMAP analysis output file. There is also a shorter output ("map") file: LDMAP map output file. As an example, the output from the diplotype analysis is given here: LDMAP (diplotype) analysis output file and LDMAP (diplotype) map output file.

The method for constructing the map involves estimating the epsilon parameter (E) in the Malecot equation for each interval (between adjacent SNPs) in the map. Any larger interval that includes the one being estimated contains some information about epsilon, unless the markers in the pair are so far apart that they carry no useful information about LD. In running option 4 there are yes/no queries asking whether any of the mapping defaults should be changed. For most purposes the response 'no' to all queries will give a satisfactory result.

The output file first gives the result of fitting the physical (kb) map to the intermediate file data; the final -2lnL is 1298.17. During the construction of the LD map the Malecot parameter M is updated at 25, 50, 100, 200, etc. iterations, which improves the approach to convergence. Finally the fit of the pairwise association data to the LD map is obtained and the -2lnL is now 289.40, a substantial improvement reflecting how well the LD map describes the pattern of LD in the region. Note that epsilon (E) is ~1 in LD maps. The LD map shown here has a length of 6.51 LDUs in a region of 1769 kb. Full details of the method are described in Maniatis et al (2002).
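As a rough illustration of how per-interval estimates turn into a map, each interval between adjacent SNPs contributes E x d LDUs (its epsilon times its physical length), and the LDU location of each SNP is the running sum. The Python sketch below uses made-up locations and epsilon values, not LDMAP output, and the function name ldu_positions is hypothetical. It does not reproduce the estimation procedure of Maniatis et al (2002), only the accumulation step.

```python
from itertools import accumulate


def ldu_positions(locations, epsilons):
    """Cumulative LDU location of each SNP.

    locations: physical positions of the SNPs in map order.
    epsilons:  one epsilon per interval between adjacent SNPs.
    """
    if len(epsilons) != len(locations) - 1:
        raise ValueError("need exactly one epsilon per adjacent interval")
    interval_ldu = [
        e * (right - left)
        for e, left, right in zip(epsilons, locations, locations[1:])
    ]
    return [0.0] + list(accumulate(interval_ldu))


if __name__ == "__main__":
    # Hypothetical values: four SNPs (kb) and three interval epsilons.
    locations = [0.0, 10.0, 30.0, 80.0]
    epsilons = [0.0, 0.05, 0.002]
    print(ldu_positions(locations, epsilons))

    # Average physical length of one LDU for the map reported in the text:
    # 6.51 LDUs over 1769 kb.
    print(round(1769 / 6.51, 1), "kb per LDU on average")
```

In this toy input the first interval has epsilon 0 and adds nothing to the map (a plateau), while the second interval adds a full LDU over 20 kb (a step).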