Two-locus frequency spectrum

See Selection at two loci for introduction and examples to the Two-Locus extension.

API

The TLSpectrum class handles all manipulations of the two-locus frequency spectrum:

class moments.TwoLocus.TLSpectrum(data, mask=False, mask_infeasible=True, mask_fixed=False, data_folded=None, check_folding=True, dtype=<class 'float'>, copy=True, fill_value=nan, keep_mask=True, shrink=True)[source]

Represents a two locus frequency spectrum.

Parameters
  • data (array) – The frequency spectrum data, which has shape (n+1)-by-(n+1)-by-(n+1) where n is the sample size.

  • mask (array) – An optional array of the same size as data. ‘True’ entries in this array are masked in the TLSpectrum.

  • mask_infeasible (bool) – If True, mask all bins for frequencies that cannot occur, e.g. i + j > n. Defaults to True.

  • mask_fixed (bool) – If True, mask the fixed bins. Defaults to True.

  • data_folded (bool) – If True, it is assumed that the input data is folded for the major and minor derived alleles

  • check_folding (bool) – If True and data_folded=True, the data and mask will be checked to ensure they are consistent.

D(proj=True, nA=None, nB=None)[source]

Return the expectation of \(D\) from the spectrum.

Parameters
  • proj – If True, use the unbiased estimator from downsampling. If False, use naive maximum likelihood estimates for frequency.

  • nA – If None, the average is computed over all frequencies. If given, condition on the given allele count for the left locus.

  • nB – If None, the average is computed over all frequencies. If given, condition on the given allele count for the right locus.

D2(proj=True, nA=None, nB=None)[source]

Return the expectation of \(D^2\) from the spectrum.

Parameters
  • proj – If True, use the unbiased estimator from downsampling. If False, use naive maximum likelihood estimates for frequency.

  • nA – If None, the average is computed over all frequencies. If given, condition on the given allele count for the left locus.

  • nB – If None, the average is computed over all frequencies. If given, condition on the given allele count for the right locus.

Dz(proj=True, nA=None, nB=None)[source]

Compute the expectation of \(Dz = D(1-2p)(1-2q)\) from the spectrum.

Parameters
  • proj – If True, use the unbiased estimator from downsampling. If False, use naive maximum likelihood estimates for frequency.

  • nA – If None, the average is computed over all frequencies. If given, condition on the given allele count for the left locus.

  • nB – If None, the average is computed over all frequencies. If given, condition on the given allele count for the right locus.

S(nA=None, nB=None)[source]

Return the sum of probabilities over all variable two-locus entries in the spectrum.

ancestral_misid(p)[source]

Return a new SFS with a given ancestral misidentification, p.

Parameters

p – The rate of ancestral state misidentification.

fold()[source]

Fold the two-locus spectrum by minor allele frequencies.

static from_file(fid, mask_infeasible=True, return_comments=False)[source]

Read frequency spectrum from file.

Parameters
  • fid (str) – String with file name to read from or an open file object.

  • mask_infeasible (bool) – If True, mask the infeasible entries in the two locus spectrum.

  • return_comments (bool) – If true, the return value is (fs, comments), where comments is a list of strings containing the comments from the file.

integrate(nu, tf, dt=0.01, rho=None, gamma=None, sel_params=None, sel_params_general=None, theta=1.0, finite_genome=False, u=None, v=None, alternate_fg=None, clustered_mutations=False)[source]

Simulate the two-locus haplotype frequency spectrum forward in time. This integration scheme takes advantage of scipy’s sparse methods.

When using the reversible mutation model (with finite_genome = True), we are limited to selection at only one locus (the left locus), and selection is additive. When using the default ISM, additive selection is allowed at both loci, and we use sel_params, which specifies [sAB, sA, and sB] in that order. Note that while this selection model is additive within loci, it allows for epistasis between loci if sAB != sA + sB.

Parameters
  • nu – Population effective size as positive value or callable function.

  • tf (float) – The integration time in genetics units.

  • dt_fac (float) – The time step for integration.

  • rho (float) – The population-size scaled recombination rate 4*Ne*r.

  • gamma (float) – The population-size scaled selection coefficient 2*Ne*s.

  • sel_params (list) – A list of selection parameters. See docstrings in Numerics. Selection parameters will be deprecated when we clean up the numerics and integration.

  • sel_params_general (list) – To be filled. ## TODO!!

  • theta (float) – Population size scale mutation parameter.

  • finite_genome (bool) – Defaults to False, in which case we use the infinite sites model. Otherwise, we use a reversible mutation model, and must specify u and v.

  • u (float) – The mutation rate at the left locus in the finite genome model.

  • v (float) – The mutation rate at the right locus in the finite genome model.

  • alternate_fg (bool) – If True, use the alternative finite genome model. This parameter will be deprecated when we clean up the numerics and integration.

left()[source]

The marginal allele frequency spectrum at the left locus.

mask_fixed()[source]

Mask all infeasible entries, as well as any where both sites are not segregating.

mask_infeasible()[source]

Mask any infeasible entries.

pi2(proj=True, nA=None, nB=None)[source]

Return the expectation of \(\pi_2 = p(1-p)q(1-q)\) from the spectrum.

Parameters
  • proj – If True, use the unbiased estimator from downsampling. If False, use naive maximum likelihood estimates for frequency.

  • nA – If None, the average is computed over all frequencies. If given, condition on the given allele count for the left locus.

  • nB – If None, the average is computed over all frequencies. If given, condition on the given allele count for the right locus.

project(ns, finite_genome=False, cache=True)[source]

Project to smaller sample size.

param int ns: Sample size for new spectrum. param bool finite_genome: If we also track proportions in fixed bins.

right()[source]

The marginal AFS at the right locus.

to_file(fid, precision=16, comment_lines=[], foldmaskinfo=True)[source]

Write frequency spectrum to file.

Parameters
  • fid (str) – String with file name to write to or an open file object.

  • precision (int) – Precision with which to write out entries of the SFS. (They are formated via %.<p>g, where <p> is the precision.)

  • comment_lines (list) – List of strings to be used as comment lines in the header of the output file.

  • foldmaskinfo (bool) – If False, folding and mask and population label information will not be saved.

unfold()[source]

Remove folding from the spectrum.

The Demographics module contains some standard demographic models. This is a good place to look for some inspiration to create your own two-locus models as well.

moments.TwoLocus.Demographics.bottlegrowth(params, ns, rho=None, theta=1.0, gamma=None, sel_params=None)[source]

A bottleneck followed by exponential growth. The population changes size to nuB T generations ago, and then has exponential size change to final size nuF. Time is in units of 2Ne generations, and sizes are relative to the ancestral Ne.

Parameters
  • params – Given as [nuB, nuF, T].

  • ns – The sample size.

  • rho – The population size scaled selection coefficient, 4*Ne*r.

  • theta – The mutation rate at each locus, typically left as 1.

  • gamma – Only used for additive selection at the A/a locus.

  • sel_params – Additive selection coefficients for haplotypes AB, Ab, and aB, so that sel_params = [sAB, sA, sB]. If sAB = sA + sB, this is a model with no epistasis.

moments.TwoLocus.Demographics.equilibrium(ns, rho=None, theta=1.0, gamma=None, sel_params=None, sel_params_general=None, cache=False)[source]

Compute or load the equilibrium two locus frequency spectrum. If the cached spectrum does not exist, create the equilibrium spectrum and cache in the cache path.

Parameters
  • ns – The sample size.

  • rho – The population size scaled selection coefficient, 4*Ne*r.

  • theta – The mutation rate at each locus, typically left as 1.

  • gamma – Only used for additive selection at the A/a locus.

  • sel_params – Additive selection coefficients for haplotypes AB, Ab, and aB, so that sel_params = [sAB, sA, sB]. If sAB = sA + sB, this is a model with no epistasis.

  • sel_params_general – General selection parameters for diploids. In the order (s_AB_AB, s_AB_Ab, s_AB_aB, s_AB_ab, s_Ab_Ab, s_Ab_aB, s_Ab_ab, s_aB_aB, s_aB_ab)

  • cache – If True, save the frequency spectrum in the cache for future use. If False, don’t save the spectrum.

moments.TwoLocus.Demographics.growth(params, ns, rho=None, theta=1.0, gamma=None, sel_params=None)[source]

An expnential growth model, that begins growth at time T ago, in units of 2Ne generations. The final size is given by nu, which is the relative size to the ancestral Ne.

Parameters
  • params – Given as [nu, T].

  • ns – The sample size.

  • rho – The population size scaled selection coefficient, 4*Ne*r.

  • theta – The mutation rate at each locus, typically left as 1.

  • gamma – Only used for additive selection at the A/a locus.

  • sel_params – Additive selection coefficients for haplotypes AB, Ab, and aB, so that sel_params = [sAB, sA, sB]. If sAB = sA + sB, this is a model with no epistasis.

moments.TwoLocus.Demographics.set_cache_path(path='~/.moments/TwoLocus_cache')[source]

Set directory in which demographic equilibrium phi spectra will be cached.

The collection of cached spectra can get large, so it may be helpful to store them outside the user’s home directory.

moments.TwoLocus.Demographics.three_epoch(params, ns, rho=None, theta=1.0, gamma=None, sel_params=None)[source]

A three-epoch model, with relative size changes nu1 that lasts for time T1, followed by a relative size change to nu2 that last for time T2. Times are in units of 2Ne generations, and sizes are relative to the ancestral Ne.

Parameters
  • params – Given as [nu1, nu2, T1, T2].

  • ns – The sample size.

  • rho – The population size scaled selection coefficient, 4*Ne*r.

  • theta – The mutation rate at each locus, typically left as 1.

  • gamma – Only used for additive selection at the A/a locus.

  • sel_params – Additive selection coefficients for haplotypes AB, Ab, and aB, so that sel_params = [sAB, sA, sB]. If sAB = sA + sB, this is a model with no epistasis.

moments.TwoLocus.Demographics.two_epoch(params, ns, rho=None, theta=1.0, gamma=None, sel_params=None)[source]

A two-epoch model, with relative size change nu, time T in the past. T is given in units of 2Ne generations. Note that a relative size of 1 implies no size change.

Parameters
  • params – Given as [nu, T].

  • ns – The sample size.

  • rho – The population size scaled selection coefficient, 4*Ne*r.

  • theta – The mutation rate at each locus, typically left as 1.

  • gamma – Only used for additive selection at the A/a locus.

  • sel_params – Additive selection coefficients for haplotypes AB, Ab, and aB, so that sel_params = [sAB, sA, sB]. If sAB = sA + sB, this is a model with no epistasis.