Triallele frequency spectrum

API

class moments.Triallele.TriSpectrum(data, mask=False, finite_genome=False, mask_infeasible=True, mask_fixed=True, data_folded_major=None, check_folding_major=True, data_folded_ancestral=None, check_folding_ancestral=True, dtype=<class 'float'>, copy=True, fill_value=nan, keep_mask=True, shrink=True)[source]

Represents a triallelic frequency spectrum.

The triallelic spectrum is represented as a square numpy masked array in which the (i, j)-th element stores the count or density of loci in which there are i copies of the first derived allele and j copies of the second derived allele.

Parameters:
  • data (array) – The frequency spectrum data of size (n+1)-by-(n+1) where n is the sample size.

  • mask (array) – An optional array of the same size as data. ‘True’ entries in this array are masked in the TriSpectrum. These represent missing data categories or invalid entries in the array.

  • mask_infeasible (bool) – If True, mask all bins for frequencies that cannot occur, e.g. i + j > n. Defaults to True.

  • mask_fixed (bool) – If True, mask the fixed bins. Defaults to True.

  • data_folded_major (bool) – If True, it is assumed that the input data is folded for the major and minor derived alleles.

  • data_folded_ancestral (bool) – If True, it is assumed that the input data is folded to account for uncertainty in the ancestral state. Note that if True, data_folded_major must also be True.

  • check_folding_major (bool) – If True and data_folded_major=True, the data and mask will be checked to ensure they are consistent.

  • check_folding_ancestral (bool) – If True and data_folded_ancestral=True, the data and mask will be checked to ensure they are consistent.
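As a rough illustration of the masking options above, the following sketch builds the boolean mask for a sample of size n using plain Python lists, so it runs without any dependencies. It does not call moments. The helper names `build_mask` and `unmasked_sum` are hypothetical, and reading "fixed bins" as the entries where at least one of the three alleles is absent (i = 0, j = 0, or i + j = n) is an assumption, not a statement of what the library masks.

```python
def build_mask(n, mask_infeasible=True, mask_fixed=True):
    """Return an (n+1)-by-(n+1) list of lists; True means masked.

    Hypothetical helper, not part of moments.
    """
    mask = [[False] * (n + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        for j in range(n + 1):
            # Infeasible: more derived copies than sampled chromosomes.
            if mask_infeasible and i + j > n:
                mask[i][j] = True
            # Assumed meaning of "fixed": one of the three alleles is absent.
            if mask_fixed and (i == 0 or j == 0 or i + j == n):
                mask[i][j] = True
    return mask


def unmasked_sum(data, mask):
    """Sum the entries of data whose mask entry is False."""
    size = len(data)
    return sum(
        data[i][j] for i in range(size) for j in range(size) if not mask[i][j]
    )


n = 4
mask = build_mask(n)
data = [[1.0] * (n + 1) for _ in range(n + 1)]
total = unmasked_sum(data, mask)  # counts the bins left unmasked
```

Under these assumptions, only bins with i ≥ 1, j ≥ 1, and i + j < n remain unmasked, and a sum over them plays the role that S() is described as playing for a real TriSpectrum.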

S()[source]

Number of sites in the unmasked spectrum.

fold_ancestral()[source]

Fold the spectrum based on the ancestral state.

fold_major()[source]

Fold the spectrum based on the major allele(s).

static from_file(fid, mask_infeasible=True, return_comments=False)[source]

Read frequency spectrum from file.

See to_file method for details on the file format.

Parameters:
  • fid (str) – String with file name to read from or an open file object.

  • mask_infeasible (bool) – If True, mask the infeasible entries in the triallelic spectrum.

  • return_comments (bool) – If True, the return value is (fs, comments), where comments is a list of strings containing the comments from the file.

integrate(nu, tf, dt=0.001, gammas=None, theta=1.0)[source]

Integrate the triallelic frequency spectrum forward in time. The integration scheme takes advantage of scipy’s sparse matrix methods.

Parameters:
  • nu – The effective population size, given as a positive value or a callable function.

  • tf (float) – The integration time in genetic units.

  • dt (float) – Time step for integration. Defaults to 0.001.

  • gammas (list) – Population-size-scaled selection coefficients [sAA, sA0, sBB, sB0, sAB]. Here, 0 represents the ancestral allele, so dominance can be implemented by choosing the relationship between, e.g., sAA, sA0, and sAB.

  • theta (float) – Population-size-scaled mutation parameter, assuming equal mutation rates to both derived alleles.
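The sketch below shows one way the nu and gammas arguments described above might be assembled. Both helpers are hypothetical and not part of moments. It assumes the coefficients are genotype-level values relative to the ancestral homozygote, so that additive selection gives a homozygote twice the heterozygote coefficient, and it assumes a callable nu takes the elapsed time as its only argument.

```python
def additive_gammas(gamma_a, gamma_b):
    """Return [sAA, sA0, sBB, sB0, sAB] for additive selection.

    Hypothetical helper. Assumes each derived allele contributes its
    scaled coefficient once per copy carried by the genotype.
    """
    return [2 * gamma_a, gamma_a, 2 * gamma_b, gamma_b, gamma_a + gamma_b]


def exponential_nu(nu0, nu_final, tf):
    """Return a callable nu(t) that changes exponentially from nu0 to nu_final.

    Hypothetical helper. Assumes t runs from 0 to tf.
    """
    def nu(t):
        return nu0 * (nu_final / nu0) ** (t / tf)

    return nu


gammas = additive_gammas(-1.0, -2.0)
nu = exponential_nu(1.0, 10.0, tf=0.5)

# With a TriSpectrum instance fs, these could then be passed along the lines of
# fs.integrate(nu, 0.5, dt=0.001, gammas=gammas, theta=1.0)
```

Dominance would be expressed by breaking the factor-of-two relationship, for example by setting sA0 equal to sAA for a fully dominant first derived allele.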

log()[source]

Return the natural logarithm of the entries of the frequency spectrum.

This method is needed only because numpy.ma.log fails to propagate extra attributes in numpy versions after 1.10.

mask_fixed()[source]

Mask entries that are not triallelic.

mask_infeasible()[source]

Mask any infeasible entries.

pi()[source]

Estimate of the expected number of pairwise differences between two samples from the population, at triallelic loci.
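One common way to compute such a quantity is sketched below; it is an illustration under stated assumptions, not necessarily the formula the library uses. For a site with i copies of the first derived allele, j of the second, and k = n - i - j ancestral copies, two distinct sampled chromosomes differ with probability (ij + ik + jk) / C(n, 2). The function name `pairwise_diversity` is hypothetical, and fs is taken to be a plain nested list.

```python
def pairwise_diversity(fs, n):
    """Sum per-site pairwise difference probabilities over triallelic bins.

    Hypothetical helper, not part of moments. fs[i][j] is the count or
    density of sites with i and j copies of the two derived alleles.
    """
    pairs = n * (n - 1) / 2  # number of distinct chromosome pairs
    total = 0.0
    for i in range(1, n):
        for j in range(1, n - i):  # keeps k = n - i - j at least 1
            k = n - i - j
            total += fs[i][j] * (i * j + i * k + j * k) / pairs
    return total


n = 3
fs = [[0.0] * (n + 1) for _ in range(n + 1)]
fs[1][1] = 1.0  # one site carrying a single copy of each allele
pi_value = pairwise_diversity(fs, n)
```

With three chromosomes each carrying a different allele, every pair differs, so the single site contributes exactly one pairwise difference.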

project(ns, finite_genome=False)[source]

Project to smaller sample size.

Parameters:
  • ns (int) – Sample size for new spectrum.
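Projection to a smaller sample size is commonly done by hypergeometric subsampling, and the sketch below applies the multivariate version to a three-allele configuration. It is an assumption that this matches what project does internally. The function name `project_triallelic` is hypothetical, fs is a plain nested list, and no masking is applied, so the total mass is conserved.

```python
from math import comb


def project_triallelic(fs, n, m):
    """Project an (n+1)-by-(n+1) spectrum down to sample size m.

    Hypothetical helper, not part of moments. A site with counts
    (i, j, k) in n chromosomes yields counts (a, b, c) in a random
    subsample of m chromosomes with multivariate hypergeometric weight.
    """
    out = [[0.0] * (m + 1) for _ in range(m + 1)]
    denom = comb(n, m)
    for i in range(n + 1):
        for j in range(n + 1 - i):
            weight = fs[i][j]
            if weight == 0:
                continue
            k = n - i - j  # ancestral copies in the full sample
            for a in range(min(i, m) + 1):
                for b in range(min(j, m - a) + 1):
                    c = m - a - b  # ancestral copies in the subsample
                    if c > k:
                        continue
                    ways = comb(i, a) * comb(j, b) * comb(k, c)
                    out[a][b] += weight * ways / denom
    return out


n, m = 4, 2
fs = [[0.0] * (n + 1) for _ in range(n + 1)]
fs[1][1] = 1.0
projected = project_triallelic(fs, n, m)
projected_total = sum(sum(row) for row in projected)
```

Much of the projected mass lands in bins that are no longer triallelic, which is why a real spectrum would typically mask those bins afterwards.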

to_file(fid, precision=16, comment_lines=[], foldmaskinfo=True)[source]

Write frequency spectrum to file.

The file format is:

  • # Any number of comment lines beginning with a ‘#’

  • A single line containing the sample size, followed on the same line by the string ‘folded_major’ or ‘unfolded_major’ and then the string ‘folded_ancestral’ or ‘unfolded_ancestral’, denoting the folding status of the array.

  • A single line giving the array elements. The order of elements is e.g.: fs[0, 0] fs[0, 1] fs[0, 2] … fs[1, 0] fs[1, 1] …

  • A single line giving the elements of the mask in the same order as the data line. ‘1’ indicates masked, ‘0’ indicates unmasked.

Parameters:
  • fid (str) – String with file name to write to or an open file object.

  • precision (int) – Precision with which to write out entries of the SFS. (They are formatted via %.<p>g, where <p> is the precision.)

  • comment_lines (list) – List of strings to be used as comment lines in the header of the output file.

  • foldmaskinfo (bool) – If False, folding, mask, and population label information will not be saved.
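To make the file layout described above concrete, here is a minimal sketch that renders a small spectrum as text. It does not call moments; the function name `format_spectrum` is hypothetical, and details such as single-space separators and the space after ‘#’ are assumptions about a format that the description leaves open.

```python
def format_spectrum(data, mask, n, folded_major=False, folded_ancestral=False,
                    precision=16, comment_lines=()):
    """Render a spectrum in the layout described for to_file.

    Hypothetical helper, not part of moments. data and mask are
    (n+1)-by-(n+1) nested lists; True in mask means masked.
    """
    lines = ["# " + comment for comment in comment_lines]
    # Header: sample size plus the two folding status strings.
    lines.append("{} {} {}".format(
        n,
        "folded_major" if folded_major else "unfolded_major",
        "folded_ancestral" if folded_ancestral else "unfolded_ancestral",
    ))
    fmt = "%.{}g".format(precision)
    indices = [(i, j) for i in range(n + 1) for j in range(n + 1)]
    # Data line in row-major order: fs[0, 0] fs[0, 1] ... fs[1, 0] ...
    lines.append(" ".join(fmt % data[i][j] for i, j in indices))
    # Mask line in the same order: 1 is masked, 0 is unmasked.
    lines.append(" ".join("1" if mask[i][j] else "0" for i, j in indices))
    return "\n".join(lines) + "\n"


text = format_spectrum(
    data=[[0.0, 0.5], [0.25, 0.0]],
    mask=[[True, False], [False, True]],
    n=1,
    comment_lines=["demo"],
)
```

Reading the format back would reverse these steps: skip the lines starting with ‘#’, split the header into the sample size and the two status strings, then reshape the data and mask lines into (n+1)-by-(n+1) arrays.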

unfold()[source]

Completely unfold the spectrum.

Returns a new TriSpectrum.