Triallele frequency spectrum
API
- class moments.Triallele.TriSpectrum(data, mask=False, finite_genome=False, mask_infeasible=True, mask_fixed=True, data_folded_major=None, check_folding_major=True, data_folded_ancestral=None, check_folding_ancestral=True, dtype=<class 'float'>, copy=True, fill_value=nan, keep_mask=True, shrink=True)
Represents a triallelic frequency spectrum.
The triallelic spectrum is represented as a square numpy masked array in which the (i, j)-th element stores the count or density of loci in which there are i copies of the first derived allele and j copies of the second derived allele.
- Parameters:
data (array) – The frequency spectrum data of size (n+1)-by-(n+1) where n is the sample size.
mask (array) – An optional array of the same size as data. ‘True’ entries in this array are masked in the TriSpectrum. These represent missing data categories or invalid entries in the array.
mask_infeasible (bool) – If True, mask all bins for frequencies that cannot occur, e.g. i + j > n. Defaults to True.
mask_fixed (bool) – If True, mask the fixed bins. Defaults to True.
data_folded_major (bool) – If True, it is assumed that the input data is folded for the major and minor derived alleles.
data_folded_ancestral (bool) – If True, it is assumed that the input data is folded to account for uncertainty in the ancestral state. Note that if True, data_folded_major must also be True.
check_folding_major (bool) – If True and data_folded_major=True, the data and mask will be checked to ensure they are consistent.
check_folding_ancestral (bool) – If True and data_folded_ancestral=True, the data and mask will be checked to ensure they are consistent.
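To make the mask_infeasible rule concrete, here is a minimal, dependency-free sketch of which bins satisfy i + j > n. It uses plain nested lists rather than a numpy masked array, and the helper name infeasible_mask is hypothetical (it is not part of moments).

```python
def infeasible_mask(n):
    """Return an (n+1)-by-(n+1) nested list of booleans.

    True marks bins (i, j) with i + j > n, i.e. more derived copies
    than there are samples. Hypothetical helper, not a moments function.
    """
    return [[i + j > n for j in range(n + 1)] for i in range(n + 1)]


n = 4
mask = infeasible_mask(n)

# Print the mask row by row, using 1 for masked and 0 for unmasked.
for row in mask:
    print(" ".join("1" if m else "0" for m in row))
```

The feasible bins form the upper-left triangle of the square array, including the anti-diagonal i + j = n.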
- static from_file(fid, mask_infeasible=True, return_comments=False)
Read frequency spectrum from file.
See to_file method for details on the file format.
- Parameters:
fid (str) – String with file name to read from or an open file object.
mask_infeasible (bool) – If True, mask the infeasible entries in the triallelic spectrum.
return_comments (bool) – If True, the return value is (fs, comments), where comments is a list of strings containing the comments from the file.
- integrate(nu, tf, dt=0.001, gammas=None, theta=1.0)
Integrate the triallelic frequency spectrum forward in time. The integration scheme takes advantage of scipy’s sparse methods.
- Parameters:
nu – The effective population size, given as a positive value or a callable function.
tf (float) – The integration time in genetic units.
dt (float) – Time step for integration.
gammas (list) – Population-size-scaled selection coefficients [sAA, sA0, sBB, sB0, sAB]. Here, 0 represents the ancestral allele, so dominance can be implemented by choosing the relationship between, e.g., sAA, sA0, and sAB.
theta (float) – Population-size-scaled mutation parameter, assuming equal mutation rates to both derived alleles.
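As an illustration of how the nu and gammas arguments might be prepared, the sketch below builds a callable population size and a selection list in the order [sAA, sA0, sBB, sB0, sAB]. Both helpers are hypothetical. The sketch assumes the callable takes elapsed time t, that a heterozygote coefficient is h times the homozygote coefficient, and that sAB is the sum of the two heterozygote effects. None of these conventions is confirmed by the documentation above.

```python
import math


def exponential_growth(nu0, nu_final, T):
    """Return nu(t), growing exponentially from nu0 at t=0 to nu_final at t=T.

    Assumption: the integrator calls nu with the elapsed time t.
    """
    def nu(t):
        return nu0 * math.exp(math.log(nu_final / nu0) * t / T)
    return nu


def gammas_with_dominance(s_AA, s_BB, h=0.5):
    """Build [sAA, sA0, sBB, sB0, sAB] from homozygote coefficients.

    Assumptions (not from the moments docs): sA0 = h * sAA, sB0 = h * sBB,
    and the AB genotype carries the sum of the two heterozygote effects.
    """
    s_A0 = h * s_AA
    s_B0 = h * s_BB
    s_AB = s_A0 + s_B0
    return [s_AA, s_A0, s_BB, s_B0, s_AB]


nu = exponential_growth(1.0, 2.0, T=0.5)
gammas = gammas_with_dominance(-2.0, -4.0)
print(gammas)
```

Given an existing TriSpectrum fs, these values would then be passed as fs.integrate(nu, 0.5, gammas=gammas, theta=1.0), following the signature documented above.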
- log()
Return the natural logarithm of the entries of the frequency spectrum.
This method is needed only because numpy.ma.log fails to propagate extra attributes after numpy 1.10.
- pi()
Estimate the expected number of pairwise differences between two samples from the population at triallelic loci.
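One natural reading of this statistic can be sketched as follows. For a bin with i copies of the first derived allele, j of the second, and k = n - i - j ancestral copies, the number of sample pairs carrying different alleles is ij + ik + jk out of n(n-1)/2 pairs. Whether moments uses exactly this weighting and this set of bins is an assumption; the function name triallelic_pi is hypothetical and the spectrum is a plain nested list with no mask.

```python
def triallelic_pi(fs):
    """Sum over bins of fs[i][j] times the chance two sampled copies differ.

    Assumption: only bins with i >= 1, j >= 1 and i + j <= n contribute.
    This is a sketch of the idea, not the moments implementation.
    """
    n = len(fs) - 1
    n_pairs = n * (n - 1) / 2
    total = 0.0
    for i in range(1, n):
        for j in range(1, n - i + 1):
            k = n - i - j  # ancestral copies in this bin
            differing_pairs = i * j + i * k + j * k
            total += fs[i][j] * differing_pairs / n_pairs
    return total


# Sample size 4 with six loci in bin (2, 1): 5 of the 6 pairs differ.
fs = [[0.0] * 5 for _ in range(5)]
fs[2][1] = 6.0
print(triallelic_pi(fs))
```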
- project(ns, finite_genome=False)
Project to smaller sample size.
- Parameters:
ns (int) – Sample size for new spectrum.
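Projection to a smaller sample size is commonly done by hypergeometric subsampling, and the sketch below shows that idea for a triallelic spectrum. It assumes sampling m copies without replacement from the i, j, and n - i - j copies in each bin. Whether moments implements project this way (and how finite_genome changes it) is not stated above, so treat project_triallelic as a hypothetical helper working on unmasked nested lists.

```python
from math import comb


def project_triallelic(fs, m):
    """Project an (n+1)-by-(n+1) spectrum down to sample size m.

    Each bin (i, j) spreads its weight over bins (a, b) with the
    multivariate hypergeometric probability
    C(i, a) * C(j, b) * C(k, c) / C(n, m), where k = n - i - j, c = m - a - b.
    """
    n = len(fs) - 1
    if m > n:
        raise ValueError("can only project to a smaller sample size")
    out = [[0.0] * (m + 1) for _ in range(m + 1)]
    denom = comb(n, m)
    for i in range(n + 1):
        for j in range(n + 1 - i):
            weight = fs[i][j]
            if weight == 0:
                continue
            k = n - i - j
            for a in range(min(i, m) + 1):
                for b in range(min(j, m - a) + 1):
                    c = m - a - b
                    if c > k:
                        continue  # not enough ancestral copies to draw
                    prob = comb(i, a) * comb(j, b) * comb(k, c) / denom
                    out[a][b] += weight * prob
    return out


fs = [[0.0] * 5 for _ in range(5)]
fs[1][1] = 1.0
projected = project_triallelic(fs, 3)
print(projected[1][1], projected[1][0], projected[0][1])
```

In this sketch some weight lands in bins with a = 0 or b = 0, where one derived allele is missing from the subsample; a masked triallelic spectrum would typically exclude those.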
- to_file(fid, precision=16, comment_lines=[], foldmaskinfo=True)
Write frequency spectrum to file.
The file format is:
# Any number of comment lines beginning with a ‘#’
A single line containing the sample size, followed on the same line by the string ‘folded_major’ or ‘unfolded_major’, denoting the major/minor folding status of the array, and then the string ‘folded_ancestral’ or ‘unfolded_ancestral’, denoting the ancestral-state folding status of the array.
A single line giving the array elements. The order of elements is e.g.: fs[0, 0] fs[0, 1] fs[0, 2] … fs[1, 0] fs[1, 1] …
A single line giving the elements of the mask in the same order as the data line. ‘1’ indicates masked, ‘0’ indicates unmasked.
- Parameters:
fid (str) – String with file name to write to or an open file object.
precision (int) – Precision with which to write out entries of the SFS. (They are formatted via %.<p>g, where <p> is the precision.)
comment_lines (list) – List of strings to be used as comment lines in the header of the output file.
foldmaskinfo (bool) – If False, folding, mask, and population label information will not be saved.
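The file layout described above can be sketched with a small standalone formatter. It is an illustration only: format_trispectrum is a hypothetical helper, it writes the sample size n on the header line as the description states, and it assumes comment lines are prefixed with '# ' and that tokens are separated by single spaces. The actual to_file output may differ in those details.

```python
def format_trispectrum(data, mask, folded_major=False, folded_ancestral=False,
                       precision=16, comment_lines=()):
    """Return the text of a spectrum file following the layout described above.

    data and mask are (n+1)-by-(n+1) nested lists; elements are written in
    row-major order: fs[0][0] fs[0][1] ... fs[1][0] fs[1][1] ...
    """
    n = len(data) - 1
    lines = ["# " + c for c in comment_lines]
    major = "folded_major" if folded_major else "unfolded_major"
    ancestral = "folded_ancestral" if folded_ancestral else "unfolded_ancestral"
    lines.append(f"{n} {major} {ancestral}")
    # Entries use the %.<p>g format, where <p> is the requested precision.
    lines.append(" ".join("%.*g" % (precision, x) for row in data for x in row))
    # Mask line: '1' indicates masked, '0' indicates unmasked.
    lines.append(" ".join("1" if m else "0" for row in mask for m in row))
    return "\n".join(lines) + "\n"


n = 2
data = [[0.0] * (n + 1) for _ in range(n + 1)]
data[1][1] = 1 / 3
mask = [[i + j > n for j in range(n + 1)] for i in range(n + 1)]
text = format_trispectrum(data, mask, precision=3, comment_lines=["example"])
print(text, end="")
```

Lowering precision shortens the data line at the cost of round-trip accuracy; the default of 16 significant digits keeps nearly all of a double-precision value.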