Triallele frequency spectrum
- class moments.Triallele.TriSpectrum(data, mask=False, finite_genome=False, mask_infeasible=True, mask_fixed=True, data_folded_major=None, check_folding_major=True, data_folded_ancestral=None, check_folding_ancestral=True, dtype=<class 'float'>, copy=True, fill_value=nan, keep_mask=True, shrink=True)
Represents a triallelic frequency spectrum.
The triallelic spectrum is represented as a square numpy masked array in which the (i, j)-th element stores the count or density of loci in which there are i copies of the first derived allele and j copies of the second derived allele.
data (array) – The frequency spectrum data of size (n+1)-by-(n+1) where n is the sample size.
mask (array) – An optional array of the same size as data. ‘True’ entries in this array are masked in the TriSpectrum. These represent missing data categories or invalid entries in the array.
mask_infeasible (bool) – If True, mask all bins for frequencies that cannot occur, e.g. i + j > n. Defaults to True.
mask_fixed (bool) – If True, mask the fixed bins. Defaults to True.
data_folded_major (bool) – If True, it is assumed that the input data is folded for the major and minor derived alleles.
data_folded_ancestral (bool) – If True, it is assumed that the input data is folded to account for uncertainty in the ancestral state. Note that if True, data_folded_major must also be True.
check_folding_major (bool) – If True and data_folded_major=True, the data and mask will be checked to ensure they are consistent.
check_folding_ancestral (bool) – If True and data_folded_ancestral=True, the data and mask will be checked to ensure they are consistent.
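The array layout and the two masking options can be illustrated with plain numpy. This is a minimal sketch, not the library's code: the helper name make_triallelic_mask is hypothetical, and it assumes that the "fixed" bins are the three corners where a single allele has count n.

```python
import numpy as np

def make_triallelic_mask(n, mask_infeasible=True, mask_fixed=True):
    """Boolean mask for an (n+1)-by-(n+1) triallelic array (illustrative)."""
    i, j = np.indices((n + 1, n + 1))
    mask = np.zeros((n + 1, n + 1), dtype=bool)
    if mask_infeasible:
        # i + j derived copies cannot exceed the sample size n.
        mask |= (i + j) > n
    if mask_fixed:
        # Assumption: "fixed" bins are the corners where the ancestral,
        # first derived, or second derived allele is at count n.
        mask[0, 0] = True
        mask[n, 0] = True
        mask[0, n] = True
    return mask

n = 4
data = np.arange((n + 1) ** 2, dtype=float).reshape(n + 1, n + 1)
mask = make_triallelic_mask(n)
fs = np.ma.masked_array(data, mask=mask)
```

With n = 4, entry (3, 2) is masked as infeasible because 3 + 2 > 4, while entry (2, 2) stays unmasked.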
Number of sites in the unmasked spectrum.
Fold the spectrum based on the ancestral state.
Fold the spectrum based on the major allele(s).
- static from_file(fid, mask_infeasible=True, return_comments=False)
Read frequency spectrum from file.
See to_file method for details on the file format.
fid (str) – String with file name to read from or an open file object.
mask_infeasible (bool) – If True, mask the infeasible entries in the triallelic spectrum.
return_comments (bool) – If True, the return value is (fs, comments), where comments is a list of strings containing the comments from the file.
- integrate(nu, tf, dt=0.001, gammas=None, theta=1.0)
Simulate the triallelic frequency spectrum forward in time. The integration scheme takes advantage of scipy’s sparse methods.
nu – The effective population size, given as a positive value or a callable function.
tf (float) – The integration time in genetic units.
dt (float) – Time step for integration.
gammas (list) – Population-size-scaled selection coefficients [sAA, sA0, sBB, sB0, sAB]. Here, 0 represents the ancestral allele, so dominance can be implemented by choosing the relationship between, e.g., sAA, sA0, and sAB.
theta (float) – Population-size-scaled mutation parameter, assuming equal mutation rates to both derived alleles.
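The following sketch shows one way the nu and gammas arguments might be prepared. It rests on assumptions not stated above: that a callable nu takes the elapsed time as its only argument, and that an additive model sets each heterozygote coefficient to half the homozygote one with sAB = sA0 + sB0. The helper names are hypothetical.

```python
import math

def nu_exponential(t, nu0=1.0, nuF=2.0, T=0.5):
    """Relative population size at time t, growing from nu0 to nuF over time T.

    Assumption: a callable nu receives the elapsed time t.
    """
    return nu0 * math.exp(math.log(nuF / nu0) * t / T)

def additive_gammas(s_AA, s_BB):
    """Return [sAA, sA0, sBB, sB0, sAB] under an assumed additive model."""
    s_A0 = 0.5 * s_AA      # heterozygote with the ancestral allele
    s_B0 = 0.5 * s_BB
    s_AB = s_A0 + s_B0     # heterozygote carrying both derived alleles
    return [s_AA, s_A0, s_BB, s_B0, s_AB]

gammas = additive_gammas(-2.0, -4.0)
```

Under these assumptions, the values would be passed as integrate(nu_exponential, 0.5, gammas=gammas) on an existing TriSpectrum.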
Return the natural logarithm of the entries of the frequency spectrum.
This method is needed only because numpy.ma.log fails to propagate extra attributes after numpy 1.10.
Mask entries that are not triallelic.
Mask any infeasible entries.
Estimate of the expected number of pairwise differences between two samples from the population at triallelic loci.
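One way to compute such a statistic from a triallelic array is sketched below. It assumes that, for a locus with i, j, and k = n - i - j copies of the three alleles, two chromosomes drawn without replacement differ with probability (ij + ik + jk) / C(n, 2). The function name is hypothetical and the library's own estimator may differ in detail.

```python
import numpy as np

def triallelic_pi(fs):
    """Sum over unmasked bins of weight * P(two sampled chromosomes differ)."""
    n = fs.shape[0] - 1
    pairs = n * (n - 1) / 2
    total = 0.0
    for i in range(n + 1):
        for j in range(n + 1 - i):
            entry = fs[i, j]
            if np.ma.is_masked(entry):
                continue  # skip masked (e.g. non-triallelic) bins
            k = n - i - j  # copies of the ancestral allele
            total += float(entry) * (i * j + i * k + j * k) / pairs
    return total

n = 4
data = np.zeros((n + 1, n + 1))
data[2, 1] = 2.0
mask = np.ones((n + 1, n + 1), dtype=bool)
mask[2, 1] = False
pi = triallelic_pi(np.ma.masked_array(data, mask=mask))
```

In this example the single unmasked bin (2, 1) has k = 1, so each locus contributes (2 + 2 + 1) / 6 and the two loci together give 10 / 6.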
- project(ns, finite_genome=False)
Project to smaller sample size.
ns (int) – Sample size for new spectrum.
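Projection to a smaller sample size is commonly done by hypergeometric subsampling. The sketch below applies that idea to a plain (n+1)-by-(n+1) array. Whether the library's project method uses exactly this scheme is an assumption, and the function name is hypothetical.

```python
from math import comb
import numpy as np

def project_triallelic(fs, m):
    """Project an (n+1)x(n+1) array to sample size m by subsampling
    m of the n chromosomes without replacement (illustrative)."""
    n = fs.shape[0] - 1
    out = np.zeros((m + 1, m + 1))
    for i in range(n + 1):
        for j in range(n + 1 - i):
            weight = fs[i, j]
            if weight == 0:
                continue
            k = n - i - j  # ancestral copies in the larger sample
            for a in range(min(i, m) + 1):
                for b in range(min(j, m - a) + 1):
                    c = m - a - b  # ancestral copies in the subsample
                    if c > k:
                        continue
                    prob = comb(i, a) * comb(j, b) * comb(k, c) / comb(n, m)
                    out[a, b] += weight * prob
    return out

fs = np.zeros((5, 5))
fs[1, 1] = 1.0
projected = project_triallelic(fs, 3)
```

Some of the projected mass lands in bins that are no longer triallelic, such as (1, 0). In a masked spectrum those bins would be masked afterwards.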
- to_file(fid, precision=16, comment_lines=, foldmaskinfo=True)
Write frequency spectrum to file.
The file format is:
# Any number of comment lines beginning with a ‘#’
A single line containing the sample size, followed on the same line by the string ‘folded_major’ or ‘unfolded_major’ and then the string ‘folded_ancestral’ or ‘unfolded_ancestral’, denoting the folding status of the array.
A single line giving the array elements. The order of elements is e.g.: fs[0, 0] fs[0, 1] fs[0, 2] … fs[1, 0] fs[1, 1] …
A single line giving the elements of the mask in the same order as the data line. ‘1’ indicates masked, ‘0’ indicates unmasked.
fid (str) – String with file name to write to or an open file object.
precision (int) – Precision with which to write out entries of the SFS. (They are formatted via %.<p>g, where <p> is the precision.)
comment_lines (list) – List of strings to be used as comment lines in the header of the output file.
foldmaskinfo (bool) – If False, folding, mask, and population label information will not be saved.
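The file format described above can be mimicked with a short standalone writer and reader. This is a sketch following the description only, not the library's implementation. The function names are hypothetical, the folding flags are hard-coded as unfolded, and the first field is taken to be n for an (n+1)-by-(n+1) array.

```python
import io
import numpy as np

def write_tri_fs(fs, fid, precision=16, comment_lines=()):
    """Write a masked (n+1)x(n+1) array in the four-part layout."""
    n = fs.shape[0] - 1
    for line in comment_lines:
        fid.write("# " + line + "\n")
    fid.write(f"{n} unfolded_major unfolded_ancestral\n")
    data = np.ma.getdata(fs).ravel()  # row-major: fs[0, 0] fs[0, 1] ...
    fid.write(" ".join(f"{x:.{precision}g}" for x in data) + "\n")
    mask = np.ma.getmaskarray(fs).ravel()
    fid.write(" ".join("1" if m else "0" for m in mask) + "\n")

def read_tri_fs(fid):
    """Read the layout back; returns (fs, folding_flags, comments)."""
    comments = []
    line = fid.readline()
    while line.startswith("#"):
        comments.append(line[1:].strip())
        line = fid.readline()
    fields = line.split()
    n = int(fields[0])
    data = np.array(fid.readline().split(), dtype=float).reshape(n + 1, n + 1)
    mask = np.array(fid.readline().split(), dtype=int).astype(bool)
    return np.ma.masked_array(data, mask=mask.reshape(n + 1, n + 1)), fields[1:], comments

original = np.ma.masked_array(np.arange(9.0).reshape(3, 3), mask=np.eye(3, dtype=bool))
buffer = io.StringIO()
write_tri_fs(original, buffer, comment_lines=["demo spectrum"])
buffer.seek(0)
restored, flags, comments = read_tri_fs(buffer)
```

Writing to an in-memory buffer and reading it back recovers the same data, mask, folding flags, and comment line.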
Completely unfold the spectrum.
Returns a new TriSpectrum.