Wang-RJ/generationtimes

Human generation times across the past 250,000 years

The generation times of our recent ancestors can tell us about both the biology and social organization of prehistoric humans, and they help place human evolution on an absolute timescale. We implement a method for predicting historical male and female generation times based on changes in the mutation spectrum.

Our method combines data from two different types of studies:

  • Mutations from pedigree studies
    We apply a Dirichlet-multinomial regression to mutation count data to capture the relationship between the underlying mutation spectrum and parental ages.
  • Variants from population genetic studies
    We use human variants from the 1000 Genomes Project with allele ages estimated from the Genealogical Estimation of Variant Age (GEVA) approach.
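
To make the pedigree-side model concrete, here is a minimal Python sketch of a Dirichlet-multinomial likelihood whose concentration parameters depend on parental ages. It is an illustration only, not the repository's R code: the log-linear link and the names `dm_log_pmf`, `concentrations`, and `expected_spectrum` are assumptions made for this example.

```python
import math

def dm_log_pmf(counts, alpha):
    """Log-probability of mutation counts under a Dirichlet-multinomial
    with concentration parameters alpha (one per mutation class)."""
    n = sum(counts)
    a = sum(alpha)
    out = math.lgamma(n + 1) + math.lgamma(a) - math.lgamma(n + a)
    for x, al in zip(counts, alpha):
        out += math.lgamma(x + al) - math.lgamma(al) - math.lgamma(x + 1)
    return out

def concentrations(coefs, paternal_age, maternal_age):
    """Assumed log-linear link: each class has a hypothetical
    (intercept, paternal_slope, maternal_slope) coefficient triple."""
    return [math.exp(b0 + bp * paternal_age + bm * maternal_age)
            for b0, bp, bm in coefs]

def expected_spectrum(alpha):
    """Expected mutation-class proportions implied by alpha."""
    total = sum(alpha)
    return [a / total for a in alpha]
```

Under these assumptions, fitting the regression amounts to choosing the coefficients that maximize the summed `dm_log_pmf` over all pedigree observations, and `expected_spectrum` then gives the predicted spectrum for any pair of parental ages.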

Summary of analysis workflow

  1. Preprocess 1000 Genomes variant data with ages from GEVA
  2. Count binned variants, including for each continental population
  3. Load mutation data and build Dirichlet-multinomial model
  4. Estimate best-fit parental ages for variant spectrum in each bin

Brief descriptions of folders and files in the top level of the repository

folders

  • bootstraps/
    Recalculate estimates for each 100x100 double-bootstrap of model and variants
  • neanderthal_masked/
    Reanalysis masking genomic tracts with potential Neanderthal introgression
  • resample_alleleage/
    Reanalysis after drawing new allele ages based on 95% CI from GEVA
  • var_count/
    Preprocess variant data, bin variants, and count each mutation class
  • model_coefficients/
    RData object for the spectrum prediction model, plus the coefficients from this model as plain text
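
As a rough picture of what a 100x100 double bootstrap involves, the Python sketch below resamples the pedigree mutations and the variants independently, then pairs every refitted model with every resampled variant set. That pairing is one reading of "100x100" and is an assumption, as are the callables `fit_model` and `estimate`, which are hypothetical placeholders for the model-fitting and age-estimation steps.

```python
import random

def resample(items, rng):
    """Draw a bootstrap replicate: same size, sampled with replacement."""
    return [rng.choice(items) for _ in items]

def double_bootstrap(mutations, variants, fit_model, estimate,
                     n_model=100, n_variant=100, seed=0):
    """Return an n_model x n_variant grid of estimates.
    fit_model(mutations) -> model and estimate(model, variants) -> value
    are caller-supplied placeholders."""
    rng = random.Random(seed)
    models = [fit_model(resample(mutations, rng)) for _ in range(n_model)]
    variant_sets = [resample(variants, rng) for _ in range(n_variant)]
    return [[estimate(m, v) for v in variant_sets] for m in models]
```

The spread of the resulting grid reflects uncertainty from both the model fit and the variant sample at once.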

files

  • age_modeling.R
    Loads mutation data and builds the probabilistic model for estimating parental ages
  • analyze_main.R
    Analysis script for the main plots; depends on age_modeling.R and plot_helper.R
  • analyze_populations.R
    Analysis script for separate continental human populations
  • calculate_SSE.R
    Calculate the sum of squared error (SSE) for generation time estimates
  • cross_validation.R
    Short script for calculating sample variance SSE
  • plot_helper.R
    Auxiliary scripts for shaping output and plotting
  • recombination_analysis.R
    Investigation of connection between recombination rate and mutation spectrum
  • sim_famvariance.R
    Simulate variance in parental ages and calculate SSE
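
For reference, the sum of squared error mentioned for several scripts above has the usual form, shown here as a short Python sketch. Which estimates are compared against which reference values in the repository is not specified here, so the argument names are placeholders.

```python
def sse(estimates, reference):
    """Sum of squared differences between paired values, for example
    per-bin generation time estimates and their reference values."""
    if len(estimates) != len(reference):
        raise ValueError("inputs must have the same length")
    return sum((e - r) ** 2 for e, r in zip(estimates, reference))
```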