Skills
Phylogenetics
- Managed DNA and amino acid (AA) multiple sequence alignments, including reformatting, extracting individual genes or sequences, filtering sites or taxa, and mapping SNPs to reference genomes
- Developed phylogenetic pipelines using current best practices including:
  - Data selection and data filtering
  - Identification of appropriate phylogenetic methods and parameters
  - Model assessment and tests of model adequacy
  - Tree estimation
  - Tests of tree topology and comparison of tree topologies
- Estimated phylogenetic trees including:
  - Maximum parsimony
  - Maximum likelihood tree estimation with current best practices (in IQ-Tree, RAxML, FastTree)
  - Gene tree estimation with current best practices (in IQ-Tree, RAxML)
  - Summary tree estimation with current best practices (in ASTRAL)
Data science
- Developed workflows for data aggregation and formatting, including sanity checks and data conversion
- Performed exploratory data analysis, statistics, and data visualisation on hundreds of datasets in multiple disciplines including genetics, phylogenetics, ecology, climate science, and geography
- Wrote executable programs to automate repetitive tasks for myself and others, such as applying statistical tests, aggregating and summarising data, managing files, processing text, and calling software from the command line
Statistics and experimental design
- Designed experiments considering: controls and randomisation; available time and manpower; specific research questions; statistical analyses; and the scientific literature
- Selected and implemented appropriate statistical tests
- Developed custom implementations of statistical tests depending on experimental design and requirements, such as custom bootstraps and parametric bootstraps
Software programming
- Fluent in R and Python
- Experienced with Julia and Bash
- Applied R and Python software packages for applications such as statistics, data conversion and cleaning, data analysis, data visualisation, bioinformatics, population biology, spatial analysis, climate modelling, niche modelling, evolutionary biology, genetics, and phylogenetics
- Completed 6 research projects in R and 2 in Python, each with hundreds of lines of code and dozens of functions, covering a range of biological disciplines from phylogenetics to climate modelling to population genetics
- Created and documented custom bioinformatics or data science pipelines incorporating custom programming scripts
- Incorporated existing software into custom pipelines, including Microsoft Excel and programs for phylogenetics, genetics, recombination detection, data formatting and data conversion, and statistics
- Ran custom scripts and pipelines on HPC systems, with Slurm or PBS scheduling
- Extracted data from online databases or servers via GUIs and API calls
Documentation and record-keeping
- Collaborated on research projects via GitHub, following best practices
- Summarised data from years of lab notebooks and provided thorough documentation for future researchers, including data visualisations to enable planning for field work and lab experiments
- Thoroughly documented all code to allow replication or adaptation