Statistical genomics and epidemiology

Large biological datasets do not become evidence by scale alone.

Statistical genomics depends on the chain between study design, cohort structure, population genetics, molecular data, disease biology, uncertainty, and reproducible inference. My work builds that chain across whole-genome sequencing (WGS) statistics, rare variant analysis, multi-omics, clinical cohorts, and population-scale genomic data.

What this work resolves

Cohort evidence

Genetic association studies need more than variant filtering. Cohort structure, ancestry, phenotype definition, missingness, and test design determine whether the result can be trusted.

Rare variant signal

Rare variant analyses need pathway, gene, variant, and burden logic that remains interpretable when events are sparse and uncertainty is large.
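As an illustration of burden logic under sparse events, the following is a minimal sketch of a collapsing burden test: rare variants in one gene are collapsed to a per-sample carrier flag, and carriers are compared between cases and controls with a one-sided Fisher's exact test. The function names and the toy genotype matrices are hypothetical, chosen for this example only, and the sketch ignores covariates, ancestry, and variant weighting that a real analysis would need.

```python
from math import comb

def collapse_carriers(genotypes):
    """Collapse per-variant allele counts to one carrier flag per sample.

    genotypes: list of samples, each a list of rare-allele counts (0, 1, 2)
    for the qualifying variants in one gene (hypothetical input layout).
    """
    return [1 if any(g > 0 for g in sample) else 0 for sample in genotypes]

def fisher_one_sided(a, b, c, d):
    """One-sided Fisher's exact p-value for enrichment of carriers in cases.

    Table layout: a = case carriers, b = case non-carriers,
                  c = control carriers, d = control non-carriers.
    Returns P(case carriers >= a) under the hypergeometric null.
    """
    n_cases = a + b
    n_carriers = a + c
    n_total = a + b + c + d
    denom = comb(n_total, n_cases)
    upper = min(n_cases, n_carriers)
    tail = sum(
        comb(n_carriers, k) * comb(n_total - n_carriers, n_cases - k)
        for k in range(a, upper + 1)
    )
    return tail / denom

def burden_test(case_genotypes, control_genotypes):
    """Return the 2x2 carrier table and the one-sided exact p-value."""
    case_flags = collapse_carriers(case_genotypes)
    control_flags = collapse_carriers(control_genotypes)
    a = sum(case_flags)
    b = len(case_flags) - a
    c = sum(control_flags)
    d = len(control_flags) - c
    return (a, b, c, d), fisher_one_sided(a, b, c, d)

# Toy data (illustrative only): 4 cases and 4 controls, 3 rare variants in one gene.
cases = [[0, 1, 0], [0, 0, 0], [1, 0, 0], [0, 0, 2]]
controls = [[0, 0, 0], [0, 0, 0], [0, 1, 0], [0, 0, 0]]

table, p_value = burden_test(cases, controls)
print(table, round(p_value, 4))
```

With so few carriers the exact p-value stays large even though three of four cases carry a variant, which is the sparse-event uncertainty the paragraph above refers to.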

Multi-omic interpretation

DNA, RNA-seq, proteomics, metabolomics, and clinical data need a statistical layer that connects molecular signal to disease biology and patient stratification.

Reusable inference

Population evidence becomes more valuable when analyses are reproducible, versioned, visualised clearly, and reusable beyond the first manuscript or internal report.

Evidence at a glance

>100,000 whole genomes analysed in population-scale and biobank-scale settings
Up to 5,000 participants in translational infectious, inflammatory, and multi-omic cohort studies
>1,000 children in clinical genomic, multi-omic, and EHR-linked analytical workflows
>100 TB biomedical data handled through secure, reproducible analytical infrastructure

Portfolio samples

Archipelago. Variant set association visualisation for complex genomic studies.

QuantCalc. Probabilistic genomic interpretation using priors, observed evidence, and Bayesian inference.
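To show the general idea of combining a prior with observed evidence, here is a minimal sketch of a Bayesian update on the odds scale: prior odds are multiplied by one likelihood ratio per independent piece of evidence, then converted back to a probability. This is a generic illustration and not a description of how QuantCalc itself is implemented; the function name, the prior of 0.1, and the likelihood ratios are assumptions made for the example.

```python
def posterior_probability(prior, likelihood_ratios):
    """Update a prior probability with independent likelihood ratios.

    prior: prior probability that the hypothesis is true (0 < prior < 1).
    likelihood_ratios: iterable of P(evidence | H) / P(evidence | not H),
        assumed conditionally independent (a simplifying assumption).
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    odds = prior / (1.0 - prior)       # probability -> odds
    for lr in likelihood_ratios:
        if lr <= 0.0:
            raise ValueError("likelihood ratios must be positive")
        odds *= lr                     # Bayes' rule on the odds scale
    return odds / (1.0 + odds)         # odds -> probability

# Illustrative values only: a 10% prior and two supporting observations.
posterior = posterior_probability(0.1, [9.0, 2.0])
print(round(posterior, 3))
```

Working on the odds scale keeps each piece of evidence as a single multiplicative term, so it is easy to see which observation moved the result and by how much.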

Methods, standards, and systems

Statistical genetics

GWAS, rare variant association testing, gene-level analysis, pathway analysis, burden tests, population stratification, ancestry inference, and cohort QC.

Biostatistics and inference

Regression, multivariate analysis, Bayesian inference, uncertainty quantification, Monte Carlo methods, resampling, high-dimensional modelling, and interpretable visualisation.

Omics integration

Whole-genome sequencing, RNA-seq, proteomics, metabolomics, phenotype-linked molecular data, EHR-linked workflows, and patient-level molecular interpretation.

Computational delivery

R, Python, Bash, Git, Linux/Unix, high-performance computing, AWS, Azure, Docker, Singularity/Apptainer, Nextflow, Snakemake, and reproducible analytical frameworks.

Evidence outputs

Manhattan plots, regional association plots, variant set visualisation, structured reports, machine-readable outputs, cohort summaries, and decision-ready statistical figures.

Scientific communication

Lead-author manuscripts, analytical reports, cross-functional presentations, methods documentation, multidisciplinary collaboration, and supervision in genomics and bioinformatics.

Selected publications

ORCID iD: 0000-0001-8496-3725

Archipelago method for variant set association test statistics. Genetic Epidemiology, 50(1), e70025, 2026.

Predicting the occurrence of variants in RAG1 and RAG2. Journal of Clinical Immunology, 39(7), 688-701, 2019.

Viral genetic determinants of prolonged respiratory syncytial virus infection among infants in a healthy term birth cohort. The Journal of Infectious Diseases, 227(10), 1194-1202, 2022.

Prevalence and clinical challenges among adult primary immunodeficiency patients with RAG deficiency. Journal of Allergy and Clinical Immunology, 2018.

Relevant experience

3 years (2023 to present) · Universitäts-Kinderspital Zürich

Genome-wide and multi-omic analysis in paediatric disease

Lead computational and translational analyses within the Swiss Pediatric Sepsis Study and the SwissPedHealth National Data Stream, funded at approximately CHF 1M and CHF 5M respectively, across three hospitals.

WGS, RNA-seq, proteomics, metabolomics, clinical phenotypes, EHR-linked data, approximately 1,000 children, more than 100 TB of biomedical data, rare variant and gene-level analysis.

5 years (2018 to 2023) · EPFL Global Health Institute

Statistical genetics across translational cohorts

Developed statistical genetics, computational biology, and multi-omic workflows across sepsis, tuberculosis, asthma, infectious disease, and chronic inflammatory disease studies.

Cohorts up to 5,000 participants, more than CHF 3M in competitive research support, population statistical genetics, rare variant testing, Bayesian reasoning, reproducible computational workflows.

3 years (2015 to 2018) · University of Leeds, School of Medicine

Human genetics in severe immune disease

Led genomic discovery and clinical interpretation in rare immune disease within a translational hospital setting.

Approximately 500 severe clinical cases, genome sequencing, molecular biology, immune-mediated disease, treatment-related findings, and disease mechanism interpretation.

Working fit

Statistical genomics teams need analyses that support replication, review, and reuse. The useful output is not only a significant result, but a defensible chain from cohort definition to interpretable evidence.