Biostatistics & Bioinformatics Shared Resource Facility (BB SRF)

BB SRF Overview

The mission of the Biostatistics and Bioinformatics Shared Resource Facility (BB SRF) is to apply statistical principles to ensure rigor and enhance the execution of scientific research through a team science model of collaboration with MCC investigators. To this end, our primary goal is to provide centralized, cutting-edge and accessible expertise in biostatistics and bioinformatics to support all phases of cancer studies, from study development and implementation through post-study analyses.

The BB SRF is a cancer center-managed shared resource staffed by highly experienced personnel with diverse skill sets. The BB SRF received an Exceptional rating in the NCI’s most recent review in 2018 and continues to provide value-added support to the mission of the Markey Cancer Center at the University of Kentucky.

Specific Aims

  1. Provide statistical expertise and consultation in study design, study conduct and analysis across the spectrum of projects from MCC Research Programs.
  2. Provide high-quality bioinformatics expertise focused on study design and data analysis across the spectrum of projects from MCC Research Programs.
  3. Enhance MCC research through a team science model along with utilization of unique processes for interfacing across MCC Shared Resources.

Biostatistical Services*
Assistant Director: Brent Shelton, PhD

  • Study planning, peer-reviewed and pilot grants
  • Clinical trial design and implementation*
  • Statistical analysis
  • Statistical programming and quality control
  • Mentoring, training and education

*Integrated with CRI SRF, MCC CRO, MCC PMC

Bioinformatics Services*
Assistant Director: Chi Wang, PhD

  • Study planning, power, sample size
  • Focus on data processing and data analysis methods
    • Next generation sequencing (DNAseq, RNAseq, scRNAseq, snATACseq, ChIPseq, ATACseq, WGBS, metagenomic sequencing, Visium)
    • Metabolomics
    • Digital spatial profiling
    • Genomic data mining
    • ‘Omics integration
  • Mentoring, training, education

*Inter-SRF workflows with BPTP, CRI, RM, OG

The BB SRF uses REDCap to manage project requests. To request BB SRF services, complete the REDCap project request form linked below.

BB SRF REDCap Project Request Form


Heidi L. Weiss, PhD
BB SRF Director


Brent J. Shelton, PhD
BB SRF Assistant Director of Biostatistics

Chi Wang, PhD
BB SRF Assistant Director of Bioinformatics

Li Chen, PhD
Associate Professor

Bin Huang, DrPH

Donglin Yan, PhD
Assistant Professor

Jinpeng Liu, PhD
Assistant Professor

Hunter Moseley, PhD

Feitong Lei, PhD
Assistant Professor

Rani Jayswal, MS
Statistician Senior

Daheng He, PhD
Statistician Principal

Andrew Shearer, MS

Todd Weiss, MSPH
Statistician Senior

Lauren Corum, MPH

Ashley Peter
Administrative Operations Coordinator

Shouyi Liang
Graduate Research Assistant


Not Pictured

  • Bioinformatics Analyst: Ryan Goettl, MS
  • Bioinformatics Analyst: Abu Saleh Mosa Faisal, MS
  • Bioinformatics Analyst: Jinge Liu, PhD, MS
  • Graduate Research Assistant: Ning Li, MS
  • Graduate Research Assistant: Kun Liu

For more information, contact

Ben F. Roach Cancer Building

800 Rose St. 
Lexington KY 40536-0093 
Building #93 
Registrar Code: RPCA
Google Map
Interactive Campus Map (#93)

We also have a facility in the same location as the UK Cancer Control Program.

2365 Harrodsburg Road, Suite A230 
Lexington KY 40504-3381

To ensure that the BB SRF supports high-quality research, its top three priorities are: 1) investigators with peer-reviewed funded studies; 2) investigators with non-peer-reviewed funded studies who are applying for peer-reviewed grants with proposed BB SRF funding; and 3) investigators conducting MCC pilot studies and IITs.

Consultation and support for grant applications and clinical trial development are provided free of charge. The BB SRF also provides limited preliminary data analysis to formulate new hypotheses for proposals to be submitted for funding. More extensive support should be funded by budgeting BB SRF effort on individual grants.

  1. Wu J, Chen L, Wei J, Weiss H, Miller RW, Villano JL. Phase II trial design with growth modulation index as the primary endpoint. Pharm Stat 18:212-222, 2019. PMCID: PMC9335177
  2. Wu J, Chen L, Wei J, Weiss H, Chauhan A. Two-stage phase II survival trial design. Pharm Stat. 2020 May;19(3):214-229. doi: 10.1002/pst.1983. Epub 2019 Nov 21. PMID: 31749311.
  3. Wu J, Wei J. Cancer immunotherapy trial design with delayed treatment effect. Pharm Stat. 2020 May;19(3):202-213. doi: 10.1002/pst.1982. Epub 2019 Nov 15. PMID: 31729149.
  4. Wu J, Wei J. Cancer immunotherapy trial design with random delayed treatment effect and cure rate. Stat Med. 2022 Feb 20;41(4):786-797. doi: 10.1002/sim.9258. Epub 2021 Nov 15. PMID: 34779534.
  5. Wei J, Wu J. Cancer immunotherapy trial design with cure rate and delayed treatment effect. Stat Med. 2020 Mar 15;39(6):698-708. doi: 10.1002/sim.8440. Epub 2019 Nov 26. PMID: 31773770.
  6. Tian S, Wang C, Chang HH, Sun J. Identification of prognostic genes and gene sets for early-stage nonsmall cell lung cancer using bi-level selection methods. Scientific Reports, 7:46164, 2017. 
  7. Huang Z†, Chen L and Wang C*.  Classifying lung adenocarcinoma and squamous cell carcinoma using RNA-Seq Data. Cancer Studies and Molecular Medicine. 2017 Sep; Volume 3: Issue 2. 
  8. Tian S, Wang C, Chang HH. A longitudinal feature selection method identifies relevant genes to distinguish complicated injury and uncomplicated injury over time. BMC Medical Informatics and Decision Making,18(5):115, 2018.
  9. Tian S, Wang C, Chang HH. To select relevant features for longitudinal gene expression data by extending a pathway analysis method. F1000Research. 2018;7.
  10. Li Y†, Fan TWM, Lane AN, Kang WK, Arnold SM, Stromberg AJ, Wang C*, Chen L. SDA: A semi-parametric differential abundance analysis method for metabolomics and proteomics data. BMC Bioinformatics, 2019 Dec 1;20(1):501. 
  11. Wang M†, Yu T, Liu J, Chen L, Stromberg AJ, Villano JL, Arnold SM, Liu C, Wang C*. A probabilistic method for leveraging functional annotations to enhance estimation of the temporal order of pathway mutations during carcinogenesis. BMC bioinformatics. 2019 Dec;20(1):1-2. 
  12. Tian S, Wang C*. Feature Selection for Longitudinal Data by Using Sign Averages to Summarize Gene Expression Values over Time. Biomed Res Int. 2019;2019:1724898. doi: 10.1155/2019/1724898. eCollection 2019. 
  13. Tian S, Wang C, Wang B. Incorporating Pathway Information into Feature Selection towards Better Performed Gene Signatures. Biomed Res Int. 2019;2019:2497509. doi: 10.1155/2019/2497509. eCollection 2019. Review. 
  14. Huang Z†, Lane AN, Fan TW, Higashi RM, Weiss HL, Yin X, Wang C*. Differential Abundance Analysis with Bayes Shrinkage Estimation of Variance (DASEV) for Zero-Inflated Proteomic and Metabolomic Data. Scientific Reports. 2020 Jan 21;10(1):1-2. 
  15. Tian S, Wang C, Suarez-Farinas M. GEE-TGDR: A longitudinal feature selection algorithm and its application to lncRNA expression profiles for psoriasis patients treated with immune therapies. BioMed Research International, 2021;2021:8862895.
  16. Tian S, Wang C*. An ensemble of the iCluster method to analyze longitudinal lncRNA expression data for psoriasis patients. Human Genomics 2021 Apr 20;15(1):23.
  17. Liu S†, Liu J, Xie Y, Zhai T, Hinderer EW, Stromberg AJ, Vanderford NL, Kolesar JM, Moseley HNB, Chen L, Liu C, Wang C*. MEScan: a powerful statistical framework for genome-scale mutual exclusivity analysis of cancer mutations. Bioinformatics. 2021 Jun 9;37(9):1189-1197. 
  18. Huang Z, Wang C. A Review on Differential Abundance Analysis Methods for Mass Spectrometry-Based Metabolomic Data. Metabolites. 2022 Apr;12(4):305.
  19. Yang Y, Shelton BJ, Tucker TC, Li L, Kryscio RJ, Chen L (corresponding author). Estimation of exposure distribution adjusting for association between exposure level and detection limit. Statistics in Medicine. 2017 Aug 15;36(18):2935-2946.
  20. Wu J, Chen L, Wei J, Weiss H, Miller RW, Villano JL. Phase II trial design with growth modulation index as the primary endpoint. Pharmaceutical Statistics 18(2):212-222, 2019.
  21. Li Y, Fan TWM, Lane AN, Kang WK, Arnold SM, Stromberg AJ, Wang C, Chen L (corresponding author). SDA: A semi-parametric differential abundance analysis method for metabolomics and proteomics data.  BMC Bioinformatics 20(1):501, 2019. 
  22. Wang M, Yu T, Liu J, Chen L, Stromberg A, Villano J, Liu C, Wang C. A Probabilistic Method for Leveraging Functional Annotations to Enhance Estimation of the Temporal Order of Pathway Mutations during Carcinogenesis. BMC Bioinformatics. 20(1):620, 2019.
  23. Liu S, Liu J, Xie Y, Zhai T, Hinderer EW, Stromberg AJ, Vanderford NL, Kolesar JM, Moseley HNB, Chen L, Liu C, Wang C. MEScan: A Powerful Statistical Framework for Genome-Scale Mutual Exclusivity Analysis of Cancer Mutations. Bioinformatics. 2021 Jun 9;37(9):1189-1197.
  24. A'mar T, Beatty JD, Fedorenko C, Markowitz D, Corey T, Lange J, Schwartz SM, Huang B, Chubak J, Etzioni R. Incorporating Breast Cancer Recurrence Events Into Population-Based Cancer Registries Using Medical Claims: Cohort Study. JMIR Cancer. 2020 Aug 17;6(2):e18143.
  25. Huang B, Pollock E, Zhu L, Athens JP, Gangnon R, Feuer EJ, Tucker TC. Ranking Composite Cancer Burden Indices for Geographic Regions: Point and Interval Estimates. Cancer Causes Control. 2018 Feb;29(2):279-287. 
  26. Warren JL, Benner S, Stevens J, Enewold L, Huang B, Zhao L, Tilahun N, Bradley CJ. Development and Evaluation of a Process to Link Cancer Patients in the SEER Registries to National Medicaid Enrollment Data. J Natl Cancer Inst Monogr. 2020 May 1;2020(55):89-95. 


Statistical and Bioinformatics Tools and Software

Our faculty and staff have extensive experience and expertise with many statistical software packages to help develop studies, calculate sample sizes with adequate power, and perform data analyses for MCC investigators across all three Research Programs. We are happy to collaborate with researchers on a project that may require specialized software. Our statistical package experience includes (but is not limited to):

  • General Purpose: SAS, SUDAAN, SPSS, Minitab, StatXact, LogXact, Stata
  • Design and Sample Size: nQuery Advisor, NCSS and PASS 2022, EAST, EAST SURV, EAST Adapt
  • Adaptive Designs: ExpDesign Studio, ADDPLAN
  • Shareware: R, SaTScan for spatial analyses, WinBUGS, MD Anderson Biostatistics Software
  • Bioinformatics: FastQC, SAMtools, Picard, Cutadapt, Trimmomatic, BBDuk, CellRanger, Seurat, Scanpy, ArchR, CellChat, MapSplice2, STAR, BWA, Bowtie1, Bowtie2, TopHat, HTSeq, RSEM, DESeq2, edgeR, DSS, GSEA, GOseq, IPA, DAVID, GATK, MuTect1, MuTect2, VarScan2, GISTIC2, MutSigCV, PhyloWGS, Funcotator, Oncotator, SnpEff, ANNOVAR, Bismark, methylKit, MACS2, deepTools, DiffBind, HOMER, Kraken2, Bracken, HUMAnN3, MetaPhlAn3, GraPhlAn, QIIME2, mothur, limma, TCGAbiolinks, lme4, NanoStringDiff, maftools, affy, oligo, globaltest, anota, virtualArray, sva, RPPanalyzer, minfi, ChAMP, NearestTemplatePrediction, RCircos.
  • Bioinformatics Databases: Single Cell Portal, Human Cell Atlas, Gene Expression Omnibus, Sequence Read Archive, Genomic Data Commons, cBioPortal, COSMIC, dbSNP and 1000 Genomes.
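
As a rough illustration of the kind of sample size calculation mentioned above, the following minimal Python sketch estimates the number of subjects per group needed to compare two means. It assumes a two-sided test, equal group sizes, a common known standard deviation, and the standard normal approximation. The function name and the example values are hypothetical and are not taken from any BB SRF tool. Actual study designs usually call for dedicated software and consultation with a statistician.

```python
import math
from statistics import NormalDist


def per_group_sample_size(delta, sigma, alpha=0.05, power=0.80):
    """Approximate per-group n for a two-sided, two-sample comparison of means.

    Uses the normal-approximation formula
        n = 2 * (z_{1-alpha/2} + z_{power})^2 * sigma^2 / delta^2
    where delta is the difference in means to detect and sigma is the
    common standard deviation (both are illustrative inputs).
    """
    if delta == 0:
        raise ValueError("delta must be non-zero")
    if not (0 < alpha < 1 and 0 < power < 1):
        raise ValueError("alpha and power must lie strictly between 0 and 1")

    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)  # critical value of the test
    z_power = standard_normal.inv_cdf(power)          # quantile for the target power

    n = 2 * ((z_alpha + z_power) * sigma / delta) ** 2
    return math.ceil(n)  # round up to a whole subject


if __name__ == "__main__":
    # Hypothetical example: detect a 5-unit difference with SD 10.
    print(per_group_sample_size(delta=5.0, sigma=10.0))              # 80% power
    print(per_group_sample_size(delta=5.0, sigma=10.0, power=0.90))  # 90% power
```

Because the sketch uses the normal approximation rather than the t distribution, it may slightly understate the sample size for small studies.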






A semiparametric method for differential abundance/expression analysis of proteomic, metabolomic and scRNAseq data



Delineate the temporal order of driver mutations during carcinogenesis



Bayesian shrinkage estimation of variance for differential abundance analysis of proteomic and metabolomic data



Identify cancer driver mutations by genome-wide screen of mutually exclusive mutation patterns



Categorizing Gene Ontology into subgraphs of user-defined emergent concepts




Quality Control, Deposition, and Curation of the Metabolomics Workbench Data Repository




Parsing and analyzing electron density maps data available from the worldwide Protein Data Bank



Scripts for creating and updating the Metabolomics Workbench File Status Website



R package for information-content-informed Kendall Tau correlation coefficients.



Python package for calculating information-content-informed Kendall Tau correlation coefficients.



Peak characterization of Fourier-transform mass spectrometry data



Reading and writing of Chemical Table file (CTfile) formats



Meta-analysis of high-throughput datasets using enriched feature annotations instead of just the features themselves



Facilitates reading and writing NMR-STAR formatted files



Enumerates isotopically resolved InChI (International Chemical Identifier) for metabolites.



Moiety model representation, model optimization, and model selection.



Solving boundary-value inverse problem based on a simulated annealing and genetic algorithm



k-mer based gene count algorithm to support efficient UMI counts from single cell RNA-seq data



Arbitrary sequence query against large collections of RNA-seq experiments



Taxonomic classification of metagenomic sequences


  • bacr is an R package developed by Dr. Chi Wang for implementing the Bayesian Adjustment for Confounding (BAC) method for estimating the average causal effect of a treatment on an outcome from cohort studies.
  • NanoStringDiff is an R/Bioconductor package developed by Dr. Chi Wang to perform differential expression analysis based on gene expression data generated from the NanoString nCounter system. A user-friendly web application, NanoStringDiffWeb, is also available.
  • “paf” R package: Calculates the unadjusted/adjusted attributable fraction function of a set of covariates for a censored survival outcome from a Cox model using the method proposed by Chen, Lin and Zeng (Biometrika 97, 713-726, 2010).
  • “KENDL” R package: Calculates the kernel-smoothed nonparametric estimator for the exposure distribution in the presence of detection limits using the method proposed by Yang et al. (Stat. Med. 36(18), 2935-2946, 2017).

Acknowledge the BB SRF

Investigators are required to acknowledge the Markey Cancer Center Biostatistics and Bioinformatics Shared Resource Facility (BB SRF) in any publications that result from the use of biostatistics, bioinformatics or information received through the MCC BB SRF. Please use the following statement to acknowledge the BB SRF. 

“This research was supported by the Biostatistics and Bioinformatics Shared Resource Facility of the University of Kentucky Markey Cancer Center (P30CA177558).”

Markey Cancer Center is designated by the National Cancer Institute as a Comprehensive Cancer Center, a distinction that recognizes our commitment to accelerating precision cancer research and patient care. We are the first and only NCI-designated Comprehensive Cancer Center in Kentucky, and one of 57 in the nation.