Author Identifier (ORCID)
Guan Tay: https://orcid.org/0000-0003-1035-5692
Abstract
Accurate species identification from Whole Genome Sequencing (WGS) data remains challenging, particularly for closely related species such as sheep (Ovis aries) and goats (Capra hircus). Through analysis of mapping quality metrics and Kraken2 taxonomic classification of 40 WGS sheep and goat samples, we demonstrate that conventional approaches yield ambiguous results, with overlapping alignment rates and inconclusive taxonomic assignments. We present a robust comparative genomic approach that uses species-specific genomic regions to distinguish these species in WGS samples. We define species-specific regions as those exhibiting a distinctive coverage pattern: average coverage when samples are aligned to their matching reference genome, but absent or low coverage when aligned to a non-matching reference. By analyzing WGS data from both species aligned to both reference genomes, we identified 155,800 goat-specific and 1,714,126 sheep-specific regions. After curation, 10 high-confidence regions per species were selected, achieving 100% accuracy on the analyzed validation datasets, which comprised 14 independent samples. This approach provides reliable species verification for WGS data and establishes a framework that could be extended to other closely related species. To facilitate adoption, the analysis scripts and curated genomic regions are available in our GitHub repository (https://github.com/BTC-Lab/Goat_Sheep_specific_regions).
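The coverage-based definition in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, which lives in the GitHub repository above. The function names (`is_species_specific`, `call_species`), the depth thresholds, and the majority-vote rule are all hypothetical and chosen only for illustration. The abstract does not state the cutoffs actually used.

```python
def is_species_specific(matching_cov, nonmatching_cov, genome_avg,
                        min_ratio=0.5, max_ratio=2.0, low_cutoff=1.0):
    """Return True if a region shows the species-specific coverage pattern.

    matching_cov    : per-sample mean depth in the region for samples of the
                      same species as the reference genome.
    nonmatching_cov : per-sample mean depth in the region for samples of the
                      other species, aligned to the same reference.
    genome_avg      : typical genome-wide depth (assumed equal across samples
                      here for simplicity).
    All thresholds are illustrative assumptions, not values from the paper.
    """
    # Matching-species samples should show roughly average coverage.
    near_average = all(
        min_ratio * genome_avg <= c <= max_ratio * genome_avg
        for c in matching_cov
    )
    # Non-matching samples should show absent or low coverage.
    absent_or_low = all(c <= low_cutoff for c in nonmatching_cov)
    return near_average and absent_or_low


def call_species(goat_region_cov, sheep_region_cov, low_cutoff=1.0):
    """Call a sample's species from depth in curated specific regions.

    goat_region_cov  : sample depth in each curated goat-specific region.
    sheep_region_cov : sample depth in each curated sheep-specific region.
    Uses a simple majority of covered regions (an assumed decision rule).
    """
    goat_hits = sum(c > low_cutoff for c in goat_region_cov)
    sheep_hits = sum(c > low_cutoff for c in sheep_region_cov)
    if goat_hits > sheep_hits:
        return "goat"
    if sheep_hits > goat_hits:
        return "sheep"
    return "undetermined"
```

In practice the per-region depths would come from a coverage tool run on the aligned reads. The sketch takes them as plain lists so that the decision logic stays visible.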
Keywords
Genomic regions, goat, next-generation sequencing, sheep, species identification, species-specific
Document Type
Journal Article
Date of Publication
12-1-2026
Volume
16
Issue
1
PubMed ID
41748869
Publication Title
Scientific Reports
Publisher
Nature
School
School of Medical and Health Sciences
Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License.
Comments
Marzouka, N. a. D., Al-Aamri, A., Alshamsi, F., Khalili, M., Chehadeh, S. E. H., Mohamed, M. S., Eltahir, Y. M., Koliyan, R., Abdelhalim, M. M., Attia, A., Mousa, M., Tay, G., & Alsafar, H. (2026). A genomic approach for accurate identification of closely related species with next-generation sequencing samples. Scientific Reports, 16. https://doi.org/10.1038/s41598-026-41497-0