Author Identifier (ORCID)

Guan Tay: https://orcid.org/0000-0003-1035-5692

Abstract

Accurate species identification from whole-genome sequencing (WGS) data remains challenging, particularly for closely related species such as sheep (Ovis aries) and goats (Capra hircus). Through analysis of mapping quality metrics and Kraken2 taxonomic classification of 40 WGS sheep and goat samples, we demonstrate that conventional approaches yield ambiguous results, with overlapping alignment rates and inconclusive taxonomic assignments. We present a robust comparative genomic approach that uses species-specific genomic regions to distinguish these species in WGS samples. We define species-specific regions as those exhibiting a distinctive coverage pattern: average coverage when samples are aligned to their matching reference genome, but absent or low coverage when aligned to the non-matching reference. By analyzing WGS data from both species aligned to both reference genomes, we identified 155,800 goat-specific and 1,714,126 sheep-specific regions. After curation, 10 high-confidence regions per species were selected, achieving 100% accuracy on validation datasets comprising 14 independent samples. This approach provides reliable species verification for WGS data and establishes a framework that could be extended to other closely related species. To facilitate adoption, the analysis scripts and curated genomic regions are available in our GitHub repository (https://github.com/BTC-Lab/Goat_Sheep_specific_regions).
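
The coverage-based definition in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation (their scripts are in the repository linked above). The function names, the thresholds `tol` and `low_frac`, and the rule of comparing mean coverage across the two region sets are all assumptions made for illustration only.

```python
from statistics import mean


def is_species_specific(cov_matching, cov_non_matching, genome_avg,
                        tol=0.5, low_frac=0.1):
    """Flag a region as species-specific from its mean coverage.

    cov_matching:     coverage when the sample is aligned to its own species' reference
    cov_non_matching: coverage when the same sample is aligned to the other reference
    genome_avg:       genome-wide average coverage of the sample
    tol, low_frac:    hypothetical thresholds, not values from the paper
    """
    # "Average coverage" on the matching reference: within tol of the genome-wide mean.
    near_average = abs(cov_matching - genome_avg) <= tol * genome_avg
    # "Absent/low coverage" on the non-matching reference.
    low_elsewhere = cov_non_matching <= low_frac * genome_avg
    return near_average and low_elsewhere


def call_species(sheep_region_cov, goat_region_cov):
    """Assumed decision rule: the species whose specific regions are better covered.

    sheep_region_cov: coverages of the sample at sheep-specific regions
    goat_region_cov:  coverages of the sample at goat-specific regions
    """
    sheep_signal = mean(sheep_region_cov)
    goat_signal = mean(goat_region_cov)
    if sheep_signal == goat_signal:
        return "ambiguous"
    return "sheep" if sheep_signal > goat_signal else "goat"
```

In practice, the per-region coverage values would come from a depth or coverage tool run on the aligned reads; how the published pipeline computes and thresholds them is not stated in the abstract.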

Keywords

Genomic regions, goat, next-generation sequencing, sheep, species identification, species-specific

Document Type

Journal Article

Date of Publication

12-1-2026

Volume

16

Issue

1

PubMed ID

41748869

Publication Title

Scientific Reports

Publisher

Nature

School

School of Medical and Health Sciences

Creative Commons License

Creative Commons Attribution-Noncommercial-No Derivative Works 4.0 License

Comments

Marzouka, N. a. D., Al-Aamri, A., Alshamsi, F., Khalili, M., Chehadeh, S. E. H., Mohamed, M. S., Eltahir, Y. M., Koliyan, R., Abdelhalim, M. M., Attia, A., Mousa, M., Tay, G., & Alsafar, H. (2026). A genomic approach for accurate identification of closely related species with next-generation sequencing samples. Scientific Reports, 16. https://doi.org/10.1038/s41598-026-41497-0

Link to publisher version (DOI)

10.1038/s41598-026-41497-0