From pixels to prognosis: A multi-modal attention-based framework for visceral adipose tissue estimation

Abstract

Obesity is a chronic disease that increases the risk of multi-organ damage as well as cardiovascular disease, diabetes, and certain cancers. It is strongly related to Visceral Adipose Tissue (VAT), the fat stored around the internal organs. New approaches to assessing VAT in large populations are essential to understand how obesity contributes to chronic disease progression. Various direct and indirect measures have been developed to quantify VAT. However, many of these techniques either fail to distinguish between different types of body fat (e.g., subcutaneous versus visceral) or rely on imaging that is high-radiation and/or costly (e.g., Computed Tomography). Each year, millions of people worldwide undergo hip or spine Dual-energy X-ray Absorptiometry (DXA) scans to screen for osteoporosis, as well as lateral spine (LS) scans to detect vertebral fractures. In this paper, we develop a multi-modal attention-based framework for VAT estimation from LS DXA scans and patient demographic information. We compare our results with baseline methods on two LS DXA datasets and perform a clinical analysis to demonstrate the effectiveness of the framework. The proposed approach has the potential to enable cost-effective, non-invasive, and efficient quantification of VAT in people undergoing bone density assessment with LS scans. To the best of our knowledge, this is the first paper to predict VAT from LS DXA scans.

Document Type

Conference Proceeding

Date of Publication

1-1-2026

Volume

15974 LNCS

Publication Title

Lecture Notes in Computer Science

Publisher

Springer

School

Centre for Artificial Intelligence and Machine Learning (CAIML) / School of Science / Nutrition and Health Innovation Research Institute

RAS ID

84393

Funders

National Health and Medical Research Council / Raine Medical Research Foundation / Western Australian Future Health Research and Innovation Fund

Grant Number

NHMRC Number: APP1183570

Comments

Maqsood, A., Saleem, A., Sim, M., Suter, D., Radavelli-Bagatini, S., Hodgson, J. M., Prince, R. L., Zhu, K., Leslie, W. D., Schousboe, J. T., Lewis, J. R., & Gilani, S. Z. (2025). From pixels to prognosis: A multi-modal attention-based framework for visceral adipose tissue estimation. In Lecture Notes in Computer Science (pp. 249–259). https://doi.org/10.1007/978-3-032-05182-0_25

Copyright

subscription content

First Page

249

Last Page

259

Link to publisher version (DOI)

10.1007/978-3-032-05182-0_25