Therapying outside the box: Innovating the implementation and evaluation of CBT in therapeutic artificial agents

Author Identifier

Sharjeel Tahir: https://orcid.org/0009-0008-9012-0490

Jumana Abu-Khalaf: https://orcid.org/0000-0002-6651-2880

Syed Afaq Ali Shah: https://orcid.org/0000-0003-2181-8445

Document Type

Conference Proceeding

Publication Title

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

Volume

15439 LNCS

First Page

203

Last Page

213

Publisher

Springer

School

School of Science

RAS ID

77622

Comments

Tahir, S., Abu-Khalaf, J., Shah, S. A. A., & Johnson, J. (2025). Therapying outside the box: Innovating the implementation and evaluation of CBT in therapeutic artificial agents. In M. Barhamgi, H. Wang, & X. Wang (Eds.), Web information systems engineering – WISE 2024 (pp. 203-213). Springer, Singapore. https://doi.org/10.1007/978-981-96-0573-6_15

Abstract

With the rise of sedentary lifestyles and burdensome work routines, mental health problems have grown rapidly in recent years. While many online therapy agents exist, most lack human-like cognitive capabilities. The objective of this study is to develop and analyze a framework for delivering and assessing Cognitive Behavioural Therapy (CBT), utilizing the sophisticated attributes of state-of-the-art large language models (LLMs). This paper presents our three key contributions: (A) implementation and evaluation of the efficacy of utilizing LLMs, such as Llama2, GPT-3.5, and GPT-4, on CBT data; (B) curation of real-world CBT conversations, gathered and annotated with the help of professionals in the mental health domain; and (C) a novel approach for evaluating the performance of AI-based CBT agents or chatbots. Our technique leverages widely used assessment scales from the fields of CBT, natural language processing (NLP), and computer vision. To improve the quality of CBT conversation generation in LLMs, we use a preference-based learning method that resembles reinforcement learning from human feedback (RLHF). By incorporating the novel evaluation scale alongside three widely used metrics (BLEU, PPL, and Distinct), we establish that the proposed model outperforms state-of-the-art LLMs; for instance, it achieves a BLEU score of 0.1739 compared to GPT-4's 0.1633.
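The sketch below is a minimal, hypothetical illustration (not the authors' code) of how the three standard text metrics named in the abstract (BLEU, Distinct-n, and perplexity) could be computed for a generated CBT reply. It assumes NLTK is available; the example utterances and per-token log-probabilities are placeholders, not data from the paper.

```python
# Minimal sketch of the three text metrics mentioned in the abstract.
# Assumes NLTK is installed; utterances and log-probabilities are hypothetical.
import math
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = "It sounds like that thought is making you anxious; what evidence supports it?".split()
candidate = "That thought seems to make you anxious; what evidence do you have for it?".split()

# BLEU: n-gram overlap between the generated reply and a reference reply.
bleu = sentence_bleu([reference], candidate,
                     smoothing_function=SmoothingFunction().method1)

# Distinct-n: ratio of unique n-grams to total n-grams, a lexical diversity measure.
def distinct_n(tokens, n):
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return len(set(ngrams)) / max(len(ngrams), 1)

# Perplexity (PPL): exponentiated average negative log-likelihood; the per-token
# log-probabilities below are placeholder values a language model would supply.
token_logprobs = [-2.1, -0.4, -1.7, -0.9, -1.2]
ppl = math.exp(-sum(token_logprobs) / len(token_logprobs))

print(f"BLEU: {bleu:.4f}  Distinct-1: {distinct_n(candidate, 1):.4f}  PPL: {ppl:.2f}")
```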

DOI

10.1007/978-981-96-0573-6_15

Access Rights

subscription content
