Behavioral Biometrics in Assisted Living: A Methodology for Emotion Recognition

Behavioral biometrics aim at providing algorithms for the automatic recognition of individual behavioral traits, stemming from a person’s actions, attitude, expressions and conduct. In the field of ambient assisted living, behavioral biometrics find an important niche. Individuals suffering from the early stages of neurodegenerative diseases (MCI, Alzheimer’s, dementia) need supervision in their daily activities. In this context, an unobtrusive system to monitor subjects and alert formal and informal carers providing information on both physical and emotional status is of great importance and positively affects multiple stakeholders. The primary aim of this paper is to describe a methodology for recognizing the emotional status of a subject using facial expressions and to identify its uses, in conjunction with pre-existing risk-assessment methodologies, for its integration into the context of a smart monitoring system for subjects suffering from neurodegenerative diseases. Paul Ekman’s research provided the background on the universality of facial expressions as indicators of underlying emotions. The methodology then makes use of computational geometry, image processing and graph theory algorithms for the detection of regions of interest and then a neural network is used for the final classification. Findings are coupled with previous published work for risk assessment and alert generation in the context of an ambient assisted living environment based on Service oriented architecture principles, aimed at remote web-based estimation of the cognitive and physical status of MCI and dementia patients.


I. INTRODUCTION
People older than 64 represent on average 18.5% of the total European population, and the age group over 84 is growing notably, as reported by Eurostat in 2015 [1]. Such a remarkable and steady increase of the older population in Europe requires innovative approaches to enable and maintain the independence, autonomy and quality of life of our elderly. An increasingly aging population also greatly affects formal and informal care providers. It is further associated with an increase in the number of people suffering from cognitive disorders, from mild cognitive impairment (MCI) to the range of dementia [2][3][4]. These conditions are very challenging from the outset, ranging from the problem of reaching and obtaining a diagnosis to the tailoring of appropriate care and support for the individual and their caregivers, both family and professional [5][6]. There is therefore a need for an innovative approach that explores different ways of working, with increased collaboration across disciplines (healthcare experts and technologists) and closer working partnerships between older people and their formal and informal caregivers [7][8][9].
Progress in Information and Communications Technology (ICT) in the past years has led to exciting advances with a direct effect on aspects of everyday life. Especially in modern healthcare, ICT has deeply permeated the current paradigm, providing new frameworks for telecare with comprehensive multi-stakeholder benefits for the individual subject and direct carers, local healthcare infrastructures and the healthcare system as a whole [10][11][12]. ICT provides frameworks that enable internet-based, decentralized, cost-effective and efficient services to patients. Ambient Assisted Living (AAL) systems are constantly on the cutting edge of ongoing research, aiming at improving home-based patients' quality of life, providing them with means to cope with their diseases, delaying institutionalization and alleviating the burden on family members [12][13][14]. The need for a paradigm shift from remedial to preventive measures in the field of assisted living is

www.etasr.com Xefteris et al.: Behavioral Biometrics in Assisted Living: A Methodology for Emotion Recognition
apparent and currently in its early stages. In this context, modern AAL systems comprise significant tools, providing better information processing, extraction of patterns, analysis and comparison of data for adverse event prediction, and patient profiling capabilities that improve subjects' lives by delaying hospitalization or institutionalization as much as possible. A systematic flow of information to and from the subject, along with identification of adverse events and assessment reports, leads to a more effective care system that is socially and financially beneficial to the subjects. Remote monitoring of patients under a risk prediction framework is crucial for shifting the current paradigm from remedial to preventive action [7][15][16][17][18][19].
Patient profiling, as well as activity recognition in the context of AAL systems, opens new routes and possibilities, aiming at actively and measurably aiding subjects to cope with their activities of daily life (ADLs) and their physical and psychological status. Supplemental to the regimented physiological measurements, profiling the subject's mood changes, emotional reactions to stimuli, attitudes and conduct can be of great aid to attending doctors and family members. Information can be acquired and processed not only to detect abnormalities but also to identify behavioral trends, in order to prevent both physical and psychological adverse events [17][20][21]. The next step in patient profiling is behavioral and activity monitoring as part of decision support implementations in the AAL context [22].
Within this context, biometrics, as a source rich in profiling information related to the physiological and behavioral characteristics of a person, form the basis for a very promising technology that can further boost and enhance patient profiling.
The strength of biometric traits lies in four fundamental characteristics [23]:
• Sufficient inter-person variability for distinctiveness
• Short-term temporal invariability
• Collectability/measurability
• Universality
In this paper, we discuss a methodology for emotion recognition targeting applications in the framework of an ambient assisted living system following Service Oriented Architecture (SoA) principles, making use of extracted biometric facial features of the subjects. In the case of emotion recognition, the methodology makes use of the latter two characteristics of biometric traits, i.e., ease of collection and measurability, and universality, with the added benefit of the non-invasive nature of the collection method. In our methodology, biometric information is not used to directly characterise the subject (explicit biometric profiling) but provides the background for the extraction of regions of interest on a subject's facial structure and the abstract generalisation of emotion as a geometric shape on a surface.

II. BIOMETRIC PROFILING IN AAL SYSTEMS
ICT applications are gaining more acceptance as solutions that can complement the caregiver's work by actively providing information and continuously monitoring and assisting subjects with reduced cognitive (or even physical) capacity. This is also augmented by the ease of installation in home environments of a multitude of sensors, from simple "binary" sensors to temperature and lighting sensors [24][25], and from movement trackers on floors [26] to more complex sensors that provide audio and video recordings [27].
We use biometric profiling information as a part of the whole patient profile to assist in the detection and prediction of physical and psychological adverse events. In this context researchers usually make use of biometric technologies such as voice analysis, gesture and gait classification and emotion recognition. Emotion recognition is the task of processing a specific stream of data in order to classify the underlying emotional state. It is easier to use basic (or archetypal) emotions such as joy, sadness, anger, fear, disgust and contempt [28]. In general, emotion recognition systems are built on three distinct levels: acquisition, feature extraction and interpretation [29]. In our case, the sensors used for biometric data capture are cameras, which have the advantage of being non-intrusive and cheap. Facial expressions, as the basis of emotion recognition, contain all the necessary information and are easily captured and analysed.
The design and implementation of monitoring systems to enable the self-management of neurodegenerative disease patients at home poses many issues. The focus of this approach lies on prevention and prediction of adverse events through the early identification of risk factors. Thus, our methodology aims to be integrated into a risk assessment and detection component that processes patient data and, if necessary, creates alerts for the clinicians. Such a component can be based on an existing set of "Rules" and implemented to process both physical and psychological data, as well as answers to cognitive and behavioral evaluation tests, to produce its warnings [30].
The main functionalities of such a component would include:
• Generate warnings for the attending physicians for situations where the patient's physiological and/or psychological measurements indicate an alarming condition for the patient's health.
• Compile automatic reports for the attending physicians on a scheduled regular basis.
• Generate warnings for the subject's caregivers in case of emergencies.
• Present evidence for the produced warnings.
Generation of warnings is based on an easily reconfigurable rule-based schema that attending clinicians can individualize according to each subject's personal needs, as seen in Figure 1.
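As an illustration only, a reconfigurable rule set of this kind could be represented as plain data plus a small evaluation loop. The sketch below is written in Python; the rule names, measurement keys and thresholds are hypothetical and are not taken from [30] or from Figure 1.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Rule:
    name: str        # hypothetical rule label shown to the recipient
    recipient: str   # e.g. "physician" or "caregiver"
    condition: Callable[[Dict[str, float]], bool]


def evaluate(rules: List[Rule], measurements: Dict[str, float]) -> List[dict]:
    """Return one warning per triggered rule, with the evidence attached."""
    warnings = []
    for rule in rules:
        if rule.condition(measurements):
            warnings.append({
                "rule": rule.name,
                "recipient": rule.recipient,
                "evidence": dict(measurements),
            })
    return warnings


# Hypothetical thresholds; in practice a clinician would tune them per subject.
rules = [
    Rule("low_heart_rate", "physician",
         lambda m: m.get("heart_rate", 60.0) < 45.0),
    Rule("persistent_sadness", "physician",
         lambda m: m.get("sadness_ratio", 0.0) > 0.6),
]

alerts = evaluate(rules, {"heart_rate": 72.0, "sadness_ratio": 0.75})
```

Keeping the rules as data means that a threshold can be changed for one subject without touching the evaluation code, which is the property the reconfigurable schema relies on.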

III. FACIAL EXPRESSIONS AND EMOTION
Faces are essentially very effective systems for the transmission of multiple signals and multiple messages [31]. They transmit voluntary and involuntary messages regarding emotions, mood, character, age, sex, race etc. When humans respond to a stimulus, internal or external, and feel something, their face responds accordingly. Muscles contract, wrinkles appear or disappear, mouths change shape, eyes open wide or close etc. Emotions as expressed by our faces are not static structures, but dynamic and short-lived. In this paper we concern ourselves with "emblematic" messages. The term "emblematic" is used to describe messages that have a clearly defined content, formed by specific and easily recognizable facial expressions [32]. Before venturing forth, we have to answer three specific questions: a) which emotions can be expressed by the human face? b) is the initial recognition accurate? and c) can we use emotion as a behavioral biometric trait?
A. Which emotions can be expressed?
Before proceeding to the technical aspects of automatic emotion recognition from the image of a human face, we must set concrete foundations, based on psychology and anthropology, centred on how emotion is pictured on the human face. Research on the expression of emotion goes back to the time, and the person, of Charles Darwin [33].
Since then there has been a plethora of research and hundreds of experiments, whose results have been described in the works of Dr. Paul Ekman [32][34][35]. According to this research, the human face can express (in a meaningful way for observers) six basic and archetypal emotions [31]: joy, sadness, surprise, anger, disgust and fear. There are obviously more emotions in the expression range of the human face (such as shame and enthusiasm), but these have not been shown to have the same constant qualities as the six archetypal ones [32][36][37].
B. Is the initial recognition accurate?
Researchers here faced the following issue: it is not enough simply to discern human emotion inside an expression. It is essential to discover whether the interpretation by the observers is accurate or not. In Ekman's experiments, researchers who knew a priori the emotion expressed by the subject categorized the answers of observers, chosen from different cultural backgrounds, as right or wrong. The subjects ranged from healthy people who were shown horror films, comedies etc. to psychiatric hospital patients subjected to stress or relaxation experiences. The accuracy of the answers can be seen in Table I.
C. Can emotion be used as a biometric trait?
In order to categorise emotion as a soft biometric trait (since emotion is obviously a dynamic feature that cannot be used for identification or verification as other biometric traits can), it has to fulfil the fundamental requirements of biometric characteristics described earlier. It is clearly easy to collect, but it also needs to fulfil the universality requirement. Everybody expresses emotions through facial expressions, but are emotional expressions themselves universal among people of all ages and cultural backgrounds? We have to establish here that the expressions of the six archetypal emotions are the same across people, irrespective of cultural background, race, education, sex and age. Research has shown exactly that: the expression of the six basic emotions on the human face is the same regardless of cultural background, age, sex and educational status. The only influence on recorded results was found when cultural background affected when it was deemed appropriate to show or hide one's emotions. The same expressions indicated the same emotions whether the observers came from the U.S., Chile or Brazil [31,38]. In order to remove any possible objection, researchers travelled to the highlands of New Guinea to measure reactions and expressions of people totally unaffected by western civilization and found the exact same results: the expression of emotion on the human face is universal across all cultures [31][35][38][39][40][41]. Thus, it can be used as a biometric trait.

IV. METHODOLOGY
In this research the authors used, after signing the relevant user agreement that gave them the right to reproduce parts of the dataset with the relevant copyright notice, the Cohn-Kanade extended dataset of facial expressions [42,43]. The methodology, based on the complete universality of emotion expression, aims at providing an algorithm that needs no individual training on specific users (although training can be used in a future implementation to further improve the acquired results). The proposed methodology also aims at being easily incorporated into a SoA-based telemedicine system built on reconfigurable rules [30], or into a multi-algorithmic pool in a biometrics system such as the one proposed in [44].
The methodology follows the classic steps of biometric trait recognition: Collection, Extraction of features, Comparison and Classification, as shown in Figure 2.
Figure 2. The proposed methodology

A. Identification of Regions of Interest
The first concern in the preprocessing stage was to crop out parts of the face that are completely irrelevant to emotion and could function as noise, such as the ears and hair, as well as irrelevant elements of the image itself (like timestamps). For this reason, we employed the well-known Viola-Jones face detection algorithm [45]. The Viola-Jones algorithm is a very fast and efficient solution for face detection, scale and location invariant, based on Haar-like features and the AdaBoost training algorithm coupled with cascading classifiers. Moreover, it performs extremely well in real-time applications, so in a future implementation for real-time emotion recognition it would not add delay to the methodology. The second concern is to identify which areas of the face are of interest in our case and to separate them from the others. These regions are, in order of significance: mouth, eyes, forehead and nose area.
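The speed of Viola-Jones comes largely from evaluating Haar-like features on an integral image, where any rectangle sum costs four look-ups. The following Python sketch shows only that building block, not the full detector with AdaBoost and the classifier cascade of [45]; the toy image and the function names are illustrative assumptions.

```python
def integral_image(img):
    """Summed-area table with an extra zero row and column at the top/left."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, top, left, height, width):
    """Sum of the pixels inside a rectangle, using four table look-ups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def two_rect_feature(ii, top, left, height, width):
    """Horizontal two-rectangle Haar-like feature: left half minus right half."""
    half = width // 2
    return (rect_sum(ii, top, left, height, half)
            - rect_sum(ii, top, left + half, height, half))


# Toy 4x4 image: bright left half, dark right half (a vertical edge).
image = [
    [9, 9, 1, 1],
    [9, 9, 1, 1],
    [9, 9, 1, 1],
    [9, 9, 1, 1],
]
ii = integral_image(image)
feature = two_rect_feature(ii, 0, 0, 4, 4)
```

A large absolute feature value indicates a strong contrast between the two halves of the window, which is the kind of cue the cascade combines to accept or reject a candidate face region.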
Observing facial expressions, how they form on the human face and where the human eye focuses to perceive them, it is clear that where emotion appears there are creases in the skin or facial features that produce changes in luminosity. The "background" of the face can be considered as a canvas on which harsh lines appear when we express emotion. That same "background" is not needed in the processing phase, so we employ an edge detection algorithm to remove useless parts of the face and at the same time significantly reduce the amount of data the algorithm uses. While the use of edge detection may seem a vulnerability of the algorithm, due to its poor tolerance of high-luminosity environments, we keep in mind that the target application concerns detection inside home environments, where the average luminosity is kept at regular levels.
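The text does not name the edge detector used, so the sketch below assumes a Sobel operator with a fixed threshold, purely to illustrate how an image is reduced to a sparse set of edge points. It is written in plain Python for a grayscale image stored as a list of rows; the threshold value is a hypothetical choice.

```python
def sobel_edges(img, threshold):
    """Return (row, col) points whose Sobel gradient magnitude exceeds threshold."""
    h, w = len(img), len(img[0])
    points = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Horizontal gradient: right column minus left column.
            gx = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y][x - 1] - img[y + 1][x - 1])
            # Vertical gradient: bottom row minus top row.
            gy = (img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y - 1][x] - img[y - 1][x + 1])
            if (gx * gx + gy * gy) ** 0.5 > threshold:
                points.append((y, x))
    return points


# Toy 5x6 image: dark left half, bright right half.
image = [[0, 0, 0, 100, 100, 100] for _ in range(5)]
edge_points = sobel_edges(image, threshold=100)
```

Only the pixels next to the luminosity step survive, which is the data reduction the paragraph above describes: the flat "background" is discarded and the remaining points feed the geometric steps that follow.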
At this stage, we must segment the regions of interest in specific sub-clusters. For this purpose we first perform Voronoi tessellation [46][47] and its dual, Delaunay triangulation [47][48] on the edge detection generated matrix of points. With this transform, the separate points in 2D space that the edge detection provided, become an undirected graph, which both preserves the geometric properties of the structure and provides an easy input for the application of clustering algorithms. Moreover, having our input in the form of a graph, makes it easily processable by a multitude of algorithmic approaches.
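The conversion from edge points to an undirected graph can be sketched with a brute-force Delaunay construction based on the empty-circumcircle property. This is an assumption-laden illustration in plain Python: it runs in O(n^4) and is only suitable for small point sets, whereas a practical implementation would rely on an optimized computational geometry library. The function names and sample points are hypothetical.

```python
from itertools import combinations


def circumcircle(a, b, c):
    """Centre and squared radius of the circle through a, b, c (None if collinear)."""
    ax, ay = a
    bx, by = b
    cx, cy = c
    d = 2.0 * (ax * (by - cy) + bx * (cy - ay) + cx * (ay - by))
    if abs(d) < 1e-12:
        return None
    a2, b2, c2 = ax * ax + ay * ay, bx * bx + by * by, cx * cx + cy * cy
    ux = (a2 * (by - cy) + b2 * (cy - ay) + c2 * (ay - by)) / d
    uy = (a2 * (cx - bx) + b2 * (ax - cx) + c2 * (bx - ax)) / d
    return (ux, uy), (ax - ux) ** 2 + (ay - uy) ** 2


def delaunay_edges(points):
    """Undirected edge set (i, j) with i < j of the Delaunay triangulation."""
    edges = set()
    for i, j, k in combinations(range(len(points)), 3):
        circle = circumcircle(points[i], points[j], points[k])
        if circle is None:
            continue
        (ux, uy), r2 = circle
        # A triangle is Delaunay when no other point lies inside its circumcircle.
        empty = all(
            (px - ux) ** 2 + (py - uy) ** 2 >= r2 - 1e-9
            for m, (px, py) in enumerate(points)
            if m not in (i, j, k)
        )
        if empty:
            edges.update({(i, j), (i, k), (j, k)})
    return edges


# A flat rhombus: the triangulation keeps the short diagonal (1, 3), not (0, 2).
points = [(0.0, 0.0), (2.0, 1.0), (4.0, 0.0), (2.0, -1.0)]
edges = delaunay_edges(points)
```

The resulting edge set is the graph representation referred to above: vertices are edge-detection points and edges connect points whose Voronoi cells are adjacent.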
To define the Voronoi tessellation of a set of points in a metric space, let $X$ be the space and $d$ the metric. Also, let $K$ be a set of indices and $(P_k)_{k \in K}$ a tuple of non-empty subsets of $X$. The Voronoi cell $V_k$ associated with $P_k$ is the set of points in $X$ whose distance from $P_k$ is less than or equal to their distance from every other $P_j$, with $j$ an index different from $k$. Formally:

$$V_k = \{\, x \in X \mid d(x, P_k) \le d(x, P_j) \ \text{for all } j \ne k \,\}$$

The Voronoi diagram Vor(P) corresponding to the edge detection points in two-dimensional Euclidean space is the tuple of cells $(V_k)_{k \in K}$. In our case, where the space is Euclidean and equipped with the regular $\ell_2$ distance metric, we have a set of non-overlapping points and each Voronoi cell is a convex polygon. With this step we have created a polygon model of the human face, with low-area polygons in areas of expression. At this point in our methodology the goal is to transform this set of polygons into an undirected graph representation. To facilitate this we employ the dual of the Voronoi tessellation, the Delaunay triangulation. We define the Delaunay triangulation of a set of points P through its Voronoi diagram as follows. Let $P = \{p_1, p_2, \ldots, p_n\}$ be a set of points in a space $E^m$ and Vor(P) the Voronoi diagram of P, with $V_i$ the cell of $p_i$. The complex Del(P) includes the k-simplex

$$\{p_{i_0}, p_{i_1}, \ldots, p_{i_k}\} \quad \text{if and only if} \quad V_{i_0} \cap V_{i_1} \cap \cdots \cap V_{i_k} \ne \emptyset$$

The complex Del(P) is the Delaunay triangulation of the convex hull of P. In essence, the Delaunay triangulation creates an edge between two points of P if these

two points have a common edge among their polygons. With the Delaunay triangulation we now have an undirected graph representation of the expression of the human face, on which we can apply a graph clustering algorithm. In order to facilitate input for the neural network classification, we must first separate these clusters one by one and reduce the total number of points that define them. As we have sets of vertices connected via edges in 2D space, and the basic criterion for defining a cluster is how close these vertices are, the simplest way of defining separate point clusters is the use of a density-based scanning algorithm such as DBSCAN [49]. DBSCAN was selected over other clustering algorithms due to the inherent properties of the clusters:
• There is no a priori knowledge of the number of clusters, so K-means clustering cannot be applied.
• Clusters are not organized in regular shapes and, in some cases, one cluster can encompass another. Given the geometric nature of facial expressions, we should be aware of the single-link effect (two clusters linked by a thin line of points) and negate it so that clusters do not overlap.
• We need an algorithm that is resilient to noise and outlier points.
DBSCAN fulfils these requirements [50] with good results.
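The behaviour listed above can be seen in a compact, textbook-style DBSCAN written in plain Python. This is a minimal sketch rather than the implementation used in the experiments; the parameter values `eps` and `min_pts` and the sample points are hypothetical.

```python
def dbscan(points, eps, min_pts):
    """Label each 2D point with a cluster id (0, 1, ...) or -1 for noise."""

    def neighbours(i):
        xi, yi = points[i]
        # The neighbourhood includes the point itself.
        return [j for j, (xj, yj) in enumerate(points)
                if (xi - xj) ** 2 + (yi - yj) ** 2 <= eps * eps]

    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE           # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster     # border point: reachable but not core
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            nbrs = neighbours(j)
            if len(nbrs) >= min_pts:    # j is a core point, so expand through it
                queue.extend(nbrs)
    return labels


points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
```

The number of clusters is never supplied: it emerges from the density parameters, and the isolated point is reported as noise instead of being forced into a cluster.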
DBSCAN takes as input a set D of points in two-dimensional Euclidean space and works on the assumption that, for each core point of a cluster, there is at least a specific minimum number of points inside a given radius around that point. In this case, the normal Euclidean distance was used as the metric. The algorithm distinguishes between core and border points, removes outliers and noise, and performs well regardless of the order in which points are visited. Fourier shape descriptors [51,52] of the selected clusters are then used to reduce the number of vertices needed to define a specific shape, normalizing it. This makes pattern recognition and classification easier, since it reduces the total number of input points needed and provides a fixed number of input points every time.
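One common way to obtain such a fixed-length, normalized signature is sketched below in plain Python. It treats the contour as a sequence of complex numbers, keeps a few low-frequency Fourier coefficients and normalizes their magnitudes. The particular normalization (dropping the zero-frequency term, dividing by the largest magnitude) and the number of coefficients kept are assumptions, not necessarily the choices made in [51,52].

```python
import cmath


def fourier_descriptors(contour, n_keep):
    """Fixed-length shape signature from an ordered, closed contour."""
    z = [complex(x, y) for x, y in contour]
    n = len(z)

    def coeff(k):
        # k-th coefficient of the discrete Fourier transform of the contour.
        return sum(z[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                   for t in range(n)) / n

    # Low frequencies in the order 1, -1, 2, -2, ...; k = 0 (the centroid)
    # is skipped so that the signature does not depend on position.
    ks = []
    for m in range(1, n_keep // 2 + 2):
        ks.extend([m, -m])
    ks = ks[:n_keep]

    # Magnitudes discard rotation and starting point; dividing by the
    # largest magnitude discards scale.
    mags = [abs(coeff(k)) for k in ks]
    scale = max(mags)
    return [m / scale for m in mags]


square = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1)]
signature = fourier_descriptors(square, n_keep=4)
```

Whatever the number of vertices in the cluster outline, the signature has exactly `n_keep` entries, which is what allows a neural network with a fixed input layer to consume it.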

B. Classification
Having defined the regions of interest and extracted the geometric features of the facial expression, we then need to classify the data to identify facial expressions. It is apparent from the previous sections that the regions of interest form irregular shapes, and thus a nonlinear classification scheme should be adopted to improve classification accuracy. In this case we make use of a back-propagation neural network for irregular shape classification [53]. The neural network classifies shapes according to their convex hulls. Since forehead and nose wrinkles are highly irregular shapes and cannot be described by a convex hull, we indicate their presence with 1 and their absence with 0. The shapes of the eyes and the mouth are described by their respective significant Fourier descriptors. We have 7 different classes (happiness, sadness, fear, anger, disgust, surprise and neutral); therefore, in the output layer we need 4 nodes.
To demonstrate the effectiveness of the methodology, an experiment with practical examples was performed. The Cohn-Kanade+ database contains 593 sequences from 123 different people. This subject pool was divided according to the emotions depicted among different faces; part of it was used for training and part for verification. The number of available samples is shown in Table II.
C. Problem Formulation
Let us assume in the following that some features have been extracted from a human face analysis and included in a feature vector, say $\mathbf{f}_i$, where index i corresponds to the ith human individual. Let us also assume that p human emotions are recognized by the proposed ERI computing architecture and thus each vector $\mathbf{f}_i$ is classified to one of the p available classes $\omega_j$, j=1,2,…,p, i.e., emotions. As shown in several publications, six emotions are basic: happiness, sadness, fear, anger, surprise and disgust [31]. It is not in doubt that these basic emotions are key points of reference and can be described as 'archetypal emotions', which reflects the fact that they are undeniably the obvious examples of an emotional state. New studies attempt to define a set of terms covering a wider range of emotions without becoming unmanageable. Let us denote in the following as  w , of (2), so

D. Adaptable Emotionally Rich Architecture
Although the general expression of archetypal emotions is the same, we nevertheless have to take into account what would be considered as noise. More specifically, since we are targeting an elderly user base, the most important noise-inducing factor would be the appearance of wrinkles. Moreover, we should also take into account micro-changes in the expressions, owed to individual quirks. Thus, direct application of the model of (2), in which the model parameters are considered constant, would not work well, since the model cannot be adapted to the slight changes in the way that humans express their emotions. To overcome this difficulty, we equip the proposed architecture with an adaptable mechanism, as seen in Figure 3, which can update the system response with respect to the current way that individuals express their emotions. In this way, a new set of parameters, say $\mathbf{w}_a$, should be created for each emotional state, which are capable of updating the response of the ERI architecture to the current actual way that the users express their emotions. To perform the adaptation, a new set of representative samples, say

E. Optimal Estimation of the Model Parameters
In this section, we describe a novel, fast and reliable algorithm for estimating the new model parameters $\mathbf{w}_a$.
Initially, we assume that a small perturbation of the model parameters before the adaptation is enough to achieve good classification performance. Then,

$$\mathbf{w}_a = \mathbf{w} + \Delta\mathbf{w}$$

where $\Delta\mathbf{w}$ is a small incremental vector. This assumption leads to an analytical and tractable solution for estimating $\mathbf{w}_a$, since it permits linearization of the functional components $l(\cdot)$ of (2) and (5) using a first-order Taylor series expansion.
It can be shown [54] that linearization of (5) with respect to the weight increments $\Delta\mathbf{w}$ is equivalent to a set of linear equations

$$\mathbf{c} = \mathbf{A}\,\Delta\mathbf{w} \qquad (7)$$

where vector $\mathbf{c}$ and matrix $\mathbf{A}$ are appropriately expressed in terms of the previous model parameters $\mathbf{w}$. In particular, (8) indicates the difference between model outputs after and before the adaptation for all input vectors in $S_c$.

The size of vector $\mathbf{c}$ is smaller than the number of unknown weights $\Delta\mathbf{w}$, since in general only a small number, M, of current data are available. Thus, many solutions exist for (7), since the number of unknowns is much greater than the respective number of equations. Uniqueness, however, is imposed by an additional requirement, which takes into consideration the previous network knowledge. Among all possible solutions that satisfy (7), the one which causes a minimal degradation of the previous model knowledge is selected as the most appropriate. This is expressed by equation (4).
It can be shown [54] that (4) takes the form of (9), where the elements of matrix $\mathbf{K}$ are expressed in terms of the previous network weights $\mathbf{w}$ and the data in the set S. Thus, the problem results in the minimization of (9) subject to the constraints of (7).
The error function defined by (10) is convex since it is of squared type, while the constraints of (11) are linear equalities. Thus, the solution should lie on the hyper-surface defined by (11) and simultaneously minimize the error function given in (10). The gradient projection method is used in this paper to solve this problem. The method starts from a feasible point, i.e. a point that satisfies all constraints, and moves in a direction that decreases E while still satisfying the constraints. The weights are adapted as follows: where n is the iteration index and $\mu(n)$ is a scalar that determines the rate of convergence. Using this methodology, we can estimate the vector $\Delta\mathbf{w}$ using $\nabla E$ computed from (13). The computational complexity required to independently update each network weight is proportional to the number of network weights.
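The gradient projection idea can be illustrated on a small problem. The sketch below, in plain Python, assumes that the squared-type error has the form E = 0.5 * ||K dw||^2 and that the constraints are A dw = c; both forms are assumptions made for illustration, since the exact expressions are given in [54]. The projection is built from an orthonormal basis of the constraint rows, which is one of several equivalent constructions and not necessarily the one used for matrix Q; a constant step size stands in for the iteration-dependent rate. All names and numbers are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def orthonormalise(A, c):
    """Gram-Schmidt on the rows of A, applying the same operations to c.

    Assumes the rows of A are linearly independent.
    """
    basis, rhs = [], []
    for row, value in zip(A, c):
        row, value = list(row), float(value)
        for q, dq in zip(basis, rhs):
            coef = dot(q, row)
            row = [r - coef * qi for r, qi in zip(row, q)]
            value -= coef * dq
        norm = dot(row, row) ** 0.5
        basis.append([r / norm for r in row])
        rhs.append(value / norm)
    return basis, rhs


def gradient_projection(K, A, c, mu, iterations):
    """Minimise E = 0.5 * ||K dw||^2 subject to A dw = c (assumed forms)."""
    basis, rhs = orthonormalise(A, c)
    n = len(A[0])
    # Feasible starting point: the minimum-norm solution of A dw = c.
    dw = [sum(d * q[i] for q, d in zip(basis, rhs)) for i in range(n)]
    for _ in range(iterations):
        Kdw = [dot(row, dw) for row in K]
        # Gradient of E with respect to dw, i.e. K^T K dw.
        grad = [sum(K[r][i] * Kdw[r] for r in range(len(K))) for i in range(n)]
        # Project the gradient onto the constraint surface so that every
        # step keeps A dw = c satisfied.
        for q in basis:
            coef = dot(q, grad)
            grad = [g - coef * qi for g, qi in zip(grad, q)]
        dw = [w - mu * g for w, g in zip(dw, grad)]
    return dw


# Toy problem: minimise 0.5 * (x^2 + 4 y^2) subject to x + y = 1.
K = [[1.0, 0.0], [0.0, 2.0]]
A = [[1.0, 1.0]]
c = [1.0]
dw = gradient_projection(K, A, c, mu=0.2, iterations=100)
```

In this toy problem the iterate starts at the feasible point (0.5, 0.5) and moves along the constraint line towards (0.8, 0.2), the constrained minimum, without ever leaving the line.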

F. Computational Complexity
The computational complexity of the adaptation algorithm, in contrast to the generally long training periods of the initial model parameters, is very small. In particular, the adaptation process is separated into two phases: the initialization phase, where matrix Q, which projects the negative gradient of function E onto the surface defined by the constraints of (11), is estimated, and the iteration phase, where each iteration of (13) requires a few milliseconds to execute (~10 ms on a Core i7 PC). The number of iterations n that the gradient projection method requires to derive the optimal solution is in general small (8 to 12 iterations are usually sufficient to obtain a weight increment close to the optimal one). Thus, the computational cost of the iteration phase is about half a second.
In the initialization phase, matrix Q is estimated as a product of matrix P and other matrices, which requires a few seconds for a typical network size of 500 weights. Thus, the total cost of retraining is small (of the order of a few seconds), allowing the efficient use of the proposed scheme in real-life interactive multimedia systems.
After the classification, the results of the process can be stored in XML format, as seen in Figure 4. For the evaluation of the classification we used ten-fold cross validation. The available data are divided into ten non-overlapping sets of roughly equal size; nine of them are used for training and one for validation. The classification accuracy of the BPN ranges from ~47% to 98.4%. The lowest success rates were noted for the expressions of anger and fear, whose facial expressions have similar geometric attributes, while sadness, happiness and surprise were met with the highest success rates. The implementation of the proposed methodology was specifically designed to be flexible and scalable so as to fit into telemedicine-based AAL systems. Since a ready XSD schema is provided, the results of the methodology can be easily integrated into a reconfigurable rule-based risk assessment module, as part of a telemedicine system [30]. The results from this methodology are promising and useful, especially in the aforementioned context, and can be used to evaluate, and even predict, the overall health status of subjects suffering from neurodegenerative diseases. Depressive symptoms are widely present in people with the onset of MCI and can further contribute to the deterioration of their mental and cognitive status [55][56][57][58]. It is thus crucial for the attending clinician to be able to monitor not only the physical but also the psychological status of subjects.
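To make the hand-over to a rule-based module concrete, a classification result could be serialized along the following lines. The sketch uses Python's standard `xml.etree.ElementTree`; the element and attribute names (`emotionResult`, `classified`, `score`, `subjectId`) and the sample values are hypothetical and do not reproduce the actual schema of Figure 4.

```python
import xml.etree.ElementTree as ET


def result_to_xml(subject_id, timestamp, scores):
    """Serialise one classification result; the layout is an assumed example."""
    root = ET.Element("emotionResult", subjectId=subject_id, timestamp=timestamp)
    # The class with the highest score is reported as the recognised emotion.
    best = max(scores, key=scores.get)
    ET.SubElement(root, "classified").text = best
    for emotion, score in scores.items():
        ET.SubElement(root, "score", emotion=emotion).text = f"{score:.3f}"
    return ET.tostring(root, encoding="unicode")


xml_text = result_to_xml(
    "subject-001",
    "2016-01-01T10:00:00",
    {"happiness": 0.1, "sadness": 0.7, "neutral": 0.2},
)
```

A consumer such as a risk assessment rule can then read the `classified` element without any knowledge of how the classification was produced.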
The proposed algorithm, although it has some shortcomings, such as a lack of real-time properties and strong sensitivity to illumination, is a first step towards full emotional status estimation and prediction of adverse events in the framework of an internet-based AAL system for people with MCI or other neurodegenerative diseases. These disadvantages, though, are not deemed detrimental to the overall efficacy of the methodology: it is able to grab image frames from a web camera at regular intervals (several times a minute), which can be adjusted as deemed necessary, and to provide the relevant results to a rule-based implementation of risk detection [30]. Moreover, since this methodology is aimed at home-based AAL systems, it is assumed that subjects are lit by a home's natural lighting, which in everyday life is not usually extreme and so does not induce significant errors in the process.
Finally, the proposed methodology is ready to be incorporated into a metadata-aware biometric architecture such as the one proposed in [44], specifically targeted at ambient assisted living systems. Future improved implementations of the methodology aim to encompass real-time capabilities, especially based on Constrained Local Models [59][60] and deep learning networks, for better hierarchical multimodal classification of the subject's emotional status [61][62][63].