Growing threats in public spaces have raised concerns about personal security and made technology, especially speech recognition, more relevant to addressing them. This paper proposes a personal safety system that combines keyword detection with negative emotion detection. The system detects the wake-up word "ON" when it is spoken with negative emotion. Our contribution is twofold: first, detecting the presence of the wake-up keyword "ON" in speech using a Convolutional Neural Network (CNN) model, and second, detecting negative emotion in the speech using a Long Short-Term Memory (LSTM) model. We propose combining these two models to address the same problem statement. With the proposed methodology, the CNN-based keyword detection model achieves 97.23% accuracy for the safety-related "ON" keyword, slightly above comparable works, while the LSTM-based negative emotion recognition model achieves 88.94% accuracy, trailing more advanced recent architectures. The paper further discusses the dataset curation, the methodologies implemented, and the system pipeline. It also compares feature extraction techniques such as Mel-Frequency Cepstral Coefficients (MFCC), Linear Prediction Cepstral Coefficients (LPCC), chroma, and Mel features. Moreover, as speech recognition applications that use more than one model are becoming increasingly popular, this analysis should help in developing applications that require a similar end-to-end construct. © 2025 Elsevier B.V., All rights reserved.
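As a rough illustration of how the two model outputs described above might be combined, the following minimal sketch raises an alert only when both the keyword and the negative emotion are detected. The abstract does not state how the models are fused, so the names `Detection` and `should_alert`, the probability inputs, and the 0.5 thresholds are all assumptions made for illustration, not the authors' method.

```python
from dataclasses import dataclass

# Hypothetical thresholds: the abstract does not specify any decision rule.
KEYWORD_THRESHOLD = 0.5
EMOTION_THRESHOLD = 0.5


@dataclass
class Detection:
    """Scores for one utterance (assumed to be probabilities in [0, 1])."""
    keyword_prob: float   # score that "ON" was spoken, e.g. from a CNN
    negative_prob: float  # score that the emotion is negative, e.g. from an LSTM


def should_alert(d: Detection,
                 kw_thr: float = KEYWORD_THRESHOLD,
                 emo_thr: float = EMOTION_THRESHOLD) -> bool:
    """Return True only if both the keyword and negative emotion are detected."""
    return d.keyword_prob >= kw_thr and d.negative_prob >= emo_thr
```

Under this assumed rule, an utterance with a confident "ON" score but a neutral emotion score would not raise an alert, which matches the stated intent of reacting only when the keyword is spoken with negative emotion.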