Incidents involving webshells (malicious scripts or code snippets) have risen dramatically, posing significant threats to organizations across sectors. Traditional security measures often fail to detect these threats, so more advanced detection mechanisms are needed. This article proposes a deep learning-based technique for webshell detection that addresses two challenges: high computational cost and sensitivity to variations in input length. The proposed method combines a flexible dataset reduction approach with two feature extraction techniques, TF-IDF and Word2Vec, to reduce computational complexity and standardize model input. To handle input variability and high dimensionality, we introduce two dataset reduction strategies, Flat-based and Depth-based reduction, both of which rely on a standard deviation-based representation to preserve essential statistical characteristics while reducing dataset size. This combination improves the performance and scalability of deep learning models, making them more practical for webshell detection. The study systematically reviews existing techniques, highlights their limitations, and presents a solution that improves detection accuracy and efficiency. Experimental results show that our approach achieves high accuracy (up to 98.50% using a CNN) while significantly reducing training time. The findings indicate that flexible dataset reduction combined with dual feature extraction offers a scalable and effective solution for real-time webshell detection. © 2025 Elsevier B.V., All rights reserved.
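The abstract does not spell out how the standard deviation-based representation is computed, so the following is only an illustrative sketch of the general idea and not the Flat-based or Depth-based procedure itself. It assumes that each script has already been tokenized, that every token is mapped to a TF-IDF weight, and that the resulting variable-length sequence is split into a fixed number of segments, each summarized by its population standard deviation. The function names `tfidf_sequence` and `std_reduce`, the smoothed IDF formula, and the segment count are all hypothetical choices made for this example.

```python
import math
import statistics
from collections import Counter


def tfidf_sequence(tokens, doc_freq, n_docs):
    """Return one TF-IDF weight per token, in token order (hypothetical helper)."""
    if not tokens:
        return []
    tf = Counter(tokens)
    total = len(tokens)
    weights = []
    for tok in tokens:
        # Smoothed IDF (an assumption): log((1 + N) / (1 + df)) + 1
        idf = math.log((1 + n_docs) / (1 + doc_freq.get(tok, 0))) + 1.0
        weights.append((tf[tok] / total) * idf)
    return weights


def std_reduce(values, n_segments):
    """Collapse a variable-length sequence into n_segments standard deviations.

    Every input, whatever its length, yields a vector of length n_segments,
    which gives the downstream model a standardized input size.
    """
    n = len(values)
    reduced = []
    for i in range(n_segments):
        start = i * n // n_segments
        end = (i + 1) * n // n_segments
        segment = values[start:end]
        # Empty segments (input shorter than n_segments) are padded with 0.0.
        reduced.append(statistics.pstdev(segment) if segment else 0.0)
    return reduced


# Toy corpus of pre-tokenized scripts (made-up tokens, for illustration only).
corpus = [
    ["eval", "base64_decode", "$_POST", "cmd", "eval", "system"],
    ["echo", "htmlspecialchars", "$_GET", "name"],
    ["include", "header.php", "echo", "date", "include", "footer.php", "echo", "title"],
]

# Document frequency: number of scripts in which each token appears.
doc_freq = Counter(tok for doc in corpus for tok in set(doc))

# Scripts of different lengths all become vectors of length 4.
features = [
    std_reduce(tfidf_sequence(doc, doc_freq, len(corpus)), n_segments=4)
    for doc in corpus
]
```

The same reduction could in principle be applied to Word2Vec embeddings, for example one embedding dimension at a time, but how the article combines the two feature types is not stated in the abstract.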