Traffic accidents frequently lead to severe injuries and substantial economic losses, making timely and accurate prediction crucial for improving public safety and reducing financial impact. However, traditional traffic accident prediction methods lack robust spatial feature modeling, which makes it difficult for them to handle complex road conditions and to evaluate accident probability. We therefore propose STGEN, a two-stream network architecture that integrates focused temporal self-attention with spatial feature transfer. First, we propose a novel road design model to address the challenge of evaluating complex road conditions from historical data. This model replaces the traditional grid-based path coverage method by incorporating detailed spatial features, such as road geometry, and by consolidating road network information that extends beyond multi-hop neighborhoods. Second, we introduce the concept of time entropy and use an attention mechanism to encode time, further improving the model’s ability to predict traffic accidents along the temporal dimension. At the same time, to enhance the representation of geographic spatial features, we propose a weighted vector graph based on cosine similarity, which is integrated with the graph structure to strengthen spatio-temporal associations and capture complex spatio-temporal dependencies. Comprehensive simulations conducted on real-world datasets demonstrate the effectiveness and scalability of STGEN. © 2025 Elsevier B.V. All rights reserved.
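
The abstract does not spell out how the cosine-similarity weighted graph is built, so the following is only a minimal sketch of the general idea, not STGEN's actual construction. It assumes each road segment is described by a feature vector and uses pairwise cosine similarity as the edge weight. The function name `cosine_similarity_graph`, the `threshold` parameter, and the toy feature values are all hypothetical.

```python
import numpy as np

def cosine_similarity_graph(features, threshold=0.0):
    """Build a weighted adjacency matrix from pairwise cosine similarity.

    features:  (n_nodes, n_dims) array, one assumed feature vector per road segment.
    threshold: hypothetical cutoff; similarities at or below it are set to 0.
    """
    features = np.asarray(features, dtype=float)
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    # Guard against all-zero vectors so the division below is always defined.
    unit = features / np.where(norms == 0.0, 1.0, norms)
    sim = unit @ unit.T            # cosine similarity of every pair of nodes
    np.fill_diagonal(sim, 0.0)     # drop self-loops
    return np.where(sim > threshold, sim, 0.0)

# Toy example: three road segments with made-up 2-D features.
segment_features = np.array([
    [1.0, 0.0],
    [1.0, 1.0],
    [0.0, 2.0],
])
adjacency = cosine_similarity_graph(segment_features)
print(adjacency)
```

In this toy case the first and third segments have orthogonal features and receive no edge, while the second segment is connected to both. Under these assumptions, such a matrix could be combined with the topological adjacency of the road network before being passed to a graph layer.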