Accurate prediction of structural response under earthquake excitation is essential for evaluating structural damage and performance. To improve the efficiency of predicting structural response time histories, this paper proposes SeisFormer, a novel deep learning model based on the self-attention mechanism. Through autoregressive prediction, SeisFormer can predict in real time the response time histories of a large number of nodes in a structure under seismic action, and it can effectively address the problem of data scarcity. Four case studies are performed to verify the accuracy and efficiency of the proposed methodology: three use datasets obtained from elastoplastic seismic analysis of a single-story structure, a three-story structure, and an eleven-story structure, and the fourth uses measured data from a shaking table test model. The prediction accuracy of SeisFormer is further examined through ablation experiments and comparative experiments. The experimental results show that SeisFormer can accurately predict the acceleration, velocity, and displacement time histories of numerous nodes in the structure. Its prediction accuracy exceeds that of the LSTM model, and its prediction speed is 193–109,824 times faster than the finite element method. Furthermore, with data augmentation through autoregressive prediction, SeisFormer can make efficient and accurate predictions even when training data are exceptionally scarce, which supports its use in engineering applications.
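As an illustration of what autoregressive prediction means here, the following minimal sketch rolls a one-step predictor forward in time, feeding each predicted response back in as input for the next step. It is not the SeisFormer implementation: the names `autoregressive_rollout` and `toy_step` are hypothetical, and `toy_step` is a simple damped linear recurrence standing in for a trained network.

```python
import numpy as np

def autoregressive_rollout(step_fn, ground_motion, initial_window):
    """Predict a response history step by step, reusing earlier predictions.

    step_fn        : callable(past_response, past_motion) -> next response value
    ground_motion  : 1-D array of input excitation, one value per time step
    initial_window : known response values used to start the rollout
    """
    window = len(initial_window)
    history = list(initial_window)
    for t in range(window, len(ground_motion)):
        past_response = np.asarray(history[-window:])   # includes earlier predictions
        past_motion = ground_motion[t - window:t + 1]   # excitation up to step t
        history.append(step_fn(past_response, past_motion))
    return np.asarray(history)

def toy_step(past_response, past_motion):
    # Hypothetical stand-in for a trained model (assumption, not from the paper):
    # the next response depends on the last response and the current excitation.
    return 0.9 * past_response[-1] + 0.1 * past_motion[-1]

ground_motion = np.ones(5)            # constant unit excitation, for illustration only
predicted = autoregressive_rollout(toy_step, ground_motion, initial_window=[0.0])
```

In a real setting, `step_fn` would wrap the trained model, and each response entry could be a vector holding the acceleration, velocity, and displacement of many nodes rather than a single scalar.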