This paper proposes a hybrid CNN-Transformer approach to unmanned aerial vehicle (UAV) fault classification that aims to classify faults efficiently and in real time from time-series data. To address the limited ability of traditional convolutional neural networks (CNNs) to capture global dependencies, as well as the difficulty of deploying them on edge devices, we design a model that integrates an attention mechanism. The model uses CNN layers to extract local features, a Transformer encoder to model long-term dependencies, and a lightweight design to reduce inference latency. Experiments are conducted on the RflyMAD dataset. On the test set, the proposed model achieves a classification accuracy of 98.74% and a macro-average F1-score of 97.89%, with an average inference latency of only 1.82 milliseconds, significantly outperforming conventional methods. The approach thus improves both classification performance and real-time capability, offering reliable support for the safe operation of UAVs.
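Because accuracy and the macro-average F1-score are reported side by side, it may help to recall how the two differ when fault classes are imbalanced. The following is a minimal sketch in plain Python, not the evaluation code used in this work. The function names `accuracy` and `macro_f1` and the toy labels are hypothetical and are not drawn from RflyMAD.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every class seen in either list."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but the sample was not p
            fn[t] += 1  # true class t was missed
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        denom = 2 * tp[c] + fp[c] + fn[c]
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels (hypothetical, for illustration only): 8 "normal", 2 "motor_fault".
y_true = ["normal"] * 8 + ["motor_fault"] * 2
y_pred = ["normal"] * 8 + ["normal", "motor_fault"]

acc = accuracy(y_true, y_pred)
f1 = macro_f1(y_true, y_pred)
print(f"accuracy={acc:.4f}, macro-F1={f1:.4f}")
```

In this toy case one of the two minority-class samples is misclassified. Accuracy barely moves, while macro-F1 drops noticeably because every class contributes equally to the average. This is why a macro-average F1-score is informative alongside accuracy for fault classes with few samples.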