With the large-scale grid connection of clean energy, the safe and reliable operation of large wind and hydropower generator sets is of great significance to the stability of the power grid. To address the limitations of traditional generator set fault diagnosis, which relies on vibration signals, this paper proposes a fault classification method based on feature extraction from multi-source wide-area data, using raw sensor data such as electrical signals, temperature, and operating conditions. First, because the original feature space is redundant and its informative features are easily masked, a novel manifold learning method (modified LGPCA) is introduced to obtain low-dimensional representations of the high-dimensional feature space. On this basis, a random forest fault classification model for generator sets is established. Finally, the model is verified on an actual power-station fault case, showing that the method improves the efficiency and accuracy of fault classification.
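The two-stage structure described above (dimensionality reduction followed by random forest classification) can be illustrated with a minimal sketch. It rests on several assumptions: the modified LGPCA method is not specified in this abstract, so ordinary PCA from scikit-learn is swapped in purely as a placeholder; the data are synthetic stand-ins for fused multi-source features, not power-station measurements; and the feature count, number of components, and number of trees are arbitrary illustrative values.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for fused multi-source features (electrical, temperature,
# operating condition). These are NOT real generator set data.
rng = np.random.default_rng(0)
n_per_class, n_features, n_classes = 100, 30, 3
centers = rng.normal(scale=4.0, size=(n_classes, n_features))
X = np.vstack([c + rng.normal(size=(n_per_class, n_features)) for c in centers])
y = np.repeat(np.arange(n_classes), n_per_class)  # hypothetical fault labels

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Stage 1: standardize, then reduce dimensionality.
# ASSUMPTION: PCA is only a placeholder for the paper's modified LGPCA.
# Stage 2: classify the low-dimensional features with a random forest.
model = make_pipeline(
    StandardScaler(),
    PCA(n_components=5),
    RandomForestClassifier(n_estimators=100, random_state=0),
)
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.2f}")
```

Wrapping both stages in one pipeline means the scaler and the projection are fitted on the training split only, so the held-out score is not inflated by information from the test samples.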