Deep learning-based fault diagnosis in urban rail transit systems is hampered by data scarcity and by privacy constraints that prevent metro operators from sharing data. Federated learning (FL) alleviates the data-silo problem, but existing FL models lack the interpretability needed for engineering deployment. To address both issues, a federated causal discovery-based graph isomorphic network method is proposed to improve data privacy and model interpretability in fault diagnosis of the metro air production unit (APU).
First, a federated directed acyclic graph (DAG) structure learning algorithm discovers the causal relations among the APU signals held by each client. The algorithm uses a server-client architecture with K clients and, through continuous optimization under an acyclicity constraint, outputs a DAG that encodes the causal relations among the variables. Next, a federated graph isomorphic network is trained on the learned causal graphs. Following the federated averaging algorithm, K identical models are deployed locally, one per client, for co-training, and the model parameters are updated over several communication rounds between the server and the clients. After these rounds, a single model suitable for all clients is obtained while the privacy of each party's data is preserved.
Experimental validation on the MetroPT dataset, with three fault types and one normal class, demonstrates the effectiveness of the approach. Qualitative and quantitative analyses show that the estimated causal graph achieves 80% structural similarity with the ground-truth causal graph. For fault diagnosis, the method achieves, averaged over 5 clients, 98.47% accuracy under balanced data and a 97.64% F1-score under imbalanced distributions, outperforming all benchmarks. Interpretability analysis also identifies the key nodes in the graphs: the key variables for all three faults are consistent with the actual operating mechanisms of the APU, which supports the interpretability of the method.
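As an illustration only, the two mechanisms named in the method description above (an acyclicity constraint for DAG learning and parameter averaging across clients) can be sketched in a few lines of Python. The sketch rests on assumptions that are not taken from the source: a NOTEARS-style trace penalty, truncated to d terms, stands in for the acyclicity constraint, and sample-size-weighted averaging stands in for the aggregation step. The function names `acyclicity_penalty` and `fedavg`, the flat-list parameter layout, and the toy matrices are likewise hypothetical.

```python
from math import factorial


def acyclicity_penalty(W):
    """Truncated NOTEARS-style penalty: sum over k=1..d of tr(A^k) / k!,
    where A is the element-wise square of W (an assumed formulation).

    The value is zero exactly when the weighted graph W has no directed cycle,
    because any cycle in a d-node graph has length at most d.
    """
    d = len(W)
    A = [[w * w for w in row] for row in W]  # non-negative edge strengths
    P = [[float(i == j) for j in range(d)] for i in range(d)]  # A^0 = identity
    h = 0.0
    for k in range(1, d + 1):
        # P <- P @ A, so P now holds A^k
        P = [[sum(P[i][m] * A[m][j] for m in range(d)) for j in range(d)]
             for i in range(d)]
        h += sum(P[i][i] for i in range(d)) / factorial(k)
    return h


def fedavg(client_params, client_sizes):
    """Sample-size-weighted average of per-client parameters.

    client_params: one dict per client, mapping a parameter name to a flat
    list of floats (a hypothetical layout chosen for this sketch).
    client_sizes: number of local training samples per client.
    """
    total = sum(client_sizes)
    return {
        name: [
            sum(n * p[name][i] for p, n in zip(client_params, client_sizes)) / total
            for i in range(len(values))
        ]
        for name, values in client_params[0].items()
    }


# Toy check: a 3-node chain is acyclic, a 2-node loop is not.
chain = [[0.0, 0.8, 0.0], [0.0, 0.0, -0.5], [0.0, 0.0, 0.0]]
loop = [[0.0, 1.0], [1.0, 0.0]]
print(acyclicity_penalty(chain))  # 0.0
print(acyclicity_penalty(loop))   # 1.0

# Two clients with 1 and 3 local samples: the larger client dominates.
print(fedavg([{"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}], [1, 3]))  # {'w': [2.5, 3.5]}
```

In a continuous-optimization setting, a penalty of this kind would be driven toward zero while a data-fit loss is minimized. In the federated step, only the parameter dictionaries, not the raw signals, would leave each client.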
In this work, a novel method is proposed for metro systems to meet the practical engineering requirements of data privacy and model transparency. Experimental results show superior performance across data distribution scenarios and indicate that the method can support maintenance decisions. Future research can focus on more advanced aggregation strategies and dynamic system modeling.