Denoising diffusion models (DDMs) have shown promise for intelligent fault diagnosis of train gearboxes because they can generate synthetic fault samples that alleviate the scarcity of measured data. However, existing approaches often fail to fully exploit both the available data and prior working condition information, which limits their capacity to generate condition-specific samples under unseen conditions. To address this limitation, this paper proposes a working condition-embedded sample augmentation method for train gearboxes. First, a DDM is adopted as the foundation, in which the forward noise corruption process is predefined and the reverse generation process is learned from data. Then, an enhanced U-Net incorporating an attention mechanism and a working condition encoder is designed to guide sample generation with condition-specific information during the denoising phase. Finally, multi-frequency band convolution blocks are introduced to extract comprehensive features across multiple time–frequency pathways, enabling more effective representation learning. Ablation and comparison experiments are conducted on fault simulation data of subway train gearboxes, and the results demonstrate the effectiveness and advantages of the proposed method in generating richer samples and improving fault diagnosis accuracy.
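To make the predefined forward noise corruption process concrete, the following is a minimal sketch of the standard closed-form DDPM forward process, q(x_t | x_0) = N(√ᾱ_t x_0, (1 − ᾱ_t) I), where ᾱ_t is the cumulative product of (1 − β_s). The linear β schedule (10⁻⁴ to 0.02 over T = 1000 steps), the function names, and the synthetic sine signal standing in for a gearbox vibration segment are illustrative assumptions, not the settings of the proposed method. The learned reverse process, the working condition encoder, and the multi-frequency band convolution blocks are not shown.

```python
import math
import random


def linear_beta_schedule(T=1000, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (assumed DDPM-style default)."""
    return [beta_start + (beta_end - beta_start) * i / (T - 1) for i in range(T)]


def cumulative_alphas(betas):
    """Return alpha_bar_t = prod_{s<=t} (1 - beta_s) for every step t."""
    alpha_bars = []
    prod = 1.0
    for b in betas:
        prod *= 1.0 - b
        alpha_bars.append(prod)
    return alpha_bars


def q_sample(x0, t, alpha_bars, rng):
    """Draw x_t ~ N(sqrt(alpha_bar_t) * x0, (1 - alpha_bar_t) I) in one step.

    Returns the corrupted signal and the Gaussian noise used, which is the
    regression target of a noise-predicting denoiser during training.
    """
    a = alpha_bars[t]
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, noise)]
    return xt, noise


if __name__ == "__main__":
    rng = random.Random(0)
    alpha_bars = cumulative_alphas(linear_beta_schedule())
    # Hypothetical stand-in for one vibration segment: a 50-cycle sine, 1024 points.
    x0 = [math.sin(2 * math.pi * 50 * n / 1024) for n in range(1024)]
    x_early, _ = q_sample(x0, 0, alpha_bars, rng)    # almost identical to x0
    x_late, _ = q_sample(x0, 999, alpha_bars, rng)   # almost pure Gaussian noise
    print(f"alpha_bar at t=1: {alpha_bars[0]:.4f}, at t=T: {alpha_bars[-1]:.2e}")
```

Because the forward process has this closed form, training can corrupt a clean sample to any randomly chosen step t directly, without iterating through the intermediate steps; only the reverse (denoising) network has to be learned.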