A domain generalization network for imbalanced machinery fault diagnosis | Scientific Reports

Scientific Reports volume 14, Article number: 25447 (2024)

Traditional models for Imbalanced Fault Diagnosis (IFD) face challenges in practical applications due to domain shifts caused by varying working conditions and machinery. Domain Generalization (DG) models provide an advantage over traditional approaches by learning class-discriminative and domain-invariant feature representations, allowing them to generalize to unseen target data. However, the scarcity of fault samples relative to healthy ones limits their application in real-world industrial scenarios. In this paper, we propose a Domain Mixed-Enhanced Domain Generalization Network (DEMDGN) that enhances IFD performance by utilizing mixup-based data augmentation and domain-based discrepancy metrics to align feature distributions across multiple heterogeneous source domains. By creating domain-invariant features, DEMDGN allows robust fault diagnosis under varying conditions. Extensive experiments on one marine machinery dataset and two bearing datasets demonstrate that the proposed method effectively addresses class imbalance and domain shift problems, achieving superior diagnostic performance.

Maritime shipping serves as the foundation of the contemporary global economy, representing approximately 90% of global trade1. The diesel engine is a typical strongly coupled, multi-system machine: it integrates the combustion, fuel injection, supercharging, control, lubrication, and cooling systems, the crank linkage mechanism, and many other parts into an organic whole, and these parts and subsystems are closely linked and cooperate to ensure safe, reliable, and efficient operation. The performance of these systems directly influences global commerce and marine environmental safety2,3. With the advancement of information and sensor technologies, data-driven fault diagnosis methods have emerged as critical technologies for enhancing the reliability of mechanical systems4,5,6,7.

Xu et al.8 proposed a thermography-based electric motor fault detection method using an InceptionV3 model enhanced with CLAHE and an SE channel attention mechanism, achieving 98.82-100% accuracy, precision, recall, and F1 scores when combined with an SVM classifier. Wang et al.9 introduced a bearing fault detection method using graph neural networks and ensemble learning, transforming data into graphs and improving diagnostic accuracy with a new outlier detection strategy. Wang et al.10 proposed a hydraulic fault diagnosis approach using 2D temporal modeling and attention mechanisms to decouple compound faults and extract features from multi-sample-rate sensor data, achieving over 97% diagnostic accuracy and robust performance under noisy conditions. Tao et al.11 proposed a fault diagnosis method for track circuits using a multi-scale attention network and the Gramian Angular Field (GAF) to transform time series into images, achieving 99.36% accuracy and outperforming classical and state-of-the-art models through effective feature fusion and spatial attention mechanisms. Qian et al.12 used a convolutional neural network to diagnose spindle bending and spindle crack faults in rotating machinery, achieving 100% classification accuracy. Li et al.13 presented an end-to-end fault diagnosis method for asynchronous motors using IInception-CBAM-IBiGRU, achieving nearly 100% accuracy and robust performance in noisy environments. Guan et al.14 proposed a transformer fault diagnosis method using ACGAN and CGWO-LSSVM to address misjudgment and low accuracy due to small and uneven fault sample distributions, achieving a diagnostic accuracy of 97.66% and outperforming other methods. Cheng et al.15 proposed a gearbox fault diagnosis method using a lightweight channel attention mechanism and transfer learning for accurate classification under varying conditions. Chen et al.16 enhanced rolling bearing fault diagnosis by optimizing a BP neural network with an improved genetic algorithm, achieving superior accuracy, convergence speed, and predictive performance compared to seven other models. Yang et al.17 proposed an ensemble complex spatio-temporal attention network (ECSAN) for accurate fault identification in complex systems under varying conditions, achieving superior performance in experimental validation.

Despite the success of existing data-driven fault diagnosis methods18,19,20,21,22,23,24, two significant challenges remain:

The imbalanced fault data often arises from the diverse conditions of data acquisition and the complex, variable operating environment of ships. This imbalance can lead diagnostic models to favor majority classes, ignoring critical but less frequent fault types, thus greatly reducing fault prediction accuracy and practical utility.

In practical applications, training data (source domain) is typically collected under specific conditions, such as particular operational environments, equipment configurations, or load conditions. However, these conditions may change in real-world applications, causing the data distribution of the actual operational environment (target domain) to differ from the training data. This discrepancy, known as domain shift, can severely affect model performance.

Recent research has made some progress in addressing the first challenge.

Tan et al.25 introduced a deep adversarial learning system for fault diagnosis in Fused Deposition Modeling (FDM), addressing domain-shifting issues and process parameter drift using enhanced datasets and domain adversarial neural networks. Karan Kumar et al.26 presented the Constant-Q Non-stationary Gabor Transform combined with an enhanced Inception ResNet-V2 model for early-stage classification of ball bearing faults in induction motors, achieving a 99.84% average classification accuracy under various load conditions and outperforming Inception-V4 and ResNet-50. Prashanth et al.27 employed transfer learning with scalogram images from vibration signals to classify faults in monoblock centrifugal pumps, achieving a classification accuracy of 100% with the AlexNet model. Shang et al.28 introduced a Hybrid Semantic Attribute-based Zero-Shot Learning Model (HSAZLM) to improve fault diagnosis in unknown working conditions, combining manually defined and non-semantic attributes using a denoising residual convolutional autoencoder. Experiments on public and self-built bearing datasets show the model’s superior performance, with average accuracy rates of 96.82% and 96.42%. Yap et al.29 proposed the Segmentive Cosine Weighted K-nearest neighbors (SCWK) model to enhance fault detection and diagnosis in nuclear power plants (NPPs), achieving an overall accuracy of 94.8% and demonstrating robustness across various conditions. Zhang et al.30 introduced a double-level data fusion (DLDF) model for bearing fault diagnosis, which combines multi-channel vibration data using information entropy and a multi-modal image fusion strategy to enhance feature extraction, achieving reliable fault diagnosis under time-varying speed conditions.
Dai et al.31 employed Transfer Learning and Contrastive Learning to enhance Structural Damage Detection by applying knowledge from rotating machines to framed structures, achieving superior performance compared to traditional models. Shi et al.32 proposed a graph-embedded method for diagnosing imbalanced fault data in rotating machinery, demonstrating superior feature representation and handling of imbalanced data compared to existing methods. Liao et al.33 introduced a diagnostic algorithm utilizing feature filtering and mapping to address class imbalance and false positives in online tuning for gas path fault diagnosis, with validation results demonstrating superiority over benchmark methods. Li et al.34 presented a combined approach to address variable speed and data imbalance issues, effectively enhancing the accuracy of rolling bearing fault diagnosis, with its superiority validated across multiple datasets.

For the second challenge, domain adaptation (DA) techniques can use knowledge from labeled data in the source domain to identify the health status of unlabeled data in the target domain. This helps to reduce the performance degradation caused by domain changes. Wang et al.35 proposed a digital model for three-cylinder pump fault diagnosis using Simscape simulations and transfer learning, outperforming advanced methods. Lu et al.36 proposed a sample selection method to address general unsupervised domain adaptation (UUDA). Their approach leverages outlier threshold learning, domain-invariant sampling, and adversarial classifier training to manage discrepancies in domain and label space. Liang et al.37 combined the wavelet transform with an improved DA network for semi-supervised rolling bearing fault diagnosis, achieving high accuracy under varied conditions. Yu et al.38 developed a source-free DA method using multi-receptive field graph convolution and contrastive loss, achieving good performance even with partial information. Zhang et al.39 developed a digital twin-driven approach for rolling bearing fault diagnosis, which utilizes virtual representations and a Transformer-based network. Their method achieved high performance despite having limited real data available for training. Yu et al.40 proposed a progressive adaptation method guided by Extreme Value Theory, improving open-set domain adaptation tasks. Bao et al.41 proposed an improved sparse autoencoder-based multi-layer adaptation model (ISAE-DDM) with a dual-domain distance mechanism to enhance bearing fault diagnosis under varying operating conditions, demonstrating superior stability and generalization through two bearing experiments. However, these methods are not well suited to new or unseen data domains, limiting their application in dynamic and variable maritime environments.

DA-based diagnostic methods have demonstrated that knowledge transfer can improve diagnostic performance. However, a notable limitation of these methods is that, during training, they need access to a complete target-domain dataset (labeled or unlabeled) containing samples of all failure types. This premise is nearly impossible to satisfy in many real-world diagnostic scenarios, where usually only healthy samples are available. In the fault diagnosis of ship machinery, on the one hand, the operating conditions of the machinery are myriad, and it is impossible to collect every kind of fault sample under all possible working conditions; on the other hand, many in-service machines are not allowed to run continuously under fault conditions, so the fault data that can be obtained for cross-domain diagnosis are very scarce. In this problem setting, there is therefore no complete target-domain data to support and guide the adaptation process, and the diagnostic performance of previous DA-based methods degrades significantly, which hinders their application to cross-domain fault diagnosis tasks. A feasible strategy for resolving this dilemma is to apply diagnostic knowledge learned only from the source domain directly to the target domain. However, because of the potential differences between domains, diagnostic knowledge learned from a single source domain may generalize weakly and thus perform poorly on the target task. For example, differences between two datasets collected from different machines may be caused by system architecture, type of machinery, operating conditions, sampling frequency, or the transmission path of vibration.
Therefore, to generalize effectively in this scenario, it is necessary to make full use of multiple relevant source domains to train the model so that the diagnostic knowledge becomes universal. This motivates the theory of Domain Generalization (DG), which is related to DA in that improving diagnostic performance on the target domain is the basic goal of both. The difference is that DG deals with a more challenging setup and relaxes the strict assumptions on the target domain: a model trained on multiple source domains should perform well on a new target domain with a different data distribution, and no target data are required during training. The model should therefore be robust enough to work out of the box and perform well in an unseen domain. In domain adaptation learning, Maximum Mean Discrepancy (MMD) is one of the most prominent feature-based criteria, as it reduces the discrepancy in marginal distributions. Building on this, researchers have proposed Deep Domain Confusion (DDC)42 and Deep Adaptation Network (DAN)43, both of which have achieved good results on image classification datasets. Ganin and Lempitsky44 introduced Domain-Adversarial Neural Networks (DANN), an adversarial transfer learning method that produced better results on the Office dataset. Sun and Saenko45 proposed Deep CORAL, which aligns the second-order statistics (covariances) of source and target features. More recently, Yan et al.46 introduced Weighted and Class-Specific MMD (WMMD and CMMD) to reduce conditional distribution discrepancies and achieve higher accuracy. Jia et al.47 proposed a joint distribution adaptation method that showed promising results on several datasets, but this method is not applicable to domain generalization models.

Unlike DA, the goal of domain generalization (DG) is to create diagnostic models that can achieve high performance in new domains even without data from the specific target domain. Integrating advanced DG techniques into deep neural networks to form DG-based fault diagnosis (DGFD) methods has become a trend for addressing domain shift in fault diagnosis. Han et al.48 proposed a domain generalization-based hybrid diagnosis network to enhance the generalization capability of machinery fault diagnosis methods for unseen working conditions by incorporating intrinsic triplet loss and extrinsic adversarial training, demonstrating its effectiveness through cross-domain experiments on planetary gearboxes. Wang et al.49 introduced MSG-ACN, a single-domain generalization model for scenarios with one operating condition. Wang et al.50 proposed a network for fault diagnosis without target data, using domain-specific classifiers and a convolutional autoencoder, demonstrating effectiveness on three datasets. Shi et al.51 developed a domain generalization method for unseen conditions, enhancing feature diversity and robustness, validated on bearing and gearbox datasets. Ma et al.52 proposed a gradient alignment domain generalization method with mutual teaching, ensuring domain consistency and validated through experiments. Li et al.53 introduced cross-domain augmentation with adversarial training, improving model accuracy in unseen conditions. Cheng et al.54 addressed the challenge of data distribution shift in fault diagnosis by proposing a three-stage domain generalization method (LOODG) based on a structural causal model and a new Obj-DI rule, achieving superior results without domain labels across multiple devices and operating conditions.
Pang et al.55 introduced a fault-aware domain generalization framework for bearing fault diagnosis using an analytical simulation model, ACS-MSE spectral background estimation, and a fault-aware autoencoder, achieving high accuracy without real fault data by extracting generalized fault features across different resonance frequencies. Gong et al.56 proposed a simulation data-driven method and a Domain Generalization network named Adversarial Domain-Invariant Feature Exploration (ADIFEX) for cross-device fault diagnosis, which extracts and generalizes domain-invariant features from simulation and actual data, achieving superior performance in unseen target domains compared to other methods.

In the field of ship fault diagnosis, sample imbalance is a major obstacle to improving diagnostic performance. Because fault samples are scarce and costly to obtain, fault diagnosis models often have to be trained with insufficient data. This data imbalance limits the learning ability of the model, resulting in inaccurate and unstable diagnostic results. In addition, ship machinery faults occur suddenly and under non-deterministic conditions, and equipment cannot operate for long in a degraded or failed state, so fault data, especially labeled data, are generally difficult to obtain, and the distribution of the datasets also varies considerably with working conditions. Most data-driven mechanical fault diagnosis techniques presuppose a large amount of labeled data, so when such data are hard to obtain, faults are difficult to identify accurately. Moreover, a diagnostic model performs well only when the data distributions of the different datasets are similar. The lack of data and fault labels has therefore become an important problem restricting the development of fault diagnosis.

To the best of our knowledge, there is currently only one study addressing the domain generalization problem in mechanical fault diagnosis when dealing with data imbalance57. This particular situation, known as imbalance domain generalization fault diagnosis (IDGFD), arises when the number of samples in the normal class significantly outweighs those in the fault class within each source domain, as illustrated in Fig. 1. In IDGFD scenarios, distribution changes due to class imbalance present obstacles for diagnostic models attempting to learn robust and discriminative features. When utilizing supervised learning on source domains with imbalanced class distribution, predictive bias may occur, causing models to favor samples from the normal state and complicating accurate classification of fault instances. These challenges diminish the capability of existing DGFD methods to generalize effectively onto unseen target domains.

Illustration of (a) DGFD and (b) IDGFD.

In order to tackle this issue, the present paper presents a DEMDGN network designed for fault diagnosis across different domains when faced with unobserved operating conditions and data imbalance. The proposed network’s structure is illustrated in Fig. 2.

The main contributions of this study are as follows:

In order to deal with the inadequate representation of minority class samples, a novel hybrid strategy has been developed to address the imbalance in class distribution. Additionally, conditional MMD is employed to enhance feature consistency in the latent space through comparison of the original and augmented distributions.

A domain-based discrepancy metric is designed that reduces the distance between data from different domains while keeping data from the same domain separated, which enhances model generalization.

Extensive experiments on marine diesel engine simulator datasets and public datasets confirm the effectiveness and practicality of the proposed DEMDGN method in engineering applications.

Consider K source domains denoted as \(\left\{ {D_{k}^{s}} \right\}_{{k=1}}^{K}\), where the k-th source domain \(D_{k}^{s}\) contains \(n_{k}^{s}\) labeled samples \(X_{k}^{s}=\left\{ {\left( {x_{{k,i}}^{s},y_{{k,i}}^{s}} \right)} \right\}_{{i=1}}^{{n_{k}^{s}}}\), with \(x_{{k,i}}^{s}\) being the collected sample and \(y_{{k,i}}^{s} \in {{\mathbb{R}}^C}\) its corresponding label. Let C represent the number of machinery health states. Let the unknown target domain be denoted as \({D^t}\), which contains \({n^t}\) samples \({X^t}=\left\{ {\left( {x_{j}^{t},y_{j}^{t}} \right)} \right\}_{{j=1}}^{{{n^t}}}\). As source data and target data are typically obtained under different operational conditions or machines, their distributions vary, i.e., \(P\left( {X_{1}^{s}} \right) \ne P\left( {X_{2}^{s}} \right) \ne \cdots \ne P\left( {X_{K}^{s}} \right) \ne P\left( {{X^t}} \right)\). This study presumes that fault modes remain consistent across domains, i.e., \(Y_{1}^{s}=Y_{2}^{s}= \cdots =Y_{K}^{s}={Y^t}=\left\{ {{y_1},{y_2}, \cdots ,{y_C}} \right\}\), where \({y_1}\) represents the normal class and \({y_2}, \cdots ,{y_C}\) represent fault classes. Because signals in healthy states are easier to collect than in fault conditions, the quantity of healthy state samples considerably surpasses that of fault mode samples, i.e., \({N_{{y_1}}} \gg {N_{{y_c}}},\;c=2, \cdots ,C\).

Given a feature generator \(G\left( { \cdot ;{\theta _g}} \right):\mathcal{X} \to \mathcal{L}\) that maps input data into a latent feature space \(\mathcal{L}\) and a classifier network \(F\left( { \cdot ;{\theta _f}} \right):\mathcal{L} \to {{\mathbb{R}}^C}\) that identifies the health states of the machinery, imbalance domain generalization aims to reduce the target classification risk \({{\mathbb{E}}_{\left( {x,y} \right) \sim {D^t}}}[F\left( {G\left( x \right)} \right) \ne y]\) by constructing a class-unbiased domain-invariant model.

In the context of IDGFD, the main challenge in creating broadly applicable diagnostic models is the imbalance in class data scale. One potential solution is to generate additional reliable samples from minority classes in order to achieve a more balanced distribution. Mixup, a robust data augmentation method, can help create diverse datasets that improve model generalization. Zhang et al.29 proposed mixup, which linearly blends two observations to create smoother decision boundaries. The method aims to reduce overfitting and the influence of individual training samples by combining instances \({x}_{i}\) and \({x}_{j}\), randomly selected from the training data, with their corresponding one-hot encoded class labels \({y}_{i}\) and \({y}_{j}\). The mixup distribution is then defined based on this combination as

\[\mu \left( {\widetilde {x},\widetilde {y}\mid {x_i},{y_i}} \right)=\frac{1}{n}\sum\limits_{{j=1}}^{n} {{{\mathbb{E}}_\lambda }\left[ {\delta \left( {\widetilde {x}=\lambda {x_i}+\left( {1 - \lambda } \right){x_j},\;\widetilde {y}=\lambda {y_i}+\left( {1 - \lambda } \right){y_j}} \right)} \right]} \]

where \(\lambda \sim Beta\left( {\alpha ,\alpha } \right)\) for \(\alpha \in \left( {0,\infty } \right)\), and \(\delta \left( {\widetilde {x},\widetilde {y}} \right)\) is a Dirac mass centered at \(\left( {\widetilde {x},\widetilde {y}} \right)\). The hyperparameter \(\alpha\) adjusts the strength of interpolation between data pairs.
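As a minimal sketch of the interpolation just described, assuming NumPy and a hypothetical helper name `mixup_pair` (neither is taken from the paper), two samples and their one-hot labels can be blended with a coefficient drawn from a Beta distribution:

```python
import numpy as np

def mixup_pair(x_i, y_i, x_j, y_j, alpha=0.2, rng=None):
    """Blend two samples and their one-hot labels with lambda ~ Beta(alpha, alpha)."""
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)
    x_mix = lam * x_i + (1.0 - lam) * x_j   # interpolate the inputs
    y_mix = lam * y_i + (1.0 - lam) * y_j   # interpolate the labels the same way
    return x_mix, y_mix, lam

# Illustrative data only: two random 8-point signals from classes 0 and 1.
rng = np.random.default_rng(0)
x_i, x_j = rng.normal(size=8), rng.normal(size=8)
y_i = np.array([1.0, 0.0, 0.0])
y_j = np.array([0.0, 1.0, 0.0])
x_mix, y_mix, lam = mixup_pair(x_i, y_i, x_j, y_j, alpha=0.2, rng=rng)
```

Small values of `alpha` concentrate the coefficient near 0 or 1, so the virtual sample stays close to one of its parents; `alpha = 1` draws the coefficient uniformly from the unit interval. The value 0.2 here is only a placeholder.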

Traditional models for Imbalanced Fault Diagnosis (IFD) struggle in practical engineering due to the domain shift problem caused by varying working conditions and machinery. The main advantage of DG based models over traditional ones lies in their ability to tackle domain shift issues. DG-based IFD models learn feature representations that are class-discriminative and robust to domain shifts, enabling these models to generalize to unseen target data. In other words, the key technology in DG-based IFD models is extracting invariant feature representations across domains from labeled data across multiple heterogeneous source domains to enhance fault diagnosis performance. However, the scarcity of fault samples compared to healthy samples limits the application of DG models in IFD in real-world industrial scenarios.

In this scenario, a logical strategy would be to enhance the original samples and match the acquired feature representations from various labeled domains to a shared distribution hypothesis. This will help in carrying out efficient fault diagnosis tasks across different domains.

In order to accomplish this, we utilize mixup-based techniques for data augmentation and domain-based metrics for feature projection, ensuring that features learned from different source domains adhere to the common distribution hypothesis. Within this specific subspace, identified feature representations are resistant to distribution shifts, allowing domain-invariant features to remain consistent across varying working conditions and machinery. As a result, classifiers trained with these domain-invariant features can be applied to fault diagnosis tasks involving new data. The mixup method in both class and domain spaces helps create smoother boundaries and improve data balance, while domain-based discrepancy metrics work towards minimizing distribution differences across multiple domains by maximizing intra-domain distance and minimizing inter-domain distance.

Figure 2 provides an overview of the proposed DEMDGN structure. To address class imbalance across domains, DEMDGN first increases the number of minority-class (fault-class) samples by mixing pairs of samples from the same class. The distributions of the original and augmented samples are then aligned to ensure feature consistency. Finally, domain-based discrepancy metrics are used to maximize intra-domain distance and minimize inter-domain distance, thereby reducing distribution disparities across multiple domains. Previous studies have indicated that Depthwise Separable Convolutions (DSC) are commonly used to capture diverse fault information at various time scales. This study employs DSC as the feature extractor to obtain fault-sensitive discriminative features from a limited set of training samples.
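To illustrate what a depthwise separable convolution does, the sketch below implements a single 1-D DSC layer in NumPy. It is not the DEMDGN feature extractor: the function name, the channel counts, and the kernel size are all assumptions chosen for the example, and bias, stride, padding, and nonlinearity are omitted.

```python
import numpy as np

def depthwise_separable_conv1d(x, depthwise_k, pointwise_w):
    """x: (C_in, L) signal; depthwise_k: (C_in, K); pointwise_w: (C_out, C_in)."""
    # Depthwise step: each input channel is filtered by its own kernel.
    dw = np.stack([
        np.correlate(x[c], depthwise_k[c], mode="valid")
        for c in range(x.shape[0])
    ])
    # Pointwise step: a 1x1 convolution mixes the channels at every time step.
    return pointwise_w @ dw

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 128))        # 4 hypothetical sensor channels, 128 time steps
dw_k = rng.normal(size=(4, 7))       # one length-7 kernel per input channel
pw_w = rng.normal(size=(8, 4))       # mixes 4 channels into 8 output channels
out = depthwise_separable_conv1d(x, dw_k, pw_w)

dsc_params = dw_k.size + pw_w.size   # 4*7 + 8*4 weights
std_params = 8 * 4 * 7               # a standard convolution with the same shapes
```

Factoring the convolution this way uses 60 weights where a standard convolution with the same input, output, and kernel sizes would use 224, which is one reason the operation is attractive when training samples are limited.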

The architecture of DEMDGN.

The objective of the fault diagnosis model is to classify machine health states accurately. To this end, supervised classification losses are applied to both the original and the generated samples, referred to as \({L_s}\) and \({L_m}\), respectively. \({L_s}\) is computed as:

where \({L_{CE}}\) represents the cross-entropy function. Similarly, \({L_m}\) is defined as:

where \({n^m}\) is the total number of generated samples.
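One form of the two losses that is consistent with these definitions is given below. It is a reconstruction from the surrounding notation, not necessarily the authors' exact expression; in particular, \({n^s}=\sum\nolimits_{k} {n_{k}^{s}}\) (the total number of source samples) and the index q over generated samples \(\left( {x_{q}^{m},y_{q}^{m}} \right)\) are assumptions.

\[{L_s}=\frac{1}{{{n^s}}}\sum\limits_{{k=1}}^{K} {\sum\limits_{{i=1}}^{{n_{k}^{s}}} {{L_{CE}}\left( {F\left( {G\left( {x_{{k,i}}^{s}} \right)} \right),y_{{k,i}}^{s}} \right)} } \]

\[{L_m}=\frac{1}{{{n^m}}}\sum\limits_{{q=1}}^{{{n^m}}} {{L_{CE}}\left( {F\left( {G\left( {x_{q}^{m}} \right)} \right),y_{q}^{m}} \right)} \]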

In the context of IDGFD, the main difficulty in developing broadly applicable diagnostic models lies in the significant difference in scale among class data. One potential solution is to create dependable minority class instances in order to achieve a more balanced class distribution, which is essential for acquiring high-quality representations. The proposed mixup approach governs the produced samples within both the input feature and latent domain spaces. The process of generating virtual samples \(\left( {x_{q}^{m},y_{q}^{m}=c} \right)\) can be expressed as:

where \(\left( {x_{i}^{s},y_{i}^{s}=c} \right)\) and \(\left( {x_{j}^{s},y_{j}^{s}=c} \right)\) are randomly drawn from the c-th class in the source domains, and \(d_{q}^{m}\) is the domain label of the generated sample. \(\lambda \in \left( {0,1} \right)\) is a mixup factor sampled from \(Beta\left( {\delta ,\delta } \right)\) with \(\delta \in \left( {0,\infty } \right)\).

Figure 3 depicts the module for augmenting data. DEMDGN combines feature statistics and domain labels from two samples to create new fault samples, merging similar fault information from diverse source domains. The augmentation procedure persists until there is an equal number of samples for each health condition in every mini-batch.
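The balancing procedure can be sketched as follows, under several assumptions: the sketch interpolates raw inputs and one-hot domain labels of two same-class samples, i.e. \(x_{q}^{m}=\lambda x_{i}^{s}+\left( {1 - \lambda } \right)x_{j}^{s}\) and \(d_{q}^{m}=\lambda d_{i}^{s}+\left( {1 - \lambda } \right)d_{j}^{s}\), and tops up every class to the size of the largest class in the mini-batch. The function name `balance_batch` and the toy batch are hypothetical, and the paper's module may operate on feature statistics instead of raw inputs.

```python
import numpy as np

def balance_batch(x, y, d, delta=1.0, rng=None):
    """Oversample minority classes in a mini-batch by same-class mixup.

    x: (N, F) samples, y: (N,) integer class labels,
    d: (N, K) one-hot domain labels. Returns the augmented x, y, d.
    """
    rng = np.random.default_rng() if rng is None else rng
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()                      # size of the majority class
    new_x, new_y, new_d = [x], [y], [d]
    for c, n_c in zip(classes, counts):
        n_new = target - n_c
        if n_new == 0:
            continue
        idx = np.flatnonzero(y == c)
        i = rng.choice(idx, size=n_new)        # first parent, same class c
        j = rng.choice(idx, size=n_new)        # second parent, same class c
        lam = rng.beta(delta, delta, size=(n_new, 1))
        new_x.append(lam * x[i] + (1.0 - lam) * x[j])
        new_d.append(lam * d[i] + (1.0 - lam) * d[j])   # soft domain label
        new_y.append(np.full(n_new, c))        # class label is unchanged
    return np.concatenate(new_x), np.concatenate(new_y), np.concatenate(new_d)

# Toy mini-batch: 20 normal samples, 2 samples for each of two fault classes,
# drawn from 3 source domains.
rng = np.random.default_rng(0)
y = np.array([0] * 20 + [1] * 2 + [2] * 2)
x = rng.normal(size=(len(y), 16))
d = np.eye(3)[rng.integers(0, 3, size=len(y))]
xb, yb, db = balance_batch(x, y, d, delta=1.0, rng=rng)
```

Because both parents share a class, the class label of the virtual sample needs no interpolation, while its domain label becomes a soft vector whenever the parents come from different source domains.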

Illustration of data augmentation module.

Reinforcing the semantic information of the generated samples in the latent feature space involves minimizing the distribution gap between the original source fault class and the augmented fault class. This study employs non-parametric MMD to assess distribution variances:

where G represents the feature generator, and \(X_{c}^{s}\) and \(X_{c}^{m}\) represent the raw data of the c-th original and augmented fault classes, respectively. \(Z_{c}^{s}\) and \(Z_{c}^{m}\) represent the corresponding high-level features of the real and augmented fault classes. The optimization function is defined as

where \({\hat {d}_H}\left( { \cdot , \cdot } \right)\) represents the empirical estimate of MMD, H represents the reproducing kernel Hilbert space (RKHS), and \(\phi \left( \cdot \right)\) is a nonlinear mapping from the original feature space to the RKHS.
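For reference, the empirical MMD between two feature sets can be estimated with the kernel trick, without computing the RKHS mapping explicitly. The sketch below uses a single Gaussian kernel and the biased estimator; the kernel choice, the bandwidth `sigma`, and the function names are assumptions, since the paper's kernel settings are not stated in this section.

```python
import numpy as np

def gaussian_kernel(a, b, sigma=1.0):
    """Pairwise Gaussian kernel values between rows of a (n, F) and b (m, F)."""
    sq = (a ** 2).sum(1)[:, None] + (b ** 2).sum(1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))

def mmd2(z_s, z_m, sigma=1.0):
    """Biased empirical estimate of the squared MMD between two feature sets."""
    k_ss = gaussian_kernel(z_s, z_s, sigma).mean()
    k_mm = gaussian_kernel(z_m, z_m, sigma).mean()
    k_sm = gaussian_kernel(z_s, z_m, sigma).mean()
    return k_ss + k_mm - 2.0 * k_sm

# Synthetic latent features standing in for G(X_c^s) and G(X_c^m).
rng = np.random.default_rng(0)
z_real = rng.normal(0.0, 1.0, size=(64, 4))
z_aug_close = rng.normal(0.0, 1.0, size=(64, 4))   # same distribution
z_aug_far = rng.normal(3.0, 1.0, size=(64, 4))     # shifted distribution
near = mmd2(z_real, z_aug_close)
far = mmd2(z_real, z_aug_far)
```

Minimizing this quantity per fault class pulls the augmented features toward the real ones; in the toy data, the shifted set yields a clearly larger value than the set drawn from the same distribution.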

A different approach to integrating data from various domains involves reducing the distribution gap by bringing data from different source domains closer together. By projecting data from diverse domains into a shared distribution space, the distance between observations from these domains is minimized. It is important to appropriately separate observations from the same domain in order to maintain reasonable distances. This module is designed to optimize the distances within each domain while minimizing the distances between different domains, ultimately integrating data from multiple source domains into a cohesive whole. Similar to the fault diagnosis classifier, a fully connected layer processes the output of the feature extractor, producing feature representations \(x_{i}^{{dm}}\) for metric learning. The domain-based discrepancy loss is computed as:

where \({N_b}\) represents the batch size of the training data, and \({\left\| \cdot \right\|_2}\) denotes the \({L^2}\)-norm of a vector. \(W\left( \cdot \right)\) returns the positions of the non-zero elements of the domain label vector. For example, given three feature representations with domain labels \(d_{1}^{g}=\left\{ {0,0,0.2,0.8} \right\}\), \(d_{2}^{g}=\left\{ {0.7,0.3,0,0} \right\}\), and \(d_{3}^{g}=\left\{ {0,0,0.4,0.6} \right\}\), we can deduce that \(W\left( {d_{1}^{g}} \right) \ne W\left( {d_{2}^{g}} \right)\) and \(W\left( {d_{1}^{g}} \right)=W\left( {d_{3}^{g}} \right)\). Moreover, \({H_0}\) and \({H_1}\) are thresholds controlling the extent of intra-domain and inter-domain distance optimization: features from the same domain are separated by at least the margin \({H_0}\), and the distance between features from different domains does not exceed \({H_1}\). Thus, the learned feature representations from the same domain are dispersed, while those from different domains are compact, which facilitates the integration of features into a common or similar distribution space.
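The behavior described above can be sketched with a hinge-style pairwise loss. This is an illustrative stand-in, not the paper's formula: the squared hinge, the averaging over the batch size, and the names `support` and `domain_discrepancy_loss` are all assumptions. Only the roles of \(W\left( \cdot \right)\), \({H_0}\), and \({H_1}\) follow the text.

```python
import numpy as np

def support(d, tol=1e-12):
    """W(d): positions of the non-zero entries of a (possibly mixed) domain label."""
    return tuple(np.flatnonzero(np.abs(d) > tol).tolist())

def domain_discrepancy_loss(feats, dom_labels, h0=1.0, h1=0.5):
    """Same-domain pairs are pushed at least h0 apart;
    different-domain pairs are pulled to within h1 of each other."""
    n = len(feats)
    total = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dist = np.linalg.norm(feats[i] - feats[j])
            if support(dom_labels[i]) == support(dom_labels[j]):
                total += max(0.0, h0 - dist) ** 2   # penalize being too close
            else:
                total += max(0.0, dist - h1) ** 2   # penalize being too far
    return total / n

# The three domain labels from the example in the text.
d1 = np.array([0.0, 0.0, 0.2, 0.8])
d2 = np.array([0.7, 0.3, 0.0, 0.0])
d3 = np.array([0.0, 0.0, 0.4, 0.6])
feats = np.array([[0.0, 0.0], [3.0, 4.0], [0.1, 0.0]])   # made-up 2-D features
loss = domain_discrepancy_loss(feats, np.stack([d1, d2, d3]), h0=1.0, h1=0.5)
```

With these made-up features, the first and third samples share a domain support but lie closer than `h0`, and the second sample lies farther than `h1` from both, so every pair contributes a positive penalty.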

The ultimate goal of optimizing the proposed DEMDGN is expressed as:

where \(\alpha\) and \(\beta\) are hyperparameters that balance the three types of losses. The network parameters are updated at each training iteration as follows:

where \(\mu\) represents the learning rate, and q denotes the q-th iteration update.
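A plausible form of the objective and update rule, consistent with the definitions above, is sketched below. The symbols \({L_{mmd}}\) (the class-wise MMD loss) and \({L_{dm}}\) (the domain-based discrepancy loss) are names introduced here for illustration, and which of \(\alpha\) and \(\beta\) weights which term is an assumption.

\[L={L_s}+{L_m}+\alpha {L_{mmd}}+\beta {L_{dm}}\]

\[{\theta ^{q+1}}={\theta ^q} - \mu \frac{{\partial L}}{{\partial {\theta ^q}}},\quad \theta \in \left\{ {{\theta _g},{\theta _f}} \right\}\]

Under this reading, the two classification losses form one group, and \(\alpha\) and \(\beta\) control how strongly feature consistency and domain alignment are enforced relative to classification accuracy.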

In order to assess the performance and advantages of the suggested DEMDGN, tests were carried out on a marine diesel engine simulator dataset and a rotating machinery experimental dataset. Several advanced methods were used for comparison with DEMDGN.

Diagnosing faults in marine diesel engines requires access to operational data covering different fault scenarios. Such data usually come from three sources: real ship diesel engine data, experimental fault data from test beds, and simulated diesel engine fault data. Fault data accumulated during real ship operations are often incomplete and are noticed only when the fault is obvious, which makes early fault detection subjective. Deliberately inducing faults on real ships to obtain fault samples can cause significant economic losses and pose safety risks. Laboratory test bed experiments are closer to real ship data, but many fault states are difficult to reproduce and some quantities are hard to measure with sensors, which makes the experiments costly.

With the progress of computer technology and simulation techniques, altering mathematical model parameters to simulate faults has become an essential method for creating a fault database. This approach shortens the research cycle, reduces risks, and eliminates subjective discrepancies in manual readings and sensor sensitivity issues in control room readings. In this study, a simulator module is used to generate simulation data, demonstrating the model’s accuracy and reliability while providing dependable parameter information for fault diagnosis. The main engine model used in the simulator is the 6S50MC equipped with a TCA66 exhaust turbocharger. For detailed parameters, refer to Tables 1 and 2.

Simulations were conducted using the diesel engine simulation model to analyze fault status at different working conditions (100%, 75%, 50%, and 25%). This covered a total of 16 states, including both normal and faulty conditions, for the fuel system, gas exchange system, and cooling system. The data collected from these simulations are presented in Table 3.

For Case 1, data under 100%, 75%, 50%, and 25% working conditions are labeled as A1, A2, A3, and A4, respectively. As shown in Table 4, each source domain contains 2000 normal samples and 100 fault samples. Table 5 illustrates that the diagnostic model is trained using imbalanced data from three different source domains and then tested on a target domain that has not been seen before.
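The leave-one-domain-out protocol described above can be sketched as follows. This is a minimal illustration that assumes each of the four working conditions serves once as the unseen target domain; the mapping of task numbers to target domains is defined in Table 5 and is not reproduced here. The function names are hypothetical, and the per-domain counts are the 2000 normal and 100 fault samples stated in the text.

```python
DOMAINS = ["A1", "A2", "A3", "A4"]   # 100%, 75%, 50%, 25% working conditions
N_NORMAL, N_FAULT = 2000, 100        # samples per source domain, as stated


def leave_one_domain_out(domains):
    """Build one task per domain: train on the others, test on the held-out one."""
    tasks = []
    for target in domains:
        sources = [d for d in domains if d != target]
        tasks.append({"sources": sources, "target": target})
    return tasks


def source_class_counts(sources, n_normal=N_NORMAL, n_fault=N_FAULT):
    """Total normal and fault samples available for training in one task."""
    return {"normal": n_normal * len(sources), "fault": n_fault * len(sources)}


for task in leave_one_domain_out(DOMAINS):
    counts = source_class_counts(task["sources"])
    fault_share = counts["fault"] / (counts["normal"] + counts["fault"])
    print(task["sources"], "->", task["target"], counts, round(fault_share, 4))
```

Under these counts, fault samples make up fewer than 5% of the training data in every task, which is the imbalance the diagnostic model has to cope with.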

The Case Western Reserve University (CWRU) dataset contains time-series monitoring data recorded by accelerometers placed at the motor drive end and fan end, sampled at 12 kHz and 48 kHz. Bearing defects occur in three locations: the rolling element (B), the inner race (IR), and the outer race (OR), with defect sizes of 0.1778 mm, 0.3556 mm, and 0.5334 mm. Outer race faults are introduced at the 3, 6, and 12 o’clock positions. Data are collected under load conditions of 0 hp, 1 hp, 2 hp, and 3 hp to simulate the operational environment in actual production. The selected CWRU data are shown in Tables 6, 7, 8 and 9. In Case 2, the baseline data for normal operation and the fault data for the 12k drive end bearing, 48k drive end bearing, and 12k fan end bearing are denoted as B1, B2, B3, and B4, respectively. The IDGFD tasks and the numbers of samples in the different domains are set up as in Case 1. More detailed information can be found in Tables 10 and 11.

The Suzhou University bearing dataset59 was collected at a sampling frequency of 10 kHz under four different load conditions: 0 kN, 1 kN, 2 kN, and 3 kN. The dataset includes bearing data for both normal and faulty conditions, with fault types consisting of three single faults (Inner Fault (IF), Outer Fault (OF), Ball Fault (BF)) and four compound faults (Inner-Outer Fault (IO), Inner-Ball Fault (IB), Outer-Ball Fault (OB), and Inner-Outer-Ball Fault (IOB)). The single faults have five different defect sizes: 0.2 mm, 0.3 mm, 0.4 mm, 0.5 mm, and 0.6 mm, while the compound faults have a consistent defect size of 0.2 mm. In Case 3, the load conditions of 0 kN, 1 kN, 2 kN, and 3 kN are labeled as C1, C2, C3, and C4, respectively, and the compound fault data are used to validate the effectiveness of the algorithm. The IDGFD tasks and the numbers of samples in the different domains are set up as in Case 1 and Case 2. Detailed information is provided in Tables 12 and 13.

To objectively demonstrate the effectiveness of the proposed DEMDGN on diagnostic tasks, several state-of-the-art domain generalization fault diagnosis algorithms were used for comparison: DBDP-Net60, DMDGN61, DGNet_MSAC [32], and ACCGN [33].

DBDP-Net: This model proposes a signal preprocessing module to mitigate the adverse effects present in the original vibration signal. It then employs a dual-prototype loss mechanism to address distribution differences in class and domain prototypes, aiming to learn feature representations that are invariant across domains. Additionally, it introduces a dynamic weighting strategy to equalize the difficulty of learning features from different domains.

DMDGN: This model effectively extracts more domain-invariant and discriminative features from multiple source domains by utilizing data augmentation in both class and domain spaces, introducing adversarial perturbations, and balancing intra-class and inter-class distances using domain-based discrepancy metrics.

DGNet_MSAC: This model leverages multiple auxiliary classifiers tailored to specific domains in order to adeptly capture domain-specific features from each source domain. Subsequently, it utilizes a convolutional autoencoder module to reconfigure the original signal into a novel feature space, effectively eliminating learned domain-specific features.

ACCGN: ACCGN integrates a sparse domain regression framework and center loss to optimize both inter-class and intra-class characteristics of data features. Furthermore, it introduces a novel adaptive method for updating the center position, reducing the influence of the initial center position and learning inter-class invariant features through the sparse domain regression framework.

All experiments were implemented in PyTorch and run on a GeForce RTX 3080 Ti GPU. Each experiment was repeated 20 times to reduce the impact of random errors. To ensure a fair comparison, the compared methods used network architectures and experimental settings similar to those of the proposed model. All models were trained for 50 epochs with a batch size of 256. The learning rate was 0.001. The Adam optimizer was set with a learning rate of \(lr=0.0005/{\left( {1+10 \times \varphi } \right)^{0.75}}\), where \(\varphi\) represents the initial learning rate.
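The decay rule quoted above can be evaluated with a few lines of Python. The sketch below implements only the stated formula, with \(\varphi\) passed in as a generic argument. Driving \(\varphi\) with the fraction of training completed (from 0 towards 1 over the 50 epochs), as is common for annealing rules of this shape, is an assumption of this example and is not something the text states. The function and parameter names are hypothetical.

```python
def annealed_lr(phi, base_lr=0.0005, gamma=10.0, power=0.75):
    """Evaluate lr = base_lr / (1 + gamma * phi) ** power."""
    return base_lr / (1.0 + gamma * phi) ** power


EPOCHS = 50  # number of training epochs reported in the text

# Assumption: phi is the training progress at the start of each epoch.
schedule = [annealed_lr(epoch / EPOCHS) for epoch in range(EPOCHS)]

print(schedule[0], schedule[-1])
```

Under that assumption the rate starts at 0.0005 and decreases monotonically, so later epochs make smaller parameter updates.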

Because fault samples are scarce and normal samples are abundant, high accuracy alone may not reflect effective diagnostic performance. Recall and F1 score were therefore used, in addition to accuracy, as evaluation metrics for the IDGFD tasks.

Here, TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively. Accuracy evaluates the model’s predictions across all samples. Because identifying fault states is of primary importance, recall is used to assess the model’s capacity for fault detection. The F1 score combines precision and recall into a single measure of classification performance. Higher values of these metrics indicate better diagnostic performance.
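The following sketch computes these metrics from the four counts using their standard definitions, with the fault class taken as the positive class. The example counts are hypothetical: they describe a degenerate classifier that labels every sample as normal on a test set of 2000 normal and 100 fault samples, which shows why accuracy alone is misleading under class imbalance.

```python
def diagnosis_metrics(tp, tn, fp, fn):
    """Accuracy, recall, precision and F1 from confusion counts (fault = positive)."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "recall": recall,
            "precision": precision, "f1": f1}


# Hypothetical classifier that predicts "normal" for every sample.
all_normal = diagnosis_metrics(tp=0, tn=2000, fp=0, fn=100)
print(all_normal)
```

In this example the accuracy exceeds 95% even though no fault is detected, so recall and F1 are both zero.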

The results of the proposed DEMDGN and the comparison methods for Case 1 are shown in Table 14. Although the methods generally achieve an accuracy of over 75% across the four tasks, their recall rates are relatively low. The high accuracy can be attributed to the abundance of normal-class samples in the source domains, which allows the diagnostic models to capture the statistical characteristics of the normal class and to perform well on this prevalent class in the test set. The low recall rates, however, indicate a bias towards the majority class and poor performance on minority-class instances. Task 4 shows a marked difference between our method and the comparison methods: all comparison methods have recall rates below 20%, indicating that they struggle to recognize minority-class instances. By achieving a relatively high recall rate, the proposed approach demonstrates its effectiveness in reducing bias towards the majority class. It consistently outperforms the other methods in accuracy, recall, and F1 score across all tasks.

The diagnostic results for Case 2 are presented in Table 15, where the proposed method again achieves superior performance across all tasks. Most notably, in Task 8 its F1 score is 55.07% higher than that of the best comparison method.

The experimental results for Case 3 are presented in Table 16. As in Case 1 and Case 2, the proposed method achieves the highest accuracy, recall, and F1 scores across all tasks in Case 3, which involves compound faults. These results further validate the effectiveness and superiority of the proposed method in addressing the IDGFD problem.

In contrast, the other compared methods, while effective in certain aspects, show limitations that explain their relatively lower performance:

DBDP-Net: Despite introducing a signal preprocessing module and a dual-prototype loss mechanism, DBDP-Net may struggle with capturing complex feature relationships in highly imbalanced datasets, leading to suboptimal generalization across diverse domains.

DMDGN: The limitations of DMDGN may stem from noisy data augmentation, training instability caused by adversarial learning, difficulty in aligning features under extreme domain shifts, and challenges in capturing domain-specific nuances.

DGNet_MSAC: Although it utilizes multiple auxiliary classifiers and a convolutional autoencoder to eliminate domain-specific features, its reliance on domain-specific classifiers could limit its ability to generalize across unseen domains, particularly in highly variable data conditions.

ACCGN: While ACCGN’s sparse domain regression and adaptive center loss optimization are effective in learning inter-class invariant features, its reliance on fixed domain centers may hinder its adaptability in dynamic and imbalanced datasets, where feature distributions vary significantly.

The proposed DEMDGN outperforms the compared methods by leveraging mixup-based data augmentation and domain-based discrepancy metrics to align feature distributions across multiple heterogeneous source domains. This approach enables the extraction of robust, domain-invariant features, allowing the model to generalize effectively across varying conditions, including imbalanced and diverse domains. In contrast to the limitations of other methods, DEMDGN’s ability to handle class imbalance and domain shifts more effectively leads to superior diagnostic performance, as demonstrated through extensive experiments on marine machinery and bearing datasets.
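To make the mixup idea concrete, the sketch below generates synthetic fault samples as convex combinations of same-class fault samples drawn from two source domains. It follows the generic mixup recipe (a mixing coefficient drawn from a Beta distribution) and is only an illustration: the function names, the Beta parameter, and the pairing strategy are assumptions and do not reproduce the DEMDGN implementation.

```python
import random


def mixup_pair(x_a, x_b, lam):
    """Convex combination lam * x_a + (1 - lam) * x_b of two feature vectors."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]


def augment_fault_class(domain_a, domain_b, n_new, beta_param=0.2, seed=0):
    """Create n_new synthetic samples by mixing same-class samples across domains."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x_a = rng.choice(domain_a)                      # fault sample, domain A
        x_b = rng.choice(domain_b)                      # same fault class, domain B
        lam = rng.betavariate(beta_param, beta_param)   # mixing coefficient in [0, 1]
        synthetic.append(mixup_pair(x_a, x_b, lam))
    return synthetic


# Toy 2-D features for one fault class observed in two source domains.
faults_a = [[0.0, 1.0], [0.2, 1.2]]
faults_b = [[1.0, 3.0], [0.8, 2.8]]
new_samples = augment_fault_class(faults_a, faults_b, n_new=5)
print(new_samples)
```

Because both parents share a class label, each synthetic sample can keep that label, which adds minority-class samples without mixing labels across classes.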

A series of ablation experiments was carried out to assess the contribution of each component of the proposed approach. Three variants (V1-V3) were constructed by progressively removing components and analyzing the effect on model performance. The details of these variants are provided in Table 17, and Fig. 4 shows the comparison results. The comparison between V3 and the proposed method shows that removing the metric learning component decreases performance on most tasks, confirming the importance of metric learning for generalization capability. The class-oriented mixup and the non-parametric conditional MMD in the latent space also significantly improve performance on the IDGFD tasks, as indicated by the F1 scores of V1, V2, and V3. These outcomes are expected: without these components, the semantic coherence of the generated samples is undermined and deep models are less able to extract valuable information. The ablation experiments thus confirm the efficacy of the proposed approach and the soundness of its framework.
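For readers unfamiliar with MMD, the sketch below computes a biased estimate of the squared maximum mean discrepancy with a Gaussian kernel, and then averages it over the classes shared by two domains as one simple class-conditional variant. The kernel choice, the bandwidth, and the per-class averaging are assumptions for illustration; the exact discrepancy metric used in DEMDGN may differ.

```python
import math


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel between two feature vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mmd_squared(xs, ys, sigma=1.0):
    """Biased estimate of MMD^2 between two sets of feature vectors."""
    def mean_kernel(us, vs):
        total = sum(gaussian_kernel(u, v, sigma) for u in us for v in vs)
        return total / (len(us) * len(vs))
    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


def class_conditional_mmd(feats_a, labels_a, feats_b, labels_b, sigma=1.0):
    """Average MMD^2 over the classes present in both domains."""
    shared = sorted(set(labels_a) & set(labels_b))
    if not shared:
        return 0.0
    total = 0.0
    for c in shared:
        xs = [f for f, l in zip(feats_a, labels_a) if l == c]
        ys = [f for f, l in zip(feats_b, labels_b) if l == c]
        total += mmd_squared(xs, ys, sigma)
    return total / len(shared)


# Toy latent features from two domains with classes 1 (normal) and 2 (fault).
feats_a = [[0.0, 0.0], [0.1, 0.0], [2.0, 2.0]]
labels_a = [1, 1, 2]
feats_b = [[0.5, 0.5], [0.6, 0.4], [2.5, 2.5]]
labels_b = [1, 1, 2]
print(class_conditional_mmd(feats_a, labels_a, feats_b, labels_b))
```

Minimizing such a quantity during training pulls same-class features from different source domains towards each other in the latent space.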

Performance of the proposed method.

To analyze the performance of the different methods on the IDGFD tasks in more detail, Fig. 5 shows the confusion matrices obtained by each method on Task 4. The results indicate that the comparison methods tend to misclassify the fault class (Class 2) as the normal class (Class 1). Our proposed approach mitigates this issue to some extent, achieving high accuracy in identifying the normal class while also improving accuracy in detecting the fault class. In this challenging IDGFD setting, where diagnostic models are trained on imbalanced source domains and evaluated on unseen target domains, our approach demonstrates significant improvements. These promising results indicate its potential as a valuable tool for industrial applications affected by class imbalance and domain shift.

Confusion matrix for different methods on task 4.
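A two-class confusion matrix of this kind can be tallied directly from true and predicted labels. The sketch below uses the label convention of the text (Class 1 normal, Class 2 fault) and treats the fault class as positive; the label lists are made-up toy values, not results from the paper.

```python
def confusion_counts(y_true, y_pred, fault_label=2):
    """Count TP, TN, FP and FN, treating fault_label as the positive class."""
    counts = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for truth, pred in zip(y_true, y_pred):
        if truth == fault_label and pred == fault_label:
            counts["TP"] += 1      # fault correctly detected
        elif truth == fault_label:
            counts["FN"] += 1      # fault misclassified as normal
        elif pred == fault_label:
            counts["FP"] += 1      # normal misclassified as fault
        else:
            counts["TN"] += 1      # normal correctly identified
    return counts


# Toy labels: four normal samples (1) and two fault samples (2).
y_true = [1, 1, 1, 1, 2, 2]
y_pred = [1, 1, 1, 2, 1, 2]
print(confusion_counts(y_true, y_pred))
```

The misclassification pattern discussed above, faults predicted as normal, shows up as a large FN count relative to TP.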

Fig. 6 visualizes the features extracted by V3 for Task 4 and Task 8. The augmented pseudo-fault data closely align with the original fault data, indicating that the proposed data augmentation method generates pseudo-data that remain distinguishable by class.

Visualization of features extracted by V3 on Task 4 and Task 8.

To simulate the imbalance in real-world working conditions, where normal samples are abundant and fault samples are insufficient, an imbalance rate ρ (%) is used to construct imbalanced datasets for training. The imbalanced experimental data are summarized in Table 18, and the experimental results are shown in Fig. 7.

Experimental result of different ρ.

As depicted in Fig. 7(a), all the methods show good accuracy when the samples are balanced, with no significant differences among them. However, as the imbalance rate increases, the accuracy of all classifiers decreases, with the proposed method maintaining the best performance. For example, at ρ = 1, the proposed method achieves an accuracy of 82.4%, which is at least 20% higher than other methods. This suggests that our model performs better with the same amount of training samples compared to existing models. In Fig. 7(b), we can see that F1 scores vary with ρ, and our proposed method consistently achieves the highest values. Especially for low imbalance rates, our method shows high F1 scores and accuracy, indicating its capability to recognize fault samples even with few examples.

To further demonstrate the effectiveness of the proposed approach on the IDGFD problem, it was compared with three commonly used data augmentation methods: DPGCN63, RMA-WCGAN-1DCNN64 and DCGAN65. Figure 8 displays the diagnostic results of the eight tasks for these four methods. The results of the other three methods are unsatisfactory, indicating that previous data augmentation approaches are less suitable for the IDGFD problem. In contrast, the proposed method clearly outperforms the other three methods, confirming its effectiveness.

F1 scores of the proposed method compared with the other three data enhancement methods.

When using deep learning-based fault diagnosis techniques, the model training time is an important consideration. As shown in Table 19, the proposed method completes training within 5 min. Given that domain generalization fault diagnosis tasks are mostly performed offline, the computational burden is acceptable.

These findings indicate that the training cost of the proposed approach is modest, making it a feasible option for industrial applications that require domain generalization in fault diagnosis.

The DEMDGN model, while effective in extracting domain-invariant and discriminative features, has certain limitations that may lead to suboptimal performance in specific scenarios. First, the reliance on data augmentation in both class and domain spaces can sometimes introduce noisy or irrelevant variations, which may confuse the model rather than enhance its generalization. Additionally, the use of adversarial perturbations, while beneficial for improving robustness, can increase the complexity of training and potentially cause instability, especially when applied to highly imbalanced or small datasets. Furthermore, the balancing of intra-class and inter-class distances through domain-based discrepancy metrics, though designed to improve feature separation, may not work well in cases where the domain shift is extreme or where the feature space is highly non-linear, limiting the model’s ability to generalize effectively across significantly different domains.

In this study, we proposed the DEMDGN, designed to address the challenges of domain shifts and class imbalance in IFD. By leveraging mixup-based data augmentation and domain-based discrepancy metrics, the model effectively learns domain-invariant and discriminative features across multiple heterogeneous source domains, enabling robust generalization to unseen target data under varying working conditions. Our experimental results on three diverse datasets—one from marine machinery and two from bearing systems—demonstrate that DEMDGN significantly improves fault diagnosis performance compared to traditional approaches. The method successfully addresses class imbalance and domain shift issues, making it a strong candidate for practical industrial fault diagnosis tasks. In future work, we plan to extend DEMDGN to other types of machinery, explore additional challenges such as noisy environments, and investigate alternative data augmentation techniques to further enhance the model’s scalability and adaptability in complex industrial settings.

The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.

Valchev, I., Coraddu, A., Kalikatzarakis, M., Geertsma, R. & Oneto, L. Numerical methods for monitoring and evaluating the biofouling state and effects on vessels’ hull and propeller performance: A review. Ocean Eng. 251 (2022).

Bao, X., Huang, G., Liu, M., Sun, H. & Iglesias, G. Turbine fault diagnosis of the oscillating water column wave energy converter based on multi-lead residual neural networks. Ocean Eng. 291, 116429 (2024).

Kong, X. et al. Concurrent fault diagnosis method for electric-hydraulic system: Subsea blowout preventer system as a case study. Ocean Eng. 294, 116818 (2024).

Li, X., Xu, Y., Li, N., Yang, B. & Lei, Y. Remaining useful life prediction with partial sensor malfunctions using deep adversarial networks. IEEE/CAA J. Autom. Sin. 1–14. https://doi.org/10.1109/JAS.2022.105935 (2022).

Li, X., Yu, S., Lei, Y., Li, N. & Yang, B. Dynamic vision-based machinery fault diagnosis with cross-modality feature alignment. IEEE/CAA J. Autom. Sin. 11, 2068–2081 (2024).

Li, X., Yu, S., Lei, Y., Li, N. & Yang, B. Intelligent machinery fault diagnosis with event-based camera. IEEE Trans. Ind. Inf. 20, 380–389 (2024).

Li, X., Zhang, W., Li, X. & Hao, H. Partial domain adaptation in remaining useful life prediction with incomplete target data. IEEE/ASME Trans. Mechatron. PP, 1–11 (2023).

Xu, L., Teoh, S. S. & Ibrahim, H. A deep learning approach for electric motor fault diagnosis based on modified InceptionV3. Sci. Rep. 14, 12344 (2024).