Managing the ML model workflow
This section guides you through the process of building and deploying Machine Learning (ML) models. It provides a step-by-step guide to model creation, data configuration, model training and deployment, and optional event setup.
This section covers the following:
- Creating an ML model
- Configuring an ML model
- Training a new ML model
- Deploying ML models
- (Optional) Configuring the event setup
- Deleting an ML model
- Duplicating an ML model
- Undeploying an ML model from the node
The following diagram shows the workflow for implementing ML in BMC Helix Edge:
Creating an ML model
Create ML models and customize them to the data, resources, and operational constraints of your BMC Helix Edge deployment. Perform the following steps to create an ML model:
- In BMC Helix Edge, navigate to Intelligence > Machine Learning, and click New Model.
- In the New Model panel, perform the following steps:
- In the Name field, enter a descriptive name for the model.
- In the Description field, briefly describe the model's purpose and intended use.
- From the Algorithm type list, select one of the following algorithms:
- Classification: Categorizes data into predefined groups or classes. For example, the classification algorithm type classifies devices based on their type or categorizes events based on their severity.
- Anomaly: Detects unusual patterns or anomalies in the data, such as unexpected spikes in CPU usage, or unusual patterns in log files.
- Regression: Forecasts future data points based on historical data, such as resource usage or performance metrics.
- Time series: Identifies trends and patterns in time-dependent data, such as detecting seasonal variations in sensor readings, or analyzing trends in network traffic.
- In the Algorithm name list, the system automatically selects the algorithm that corresponds to the Algorithm type that you selected in the previous step.
The following algorithms are supported:
- RandomForest-Classification: Random Forest is an ensemble learning method that creates multiple decision trees during training. It then combines the predictions from these individual trees to make the final classification. This approach reduces overfitting and improves model accuracy.
- AutoEncoder-Anomaly: AutoEncoders are neural networks designed to reconstruct their input data. An AutoEncoder-Anomaly model is trained to reconstruct normal data. When the model is presented with anomalous data, the reconstruction error increases significantly. The system uses this difference in reconstruction error to detect anomalies.
- LightGBM-Regression: Light Gradient Boosting Machine (LightGBM) is a gradient-boosting framework known for its speed and efficiency. For regression, LightGBM builds an ensemble of decision trees to predict a continuous target variable. The algorithm adds trees sequentially, and each new tree corrects the errors of the previous ones.
- Multivariate-Deep-TimeSeries: This algorithm combines multivariate and deep time series analysis. Multivariate means that the model considers multiple input variables that change over time. Deep means that the model uses a deep learning architecture, such as Recurrent Neural Networks (RNNs) or Long Short-Term Memory (LSTM) networks, which are well suited to analyzing sequential data.
Click Create.
The system displays the newly created ML Model page with the Data configuration tab.
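The AutoEncoder-Anomaly description above relies on reconstruction error. The following minimal sketch illustrates only the scoring step, assuming that the reconstructed values are already available from some model. The function names and the threshold parameter are hypothetical and are not part of BMC Helix Edge.

```python
from typing import Sequence


def reconstruction_error(original: Sequence[float], reconstructed: Sequence[float]) -> float:
    """Return the mean squared error between an input vector and its reconstruction."""
    if len(original) != len(reconstructed):
        raise ValueError("original and reconstructed must have the same length")
    if not original:
        raise ValueError("vectors must not be empty")
    return sum((o - r) ** 2 for o, r in zip(original, reconstructed)) / len(original)


def is_anomalous(original: Sequence[float], reconstructed: Sequence[float], threshold: float) -> bool:
    """Flag a data point when its reconstruction error exceeds the threshold.

    The threshold is an assumption for illustration; in practice it would be
    chosen from the errors observed on normal data.
    """
    return reconstruction_error(original, reconstructed) > threshold
```

A data point that the model reconstructs closely produces a small error and is not flagged, while a poorly reconstructed point produces a large error and is flagged.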
Configuring an ML model
After you create the model, you must configure the ML model by performing the following steps:
- Define the model data: Define the ML model data first, based on recommendations from domain experts or data scientists.
- Define the collection interval and add filters: Set the collection interval and add filters to refine the data used for training, based on recommendations from domain experts or data scientists.
To define the ML model data
In the Data configuration tab, go to the Input Data section and click Select.
The system displays the Primary Profile panel.
- From the Primary Profile list, click Select and select the profile that contains the data you want to use for your model.
If the profile is not listed, create a new one.
- In the Select model fields dialog box, perform the following steps to select the device attributes and metrics:
- Click > next to Device Attributes to expand the list, see the specific fields, and select device attributes.
- Expand the Metrics list and select metrics.
- (Optional) Use the Add selections from other device profiles pane to combine the data from other profiles with this data.
The system displays profiles only if matching profiles are available.
- Click OK.
You can see the configuration added on the Data configuration tab.
Perform one of the following steps:
To define the collection interval and add filters
- In the Data collection interval pane, do the following:
- In the Time period field, specify the duration for training data.
The default value is 0.25 seconds every day.
- In the Sample every field, define how frequently you want to sample the data points within the specified time period.
The appropriate sampling rate depends on the frequency of changes in your data. The default value is 10 seconds.
- Make sure that the Use same interval for prediction option is selected so that, after the system trains the model, predictions use the same time period and sampling rate.
- In the Data filter pane, perform the following steps to filter the data based on specific criteria:
- From the Attribute list, select the attribute to filter.
- From the Operator list, select a comparison operator, such as Equals, Contains, or Excludes.
When you manage filters, note the following rules about combining operators. If you use an Equals filter for an attribute, do not use Contains or Excludes for the same attribute in another filter. If you use Contains for an attribute, the only other operator that you can use for that attribute in another filter is Excludes, and vice versa.
- (For the Anomaly algorithm type only) Click the Health score reporting option to create a simplified view of health scores.
- Click Save.
The system checks whether enough data is available based on the selected criteria. If sufficient data is available, a message appears on the Data configuration tab.
- Click Preview data to preview the data used for training and validate that the data is correct and complete.
- (Optional) Click Download-ready to download the trained data.
- (Optional) Click Revert changes to undo any changes made on this screen.
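The operator rules for data filters described above can be checked before you enter the filters. The following is a minimal sketch, assuming that each filter is represented as a hypothetical (attribute, operator) pair. It checks only the rule that Equals must not be combined with Contains or Excludes for the same attribute; it does not check repeated use of the same operator, because that case is not described here. The function name is an assumption for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def find_operator_conflicts(filters: Iterable[Tuple[str, str]]) -> List[str]:
    """Return the attributes that mix Equals with Contains or Excludes."""
    operators_by_attribute: Dict[str, Set[str]] = defaultdict(set)
    for attribute, operator_name in filters:
        operators_by_attribute[attribute].add(operator_name)

    conflicts = []
    for attribute, operators in operators_by_attribute.items():
        # Contains and Excludes may be combined with each other, but not with Equals.
        if "Equals" in operators and operators & {"Contains", "Excludes"}:
            conflicts.append(attribute)
    return sorted(conflicts)
```

An empty result means that the planned set of filters does not violate the Equals rule.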
Training a new ML model
ML models are trained on data to identify patterns and relationships. These learned patterns enable the models to make predictions or decisions on new, unseen data. Once trained, ML models can automate tasks and optimize operations. Use the Train & deploy tab to begin the training process and subsequently deploy your model.
Before you begin
- Make sure that you have a ZIP file that contains a training script.
- If you trained a model externally, package the saved model files, such as the weights and architecture, into a ZIP file together with a script that performs the training and any necessary data files, configuration files, or dependencies. The ZIP file must not exceed 20.00 MB.
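The packaging prerequisite above can be scripted. The following is a minimal sketch that uses the Python standard library to build the ZIP file and check its size. The function name, the file names in the usage note, and the flat archive layout are assumptions for illustration; the expected layout of the archive is not specified here. The size limit is passed as a parameter so that you can set it to the limit that applies to your environment.

```python
import zipfile
from pathlib import Path
from typing import Iterable


def build_training_zip(files: Iterable[Path], zip_path: Path, max_bytes: int) -> int:
    """Write the given files into zip_path and return the archive size in bytes.

    Raises ValueError if the resulting archive is larger than max_bytes.
    """
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for file in files:
            # Store each file at the top level of the archive (assumed layout).
            archive.write(file, arcname=file.name)

    size = zip_path.stat().st_size
    if size > max_bytes:
        raise ValueError(f"{zip_path} is {size} bytes, which exceeds the limit of {max_bytes} bytes")
    return size
```

For example, `build_training_zip([Path("train.py"), Path("config.json")], Path("model.zip"), max_bytes=limit)` packages a hypothetical training script and configuration file, where `limit` is the size limit in bytes.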
- In the ML Model, click the Train & deploy tab.
- In the Train & deploy tab, click Train new version and perform the following steps:
- In the Training name field, enter a descriptive name for this training run or model version.
This name helps you identify this specific training instance in the future. Use a name that reflects the data used for training.
- Click Attach a file to open a file browser, and then select the ZIP file from your local system.
The ZIP file must contain a script that performs the training, together with any necessary data files, configuration files, or dependencies. If you trained a model externally, you can package the saved model files, such as the weights and architecture, into the ZIP file with the script. The ZIP file must not exceed 100.00 MB.
- Click Train.
The system uploads the ZIP file and initiates the training process or deployment of the pre-trained model.
You can see the model in the list on the Train & deploy tab. Monitor the training progress in the Training status column.
The following table describes the training statuses:

| Training status | What it means |
| --- | --- |
| New | The ML model is newly created but has not yet undergone training. The model is in its initial state and has not started learning from the provided data. |
| TrainingInProgress | The ML model is now undergoing training. The model is actively learning patterns and relationships from the provided training data. |
| TrainingCompleted | The ML model's training process is complete. The model has finished learning from the training data and is ready for use in making predictions or analyses. |
| TrainingFailed | The ML model's training process failed. The model needs to be investigated for errors and corrected accordingly. |
- Wait for the Training status to show TrainingCompleted before deploying the model.
Deploying ML models
After an ML model is created and configured, you must deploy the model for analysis. The deployment process makes the model available for use in BMC Helix Edge. After deployment, you can integrate the model with other features and dashboards to provide insights and drive actions. Perform the following steps to deploy ML models:
Click Manage deployments.
The system displays the following Manage deployments page:
The following table describes the columns on the Manage deployments page, which provide an overview of the deployment status of your models across your nodes:

| Column | What it means |
| --- | --- |
| Last status | Shows the model's current deployment status on that specific node. Examples include ModelDeployed and ReadyToDeploy. |
| Node host | Shows the hostname or identifier of the node. |
| Last version | Indicates the previous version of the model that was deployed to that node. |
- In the Deploy tab, select one or more nodes that have the ReadyToDeploy status.
- Click Deploy.
The system initiates the deployment process to the selected nodes.
- Observe the Last status column in the table to see updates that reflect the deployment progress.
The system displays statuses such as Deploying while the deployment is in progress. After the deployment is complete, the system displays the ModelDeployed status.
- Return to the Train & deploy tab to monitor the overall deployment status in the Deployment status column of the version table.
(Optional) Configuring the event setup
With this option, you can define rules that trigger events (alerts) based on the model's predictions (for example, anomaly scores).
Before you begin
- Make sure that you configure the Machine Learning model in BMC Helix Edge. See Configuring Machine Learning models.
- Make sure that you completed the training process for the specific machine learning model. See Training a machine learning model.
To configure the event setup for Anomaly, Time series, and Regression models
- On the Train & deploy tab, click Event Setup.
- In the Event name field, enter a descriptive name for the event.
- In the Description field, briefly describe what this event signifies.
- In the Severity & Conditions pane, perform the following steps to define the severity and conditions that trigger events at different severity levels:
- Select the severity levels to enable by selecting the Minor, Major, or Critical options.
For each enabled severity level, the Prediction field indicates that the condition is based on the model's prediction.
- From the Select list, select an operator, such as Between, Greater than, or Less than.
- In the From field, enter the numerical value or values for the comparison.
- Under Stabilization period, enter a value in the Repeating Occurrences field to specify how many consecutive times the condition must be met before the system triggers an event.
For example, if you set the value to 1, BMC Helix Edge triggers an event immediately when it meets the condition. If you set the value to 3, the condition must be valid for three consecutive data points before BMC Helix Edge triggers an event. By doing so, you can avoid spurious alerts due to momentary fluctuations.
- Click Save.
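The stabilization behavior described above can be illustrated with a short sketch. It assumes that each data point has already been evaluated against the condition and is represented as a boolean. The function name is hypothetical, and the sketch reports only the first data point at which an event would be triggered; how the product handles later data points is not covered here.

```python
from typing import Iterable, Optional


def first_trigger_index(condition_met: Iterable[bool], repeating_occurrences: int) -> Optional[int]:
    """Return the index of the first data point that triggers an event, or None.

    An event is triggered once the condition has been met for
    repeating_occurrences consecutive data points.
    """
    if repeating_occurrences < 1:
        raise ValueError("repeating_occurrences must be at least 1")

    streak = 0
    for index, met in enumerate(condition_met):
        # A data point that does not meet the condition resets the streak.
        streak = streak + 1 if met else 0
        if streak >= repeating_occurrences:
            return index
    return None
```

With a value of 1, the first data point that meets the condition triggers the event. With a value of 3, a single data point that does not meet the condition resets the count, which is how momentary fluctuations are filtered out.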
To configure the event setup for a Classification model
- On the Train & deploy tab, click Event Setup.
- In the Event name field, enter a descriptive name for the event.
- In the Description field, briefly describe what this event signifies.
- In the Severity pane, perform the following steps to define the severity levels at which events are triggered:
- Select the severity levels to enable by selecting the Minor, Major, or Critical options.
For each enabled severity level, the Prediction field indicates that the condition is based on the model's prediction.
- In the Condition pane, perform the following steps to specify the criteria that must be met for the event to be triggered:
- In the Status field, enter a status and then select an operator, such as Greater than, Equal to, or Less than.
The Status field represents the classification result from the ML model. It indicates the predicted state based on input features, for example, Normal, Anomalous, Warning, or Failed.
- In the Confidence field, enter a numerical value for the comparison and select an operator, such as Greater than, Equal to, or Less than.
The numerical value, for instance 0.6, represents the model's confidence in its prediction. A higher value indicates greater certainty in the classification.
- Under Stabilization period, enter a value in the Repeating Occurrences field to specify how many consecutive times the condition must be met before the system triggers an event.
- Click Save.
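The classification condition described above combines a predicted status with a confidence comparison. The following is a minimal sketch of that logic, assuming that the status is compared for equality only and that the confidence is compared by using one of the listed operators. The function and dictionary names are hypothetical and do not represent a BMC Helix Edge API.

```python
import operator
from typing import Callable, Dict

# Maps the operator labels used for the Confidence field to comparison functions.
CONFIDENCE_OPERATORS: Dict[str, Callable[[float, float], bool]] = {
    "Greater than": operator.gt,
    "Equal to": operator.eq,
    "Less than": operator.lt,
}


def condition_met(
    predicted_status: str,
    confidence: float,
    expected_status: str,
    confidence_operator: str,
    confidence_value: float,
) -> bool:
    """Return True when the prediction matches the status and the confidence comparison holds."""
    compare = CONFIDENCE_OPERATORS[confidence_operator]
    # Assumption: the status must match exactly; other status operators are not modeled.
    return predicted_status == expected_status and compare(confidence, confidence_value)
```

For example, a prediction of Failed with a confidence of 0.8 meets a condition of Failed with confidence Greater than 0.6, while the same prediction with a confidence of 0.5 does not.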
Deleting an ML model
Deleting a model is an irreversible action. This option permanently removes the selected ML model from the system, including the model's configuration, training data associations, trained model files, and related metadata. Be sure that you want to delete the model before proceeding.
Before you begin
Make sure that the trained model is not attached to any device profiles.
To delete an ML model
- In BMC Helix Edge, navigate to Intelligence and click Machine Learning.
The system displays the following Machine Learning page:
- Select one or more model checkboxes and click Delete.
The system displays a prompt to confirm the deletion action.
- Click Delete.
The system removes the deleted ML model from the list.
Duplicating an ML model
You can duplicate an existing ML model if you need a model with a slightly different metrics configuration. You can use the duplicated model to train and create multiple models with slight changes in the metrics. The copied models include the configuration but not the trained model files.
To duplicate an ML model
- In BMC Helix Edge, navigate to Intelligence and click Machine Learning.
The system displays the following Machine Learning page:
- Select a model to duplicate from the list and then click Duplicate model.
- In the cloned model panel, perform the following steps:
- In the Name field, enter a new name for the model.
By default, the system appends Copy to the end of the existing model name.
- (Optional) In the Description field, briefly describe the model's purpose and intended use.
- Click Duplicate.
The system displays the newly created ML model in the list.
Undeploying an ML model from the node
By undeploying an ML model, you remove a deployed model from specific nodes to stop the model from running on those nodes and free up resources.
- In BMC Helix Edge, navigate to Intelligence and click Machine Learning.
- On the Machine Learning page, click the name of the model that you want to undeploy from the node.
- Click the Train & deploy tab.
- Click Manage deployments.
- In the Manage deployments side panel, click the Undeploy tab.
- Select one or more nodes with the ModelDeployed status.
- Click Undeploy.
The system begins the model-detaching process from the selected nodes. Once the process is complete, the selected nodes move back to the Deploy tab.
- Return to the Train & deploy tab to monitor the overall deployment status in the Deployment status column of the version table.