Control-M for GCP Dataplex
GCP Dataplex is an extract, transform, and load (ETL) service that enables you to visualize and manage data in GCP BigQuery and the cloud.
Control-M for GCP Dataplex enables you to do the following:
- Execute any of the following job actions:
  - Data Quality Task: Executes a predefined data quality task in GCP BigQuery or Google Cloud Storage locations, and defines data controls in BigQuery environments.
  - Custom Spark Task: Executes a predefined, scheduled Apache Spark task to analyze and process your data.
  - Data Profiling Scan: Executes a predefined data scan to identify shared statistical characteristics between BigQuery tables.
  - Data Quality Scan: Executes a predefined data quality scan that validates your data and logs alerts when the data fails validation.
- Manage GCP Dataplex credentials in a secure connection profile.
- Connect to any GCP Dataplex endpoint.
- Integrate GCP Dataplex jobs with other Control-M jobs into a single scheduling environment.
- Monitor the status, results, and output of GCP Dataplex jobs in the Monitoring domain.
- Attach an SLA job to your GCP Dataplex jobs.
- Introduce all Control-M capabilities to Control-M for GCP Dataplex, including advanced scheduling criteria, complex dependencies, Resource Pools, Lock Resources, and variables.
- Run 50 GCP Dataplex jobs simultaneously per Agent.
Control-M for GCP Dataplex Compatibility
The following table lists the Control-M for GCP Dataplex plug-in prerequisites, each with its minimum required version.
Component | Version |
---|---|
Control-M/EM | 9.0.21.100 |
Control-M/Agent | 9.0.21.100 |
Control-M Application Integrator | 9.0.21.100 |
Control-M Automation API | 9.0.21.125 |
Control-M for GCP Dataplex is supported on Control-M Web and Control-M Automation API, but not on Control-M client.
To download the required installation files for each prerequisite, see Obtaining Control-M Installation Files.
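As a rough illustration of checking an environment against the minimum versions in the table above, the following shell sketch compares dotted version strings. It assumes GNU `sort -V` is available. The function names `version_ge` and `check_component` and the installed version passed in the example call are hypothetical, and the sketch does not cover how you obtain the installed version of each component.

```shell
#!/bin/sh
# Succeeds (exit status 0) when version $1 is greater than or equal to version $2.
# Relies on GNU sort's version ordering (-V), which compares each dotted field numerically.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# $1 = component name, $2 = installed version, $3 = minimum version from the table.
check_component() {
  if version_ge "$2" "$3"; then
    echo "OK: $1 $2 (minimum $3)"
  else
    echo "TOO OLD: $1 $2 (minimum $3)"
    return 1
  fi
}

# Example call; the installed version 9.0.21.200 is a made-up value.
check_component "Control-M/Agent" "9.0.21.200" "9.0.21.100"
```

A non-zero exit status from `check_component` indicates that the component is below the minimum version listed in the table.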
Setting Up Control-M for GCP Dataplex
This procedure describes how to deploy the GCP Dataplex plug-in, create a connection profile, and define a GCP Dataplex job in Control-M Web and Automation API.
Before You Begin
Verify that Automation API is installed, as described in Automation API Installation.
Begin
- Create a temporary directory to save the downloaded files.
- Download the GCP Dataplex plug-in from the Control-M for GCP Dataplex download site on the EPD site.
- Install the GCP Dataplex plug-in using the Automation API Provision service:
  - Log in to the Control-M/EM Server machine as an Administrator and store the downloaded zip file in one of the following locations (within several minutes, the job type appears in Control-M Web):
    - Linux: $HOME/ctm_em/AUTO_DEPLOY
    - Windows: <EM_HOME>\AUTO_DEPLOY
  - Log in to the Control-M/Agent machine and run the provision image command, as follows:
    - Linux: ctm provision image GCP_Dataplex_plugin.Linux
    - Windows: ctm provision image GCP_Dataplex_plugin.Windows
- Create a GCP Dataplex connection profile in Control-M Web or Automation API, as follows:
- Define a GCP Dataplex job in Control-M Web or Automation API, as follows:
  - Web: Creating a Job with GCP Dataplex Job attributes.
  - Automation API: Job:GCP Dataplex.
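The Linux deployment steps in the procedure above can be sketched as a small shell script. This is an illustration under stated assumptions, not an official installer: the archive name `GCP_Dataplex_plugin.zip`, the `DRY_RUN` switch, and the helper names `run`, `deploy_to_em`, and `provision_agent` are hypothetical. Only the `AUTO_DEPLOY` path and the `ctm provision image` command come from the steps themselves. By default the script only prints each command so that you can review the sequence first.

```shell
#!/bin/sh
# DRY_RUN=1 (the default in this sketch) prints commands instead of executing them.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical archive name; substitute the file actually downloaded from the EPD site.
PLUGIN_ZIP="${PLUGIN_ZIP:-GCP_Dataplex_plugin.zip}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Run on the Control-M/EM Server machine: place the zip file in the AUTO_DEPLOY directory.
deploy_to_em() {
  run cp "$PLUGIN_ZIP" "$HOME/ctm_em/AUTO_DEPLOY/"
}

# Run on the Control-M/Agent machine: provision the plug-in image.
provision_agent() {
  run ctm provision image GCP_Dataplex_plugin.Linux
}
```

Because the two steps run on different machines, call `deploy_to_em` on the Control-M/EM Server and `provision_agent` on the Control-M/Agent, and set `DRY_RUN=0` only after the printed commands look correct for your environment.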