Control-M for GCP Dataproc


Google Cloud Platform (GCP) Dataproc enables you to perform cloud-based big data processing and machine learning.

Control-M for GCP Dataproc enables you to do the following:

  • Execute single or Workflow Template GCP Dataproc jobs.
  • Manage GCP Dataproc credentials in a secure connection profile.
  • Connect to any GCP Dataproc endpoint.
  • Integrate GCP Dataproc jobs with other Control-M jobs into a single scheduling environment.
  • Monitor the status, results, and output of GCP Dataproc jobs in the Monitoring domain.
  • Attach an SLA job to your GCP Dataproc jobs.
  • Introduce all Control-M capabilities to Control-M for GCP Dataproc, including advanced scheduling criteria, complex dependencies, Resource Pools, Lock Resources, and variables.
  • Run 50 GCP Dataproc jobs simultaneously per Agent.

Control-M for GCP Dataproc Compatibility

The following table lists the prerequisites for the GCP Dataproc plug-in, each with its minimum required version.

Component                           Version
Control-M/EM                        9.0.20.200
Control-M/Agent                     9.0.20.201
Control-M Application Integrator    9.0.20.201
Control-M Automation API            9.0.20.250

Control-M for GCP Dataproc is supported on Control-M Web and Control-M Automation API, but not on Control-M client.

To download the required installation files for each prerequisite, see Obtaining-Control-M-Installation-Files.
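
The minimum versions in the compatibility table can be checked with a short script before deployment. The following is a minimal sketch, not a BMC tool: it assumes version strings are dot-separated integers that compare numerically from left to right, and the helper names (parse_version, meets_minimum, unmet_prerequisites) are hypothetical. How you obtain the installed versions is left to your environment.

```python
# Minimum versions copied from the compatibility table above.
MINIMUM_VERSIONS = {
    "Control-M/EM": "9.0.20.200",
    "Control-M/Agent": "9.0.20.201",
    "Control-M Application Integrator": "9.0.20.201",
    "Control-M Automation API": "9.0.20.250",
}


def parse_version(text):
    """Split a dotted version string into a tuple of integers (assumed format)."""
    return tuple(int(part) for part in text.strip().split("."))


def meets_minimum(installed, minimum):
    """Return True if the installed version is at least the minimum version."""
    a, b = parse_version(installed), parse_version(minimum)
    # Pad the shorter tuple with zeros so that 9.0.21 is treated as 9.0.21.0.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def unmet_prerequisites(installed_versions):
    """List components that are missing or below their minimum version."""
    return [
        name
        for name, minimum in MINIMUM_VERSIONS.items()
        if name not in installed_versions
        or not meets_minimum(installed_versions[name], minimum)
    ]
```

An empty list from unmet_prerequisites means that every component in the table meets its minimum version under the stated assumption.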

Setting up Control-M for GCP Dataproc

This procedure describes how to deploy the GCP Dataproc plug-in, create a connection profile, and define a GCP Dataproc job in Control-M Web and Automation API.

Note

Integration plug-ins released by BMC require an Application Integrator installation. However, these plug-ins are not editable and you cannot import them into Application Integrator. To deploy these integrations to your Control-M environment, import them directly into Control-M using Control-M Automation API.

Before You Begin

Verify that Automation API is installed, as described in Automation API Installation.

Begin

  1. Create a temporary directory to save the downloaded files.
  2. Download the GCP Dataproc plug-in from the Control-M for GCP Dataproc download page in the EPD site.
  3. Install the GCP Dataproc plug-in via one of the following methods:
    • Versions 9.0.21 or Higher: Use the Provision service of Automation API, as follows:
      1. As an administrator on the Control-M/EM Server, store the downloaded zip file in the location for your operating system:
        • Linux: $HOME/ctm_em/AUTO_DEPLOY
        • Windows: <EM_HOME>\AUTO_DEPLOY
        Within several minutes, the zip file is available in all Control-M interfaces associated with the Control-M/EM.
      2. As an application user on the Agent machine, run the provision image command, as follows:
        • Linux: ctm provision image GDP_plugin.Linux
        • Windows: ctm provision image GDP_plugin.Windows
    • Versions Lower than 9.0.21: Use the Deploy service of Automation API, as described in deploy jobtype.
  4. Create a GCP Dataproc connection profile in Control-M Web or Automation API.
  5. Define a GCP Dataproc job in Control-M Web or Automation API.
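
The installation choices in step 3 can be summarized in code. The following is a minimal sketch under stated assumptions: the function names and the "linux"/"windows" keys are hypothetical, the base directory argument stands for $HOME on Linux or <EM_HOME> on Windows, and the version comparison assumes dot-separated integer versions. Only the directory names, image names, and the ctm provision image command come from the procedure above.

```python
from pathlib import PurePosixPath, PureWindowsPath

# Image names taken from the provision image commands in step 3.
PROVISION_IMAGES = {
    "linux": "GDP_plugin.Linux",
    "windows": "GDP_plugin.Windows",
}


def install_method(version):
    """Pick the Automation API service to use for the given version.

    Versions 9.0.21 or higher use the Provision service; lower versions
    use the Deploy service. Assumes dot-separated integer versions.
    """
    parts = tuple(int(p) for p in version.split("."))
    return "provision" if parts >= (9, 0, 21) else "deploy"


def auto_deploy_dir(os_name, base_dir):
    """Return the AUTO_DEPLOY directory on the Control-M/EM Server.

    base_dir is $HOME on Linux or <EM_HOME> on Windows.
    """
    if os_name == "linux":
        return PurePosixPath(base_dir) / "ctm_em" / "AUTO_DEPLOY"
    if os_name == "windows":
        return PureWindowsPath(base_dir) / "AUTO_DEPLOY"
    raise ValueError(f"unsupported operating system: {os_name}")


def provision_command(os_name):
    """Build the provision image command to run on the Agent machine."""
    return ["ctm", "provision", "image", PROVISION_IMAGES[os_name]]
```

The list returned by provision_command could be passed to subprocess.run on the Agent machine by the application user; this sketch only builds the command and does not execute it.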

Note

To remove this plug-in from an Agent, see Removing a Plug-in. The plug-in ID is GDP042022.

Change Log

The following table provides details about changes that were introduced in new versions of this plug-in:

Plug-in Version    Details
1.0.01             Added the ability to trigger batch jobs using Dataproc Serverless for Spark Batch Jobs.
                   The new Batches option was added to the Dataproc task type parameter, as well as new parameters Batch ID and Requested ID.
1.0.02             The Batch ID and Requested ID parameters are now set to resolve on rerun.
1.0.03             Added the ability to terminate the interactive session resource.

 
