Tutorial - Running applications and programs in your environment

Jobs run where the Control-M/Agent resides, so you need a Control-M/Agent on your application host. The provision service enables you to install and set up a Control-M/Agent.

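Select the relevant application:

  • Running a script and command job flow
  • Running a file transfer and database queries job flow
  • Running a Hadoop-Spark job flow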

Before you begin

Ensure that you have set up your environment, as described in Setting up the prerequisites.

Running a script and command job flow

This example walks you through running a script and a command in sequence. You need a 64-bit Windows or Linux machine that has access to the scripts and programs that you want to run.

Step 1 - Find the image to provision

The provision images command lists the images available to install.

> ctm provision images Linux

[
  "Agent.Linux",
  "ApplicationsAgent.Linux",
  "BigDataAgent.Linux"
]

OR

> ctm provision images Windows

[
  "Agent.Windows",
  "ApplicationsAgent.Windows"
]

As you can see, there are three available Linux images and two Windows images:

  • Agent.Linux/Agent.Windows - provides the ability to run scripts and commands.
  • ApplicationsAgent.Linux/ApplicationsAgent.Windows - in addition to Agent.Linux/Agent.Windows, adds plugins to run file transfer jobs and database SQL scripts.
  • BigDataAgent.Linux - in addition to Agent.Linux, adds a plugin to run Hadoop and Spark jobs.

In this example, you will provision Agent.Windows or Agent.Linux according to the jobs that you would like to run.

Step 2 - Provision the Agent image

On a Windows system, run the following command as Administrator:

ctm provision install Agent.Windows

OR

On Linux, run the following command:

ctm provision install Agent.Linux

After provisioning the Agent successfully, you now have a running instance of your Control-M/Agent on your host.

Step 3 - Access the tutorial samples

Go to the directory where the tutorial sample is located:

cd automation-api-quickstart/101-running-script-command-job-flow

Step 4 - Verify the code for Control-M

Let's take the AutomationAPISampleFlow.json file, which contains job definitions, and verify that the code within it is valid. To do so, use the build command. The following example shows the command and a typical successful response.

> ctm build AutomationAPISampleFlow.json

[
  {
    "deploymentFile": "AutomationAPISampleFlow.json",
    "successfulFoldersCount": 0,
    "successfulSmartFoldersCount": 1,
    "successfulSubFoldersCount": 0,
    "successfulJobsCount": 2,
    "successfulConnectionProfilesCount": 0
  }
]

If the code is not valid, an error is returned.
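
If you want a script to check the build response, the following Python sketch shows one way to read it. It assumes that you saved the response text yourself (here it is pasted in as BUILD_RESPONSE). The helper name summarize_build is made up for this illustration and is not part of the Automation API.

```python
import json

# Sample `ctm build` response copied from the tutorial above.
BUILD_RESPONSE = """
[
  {
    "deploymentFile": "AutomationAPISampleFlow.json",
    "successfulFoldersCount": 0,
    "successfulSmartFoldersCount": 1,
    "successfulSubFoldersCount": 0,
    "successfulJobsCount": 2,
    "successfulConnectionProfilesCount": 0
  }
]
"""


def summarize_build(response_text):
    """Return {deploymentFile: {counter: value}}, keeping only non-zero counters."""
    summary = {}
    for entry in json.loads(response_text):
        counts = {key: value for key, value in entry.items()
                  if key.startswith("successful") and value}
        summary[entry["deploymentFile"]] = counts
    return summary


print(summarize_build(BUILD_RESPONSE))
```

For the sample response, the summary keeps the one smart folder and the two jobs that were validated.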

Step 5 - Run the source code

Use the run command to run the jobs on the Control-M environment. The returned runId is used to check the job status. The following shows the command and a typical successful response.

> ctm run AutomationAPISampleFlow.json

{
  "runId": "7cba67de-9e0d-409d-8d93-1b8229432eee",
  "statusURI": "https://localhost:8443/automation-api/run/status/7cba67de-9e0d-409d-8d93-1b8229432eee?token=4f8684ec6754e08cc70f95b5f09d3a47_A1FD0E65",
  "monitorPageURI": "https://localhost:8443/SelfService#Workbench:runid=7cba67de-9e0d-409d-8d93-1b8229432eee&title=AutomationAPISampleFlow.json"
}

This code ran successfully and returned the runId of "7cba67de-9e0d-409d-8d93-1b8229432eee".
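
When you script these steps, you can read the runId from the response instead of copying it by hand. The following Python sketch assumes that the run response has been captured as text. The helper name status_command is hypothetical, and it only builds the command string for the status check used in the next step.

```python
import json

# The runId field of the `ctm run` response shown above.
RUN_RESPONSE = '{"runId": "7cba67de-9e0d-409d-8d93-1b8229432eee"}'


def status_command(run_response_text):
    """Build the `ctm run status` command line for the runId in a run response."""
    run_id = json.loads(run_response_text)["runId"]
    return 'ctm run status "{}"'.format(run_id)


print(status_command(RUN_RESPONSE))
```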

Step 6 - Check job status using the runId

The following command shows how to check job status using the runId. Note that when there is more than one job in the flow, the status of each job is checked and returned.

> ctm run status "7cba67de-9e0d-409d-8d93-1b8229432eee"

{
  "statuses": [
    {
      "jobId": "workbench:00007",
      "folderId": "workbench:00000",
      "numberOfRuns": 1,
      "name": "AutomationAPISampleFlow",
      "type": "Folder",
      "status": "Executing",
      "startTime": "Apr 26, 2017 10:43:47 AM",
      "endTime": "",
      "outputURI": "Folder has no output",
      "logURI": "https://localhost:8443/automation-api/run/job/workbench:00007/log?token=01ab65917bc71dbef610806dd9cb3f94_0007C46B"
    },
    {
      "jobId": "workbench:00008",
      "folderId": "workbench:00007",
      "numberOfRuns": 0,
      "name": "CommandJob",
      "folder": "AutomationAPISampleFlow",
      "type": "Command",
      "status": "Wait Host",
      "startTime": "",
      "endTime": "",
      "outputURI": "Job did not run, it has no output",
      "logURI": "https://localhost:8443/automation-api/run/job/workbench:00008/log?token=01ab65917bc71dbef610806dd9cb3f94_0007C46B"
    },
    {
      "jobId": "workbench:00009",
      "folderId": "workbench:00007",
      "numberOfRuns": 0,
      "name": "ScriptJob",
      "folder": "AutomationAPISampleFlow",
      "type": "Job",
      "status": "Wait Condition",
      "startTime": "",
      "endTime": "",
      "outputURI": "Job did not run, it has no output",
      "logURI": "https://localhost:8443/automation-api/run/job/workbench:00009/log?token=01ab65917bc71dbef610806dd9cb3f94_0007C46B"
    }
  ],
  "startIndex": 0,
  "itemsPerPage": 25,
  "total": 3,
  "monitorPageURI": "https://localhost:8443/SelfService#Workbench:runid=7cba67de-9e0d-409d-8d93-1b8229432eee&title=Status_7cba67de-9e0d-409d-8d93-1b8229432eee"
}

Step 7 - Examine the source code

Let's look at the source code in the AutomationAPISampleFlow.json file. By examining the contents of this file, you'll learn about the structure of the job flow and what it should contain.

{
    "Defaults" : {
        "Application" : "SampleApp",
        "SubApplication" : "SampleSubApp",
        "RunAs" : "<USERNAME>",
        "Host" : "<HOST>",
        "Job": {
            "When" : {
                "Months": ["JAN", "OCT", "DEC"],
                "MonthDays":["22","1","11"],
                "WeekDays":["MON","TUE", "WED", "THU", "FRI"],
                "FromTime":"0300",
                "ToTime":"2100"
            },
            "ActionIfFailure" : {
                "Type": "If",       
                "CompletionStatus": "NOTOK",
                
                "mailToTeam": {
                    "Type": "Mail",
                    "Message": "%%JOBNAME failed",
                    "To": "team@mycomp.com"
                }
            }
        }
    },
    "AutomationAPISampleFlow": {
        "Type": "Folder",
        "Comment" : "Code reviewed by John",
        "CommandJob": {
            "Type": "Job:Command",
            "Command": "<COMMAND>"
        },
        "ScriptJob": {
            "Type": "Job:Script",
          	"FilePath":"<SCRIPT_PATH>",
          	"FileName":"<SCRIPT_NAME>"
        },
        "Flow": {
            "Type": "Flow",
            "Sequence": ["CommandJob", "ScriptJob"]
        }
    }
}

The first object is called "Defaults". It allows you to define a parameter once for all objects. For example, it includes scheduling using the When parameter, which configures all jobs to run according to the same scheduling criteria. The "ActionIfFailure" object determines what action is taken if a job ends unsuccessfully.

This example contains two jobs: CommandJob and ScriptJob. These jobs are contained within a folder named AutomationAPISampleFlow. To define the sequence of job execution, the Flow object is used.
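
To make this structure concrete, the following Python sketch walks a shortened copy of the sample and lists each folder's jobs and flow sequences. It relies only on the "Type" values visible in the sample ("Folder", "Job:...", "Flow"). The helper name describe_folders is invented for this illustration.

```python
import json

# Shortened copy of AutomationAPISampleFlow.json (scheduling details omitted).
DEFINITIONS = """
{
    "Defaults": {"Application": "SampleApp", "Host": "<HOST>"},
    "AutomationAPISampleFlow": {
        "Type": "Folder",
        "Comment": "Code reviewed by John",
        "CommandJob": {"Type": "Job:Command", "Command": "<COMMAND>"},
        "ScriptJob": {"Type": "Job:Script", "FilePath": "<SCRIPT_PATH>", "FileName": "<SCRIPT_NAME>"},
        "Flow": {"Type": "Flow", "Sequence": ["CommandJob", "ScriptJob"]}
    }
}
"""


def describe_folders(definitions):
    """Map each folder name to its jobs (name -> type) and its flow sequences."""
    result = {}
    for name, obj in definitions.items():
        # "Defaults" has no "Type", so it is skipped here.
        if not isinstance(obj, dict) or obj.get("Type") != "Folder":
            continue
        jobs = {}
        flows = []
        for child_name, child in obj.items():
            if not isinstance(child, dict):
                continue  # plain properties such as "Type" and "Comment"
            child_type = child.get("Type", "")
            if child_type.startswith("Job:"):
                jobs[child_name] = child_type
            elif child_type == "Flow":
                flows.append(child["Sequence"])
        result[name] = {"jobs": jobs, "flows": flows}
    return result


folders = describe_folders(json.loads(DEFINITIONS))
print(folders)
```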

Step 8 - Modify the code to run in your environment

In the code above, the following parameters need to be set to run the jobs in your environment: 

"RunAs" : "<USERNAME>" 
"Host" : "<HOST>"

"Command": "<COMMAND>"
"FilePath":"<SCRIPT_PATH>"
"FileName":"<SCRIPT_NAME>"

RunAs identifies the operating system user that will execute the jobs.

Host defines the machine where you provisioned the Control-M/Agent. 

Command defines the command to run according to your operating system.

FilePath and FileName define the location and name of the file that contains the script to run.

Note: In JSON, the backslash character must be doubled (\\) when used in a Windows file path.
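
The following Python sketch illustrates the backslash rule. The directory C:\temp\scripts and the file name hello.bat are hypothetical values chosen for the example. When the job definition is written as JSON, each backslash in the path appears doubled, and reading the JSON back returns the original path.

```python
import json

# Python string literal for the (hypothetical) Windows path C:\temp\scripts
script_dir = "C:\\temp\\scripts"

job = {"Type": "Job:Script", "FilePath": script_dir, "FileName": "hello.bat"}

# json.dumps escapes each backslash, which is the form the JSON file needs.
encoded = json.dumps(job)
print(encoded)

# Decoding the JSON text gives back the path with single backslashes.
print(json.loads(encoded)["FilePath"])
```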

Step 9 - Rerun the code sample

Now that we've modified the source code in the AutomationAPISampleFlow.json file, let's rerun the sample:

> ctm run AutomationAPISampleFlow.json

{
  "runId": "ed40f73e-fb7a-4f07-a71c-bc2dfbc48494",
  "statusURI": "https://localhost:8443/automation-api/run/status/ed40f73e-fb7a-4f07-a71c-bc2dfbc48494?token=460e0106b369a0d155bb0e7cbb44f8eb_7E6C03FA",
  "monitorPageURI": "https://localhost:8443/SelfService#Workbench:runid=ed40f73e-fb7a-4f07-a71c-bc2dfbc48494&title=AutomationAPISampleFlow.json"
}

Each time you run the code, a new runId is generated. Let's take the new runId and check the job statuses again:

> ctm run status "ed40f73e-fb7a-4f07-a71c-bc2dfbc48494"

{
  "statuses": [
    {
      "jobId": "workbench:0000p",
      "folderId": "workbench:00000",
      "numberOfRuns": 1,
      "name": "AutomationAPISampleFlow",
      "type": "Folder",
      "status": "Ended OK",
      "startTime": "May 3, 2017 4:57:25 PM",
      "endTime": "May 3, 2017 4:57:28 PM",
      "outputURI": "Folder has no output",
      "logURI": "https://localhost:8443/automation-api/run/job/workbench:0000p/log?token=a8d74f5914dc6decdfd8b2ec833d54cc_3E30FFC9"
    },
    {
      "jobId": "workbench:0000q",
      "folderId": "workbench:0000p",
      "numberOfRuns": 1,
      "name": "CommandJob",
      "folder": "AutomationAPISampleFlow",
      "type": "Command",
      "status": "Ended OK",
      "startTime": "May 3, 2017 4:57:26 PM",
      "endTime": "May 3, 2017 4:57:26 PM",
      "outputURI": "https://localhost:8443/automation-api/run/job/workbench:0000q/output?token=a8d74f5914dc6decdfd8b2ec833d54cc_3E30FFC9",
      "logURI": "https://localhost:8443/automation-api/run/job/workbench:0000q/log?token=a8d74f5914dc6decdfd8b2ec833d54cc_3E30FFC9"
    },
    {
      "jobId": "workbench:0000r",
      "folderId": "workbench:0000p",
      "numberOfRuns": 1,
      "name": "ScriptJob",
      "folder": "AutomationAPISampleFlow",
      "type": "Job",
      "status": "Ended OK",
      "startTime": "May 3, 2017 4:57:27 PM",
      "endTime": "May 3, 2017 4:57:27 PM",
      "outputURI": "https://localhost:8443/automation-api/run/job/workbench:0000r/output?token=a8d74f5914dc6decdfd8b2ec833d54cc_3E30FFC9",
      "logURI": "https://localhost:8443/automation-api/run/job/workbench:0000r/log?token=a8d74f5914dc6decdfd8b2ec833d54cc_3E30FFC9"
    }
  ],
  "startIndex": 0,
  "itemsPerPage": 25,
  "total": 3,
  "monitorPageURI": "https://localhost:8443/SelfService#Workbench:runid=ed40f73e-fb7a-4f07-a71c-bc2dfbc48494&title=Status_ed40f73e-fb7a-4f07-a71c-bc2dfbc48494"
}

You can now see that both jobs Ended OK.
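
If you prefer to check the statuses in a script rather than by eye, the following Python sketch shows one possible approach. It works on a shortened copy of the status response above. The helper names not_ended_ok and job_ids_by_name are made up for this illustration.

```python
import json

# Shortened copy of the `ctm run status` response shown above.
STATUS_RESPONSE = """
{
  "statuses": [
    {"jobId": "workbench:0000p", "name": "AutomationAPISampleFlow", "type": "Folder", "status": "Ended OK"},
    {"jobId": "workbench:0000q", "name": "CommandJob", "type": "Command", "status": "Ended OK"},
    {"jobId": "workbench:0000r", "name": "ScriptJob", "type": "Job", "status": "Ended OK"}
  ],
  "total": 3
}
"""


def not_ended_ok(status_response):
    """Return (name, status) pairs for entries whose status is not "Ended OK"."""
    return [(entry["name"], entry["status"])
            for entry in status_response["statuses"]
            if entry["status"] != "Ended OK"]


def job_ids_by_name(status_response):
    """Map each entry name to its jobId, for use with later commands."""
    return {entry["name"]: entry["jobId"] for entry in status_response["statuses"]}


response = json.loads(STATUS_RESPONSE)
print(not_ended_ok(response))
print(job_ids_by_name(response)["CommandJob"])
```

An empty list means that the folder and both jobs ended OK. The name-to-jobId lookup gives you the jobId that later commands need.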

Let's view the output of CommandJob. Use the jobId to get this information.

ctm run job:output::get "workbench:0000q"

Verify that the output contains your script or command details.

Step 10 - View job details through an interactive interface

Control-M Workbench offers an interactive user interface for debugging purposes. Through this interface, you can view various job run details (including, for example, an activity log and statistics for each job). To launch this interface when you run jobs, enter "--interactive" or "-i" at the end of the run command.

> ctm run AutomationAPISampleFlow.json --interactive

{
  "runId": "40586805-60b5-4acb-9f21-a0cf048f1051",
  "statusURI": "https://ec2-54-187-1-168.us-west-2.compute.amazonaws.com:8443/run/status/40586805-60b5-4acb-9f21-a0cf048f1051",
  "monitorPageURI": "https://localhost:8443/SelfService#Workbench:runid=40586805-60b5-4acb-9f21-a0cf048f1051&title=AutomationAPISampleFlow.json"
}

A browser window opens, where you can view and manage your jobs.

Running a file transfer and database queries job flow

This example walks you through running file transfer and database query jobs in sequence. To complete this tutorial, you need a PostgreSQL database (or another database) and an SFTP server. For this example, install the Agent on a machine that has a network connection to these servers.

Step 1 - Find the image to provision

The provision images command lists the images available to install.

> ctm provision images Linux

[
  "Agent.Linux",
  "ApplicationsAgent.Linux",
  "BigDataAgent.Linux"
]

OR

> ctm provision images Windows

[
  "Agent.Windows",
  "ApplicationsAgent.Windows"
]

As you can see, there are three available Linux images and two Windows images:

  • Agent.Linux/Agent.Windows - provides the ability to run scripts and commands.
  • ApplicationsAgent.Linux/ApplicationsAgent.Windows - in addition to Agent.Linux/Agent.Windows, adds plugins to run file transfer jobs and database SQL scripts.
  • BigDataAgent.Linux - in addition to Agent.Linux, adds a plugin to run Hadoop and Spark jobs.

In this example, you will provision ApplicationsAgent.Windows or ApplicationsAgent.Linux according to the machine that you use to run the jobs.

Step 2 - Provision the Agent image

On a Windows system, run the following command as Administrator:

ctm provision install ApplicationsAgent.Windows

OR

On Linux, run the following command:

ctm provision install ApplicationsAgent.Linux

After provisioning the Agent successfully, you now have a running instance of Control-M/Agent on your host.

Step 3 - Access the tutorial samples

Go to the directory where the tutorial sample is located:

cd automation-api-quickstart/101-running-file-transfer-and-database-query-job-flow

Step 4 - Verify the code for Control-M

Let's take the AutomationAPIFileTransferDatabaseSampleFlow.json file, which contains job definitions, and verify that the code within it is valid. To do so, use the build command. The following example shows the command and a typical successful response.

> ctm build AutomationAPIFileTransferDatabaseSampleFlow.json

[
  {
    "deploymentFile": "AutomationAPIFileTransferDatabaseSampleFlow.json",
    "successfulFoldersCount": 0,
    "successfulSmartFoldersCount": 1,
    "successfulSubFoldersCount": 0,
    "successfulJobsCount": 2,
    "successfulConnectionProfilesCount": 3,
    "successfulDriversCount": 0
  }
]

If the code is not valid, an error is returned.

Step 5 - Examine the source code

Let's look at the source code in the AutomationAPIFileTransferDatabaseSampleFlow.json file. By examining the contents of this file, you'll learn about the structure of the job flow and what it should contain.

{
    "Defaults" : {
        "Application" : "SampleApp",
        "SubApplication" : "SampleSubApp",
        "Host" : "<HOST>",
        "TargetAgent" : "<HOST>",
                                
        "Variables": [
           {"DestDataFile": "<DESTINATION_FILE>"},
           {"SrcDataFile":  "<SOURCE_FILE>"}
        ],
                                
        "When" : {
            "FromTime":"0300",
            "ToTime":"2100"
        }
    },
    "SFTP-CP": {
        "Type": "ConnectionProfile:FileTransfer:SFTP",
        "HostName": "<SFTP_SERVER>",
        "Port": "22",
        "User" : "<SFTP_USER>",
        "Password" : "<SFTP_PASSWORD>"
    },
    "Local-CP" : {
        "Type" : "ConnectionProfile:FileTransfer:Local",
        "User" : "<USER>",
        "Password" : "<PASSWORD>"
    },
    "DB-CP": {
        "Type": "ConnectionProfile:Database:PostgreSQL",
        "Host": "<DATABASE_SERVER>",
        "Port":"5432",
        "User": "<DATABASE_USER>",
        "Password": "<DATABASE_PASSWORD>",
        "DatabaseName": "postgres"
    },
    "AutomationAPIFileTransferDatabaseSampleFlow": {
        "Type": "Folder",
        "Comment" : "Code reviewed by John",
        "GetData": {
            "Type" : "Job:FileTransfer",
            "ConnectionProfileSrc" : "SFTP-CP",
            "ConnectionProfileDest" : "Local-CP",
                                
            "FileTransfers" :
            [
                {
                    "Src" : "%%SrcDataFile",
                    "Dest": "%%DestDataFile",
                    "TransferOption": "SrcToDest",
                    "TransferType": "Binary",
                    "PreCommandDest": {
                        "action": "rm",
                        "arg1": "%%DestDataFile"
                    },
                    "PostCommandDest": {
                        "action": "chmod",
                        "arg1": "700",
                        "arg2": "%%DestDataFile"
                    }
                }
            ]
        },
        "UpdateRecords": {
            "Type": "Job:Database:SQLScript",
            "SQLScript": "/home/<USER>/automation-api-quickstart/101-running-file-transfer-and-database-query-job-flow/processRecords.sql",
            "ConnectionProfile": "DB-CP"
        },
        "Flow": {
            "Type": "Flow",
            "Sequence": ["GetData", "UpdateRecords"]
        }
    }
}

The first object is called "Defaults". It allows you to define a parameter once for all objects. For example, it includes scheduling using the When parameter, which configures all jobs to run according to the same scheduling criteria. The Defaults object also includes Variables that are referenced several times in the jobs.
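
As a rough illustration of how the Variables relate to the jobs, the following Python sketch collects the %%Name references in a shortened copy of the sample and compares them with the names defined under "Defaults". The helper names are invented for this example. The check is an assumption-laden sketch: it looks only at variables defined under "Defaults", so any variable that comes from elsewhere would be reported as undefined.

```python
import json
import re

# Shortened copy of AutomationAPIFileTransferDatabaseSampleFlow.json.
DEFINITIONS = """
{
    "Defaults": {
        "Variables": [
            {"DestDataFile": "<DESTINATION_FILE>"},
            {"SrcDataFile": "<SOURCE_FILE>"}
        ]
    },
    "AutomationAPIFileTransferDatabaseSampleFlow": {
        "Type": "Folder",
        "GetData": {
            "Type": "Job:FileTransfer",
            "FileTransfers": [
                {
                    "Src": "%%SrcDataFile",
                    "Dest": "%%DestDataFile",
                    "PreCommandDest": {"action": "rm", "arg1": "%%DestDataFile"}
                }
            ]
        }
    }
}
"""

VARIABLE_REFERENCE = re.compile(r"%%(\w+)")


def defined_variables(definitions):
    """Collect the variable names listed under Defaults -> Variables."""
    names = set()
    for entry in definitions.get("Defaults", {}).get("Variables", []):
        names.update(entry)  # each entry is a one-key object: {name: value}
    return names


def referenced_variables(node):
    """Recursively collect %%Name references from every string value."""
    if isinstance(node, str):
        return set(VARIABLE_REFERENCE.findall(node))
    if isinstance(node, dict):
        node = list(node.values())
    if isinstance(node, list):
        found = set()
        for item in node:
            found |= referenced_variables(item)
        return found
    return set()


definitions = json.loads(DEFINITIONS)
defined = defined_variables(definitions)
referenced = referenced_variables(definitions)
print(sorted(referenced - defined))  # names used in jobs but not defined in Defaults
```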

The sample contains two jobs: GetData and UpdateRecords. GetData transfers files from the SFTP server to the host machine. UpdateRecords performs a SQL query on the database. Both jobs are contained within a folder named AutomationAPIFileTransferDatabaseSampleFlow. To define the sequence of job execution, the Flow object is used.

The sample also includes the following three connection profiles:

  • SFTP-CP defines access and security credentials for the SFTP server.
  • DB-CP defines access and security credentials for the database.
  • Local-CP defines access and security credentials for files that are transferred to the local machine.
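
To see how the jobs and connection profiles fit together, the following Python sketch lists the connection profiles in a shortened copy of the sample and shows which profiles each job references. The credentials are left out. The helper names connection_profiles and profile_usage are made up for this illustration.

```python
import json

# Shortened copy of the sample: profile types and the job properties that name them.
DEFINITIONS = """
{
    "SFTP-CP": {"Type": "ConnectionProfile:FileTransfer:SFTP"},
    "Local-CP": {"Type": "ConnectionProfile:FileTransfer:Local"},
    "DB-CP": {"Type": "ConnectionProfile:Database:PostgreSQL"},
    "AutomationAPIFileTransferDatabaseSampleFlow": {
        "Type": "Folder",
        "GetData": {
            "Type": "Job:FileTransfer",
            "ConnectionProfileSrc": "SFTP-CP",
            "ConnectionProfileDest": "Local-CP"
        },
        "UpdateRecords": {
            "Type": "Job:Database:SQLScript",
            "ConnectionProfile": "DB-CP"
        }
    }
}
"""


def connection_profiles(definitions):
    """Map each connection profile name to its type."""
    return {name: obj["Type"] for name, obj in definitions.items()
            if isinstance(obj, dict)
            and obj.get("Type", "").startswith("ConnectionProfile:")}


def profile_usage(definitions):
    """Map each job name to the sorted connection profile names it references."""
    usage = {}
    for obj in definitions.values():
        if not isinstance(obj, dict) or obj.get("Type") != "Folder":
            continue
        for job_name, job in obj.items():
            if isinstance(job, dict):
                names = sorted(value for key, value in job.items()
                               if key.startswith("ConnectionProfile"))
                if names:
                    usage[job_name] = names
    return usage


definitions = json.loads(DEFINITIONS)
print(connection_profiles(definitions))
print(profile_usage(definitions))
```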

Step 6 - Modify the code to run in your environment

In the code sample, perform the following modifications:

  • Replace the value for "TargetAgent" and "Host" with the host name of the machine where we provisioned the Control-M/Agent.

    "TargetAgent" : "<HOST>"
    "Host" : "<HOST>"
  • Replace the value of "SrcDataFile" with the file that is transferred from the SFTP server, and the value of "DestDataFile" with the path of the transferred file on the host machine.

    {"DestDataFile": "<DESTINATION_FILE>"},
    {"SrcDataFile":  "<SOURCE_FILE>"}
  • Modify the path to the samples directory for the jobs to run successfully in your environment. Replace the path /home/<USER>/automation-api-quickstart/101-running-file-transfer-and-database-query-job-flow with the location of the samples that you installed on your machine.

    "SQLScript": "/home/<USER>/automation-api-quickstart/101-running-file-transfer-and-database-query-job-flow/processRecords.sql"
  • Replace the following parameters with the credentials used to log in to the SFTP server.

            "HostName": "<SFTP_SERVER>",
            "User" : "<SFTP_USER>",
            "Password" : "<SFTP_PASSWORD>"
  • Replace the following parameters with the credentials used to access the database server.

            "Host": "<DATABASE_SERVER>",
            "Port":"5432",
            "User": "<DATABASE_USER>",
            "Password": "<DATABASE_PASSWORD>",
  • Replace the following parameters with the credentials used to read and write files on the host machine.

       "Local-CP" : {
            "Type" : "ConnectionProfile:FileTransfer:Local",
            "User" : "<USER>",
            "Password" : ""
        }
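
After editing, it can help to confirm that no <PLACEHOLDER> values were missed. The following Python sketch scans a definitions object for strings that still contain text of the form <NAME>. The partly edited sample and the host names agent01 and db01 are hypothetical, and find_placeholders is a helper invented for this example.

```python
import json
import re

# Matches placeholders such as <HOST> or <DATABASE_USER>.
PLACEHOLDER = re.compile(r"<[A-Z_]+>")


def find_placeholders(node, path=""):
    """Return (path, placeholder) pairs for every string that still holds <NAME>."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            found += find_placeholders(value, path + "/" + key)
    elif isinstance(node, list):
        for index, value in enumerate(node):
            found += find_placeholders(value, "{}[{}]".format(path, index))
    elif isinstance(node, str):
        found += [(path, match) for match in PLACEHOLDER.findall(node)]
    return found


# Hypothetical, partly edited definitions: two placeholders are still present.
PARTLY_EDITED = """
{
    "Defaults": {"Host": "agent01", "TargetAgent": "<HOST>"},
    "DB-CP": {
        "Type": "ConnectionProfile:Database:PostgreSQL",
        "Host": "db01",
        "User": "<DATABASE_USER>"
    }
}
"""

remaining = find_placeholders(json.loads(PARTLY_EDITED))
for path, placeholder in remaining:
    print(path, placeholder)
```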

Step 7 - Run the code sample

Now that we've modified the source code in the previous step, let's run the sample:

> ctm run AutomationAPIFileTransferDatabaseSampleFlow.json

{
  "runId": "ce62ace0-4a6e-4b17-afdd-35335cbf179e",
  "statusURI": "https://localhost:8443/automation-api/run/status/ce62ace0-4a6e-4b17-afdd-35335cbf179e?token=737a87efc43805ecf30263fb2863bea5_2E8C3C6C",
  "monitorPageURI": "https://localhost:8443/SelfService#Workbench:runid=ce62ace0-4a6e-4b17-afdd-35335cbf179e&title= AutomationAPIFileTransferDatabaseSampleFlow.json"
}

Each time you run the code, a new runId is generated. Let's take the runId and check the job statuses:

> ctm run status "ce62ace0-4a6e-4b17-afdd-35335cbf179e"

{
  "statuses": [
    {
      "jobId": "workbench:000c1",
      "folderId": "workbench:00000",
      "numberOfRuns": 1,
      "name": "AutomationAPIFileTransferDatabaseSampleFlow",
      "type": "Folder",
      "status": "Ended OK",
      "startTime": "May 23, 2017 4:25:10 PM",
      "endTime": "May 23, 2017 4:25:26 PM",
      "outputURI": "Folder has no output",
      "logURI": "https://localhost:8443/automation-api/run/job/workbench:000c1/log?token=aacbcfb1d694a81d63405646b5790532_DFB83CA2"
    },
    {
      "jobId": "workbench:000c2",
      "folderId": "workbench:000c1",
      "numberOfRuns": 1,
      "name": "GetData",
      "folder": "AutomationAPIFileTransferDatabaseSampleFlow",
      "type": "Job",
      "status": "Ended OK",
      "startTime": "May 23, 2017 4:25:10 PM",
      "endTime": "May 23, 2017 4:25:17 PM",
      "outputURI": "https://localhost:8443/automation-api/run/job/workbench:000c2/output?token=aacbcfb1d694a81d63405646b5790532_DFB83CA2",
      "logURI": "https://localhost:8443/automation-api/run/job/workbench:000c2/log?token=aacbcfb1d694a81d63405646b5790532_DFB83CA2"
    },
    {
      "jobId": "workbench:000c3",
      "folderId": "workbench:000c1",
      "numberOfRuns": 1,
      "name": "UpdateRecords",
      "folder": "AutomationAPIFileTransferDatabaseSampleFlow",
      "type": "Job",
      "status": "Ended OK",
      "startTime": "May 23, 2017 4:25:18 PM",
      "endTime": "May 23, 2017 4:25:25 PM",
      "outputURI": "https://localhost:8443/automation-api/run/job/workbench:000c3/output?token=aacbcfb1d694a81d63405646b5790532_DFB83CA2",
      "logURI": "https://localhost:8443/automation-api/run/job/workbench:000c3/log?token=aacbcfb1d694a81d63405646b5790532_DFB83CA2"
    }
  ],
  "startIndex": 0,
  "itemsPerPage": 25,
  "total": 3,
  "monitorPageURI": "https://localhost:8443/SelfService#Workbench:runid=ce62ace0-4a6e-4b17-afdd-35335cbf179e&title=Status_ce62ace0-4a6e-4b17-afdd-35335cbf179e"
}

You can now see that both jobs Ended OK.

Let's view the output of GetData. Use the jobId to get this information.

> ctm run job:output::get "workbench:000c2"

+ Job started at '0523 16:25:15:884' orderno - '000c2' runno - '00001' Number of transfers - 1
+ Host1 XXXXX' username XXXX - Host2 'localhost' username XXXX
Local host is XXX
Connection to SFTP server on host XXX was established
Connection to Local server on host localhost was established
+********** Starting transfer #1 out of 1**********
* Executing pre-commands on host localhost
rm c:\temp\XXXX
File 'c:\temp\XXX removed successfully
Transfer type: BINARY
Open data connection to retrieve file /home/user/XXX
Open data connection to store file c:\temp\XXX
Transfer #1 transferring
Src file: '/ home/user/XXX ' on host 'XXXX'
Dst file: 'c:\temp\XXX on host 'localhost'
Transferred:          628       Elapsed:    0 sec       Percent: 100    Status: In Progress
File transfer status: Ended OK
Destination file size vs. source file size validation passed
* Executing post-commands on host localhost
chmod 700 c:\temp\XXX
Transfer #1 completed successfully
Job executed successfully. exiting.
Job ended at '0523 16:25:16:837'
Elapsed time [0 sec]

Let's view the output of UpdateRecords. Use the jobId to get this information.

> ctm run job:output::get "workbench:000c3"

Environment information:
+--------------------+--------------------------------------------------+
|Account Name        |DB-CP                                             |
+--------------------+--------------------------------------------------+
|Database Vendor     |PostgreSQL                                        |
+--------------------+--------------------------------------------------+
|Database Version    |9.2.8                                             |
+--------------------+--------------------------------------------------+

Request statement:
------------------
select 'Parameter';

Job statistics:
+-------------------------+-------------------------+
|Start Time               |20170523163619           |
+-------------------------+-------------------------+
|End Time                 |20170523163619           |
+-------------------------+-------------------------+
|Elapsed Time             |13                       |
+-------------------------+-------------------------+
|Number Of Affected Rows  |1                        |
+-------------------------+-------------------------+
Exit Code    = 0
Exit Message = Normal completion

Step 8 - View job details through an interactive interface

Control-M Workbench offers an interactive user interface for debugging purposes. Through this interface, you can view various job run details (including, for example, an activity log and statistics for each job). To launch this interface when you run jobs, enter "--interactive" or "-i" at the end of the run command.

> ctm run AutomationAPIFileTransferDatabaseSampleFlow.json --interactive

{
  "runId": "ce62ace0-4a6e-4b17-afdd-35335cbf179e",
  "statusURI": "https://localhost:8443/automation-api/run/status/ce62ace0-4a6e-4b17-afdd-35335cbf179e?token=737a87efc43805ecf30263fb2863bea5_2E8C3C6C",
  "monitorPageURI": "https://localhost:8443/SelfService#Workbench:runid=ce62ace0-4a6e-4b17-afdd-35335cbf179e&title= AutomationAPIFileTransferDatabaseSampleFlow.json"
}

A browser window opens, where you can view and manage your jobs.

Running a Hadoop-Spark job flow

This example walks you through writing Hadoop and Spark jobs that run in sequence. To complete this tutorial, you need a Hadoop edge node where the Hadoop client software is installed.

Let's verify that Hadoop and HDFS are operational using the following commands:

> hadoop version

Hadoop 2.6.0-cdh5.4.2
Subversion http://github.com/cloudera/hadoop -r 15b703c8725733b7b2813d2325659eb7d57e7a3f
Compiled by jenkins on 2015-05-20T00:03Z
Compiled with protoc 2.5.0
From source with checksum de74f1adb3744f8ee85d9a5b98f90d
This command was run using /usr/jars/hadoop-common-2.6.0-cdh5.4.2.jar
 
> hadoop fs -ls /

Found 5 items
drwxr-xr-x   - hbase supergroup          0 2015-12-13 02:32 /hbase
drwxr-xr-x   - solr  solr                0 2015-06-09 03:38 /solr
drwxrwxrwx   - hdfs  supergroup          0 2016-03-20 07:11 /tmp
drwxr-xr-x   - hdfs  supergroup          0 2016-03-29 06:51 /user
drwxr-xr-x   - hdfs  supergroup          0 2015-06-09 03:36 /var

Step 1 - Find the image to provision

The provision images command lists the images available to install.

> ctm provision images Linux

[
  "Agent.Linux",
  "ApplicationsAgent.Linux",
  "BigDataAgent.Linux"
]

As you can see, there are three available Linux images:

  • Agent.Linux - provides the ability to run scripts, programs, and commands. 
  • ApplicationsAgent.Linux - in addition to Agent.Linux, adds plugins to run file transfer jobs and database SQL scripts.
  • BigDataAgent.Linux - in addition to Agent.Linux, adds a plugin to run Hadoop and Spark jobs.

In this example, we will provision the BigDataAgent.Linux image.

Step 2 - Provision the BigDataAgent image

Run the following command on a Linux system:

ctm provision install BigDataAgent.Linux

After provisioning the BigDataAgent successfully, you now have a running instance of Control-M/Agent on your Hadoop edge node.

Now let's access the tutorial sample code.

Step 3 - Access the tutorial samples

Go to the directory where the tutorial sample is located:

cd automation-api-quickstart/101-running-hadoop-spark-job-flow

Step 4 - Verify the code for Control-M

Let's take the AutomationAPISampleHadoopFlow.json file, which contains job definitions, and verify that the code within it is valid. To do so, use the build command. The following example shows the command and a typical successful response.

> ctm build AutomationAPISampleHadoopFlow.json

[
  {
    "deploymentFile": "AutomationAPISampleHadoopFlow.json",
    "successfulFoldersCount": 0,
    "successfulSmartFoldersCount": 1,
    "successfulSubFoldersCount": 0,
    "successfulJobsCount": 2,
    "successfulConnectionProfilesCount": 0
  }
]

If the code is not valid, an error is returned.

Step 5 - Examine the source code

Let's look at the source code in the AutomationAPISampleHadoopFlow.json file. By examining the contents of this file, you'll learn about the structure of the job flow and what it should contain.

{
    "Defaults" : {
        "Application": "SampleApp",
        "SubApplication": "SampleSubApp",
        "Host" : "<HOST>",
        "When" : {
            "FromTime":"0300",
            "ToTime":"2100"
        },
        "Job:Hadoop" : {
            "ConnectionProfile": "SampleConnectionProfile"
        }
    },
    "SampleConnectionProfile" :
    {
        "Type" : "ConnectionProfile:Hadoop",
        "TargetAgent" : "<HOST>"
    },
    "AutomationAPIHadoopSampleFlow": {
        "Type": "Folder",
        "Comment" : "Code reviewed by John",
        "ProcessData": {
            "Type": "Job:Hadoop:Spark:Python",
            "SparkScript": "file:///home/<USER>/automation-api-quickstart/101-running-hadoop-spark-job-flow/processData.py",
            
            "Arguments": [
                "file:///home/<USER>/automation-api-quickstart/101-running-hadoop-spark-job-flow/processData.py",
                "file:///home/<USER>/automation-api-quickstart/101-running-hadoop-spark-job-flow/processDataOutDir"
            ],
            "PreCommands" : {
                "Commands" : [
                    { "rm":"-R -f file:///home/<USER>/automation-api-quickstart/101-running-hadoop-spark-job-flow/processDataOutDir" }
                ]                   
            }
        },
        "CopyOutputData" :
        {
            "Type" : "Job:Hadoop:HDFSCommands",
            "Commands" : [
                {"rm"    : "-R -f samplesOut" },
                {"mkdir" : "samplesOut" },
                {"cp"   : "file:///home/<USER>/automation-api-quickstart/101-running-hadoop-spark-job-flow/* samplesOut" }
            ]
        },
        "DataProcessingFlow": {
            "Type": "Flow",
            "Sequence": ["ProcessData","CopyOutputData"]
        }
    }
}

This example contains two jobs: a Spark job named ProcessData and an HDFS Commands job named CopyOutputData. These jobs are contained within a folder named AutomationAPIHadoopSampleFlow. To define the sequence of job execution, the Flow object is used.

Note that in the Spark job we use the "PreCommands" object to clean up output from any previous Spark job runs.  

The "SampleConnectionProfile" object is used to define the connection parameters to the Hadoop cluster. Note that for Sqoop and Hive, it is used to set data sources and credentials.

Here is the code of processData.py:

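from __future__ import print_function

import sys
from pyspark import SparkContext

# Input file and output directory, passed as command-line arguments.
inputFile = sys.argv[1]
outputDir = sys.argv[2]

sc = SparkContext(appName="processDataSampel")
# Read the input file as an RDD of lines.
text_file = sc.textFile(inputFile)
# Split each line into words, pair each word with 1, and sum the pairs per word.
counts = text_file.flatMap(lambda line: line.split(" ")) \
             .map(lambda word: (word, 1)) \
             .reduceByKey(lambda a, b: a + b)

# Write the (word, count) pairs to the output directory.
counts.saveAsTextFile(outputDir)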
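
If you don't have Spark at hand, the following plain Python sketch reproduces the same word-count logic on a small in-memory input, so you can see what the flatMap, map, and reduceByKey steps compute. It is an illustration only and not part of the tutorial samples. The function name word_counts and the sample lines are invented.

```python
from collections import Counter


def word_counts(lines):
    """Count words as the Spark script does: split each line on single spaces, then sum."""
    counts = Counter()
    for line in lines:
        # Counter.update adds 1 per word, which covers the
        # flatMap -> map(word, 1) -> reduceByKey(a + b) steps in one call.
        counts.update(line.split(" "))
    return dict(counts)


sample = ["to be or", "not to be"]
print(word_counts(sample))
```

As in the Spark script, splitting on a single space means that consecutive spaces produce empty-string "words" that are counted too.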

Step 6 - Modify the code to run in your environment

You need to modify the path to the samples directory for the jobs to run successfully in your environment. Replace the URI file:///home/<USER>/automation-api-quickstart/101-running-hadoop-spark-job-flow/ with the location of the samples that you installed on your machine.

"SparkScript": "file:///home/<USER>/automation-api-quickstart/101-running-hadoop-spark-job-flow/processData.py",
"Arguments": [
    "file:///home/<USER>/automation-api-quickstart/101-running-hadoop-spark-job-flow/processData.py",
    "file:///home/<USER>/automation-api-quickstart/101-running-hadoop-spark-job-flow/processDataOutDir"
], 
{ "rm":"-R -f file:///home/<USER>/automation-api-quickstart/101-running-hadoop-spark-job-flow/processDataOutDir" }
{"cp" : "file:///home/<USER>/automation-api-quickstart/101-running-hadoop-spark-job-flow/* samplesOut" }

For example: file:///home/user1/automation-api-quickstart/101-running-hadoop-spark-job-flow/
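If you prefer to script this substitution instead of editing the file by hand, a small helper along the following lines could do it. This is a minimal sketch: the helper name fill_placeholders is hypothetical, and the value user1 is taken from the example above. The same helper can be applied to any other placeholder that appears in the file.

```python
def fill_placeholders(text, values):
    """Replace each placeholder key found in `text` with its value."""
    for placeholder, value in values.items():
        text = text.replace(placeholder, value)
    return text


# One line from the sample definition, with the <USER> placeholder still in place
line = '"SparkScript": "file:///home/<USER>/automation-api-quickstart/101-running-hadoop-spark-job-flow/processData.py",'
updated = fill_placeholders(line, {"<USER>": "user1"})
print(updated)
```

To update the whole file, read its contents into a string, pass the string through the helper, and write the result back.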

In addition, replace the value for "TargetAgent" and "Host" with the host name of the machine where we provisioned the Control-M/Agent.

"TargetAgent" : "<HOST>"
"Host" : "<HOST>"

Step 7 - Run the sample

Now that we've modified the source code in the AutomationAPISampleHadoopFlow.json file, let's run the sample:

> ctm run AutomationAPISampleHadoopFlow.json

{
  "runId": "6aef1ce1-3c57-4866-bf45-3a6afc33e27c",
  "statusURI": "https://10.64.107.21:8443/automation-api/run/status/6aef1ce1-3c57-4866-bf45-3a6afc33e27c?token=224926c45a2815504ff12cb119ed4356_93C1D4CC",
  "monitorPageURI": "https://10.64.107.21:8443/SelfService#Workbench:runid=6aef1ce1-3c57-4866-bf45-3a6afc33e27c&title=AutomationAPISampleHadoopFlow.json"
}
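The response is a JSON object, so the runId can be picked out programmatically when the run is part of a script. The sketch below assumes that the output of the run command has already been captured as text and contains only the JSON object; the function name extract_run_id is hypothetical.

```python
import json


def extract_run_id(response_text):
    """Return the runId field from the JSON response of the run command."""
    return json.loads(response_text)["runId"]


# The response shown above, captured as text
sample_response = """
{
  "runId": "6aef1ce1-3c57-4866-bf45-3a6afc33e27c",
  "statusURI": "https://10.64.107.21:8443/automation-api/run/status/6aef1ce1-3c57-4866-bf45-3a6afc33e27c?token=224926c45a2815504ff12cb119ed4356_93C1D4CC",
  "monitorPageURI": "https://10.64.107.21:8443/SelfService#Workbench:runid=6aef1ce1-3c57-4866-bf45-3a6afc33e27c&title=AutomationAPISampleHadoopFlow.json"
}
"""
run_id = extract_run_id(sample_response)
print(run_id)
```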

Each time the code runs, a new runId is generated. Let's take the runId, and check the job statuses:

> ctm run status "6aef1ce1-3c57-4866-bf45-3a6afc33e27c"

{
  "statuses": [
    {
      "jobId": "workbench:000ca",
      "folderId": "workbench:00000",
      "numberOfRuns": 1,
      "name": "AutomationAPIHadoopSampleFlow",
      "type": "Folder",
      "status": "Ended OK",
      "startTime": "May 24, 2017 1:03:18 PM",
      "endTime": "May 24, 2017 1:03:45 PM",
      "outputURI": "Folder has no output",
      "logURI": "https://10.64.107.21:8443/automation-api/run/job/workbench:000ca/log?token=9d333198364e10b3b2090290c797e1f4_E9F8C76B"
    },
    {
      "jobId": "workbench:000cb",
      "folderId": "workbench:000ca",
      "numberOfRuns": 1,
      "name": "ProcessData",
      "folder": "AutomationAPIHadoopSampleFlow",
      "type": "Job",
      "status": "Ended OK",
      "startTime": "May 24, 2017 1:03:18 PM",
      "endTime": "May 24, 2017 1:03:32 PM",
      "outputURI": "https://10.64.107.21:8443/automation-api/run/job/workbench:000cb/output?token=9d333198364e10b3b2090290c797e1f4_E9F8C76B",
      "logURI": "https://10.64.107.21:8443/automation-api/run/job/workbench:000cb/log?token=9d333198364e10b3b2090290c797e1f4_E9F8C76B"
    },
    {
      "jobId": "workbench:000cc",
      "folderId": "workbench:000ca",
      "numberOfRuns": 1,
      "name": "CopyOutputData",
      "folder": "AutomationAPIHadoopSampleFlow",
      "type": "Job",
      "status": "Ended OK",
      "startTime": "May 24, 2017 1:03:33 PM",
      "endTime": "May 24, 2017 1:03:44 PM",
      "outputURI": "https://10.64.107.21:8443/automation-api/run/job/workbench:000cc/output?token=9d333198364e10b3b2090290c797e1f4_E9F8C76B",
      "logURI": "https://10.64.107.21:8443/automation-api/run/job/workbench:000cc/log?token=9d333198364e10b3b2090290c797e1f4_E9F8C76B"
    }
  ],
  "startIndex": 0,
  "itemsPerPage": 25,
  "total": 3,
  "monitorPageURI": "https://10.64.107.21:8443/SelfService#Workbench:runid=6aef1ce1-3c57-4866-bf45-3a6afc33e27c&title=Status_6aef1ce1-3c57-4866-bf45-3a6afc33e27c"
} 

You can see that the status of both jobs is "Ended OK".
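The same check can be automated by parsing the status response. The following is a minimal sketch, assuming the response has been captured as text; the function name not_ended_ok is hypothetical, the sample is trimmed to the name, type, and status fields, and the status "Ended Not OK" in the second call is used only as an illustrative failing value.

```python
import json


def not_ended_ok(status_text):
    """Return the names of all entries whose status is not "Ended OK"."""
    statuses = json.loads(status_text)["statuses"]
    return [entry["name"] for entry in statuses if entry["status"] != "Ended OK"]


# Trimmed version of the status response shown above
sample_status = """
{
  "statuses": [
    {"name": "AutomationAPIHadoopSampleFlow", "type": "Folder", "status": "Ended OK"},
    {"name": "ProcessData", "type": "Job", "status": "Ended OK"},
    {"name": "CopyOutputData", "type": "Job", "status": "Ended OK"}
  ],
  "total": 3
}
"""
pending = not_ended_ok(sample_status)
print(pending)

# Illustrative failing case (the status value here is an assumption)
failing = not_ended_ok('{"statuses": [{"name": "ProcessData", "status": "Ended Not OK"}]}')
print(failing)
```

An empty list means that the folder and every job in it reported "Ended OK".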

Let's view the output of CopyOutputData. Use the jobId to get this information.

> ctm run job:output::get workbench:000cc

Environment information:
+--------------------+--------------------------------------------------+
|Account Name        |SampleConnectionProfile                           |
+--------------------+--------------------------------------------------+

Job is running as user: cloudera
-----------------------
Running the following HDFS command:
-----------------------------------
hadoop fs -rm -R -f samplesOut

HDFS command output:
-------------------
Deleted samplesOut
script return value 0
-----------------------------------------------------------
-----------------------------------------------------------

Job is running as user: cloudera
-----------------------
Running the following HDFS command:
-----------------------------------
hadoop fs -mkdir samplesOut

HDFS command output:
-------------------
script return value 0
-----------------------------------------------------------
-----------------------------------------------------------

Job is running as user: cloudera
-----------------------
Running the following HDFS command:
-----------------------------------
hadoop fs -cp file:///home/cloudera/automation-api-quickstart/101-running-hadoop-spark-job-flow/* samplesOut

HDFS command output:
-------------------
script return value 0
-----------------------------------------------------------
-----------------------------------------------------------

Application reports:
--------------------
-> no hadoop application reports were created for the job execution.

Job statistics:
--------------
+-------------------------+-------------------------+
|Start Time               |20170524030335           |
+-------------------------+-------------------------+
|End Time                 |20170524030346           |
+-------------------------+-------------------------+
|Elapsed Time             |1065                     |
+-------------------------+-------------------------+
Exit Message = Normal completion 
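The Start Time and End Time values in the job statistics appear to use a YYYYMMDDHHMMSS layout (this is an assumption based on the values shown, not a documented format). Under that assumption, the wall-clock duration can be computed as sketched below; the function name parse_stat_time is hypothetical. The units of the reported Elapsed Time value are not stated in the output, so the sketch does not interpret it.

```python
from datetime import datetime


def parse_stat_time(value):
    """Parse a statistics timestamp, assuming the YYYYMMDDHHMMSS layout."""
    return datetime.strptime(value, "%Y%m%d%H%M%S")


# Values copied from the job statistics above
start = parse_stat_time("20170524030335")
end = parse_stat_time("20170524030346")
duration = (end - start).total_seconds()
print(duration)  # 11.0
```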

Step 8 - View job details through an interactive interface

Control-M Workbench offers an interactive user interface for debugging purposes. Through this interface, you can view various job run details (including, for example, an activity log and statistics for each job). To launch this interface when you run jobs, enter "--interactive" or "-i" at the end of the run command.

> ctm run AutomationAPISampleHadoopFlow.json --interactive

{
  "runId": "40586805-60b5-4acb-9f21-a0cf048f1051",
  "statusURI": "https://ec2-54-187-1-168.us-west-2.compute.amazonaws.com:8443/run/status/40586805-60b5-4acb-9f21-a0cf048f1051",
  "monitorPageURI": "https://localhost:8443/SelfService#Workbench:runid=40586805-60b5-4acb-9f21-a0cf048f1051&title=AutomationAPISampleHadoopFlow.json"
}

A browser window opens, where you can view and manage your jobs.

Where to go from here
