Tutorial - Running applications and programs in your environment

Jobs run on Control-M/Agents, so you need an Agent on your application host. The Provision service enables you to install and set up a Control-M/Agent.

Select the relevant tutorial: running a script and command job flow, running a file transfer and database queries job flow, or running a Hadoop-Spark job flow.

Before you begin

Ensure that you have set up your environment, as described in Setting up the prerequisites.

Running a script and command job flow

This example walks you through running a script and a command in sequence. You need a Windows 64-bit or Linux 64-bit machine with access to the scripts and programs that you want to run.

Step 1 - Find the image to provision

The provision images command lists the images available to install.

> ctm provision images Linux

[
  "Agent.Linux",
  "Agent_18.Linux",
  "ApplicationsAgent.Linux",
  "BigDataAgent.Linux",
  "Server.Linux"
]

OR

> ctm provision images Windows

[
  "Agent.Windows",
  "Agent_18.Windows",
  "ApplicationsAgent.Windows",
  "Server.Windows"
]

As you can see, there are several available images:

  • Agent.Linux/Agent.Windows and Agent_18.Linux/Agent_18.Windows - provide the ability to run scripts and commands, with a Control-M/Agent of version 9.0 or 9.0.18, respectively.
  • ApplicationsAgent.Linux/ApplicationsAgent.Windows - in addition to Agent.Linux/Agent.Windows, adds plugins to run file transfer jobs and database SQL scripts.
  • BigDataAgent.Linux - in addition to Agent.Linux, adds a plugin to run Hadoop and Spark jobs.
  • Server.Linux/Server.Windows - provisions a Control-M/Server.

In this example, you will provision Agent.Windows or Agent.Linux according to the jobs that you would like to run.

Step 2 - Provision the Agent image

On a Windows system, run the following command as Administrator:

ctm provision install Agent.Windows

OR

On Linux, run the following command:

ctm provision install Agent.Linux

After provisioning the Agent successfully, you now have a running instance of Control-M/Agent on your host.

Step 3 - Access the tutorial samples

Go to the directory where the tutorial sample is located:

cd automation-api-quickstart/control-m/101-running-script-command-job-flow

Step 4 - Verify the code for Control-M

Let's take the AutomationAPISampleFlow.json file, which contains job definitions, and verify that the code within it is valid. To do so, use the build command. The following example shows the command and a typical successful response.

> ctm build AutomationAPISampleFlow.json

[
  {
    "deploymentFile": "AutomationAPISampleFlow.json",
    "successfulFoldersCount": 0,
    "successfulSmartFoldersCount": 1,
    "successfulSubFoldersCount": 0,
    "successfulJobsCount": 2,
    "successfulConnectionProfilesCount": 0,
    "isDeployDescriptorValid": false
  }
]

If the code is not valid, an error is returned.

Step 5 - Run the source code

Use the run command to run the jobs in your Control-M environment. The returned runId is used to check the job status. The following shows the command and a typical successful response.

> ctm run AutomationAPISampleFlow.json

{
  "runId": "7cba67de-9e0d-409d-8d93-1b8229432eee",
  "statusURI": "https://localhost:8443/automation-api/run/status/7cba67de-9e0d-409d-8d93-1b8229432eee?token=4f8684ec6754e08cc70f95b5f09d3a47_A1FD0E65",
  "monitorPageURI": "https://localhost:8443/SelfService#Workbench:runid=7cba67de-9e0d-409d-8d93-1b8229432eee&title=AutomationAPISampleFlow.json"
}

This code ran successfully and returned the runId of "7cba67de-9e0d-409d-8d93-1b8229432eee".

Step 6 - Check job status using the runId

The following command shows how to check job status using the runId. Note that when there is more than one job in the flow, the status of each job is checked and returned.

> ctm run status "7cba67de-9e0d-409d-8d93-1b8229432eee"

{
  "statuses": [
    {
      "jobId": "workbench:00007",
      "folderId": "workbench:00000",
      "numberOfRuns": 1,
      "name": "AutomationAPISampleFlow",
      "type": "Folder",
      "status": "Executing",
      "startTime": "Apr 26, 2017 10:43:47 AM",
      "endTime": "",
      "outputURI": "Folder has no output",
      "logURI": "https://localhost:8443/automation-api/run/job/workbench:00007/log?token=01ab65917bc71dbef610806dd9cb3f94_0007C46B"
    },
    {
      "jobId": "workbench:00008",
      "folderId": "workbench:00007",
      "numberOfRuns": 0,
      "name": "CommandJob",
      "folder": "AutomationAPISampleFlow",
      "type": "Command",
      "status": "Wait Host",
      "startTime": "",
      "endTime": "",
      "outputURI": "Job did not run, it has no output",
      "logURI": "https://localhost:8443/automation-api/run/job/workbench:00008/log?token=01ab65917bc71dbef610806dd9cb3f94_0007C46B"
    },
    {
      "jobId": "workbench:00009",
      "folderId": "workbench:00007",
      "numberOfRuns": 0,
      "name": "ScriptJob",
      "folder": "AutomationAPISampleFlow",
      "type": "Job",
      "status": "Wait Condition",
      "startTime": "",
      "endTime": "",
      "outputURI": "Job did not run, it has no output",
      "logURI": "https://localhost:8443/automation-api/run/job/workbench:00009/log?token=01ab65917bc71dbef610806dd9cb3f94_0007C46B"
    }
  ],
  "startIndex": 0,
  "itemsPerPage": 25,
  "total": 3,
  "monitorPageURI": "https://localhost:8443/SelfService#Workbench:runid=7cba67de-9e0d-409d-8d93-1b8229432eee&title=Status_7cba67de-9e0d-409d-8d93-1b8229432eee"

Step 7 - Examine the source code

Let's look at the source code in the AutomationAPISampleFlow.json file. By examining the contents of this file, you'll learn about the structure of the job flow and what it should contain.

{
    "Defaults" : {
        "Application" : "SampleApp",
        "SubApplication" : "SampleSubApp",
        "RunAs" : "USERNAME",
        "Host" : "HOST",
        "Job": {
            "When" : {
                "Months": ["JAN", "OCT", "DEC"],
                "MonthDays":["22","1","11"],
                "WeekDays":["MON","TUE", "WED", "THU", "FRI"],
                "FromTime":"0300",
                "ToTime":"2100"
            },
            "ActionIfFailure" : {
                "Type": "If",       
                "CompletionStatus": "NOTOK",
                
                "mailToTeam": {
                    "Type": "Mail",
                    "Message": "%%JOBNAME failed",
                    "To": "team@mycomp.com"
                }
            }
        }
    },
    "AutomationAPISampleFlow": {
        "Type": "Folder",
        "Comment" : "Code reviewed by John",
        "CommandJob": {
            "Type": "Job:Command",
            "Command": "COMMAND"
        },
        "ScriptJob": {
            "Type": "Job:Script",
          	"FilePath":"SCRIPT_PATH",
          	"FileName":"SCRIPT_NAME"
        },
        "Flow": {
            "Type": "Flow",
            "Sequence": ["CommandJob", "ScriptJob"]
        }
    }
}

The first object is called "Defaults". It enables you to define a parameter once and apply it to all objects. For example, it includes scheduling using the When parameter, which configures all jobs to run according to the same scheduling criteria. The "ActionIfFailure" object determines what action is taken if a job ends unsuccessfully.

This example contains two jobs: CommandJob and ScriptJob. These jobs are contained within a folder named AutomationAPISampleFlow. To define the sequence of job execution, the Flow object is used.
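
A value that is defined directly on a job takes precedence over the same value in Defaults, so an individual job can override one setting while still inheriting the rest. The following minimal sketch (hypothetical folder and job names, not part of the sample file) assumes this standard Defaults behavior and gives one job its own When window while keeping the shared RunAs and Host:

{
    "Defaults" : {
        "RunAs" : "USERNAME",
        "Host" : "HOST",
        "Job": {
            "When" : { "FromTime":"0300", "ToTime":"2100" }
        }
    },
    "SampleOverrideFolder": {
        "Type": "Folder",
        "NightlyScriptJob": {
            "Type": "Job:Script",
            "FilePath": "SCRIPT_PATH",
            "FileName": "SCRIPT_NAME",
            "When" : { "FromTime":"2200", "ToTime":"2359" }
        }
    }
}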

Step 8 - Modify the code to run in your environment

In the code above, the following parameters need to be set to run the jobs in your environment: 

"RunAs" : "USERNAME" 
"Host" : "HOST"

"Command": "COMMAND"
"FilePath":"SCRIPT_PATH"
"FileName":"SCRIPT_NAME"

RunAs identifies the operating system user that will execute the jobs.

Host defines the machine where you provisioned the Control-M/Agent. 

Command defines the command to run according to your operating system.

FilePath and FileName define the location and name of the file that contains the script to run.

Note: In JSON, the backslash character must be doubled (\\) when used in a Windows file path.
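
For illustration only, here is one possible set of values for a Linux host, followed by the Windows form of the path fields with doubled backslashes. The host name, user, and script names below are hypothetical placeholders, not values from the sample:

"RunAs" : "user1"
"Host" : "agenthost01"

"Command": "echo hello"
"FilePath": "/home/user1/scripts"
"FileName": "myscript.sh"

On a Windows host, the same path fields might look like this:

"FilePath": "C:\\Users\\user1\\scripts"
"FileName": "myscript.bat"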

Step 9 - Rerun the code sample

Now that we've modified the source code in the AutomationAPISampleFlow.json file, let's rerun the sample:

> ctm run AutomationAPISampleFlow.json

{
  "runId": "ed40f73e-fb7a-4f07-a71c-bc2dfbc48494",
  "statusURI": "https://localhost:8443/automation-api/run/status/ed40f73e-fb7a-4f07-a71c-bc2dfbc48494?token=460e0106b369a0d155bb0e7cbb44f8eb_7E6C03FA",
  "monitorPageURI": "https://localhost:8443/SelfService#Workbench:runid=ed40f73e-fb7a-4f07-a71c-bc2dfbc48494&title=AutomationAPISampleFlow.json"
}

Each time you run the code, a new runId is generated. Let's take the new runId and check the job statuses again:

> ctm run status "ed40f73e-fb7a-4f07-a71c-bc2dfbc48494"

{
  "statuses": [
    {
      "jobId": "workbench:0000p",
      "folderId": "workbench:00000",
      "numberOfRuns": 1,
      "name": "AutomationAPISampleFlow",
      "type": "Folder",
      "status": "Ended OK",
      "startTime": "May 3, 2017 4:57:25 PM",
      "endTime": "May 3, 2017 4:57:28 PM",
      "outputURI": "Folder has no output",
      "logURI": "https://localhost:8443/automation-api/run/job/workbench:0000p/log?token=a8d74f5914dc6decdfd8b2ec833d54cc_3E30FFC9"
    },
    {
      "jobId": "workbench:0000q",
      "folderId": "workbench:0000p",
      "numberOfRuns": 1,
      "name": "CommandJob",
      "folder": "AutomationAPISampleFlow",
      "type": "Command",
      "status": "Ended OK",
      "startTime": "May 3, 2017 4:57:26 PM",
      "endTime": "May 3, 2017 4:57:26 PM",
      "outputURI": "https://localhost:8443/automation-api/run/job/workbench:0000q/output?token=a8d74f5914dc6decdfd8b2ec833d54cc_3E30FFC9",
      "logURI": "https://localhost:8443/automation-api/run/job/workbench:0000q/log?token=a8d74f5914dc6decdfd8b2ec833d54cc_3E30FFC9"
    },
    {
      "jobId": "workbench:0000r",
      "folderId": "workbench:0000p",
      "numberOfRuns": 1,
      "name": "ScriptJob",
      "folder": "AutomationAPISampleFlow",
      "type": "Job",
      "status": "Ended OK",
      "startTime": "May 3, 2017 4:57:27 PM",
      "endTime": "May 3, 2017 4:57:27 PM",
      "outputURI": "https://localhost:8443/automation-api/run/job/workbench:0000r/output?token=a8d74f5914dc6decdfd8b2ec833d54cc_3E30FFC9",
      "logURI": "https://localhost:8443/automation-api/run/job/workbench:0000r/log?token=a8d74f5914dc6decdfd8b2ec833d54cc_3E30FFC9"
    }
  ],
  "startIndex": 0,
  "itemsPerPage": 25,
  "total": 3,
  "monitorPageURI": "https://localhost:8443/SelfService#Workbench:runid=ed40f73e-fb7a-4f07-a71c-bc2dfbc48494&title=Status_ed40f73e-fb7a-4f07-a71c-bc2dfbc48494"
}

You can now see that both jobs Ended OK.

Let's view the output of CommandJob. Use the jobId to get this information.

> ctm run job:output::get "workbench:0000q"

Verify that the output contains your script or command details.

Step 10 - View job details through an interactive interface

Control-M Workbench offers an interactive user interface for debugging purposes. Through this interface, you can view various job run details (including, for example, an activity log and statistics for each job). To launch this interface when you run jobs, enter "--interactive" or "-i" at the end of the run command.

> ctm run AutomationAPISampleFlow.json --interactive

{
  "runId": "40586805-60b5-4acb-9f21-a0cf048f1051",
  "statusURI": "https://ec2-54-187-1-168.us-west-2.compute.amazonaws.com:8443/run/status/40586805-60b5-4acb-9f21-a0cf048f1051",
  "monitorPageURI": "https://localhost:8443/SelfService#Workbench:runid=40586805-60b5-4acb-9f21-a0cf048f1051&title=AutomationAPISampleFlow.json
}

A browser window opens, where you can view and manage your jobs.

Running a file transfer and database queries job flow

This example walks you through running file transfer and database query jobs in sequence. To complete this tutorial, you need a PostgreSQL database (other databases can also be used) and an SFTP server. For this example, install the Agent on a machine that has a network connection to both servers.

Step 1 - Find the image to provision

The provision images command lists the images available to install.

> ctm provision images Linux

[
  "Agent.Linux",
  "Agent_18.Linux",
  "ApplicationsAgent.Linux",
  "BigDataAgent.Linux",
  "Server.Linux"
]

OR

> ctm provision images Windows

[
  "Agent.Windows",
  "Agent_18.Windows",
  "ApplicationsAgent.Windows",
  "Server.Windows"
]

As you can see, there are several available images:

  • Agent.Linux/Agent.Windows and Agent_18.Linux/Agent_18.Windows - provide the ability to run scripts and commands, with a Control-M/Agent of version 9.0 or 9.0.18, respectively.
  • ApplicationsAgent.Linux/ApplicationsAgent.Windows - in addition to Agent.Linux/Agent.Windows, adds plugins to run file transfer jobs and database SQL scripts.
  • BigDataAgent.Linux - in addition to Agent.Linux, adds a plugin to run Hadoop and Spark jobs.
  • Server.Linux/Server.Windows - provisions a Control-M/Server.

In this example, you will provision ApplicationsAgent.Windows or ApplicationsAgent.Linux according to the machine that you use to run the jobs.

Step 2 - Provision the Agent image

On a Windows system, run the following command as Administrator:

ctm provision install ApplicationsAgent.Windows

OR

On Linux, run the following command:

ctm provision install ApplicationsAgent.Linux

After provisioning the Agent successfully, you now have a running instance of Control-M/Agent on your host.

Step 3 - Access the tutorial samples

Go to the directory where the tutorial sample is located:

cd automation-api-quickstart/control-m/101-running-file-transfer-and-database-query-job-flow

Step 4 - Verify the code for Control-M

Let's take the AutomationAPIFileTransferDatabaseSampleFlow.json file, which contains job definitions, and verify that the code within it is valid. To do so, use the build command. The following example shows the command and a typical successful response.

> ctm build AutomationAPIFileTransferDatabaseSampleFlow.json

[
  {
    "deploymentFile": "AutomationAPIFileTransferDatabaseSampleFlow.json",
    "successfulFoldersCount": 0,
    "successfulSmartFoldersCount": 1,
    "successfulSubFoldersCount": 0,
    "successfulJobsCount": 2,
    "successfulConnectionProfilesCount": 3,
    "successfulDriversCount": 0,
    "isDeployDescriptorValid": false
  }
]

If the code is not valid, an error is returned.

Step 5 - Examine the source code

Let's look at the source code in the AutomationAPIFileTransferDatabaseSampleFlow.json file. By examining the contents of this file, you'll learn about the structure of the job flow and what it should contain.

{
    "Defaults" : {
        "Application" : "SampleApp",
        "SubApplication" : "SampleSubApp",
        "Host" : "HOST",
        "TargetAgent" : "HOST",
                                
        "Variables": [
           {"DestDataFile": "DESTINATION_FILE"},
           {"SrcDataFile":  "SOURCE_FILE"}
        ],
                                
        "When" : {
            "FromTime":"0300",
            "ToTime":"2100"
        }
    },
    "SFTP-CP": {
        "Type": "ConnectionProfile:FileTransfer:SFTP",
        "HostName": "SFTP_SERVER",
        "Port": "22",
        "User" : "SFTP_USER",
        "Password" : "SFTP_PASSWORD"
    },
    "Local-CP" : {
        "Type" : "ConnectionProfile:FileTransfer:Local",
        "User" : "USER",
        "Password" : "PASSWORD"
    },
    "DB-CP": {
        "Type": "ConnectionProfile:Database:PostgreSQL",
        "Host": "DATABASE_SERVER",
        "Port":"5432",
        "User": "DATABASE_USER",
        "Password": "DATABASE_PASSWORD",
        "DatabaseName": "postgres"
    },
    "AutomationAPIFileTransferDatabaseSampleFlow": {
        "Type": "Folder",
        "Comment" : "Code reviewed by John",
        "GetData": {
            "Type" : "Job:FileTransfer",
            "ConnectionProfileSrc" : "SFTP-CP",
            "ConnectionProfileDest" : "Local-CP",
                                
            "FileTransfers" :
            [
                {
                    "Src" : "%%SrcDataFile",
                    "Dest": "%%DestDataFile",
                    "TransferOption": "SrcToDest",
                    "TransferType": "Binary",
                    "PreCommandDest": {
                        "action": "rm",
                        "arg1": "%%DestDataFile"
                    },
                    "PostCommandDest": {
                        "action": "chmod",
                        "arg1": "700",
                        "arg2": "%%DestDataFile"
                    }
                }
            ]
        },
        "UpdateRecords": {
            "Type": "Job:Database:SQLScript",
            "SQLScript": "/home/USER/automation-api-quickstart/control-m/101-running-file-transfer-and-database-query-job-flow/processRecords.sql",
            "ConnectionProfile": "DB-CP"
        },
        "Flow": {
            "Type": "Flow",
            "Sequence": ["GetData", "UpdateRecords"]
        }
    }
}

The first object is called "Defaults". It enables you to define a parameter once and apply it to all objects. For example, it includes scheduling using the When parameter, which configures all jobs to run according to the same scheduling criteria. The Defaults object also includes Variables that are referenced several times in the jobs.

The sample contains two jobs: GetData and UpdateRecords. GetData transfers files from the SFTP server to the host machine. UpdateRecords performs a SQL query on the database. Both jobs are contained within a folder named AutomationAPIFileTransferDatabaseSampleFlow. To define the sequence of job execution, the Flow object is used.

The sample also includes the following three connection profiles:

  • SFTP-CP defines access and security credentials for the SFTP server.
  • DB-CP defines access and security credentials for the database.
  • Local-CP defines access and security credentials for files that are transferred to the local machine.

Step 6 - Modify the code to run in your environment

In the code sample, perform the following modifications (an illustrative, filled-in sketch follows this list):

  • Replace the value for "TargetAgent" and "Host" with the host name of the machine where you provisioned the Control-M/Agent.

    "TargetAgent" : "HOST"
    "Host" : "HOST"
  • Replace the value of "SrcDataFile" with the file that is transferred from the SFTP server, and the value of "DestDataFile" with the path of the transferred file on the host machine.

    {"DestDataFile": "DESTINATION_FILE"},
    {"SrcDataFile":  "SOURCE_FILE"}
  • Modify the path to the samples directory for the jobs to run successfully in your environment. Replace the path /home/USER/automation-api-quickstart/control-m/101-running-file-transfer-and-database-query-job-flow with the location of the samples that you installed on your machine.

    "SQLScript": "/home/USER/automation-api-quickstart/control-m/101-running-file-transfer-and-database-query-job-flow/processRecords.sql"
  • Replace the following parameters with the credentials used to log in to the SFTP server.

            "HostName": "SFTP_SERVER",
            "User" : "SFTP_USER",
            "Password" : "SFTP_PASSWORD"
  • Replace the following parameters with the credentials used to access the database server.

            "Host": "DATABASE_SERVER",
            "Port":"5432",
            "User": "DATABASE_USER",
            "Password": "DATABASE_PASSWORD",
  • Replace the following parameters with the credentials used to read and write files on the host machine.

       "Local-CP" : {
            "Type" : "ConnectionProfile:FileTransfer:Local",
            "User" : "USER",
            "Password" : ""
        }
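
As an illustration, the modified sections might look as follows. All host names, paths, users, and passwords here are hypothetical placeholders; passwords appear in plain text only because the sample file defines them that way:

"Host" : "agenthost01",
"TargetAgent" : "agenthost01",

"Variables": [
   {"DestDataFile": "/home/user1/data/records.csv"},
   {"SrcDataFile":  "/data/export/records.csv"}
],

"SFTP-CP": {
    "Type": "ConnectionProfile:FileTransfer:SFTP",
    "HostName": "sftp.example.com",
    "Port": "22",
    "User" : "sftpuser",
    "Password" : "sftppassword"
},
"DB-CP": {
    "Type": "ConnectionProfile:Database:PostgreSQL",
    "Host": "db.example.com",
    "Port":"5432",
    "User": "dbuser",
    "Password": "dbpassword",
    "DatabaseName": "postgres"
}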

Step 7 - Run the code sample

Now that we've modified the source code in the previous step, let's run the sample:

> ctm run AutomationAPIFileTransferDatabaseSampleFlow.json

{
  "runId": "ce62ace0-4a6e-4b17-afdd-35335cbf179e",
  "statusURI": "https://localhost:8443/automation-api/run/status/ce62ace0-4a6e-4b17-afdd-35335cbf179e?token=737a87efc43805ecf30263fb2863bea5_2E8C3C6C",
  "monitorPageURI": "https://localhost:8443/SelfService#Workbench:runid=ce62ace0-4a6e-4b17-afdd-35335cbf179e&title= AutomationAPIFileTransferDatabaseSampleFlow.json"
}

Each time you run the code, a new runId is generated. Let's take the runId and check the job statuses:

> ctm run status "ce62ace0-4a6e-4b17-afdd-35335cbf179e"

{
  "statuses": [
    {
      "jobId": "workbench:000c1",
      "folderId": "workbench:00000",
      "numberOfRuns": 1,
      "name": "AutomationAPIFileTransferDatabaseSampleFlow",
      "type": "Folder",
      "status": "Ended OK",
      "startTime": "May 23, 2017 4:25:10 PM",
      "endTime": "May 23, 2017 4:25:26 PM",
      "outputURI": "Folder has no output",
      "logURI": "https://localhost:8443/automation-api/run/job/workbench:000c1/log?token=aacbcfb1d694a81d63405646b5790532_DFB83CA2"
    },
    {
      "jobId": "workbench:000c2",
      "folderId": "workbench:000c1",
      "numberOfRuns": 1,
      "name": "GetData",
      "folder": "AutomationAPIFileTransferDatabaseSampleFlow",
      "type": "Job",
      "status": "Ended OK",
      "startTime": "May 23, 2017 4:25:10 PM",
      "endTime": "May 23, 2017 4:25:17 PM",
      "outputURI": "https://localhost:8443/automation-api/run/job/workbench:000c2/output?token=aacbcfb1d694a81d63405646b5790532_DFB83CA2",
      "logURI": "https://localhost:8443/automation-api/run/job/workbench:000c2/log?token=aacbcfb1d694a81d63405646b5790532_DFB83CA2"
    },
    {
      "jobId": "workbench:000c3",
      "folderId": "workbench:000c1",
      "numberOfRuns": 1,
      "name": "UpdateRecords",
      "folder": "AutomationAPIFileTransferDatabaseSampleFlow",
      "type": "Job",
      "status": "Ended OK",
      "startTime": "May 23, 2017 4:25:18 PM",
      "endTime": "May 23, 2017 4:25:25 PM",
      "outputURI": "https://localhost:8443/automation-api/run/job/workbench:000c3/output?token=aacbcfb1d694a81d63405646b5790532_DFB83CA2",
      "logURI": "https://localhost:8443/automation-api/run/job/workbench:000c3/log?token=aacbcfb1d694a81d63405646b5790532_DFB83CA2"
    }
  ],
  "startIndex": 0,
  "itemsPerPage": 25,
  "total": 3,
  "monitorPageURI": "https://localhost:8443/SelfService#Workbench:runid=ce62ace0-4a6e-4b17-afdd-35335cbf179e&title=Status_ce62ace0-4a6e-4b17-afdd-35335cbf179e"
}

You can now see that both jobs Ended OK.

Let's view the output of GetData. Use the jobId to get this information.

> ctm run job:output::get "workbench:000c2"

+ Job started at '0523 16:25:15:884' orderno - '000c2' runno - '00001' Number of transfers - 1
+ Host1 XXXXX' username XXXX - Host2 'localhost' username XXXX
Local host is XXX
Connection to SFTP server on host XXX was established
Connection to Local server on host localhost was established
+********** Starting transfer #1 out of 1**********
* Executing pre-commands on host localhost
rm c:\temp\XXXX
File 'c:\temp\XXX removed successfully
Transfer type: BINARY
Open data connection to retrieve file /home/user/XXX
Open data connection to store file c:\temp\XXX
Transfer #1 transferring
Src file: '/ home/user/XXX ' on host 'XXXX'
Dst file: 'c:\temp\XXX on host 'localhost'
Transferred:          628       Elapsed:    0 sec       Percent: 100    Status: In Progress
File transfer status: Ended OK
Destination file size vs. source file size validation passed
* Executing post-commands on host localhost
chmod 700 c:\temp\XXX
Transfer #1 completed successfully
Job executed successfully. exiting.
Job ended at '0523 16:25:16:837'
Elapsed time [0 sec]

Let's view the output of UpdateRecords. Use the jobId to get this information.

> ctm run job:output::get "workbench:000c3"

Environment information:
+--------------------+--------------------------------------------------+
|Account Name        |DB-CP                                             |
+--------------------+--------------------------------------------------+
|Database Vendor     |PostgreSQL                                        |
+--------------------+--------------------------------------------------+
|Database Version    |9.2.8                                             |
+--------------------+--------------------------------------------------+

Request statement:
------------------
select 'Parameter';

Job statistics:
+-------------------------+-------------------------+
|Start Time               |20170523163619           |
+-------------------------+-------------------------+
|End Time                 |20170523163619           |
+-------------------------+-------------------------+
|Elapsed Time             |13                       |
+-------------------------+-------------------------+
|Number Of Affected Rows  |1                        |
+-------------------------+-------------------------+
Exit Code    = 0
Exit Message = Normal completion

Step 8 - View job details through an interactive interface

Control-M Workbench offers an interactive user interface for debugging purposes. Through this interface, you can view various job run details (including, for example, an activity log and statistics for each job). To launch this interface when you run jobs, enter "--interactive" or "-i" at the end of the run command.

> ctm run AutomationAPIFileTransferDatabaseSampleFlow.json --interactive

{
  "runId": "ce62ace0-4a6e-4b17-afdd-35335cbf179e",
  "statusURI": "https://localhost:8443/automation-api/run/status/ce62ace0-4a6e-4b17-afdd-35335cbf179e?token=737a87efc43805ecf30263fb2863bea5_2E8C3C6C",
  "monitorPageURI": "https://localhost:8443/SelfService#Workbench:runid=ce62ace0-4a6e-4b17-afdd-35335cbf179e&title= AutomationAPIFileTransferDatabaseSampleFlow.json"
} 

A browser window opens, where you can view and manage your jobs.

Running a Hadoop-Spark job flow

This example walks you through writing Hadoop and Spark jobs that run in sequence. To complete this tutorial, you need a Hadoop edge node where the Hadoop client software is installed.

Let's verify that Hadoop and HDFS are operational using the following commands:

> hadoop version

Hadoop 2.6.0-cdh5.4.2
Subversion http://github.com/cloudera/hadoop -r 15b703c8725733b7b2813d2325659eb7d57e7a3f
Compiled by jenkins on 2015-05-20T00:03Z
Compiled with protoc 2.5.0
From source with checksum de74f1adb3744f8ee85d9a5b98f90d
This command was run using /usr/jars/hadoop-common-2.6.0-cdh5.4.2.jar
 
> hadoop fs -ls /

Found 5 items
drwxr-xr-x   - hbase supergroup          0 2015-12-13 02:32 /hbase
drwxr-xr-x   - solr  solr                0 2015-06-09 03:38 /solr
drwxrwxrwx   - hdfs  supergroup          0 2016-03-20 07:11 /tmp
drwxr-xr-x   - hdfs  supergroup          0 2016-03-29 06:51 /user
drwxr-xr-x   - hdfs  supergroup          0 2015-06-09 03:36 /var

Step 1 - Find the image to provision

The provision images command lists the images available to install.

> ctm provision images Linux

[
  "Agent.Linux",
  "Agent_18.Linux",
  "ApplicationsAgent.Linux",
  "BigDataAgent.Linux",
  "Server.Linux"
]

As you can see, there are several available Linux images:

  • Agent.Linux and Agent_18.Linux - provide the ability to run scripts, programs, and commands, with a Control-M/Agent of version 9.0 or 9.0.18, respectively.
  • ApplicationsAgent.Linux - in addition to Agent.Linux, adds plugins to run file transfer jobs and database SQL scripts.
  • BigDataAgent.Linux - in addition to Agent.Linux, adds a plugin to run Hadoop and Spark jobs.
  • Server.Linux - provisions a Control-M/Server.

In this example, we will provision the BigDataAgent.Linux image.

Step 2 - Provision the BigDataAgent image

Run the following command on a Linux system:

ctm provision install BigDataAgent.Linux

After provisioning the BigDataAgent successfully, you now have a running instance of Control-M/Agent on your Hadoop edge node.

Now let's access the tutorial sample code.

Step 3 - Access the tutorial samples

Go to the directory where the tutorial sample is located:

cd automation-api-quickstart/control-m/101-running-hadoop-spark-job-flow

Step 4 - Verify the code for Control-M

Let's take the AutomationAPISampleHadoopFlow.json file, which contains job definitions, and verify that the code within it is valid. To do so, use the build command. The following example shows the command and a typical successful response.

> ctm build AutomationAPISampleHadoopFlow.json

[
  {
    "deploymentFile": "AutomationAPISampleHadoopFlow.json",
    "successfulFoldersCount": 0,
    "successfulSmartFoldersCount": 1,
    "successfulSubFoldersCount": 0,
    "successfulJobsCount": 2,
    "successfulConnectionProfilesCount": 0,
    "isDeployDescriptorValid": false
  }
]

If the code is not valid, an error is returned.

Step 5 - Examine the source code

Let's look at the source code in the AutomationAPISampleHadoopFlow.json file. By examining the contents of this file, you'll learn about the structure of the job flow and what it should contain.

{
    "Defaults" : {
        "Application": "SampleApp",
        "SubApplication": "SampleSubApp",
        "Host" : "HOST",
        "When" : {
            "FromTime":"0300",
            "ToTime":"2100"
        },
        "Job:Hadoop" : {
            "ConnectionProfile": "SampleConnectionProfile"
        }
    },
    "SampleConnectionProfile" :
    {
        "Type" : "ConnectionProfile:Hadoop",
        "TargetAgent" : "HOST"
    },
    "AutomationAPIHadoopSampleFlow": {
        "Type": "Folder",
        "Comment" : "Code reviewed by John",
        "ProcessData": {
            "Type": "Job:Hadoop:Spark:Python",
            "SparkScript": "file:///home/USER/automation-api-quickstart/control-m/101-running-hadoop-spark-job-flow/processData.py",
            
            "Arguments": [
                "file:///home/USER/automation-api-quickstart/control-m/101-running-hadoop-spark-job-flow/processData.py",
                "file:///home/USER/automation-api-quickstart/control-m/101-running-hadoop-spark-job-flow/processDataOutDir"
            ],
            "PreCommands" : {
                "Commands" : [
                    { "rm":"-R -f file:///home/USER/automation-api-quickstart/control-m/101-running-hadoop-spark-job-flow/processDataOutDir" }
                ]                   
            }
        },
        "CopyOutputData" :
        {
            "Type" : "Job:Hadoop:HDFSCommands",
            "Commands" : [
                {"rm"    : "-R -f samplesOut" },
                {"mkdir" : "samplesOut" },
                {"cp"   : "file:///home/USER/automation-api-quickstart/control-m/101-running-hadoop-spark-job-flow/* samplesOut" }
            ]
        },
        "DataProcessingFlow": {
            "Type": "Flow",
            "Sequence": ["ProcessData","CopyOutputData"]
        }
    }
}

This example contains two jobs: a Spark job named ProcessData and an HDFS Commands job named CopyOutputData. These jobs are contained within a folder named AutomationAPIHadoopSampleFlow. To define the sequence of job execution, the Flow object is used.

Note that in the Spark job we use the "PreCommands" object to clean up output from any previous Spark job runs.  

The "SampleConnectionProfile" object is used to define the connection parameters to the Hadoop cluster. Note that for Sqoop and Hive, it is used to set data sources and credentials.

Here is the code of processData.py:

from __future__ import print_function

import sys
from pyspark import SparkContext

# The input file and output directory are passed as job arguments
inputFile = sys.argv[1]
outputDir = sys.argv[2]

sc = SparkContext(appName="processDataSampel")

# Word count: split each line into words, map each word to (word, 1),
# then sum the counts for each word
text_file = sc.textFile(inputFile)
counts = text_file.flatMap(lambda line: line.split(" ")) \
             .map(lambda word: (word, 1)) \
             .reduceByKey(lambda a, b: a + b)

# Write the (word, count) pairs as text files in the output directory
counts.saveAsTextFile(outputDir)

Step 6 - Modify the code to run in your environment

You need to modify the path to the samples directory for the jobs to run successfully in your environment. Replace the URI file:///home/USER/automation-api-quickstart/control-m/101-running-hadoop-spark-job-flow/ with the location of the samples that you installed on your machine.

"SparkScript": "file:///home/USER/automation-api-quickstart/control-m/101-running-hadoop-spark-job-flow/processData.py",
"Arguments": [
    "file:///home/USER/automation-api-quickstart/control-m/101-running-hadoop-spark-job-flow/processData.py",
    "file:///home/USER/automation-api-quickstart/control-m/101-running-hadoop-spark-job-flow/processDataOutDir"
], 
{ "rm":"-R -f file:///home/USER/automation-api-quickstart/control-m/101-running-hadoop-spark-job-flow/processDataOutDir" }
{"cp" : "file:///home/USER/automation-api-quickstart/control-m/101-running-hadoop-spark-job-flow/* samplesOut" }

For example: file:///home/user1/automation-api-quickstart/control-m/101-running-hadoop-spark-job-flow/

In addition, replace the value for "TargetAgent" and "Host" with the host name of the machine where you provisioned the Control-M/Agent.

"TargetAgent" : "HOST"
"Host" : "HOST"

Step 7 - Run the sample

Now that we've modified the source code in the AutomationAPISampleHadoopFlow.json file, let's run the sample:

> ctm run AutomationAPISampleHadoopFlow.json

{
  "runId": "6aef1ce1-3c57-4866-bf45-3a6afc33e27c",
  "statusURI": "https://10.64.107.21:8443/automation-api/run/status/6aef1ce1-3c57-4866-bf45-3a6afc33e27c?token=224926c45a2815504ff12cb119ed4356_93C1D4CC",
  "monitorPageURI": "https://10.64.107.21:8443/SelfService#Workbench:runid=6aef1ce1-3c57-4866-bf45-3a6afc33e27c&title=AutomationAPISampleHadoopFlow.json"
}

Each time the code runs, a new runId is generated. Let's take the runId, and check the job statuses:

> ctm run status "6aef1ce1-3c57-4866-bf45-3a6afc33e27c"

{
  "statuses": [
    {
      "jobId": "workbench:000ca",
      "folderId": "workbench:00000",
      "numberOfRuns": 1,
      "name": "AutomationAPIHadoopSampleFlow",
      "type": "Folder",
      "status": "Ended OK",
      "startTime": "May 24, 2017 1:03:18 PM",
      "endTime": "May 24, 2017 1:03:45 PM",
      "outputURI": "Folder has no output",
      "logURI": "https://10.64.107.21:8443/automation-api/run/job/workbench:000ca/log?token=9d333198364e10b3b2090290c797e1f4_E9F8C76B"
    },
    {
      "jobId": "workbench:000cb",
      "folderId": "workbench:000ca",
      "numberOfRuns": 1,
      "name": "ProcessData",
      "folder": "AutomationAPIHadoopSampleFlow",
      "type": "Job",
      "status": "Ended OK",
      "startTime": "May 24, 2017 1:03:18 PM",
      "endTime": "May 24, 2017 1:03:32 PM",
      "outputURI": "https://10.64.107.21:8443/automation-api/run/job/workbench:000cb/output?token=9d333198364e10b3b2090290c797e1f4_E9F8C76B",
      "logURI": "https://10.64.107.21:8443/automation-api/run/job/workbench:000cb/log?token=9d333198364e10b3b2090290c797e1f4_E9F8C76B"
    },
    {
      "jobId": "workbench:000cc",
      "folderId": "workbench:000ca",
      "numberOfRuns": 1,
      "name": "CopyOutputData",
      "folder": "AutomationAPIHadoopSampleFlow",
      "type": "Job",
      "status": "Ended OK",
      "startTime": "May 24, 2017 1:03:33 PM",
      "endTime": "May 24, 2017 1:03:44 PM",
      "outputURI": "https://10.64.107.21:8443/automation-api/run/job/workbench:000cc/output?token=9d333198364e10b3b2090290c797e1f4_E9F8C76B",
      "logURI": "https://10.64.107.21:8443/automation-api/run/job/workbench:000cc/log?token=9d333198364e10b3b2090290c797e1f4_E9F8C76B"
    }
  ],
  "startIndex": 0,
  "itemsPerPage": 25,
  "total": 3,
  "monitorPageURI": "https://10.64.107.21:8443/SelfService#Workbench:runid=6aef1ce1-3c57-4866-bf45-3a6afc33e27c&title=Status_6aef1ce1-3c57-4866-bf45-3a6afc33e27c"
} 

You can see that the status of both jobs is "Ended OK".

Let's view the output of CopyOutputData. Use the jobId to get this information.

> ctm run job:output::get workbench:000cc

Environment information:
+--------------------+--------------------------------------------------+
|Account Name        |SampleConnectionProfile                           |
+--------------------+--------------------------------------------------+

Job is running as user: cloudera
-----------------------
Running the following HDFS command:
-----------------------------------
hadoop fs -rm -R -f samplesOut

HDFS command output:
-------------------
Deleted samplesOut
script return value 0
-----------------------------------------------------------
-----------------------------------------------------------

Job is running as user: cloudera
-----------------------
Running the following HDFS command:
-----------------------------------
hadoop fs -mkdir samplesOut

HDFS command output:
-------------------
script return value 0
-----------------------------------------------------------
-----------------------------------------------------------

Job is running as user: cloudera
-----------------------
Running the following HDFS command:
-----------------------------------
hadoop fs -cp file:///home/cloudera/automation-api-quickstart/control-m/101-running-hadoop-spark-job-flow/* samplesOut

HDFS command output:
-------------------
script return value 0
-----------------------------------------------------------
-----------------------------------------------------------

Application reports:
--------------------
-> no hadoop application reports were created for the job execution.

Job statistics:
--------------
+-------------------------+-------------------------+
|Start Time               |20170524030335           |
+-------------------------+-------------------------+
|End Time                 |20170524030346           |
+-------------------------+-------------------------+
|Elapsed Time             |1065                     |
+-------------------------+-------------------------+
Exit Message = Normal completion 

Step 8 - View job details through an interactive interface

Control-M Workbench offers an interactive user interface for debugging purposes. Through this interface, you can view various job run details (including, for example, an activity log and statistics for each job). To launch this interface when you run jobs, enter "--interactive" or "-i" at the end of the run command.

> ctm run AutomationAPISampleHadoopFlow.json --interactive

{
  "runId": "40586805-60b5-4acb-9f21-a0cf048f1051",
  "statusURI": "https://ec2-54-187-1-168.us-west-2.compute.amazonaws.com:8443/run/status/40586805-60b5-4acb-9f21-a0cf048f1051",
  "monitorPageURI": "https://localhost:8443/SelfService#Workbench:runid=40586805-60b5-4acb-9f21-a0cf048f1051&title=AutomationAPISampleHadoopFlow.json
}

A browser window opens, where you can view and manage your jobs.
