Amazon Web Services - AWS API Extractor

Use the Amazon Web Services - AWS API Extractor to collect configuration and performance data of your virtual machines (EC2 instances), EBS volumes, and RDS instances that are provisioned in the Amazon Web Services (AWS) cloud. The collected data is used for analyzing and optimizing the capacity of your AWS infrastructure. 

This ETL uses the AWS Java SDK version 2.17.20 to connect to AWS and makes API calls to the AWS services that appear in the IAM policy in Step I, such as CloudWatch, EC2, RDS, Auto Scaling, and IAM.

Related topics

Entities, lookup information, and metrics for AWS API ETL

Collecting EC2 instance metrics by using the CloudWatch agent

Release Notes: AWS SDK for Java 1.11.60

About IAM policies

Depending on your requirement, you can configure the ETL to collect data from a single AWS account or from multiple AWS accounts. When configured for multiple accounts, one of the accounts is used as the main account to retrieve data from all the accounts. The ETL supports data collection for the following AWS subscription types:

  • Pay-As-You-Go
  • AWS GovCloud (US)

If you apply tags to organize your AWS resources by related business services, you can configure the ETL to use these tags to display the AWS metrics by business services.

Collecting data by using the AWS API ETL

To collect data by using the AWS API ETL, do the following tasks:

I. Complete the preconfiguration tasks.

II. Configure the ETL.

III. Run the ETL.

IV. Verify data collection.

Step I. Complete the preconfiguration tasks

Depending on your AWS account setup, complete the steps for a single AWS account or for multiple AWS accounts.

Single AWS account

Step Details

Configure a policy to specify the permissions for the IAM user.



    1. Open the IAM console, and sign in with your AWS account credentials: https://console.aws.amazon.com/iam/
    2. From the left navigation pane, select Policies > Create policy > Create your own policy.
    3. Specify a name for the policy. For example: tsco-aws-etl-policy
    4. In the Policy Document section, enter the following JSON example:

      {
        "Version": "2012-10-17",
        "Statement": [
          {
            "Sid": "Stmt1484736991000",
            "Effect": "Allow",
            "Action": [
              "cloudwatch:GetMetricData",
              "cloudwatch:GetMetricStatistics",
              "cloudwatch:ListMetrics",
              "rds:DescribeDBInstances",
              "ec2:DescribeVolumes",
              "ec2:DescribeHosts",
              "ec2:DescribeRegions",
              "ec2:DescribeAvailabilityZones",
              "ec2:DescribeInstances",
              "ec2:DescribeInstanceTypes",
              "ec2:DescribeAccountAttributes",
              "ec2:DescribeSnapshots",
              "ec2:DescribeReservedInstances",
              "autoscaling:DescribeAutoScalingInstances",
              "autoscaling:DescribeAutoScalingGroups",
              "autoscaling:DescribePolicies",
              "autoscaling:DescribeLaunchConfigurations",
              "iam:GetUser"
            ],
            "Resource": [
              "*"
            ]
          }
        ]
      }
    5. Click Validate Policy to ensure that the policy is syntactically correct.
    6. Click Create Policy.

For more information, see Creating a new policy.
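Before pasting the policy into the console, it can help to check locally that the JSON parses and that none of the listed actions was dropped while editing. The following Python sketch uses only the standard library and makes no AWS calls. The function names are illustrative and are not part of the ETL. The check compares action names literally, so a policy that uses wildcards such as ec2:Describe* would be reported as missing actions.

```python
import json

# Actions copied from the policy document shown above.
REQUIRED_ACTIONS = [
    "cloudwatch:GetMetricData",
    "cloudwatch:GetMetricStatistics",
    "cloudwatch:ListMetrics",
    "rds:DescribeDBInstances",
    "ec2:DescribeVolumes",
    "ec2:DescribeHosts",
    "ec2:DescribeRegions",
    "ec2:DescribeAvailabilityZones",
    "ec2:DescribeInstances",
    "ec2:DescribeInstanceTypes",
    "ec2:DescribeAccountAttributes",
    "ec2:DescribeSnapshots",
    "ec2:DescribeReservedInstances",
    "autoscaling:DescribeAutoScalingInstances",
    "autoscaling:DescribeAutoScalingGroups",
    "autoscaling:DescribePolicies",
    "autoscaling:DescribeLaunchConfigurations",
    "iam:GetUser",
]


def build_policy(actions=REQUIRED_ACTIONS, sid="Stmt1484736991000"):
    """Return the policy document as a Python dictionary."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": sid,
                "Effect": "Allow",
                "Action": list(actions),
                "Resource": ["*"],
            }
        ],
    }


def missing_actions(policy_text, required=REQUIRED_ACTIONS):
    """Return the required actions that the policy text does not allow.

    json.loads raises ValueError if the text is not valid JSON.
    """
    policy = json.loads(policy_text)
    allowed = set()
    for statement in policy["Statement"]:
        if statement.get("Effect") == "Allow":
            actions = statement["Action"]
            # "Action" may be a single string or a list of strings.
            allowed.update([actions] if isinstance(actions, str) else actions)
    return sorted(set(required) - allowed)


policy_text = json.dumps(build_policy(), indent=2)
print(policy_text)
print("Missing actions:", missing_actions(policy_text))
```

The Validate Policy button in the console remains the authoritative syntax check; this sketch only catches editing mistakes before that step.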

Create an IAM user. You will need to specify the access key ID and the secret key of this user while configuring the ETL. The AWS SDK requires these keys to automatically sign the requests that the ETL sends to AWS.

    1. Open the IAM console and sign in with your AWS account credentials: https://console.aws.amazon.com/iam/
    2. From the left navigation pane, select Users > Add user.
    3. Enter a user name.
    4. Under Select AWS access type, select Programmatic access, which enables access through the AWS API, CLI, or Tools for Windows PowerShell.



    5. Click Next Permissions.
    6. Select Attach existing policies directly.
    7. In the Filter box, search for the policy that you created, and select it.



    8. Click Review.
    9. Click Create User.
      The policy is associated with the newly created IAM user.



    10. Note down the access key ID and the secret access key.

      Tip

      Click Download .csv to download the access key ID and the secret key of the newly added user.

Tag your resources by using a business service tag key name such as Service to organize the AWS resources by business services. You will need to specify this business service tag key name while configuring the ETL.

For more information, see Tagging your AWS EC2 instances.

Multiple AWS accounts

Basic requirements

  • Generate an external ID, which you will need to use when you configure the additional AWS accounts. The external ID is an alphanumeric string. Use any alphanumeric string or use a tool, such as GUID UNIX, to generate it.

  • To organize your resources by business services, ensure that you tag your resources by using a business service tag key name such as Service. You need to specify this business service tag key name while configuring the ETL.

    For more information, see Tagging your Amazon EC2 resources.
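One way to produce the alphanumeric external ID is to take the hexadecimal form of a random UUID. The following Python sketch assumes that any random alphanumeric string is acceptable, as stated above. The function name is illustrative.

```python
import uuid


def generate_external_id():
    """Return a random 32-character string of hexadecimal digits (a UUID4 without hyphens)."""
    return uuid.uuid4().hex


external_id = generate_external_id()
print(external_id)
```

Store the generated value; you need the same string when you create the cross-account role in each additional account and when you configure the ETL.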

Configure the main AWS account

Step Details

Access the main AWS account.

Log on to the AWS Management Console.


Obtain the AWS account ID and note it down.

In the AWS Management Console header, click the account name and select My Account.

The Account Settings information displays the Account ID.

Configure a policy (tsco-aws-etl-policy) to specify permissions for the user of the main AWS account.

    1. Select Policies > Create policy > Create your own policy.
    2. Specify a name for the policy. For example: tsco-aws-etl-policy.
    3. In the Policy Document section, enter the following JSON example:

      {
        "Version": "2012-10-17",
        "Statement": [
          {
            "Sid": "Stmt1484736991000",
            "Effect": "Allow",
            "Action": [
              "cloudwatch:GetMetricData",
              "cloudwatch:GetMetricStatistics",
              "cloudwatch:ListMetrics",
              "rds:DescribeDBInstances",
              "ec2:DescribeVolumes",
              "ec2:DescribeHosts",
              "ec2:DescribeRegions",
              "ec2:DescribeAvailabilityZones",
              "ec2:DescribeInstances",
              "ec2:DescribeInstanceTypes",
              "ec2:DescribeAccountAttributes",
              "ec2:DescribeSnapshots",
              "ec2:DescribeReservedInstances",
              "autoscaling:DescribeAutoScalingInstances",
              "autoscaling:DescribeAutoScalingGroups",
              "autoscaling:DescribePolicies",
              "autoscaling:DescribeLaunchConfigurations",
              "iam:GetUser"
            ],
            "Resource": [
              "*"
            ]
          }
        ]
      }
    4. Click Validate Policy to ensure that the policy is syntactically correct.
    5. Click Create Policy.

For more information, see Creating a new policy.

Create an IAM user (tsco-etl-user) in the main account and assign the policy (tsco-aws-etl-policy) to the user.

You need the access key details of this account while configuring the ETL.

The access keys include a key ID and a secret key. The AWS SDK requires these keys to automatically sign the requests that the ETL sends to AWS. For more information about managing access keys for IAM users, see Managing Access Keys for IAM Users.

    1. On the Add user page, in the Set user details section, click Add another user, and specify a user name for the new IAM account. For example, tsco-etl-user.
    2. Under Select AWS access type, select Programmatic access.

    3. Click Next Permissions.
    4. Select Attach existing policies directly.
    5. In the Filter box, search for the policy that you created in the previous step (tsco-aws-etl-policy) and select it.

    6. Click Review.
    7. Click Create User. The policy (tsco-aws-etl-policy) is associated with the newly created IAM user (tsco-etl-user).

    8. Note down the access key ID and the secret access key.

      Tip

      Click Download .csv to download the access key ID and the secret key of the newly added user.

Configure the additional AWS account

You must repeat these steps for every additional AWS account.

Step Details

Access the additional AWS account.

Log in to the AWS Management Console.

Obtain the account ID and note it down.

You will need to enter this account ID when configuring a policy in the main AWS account to include the additional account details. You will also need to enter it while configuring the ETL.

See the Obtain the AWS account ID step in the Configure the main AWS account section.

Configure a policy (tsco-aws-etl-policy) to specify permissions for the user of the additional AWS account.

See the Configure a policy step in the Configure the main AWS account section.

Create a cross-account access role (tsco-cross-account-role).

This step gives the main AWS account user (tsco-etl-user) federated read-only access to the AWS services in the additional account and enables account switching.

    1. In the IAM service, select the Roles tab and click Create new role.
    2. Select Role for cross-account access and select Provide access between your AWS account and a 3rd party AWS account.


    3. Enter the account ID of the main AWS account and the external ID that you generated in the Basic requirements section.

    4. In the Attach Policy step, select the Access Privilege policy (tsco-aws-etl-policy).
    5. Specify the role name as tsco-cross-account-role and click Create role.
      The role is created.
    6. Select this role. On the Trust relationships tab, click Edit trust relationship.
    7. In the principal ARN, replace the "root" element with the IAM user that you created in the main account (user/tsco-etl-user).

    8. Click Update Trust Policy to save the changes.
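For orientation, the trust relationship that results from these steps can be sketched as follows. This Python snippet only builds and prints a JSON document. It assumes the standard IAM trust policy layout with an external ID condition; the document that the console generates for your role may differ in detail. The account ID and external ID shown are placeholders, and the function name is illustrative.

```python
import json


def build_trust_policy(main_account_id, user_name, external_id):
    """Return a trust policy that lets one IAM user of the main account assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # The principal names the IAM user instead of the account root.
                "Principal": {
                    "AWS": f"arn:aws:iam::{main_account_id}:user/{user_name}"
                },
                "Action": "sts:AssumeRole",
                # The caller must present the same external ID.
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


# Placeholder values for illustration only.
trust_policy = build_trust_policy("111111111111", "tsco-etl-user", "EXAMPLEEXTERNALID")
print(json.dumps(trust_policy, indent=2))
```

Compare the printed document with the one shown on the Trust relationships tab before you click Update Trust Policy.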

Access the main AWS account again.

Log in to the AWS Management Console.

Configure a policy file (tsco-assume-role-policy.json) to include the additional account details.

If you are configuring the first additional AWS account, create a policy file. Otherwise, update the existing file with the details of the additional AWS account.

Information

A single policy file can include details of all the additional AWS accounts.


    To create the policy file:

    1. Open a new file in any text editor, such as Notepad.
    2. Copy the following content into the file and replace ADDITIONAL_ACCOUNT_ID with the account ID of the additional account that you obtained in step 2.

      {
        "Version": "2012-10-17",
        "Statement": [
          {
            "Sid": "Stmt1500499562000",
            "Effect": "Allow",
            "Action": [
              "sts:AssumeRole"
            ],
            "Resource": [
              "arn:aws:iam::ADDITIONAL_ACCOUNT_ID:role/tsco-cross-account-role"
            ]
          }
        ]
      }
    3. Save the file as tsco-assume-role-policy.json.

    To update the existing policy file:

    1. In the JSON file that you created (tsco-assume-role-policy.json), add the role ARN of each additional account on a new line in the Resource list, separated by a comma.

      {
        "Version": "2012-10-17",
        "Statement": [
          {
            "Sid": "Stmt1500499562000",
            "Effect": "Allow",
            "Action": [
              "sts:AssumeRole"
            ],
            "Resource": [
              "arn:aws:iam::ADDITIONAL_ACCOUNT_ID1:role/tsco-cross-account-role",
              "arn:aws:iam::ADDITIONAL_ACCOUNT_ID2:role/tsco-cross-account-role"
            ]
          }
        ]
      }
    2. Save the file.
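If you manage many additional accounts, editing the Resource list by hand is error prone. The following Python sketch generates the same policy structure from a list of account IDs. It is a convenience under the assumption that every additional account uses the role name tsco-cross-account-role. The function names and the sample account IDs are illustrative.

```python
import json


def build_assume_role_policy(account_ids, role_name="tsco-cross-account-role"):
    """Return the assume-role policy with one role ARN per additional account."""
    resources = [f"arn:aws:iam::{account_id}:role/{role_name}" for account_id in account_ids]
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "Stmt1500499562000",
                "Effect": "Allow",
                "Action": ["sts:AssumeRole"],
                "Resource": resources,
            }
        ],
    }


def write_policy_file(account_ids, path="tsco-assume-role-policy.json"):
    """Write the policy to a file so that it can be pasted into the IAM console."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(build_assume_role_policy(account_ids), handle, indent=2)


# Placeholder account IDs for illustration only.
policy = build_assume_role_policy(["111111111111", "222222222222"])
print(json.dumps(policy, indent=2))
```

Call write_policy_file with your real account IDs to regenerate tsco-assume-role-policy.json each time you add an account.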

Enable the policy (tsco-assume-role-policy.json) that includes additional AWS account details in the main AWS account.

    1. Select the IAM service and select Users.
    2. Select the IAM user (tsco-etl-user) that you created in the Configure the main AWS account section.
    3. On the Summary page, select the Permissions tab and click + Add inline policy.
    4. Select Custom policy. On the Review Policy page, enter the contents of the policy file (tsco-assume-role-policy.json).
    5. Click Validate Policy and click Apply Policy.

In a firewall or a proxy-enabled environment, the following AWS services endpoints must be allowed:

  • http://monitoring.<region>.amazonaws.com/
  • http://ec2.<region>.amazonaws.com/
  • http://autoscaling.<region>.amazonaws.com/
  • http://sts.<region>.amazonaws.com/
  • http://ec2.amazonaws.com/
  • http://iam.amazonaws.com/

where <region> is one of the regions in AWS. For more information about regions, see Regions and Availability Zones.

Important

The ETL requires access to all regions even if your Amazon instances are provisioned in only some of the regions.
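Because the allow list must cover every region, it can be easier to generate the host names than to type them. The following Python sketch expands the endpoint patterns listed above for a given list of regions. The region names in the example are samples; supply the full list of regions that applies to your account. The function name is illustrative.

```python
# Service prefixes taken from the regional endpoint patterns listed above.
REGIONAL_SERVICES = ["monitoring", "ec2", "autoscaling", "sts"]

# Endpoints from the list above that do not contain a region.
GLOBAL_HOSTS = ["ec2.amazonaws.com", "iam.amazonaws.com"]


def endpoint_hosts(regions):
    """Return the host names to allow for the given regions, followed by the global hosts."""
    hosts = [
        f"{service}.{region}.amazonaws.com"
        for region in regions
        for service in REGIONAL_SERVICES
    ]
    return hosts + GLOBAL_HOSTS


# Sample regions for illustration only.
hosts = endpoint_hosts(["us-east-1", "eu-west-1"])
for host in hosts:
    print(host)
```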


Step II. Configure the ETL

You must configure the ETL to connect to AWS for data collection. ETL configuration includes specifying the basic and optional advanced properties. While configuring the basic properties is sufficient, you can optionally configure the advanced properties for additional customization.

A. Configuring the basic properties

Some of the basic properties display default values. You can modify these values if required.

To configure the basic properties:

  1. Navigate to Administration > ETL & System Tasks, and select ETL tasks.
  2. On the ETL tasks page, click Add > Add ETL. The Add ETL page displays the configuration properties. You must configure properties in the following tabs: Run configuration, Entity catalog, and Amazon Web Services Connection.

  3. On the Run Configuration tab, select Amazon Web Services - AWS API Extractor from the ETL Module list. The name of the ETL is displayed in the ETL task name field. You can edit this field to customize the name.



  4. Click the Entity catalog tab, and select one of the following options:
    • Shared Entity Catalog:

      Select if other ETLs access the same entities that are used by the AWS API ETL.
      • From the Sharing with Entity Catalog list, select the entity catalog name that is shared between ETLs.
    • Private Entity Catalog: Select if this is the only ETL that extracts data from the AWS resources.
  5. Click the Amazon Web Services Connection tab, and configure the following properties:

    Property | Description
    Is target AWS Government Cloud? | Specify whether you want to import data from the AWS GovCloud (US) account. The default selection is a standard AWS account.
    AWS Account access mode | Depending on your AWS account setup, select Single or Multiple. You must use the values that you obtained during the preconfiguration procedure.

    • Access Key ID: Specify the access key ID of the IAM user of the AWS account. For example, a typical Access Key ID might look like this: AMAZONACSKEYID007EXAMPLE.
    • Secret Access Key: Specify the secret access key associated with the Access Key ID. For example, a typical Secret Access Key might look like this: wSecRetAcsKeYY712/K9POTUS/BCZthIZIzprvtEXAMPLEKEY.

    For multiple accounts, specify the following additional details:

    • Cross-account name and external ID: the cross-account access role (for example, tsco-cross-account-role) and the external ID that you generated during the preconfiguration tasks
    • IDs of additional accounts that you configured
    Business Service hierarchy | If you want to create and view AWS data by business services, retain the default selection of Create Business Service hierarchy based on specified tag key. Specify the appropriate tag key name. For example, Service.

    Example scenario:
    You have VMs that are tagged as follows:
      • AS1: {user=John, Purpose=Dev, Service=Data Solutions}
      • vl-pub-bco-qa35: {user=Adam, Purpose=Production, Service=Data Solutions}
      • vl-pun-bco-qa20: {user=Jane, Purpose=QA, Service=Data Solutions}

    When you run the ETL, data is displayed in a hierarchy as follows:



    If you do not use business services, data is displayed as follows:

  6. (Optional) Override the default values of properties in the following tabs:

    Property | Description
    Execute in simulation mode | By default, the ETL runs in simulation mode, which validates connectivity with the data source and ensures that the ETL does not have any configuration issues. In the simulation mode, the ETL does not load data into the database. This option is useful when you want to test a new ETL task. To run the ETL in the production mode, select No.
    BMC recommends that you run the ETL in the simulation mode after ETL configuration and then run it in the production mode.
    Property | Description
    Associate new entities to

    Specify the domain to which you want to add the entities created by the ETL.

    Select one of the following options:

    • New domain: This option is selected by default. Select a parent domain, and specify a name for your new domain.
    • Existing domain: Select an existing domain from the Domain list. 

    By default, a new domain with the same ETL name is created for each ETL. 

    Property | Description
    Task group | Select a task group to classify the ETL.
    Running on scheduler | Select a scheduler for running the ETL. For cloud ETLs, use the scheduler that is preconfigured in Helix. For on-premises ETLs, use the scheduler that runs on the Remote ETL Engine.
    Maximum execution time before warning | Indicates the number of hours, minutes, or days for which the ETL must run before generating warnings or alerts, if any.
    Frequency

    Select one of the following frequencies to run the ETL:

    • Predefined: This is the default selection. Select a daily, weekly, or monthly frequency, and then select a time to start the ETL run accordingly.
      • Start timestamp: hour\minute: Select the HH:MM start timestamp to add to the ETL execution running on a Predefined frequency.
    • Custom: Specify a custom frequency, select an appropriate unit of time, and then specify a day and a time to start the ETL run.
      • Custom start timestamp: Select a YYYY-MM-DD HH:MM timestamp to add to the ETL execution running on a Custom frequency.

  7. Click Save.
    The ETL tasks page shows the details of the newly configured AWS API ETL.
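The business service example in step 5 amounts to grouping instances by the value of the configured tag key. The following Python sketch illustrates that grouping with the three sample VMs. It is not the ETL's implementation. In particular, the "Untagged" bucket is a hypothetical choice for this sketch; how the ETL places instances that lack the tag is not described here.

```python
from collections import defaultdict


def group_by_tag(instances, tag_key="Service", untagged="Untagged"):
    """Group instance names by the value of one tag key.

    Instances that lack the tag go into the hypothetical `untagged` bucket.
    """
    groups = defaultdict(list)
    for name, tags in instances.items():
        groups[tags.get(tag_key, untagged)].append(name)
    return {service: sorted(names) for service, names in groups.items()}


# Sample VMs and tags from the example scenario in step 5.
instances = {
    "AS1": {"user": "John", "Purpose": "Dev", "Service": "Data Solutions"},
    "vl-pub-bco-qa35": {"user": "Adam", "Purpose": "Production", "Service": "Data Solutions"},
    "vl-pun-bco-qa20": {"user": "Jane", "Purpose": "QA", "Service": "Data Solutions"},
}

hierarchy = group_by_tag(instances)
for service, names in hierarchy.items():
    print(service)
    for name in names:
        print("  " + name)
```

Grouping the same data with tag_key="Purpose" would produce three groups (Dev, Production, and QA), which shows why the choice of tag key determines the hierarchy.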

(Optional) B. Configuring the advanced properties

You can configure the advanced properties to change the way the ETL works or to collect additional metrics.

While configuring these advanced properties, you will need to specify the path to the JSON files that contain the instance type configuration metrics and other additional metrics. For more information, see Collecting EC2 instance metrics by using the CloudWatch agent.

To configure the advanced properties:

  1. On the Add ETL page, click Advanced.
  2. Configure the following properties:

    Property | Description
    Run configuration name | Specify the name that you want to assign to this ETL task configuration. The default configuration name is displayed. You can use this name to differentiate between the run configuration settings of ETL tasks.
    Deploy status | Select the deploy status for the ETL task. For example, you can initially select Test and change it to Production after verifying that the ETL run results are as expected.
    Description | A short description of the ETL module.
    Log level | Specify the level of details that you want to include in the ETL log file. Select one of the following options:
    • 1 - Light: Select to add the bare minimum activity logs to the log file.
    • 5 - Medium: Select to add the medium-detailed activity logs to the log file.
    • 10 - Verbose: Select to add detailed activity logs to the log file.

    Use log level 5 as a general practice. You can select log level 10 for debugging and troubleshooting purposes.

    Property | Description
    Metric profile selection

    Select the metric profile that the ETL must use. The ETL collects data for the group of metrics that is defined by the selected metric profile.

    • Use Global metric profile: This is selected by default. All the out-of-the-box ETLs use this profile.
    • Select a custom metric profile: Select the custom profile that you want to use from the Custom metric profile list. This list displays all the custom profiles that you have created.

    For more information about metric profiles, see Adding and managing metric profiles.

    Levels up to

    Specify the metric level that defines the number of metrics that can be imported into the database. The load on the database increases or decreases depending on the selected metric level.

    To learn more about metric levels, see Adding and managing metric profiles.

    Property | Description
    Default region | Specify the region where your AWS cloud resources are located. The default value is us-east-1.
    Instance type definition JSON file path

    Browse to the file location where you have saved the JSON file that contains the instance type configuration metrics. Upload the file.

    Additional CloudWatch metrics JSON file path

    Browse to the file location where you have saved the JSON file that contains details of additional metrics that are to be collected by the ETL.

    Upload the file. 

    Property | Description
    List of properties

    Specify additional properties for the ETL that act as user inputs during run. You can specify these values now or you can do so later by accessing the "You can manually edit ETL properties from this page" link that is displayed for the ETL in the view mode.

    1. Click Add.
    2. In the etl.additional.prop.n field, specify an additional property.
    3. Click Apply.
      Repeat this task to add more properties.
    Property | Description
    Empty dataset behavior | Specify the action for the loader if it encounters an empty dataset:
    • Warn: Generate a warning about loading an empty dataset.
    • Ignore: Ignore the empty dataset and continue parsing.
    ETL log file name | The name of the file that contains the ETL run log. The default value is: %BASE/log/%AYEAR%AMONTH%ADAY%AHOUR%MINUTE%TASKID
    Maximum number of rows for CSV output | A numeric value to limit the size of the output files.
    CSV loader output file name | The name of the file that is generated by the CSV loader. The default value is: %BASE/output/%DSNAME%AYEAR%AMONTH%ADAY%AHOUR%ZPROG%DSID%TASKID

    Continuous Optimization loader output file name

    The name of the file that is generated by the Continuous Optimization loader. The default value is: %BASE/output/%DSNAME%AYEAR%AMONTH%ADAY%AHOUR%ZPROG%DSID%TASKID

    Remove domain suffix from datasource name (Only for systems) | Select True to remove the domain from the data source name. For example, server.domain.com will be saved as server. The default selection is False.
    Leave domain suffix to system name (Only for systems) | Select True to keep the domain in the system name. For example, server.domain.com will be saved as is. The default selection is False.
    Skip entity creation (Only for ETL tasks sharing lookup with other tasks)

    Select True if you do not want this ETL to create entities. The ETL then discards data from its data source for entities that are not found in Continuous Optimization, and relies on one of the other ETLs that share the lookup to create new entities. The default selection is False.

    Property | Description
    Hour mask | Specify a value to run the task only during particular hours within a day. For example, 0 – 23 or 1, 3, 5 – 12.
    Day of week mask | Select the days so that the task can be run only on the selected days of the week. To avoid setting this filter, do not select any option for this field.
    Day of month mask | Specify a value to run the task only on the selected days of a month. For example, 5, 9, 18, 27 – 31.
    Apply mask validation | Select False to temporarily turn off the mask validation without removing any values. The default selection is True.
    Execute after time | Specify a value in the hours:minutes format (for example, 05:00 or 16:00) to wait before the task is run. The task run begins only after the specified time has elapsed.
    Enqueueable | Specify whether you want to ignore the next run command or run it after the current task. Select one of the following options:
    • False: Ignores the next run command when a particular task is already running. This is the default selection.
    • True: Starts the next run command immediately after the current running task is completed.

  3. Click Save.
    The ETL tasks page shows the details of the newly configured AWS API ETL.
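The Hour mask and Day of month mask examples combine single values and ranges. The following Python sketch shows one plausible reading of that syntax by expanding a mask into the individual values it covers. It is an assumption about how such a mask is interpreted, not the product's parser, and the function name is illustrative.

```python
import re


def expand_mask(mask, low, high):
    """Expand a mask such as '1, 3, 5 – 12' into a sorted list of integers.

    Elements are separated by commas; a range uses a hyphen or an en dash.
    Values outside low..high raise ValueError.
    """
    values = set()
    for part in mask.split(","):
        part = part.strip()
        if not part:
            continue
        bounds = [int(piece) for piece in re.split(r"\s*[-–]\s*", part)]
        start, end = bounds[0], bounds[-1]
        if len(bounds) > 2 or start > end or start < low or end > high:
            raise ValueError(f"invalid mask element: {part!r}")
        values.update(range(start, end + 1))
    return sorted(values)


hours = expand_mask("1, 3, 5 – 12", low=0, high=23)
days = expand_mask("5, 9, 18, 27 – 31", low=1, high=31)
print(hours)
print(days)
```

Under this reading, the mask 0 – 23 covers every hour of the day, so it is equivalent to not restricting the hours at all.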

Step III. Run the ETL

After you configure the ETL, you can run it to collect data. You can run the ETL in the following modes:

A. Simulation mode: Validates only the connection to the data source and does not load data. Use this mode when you run the ETL for the first time or after you make any changes to the ETL configuration.

B. Production mode: Collects data from the data source.

A. To run the ETL in the simulation mode

To run the ETL in the simulation mode:

  1. Navigate to Administration > ETL & System Tasks, and select ETL tasks.
  2. On the ETL tasks page, click the ETL. The ETL details are displayed.


  3. In the Run configurations table, click Edit to modify the ETL configuration settings.
  4. On the Run configuration tab, ensure that the Execute in simulation mode option is set to Yes, and click Save.
  5. Click Run active configuration. A confirmation message about the ETL run job submission is displayed.
  6. On the ETL tasks page, check the ETL run status in the Last exit column.
    OK indicates that the ETL ran without any error. You are ready to run the ETL in the production mode.
  7. If the ETL run status is Warning, Error, or Failed:
    1. On the ETL tasks page, click the icon in the last column of the ETL name row to open the log.
    2. Check the log and reconfigure the ETL if required.
    3. Run the ETL again.
    4. Repeat these steps until the ETL run status changes to OK.

B. To run the ETL in the production mode

You can run the ETL manually when required or schedule it to run at a specified time.

To run the ETL manually

  1. On the ETL tasks page, click the ETL. The ETL details are displayed.
  2. In the Run configurations table, click Edit to modify the ETL configuration settings. The Edit run configuration page is displayed.
  3. On the Run configuration tab, select No for the Execute in simulation mode option, and click Save.
  4. To run the ETL immediately, click Run active configuration. A confirmation message about the ETL run job submission is displayed.
    When the ETL runs, it collects data from the source and transfers it to the BMC Helix Continuous Optimization database.

To schedule the ETL run in the production mode

By default, the ETL is scheduled to run daily. You can customize this schedule by changing the frequency and period of running the ETL.

To configure the ETL run schedule:

  1. On the ETL tasks page, click the ETL, and click Edit task. The ETL details are displayed.
  2. On the Edit task page, do the following, and click Save:

    • Specify a unique name and description for the ETL task.
    • In the Maximum execution time before warning field, specify the duration for which the ETL must run before generating warnings or alerts, if any.
    • Select a predefined or custom frequency for starting the ETL run. The default selection is Predefined.
    • Select the task group and the scheduler to which you want to assign the ETL task.
  3. Click Schedule. A message confirming the scheduling job submission is displayed.
    When the ETL runs as scheduled, it collects data from the source and transfers it to the BMC Helix Continuous Optimization database.

Step IV. Verify data collection

Verify that the ETL ran successfully and check whether the AWS data is refreshed in the Workspace.

To verify whether the ETL ran successfully

  1. Click Administration > ETL and System Tasks > ETL tasks.
  2. In the Last exec time column corresponding to the ETL name, verify that the current date and time are displayed.
  3. In the Last exit column corresponding to the ETL name, verify that the status is OK.
    In case of WARNING or ERROR, click the icon in the last column of the ETL name row to review the log files.
If you see a Warning status in the Last exit column, see AWS API ETL displays warning message in the ETL logs to troubleshoot the issue.

To verify that the AWS data is refreshed

  1. In the Workspace tab, expand (Domain name) > Systems > AWS Cloud > Instances.
  2. In the left pane, verify that the hierarchy displays the new and updated AWS instances that you have provisioned in the AWS cloud.
  3. Click an AWS virtual machine instance, and click the Metrics tab in the right pane.
  4. Check if the Last Activity column in the Configuration metrics and Performance metrics tables displays the current date.

The following image shows sample metrics data. To learn more about these metrics and other related concepts, see Entities, lookup information, and metrics for AWS API ETL.

Where to go from here

After data is collected, you can analyze and manage the capacity of AWS entities from the AWS views.

