Configuring DME - CSV Data Mart Extractor
The Moviri DME - CSV Data Mart Extractor creates CSV files from Data Mart data. The output CSV is made available on the Remote ETL Engine where the ETL runs, under the path specified in the ETL configuration.
Run Configuration
| Property | Default | Description |
|---|---|---|
| Run configuration name | Default | The default name of the configuration |
| Deploy status | Production | Choose between Production and Test |
| Description | | Description of the configuration |
| Log Level | 1 - Light | Set the level of detail for log collection: 1 - Light, 5 - Medium, 10 - Verbose |
| ETL module | Name of ETL | The chosen ETL |
| Module Description | | A description of the ETL and a link to support for it |
| Execute in simulation mode | Yes | Set to Yes to test the ETL without extracting data; set to No to run it and extract the data |
| Datasets | STOFS | Select the only available dataset (STOFS) |
Entity Catalog
Click the Entity catalog tab, and select one of the following options:
- Shared Entity Catalog: From the Sharing with Entity Catalog list, select the entity catalog name that is shared between ETLs.
- Private Entity Catalog: Select this option if you want to generate the hierarchy independently of other ETLs and do not want to include entity lookups from other ETLs.
Object Relationship
Click the Object relationship tab, and select the "Leave all new entities in 'Newly Discovered'" option. Because this ETL does not import any data into the workspace, this option has no effect.
Connection
When creating the Access Key and Access Secret Key, you must assign the Administrators group and the Administrators and Capacity Administrators roles.
| Property | Required | Definition |
|---|---|---|
| BMC Helix Continuous Optimization URL | Yes | The URL of the BHCO instance, for example https://my-host.onbmc.com |
| Access Key | Yes | See Setting up access keys for programmatic access in the BMC Documentation |
| Access Secret Key | Yes | See Setting up access keys for programmatic access in the BMC Documentation |
Proxy settings
This section needs to be configured only if the Remote ETL Engine needs to connect to BHCO through a proxy.
| Property | Required | Definition |
|---|---|---|
| Proxy Host | No | The address of the proxy, for example https://my-enterprise-proxy.com |
| Proxy port | No | The port number on which the proxy is listening, for example 8080 |
| Proxy username | No | Username for proxy authentication, if needed |
| Proxy password | No | Password for proxy authentication, if needed |
| Proxy timeout | No | Connection timeout in milliseconds |
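The proxy properties above map naturally onto an HTTP client's proxy configuration. The following is a minimal sketch (not the connector's actual code; host, port, and timeout values are the illustrative ones from the table) of how they could be assembled into a requests-style proxies mapping. Note that Proxy timeout is expressed in milliseconds, while many HTTP clients expect seconds:

```python
def build_proxy_config(host, port, username=None, password=None, timeout_ms=None):
    """Assemble a requests-style proxies mapping and a timeout in seconds.

    Illustrative sketch only: property names follow the configuration
    table above, not the connector's internal implementation.
    """
    if username and password:
        # Embed credentials in the proxy URL: scheme://user:pass@host:port
        scheme, bare_host = host.split("://", 1)
        proxy_url = f"{scheme}://{username}:{password}@{bare_host}:{port}"
    else:
        proxy_url = f"{host}:{port}"
    proxies = {"http": proxy_url, "https": proxy_url}
    # Convert the configured millisecond timeout to seconds.
    timeout_s = timeout_ms / 1000 if timeout_ms is not None else None
    return proxies, timeout_s

proxies, timeout = build_proxy_config(
    "https://my-enterprise-proxy.com", 8080, timeout_ms=30000)
```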
Data mart and CSV configuration
| Property | Required | Definition |
|---|---|---|
| Data Mart Identifier | Yes | The unique name of the data mart used as input of the ETL. It starts with ER_V_ or BUF_. You can find this information in the Data Mart details, as shown in the picture below. |
| Columns to extract | No | The Data Mart columns to extract into the CSV, comma- or semicolon-separated. If empty, all columns are extracted. |
| Output file name | Yes | Path of the CSV file that the ETL generates. The path must be writable by the cpit user, for example /tmp/my-output-csv-file.csv |
| Write CSV header | Yes | If Yes, the ETL generates a CSV header containing the selected column names. |
| CSV separator | Yes | The character to use as the CSV column separator. The default is a comma (,). |
Data Mart details
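To illustrate how the Columns to extract, Write CSV header, and CSV separator settings interact, here is a minimal sketch using in-memory rows in place of real Data Mart data. This is not the connector's implementation; row and column names are invented for illustration:

```python
import csv
import io

def extract_csv(rows, columns_to_extract="", separator=",", write_header=True):
    """Render selected Data Mart columns as CSV text.

    rows: list of dicts, one per Data Mart row.
    columns_to_extract: comma- or semicolon-separated column names;
    an empty string means all columns are extracted.
    """
    if columns_to_extract.strip():
        # Accept either comma or semicolon as the list delimiter.
        cols = [c.strip()
                for c in columns_to_extract.replace(";", ",").split(",")
                if c.strip()]
    else:
        cols = list(rows[0].keys())  # all columns, in Data Mart order
    out = io.StringIO()
    writer = csv.writer(out, delimiter=separator, lineterminator="\n")
    if write_header:
        writer.writerow(cols)
    for row in rows:
        writer.writerow([row[c] for c in cols])
    return out.getvalue()

rows = [{"NAME": "vm01", "CPU": 4}, {"NAME": "vm02", "CPU": 8}]
text = extract_csv(rows, "NAME;CPU", separator=";")
```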
Split the output into multiple CSV files
If needed, the ETL can also be configured to split the output into several CSV files, grouping rows by the value of a given Data Mart column. The following example introduces this use case.
Suppose you use this Data Mart, with identifier ER_V_SPLIT_CSV, as input of the ETL. The ETL can be configured to generate one CSV file per distinct ENTTYPENAME value in the Data Mart by specifying the Output file name field of the configuration as follows:
/tmp/my-custom-path/{enttypename}.csv
This configuration creates one CSV named "Virtual Machine - VMware.csv", containing the first three rows of the ER_V_SPLIT_CSV Data Mart, and one CSV named "Hadoop Resource Pool.csv", containing the last row of the Data Mart.
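The split behavior described above can be pictured with a short sketch. The sample rows are invented for illustration, and the real grouping is performed by the connector, not by this code:

```python
import csv
import os
import tempfile

def split_csv(rows, columns, path_template, split_column):
    """Write one CSV per distinct value of split_column.

    path_template contains a placeholder named after the split column,
    e.g. "/tmp/my-custom-path/{enttypename}.csv".
    """
    groups = {}
    for row in rows:
        groups.setdefault(row[split_column], []).append(row)
    written = []
    for value, group in groups.items():
        # The column value becomes the file name via the template placeholder.
        path = path_template.format(**{split_column.lower(): value})
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(columns)
            for row in group:
                writer.writerow([row[c] for c in columns])
        written.append(path)
    return written

rows = [
    {"ENTTYPENAME": "Virtual Machine - VMware", "NAME": "vm01"},
    {"ENTTYPENAME": "Virtual Machine - VMware", "NAME": "vm02"},
    {"ENTTYPENAME": "Hadoop Resource Pool", "NAME": "pool01"},
]
outdir = tempfile.mkdtemp()
files = split_csv(rows, ["ENTTYPENAME", "NAME"],
                  os.path.join(outdir, "{enttypename}.csv"), "ENTTYPENAME")
```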
Run the ETL
After configuring the ETL, run it to collect data. You can run the ETL in these modes:
- Simulation mode: Validates connection to the data source without generating the CSV file. Use this mode when running the ETL for the first time or after changing its configuration.
- Production mode: Collects data from the data source and generates the CSV file.
1. Running the ETL in simulation mode
To run the ETL in simulation mode:
- In the console, navigate to Administration > ETL & System Tasks, and select ETL tasks.
- On the ETL tasks page, click the ETL to display its details.
- In the Run configurations table, click the pencil icon to modify the ETL settings.
- On the Run configuration tab, set Execute in simulation mode to Yes, and click Save.
- Click Run active configuration. A confirmation message appears.
- On the ETL tasks page, check the ETL run status in the Last exit column.
OK means the ETL ran without errors. You can now run the ETL in production mode. If the ETL run status shows Warning, Error, or Failed:
- On the ETL tasks page, click the pencil icon in the ETL name row's last column.
- Check the log and adjust the ETL configuration if needed.
- Run the ETL again.
- Repeat these steps until the ETL run status is OK.
2. Running the ETL in production mode
Run the ETL manually when needed or schedule it to run at a set time.
Running the ETL manually
- On the ETL tasks page, click the ETL to display its details.
- In the Run configurations table, click the pencil icon to modify the ETL settings. The Edit run configuration page appears.
- On the Run configuration tab, select No for the Execute in simulation mode option, and click Save.
- To run the ETL immediately, click Run active configuration. A confirmation message appears.
The ETL collects data from the source and stores it into the CSV file.
Scheduling the ETL run
By default, the ETL is scheduled to run daily. You can customize this schedule by changing the frequency and period of running the ETL.
To configure the ETL run schedule:
- On the ETL tasks page, click the ETL, and click Edit Task. The ETL details are displayed.
- On the Edit task page, do the following, and click Save:
- Specify a unique name and description for the ETL task.
- In the Maximum execution time before warning field, specify the duration for which the ETL must run before generating warnings or alerts, if any.
- Select a predefined or custom frequency for starting the ETL run. The default selection is Predefined.
- Select the task group and the scheduler to which you want to assign the ETL task.
- Click Schedule. A message confirming the scheduling job submission is displayed.
When the ETL runs as scheduled, it collects data from the source and stores it into the CSV file.
Verify data collection
Confirm the ETL ran successfully and that the Moviri DME - CSV Data Mart Extractor data is available in the output CSV file.
To verify whether the ETL ran successfully:
- In the console, click Administration > ETL & System Tasks > ETL tasks.
- In the Last exec time column corresponding to the ETL name, verify that the current date and time are displayed.
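On the Remote ETL Engine, you can also confirm that the output file exists and was freshly written. A minimal sketch, assuming the illustrative output path from the configuration above (the sample file written here stands in for the real output):

```python
import csv
import os
import tempfile
import time

def check_output(path, max_age_hours=24):
    """Report whether the CSV exists, is recent, and how many lines it has."""
    if not os.path.isfile(path):
        return {"exists": False}
    age_h = (time.time() - os.path.getmtime(path)) / 3600
    with open(path, newline="") as f:
        lines = sum(1 for _ in csv.reader(f))
    return {"exists": True, "fresh": age_h <= max_age_hours, "lines": lines}

# Example with a freshly written sample file (path is illustrative):
sample = os.path.join(tempfile.mkdtemp(), "my-output-csv-file.csv")
with open(sample, "w", newline="") as f:
    csv.writer(f).writerows([["NAME", "CPU"], ["vm01", "4"]])
result = check_output(sample)
```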