Generic - CSV columnar file parser

This topic describes how to configure and use the CSV columnar file parser, which is useful for importing data from CSV files. For more information, refer to the following sections:

Before you integrate

Before you integrate BMC Helix Capacity Optimization with the CSV columnar file parser, see Preparing to integrate CSV files with general-purpose connectors.

Integration steps

To integrate BMC Helix Capacity Optimization with the CSV columnar file parser, perform the following task:

  1. Navigate to Administration > ETL & SYSTEM TASKS > ETL tasks.
  2. In the ETL tasks page, click Add > Add ETL under the Last run tab.
  3. In the Add ETL page, set values for the following properties under each expandable tab.

    Note

    Basic properties are displayed by default in the Add ETL page. These are the most common properties that you can set for an ETL, and it is acceptable to leave the default selections for each as is.

    Basic properties

    Property Description
    Run configuration
    ETL module Select Generic - CSV columnar file parser.
    ETL task name Default name is already filled out for you.
    Run configuration name Default name is already filled out for you.
    Deploy status Select Production.
    Description (Optional) Enter a brief description.
    Log level Select how detailed you want the log to be:
    • 1 - Light: Add bare minimum activity logs to the log file.
    • 5 - Medium: Add medium-detailed activity logs to the log file.
    • 10 - Verbose: Add detailed activity logs to the log file.
    Execute in simulation mode Select Yes if you want to validate the connectivity between the ETL engine and the target, and to ensure that the ETL does not have any other configuration issues. This option is useful while testing a new ETL task.
    Module selection Select the required module:
    • Based on datasource: Select this option to manually configure the options for integrating with the CSV columnar file parser.
    • Based on Open ETL template: Select this option to integrate the CSV columnar file parser using an Open ETL template. For more details, see Generic ETL based on a template.
    Module description A link that points you to technical documentation for this ETL.
    Datasets
    1. Click Edit.
    2. Select one (click) or more (shift+click) datasets that you want to include from Available datasets and click >> to move them to Selected datasets.
    3. Click Apply.
    Entity catalog
    Sharing status Select any one:
    • PRIVATE: Select this option if this is the only ETL that extracts data from the given set of resources; the lookup table is not shared with any other ETL task.
    • SHARED: Select this option if more than one ETL extracts data from the given set of resources; the lookup table is shared with those ETL tasks.
    Object relationships
    After import

    Specify the domain to which you want to add the entities created by the ETL.

    Select one of the following options:

      • New domain: This option is selected by default. Select a parent domain, and specify a name for your new domain.
      • Existing domain: Select an existing domain from the Domain list.

    By default, a new domain with the same ETL name is created for each ETL. 

    CSV parser
    CSV separator

    Select how you want BMC Helix Capacity Optimization to discover the CSV separator used in the CSV file. The available options are:

    • Find in data: The ETL scans the file and auto-discovers the separator that has been used.
    • Specified: Specify the separator that has been used.
    Default system name Select how you want the System Name to be extracted from the file. Available options are:
    • Specify column: Select an already existing column from the file.
    • File prefix: Specify a prefix used in workload names.
    • Specified: Manually specify the column name.
    Default object name Select how you want Object Name to be extracted from the file. Available options are:
    • Find in data: The ETL scans the file and auto-discovers the object name.
    • Specified: Manually specify the column name.
    Default time interval (seconds) Select how you want Time Interval to be extracted from the file. Available options are:
    • Find in data: The ETL scans the file and auto-discovers the time interval.
    • Specified: Manually specify the time interval.
    Metrics (columns) to extract
    1. Click Edit.
    2. Select one (click) or more (shift+click) columns that you want to extract (from your CSV file) listed under Available items and click >> to move them to Selected items.
    3. Click Apply.


     Note: The index in the CSV file starts from 0.

    Column index of 'DS_SYSNM' (Mandatory/default column) Index of the target system or business driver column.
    Column index of 'DURATION' (Mandatory/default column) Index of the duration value column. Values in this column are expressed in seconds.
    Column index of 'TS' (Mandatory/default column) Index of the timestamp column.
    Column index of columns that you want to extract Index of other columns that you selected in Metrics (columns) to extract.
    File location
    File location Select any one of the following methods to retrieve the CSV file:
    • Local directory: Specify a path on your local machine where the CSV file resides.
    • Windows share: Specify the Windows share path where the CSV file resides.
    • FTP: Specify the FTP path where the CSV file resides.
    • SCP: Specify the SCP path where the CSV file resides.
    • SFTP: Specify the SFTP path where the CSV file resides.
    Directory Path of the directory that contains the CSV file.
    Directory UNC Full Path (Windows share) The full UNC (Universal Naming Convention) address. For example: //hostname/sharedfolder
    Files to copy (with wildcards) Before parsing, the SFTP and SCP commands need to make a local temporary copy of the files; this setting specifies which files in the remote directory should be imported.
    File list pattern A regular expression that defines which data files should be read. The default value is (?<!done)$, which tells the ETL to read every file whose name does not end with the string "done". For example, a file named my_file_source.done is skipped.
    Recurse into subdirs?

    Select Yes or No. When set to Yes, BMC Helix Capacity Optimization also inspects the subdirectories of the target directories.

    After parse operation Choose what to do after the CSV file has been imported. The available options are:
    • Do nothing: Do nothing after import.
    • Append suffix to parsed file: Append the suffix that you specify here to the imported CSV file; for example, _done or _imported.
    • Archive parsed file in directory: Archive the parsed file in the specified directory.
      • Archive directory (local): Default archive directory path is filled out for you. For example, %BASE/../repository/imprepository
      • Compress archived files: Select Yes or No.
    • Archive bad files in directory: Archive erroneous files in the specified directory.
      • Archive directory (local): Default archive directory path is filled out for you. For example, %BASE/../repository/imprepository
      • Compress archived files: Select Yes or No.
    Parsed files suffix The suffix that is appended to parsed files; the default is .done.
    Remote host (Applies to FTP, SFTP, SCP) Enter the name or address of the remote host to connect to.
    Username (Applies to Windows share, FTP, SFTP, SCP) Enter the username to connect to the file location server.
    Password required (Applies to Windows share, FTP, SFTP, SCP) Select Yes or No.
    Password (Applies to Windows share, FTP, SFTP, SCP) Enter a password to connect to the file location server. Applicable if you selected Yes for Password required.
    ETL task properties
    Task group Select a task group to classify this ETL into.
    Running on scheduler Select the scheduler you want to run the ETL on.
    Maximum execution time before warning The number of hours, minutes, or days for which to execute the ETL before generating warnings, if any.
    Frequency Select the frequency of ETL execution. Available options are:
    • Predefined: Select a Predefined frequency from Each Day, Each Week or Each Month.
    • Custom: Enter a Custom frequency (time interval) as the number of minutes, hours, days or weeks to run the ETL in.
    Start timestamp: hour\minute (Applies to Predefined frequency) The HH:MM start timestamp to add to the ETL execution running on a Predefined frequency.
    Custom start timestamp Select a YYYY-MM-DD HH:MM timestamp to add to the ETL execution running on a Custom frequency.
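The default File list pattern behavior can be sanity-checked with an ordinary regular-expression engine. The sketch below assumes the default pattern is the negative lookbehind (?<!done)$, matching any file name that does not end with "done"; the file names used here are illustrative only:

```python
import re

# Assumed default "File list pattern": a negative lookbehind anchored to
# the end of the name, so files already suffixed with "done" are skipped.
FILE_LIST_PATTERN = re.compile(r"(?<!done)$")

candidates = [
    "metrics_2018-09-02.csv",        # not yet parsed -> selected
    "my_file_source.done",           # already parsed -> skipped
    "metrics_2018-09-01.csv.done",   # already parsed -> skipped
]

selected = [name for name in candidates if FILE_LIST_PATTERN.search(name)]
print(selected)  # ['metrics_2018-09-02.csv']
```

This matches the documented behavior: only files whose names do not end with "done" are read.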

    Note

    To view or configure Advanced properties, click Advanced. You do not need to set or modify these properties unless you want to change the way the ETL works. These properties are for advanced users and scenarios only.

    Advanced properties

    Property Description
    Collection level
    Metric profile selection

    Select any one:

    • Use Global metric profile: Select this to use an out-of-the-box global profile that is available on the Adding and managing metric profiles page. By default, all ETL modules use this profile.
    • Select a custom metric profile: Any metric profiles you add in the Add metric profile page (Administration > DATAWAREHOUSE > Metric profiles).

    For more information, see Adding and managing metric profiles.

    Levels up to

    The metric level defines the number of metrics imported into BMC Helix Capacity Optimization. Increasing the level adds load to the ingestion process, while decreasing it reduces the number of imported metrics.

    Choose the metric level to apply on selected metrics:

    • [1] Essential
    • [2] Basic
    • [3] Standard
    • [4] Extended
    Format customization
    Timestamp format Specify the format for the ETL to use if the user tables or CSV files use a timestamp format other than the supported <YYYY-MM-DD HH:MM:SS> format. Depending on the type of ETL used, you might need to specify a custom format.
    Percentage format Select the range in which percentage values in the file are expressed:
    • 0 to 1: Percentage values are expressed as fractions between 0 and 1.
    • 0 to 100: Percentage values are expressed as numbers between 0 and 100.
    Multiplier of '<metric name>'

    Displays the multiplier that maps the metric in the CSV file with that in BMC Helix Capacity Optimization. You can modify the multiplier value.

    If you used an Open ETL template to create the ETL, any change to a multiplier value applies only to the ETL and does not update the template.

    Multiplier of all '<metric unit>' type metrics
    Note: This property is displayed only if you use the Based on Open ETL template option in Module selection.
    File location
    Subdirectories to exclude (separated by ';' ) (Local directory) Names of subdirectories to exclude from parsing.
    Input file external validator (Local directory, Windows share, FTP) Select any one of the following options:
    • No external validation: Do not use external validation of the CSV file structure.
    • Use external validation script: Use the following script to validate the CSV file:
      • Script to execute: Specify the validation script to use to validate the input file.
    Additional properties
    List of properties
    1. Click Add.
    2. Add an additional property in the etl.additional.prop.n box.
    3. Click Apply.
      Repeat this task to add more properties.
    Loader configuration
    Empty dataset behavior Choose one of the following actions if the loader encounters an empty dataset:
    • Abort: Abort the loader.
    • Ignore: Ignore the empty dataset and continue parsing.
    ETL log file name Name of the file that contains the ETL execution log; the default value is: %BASE/log/%AYEAR%AMONTH%ADAY%AHOUR%MINUTE%TASKID
    Maximum number of rows for CSV output A number which limits the size of the output files.
    CSV loader output file name Name of the file generated by the CSV loader; the default value is: $CPITBASE/output/%AYEAR%AMONTH%ADAY%AHOUR%ZPROG%DSCD%SRCID.
    BCO loader output file name Name of the file generated by the Capacity Optimization loader; the default value is: $CPITBASE/output/%AYEAR%AMONTH%ADAY%AHOUR%ZPROG%DSCD%SRCID.
    Reduce priority Select the priority of the data reduction task:
    • Normal: Run at normal priority.
    • High: Run at high priority.
    Remove domain suffix from datasource name (Only for systems) If set to True, the domain name is removed from the data source name. For example, server.domain.com will be saved as server.
    Leave domain suffix to system name (Only for systems) If set to True, the domain name is maintained in the system name. For example: server.domain.com will be saved as such.
    Skip entity creation

    (Only for ETL tasks sharing lookup with other tasks) If set to True, this ETL does not create an entity, and discards data from its data source for entities not found in BMC Helix Capacity Optimization. It uses one of the other ETLs that share lookup to create the new entity.

    Scheduling options
    Hour mask Specify a value to execute the task only during particular hours within the day. For example, 0 – 23 or 1,3,5 – 12.
    Day of week mask Select the days so that the task can be executed only during the selected days of the week. To avoid setting this filter, do not select any option for this field.
    Day of month mask Specify a value to execute the task only during particular days within a month. For example, 5, 9, 18, 27 – 31.
    Apply mask validation By default this property is set to True. Set it to False if you want to disable the preceding Scheduling options that you specified. Setting it to False is useful if you want to temporarily turn off the mask validation without removing any values.
    Execute after time Specify a value in the hours:minutes format (for example, 05:00 or 16:00) to wait before the task must be executed. This means that once the task is scheduled, the task execution starts only after the specified time passes.
    Enqueueable Select one of the following options:
    • False (Default): If a new execution command arises while the task is already running, the command is ignored.
    • True: If a new execution command arises while the task is already running, the command is placed in a queue and is executed as soon as the current execution ends.
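As a rough illustration of how the Hour mask syntax could be interpreted, the hypothetical helper below (not part of the product) expands a mask such as 1,3,5-12 into the set of hours during which the task is allowed to run:

```python
def expand_hour_mask(mask: str) -> set[int]:
    """Expand an hour mask like "0-23" or "1,3,5-12" into allowed hours."""
    hours: set[int] = set()
    for part in mask.replace(" ", "").split(","):
        if "-" in part:
            # A range entry such as "5-12" covers both endpoints.
            start, end = part.split("-")
            hours.update(range(int(start), int(end) + 1))
        else:
            # A single-hour entry such as "3".
            hours.add(int(part))
    return hours

print(sorted(expand_hour_mask("1,3,5-12")))  # [1, 3, 5, 6, 7, 8, 9, 10, 11, 12]
```

The Day of month mask follows the same comma-and-range shape, with day numbers instead of hours.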


  4. Click Save.
    You return to the Last run tab under the ETL tasks page.
  5. Validate the results in simulation mode: In the ETL tasks table under ETL tasks > Last run, locate your ETL (ETL task name), and click the run icon to run the ETL.
    After you run the ETL, the Last exit column in the ETL tasks table will display one of the following values:
    • OK: The ETL executed without any error in simulation mode.
    • WARNING: The ETL execution returned some warnings in simulation mode. Check the ETL log.
    • ERROR: The ETL execution returned errors and was unsuccessful. Edit the active Run configuration and try again.
  6. Switch the ETL to production mode:
    1. In the ETL tasks table under ETL tasks > Last run, click the ETL under the Name column.
    2. In the Run configurations table in the ETL details page, click the edit icon to edit the active run configuration.
    3. In the Edit run configuration page, navigate to the Run configuration expandable tab and set Execute in simulation mode to No.
    4. Click Save.
  7. Locate the ETL in the ETL tasks table and click the run icon to run it, or schedule an ETL run.
    After you run the ETL, or schedule the ETL for a run, it extracts the data from the source and transfers it to the BMC Helix Capacity Optimization database.

CSV file format

The following examples specify the format to be used for the CSV file.

Format for system metrics
TS;DURATION;DS_SYSNM;CPU_UTIL;CPU_UTILMHZ;MEM_CONSUMED
2018-09-02 00:00:00;3600;Test System;0.67;2500;1073741824
2018-09-02 01:00:00;3600;Test System;0.68;2500;1073741824
2018-09-02 02:00:00;3600;Test System;0.70;2500;1073741824
2018-09-02 03:00:00;3600;Test System;0.25;2500;1073741824
2018-09-02 04:00:00;3600;Test System;0.56;2500;1073741824
2018-09-02 05:00:00;3600;Test System;0.66;2500;1073741824
2018-09-02 06:00:00;3600;Test System;0.67;2500;1073741824
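As an illustration of how such a file maps onto the configuration described above (';' as the separator, mandatory TS, DURATION, and DS_SYSNM columns, and the remaining columns as metrics), here is a minimal parsing sketch. It assumes the first header field is TS, the timestamp column; the dictionary layout is an assumption for illustration, not the product's internal representation:

```python
import csv
import io
from datetime import datetime

# First two rows of the system-metrics sample above.
sample = """TS;DURATION;DS_SYSNM;CPU_UTIL;CPU_UTILMHZ;MEM_CONSUMED
2018-09-02 00:00:00;3600;Test System;0.67;2500;1073741824
2018-09-02 01:00:00;3600;Test System;0.68;2500;1073741824
"""

MANDATORY = ("TS", "DURATION", "DS_SYSNM")

rows = []
for rec in csv.DictReader(io.StringIO(sample), delimiter=";"):
    rows.append({
        "timestamp": datetime.strptime(rec["TS"], "%Y-%m-%d %H:%M:%S"),
        "duration_s": int(rec["DURATION"]),  # DURATION is expressed in seconds
        "system": rec["DS_SYSNM"],
        # Every non-mandatory column is treated as a metric value.
        "metrics": {k: float(v) for k, v in rec.items() if k not in MANDATORY},
    })

print(rows[0]["system"], rows[0]["metrics"]["CPU_UTIL"])  # Test System 0.67
```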
Format for business driver metrics
TS;DS_WKLDNM;EVENT_RESPONSE_TIME;INTVL;SUBOBJNM
18/02/2014 00:00;Myapps_PIN_Success;2094.888889;300000;GLOBAL
18/02/2014 00:00;Myapps_Homepage_Load;2897.285714;300000;GLOBAL
18/02/2014 00:00;Myapps_Start_Page_Load;13675.34783;300000;GLOBAL
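The business-driver sample uses INTVL which, per the comment thread at the end of this page, is a deprecated interval column expressed in milliseconds; DURATION, in seconds, is the preferred replacement. A minimal sketch of reading this format and converting INTVL to seconds, assuming a day-first DD/MM/YYYY timestamp as in the sample:

```python
import csv
import io
from datetime import datetime

# First two rows of the business-driver sample above.
sample = """TS;DS_WKLDNM;EVENT_RESPONSE_TIME;INTVL;SUBOBJNM
18/02/2014 00:00;Myapps_PIN_Success;2094.888889;300000;GLOBAL
18/02/2014 00:00;Myapps_Homepage_Load;2897.285714;300000;GLOBAL
"""

parsed = []
for rec in csv.DictReader(io.StringIO(sample), delimiter=";"):
    ts = datetime.strptime(rec["TS"], "%d/%m/%Y %H:%M")  # day-first, per sample
    duration_s = int(rec["INTVL"]) // 1000               # INTVL is in milliseconds
    parsed.append((rec["DS_WKLDNM"], ts, duration_s,
                   float(rec["EVENT_RESPONSE_TIME"])))

print(parsed[0][0], parsed[0][2])  # Myapps_PIN_Success 300
```

Here 300000 ms becomes a 300-second sample interval, matching what the DURATION column would hold in the preferred format.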

Related topics

Using ETL datasets

Generic - CSV file parser

Developing custom ETLs

Dataset reference for ETL tasks

Horizontal and Vertical datasets

Viewing datasets and metrics by dataset and ETL module

Understanding entity identification and lookup


Comments

  1. Jim Avery

    On this documentation page, in the CSV parser section, it says the property is "Default time interval (milliseconds)", whereas in our HCO system, when we edit the configuration we see "Default time interval (seconds)".

    It seems that although the header is generally ignored by the columnar file parser (it largely relies on the mapping to column numbers given in the config), it does seem to make a difference whether you specify a header entry "INTVL" denoting milliseconds rather than "DURATION" denoting seconds. It would help if we could, please, see an example here for each of those types along with a clear explanation of the difference.

    Sep 15, 2022 08:45
    1. Shweta Patil

      Thank you for your comment, James.

      INTVL is an old and deprecated way of representing the sample duration. INTVL is in milliseconds. We should use DURATION in seconds.

      Thanks,
      Shweta

      Jan 13, 2023 02:14
      1. Jim Avery

        Thanks Shweta. I'm just looking at another use case for this now so that's good to know.

        Jan 16, 2023 07:46
  2. Shweta Patil

    Thank you, James.

    Thanks,
    Shweta

    Jan 17, 2023 01:02