You can add intermediate steps to a transformation in a job that you create through the Integration Job Builder wizard, or to a custom job you create through Atrium Integrator Spoon.
The following table lists the steps that you can add to your transformation to manipulate the data before adding it to BMC Helix CMDB.
BMC supports only the steps listed in the table. For information about additional steps that you can add to your transformation, see the Pentaho online documentation.
Any queries around complex customization using unsupported steps will be addressed selectively and at discretion of BMC. See the topic Atrium Integrator Spoon support in Atrium Integrator Spoon.
The following video shows an overview of how to create a transformation and job in Atrium Integrator Spoon, and manage the Spoon job from the Atrium Integrator console in CMDB Portal:
Log in to the computer on which Atrium Integrator Spoon is installed.
To open Atrium Integrator Spoon, open the Spoon.bat file at the following location: <installationDirectory>\diserver\data-integration\
To add a transformation or a job
Select Job or Transformation.
Click the step that you want to add and drag it to the canvas.
The following table lists the intermediate steps for transformations and the descriptions:
Step
What it helps you to do
Input
ARInput
ARInput is a BMC provided step. It allows you to migrate AR System data into BMC Helix CMDB.
ARInput step fields and description (click to expand)
Field and description
Field
Description
General tab
Connection
Specify the AR System server name.
Form Name
Specify the form name from where you want fetch the AR System data.
Chunk Size
Specify the number of records you want to pull in one thread.
Qualification tab
Qualification String
Specify the qualification value.
Fields tab
Field list
Displays the AR System field details. Click Get Fields to populate all the available AR System fields.
CMDBInput
CMDBInput is a BMC provided step. It allows you to migrate data from the BMC Helix CMDB on your development server to the BMC Helix CMDB on your production server.
CMDBInput step fields and description (click to expand)
Field and description
Field
Description
Connection
Specify the AR System server name.
Class Name
Specify the CMDB form name from where you want to read the data.
NameSpace
Specify the BMC namespace. For example BMC.Core
Dataset Id
Specify the BMC dataset ID. For example, BMC.ASSET
Attribute List
Displays the list of attributes that belong to the selected class. Click Get Attributes to populate the list.
To Filter Criteria Select Values of Attributes
Define the filter criteria to compare CMDB fields to qualifications value.
Selected Qualifications
Displays the list of qualification strings. For example, in AND 'MarkAsDeleted' = NULL string, AND is the join condition, MarkAsDeleted is the field value and NULL is the qualification value .
ARX File Input
ARX File Input is a BMC provided step. It allows you to migrate data from ARX files into BMC Helix CMDB.
ARX File Input step fields and description (click to expand)
Field and description
Field
Description
File Name
Specify fieldname that will contain the filename(s) to read from.
Chunk Size
Specify number of records you want to pull in one thread.
Schema Names
AR form name that the arx file uses.
Connection
Specify the AR server name.
Fields table
List of arx data fields. The data is pushed in the displayed fields only.
Preview Rows button
Click to view the data in the fields.
Table Input
Table input step allows you to read information from a database, using a connection and SQL. Basic SQL statements are generated automatically.
BMC supports selective vendor databases which include – IBM DB2, MS SQL Server, Oracle, Sybase, and My SQL.
This video is recorded using the earlier version of BMC Atrium Core and is valid for BMC Helix CMDB 9.1 and above.
Table Input step fields and description (click to expand)
Field and description
Field
Description
Connection
The database connection from which to read data.
SQL
The SQL statement used to read information from the database connection. You can also click Get SQL select statement... to browse tables and automatically generate a basic select statement.
Enable lazy conversion
Select if you want to avoid unnecessary data type conversions. Enabling this option results in a significant performance improvements.
Replace variables in script?
Select if you want to replace variables in the script. This option allows you to test with or without performing variable substitutions.
Insert data from step
Input step name where Pentaho can expect information to come from. This information can then be inserted into the SQL statement. The locators where Pentaho inserts information is indicated by ? (question marks).
Execute for each row?
Select to perform the data insert for each individual row.
Limit size
Sets the number of lines that is read from the database. Zero (0) means read all lines.
CSV File Input
CSV File input step allows you to migrate data from CSV files into BMC Helix CMDB.
CSV File Input step fields and description (click to expand)
Field and description
Field
Description
Filename
Select the fieldname that will contain the filename(s) to read from.
Delimiter
Specify the file delimiter character used in the target file. Special characters (e.g. CHAR HEX01) can be set with the format $[value], e.g. $[01] or $[6F,FF,00,1F].
Enclosure
Specify the enclosure character used in the target file. Special characters (e.g. CHAR HEX01) can be set with the format $[value], e.g. $[01] or $[6F,FF,00,1F].
NIO buffer size
This is the size of the read buffer. It represents the amount of bytes that is read in one time from disk.
Lazy conversion
The lazy conversion algorithm tries to avoid unnecessary data type conversions and can result in a significant performance improvements if this is possible. For example, reading from a text file and writing back to a text file.
Header row present?
Select this option if the target file contains a header row containing column names.
Add filename to result
Adds the CSV filename(s) to the result of this transformation. A unique list is being kept in memory that can be used in the next job entry in a job, for example in another transformation.
The row number field name (optional)
The name of the Integer field that will contain the row number in the output of this step.
Running in parallel?
Select if you have multiple instances of this step running (step copies) and if you want each instance to read a separate part of the CSV file(s). When reading multiple files, the total size of all files is taken into consideration to split the workload. And make sure that ALL step copies receive all files that need to be read, otherwise, the parallel algorithm will not work correctly. WARNING:
For technical reasons, parallel reading of CSV files is only supported on files that don't have fields with line breaks or carriage returns in them.
File encoding
Specify the encoding of the file being read.
Fields Table
Displays an ordered list of fields to be read from the target file.
Preview button
Click to preview the data coming from the target file.
Get Fields button
Click to return a list of fields from the target file based on the current settings (i.e. Delimiter, Enclosure, etc.). All fields identified will be added to the Fields Table.
Text File Input
The Text File Input step allows you to migrate data from a variety of different text-file types.
The Text File Input step provides you with the ability to specify a list of files to read, or a list of directories with wild cards in the form of regular expressions. In addition, you can accept filenames from a previous step making filename handling more even more generic.
Text file input step fields and description (click to expand)
Field and description
Field
Description
File tab
File or directory
This field specifies the location and/or name of the input text file.
Click Add to add the file/directory/wildcard combination to the list of selected files (grid).
Regular Expression
Specify the regular expression you want to use to select the files in the directory specified in the previous option. For example, you want to process all files that have a .txt extension.
Exclude Regular Expression
Specify the regular expressions you want to exclude.
Selected files table
Displays a list of selected files (or wildcard selections) along with a property specifying if file is required or not. If a file is required and it isn't found, an error is generated. Otherwise, the filename is skipped.
Accept filenames from previous step
This option allows even more flexibility in combination with other steps such as "Get Filenames". You can construct your filename and pass it to this step. This way the filename can come from any source: text file, database table, etc.
Select this option to get filenames from previous steps.
Pass through fields from previous step
Select if you want to pass through fields from previous step.
Step to read filenames from
Specify the step from which to read the filenames.
Field in input to use as filename
Text File Input looks in this step to determine which filenames to use.
Show filenames(s)...
Displays a list of all files that will be loaded based on the current selected file definitions.
Show file content
Displays the content of the selected file.
Show content from first data line
Displays the content from the first data line only for the selected file.
Content tab
The content tab enables you to specify the format of the text files that are being read.
File type
Can be either CSV or Fixed length. Based on this selection, Atrium Integrator Spoon launches a different helper GUI when you click the Get fields button in the Fields tab.
Separator
Specify one or more characters that separate the fields in a single line of text. Typically this is ; or a tab. Special characters (e.g. CHAR ASCII HEX01) can be set with the format $[value], for example, $[01] or $[6F,FF,00,1F].
Enclosure
Some fields can be enclosed by a pair of strings to allow separator characters in fields. The enclosure string is optional. If you use repeat an enclosures allow text line 'Not the nine o''clock news.'. With ' the enclosure string, this gets parsed as Not the nine o'clock news. Special characters (e.g. CHAR ASCII HEX01) can be set with the format $[value], for example, $[01] or $[6F,FF,00,1F].
Escape
Specify an escape character (or characters) if you have these types of characters in your data. If you have \ as an escape character, the text 'Not the nine o\'clock news' (with ' the enclosure) gets parsed as Not the nine o'clock news. Special characters (e.g. CHAR HEX01) can be set with the format $[value], for example, $[01] or $[6F,FF,00,1F].
Header
Select if your text file has a header row (first lines in the file). You can specify the number of times the header lines appears.
Footer
Select if your text file has a footer row (last lines in the file). You can specify the number of times the footer row appears.
Wrapped lines
Use if you deal with data lines that have wrapped beyond a specific page limit.
Note:
Headers and footers are never considered wrapped.
Page layout
Use these options as a last resort when dealing with texts meant for printing on a line printer; use the number of document header lines to skip introductory texts and the number of lines per page to position the data lines.
Compression
Select if your text file is placed in a Zip or GZip archive.
Note:
Currently, only the first file in the archive is read.
No empty rows
Select if you do not send empty rows to the next steps.
Include filename in output
Select if you want the filename to be part of the output.
Rownum in output
Select if you want the row number to be part of the output
Format
Can be either DOS, UNIX or mixed. UNIX files have lines that are terminated by line feeds. DOS files have lines separated by carriage returns and line feeds. If you specify mixed, no verification is done.
Encoding
Specify the text file encoding to use; leave blank to use the default encoding on your system. To use Unicode, specify UTF-8 or UTF-16. On first use, Spoon searches your system for available encodings.
Limit
Sets the number of lines that is read from the file; zero (0) means read all lines.
Be lenient when parsing dates
Deselect, if you want strict parsing of data fields. If case-lenient parsing is selected, dates like Jan 32nd will become Feb 1st.
The date format locale
This locale is used to parse dates that have been written in full such as "February 2nd, 2006;" parsing this date on a system running in the French (fr_FR) locale would not work because February is called Février in that locale.
Add filenames to result
Select to add output file names as a field in the results.
Error Handling tab
The error handling tab allows you to specify how the step reacts when errors occur.
Ignore errors
Select if you want to ignore errors during parsing.
Skip error lines
Select if you want to skip those lines that contain errors. You can generate an extra file that contains the line numbers on which the errors occurred. Lines with errors are not skipped, the fields that have parsing errors, will be empty (null).
Error count field name
Add a field to the output stream rows. This field contains the number of errors on the line.
Error fields field name
Add a field to the output stream rows. This field contains the field names on which an error occurred.
Error text field name
Add a field to the output stream rows. T
This field contains the descriptions of the parsing errors that have occurred.
Warning files directory
When warnings are generated, they are placed in this directory. The name of that file is <warning dir>/filename.<date_time>.<warning extension>.
Error files directory
When errors occur, they are placed in this directory. The name of the file is <errorfile_dir>/filename.<date_time>.<errorfile_extension>.
Failing line numbers file directory
When a parsing error occurs on a line, the line number is placed in this directory. The name of that file is <errorline dir>/filename.<date_time>.<errorline extension>.
Filters tab
The filters tab provides you with the ability to specify the lines you want to skip in the text file.
Filter string
Specify the string for which to search.
Filter position
Specify the position where the filter string has to be at in the line. Zero (0) is the first position in the line. If you specify a value below zero (0), the filter string is searched for in the entire string.
Stop on filter
Specify Y if you want to stop processing the current text file when the filter string is encountered.
Positive match
Includes the rows where the filter condition is found (include). The alternative is that those rows are avoided (exclude).
Fields tab
The fields tab allows you to specify the information about the name and format of the fields being read from the text file.
Name
Name of the field.
Type
Type of the field can be either String, Date or Number.
Format
Length
For Number: Total number of significant figures in a number.
For String: total length of string.
For Date: length of printed output of the string (e.g. 4 only gives back the year).
Precision
For Number: Number of floating point digits.
For String, Date, Boolean: unused.
Currency
Used to interpret numbers like $10,000.00 or E5.000,00
Decimal
A decimal point can be a "." (10;000.00) or "," (5.000,00)
Group
A grouping can be a dot "," (10;000.00) or "." (5.000,00)
Null If
Treat this value as NULL.
Default
Default value in case the field in the text file was not specified (empty).
Trim type
Type trim this field (left, right, both) before processing.
Repeat
If the corresponding value in this row is empty, repeat the one from the last time it was not empty.
Get Data from XML
Get Data From XML allows you to read data from any type of XML file using XPath specifications. Get Data From XML step can read data from 3 kind of sources (files, stream and url) in 2 modes (user can define files and urls at static mode or in a dynamic way).
Get Data from XML step fields and description (click to expand)
Field and description
Field
Description
File tab
The Files tab is where you define the location of the XML files from which you want to read.
XML source is defined in a field
Select if the previous step is giving XML data in a certain field in the input stream.
XML source is file name
Select if the previous step is giving filenames in a certain field in the input stream. These are read.
Read source as Url
Select if the previous step is giving URLs in a certain field in the input stream. These are read.
Get XML source from a field
Specify the field to read XML, filename or URL from.
File or directory
Specify the location and/or name of the input text file. Note: Click Add to add the file/directory/wildcard combination to the list of selected files (grid) below.
Regular Expression
Specify the regular expression you want to use to select the files in the directory specified in the previous option.
Exclude Regular Expression
Specify the regular expressions you want to exclude.
Selected Files
Files from where the data is pulled.
Show filename(s)
Contains a list of selected files (or wildcard selections) and a property specifying if file is required or not. If a file is required and it is not found, an error is generated;otherwise, the file name is skipped.
Content tab
Loop XPath
For every "Loop XPath" location we find in the XML file(s), we will output one row of data. This is the main specification we use to flatten the XML file(s). Click Get XPath nodes to search for the possible repeating nodes in the XML document. Please note that if the XML document is large that this can take a while.
Encoding
The XML filename encoding in case none is specified in the XML documents.
Namespace aware
Select to make the XML document namespace aware.
Ignore comments
Ignore all comments in the XML document while parsing.
Validate XML
Validate the XML prior to parsing. Use a token when you want to replace dynamically in a Xpath field value. A token is between @_ and - (@_fieldname-). Please see the Example 1 to see how it works.
Use token
A token is not related tro XML parsing but to PDI.
Ignore empty file
An empty file is not a valid XML document. Select this if you want to ignore those altogether.
Do not raise an error if no files
Select if you don't want to raise an error if no files are found.
Limit
Limits the number of rows to this number (zero (0) means all rows).
Prune path to handle large files
Almost the same value as the "Loop XPath" property with some exceptions.
Include filename in output?
Select if you want to specify a field name to include the file name (String) in the output of this step.
Rownum in output?
Select if you want to specify a field name to include the row number (Integer) in the output of this step.
Add files to result filesname
Adds the XML filenames read to the result of this transformation. A unique list is being kept in memory that can be used in the next job entry in a job, for example in another transformation.
Fields tab
Name
The name of the output field.
XPath
The path to the element node or attribute to read
Element
The element type to read: Node or Attribute
Type
The data type to convert to
Format
The format or conversion mask to use in the data type conversion
Length
The length of the output data type
Precision
The precision of the output data type
Currency
The currency symbol to use during data type conversion
Decimal
The numeric decimal symbol to use during data type conversion
Group
The numeric grouping symbol to use during data type conversion
Trim type
The type of trimming to use during data type conversion
Repeat
Repeat the column value of the previous row if the column value is empty (null)
Excel Input
Excel Input step enables you to migrate data from Excel files into BMC Helix CMDB.
Note
When you provide input as Time datatype in a Microsoft Excel Input step, the time is not updated in Spoon because Pentaho does not support the Time datatype.
Workaround: Specify time inputs as strings and change the format to HH:mm:ss, and then run the transformation.
An example is shown in the following figure:
Excel Input step fields and description (click to expand)
Field and description
Field
Description
Files tab
File or directory
Specify the file name from where you want to pull the data.
Regular Expression
Specify the regular expression you want to use to select the files in the directory specified in the previous option.
Exclude Regular Expression
Specify the regular expressions you want to exclude.
Selected Files
A list of files that will be used in this step, according to the criteria specified in the previous fields.
Accept filenames from previous step
Select to retrieve a list of filenames from the previous step in this transformation. You must also specify which step you are importing from, and the input field in that step from which you will retrieve the filename data. If you choose this option, the Show filename(s) option will show a preview of the list of filenames.
Sheets tab
List of sheets to read
A list of worksheets that you want to use. If this remains empty, all worksheets in all specified files will be selected. Rows and columns are numbered, starting with 0.
Get sheetname(s)
Click to retrieve a list of worksheets from all of the specified files and give you the option to select some or all of them for this step.
Content tab
Header
Select if there is a header row to skip in the selected worksheets.
No empty row
Select if you want to remove empty rows from the output.
Stop on empty row
Select if you want to stop reading from the current worksheet when an empty row is read.
Limit
Sets a static number of rows to read. If set to 0, there is no set limit.
Encoding
Specifies the text file encoding to use. Leave blank to use the default encoding on your system. To use Unicode, specify UTF-8 or UTF-16. On first use, Spoon searches your system for available encodings.
Spread sheet type (engine)
Specifies which spreadsheet format to expect from the file, regardless of its extension.
Add filenames to result
Select to pass the input filenames to the output.
Error Handling tab
Strict types?
Select if you want PDI to report data type errors in the input.
Ignore errors?
Select if you don't want no errors to be reported during input parsing.
Skip error lines?
Select if you want PDI to skip lines that contain errors. These lines can be dumped to a separate file by specifying a path in the Failing line numbers files directory field below. If this is not selected, lines with errors appear as NULL values in the output.
Warning files directory
Directory in which to create files that contain warning messages regarding input values for each spreadsheet file read. These files will have the extension you specify here.
Error files directory
Directory in which to create files that contain error messages regarding input values for each spreadsheet file read. These files will have the extension you specify here.
Failing line numbers files directory
Directory in which to create files that contain the lines that failed error checks during input validation. These files will have the extension you specify here.
Fields tab
Name
The name of the field.
Type
The field's data type; String, Date or Number.
Length
The length option depends on the field type.
Number: total number of significant figures in a number.
String: total length of a string.
Date: determines how much of the date string is printed or recorded.
Precision
The precision option depends on the field type, but only Number is supported; it returns the number of floating point digits.
Trim type
Truncates the field (left, right, both) before processing. Useful for fields that have no static length.
Repeat
If set to Y, will repeat this value if the next field is empty.
Format
The format mask (number type).
Currency
Symbol used to represent currencies.
Decimal
A decimal point; this is either a dot or a comma.
Grouping
A method of separating units of thousands in numbers of four digits or larger. This is either a dot or a comma.
Additional output fields tab
Full filename field
The full file name plus the extension.
Sheetname field
The worksheet name you're using.
Sheet row nr field
The current sheet row number.
Row nr written field
Number of rows written.
Short filename field
Extension field
The three- or four-letter file type extension.
Path field
Output
AROutput
AROutput is a BMC provided step. It allows you to export data onto AR system.
AROutput step fields and description (click to expand)
Field and description
Field
Description
General tab
Step name
Name of the step; this name has to be unique in a single transformation.
Connection
Specify the the database connection to which data is written.
Form Name
Specify the AR System form name where you want to push the data.
Batch Commit Size
Specify the amount of data you want to commit in one batch.
Fallback to single row commit if batch commit fails
Select if you want to proceed with single row commit if batch commit fails.
Duplicates tab
Duplicate Request Action
Select the data duplication conditions.
Match By Request ID
Select if you want to match the data and update by request ID.
Data Handling tab
Require Required Fields
Is selected by default if you select Match by Request ID.
Enforce Pattern Matching
Is selected by default if you select Match by Request ID.
Field Mapping tab
Merge Fields
Use to map the form field to stream field.
CMDBOutput
CMDBOutput is a BMC provided step. It allows you to load data into BMC Helix CMDB.
CMDBOutput step fields and description (click to expand)
Field and description
Field
Description
Step name
Name of the step; this name has to be unique in a single transformation.
Connection
The database connection to which data is written.
Class Name
Specify the CMDB form name from where you want to push the data.
NameSpace
Specify the BMC namespace. For example BMC.Core
Dataset ID
Specify the BMC dataset ID. For example, BMC.ASSET
The Table output step allows you to load data into a database table.
BMC supports selective vendor databases which include – IBM DB2, MS SQL Server, Oracle, Sybase, and My SQL.
Table Output step fields and description (click to expand)
Field and description
Field
Description
Step name
Name of the step; this name has to be unique in a single transformation.
Connection
Specify the database connection to which data is written.
Target Schema
Specify the name of the Schema for the table to write data to. This is important for data sources that allow for table names with periods in them.
Target table
Specify the name of the table to which data is written.
Commit size
Use transactions to insert rows in the database table. Commit the connection every N rows if N is larger than zero (0); otherwise, don't use transactions. (Slower).
Note: Transactions are not supported on all database platforms.
Truncate table
Select if you want the table to be truncated before the first row is inserted into the table.
Note: Don't use this option when you are running the transformation clustered.
Main options tab
Partition data over tables
Use to split the data over multiple tables.
Use batch updates for inserts
Select if you want to use batch inserts. This feature groups inserts statements to limit round trips to the database. This is the fastest option and is enabled by default. This feature is not available under these conditions: If the transformation database is transactional, if you are using Greenplum or PostgreSQL with error handling turned on, if you are using auto-generated keys.
Is the name of the table defined in the field?
Use these options to split the data over one or more tables; the name of the target table is defined in the field you specify. For example if you store customer data in the field gender, the data might end up in tables M and F (Male and Female). There is an option to exclude the field containing the tablename from being inserted into the tables.
Return auto-generated key
Select if you want to get back the key that was generated by inserting a row into the table.
Name of auto-generated key field
Specifies the name of the new field in the output rows that contains the auto-generated key.
SQL
Generates the SQL to create the output table automatically.
Database fields tab
Fields to insert
Add Mapping
Text File Output
The Text file output step is used to export data to text file format. This is commonly used to generate Comma Separated Values (CSV files) that can be read by spreadsheet applications. It is also possible to generate fixed width files by setting lengths on the fields in the fields tab.
Text File Output step fields and description (click to expand)
Field and description
Field
Description
File tab
The File tab is where you define basic properties about the file being created.
Step name
Name of the step; this name has to be unique in a single transformation.
Filename
Specify the filename and location of the output text file.
Run this as a command instead?
Select if you want to "pipe" the results into the command or script you specify.
Do not create file at start?
Select if you don't want to create the file until the end of the step.
Extension
Adds a point and the extension to the end of the filename. (.txt)
Accept file name from field?
Select if you want to specify the filename(s) in a field in the input stream.
File name field
When the previous option is enabled, you can specify the field that will contain the filename(s) at runtime.
Include stepnr in filename
If you run the step in multiple copies (Launching several copies of a step), the copy number is included in the filename, before the extension.
Include partition nr in filename?
Select if you want to includes the data partition number in the filename.
Include date in filename
Select if you want to includes the system date in the filename.
Include time in filename
Select if you want to includes the system time in the filename.
Specify date format
Select if you want the filename to include a date and time stamp that follows the selection you choose from the drop-down box. Selecting this option disables the previous two options.
Show filename(s)
This option shows a list of the files that will be generated. Note: This is a simulation and among others depends on the number of rows that will go into each file.
Add filenames to result
Select if you want to use the Filename field in constructing the result filename. If deselected, the Filename field is ignored.
Content tab
Append
Select if you want to append lines to the end of the specified file.
Separator
Specify the character that separates the fields in a single line of text. Typically this is ; or a tab.
Enclosure
A pair of strings can enclose some fields. This allows separator characters in fields. The enclosure string is optional.
Force the enclosure around fields?
Select if you want to force all field names to be enclosed with the character specified in the Enclosure property above.
Disable the enclosure fix?
This is for backward compatibility reasons (since version 4.1) related to enclosures and separators. The logic since version 4.1 is: When a string field contains an enclosure it gets enclosed and the enclose itself gets escaped. When a string field contains a separator, it gets enclosed. Check this option, if this logic is not wanted. It has also an extra performance burden since the strings are scanned for enclosures and separators. So when you are sure there is no such logic needed since your strings don't have these characters in there and you want to improve performance, un-check this option.
Header
Select if you want the text file to have a header row. (First line in the file).
Footer
Select if you want the text file to have a footer row. (Last line in the file).
Format
This can be either DOS or UNIX. UNIX files have lines are separated by linefeeds. DOS files have lines separated by carriage returns and line feeds.
Encoding
Specify the text file encoding to use. Leave blank to use the default encoding on your system. To use Unicode specify UTF-8 or UTF-16. On first use, Spoon will search your system for available encodings.
Compression
Allows you to specify the type of compression, .zip or .gzip to use when compressing the output. Note: Currently, only one file is placed in a single archive.
Right pad fields
Select if you want to add spaces to the end of the fields (or remove characters at the end) until they have the specified length.
Fast data dump (no formatting
Select if you want to improve the performance when dumping large amounts of data to a text file by not including any formatting information.
Split every ... rows
If this number N is larger than zero, split the resulting text-file into multiple parts of N rows.
Add Ending line of file
Select if you want to specify an alternate ending row to the output file.
Fields tab
The fields tab is where you define properties for the fields being exported.
Name
The name of the field.
Type
Type of the field can be either String, Date or Number.
Format
The format mask to convert with. See Number Formats for a complete description of format symbols.
Length
The length option depends on the field type follows:
Number - Total number of significant figures in a number
String - total length of string
Date - length of printed output of the string (e.g. 4 only gives back year)
Precision
The precision option depends on the field type as follows:
Number - Number of floating point digits
String - unused
Date - unuse
Currency
Symbol used to represent currencies like $10,000.00 or E5.000,00
Decimal
A decimal point can be a "." (10,000.00) or "," (5.000,00)
Group
A grouping can be a "," (10,000.00) or "." (5.000,00)
Trim type
The trimming method to apply on the string. Trimming only works when there is no field length given.
Null
If the value of the field is null, insert this string into the textfile.
Get
Click to retrieve the list of fields from the input fields stream(s)
Minimal width
Alter the options in the fields tab in such a way that the resulting width of lines in the text file is minimal. So instead of save 0000001, we write 1, etc. String fields will no longer be padded to their specified length.
Insert\Update
The Insert/Update step first looks up a row in a table using one or more lookup keys. If the row can't be found, it inserts the row. If it can be found and the fields to update are the same, nothing is done. If they are not all the same, the row in the table is updated.
Note: If you have multiple rows with the same keys that match, only the first row found is compared. This may lead to different results, depending on if the found row matches with given values or not. The update scenario looks like this: If a difference is found in the case of multiple rows with the same key, an UPDATE statement is fired against the database that updates all rows with the matching keys. This note also applies to the Update step.
Insert\Update step fields and description (click to expand)
Field and description
Field
Description
Step name
Name of the step; this name has to be unique in a single transformation.
Connection
Specify the database connection to which data is written.
Target schema
Specify the name of the Schema for the table to which data is written. This is important for data sources that allow for table names with periods in them.
Target table
Name of the table in which you want to do the insert or update.
Commit size
The number of rows to change (insert / update) before running a commit.
Don't perform any updates
Select if you want the values in the database to be never updated and only inserts are performed.
Key Lookup table
Allows you to specify a list of field values and comparators. You can use the following comparators: =, <>, <, <=, >, >=, LIKE, BETWEEN, IS NULL, IS NOT NULL
Note: Click Get fields to retrieve a list of fields from the input stream(s).
Update Fields
Allows you to specify all fields in the table you want to insert/update including the keys. Avoid updates on certain fields by specifying N in the update column.
Note: Click Get Update fields to retrieve a list of update fields from the input stream(s).
SQL button
Click SQL to generate the SQL to create the table and indexes for correct operation.
XML Output
This XML Output step allows you to write rows from any source to one or more XML files.
XML Output step fields and description (click to expand)
Field and description
Field
Description
File tab
The file tab is where you set general properties for the XML output file format
Step name
Name of the step; this name has to be unique in a single transformation.
Filename
Specify the file name and location of the output text file
Do not create file at start
Select if you don't want to create the file until the end of the step.
Extension
Adds a period and the extension to the end of the file name (.xml)
Include stepnr in filename
If you run the step in multiple copies (see also Launching Several Copies of a step), the copy number is included in the file name, before the extension
Include date in filename
Select if you want to include the system date in the file name
Include time in filename
Select if you want to include the system date in the file name
Specify Date time format
Select if you want the filename to include a date and time stamp that follows the selection you choose from the drop-down box. Selecting this option disables the previous two options.
Add filenames to result
Select if you want to use the Filename field in constructing the result filename. If deselected, the Filename field is ignored.
Content tab
Zipped
Select if you want the XML file to be stored in a ZIP archive.
Encoding
The encoding to use. This encoding is specified in the header of the XML file.
Parent XML element
The name of the root element in the XML document.
Row XML element
The name of the row element to use in the XML document.
Split every ... rows.
The maximum number of rows of data to put in a single XML file before another is created.
Omit null values from XML Output
Fields tab
Fieldname
The name of the field.
Elementname
The name of the element in the XML file to use. Type: Type of the field can be either String, Date, or Number.
Type
Type of the field can be either String, Date, or Number. Format mask from which to convert.
Length
The length option depends on the field type follows:
Number - Total number of significant figures in a number
String - total length of string
Date - length of printed output of the string (e.g. 4 only gives back year)
Note: the output string is padded to this length if it is specified.
Precision
The precision option depends on the field type as follows:
Number - Number of floating point digits
String - unused
Date - unused
Currency
Symbol used to represent currencies such as $10,000.00 or E5.000,00.
Decimal
A decimal point can be a "." (10,000.00) or "," (5.000,00).
Group
A grouping can be a "," (10,000.00) or "." (5.000,00).
Null
If the value of the field is null, insert this string into the textfile.
Get fields
Click to retrieve the list of fields from the input stream(s).
Minimal width
Alter the options in the fields tab in such a way that the resulting width of lines in the text file is minimal; for example instead of save 0000001, "1" is written, and so on. String fields are not padded to their specified length.
Excel Output
The Excel Output step allows you to write data to one or more Excel files.
Excel Output step fields and description (click to expand)
Field and description
Field
Description
File tab
Filename
The name of the spreadsheet file you are reading from.
Do not create file at start
Select if you don't want to create the file until the end of the step.
Extension
The three-letter file extension to append to the file name.
Include stepnr in filename
If you run the step in multiple copies (launching several copies of a step), the copy number is included in the file name, before the extension.
Include date in filename
Select if you want to include the system date in the filename
Include time in filename
Select if you want to include the system time (24-hour format) in the filename
Specify Date time format
Select if you want the filename will to include a date and time stamp that follows the selection you choose from the drop-down box. Selecting this option disables the previous two options.
Show file name(s)
Displays a list of the files that will be generated. This is a simulation and depends on the number of rows that will go into each file.
Add filenames to result
Select if you want to use the Filename field in constructing the result filename. If deselected, the Filename field is ignored.
Content tab
Append
Select if you want to append lines to the end of the specified file. If the file does not exist, a new one will be created.
Header
Select if you want a header to appear before the spreadsheet grid data.
Footer
Select if you want a footer to appear after the spreadsheet grid data.
Encoding
Specifies the text file encoding to use. Leave blank to use the default encoding on your system. To use Unicode, specify UTF-8 or UTF-16. On first use, Spoon searches your system for available encodings and populates this list accordingly.
Split every ... rows
Specify after how many rows, start a new spreadsheet file to continue data output.
Sheet name
Specifies the name of the worksheet within the spreadsheet file.
Product sheet?
Select if you want to enable password protection on the worksheet. You must also specify a password in the Password field.
Auto size columns
Select if you want to, automatically size the worksheet columns to the largest value.
Retain NULL values
Select if you want to preserve NULL values in the output. If deselected, NULLs are replaced with empty strings.
Use Template
Select if you want PDI to use the specified Excel template to create the output file. The template must be specified in the Excel template field.
Append to Excel
Select if you want append output to the specified Excel template.
Fields tab
Name
The name of the field.
Type
The field's data type; String, Date or Number.
Format
The format mask (number type).
Update
Update is similar to the Insert/Update step except that updates only are performed. No inserts are performed.
Update step fields and description (click to expand)
Field and description
Field
Description
Step name
Name of the step; this name has to be unique in a single transformation.
Connection
Specify the database connection to which data is written
Target schema
Specify the name of the Schema for the table to which data is written. This is important for data sources that allow for table names with periods in them.
Target table
Name of the table in which you want to do the insert or update.
Commit size
Specify the number of rows to change (insert / update) before running a commit.
Skip lookup
Ignore lookup failure
Key Lookup table
Allows you to specify a list of field values and comparators. You can use the following comparators: =, <>, <, <=, >, >=, LIKE, BETWEEN, IS NULL, IS NOT NULL
Note: Click Get fields to retrieve a list of fields from the input stream(s).
Update Fields
Allows you to specify all fields in the table you want to update including the keys. Avoid updates on certain fields by specifying N in the update column.
Note: Click Get Update fields to retrieve a list of update fields from the input stream(s).
SQL button
Click SQL to generate the SQL to create the table and indexes for correct operation.
AR Upsert
AR Upsert is a BMC provided step.The Checksum mechanism in the AR Upsert step compares the Checksum value of the source record and the target record to determine whether or not the source record needs to be imported.
ARUpsert step fields and description (click here to expand)
Field and description
Field
Description
General tab
Connection
Click the New button to create a new AR System server connection. Click the Edit button to edit an existing connection. Select an existing connection from the drop down menu.Click Wizard for a step-by-step guided procedure to create a new connection.
Form Name
Name of the AR System form where you output the data. This also works as a staging form when you provide the Alternate Target Form Name.
Use Checksum Checkbox
Performs checksum comparison in update mode.
Alternate Target Form Name
Specify when the target form is different from the staging form. When you provide the Alternate Target Form Namein an AR Upsert step, the operations are performed on the staging form. When you update any record, an entry is inserted into the staging form.
Batch Commit Size
(Optional) When you specify a Batch Commit Size, the bulk API import the data. You can specify the Batch Commit Size as variable. When you specify the Batch Commit Size as a variable, the AR Upsert step uses the actual value provided.
Best Practice:
When the source data has large number of duplicate records, BMC recommends either of the following:
Use step that has the filtering capability, such as Filtering unique records, which filters the duplicate records before calling AR Upsert step or AROutput step.
Keep the batch size smaller that can be compared with the unique records.
Fallback to single row commit if batch commit fails Checkbox
Use this option to switch to a single-row commit if batch-commit fails.
Insert Only
Creates a new record if an entry does not exist in AR System form. Skips the record, if an entry already exists in an AR System form. The Insert Only mode ignores the Use Checksum flag and adds a new entry.
Update Record
Performs an update as well as insert operation, based on the Checksum conditions.
Match By Request ID
Select to match output rows by Request ID.When you select the Match by Request ID checkbox, a new entry is created when the Request ID is not provided for the input row.
Multi Match Option
Select when more than one record can matche the given qualification.
Configure Matching Qualification
Select when more than one record can matche the given qualification.
Transform
Add Constants
Adds constant values to the specified field in the destination data source. For example, Calbro Services can add Disk drive as the Short Description value when transferring disc drive information from their SQL table to the BMC_Disc Drive CI class.
Add Constants step fields and description (click to expand)
Field and description
Field
Description
Step name
Name of the step; this name has to be unique in a single transformation.
Fields
Specify the name, type, and value in the form of a string. Then, specify the formats to convert the value into the chosen data type.
Select Values
This Select values step is useful for selecting, renaming, changing data types and configuring the length and precision of the fields on the stream. These operations are organized into different categories:
Select & Alter - Specify the exact order and name in which the fields have to be placed in the output rows
Remove - Specify the fields that have to be removed from the output rows
Meta-data - Change the name, type, length and precision (the meta-data) of one or more field
Select Values step fields and description (click to expand)
Field and description
Field
Description
Select & Alter tab
Step name
Name of the step; this name has to be unique in a single transformation.
Fields
Allows you to rename a field and specify the length and precision.
Get fields to select
Click to insert fields from all input steams to the step.
Edit Mapping
Click to open a mapping dialog to easily define multiple mappings between source and target fields.
Note: Works if there is only one target output step.
Include unspecified fields, ordered by name
Select if you want to implicitly select all other fields from the input stream(s) that are not explicitly selected in the Fields section
Remove tab
This tab allows you to enter the fields that you want removed from the stream. You can also click Get fields to remove to add all fields from the input stream(s). This makes it easier if you are trying to remove several fields. After getting all fields, delete any of the fields that you do not want remove from the stream.
Field name
Get fields to remove
Click to add all fields from the input stream(s).
Meta-data tab
This tab allows you to rename, change data types, and change the length and precision of fields coming into the Select Values step. Click Get fields to import fields from previous steps.
A lot of data type conversions are also possible with this tab.
Fields to alter the meta data for
Get fields to change
Sort Rows
The Sort rows step sorts rows based on the fields you specify and on whether they should be sorted in ascending or descending order.
Notes:
Kettle has to sort rows using temporary files when the number of rows exceeds the specified sort size (default 1 million rows). When you get an out of memory exception (OOME), you need to lower this size of change your available memory.
When you use multiple copies of the step in parallel (on the local JVM with "Change number of copies to start" or in a clustered environment using Carte) each of the sorted blocks need to be merged together to ensure the proper sort sequence. This can be done, be adding the Sorted Merge step afterwards (on the local JVM without multiple copies to start or in the cluster on the master).
Sort Rows step fields and description (click to expand)
Field and description
Field
Description
Step name
Name of the step; this name has to be unique in a single transformation.
Sort directory
The directory in which the temporary files are stored in case when needed; the default is the standard temporary directory for the system
TMP-file prefix
Choose an easily recognized prefix so you can identify the files when they show up in the temp directory.
Sort size (rows in memory)
Free memory threshold (in %)
If the sort algorithm finds that it has less available free memory than the indicated number, it will start to page data to disk.
Note: This is not exact, because:
This is checked every 1000 rows. Depending on the row size and other steps within complex transformations this could still lead to an OutOfMemoryError.
In a Java Virtual Machine it's not possible to know the exact amount of free memory. As such BMC recommends you don't use this for very complex transformations with other steps and processes that use up a lot of memory.
Compress TMP Files
Select if you want to compress temporary files when they are needed to complete the sort.
Only pass unique rows?
Select if you want to pass unique rows only to the output stream(s).
Fields table
Specify the fields and direction (ascending/descending) to sort. You can specify whether to perform a case sensitive sort (optional)
Get Fields
Click to retrieve a list of all fields coming in on the stream(s).
Calculator
Provides you with predefined functions that can be executed on input field values. For example, Calbro Services can calculate the total cost of the computer systems in all their sites and add that information to BMC Helix CMDB.
For more information, see the Using the Calculator step video. This video is recorded using the earlier version of BMC Atrium Core and is valid for BMC Helix CMDB 9.1.
Scripting
Modified Java Script Value
Provides an expression-based user interface for building JavaScript expressions. Allows you to create multiple scripts for each step. For example, Calbro Services can create an if/then/else statement to identify the operating system of a computer based on its serial number. The script can also include the concatenate function, which creates a work group by appending the virtual machine type to the operating system.
Modified Java Script Value step fields and description (click to expand)
Field and description
Field
Description
Step name
Name of the step; this name has to be unique in a single transformation.
Java script functions
This section provides a tree view of your available scripts, functions, input fields and output fields.
Transformation Scripts: displays a list of scripts you have created in this step
Transformation Constants: a list of pre-defined, static constants including SKIP_TRANSFORMATION, ERROR_TRANSFORMATION, and CONTINUE_TRANSFORMATION
Transformation Functions: contains a variety of String, Numeric, Date, Logic and specialized functions you can use to create your script. To add a function to your script, simply double-click on the function or drag it to the location in your script that you wish to insert it.
Input Fields: a list of inputs coming into the step. Double-click or use drag and drop to insert the field into your script.
Output Fields: a list of outputs for the step.
Java Script
This section is where you edit the script for this step. You can insert functions, constants, input fields, etc. from the tree control on the left by double-clicking on the node you wish to insert or by dragging the object onto the Java Script panel.
Fields
The Fields table contains a list of variables from your script including the ability to add metadata like a descriptive name.
Get Variables
Click to retrieve a list of variables from your script.
Test script
Click to test the syntax of your script.
Flow
Filter rows
Allows you to filter rows based on conditions and comparisons. After this step is connected to a previous step, you can click the "<field>", "=" and "<value>" areas to construct a condition.
Filter rows step fields and description (click to expand)
Field and description
Field
Description
Step name
Name of the step; this name has to be unique in a single transformation.
Send 'true' data to step
The rows for which the condition specified is true are sent to this step.
Send 'false' data to step
The rows for which the condition specified are false are sent to this step
The Condition
Click NOT to negate the condition. Click <Field> to select from a list of fields from the input stream(s) to build your condition(s). Click <value> button to enter a specific value into your condition(s). To delete a condition, right-click and select Delete Condition.
Add Condition
Click to add a condition
Switch/Case
Implements the Switch/Case statement found in popular programming languages like Java. Pentaho routes rows of data to one or more target steps based on the value encountered in a certain field.
Switch/Case step fields and description (click to expand)
Field and description
Field
Description
Step name
Name of the step; this name has to be unique in a single transformation.
Field name to switch
Specify the field name that contains the value to use as a bases for the row routing.
Use string contains
Case value data type
Specify the data type of the values specified in this dialog
Case value conversion mask
Specify the conversion mask of the values specified in this dialog (numeric / date values)
Case value decimal symbol
Specify the decimal symbol of the values specified in this dialog (numeric values)
Case value grouping symbol
Specify the grouping symbol of the values specified in this dialog (numeric values)
Case values
Here you can specify a value-target step pair, one per row. To specify a null value, simply leave the value column blank while entering a target step. Please note that it is possible to specify the same target step more than once.
Default target step
Select all the rows that don't match any of the case values above are sent to this target step.
Identify last row in a stream
Lookup
CMDBLookUp
This is a BMC provided step. Allows you to look up values in the BMC Helix CMDB based on the primary key that you specified. Lookup values are added as new fields onto the stream.
CMDB Lookup step fields and description (click to expand)
Field and description
Field
Description
Step name
Name of the step; this name has to be unique in a single transformation.
Connection
Specify the database connection to the lookup table.
ClassName
Dataset ID
Enable cache?
Select if you want to enable caching of database lookups. This means that once a key (or group of key) has been looked up, the looked up values are stored, and returned again the next time this key (or group of key) is being looked up (without incurring the cost of a database call).
Cache size on rows
Specify the size of the cache (number of rows).
Load all data from table
Select if you want to pre-load the cache with all the data present in the lookup table. This can improve performance by avoiding database calls. However, if you have a large table, there is a risk of running out of memory.
Important
In memory look-ups can lead to different results because of the differences in the way your database compares data. For example, if your database table allows case-insensitive comparisons, you may get different results with this option. The same goes for data with trailing spaces. Those are ignored in certain databases when using certain character data types.
The key(s) to lookup the value(s)
The keys and conditions to perform the database lookup.
Values to return from the lookup table
The fields from the lookup table to add to the output stream.
New name: allows you to use a different name if the database column name is inappropriate.
Default: is the value returned (instead of null) if the lookup fails. Note that enabling error handling on the Database Lookup step will redirect all failed lookup rows to the error handling step, instead of adding them to the main output steam with null/default values.
Type: is the type of the output field.
Do not pass rows if lookup fails
Select to avoid passing a row when lookup fails.
Fail on multiple results
Select to force the step to fail if the lookup returns multiple results.
Order by
If the lookup query returns multiple results, the ORDER BY clause helps you to select the record to take. For example, ORDER BY allows you to pick the customer with the highest sales volume in a specified state.
Get fields
Get lookup fields
Database Lookup
Looks up values in a database table. Lookup values are added as new fields onto the stream.
Database Lookup step fields and description (click to expand)
Field and description
Field
Description
Step name
Name of the step; this name has to be unique in a single transformation.
Connection
Specify the database connection to the lookup table.
Lookup Schema
Specify the database schema containing the lookup table.
Lookup table
Name of the database table used for the lookup.
Enable cache?
Select if you want to enable caching of database lookups. This means that once a key (or group of key) has been looked up, the looked up values are stored, and returned again the next time this key (or group of key) is being looked up (without incurring the cost of a database call).
Cache size on rows
Specify the size of the cache (number of rows).
Load all data from table
Pre-loads the cache with all the data present in the lookup table. This can improve performance by avoiding database calls. However, if you have a large table, there is a risk of running out of memory.
Important
In memory look-ups can lead to different results because of the differences in the way your database compares data. For example, if your database table allows case-insensitive comparisons, you may get different results with this option. The same goes for data with trailing spaces. Those are ignored in certain databases when using certain character data types.
The key(s) to lookup the value(s)
The keys and conditions to perform the database lookup.
Values to return from the lookup table
The fields from the lookup table to add to the output stream.
New name: allows you to use a different name if the database column name is inappropriate.
Default: is the value returned (instead of null) if the lookup fails. Note that enabling error handling on the Database Lookup step will redirect all failed lookup rows to the error handling step, instead of adding them to the main output steam with null/default values.
Type: is the type of the output field.
Do not pass rows if lookup fails
Select to avoid passing a row when lookup fails.
Fail on multiple results
Select to force the step to fail if the lookup returns multiple results.
Order by
If the lookup query returns multiple results, the ORDER BY clause helps you to select the record to take. For example, ORDER BY allows you to pick the customer with the highest sales volume in a specified state.
Get fields
Click to return a list of available fields from the input stream(s) of the step.
Get lookup fields
Click to return a list of available fields from the lookup table that can be added to the step's output stream.
Jobs
Get variables
Allows you to get the value of a variable. This step can return rows or add values to input rows.
Set variables
Allows you to set variables in a job or in the virtual machine. It accepts one (and only one) row of data to set the value of a variable.
Was this page helpful? YesNoSubmitting...Thank you
Comments
Log in or register to comment.