This documentation supports releases of BMC Helix Continuous Optimization up to December 31, 2021. To view the latest version, select the version from the Product version menu.

Utilities to develop ETLs


The com.neptuny.cpit.etl.util packages included in the ETL Development API (EDA) provide a rich set of utility objects to develop ETLs.

Date and time utilities

This section elaborates common methods present in the com.neptuny.cpit.etl.util.DateUtil Java class that the ETL Development API offers.

The ETL timestamp format

The standard ETL timestamp format is YYYY-MM-DD hh:mm:ss, for example "2011-07-16 19:20:30". It is identical to the ISO-8601 format, barring the "T" character between date and time, for example "2011-07-16T19:20:30+01:00"; the meaning of symbols in both formats is the same. The following table lists these symbols along with their meaning:

ETL Timestamp symbols

As in ISO-8601, the absence of the time zone designator indicates that the timestamp is in local time.

The writeTimestamp method

A useful method provided by the DateUtil class is writeTimestamp. It receives a timestamp in standard ETL format and prints it using a specified date and time format. Consider the following example in Java:

DateUtil du = new DateUtil(DateUtil.ENGLISH_LOCALE);
String pts = du.writeTimestamp("2011-01-01 10:00:00","%Y-%m-%d");
# pts now contains "2011-01-01"

The date and time strings can be formatted in various ways using the rich character set provided by the module. The following is a list of all characters allowed in the format string:

%%      PERCENT
%a      Day of the week abbreviation
%A      Day of the week
%b      Month abbreviation
%B      Month
%c      MM/DD/YY HH:MM:SS
%C      Time format (for example, "Sat Nov 19 21:05:57 2010")
%d      Numeric day of the month, with leading zeros (for example, 01, 02, ..., 31)
%e      Numeric day of the month, without leading zeros (for example, 1, 2, ..., 31)
%D      MM/DD/YY
%G      GPS week number (for example, weeks since January 6, 2011)
%h      Hour abbreviation
%H      Hour, 24 hour clock, with leading zeros
%I      Hour, 12 hour clock, with leading zeros
%j      Day of the year
%k      Hour
%l      Hour, 12 hour clock
%L      Month number, without leading zeros (for example, 1, 2, ..., 12)
%m      Month number, with leading zeros (for example, 01, 02, ..., 12)
%M      Minute, with leading zeros
%n      NEWLINE
%o      Ornate day of month (for example, 1st, 2nd, 25th)
%p      AM or PM
%P      am or pm
%q      Quarter number, starting with 1
%r      Time format as in "09:05:57 PM"
%R      Time format as in "21:05"
%s      Seconds since the Epoch, UCT
%S      Seconds, with leading zeros
%t      TAB
%T      Time format as in "21:05:57"
%U      Week number, with Sunday as first day of the week
%w      Day of the week, numerically (for example, Sunday is day 0, Monday is day 1)
%W      Week number, with Monday as first day of the week
%x      Date format as in "11/19/99"
%X      Time format as in "21:05:57"
%y      Year, 2 digits
%Y      Year, 4 digits
%Z      Timezone in ascii (for example, "PST")
%z      Timezone in numerical format (for example, "-/+0000")

Some useful format strings are already provided by the module, and are listed in the following table:

Format strings

Based on the formats mentioned in the table, the preceding Java code example can be rewritten as:

DateUtil du = new DateUtil(DateUtil.ENGLISH_LOCALE);
String pts = du.writeTimestamp("2011-01-01 10:00:00",
DateUtil.YYYY_MM_DD_FORMAT);
# pts now contains "2011-01-01"

The readTimestamp method

Another useful method provided by the DateUtil class is readTimestamp. It recognizes most of the common timestamp formats and converts them into the standard ETL timestamp format. The following code examples in Java script illustrate the use of this method:

DateUtil du = new DateUtil(DateUtil.ENGLISH_LOCALE);
String pts = du.readTimestamp("2011-01-01 10:00:00");
# ts now contains "2011-01-01 00:00:00"

The setDefaultLocale method

For some formats, the interpretation is locale-dependent, for instance, the U.S. notation for dates is MM/DD/YYYY while the European one is DD/MM/YYYY. To resolve ambiguities when the date is formatted as XX/XX/YYYY, the DateUtil class provides the setDefaultLocale method, which is called before any date format is interpreted.

The following table presents a list of all supported locales:

Supported locales

Another effect obtained with the setDefaultLocale method is the ability to understand day and month names, and to format dates according to local settings.

Consider the following code example that illustrates how you can set the default locale for all new instances of DateUtil:

# sets the default locale for all new instances of DateUtil
ETL::util::DateUtil::setDefaultLocale("IT");
my $du = new ETL::util::DateUtil();
my $ts = $du->readTimestamp("01Gen2011");
# now ts contains "2011-01-01 00:00:00.000"
my $tsp = $du->writeTimestamp($ts,"%Y-%m-%d");
# now tsp contains "2011-01-01"

As an alternative, you can specify the locale for a single instance of DateUtil in the constructor. The following code examples explain how:

DateUtil du = new DateUtil(DateUtil.ITALIAN_LOCALE);

The following table lists all formats recognized by readTimestamp. The order of rows in the table is important, because the readTimestamp method parses the list from top to bottom and applies the first matching format.

Formats recognized by the readTimestamp method

The diffTs method

This method provides the difference in milliseconds between two timestamps, passed as parameters. The following code example illustrates the use of this method:

DateUtil du = new DateUtil(DateUtil.ENGLISH_LOCALE);

String pts = du.diffTs("2011-01-01 15:01:00","2011-01-01 15:00:00");
# now pts contains "60000", that is, one minute in milliseconds

Other methods

DateUtil provides some other useful methods that modify seconds, minutes, hours, and days of the given timestamp. These are:

  • addSeconds
  • addMinutes
  • addHours
  • addDays

The following code example illustrates the use of these methods:

DateUtil du = new DateUtil(DateUtil.ENGLISH_LOCALE);

String pts = du.addSeconds("2011-01-01 15:00:00",22);
# pts now contains "2011-01-01 15:00:22"

String pts = du.addMinutes("2011-01-01 15:00:00",22);
# pts now contains "2011-01-01 15:22:00"

String pts = du.addHours("2011-01-01 15:00:00",2);
# pts now contains "2011-01-01 17:00:00"

String pts = du.addDays("2011-01-01 15:00:00",2);
# pts now contains "2011-01-03 15:00:00"

CSV file utilities

Comma Separated Value (CSV) files are a very common data format. EDA provides the com.neptuny.cpit.etl.util class that contains useful methods for processing such files.

The safeSplit method

This method splits fields in different lines of a CSV file based on the double quote (") string delimiter.

To better understand the utility of this method, consider the following example where a particular line requires to be split based on the double quote delimiter:

"first","12,123,23","a"

If you use the safeSplit method, you can split the line and obtain 3 fields as expected. The following code example explains how this is done:

ArrayList<String> safeSplit(",", String line)

The detectSeparator method

This method automatically detects field separators present in different lines of a CSV file.

The following code example illustrates the usage of this method:

String sep = CSVUtil.detectSeparator(String line){

The technique that this method uses to find the separator is based on heuristics: it assumes that if a line in the CSV file contains a character that is not a word character, number, parenthesis, or a quoting delimiter, then it probably is a field separator.

Socket utilities to handle TCP and HTTP socket connections

The ETL Development API provides various methods to handle socket connections that are present in the following class:

  • (com.neptuny.cpit.etl.util.SocketUtil)

You can connect to a remote host that supports TCP and UDP connections using the following method:

  • connect(protocol, host, port)

Once the connection is established, the following methods can be used to send and receive data through the socket:

  • send(message)
  • receive(max_row) (the max_row parameter is optional)

The following code examples illustrate the usage of methods discussed so far in this section:

SocketUtil s = new SocketUtil();
# set the connection timeout
s.setTimeout(200);
# connect to server
s.connect("tcp","HOST","PORT");
# receive 4 lines from socket
ArrayList<String> retline = s.receive(4);
s.send("a message");
s.close();


 

Tip: For faster searching, add an asterisk to the end of your partial query. Example: cert*