Editing parser code



Every new custom parser module that you create in Integration Studio uses a code template that contains auto-generated code for your convenience.

To understand how a parser works, review the example in this topic. The example starts with an empty class, without using the code template.

Example of parser code

The following examples illustrate, in sequence, the initial code that you need to write and the other operations that the parser performs.

  1. Writing initial parser code:

    Example 1: Initial Code
    package ETL.parser;

    import com.neptuny.cpit.etl.DataSetList;
    import com.neptuny.cpit.etl.parser.Parser;

    /**
     * Parser template
     */
    public class MyParserP extends Parser {

        @Override
        public DataSetList parse(String filename) throws Exception {
            // ...parsing code goes here (see the examples that follow)...
            return null; // placeholder until the parsing code is added
        }

        @Override
        public DataSetList adjustParseResult(DataSetList in) throws Exception {
            // Add post-parse adjustments here, if needed
            return in;
        }
    }

    This class contains only the parse and adjustParseResult methods.

  2. Next, you instruct the parser to prepare the output datasets.
    The following example illustrates how:

    Example 2: Output datasets

    For example, to extract the CPU Utilization metric, check the ETL Datasets view: this metric belongs to the SYSGLB (Global System Data) dataset. Hence, the output dataset can be built using the following code:

    DataSetList dsList = new DataSetList();
    DataSet res = new DataSet("SYSGLB");
    this.getConf().getDefChecker().initializeColumns(res);
    dsList.add(res);
  3. After you prepare the output datasets, you need to open the file and read lines of text. The following example illustrates how:

    Example 3: Open and Read operations
    long totallines = 0;
    long goodlines = 0;
    long convertedlines = 0;

    BufferedReader filereader = new BufferedReader(new FileReader(new File(filename)));
    try {
        String line;
        while ((line = filereader.readLine()) != null) {
            totallines++;

            line = line.trim();

            if (line.length() == 0) {
                continue; // Skip empty lines
            }

            goodlines++;

            // TODO: implement parse method here

            String[] row = res.newRow();
            res.fillRow("TS", "2007-07-01 10:00:00", row);
            res.fillRow("DURATION", "300", row);
            res.fillRow("DS_SYSNM", "server1", row);
            res.fillRow("CPU_UTIL", "0.5", row);
            res.addRow(row);

            convertedlines++; // If the line has been correctly parsed and imported, increment the converted-lines counter
        }
    } finally {
        filereader.close();
    }
  4. Last, you need to parse a line of text, extract the CPU Utilization samples, and put the data in the dataset.

    Note

    The SYSGLB dataset has three mandatory columns: the timestamp (TS), the duration of the sample (DURATION), and the system name (DS_SYSNM). In this example, the metric name for CPU Utilization is CPU_UTIL.

    The code for parsing lines and filling the dataset looks similar to the following example:

    Example 4: Parsing lines and filling the dataset
    Pattern linePattern = Pattern.compile("(\\d{4}-\\d{2}-\\d{2})\\/CPU:(.*)");
    Matcher lineMatcher = linePattern.matcher(line);
    if (lineMatcher.matches()) {
        String day = lineMatcher.group(1);
        String[] samples = lineMatcher.group(2).split(",");
        for (int h = 0; h < samples.length; h++) {
            String dayhour = String.format("%s %02d:00:00", day, h);
            Double val = Double.parseDouble(samples[h]) / 100;
            String[] row = res.newRow();
            res.fillRow("TS", dayhour, row);
            res.fillRow("DURATION", "3600", row);
            res.fillRow("DS_SYSNM", "server1", row);
            res.fillRow("CPU_UTIL", Double.toString(val), row);
            res.addRow(row);
        }
    }
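
The fragments in Examples 2 through 4 all live inside the parse method of Example 1, which returns the populated DataSetList. The following is a minimal sketch of how they might be assembled; parseLine is a hypothetical helper that wraps the pattern-matching code of Example 4:

    @Override
    public DataSetList parse(String filename) throws Exception {
        // Prepare the output dataset (Example 2)
        DataSetList dsList = new DataSetList();
        DataSet res = new DataSet("SYSGLB");
        this.getConf().getDefChecker().initializeColumns(res);
        dsList.add(res);

        // Open the file and read it line by line (Example 3)
        BufferedReader filereader = new BufferedReader(new FileReader(new File(filename)));
        try {
            String line;
            while ((line = filereader.readLine()) != null) {
                line = line.trim();
                if (line.length() == 0) {
                    continue; // skip empty lines
                }
                parseLine(line, res); // hypothetical helper containing the code of Example 4
            }
        } finally {
            filereader.close();
        }
        return dsList; // hand the populated datasets back to the ETL engine
    }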

Badly formed file

BMC recommends that you add pieces of control code to the main parser code to ensure that the parser executes successfully even when it encounters a badly formed file, that is, a file that contains incorrectly formed lines, which the parser rejects. Such files are very common, so the parser should handle them gracefully, as in the following sketch.
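
A minimal sketch of such control code, combining the read loop of Example 3 with the pattern matching of Example 4, might look like the following. The rejectedlines counter and the log message are illustrative additions, not part of the generated template:

    long rejectedlines = 0; // declared next to totallines and goodlines

    // Inside the while loop: count lines that do not match the expected format
    // and skip them instead of aborting the whole parse
    if (!lineMatcher.matches()) {
        rejectedlines++;
        System.err.println("Rejected badly formed line " + totallines + ": " + line);
        continue;
    }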

Rejection Percentage

As a best practice, calculate a rejection percentage (denoted by rp) on the parsed content. rp is the percentage of rejected lines over the total number of lines that are expected to be good and well formed.

To effectively derive a rejection percentage value, do the following:

  • Count the total number of lines in the file (tot).
  • Count the number of lines that match the regular expression used to select good lines (match).
  • In most cases, the number of lines that are expected to be good and well formed is tot/2.
  • Therefore:
    rp = ((tot/2) - match) / (tot/2) * 100

After the rejection percentage rp is calculated, you can log it to help the administrator detect bad files, or generate an error if rp is too high. Using rp is only a recommendation.
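
A minimal sketch of the rp calculation follows. It assumes, as described above, that tot/2 lines are expected to be good; the method name, the logging call, and the 20% threshold are illustrative, not part of the product API:

    /**
     * Sketch: compute the rejection percentage and react to it.
     * tot   = total number of lines in the file
     * match = number of lines that matched the "good line" pattern
     */
    private void checkRejectionPercentage(long tot, long match) throws Exception {
        double expectedGood = tot / 2.0; // assumption: half of the lines are expected to be good
        if (expectedGood == 0) {
            return; // nothing to check for an empty file
        }
        double rp = (expectedGood - match) / expectedGood * 100.0;
        // Log the value so that the administrator can detect bad files
        System.out.println(String.format("Rejection percentage: %.2f%%", rp));
        // Optionally fail the parse if too many lines were rejected
        if (rp > 20.0) { // example threshold
            throw new Exception("Badly formed file: rejection percentage is " + rp + "%");
        }
    }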

After you add the rp calculation code and logging functionality, the parser is complete (see the code example in Developing-a-custom-parser-module).

Where to go from here

Activating-a-custom-parser-module

 
