Developing a custom extractor module

Use the Integration Studio to develop and work with custom extractor modules.

An extractor is a class that extends the ETL::Extractor class. To produce data and perform extraction activities, you must implement the abstract extract method.

You must also implement the following methods:

  • connect: Connects the extractor module to external sources.
  • disconnect: Closes the open connections.
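The following Java sketch shows the general shape of such a class, assuming a Java implementation. It does not use the real ETL::Extractor API: HypotheticalExtractor, its execute() method, and the TracingExtractor subclass are hypothetical names. The assumed execute() logic calls the three methods in the connect-extract-disconnect order, and also disconnects if extract fails; that failure handling is a design choice of this sketch, not documented product behavior.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for ETL::Extractor; the real base class and its
// method signatures may differ.
abstract class HypotheticalExtractor {
    // Opens connections to external sources.
    protected abstract void connect() throws Exception;

    // Produces the extracted data.
    protected abstract void extract() throws Exception;

    // Closes the connections opened by connect().
    protected abstract void disconnect() throws Exception;

    // Assumed driver logic: connect, then extract, then disconnect.
    public final void execute() throws Exception {
        connect();
        try {
            extract();
        } finally {
            disconnect(); // sketch choice: runs even if extract() throws
        }
    }
}

// Hypothetical subclass that records the order in which the methods run.
class TracingExtractor extends HypotheticalExtractor {
    final List<String> calls = new ArrayList<>();

    @Override protected void connect() { calls.add("connect"); }
    @Override protected void extract() { calls.add("extract"); }
    @Override protected void disconnect() { calls.add("disconnect"); }
}
```

Running new TracingExtractor().execute() leaves calls containing connect, extract, and disconnect, in that order.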

For more information, see the following sections:

About the abstract methods

The methods are always called in the sequence connect, extract, disconnect. However, it is also correct to open and close the connection inside the extract method: execution is the same, because the separation into three phases exists only to make the code easier to read.
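The following Java sketch illustrates the second approach: it opens and closes its resource inside extract() by using try-with-resources, and leaves connect() and disconnect() empty. FakeConnection, InlineExtractor, and the canned rows are hypothetical; a real extractor would open a JDBC or other connection instead.

```java
import java.util.List;

// Hypothetical resource standing in for a database connection.
class FakeConnection implements AutoCloseable {
    boolean open = true;

    // Returns canned order codes instead of running a real query.
    List<String> query() {
        if (!open) throw new IllegalStateException("connection closed");
        return List.of("A12", "A13", "B12");
    }

    @Override
    public void close() { open = false; }
}

// Hypothetical extractor that connects and disconnects inside extract().
class InlineExtractor {
    FakeConnection lastConnection; // kept only so callers can inspect it

    void connect() { /* intentionally empty: nothing is opened up front */ }

    int extract() {
        // try-with-resources closes the connection when the block exits,
        // doing the work a separate disconnect() would otherwise do.
        try (FakeConnection conn = new FakeConnection()) {
            lastConnection = conn;
            return conn.query().size();
        }
    }

    void disconnect() { /* intentionally empty: extract() already closed it */ }
}
```

The caller still invokes connect, extract, and disconnect in order; only the place where the connection lives changes.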

The following example describes a scenario that calls for a custom implementation of the extract method:

Example

Consider a scenario where you need to extract the number of orders from the database of an internet banking application, where orders are recorded in the following ORDERS table:

Timestamp            OrderCode  Count
2011-01-01 10:00:00  A12        220
2011-01-01 10:00:00  A13        101
2011-01-01 10:00:00  B12        304
2011-01-01 10:00:00  B13        210
2011-01-01 10:00:00  C12        350

The ETL must apply complex decoding rules to the Timestamp and OrderCode values to obtain the required set of metrics. For example:

  • If the timestamp is between "00:00:00" and "05:00:00", the bank tellers are closed. Any operation during this period is performed by monitoring robots and is counted as "TEST".
  • If the order code is "A12", the order is a Credit Card Payment.
  • If the order code starts with the letter "B", the order is an operation on STOCKS.
  • Otherwise, the order is counted in the OTHER category.

Although the table structure is simple, the interpretation rules are complex and require specific logic to be applied to the results of the SQL query. This is a typical situation in which a custom extractor can help.
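The following Java sketch shows one way to express these rules and sum the Count column per category. It is an illustration, not the product's implementation: OrderCategory, OrderRow, and OrderRules are hypothetical names, the constant CREDIT_CARD_PAYMENT is an illustrative label for "Credit Card Payment", the rules are assumed to be checked in the listed order with the first match winning, and "between 00:00:00 and 05:00:00" is assumed to include both endpoints.

```java
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Metric categories from the rules above; constant names are illustrative.
enum OrderCategory { TEST, CREDIT_CARD_PAYMENT, STOCKS, OTHER }

// One row of the ORDERS table (hypothetical Java representation).
record OrderRow(LocalDateTime timestamp, String orderCode, long count) {}

final class OrderRules {
    // Assumption: the closed window includes both 00:00:00 and 05:00:00.
    private static final LocalTime CLOSED_END = LocalTime.of(5, 0, 0);

    private OrderRules() {}

    static OrderCategory classify(LocalDateTime ts, String orderCode) {
        // Rules are checked in the listed order; the first match wins.
        if (!ts.toLocalTime().isAfter(CLOSED_END)) {
            return OrderCategory.TEST; // tellers closed: monitoring robots
        }
        if (orderCode.equals("A12")) {
            return OrderCategory.CREDIT_CARD_PAYMENT;
        }
        if (orderCode.startsWith("B")) {
            return OrderCategory.STOCKS;
        }
        return OrderCategory.OTHER;
    }

    // Sums the Count column of the given rows per category.
    static Map<OrderCategory, Long> totals(List<OrderRow> rows) {
        Map<OrderCategory, Long> result = new EnumMap<>(OrderCategory.class);
        for (OrderRow row : rows) {
            OrderCategory category = classify(row.timestamp(), row.orderCode());
            result.merge(category, row.count(), Long::sum);
        }
        return result;
    }
}
```

Applied to the five sample rows, all of which fall at 10:00:00, this yields 220 for CREDIT_CARD_PAYMENT, 514 (304 + 210) for STOCKS, and 451 (101 + 350) for OTHER. In a real extractor, the rows would come from the SQL query result instead of an in-memory list.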

Getting information from a JDBC database

If you are connecting to a JDBC database, ensure that the JDBC driver file (.jar file) is stored in the <Installation_directory_of_Capacity_Optimization>/etl/libext directory on the ETL server that runs the ETL task.

If you are using a database other than Oracle or Microsoft SQL Server, you must explicitly specify the JDBC driver information when you are configuring the extractor. For more information, see Generic - Database extractor (Java).
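Before configuring the extractor, you might want to confirm that the driver is in place. The following Java sketch is a hypothetical helper, not part of the product: JdbcDriverCheck, libextJars(), and isDriverLoadable() are illustrative names. It lists the .jar files in the etl/libext directory under a given installation directory and checks whether a driver class can be loaded. Note that isDriverLoadable() only inspects the classpath of the JVM that runs it, so it reflects what the ETL engine sees only if it runs with the same classpath, which is an assumption.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class JdbcDriverCheck {
    private JdbcDriverCheck() {}

    // Lists the .jar file names in <installDir>/etl/libext, sorted by name.
    // Returns an empty list if the directory does not exist.
    static List<String> libextJars(Path installDir) throws IOException {
        Path libext = installDir.resolve("etl").resolve("libext");
        if (!Files.isDirectory(libext)) {
            return List.of();
        }
        try (Stream<Path> files = Files.list(libext)) {
            return files
                .map(p -> p.getFileName().toString())
                .filter(name -> name.endsWith(".jar"))
                .sorted()
                .collect(Collectors.toList());
        }
    }

    // Reports whether the named driver class can be loaded from the
    // classpath of the current JVM.
    static boolean isDriverLoadable(String driverClassName) {
        try {
            Class.forName(driverClassName);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }
}
```

Pass the installation directory of Capacity Optimization as installDir, and take the driver class name from your JDBC driver's own documentation.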

To develop a custom extractor module

  1. Creating a custom extractor module
  2. Editing the extractor code
  3. Activating a custom extractor module
  4. (Optional) To store your ETL code in a secure location and share the code, see Saving a copy of an Integration Studio project.
  5. (Optional) To debug the custom extractor module, see Debugging a custom parser or extractor module.

Where to go from here

Creating a custom extractor module
