Monday, 27 October 2014 10:19

Mimicking analytic functions in data flows in Talend Data Integration jobs


Analytic functions compute an aggregate value based on a group of rows. Two common examples are the LEAD and LAG functions, which give you access to the next and previous row values in a dataset (essentially, finding the value in a row a specified number of rows away from the current row).


The Oracle documentation defines LAG and LEAD as follows: "The LAG and LEAD functions are useful for comparing values when the relative positions of rows can be known reliably. They work by specifying the count of rows which separate the target row from the current row. Because the functions provide access to more than one row of a table at the same time without a self-join, they can enhance processing speed. The LAG function provides access to a row at a given offset prior to the current position, and the LEAD function provides access to a row at a given offset after the current position."
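To make those semantics concrete, here is a small, purely illustrative Java sketch of what LAG with a configurable offset does over an already-ordered list (the method name and data are hypothetical; this is not Talend code):

```java
import java.util.Arrays;
import java.util.List;

public class LagDemo {
    // Returns the value `offset` rows before position i, or a default
    // when no such row exists -- mirroring SQL's LAG(col, offset, default).
    static Integer lag(List<Integer> rows, int i, int offset, Integer dflt) {
        int j = i - offset;
        return (j >= 0) ? rows.get(j) : dflt;
    }

    public static void main(String[] args) {
        List<Integer> salaries = Arrays.asList(1000, 2000, 1500);
        for (int i = 0; i < salaries.size(); i++) {
            // print each row's value next to the previous row's value
            System.out.println(salaries.get(i) + " <- lag: " + lag(salaries, i, 1, null));
        }
    }
}
```

Note that LAG only makes sense once the rows are in a known, reliable order, which is why the solution below sorts the data first.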

Suppose you wanted to create a cumulative salary column for a number of employees, and your data looked like the following:

[Screenshot: input data listing each employee's name (Fname) and salary]

But you wanted an output like this:

[Screenshot: the same rows with an additional column holding the running (cumulative) salary total]
The solution entails accessing the previous row's value on the current row, summing the two, and repeating that all the way down. Unfortunately, Talend DI has no analytic functions, only simple aggregation functions. Because data manipulation in data integration jobs is typically done row by row, it's a bit tricky to implement a solution that gives you access to the value on the previous row, or any number of rows back as an offset (essentially the logic embedded in the LAG function).

The following is a simple solution for doing just that. It's implemented in a tJavaRow, but the same logic can be implemented in a tMap. If the offset is something other than 1 (the previous row only), we can leverage the Numeric.sequence routine to create a sequence that determines how many records to offset by. We sort the rows by Fname to make sure they're consecutive and the logic works correctly.
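Outside of Talend, the running-total logic described above can be sketched in plain Java. This is a minimal illustration, not the actual tJavaRow code: in a real tJavaRow you would hold the accumulator in a variable that survives between rows (for example, a field declared in the job's code) and assign the result to an output column, but the per-row arithmetic is the same.

```java
import java.util.ArrayList;
import java.util.List;

public class RunningTotal {
    // State carried from one row to the next, playing the role of
    // "the value on the previous row": the cumulative sum so far.
    private double cumulative = 0.0;

    // Called once per row, like the body of a tJavaRow:
    // new cumulative = previous cumulative + current salary.
    public double process(double salary) {
        cumulative += salary;
        return cumulative;
    }

    public static void main(String[] args) {
        // Hypothetical salaries, already sorted by Fname.
        double[] salaries = {1000, 2000, 1500};
        RunningTotal rt = new RunningTotal();
        List<Double> out = new ArrayList<>();
        for (double s : salaries) {
            out.add(rt.process(s));
        }
        System.out.println(out); // [1000.0, 3000.0, 4500.0]
    }
}
```

The key design point is that the accumulator lives outside the per-row code, so each row can see the result left behind by the row before it, which is exactly the LAG-style access the flow otherwise lacks.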

Last modified on Monday, 27 April 2015 20:31
Will Munji

Will Munji is a seasoned data integration, data warehousing, and business intelligence (BI) architect and developer who has worked in the DW/BI space for many years. He got his start in BI working on Brio SQR (later Hyperion SQR), the Crystal Decisions stack (Reports, Analysis, and Enterprise), and the SAP BusinessObjects and Microsoft BI stacks. He currently focuses on the Talend Data Management Suite, Hadoop, and the SAP BusinessObjects BI stack, as well as Jaspersoft and Tableau. He has consulted for many organizations across a variety of industries, including healthcare, manufacturing, retail, insurance, and banking. At Kindle Consulting, Will delivers DW/BI/data integration solutions that range from front-end BI development (dashboards, reports, cube development, T-SQL/PL-SQL) to data services (ETL/DI development), data warehouse architecture and development, and BI architecture design and deployment.

