Talend DI

Talend DI (24)

For the instances where for whatever reason, a Talend job does not always connect to a backend service layer - database, web service, ftp, salesforce, dropbox - on the first try, the job can be modified to retry the connection before failing the job. This solution shows a simple design to handle intermittent connection failures.

Every once in a while, one runs into a situation that one can not connect to a database from a tool or application. When this happens, the best way to isolate the issue is to try to connect to the same database using a quick java application using standard JDBC.

Starting with Talend 5.6.2, it is now possble to create metadata connections for NoSQL databases and Hadoop platforms using the metadata feature in the Studio. Even better, the Studio now allows automatic discovery of these properties using Hadoop properties site-*.xml files.

Out of the box, Talend uses the open source jTDS driver to connect to MS SQL Server databases. This driver however does not support connecting to an AlwaysOn enabled database. A generic jdbc driver would have to be used as a work-around.

Starting with Talend 5.6.1, a patch was released by Talend to update Hive components to be able to connect to Hadoop Clusters configured for HA - High Availability. In HA, instead of configuring the Hive host in the component and a standard port, we now instead specify the Zookeeper quorum that in turn 'discovers' the active Hive Host.

'Not implemented by the DistributedFileSystem FileSystem implementation' error occassionally rears its head when debugging Talend Big Data jobs. This is a cryptic message that actually intends to convey that your job includes JARs from different versions of Hadoop in its classpath.

Data warehousing and ETL processes usually repeat common patterns across different data domains (databases, tables, subject areas etc...). One such pattern is copying data from a transactional system to Hadoop or some other data platform (Teradata, Oracle DBs) to create 'images' of those systems for downstream processing. Because these processes are repeated many times over in the design & construction of data warehouses, it is best to create repeatable patterns that reduce future technical debt in terms of support, maintenance and updates costs. 

Wednesday, 30 September 2015 19:51

Introducing Talend 6.0

Talend 6 was released in September 2015 and with it come a number of new and important features and updates, including product name changes. 

Talend Hive components have a number of somewhat confusing options that could be tricky to understand when making connections to a Hadoop cluster. Options include selecting between HiveServer1 and HiveServer2, Embedded vs. Standalone modes, and what ports to connect to. We explore the options in this post, pulling in information from the Hive Wiki, Talend Support and other sources.

Page 1 of 2

Contact Us

Kindle Consulting, Inc

6595 Roswell Road 
Suite G2025 
Atlanta, GA 30328

Phone: 404.551.5494
Fax: 404.551.5452
Email: info@kindleconsulting.com

Talend Gold Partner