Pre-ETL Design

The Challenge

However well planned and consistent the data integration process is, gaps remain in the handoff between data discovery, mapping, and data movement. Before ETL programmers can create data movement workflows, they must intimately understand the data they work with. Subject matter experts provide the much-needed understanding of the business vocabulary and rules that govern how ETL programmers will ultimately design and execute ETL processes.

Information is traditionally conveyed from subject matter experts to ETL developers through document-centric means such as Excel spreadsheets or Word documents. This approach has a number of shortcomings:

  • All mapping information needs to be re-keyed from a document into an ETL tool
  • There are no standard ways to represent business rules in a document
  • Documented business rules can't be leveraged directly by an ETL tool
  • ETL developers have no way of visualizing the workflow intended by a subject matter expert

Since document-centric approaches are not robust enough to fully capture the intended meanings of data mappings and business rules, ETL developers spend as much time understanding the subject material as they do developing ETL code flows.

An approach that makes better use of the knowledge subject matter experts provide can reduce the time ETL developers spend understanding the data they need to integrate.

The Solution

Sypherlink enables organizations to bridge the gap between the data discovery, mapping, and ETL processes. Sypherlink Harvester not only automates the discovery and mapping process, but also provides facilities that allow subject matter experts to more fully document intended data integration flows.

Sypherlink provides this facility so that pre-ETL work can be performed by the individuals who know the most about their data but do not necessarily have an ETL development background. Harvester's Data Flow Designer allows subject matter experts to draw the intended data flows using a platform-neutral diagramming tool, then provides facilities to export these flows into actual ETL code for a number of popular data integration packages.

Letting subject matter experts map and diagram the data flows reduces the work an ETL developer must do to understand the underlying data, so that more effort can go to core integration tasks such as data cleansing.
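
To make the contrast with a document-centric handoff concrete, the sketch below shows one way a mapping and its business rule could be captured as structured data rather than as free text in a spreadsheet. It is an illustration only and does not represent Harvester's actual format or export output: the FieldMapping class, the field names, and the rules are all hypothetical. The point is that the same record carries both the rule as a subject matter expert would state it and a form that code can apply directly, so nothing has to be re-keyed.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class FieldMapping:
    """One source-to-target mapping and the business rule that governs it.

    Hypothetical structure for illustration; not a vendor format.
    """
    source: str                       # column name in the source system
    target: str                       # column name in the target system
    rule: str                         # the rule in the expert's own words
    transform: Callable[[str], str]   # executable form of the same rule


# Hypothetical mappings for a customer record.
MAPPINGS: List[FieldMapping] = [
    FieldMapping("cust_nm", "customer_name",
                 "Trim whitespace and title-case",
                 lambda v: v.strip().title()),
    FieldMapping("st_cd", "state_code",
                 "Upper-case two-letter code",
                 lambda v: v.strip().upper()),
]


def apply_mappings(row: Dict[str, str],
                   mappings: List[FieldMapping]) -> Dict[str, str]:
    """Build a target row by applying each mapping's rule to its source field."""
    return {m.target: m.transform(row[m.source]) for m in mappings}


def describe(mappings: List[FieldMapping]) -> List[str]:
    """Render the mappings as readable lines for review by a subject matter expert."""
    return [f"{m.source} -> {m.target}: {m.rule}" for m in mappings]
```

Calling apply_mappings on a source row produces the target row, while describe produces the review text from the same definitions, so the documentation and the executable rule cannot drift apart.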