Ledge Data Ingestion Overview

Introduction

This guide provides a basic understanding of how the Ledge platform handles data ingestion, processing, and fetching from various sources. Ledge's data ingestion is built for flexibility, supporting a variety of file formats and data transfer options, as well as both stream and batch processing.

The diagram below presents an overview of the data flow within the Ledge platform: originating from various providers, data is ingested and subsequently made accessible to applications built upon it.

Data Ingestion

Ingestion from Standard Sources

Ledge's data platform can seamlessly connect to a wide variety of standard sources, such as JP Morgan, Wells Fargo, Stripe, PayPal, and NetSuite (contact your account manager for more options). Depending on the data provider, Ledge integrates directly (normally via API or SFTP), streamlining the setup process. To set up a standard source, visit the sources page.

Ingestion from Arbitrary Sources

In many reconciliation scenarios, companies may choose to reconcile a payment against an arbitrary data source. Ledge supports a variety of file formats for arbitrary data sources, including CSV, JSON, BAI2, NACHA, MT940, QBO and more (contact your account manager for more options).

Setting up data ingestion from arbitrary sources requires some additional setup.

  • Schema Definition: Customers need to define the schema of the dataset they wish to ingest, providing structure to the data. See #Parsing below for more details.

  • Uniqueness Calculation: Ledge also requires information on how to calculate uniqueness for lines of data to optimize processing.

  • Locale Information: Typically needed for datasets that deal with cross-border payments; information on currency and timezone is also required. A sketch combining all three elements follows this list.
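
As a minimal sketch of what such a source definition might look like, the Python dict below combines a schema, a uniqueness key, and locale information. The field names and structure are illustrative assumptions, not Ledge's actual configuration format:

```python
# Illustrative sketch only -- not Ledge's actual configuration format.
source_definition = {
    "name": "acme_payouts",  # hypothetical dataset name
    "format": "CSV",
    # Schema definition: the structure of each ingested line.
    "schema": [
        {"column": "transaction_id", "type": "string"},
        {"column": "posted_at", "type": "timestamp"},
        {"column": "amount", "type": "decimal"},
        {"column": "currency", "type": "string"},
    ],
    # Uniqueness calculation: which columns identify a line, so that
    # re-ingested data does not create duplicates.
    "uniqueness_key": ["transaction_id", "posted_at"],
    # Locale information for cross-border datasets.
    "locale": {"default_currency": "USD", "timezone": "UTC"},
}
```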

Data from arbitrary sources can be made available in three ways:

  1. Push, such as webhooks from an application to Ledge (a minimal push example is sketched after this list)

  2. Pull, such as a data lake (e.g. BigQuery, Snowflake), or a database (e.g. SQL)

  3. File, where files are uploaded directly to the platform. This is not recommended as a long-term solution.
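
To illustrate the push option, the snippet below posts a single transaction to a webhook-style ingestion endpoint. The URL, token, and payload fields are hypothetical placeholders rather than Ledge's actual API:

```python
import requests

# Hypothetical ingestion endpoint and token -- not Ledge's actual API.
LEDGE_WEBHOOK_URL = "https://example.com/ingest/acme_payouts"
API_TOKEN = "your-api-token"

record = {
    "transaction_id": "txn_123",
    "posted_at": "2024-01-15T09:30:00Z",
    "amount": "125.00",
    "currency": "USD",
}

# Push one record as it is produced by the upstream application.
response = requests.post(
    LEDGE_WEBHOOK_URL,
    json=record,
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    timeout=10,
)
response.raise_for_status()
```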

Periodic Data Transfer

For data sources set up as pull sources, fetching data often involves setting up periodic data transfers. Ledge offers flexibility in connecting to various file sharing options.

Connection Options: Ledge provides multiple options for data transfer, with SFTP (SSH File Transfer Protocol) being one of the most straightforward choices. Customers can use Ledge's SFTP server or connect Ledge to their own SFTP server for data transfer.
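
As a rough sketch of the customer side of such a transfer, the snippet below downloads files from an SFTP drop directory using the paramiko library. The host, credentials, and paths are placeholders:

```python
import paramiko

# Placeholder connection details -- substitute your own server or
# the Ledge-provided SFTP credentials.
HOST, PORT = "sftp.example.com", 22
USERNAME, PASSWORD = "ledge-user", "secret"
REMOTE_DIR, LOCAL_DIR = "/outgoing", "./downloads"

transport = paramiko.Transport((HOST, PORT))
transport.connect(username=USERNAME, password=PASSWORD)
sftp = paramiko.SFTPClient.from_transport(transport)
try:
    # Fetch every file in the remote drop directory.
    for name in sftp.listdir(REMOTE_DIR):
        sftp.get(f"{REMOTE_DIR}/{name}", f"{LOCAL_DIR}/{name}")
finally:
    sftp.close()
    transport.close()
```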

Data Transformation

The Ledge Data Ingestion module is a multi-stage ETL engine that streamlines data ingestion and transformation. It is a fundamental component of the data processing pipeline, enabling users to handle diverse data sources efficiently.

Raw data from all sources is processed to remove or flag errors, inconsistencies, and missing values. Data cleaning involves processes like data validation, data type conversion, and handling of null values to ensure data accuracy and completeness. The following operations take place as part of the transformation process (a toy sketch of several of them follows this list):

  • Data Enrichment: Data may be enriched by merging it with data from other sources or by adding calculated fields to enhance its quality and usefulness.

  • Data Aggregation: Data can be aggregated to various levels (e.g., daily, monthly) to facilitate reporting and analysis.

  • Data Validation: Ensuring data integrity and adherence to business rules. Validation checks are applied to identify, flag, and handle data quality issues.

  • Data Transformation: Data is transformed to fit the schema defined for the source, across various data types including timestamps and timezones, amounts, currencies, and substrings. This may involve restructuring, joining, or pivoting data as needed.

  • Data Deduplication: Duplicate records are often removed or merged during the transformation process to maintain data quality.

  • Categorization & Labeling: Data runs through a rules engine that categorizes and labels it.
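
A toy sketch of a few of these operations (type conversion, null handling, deduplication, and daily aggregation) using pandas; the column names mirror the illustrative schema above and are not prescribed by Ledge:

```python
import pandas as pd

# Toy input mirroring the illustrative schema above.
df = pd.DataFrame({
    "transaction_id": ["txn_1", "txn_2", "txn_2", "txn_3"],
    "posted_at": ["2024-01-15T09:30:00Z", "2024-01-15T11:00:00Z",
                  "2024-01-15T11:00:00Z", None],
    "amount": ["125.00", "80.50", "80.50", "19.99"],
})

# Data type conversion and null handling (validation step).
df["posted_at"] = pd.to_datetime(df["posted_at"], utc=True)
df["amount"] = pd.to_numeric(df["amount"])
df = df.dropna(subset=["posted_at"])  # drop or flag incomplete lines

# Deduplication on the uniqueness key.
df = df.drop_duplicates(subset=["transaction_id", "posted_at"])

# Daily aggregation for reporting.
daily = df.groupby(df["posted_at"].dt.date)["amount"].sum()
print(daily)
```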

Additionally, the Ledge Data Ingestion module offers advanced features such as data flattening (i.e. generating multiple lines from a single record) and version control for records, enabling users to maintain historical records of data changes for audit trails and compliance tracking.
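
Data flattening can be pictured as in the sketch below, where a hypothetical payout record with nested line items is expanded into one output line per item:

```python
# Illustrative nested record: one payout that settles three invoices.
record = {
    "payout_id": "po_42",
    "posted_at": "2024-01-15",
    "line_items": [
        {"invoice_id": "inv_1", "amount": "40.00"},
        {"invoice_id": "inv_2", "amount": "35.00"},
        {"invoice_id": "inv_3", "amount": "50.00"},
    ],
}

# Flatten: one output line per nested item, carrying the parent fields.
flattened = [
    {"payout_id": record["payout_id"],
     "posted_at": record["posted_at"],
     **item}
    for item in record["line_items"]
]
for line in flattened:
    print(line)
```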

Batch vs. Stream Processing

Stream Processing

With data sources set up as streams, Ledge consumes data as it becomes available and processes it immediately, making it available to the customer in real time. Typically, data sources set up as streams are provided on a per-transaction basis, through webhooks into Ledge.
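
To make the per-transaction flow concrete, here is a minimal, purely illustrative webhook receiver that processes each event as it arrives; the route, handler, and payload shape are assumptions, not Ledge internals:

```python
from flask import Flask, request

app = Flask(__name__)

@app.route("/webhooks/transactions", methods=["POST"])
def on_transaction():
    event = request.get_json()
    # Process immediately so downstream applications see it in near real time.
    process(event)  # hypothetical handler
    return "", 204

def process(event):
    print("ingested", event.get("transaction_id"))

if __name__ == "__main__":
    app.run(port=8080)
```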

Batch Processing

Ledge can also receive data in the form of batches, such as daily bank statements or reports. While this option is very prevalent in the financial industry, it makes intraday processing less feasible. To enable near-real-time processing, Ledge recommends setting up batches as micro-batches, and supports fetching batched data as often as every minute.
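
One way to picture a micro-batch setup is a polling loop that fetches on a fixed interval. The fetch_batch function below is a hypothetical placeholder for whichever pull mechanism (SFTP drop, data lake query, etc.) is configured:

```python
import time

def fetch_batch():
    """Hypothetical placeholder: pull whatever new records the
    configured source (SFTP drop, data lake query, ...) has produced."""
    return []

# Poll once per minute -- the smallest interval mentioned above.
INTERVAL_SECONDS = 60

while True:
    batch = fetch_batch()
    if batch:
        print(f"processing {len(batch)} records")
    time.sleep(INTERVAL_SECONDS)
```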
