Data Ingestion
Ledge’s Data Ingestion process enables users to seamlessly pull raw data from multiple sources, parse it, and make it available for downstream processes. The ingestion process is designed to be highly flexible, supporting a wide range of data sources, schemas, and transformation options, all while ensuring data accuracy and security.
Setting Up Sources
Data ingestion begins by configuring sources on the Sources Page. Here, users can add, configure, and manage new data sources.
Adding a New Source
Navigate to the Sources Page.
Click on Add New Source.
Select the source type (e.g., file, API, database).
Configure connection details and provide the necessary credentials.
Define ingestion schedules, schema, and any necessary transformations.
Once the source is added, Ledge will automatically fetch and ingest data on a scheduled cadence.
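For orientation, the configuration collected in this flow might look roughly like the sketch below. All field names and values here are illustrative assumptions, not Ledge's actual configuration format:

```python
# Hypothetical source configuration assembled by the "Add New Source" flow.
# Every key below is illustrative; Ledge's real format may differ.
source_config = {
    "name": "acme-bank-sftp",
    "type": "file",  # e.g., file, API, database
    "connection": {
        "protocol": "sftp",
        "host": "sftp.example.com",
        "path": "/exports/transactions/",
        # Credentials are stored securely and referenced, never embedded inline.
        "credentials_ref": "vault://acme-bank-sftp-key",
    },
    "schedule": {"every_minutes": 60},  # cadence can be as short as 10 minutes
    "schema": "custom-csv-transactions",  # standard or user-defined schema
    "transformations": ["monetary", "date_handling"],
}
```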
How Data Is Fetched
Data fetching is driven by the Fetching Details of each source. This configuration defines how Ledge connects to the source, how often it pulls data, and how the system authenticates access.
Key Components of Fetching Details
Connection Types
Ledge supports a variety of connection methods, including:
File-based: SFTP, Google Drive, OneDrive, etc.
API-based: REST APIs, JDBC, and other programmatic connections.
Data Warehouses: Snowflake, Redshift, BigQuery, and other warehouse platforms.
Manual Uploads: Direct file uploads via the Ledge interface.
Cadence
Ingestion can be scheduled at intervals as short as 10 minutes.
Customizable schedules allow users to control how frequently Ledge fetches updates.
Credentials
Different connection types require specific credentials (e.g., API keys, SSH keys, OAuth tokens).
Ledge securely stores all credentials using industry-standard security protocols.
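As a rough mental model of how these pieces fit together, the sketch below clamps a configured cadence to the 10-minute floor and repeatedly invokes a fetch. The fetch_source callable and the loop itself are assumptions for illustration, not Ledge internals:

```python
import time
from datetime import datetime, timedelta, timezone

MIN_CADENCE = timedelta(minutes=10)  # shortest supported interval

def run_schedule(fetch_source, cadence: timedelta) -> None:
    """Repeatedly invoke fetch_source, honoring the configured cadence."""
    cadence = max(cadence, MIN_CADENCE)  # clamp to the 10-minute floor
    while True:
        started = datetime.now(timezone.utc)
        fetch_source()  # connect, authenticate, and pull new data
        # Sleep for whatever remains of the interval, if anything.
        elapsed = datetime.now(timezone.utc) - started
        time.sleep(max(0.0, (cadence - elapsed).total_seconds()))
```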
How Data Is Parsed
Once the data is fetched, it is parsed into a standardized format using the Source Schema. The schema defines how raw data is transformed into usable, structured information.
Schema Details
Supported Schemas - Ledge supports industry-standard schemas such as:
NACHA, MT940/942, BAI2 for banking and financial services.
Schemas for data from all major payment processors, banks, and ERPs.
Arbitrary Schemas
For non-standard data formats, Ledge allows users to upload files (CSV, Excel, JSON) and define a custom schema.
Users can define custom field mappings, transformation logic, and data-type specifications.
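To make custom field mappings concrete, here is a minimal sketch that parses a CSV row using a user-defined mapping from source columns to target fields and data types. The mapping format is an illustrative assumption, not Ledge's schema syntax:

```python
import csv
import io
from datetime import datetime
from decimal import Decimal

# Illustrative custom schema: source column -> (target field, parser).
FIELD_MAPPINGS = {
    "txn_id": ("transaction_id", str),
    "posted": ("posted_at", lambda s: datetime.strptime(s, "%Y-%m-%d")),
    "amt": ("amount", Decimal),
}

def parse_row(row: dict) -> dict:
    """Apply the custom field mappings and type parsers to one raw row."""
    return {
        target: parse(row[source])
        for source, (target, parse) in FIELD_MAPPINGS.items()
    }

raw = io.StringIO("txn_id,posted,amt\nT-1001,2024-05-01,19.99\n")
for row in csv.DictReader(raw):
    print(parse_row(row))
    # {'transaction_id': 'T-1001', 'posted_at': datetime(...), 'amount': Decimal('19.99')}
```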
Uniqueness and Deduplication - To ensure data integrity, Ledge automatically deduplicates incoming data. This process has two components:
Unique Identifier: Ledge generates a global unique identifier for each record. If two records have the same identifier, Ledge overwrites the original record instead of duplicating it.
Line-Revisions: Optional. For certain use cases, Ledge can track multiple versions (revisions) of a record using an “updated-timestamp” field. Best practice: pick a subset of key columns that accurately determines uniqueness and revisioning, and test the configuration across different time periods to ensure proper deduplication.
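One common way to implement the identifier-plus-revision scheme described above is to hash the chosen key columns and keep the newest revision per identifier. The sketch below assumes that approach; Ledge's actual identifier generation may differ:

```python
import hashlib

# Assumed subset of key columns that defines uniqueness for this source.
KEY_COLUMNS = ["account_id", "transaction_id"]

def record_uid(record: dict) -> str:
    """Derive a stable identifier from the key columns."""
    raw = "|".join(str(record[c]) for c in KEY_COLUMNS)
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()

store: dict[str, dict] = {}  # uid -> latest revision of the record

def upsert(record: dict) -> None:
    """Overwrite on identifier collision, keeping the newest revision
    according to the assumed "updated_at" (updated-timestamp) field."""
    uid = record_uid(record)
    existing = store.get(uid)
    if existing is None or record["updated_at"] >= existing["updated_at"]:
        store[uid] = record

upsert({"account_id": "A1", "transaction_id": "T-1", "amount": 1234, "updated_at": "2024-05-01"})
upsert({"account_id": "A1", "transaction_id": "T-1", "amount": 1250, "updated_at": "2024-05-02"})
print(len(store))  # 1 -- the second record replaced the first
```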
Data Transformation
Data transformation enables users to clean, standardize, and enrich incoming data. Ledge processes all ingested data through a multi-phase transformation pipeline.
Available Transformations
Monetary Transformations
Parses and standardizes currency data.
Stores monetary values in the smallest currency unit and supports fractional values (see the sketch after this list).
Date Handling
Supports ISO-8601 or custom date formats.
Converts time zones, truncates date-time values, and identifies holidays.
Stores all date-time values in UTC.
Basic Data Type Parsing - parses common data types, such as dates, numbers, arrays, and booleans (true/false values).
Closed-Set Identification - detects fields with a limited set of possible values, such as drop-down selections or status fields.
Math Operations - supports operations like add, subtract, divide, multiply, and round.
String Operations - supports string transformations, such as case changes, find/replace, split/join, and substring extraction.
Handling Null Values - offers options to skip, replace, or drop null values.
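To illustrate a few of these transformations together, here is a minimal sketch covering monetary parsing, UTC date handling, and null replacement. The helper names and the minor-unit table are assumptions for the example:

```python
from datetime import datetime, timezone
from decimal import Decimal

MINOR_UNITS = {"USD": 2, "JPY": 0, "BHD": 3}  # decimal places per currency

def to_minor_units(amount: str, currency: str) -> int:
    """Monetary transformation: store the value in the smallest currency unit."""
    return int(Decimal(amount) * (10 ** MINOR_UNITS[currency]))

def to_utc(timestamp: str) -> datetime:
    """Date handling: parse an ISO-8601 timestamp and store it in UTC."""
    return datetime.fromisoformat(timestamp).astimezone(timezone.utc)

def replace_null(value, default):
    """Null handling: replace a missing value instead of skipping or dropping it."""
    return default if value is None else value

print(to_minor_units("12.34", "USD"))       # 1234
print(to_utc("2024-05-01T09:30:00+02:00"))  # 2024-05-01 07:30:00+00:00
print(replace_null(None, 0))                # 0
```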
Lookup and Data Enrichment
Ledge enables users to enrich incoming data using external datasets, whether proprietary or third-party data sources. This process enhances the quality and completeness of the ingested data.
How Lookup and Enrichment Work
Enrichment typically uses a unique identifier (e.g., Transaction ID, Account ID, or User ID) to match incoming records to external data; a short sketch follows the examples below. Examples include:
Currency Enrichment: Convert financial data into the customer’s preferred currency.
Geographic Enrichment: Add country, region, or location data.
Metadata Enrichment: Append custom fields, such as user attributes, categories, or product details.
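The sketch below shows identifier-based enrichment in miniature: incoming records are joined to a lookup table on Transaction ID, and any matching fields are appended. The lookup data and field names are hypothetical:

```python
# Hypothetical external dataset keyed by transaction ID.
GEO_LOOKUP = {
    "T-1001": {"country": "US", "region": "CA"},
    "T-1002": {"country": "DE", "region": "BE"},
}

def enrich(record: dict) -> dict:
    """Append geographic fields when the record's transaction ID matches."""
    extra = GEO_LOOKUP.get(record["transaction_id"], {})
    return {**record, **extra}

print(enrich({"transaction_id": "T-1001", "amount": 1234}))
# {'transaction_id': 'T-1001', 'amount': 1234, 'country': 'US', 'region': 'CA'}
```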
Monitoring and Status
The Sources Page provides real-time visibility into ingestion status and performance for each source.
Monitoring Dashboard
Connection Status
Displays whether the source connection is active, failed, or disconnected.
If the connection fails, users can see the error message for troubleshooting.
Last Connection
Shows the date and time of the last successful connection for each source.
Helps users ensure that no connection issues are preventing data from being ingested.
Data Volume
Displays how many rows of data were fetched each day.
Helps users track trends, identify anomalies, and debug missing or delayed data.
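One way to act on these signals is to flag any source whose last successful connection is older than its cadence plus a grace period. The sketch below is an assumed client-side check, not a Ledge API:

```python
from datetime import datetime, timedelta, timezone

GRACE = timedelta(minutes=15)  # tolerance before a source counts as stale

def stale_sources(sources: list[dict], now: datetime | None = None) -> list[str]:
    """Return names of sources whose last successful connection is overdue."""
    now = now or datetime.now(timezone.utc)
    return [
        src["name"]
        for src in sources
        if now > src["last_connection"] + src["cadence"] + GRACE
    ]

sources = [{
    "name": "acme-bank-sftp",
    "cadence": timedelta(minutes=60),
    "last_connection": datetime.now(timezone.utc) - timedelta(hours=3),
}]
print(stale_sources(sources))  # ['acme-bank-sftp']
```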
Security and Privacy
Ledge follows best-in-class security practices to protect user data and credentials.
Credential Management
All credentials (e.g., API keys, SSH keys) are encrypted and stored in a secure vault.
Ledge complies with SOC 1 and SOC 2 security standards.
PII Handling
Ledge aims to reduce the risk of handling PII (Personally Identifiable Information).
Users are encouraged to avoid exposing PII to Ledge whenever possible.
For PII data that must be processed, Ledge ensures compliance with industry security standards.
Summary
Ledge’s Data Ingestion system provides a robust, secure, and highly customizable pipeline for ingesting and transforming data. With support for industry-standard schemas, automated deduplication, and built-in transformations, users can process and enrich their data efficiently. The Sources Page serves as a command center, offering real-time visibility into connection status, data volume, and last connection timestamps. By following best practices for uniqueness configuration and transformations, users can ensure smooth and secure ingestion at all times.
If you have any questions or need help configuring a data source, please reach out to Ledge Support.