
# Overview

A source is any system, provider, or platform from which Ledge retrieves data. Sources can be public financial institutions (such as banks or Payment Service Providers (PSPs)), ERPs, or custom, proprietary sources. Each source can contain multiple datasets, which are distinct collections of data. For example, a source might provide separate datasets for transaction reports, chargeback reports, and account balances.

#### Basic Components

* Datasets:\
  Each source may have multiple datasets. These datasets may follow the same schema or have distinct structures.
* Schema:\
  A schema defines the structure and format of the dataset. Multiple datasets within a source may share the same schema (e.g. transactions from different accounts).
* Data Types:\
  Datasets can be categorized into one of the following types:
  * Transaction: Includes fields such as date, amount, description (memo) and other supporting details.
  * Balance: Reflects the balance of an account or similar summary information.
  * Other: Used for datasets that do not fit into the Transaction or Balance categories, and are used for enrichment, or for deeper integrations such as ERPs.

For more information on the specific structure of each data type, see the Ingestion section.
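The relationship between sources, datasets, schemas, and data types can be sketched as a small data model. This is an illustrative sketch only: the class names, field names, and the example bank source are assumptions made for this example, not Ledge's actual API or storage model.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Tuple


class DataType(Enum):
    TRANSACTION = "transaction"
    BALANCE = "balance"
    OTHER = "other"


@dataclass(frozen=True)
class Schema:
    # Hypothetical shape: a name, a data type, and an ordered list of field names.
    name: str
    data_type: DataType
    fields: Tuple[str, ...]


@dataclass
class Dataset:
    name: str
    schema: Schema


@dataclass
class Source:
    name: str
    datasets: List[Dataset] = field(default_factory=list)


# Two account-level transaction datasets share one schema;
# the balance dataset has its own schema.
txn_schema = Schema("bank_transactions", DataType.TRANSACTION,
                    ("date", "amount", "description"))
balance_schema = Schema("bank_balances", DataType.BALANCE, ("date", "balance"))

bank = Source("example_bank", [
    Dataset("account_a_transactions", txn_schema),
    Dataset("account_b_transactions", txn_schema),
    Dataset("account_a_balance", balance_schema),
])

# Names of the datasets that share the transaction schema.
shared = [d.name for d in bank.datasets if d.schema is txn_schema]
```

Here `shared` lists the two transaction datasets, mirroring the case where transactions from different accounts follow the same schema.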

<figure><img src="/files/zhkvff6DP8tZfer0jUIn" alt=""><figcaption></figcaption></figure>

#### Data Ingestion

Sources fetch data periodically to keep information up to date. The exact schedule may vary depending on the type of source and its configuration. Fetched data is ingested and processed to enforce data types, handle empty/null values, and ensure data accuracy and completeness.

Data is transformed to fit the source’s schema across various data types, including timestamps and time zones (all timestamps are stored in UTC), monetary values (that is, an amount and a currency), and string pattern matching. This may involve restructuring, joining, or pivoting data as needed.
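As a rough illustration of this kind of normalization, the sketch below converts an offset-aware timestamp to UTC and parses a monetary value into an amount and a currency, treating empty values as missing. The function names, the `Money` type, and the comma-stripping rule are assumptions made for this example and do not describe Ledge's actual transformation logic.

```python
from datetime import datetime, timezone
from decimal import Decimal
from typing import NamedTuple, Optional


class Money(NamedTuple):
    amount: Decimal
    currency: str


def to_utc(raw: str) -> datetime:
    """Parse an ISO 8601 timestamp with an explicit offset and convert it to UTC."""
    parsed = datetime.fromisoformat(raw)
    if parsed.tzinfo is None:
        # Assumption: refuse to guess a timezone for naive timestamps.
        raise ValueError(f"timestamp has no timezone offset: {raw!r}")
    return parsed.astimezone(timezone.utc)


def to_money(raw_amount: Optional[str], currency: str) -> Optional[Money]:
    """Return None for empty/null amounts; otherwise parse the amount as a Decimal."""
    if raw_amount is None or raw_amount.strip() == "":
        return None
    # Assumption: commas are thousands separators in the raw value.
    return Money(Decimal(raw_amount.replace(",", "")), currency.upper())


stored_at = to_utc("2024-03-01T09:30:00+02:00")
value = to_money("1,234.50", "usd")
```

Using `Decimal` rather than `float` avoids rounding drift in monetary amounts, which is a common choice for financial data.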

Transaction records are version-controlled. During ingestion, a duplicate record is either dropped or replaces the previous version, which maintains data quality. What constitutes a duplicate or a new version is configurable.
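The drop-or-replace behavior can be sketched as follows. This is a minimal sketch under stated assumptions: the `ingest` function, the `key` callable that identifies duplicates, the `replace` flag, and the integer `version` field are all hypothetical and stand in for whatever configuration Ledge actually exposes.

```python
from typing import Any, Callable, Dict, Hashable, Iterable

Record = Dict[str, Any]


def ingest(
    existing: Dict[Hashable, Record],
    incoming: Iterable[Record],
    key: Callable[[Record], Hashable],
    replace: bool,
) -> Dict[Hashable, Record]:
    """Merge incoming records into the store, dropping or replacing duplicates."""
    merged = dict(existing)
    for record in incoming:
        k = key(record)
        if k in merged and not replace:
            continue  # duplicate: keep the stored version, drop the new record
        previous_version = merged[k]["version"] if k in merged else 0
        # New record, or duplicate in replace mode: store it with a bumped version.
        merged[k] = {**record, "version": previous_version + 1}
    return merged


def by_id(record: Record) -> Hashable:
    # Assumption: two records with the same "id" are versions of each other.
    return record["id"]


first = ingest({}, [{"id": "t1", "amount": "10.00"}], key=by_id, replace=True)
replaced = ingest(first, [{"id": "t1", "amount": "12.00"}], key=by_id, replace=True)
dropped = ingest(first, [{"id": "t1", "amount": "12.00"}], key=by_id, replace=False)
```

In replace mode the stored record takes the new amount and its version increases; in drop mode the original record is left untouched.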

In addition to collecting raw data from the source, Ledge can enrich source data using additional (potentially third-party) datasets. Enrichment is typically performed using a unique identifier (such as a transaction ID, account ID, or user identifier) to look up and merge external data with the source data.
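Conceptually, this kind of enrichment resembles a left join on the shared identifier. The sketch below is illustrative only: the `enrich` function, the field names, and the rule that source fields win on conflicts are assumptions for this example, not a description of Ledge's implementation.

```python
from typing import Any, Dict, Hashable, Iterable, List

Record = Dict[str, Any]


def enrich(
    records: Iterable[Record],
    lookup: Dict[Hashable, Record],
    key: str,
) -> List[Record]:
    """Merge external fields into each record by looking up its identifier."""
    enriched = []
    for record in records:
        extra = lookup.get(record.get(key), {})
        # Assumption: on a field-name conflict, the source record's value wins.
        enriched.append({**extra, **record})
    return enriched


transactions = [
    {"transaction_id": "t1", "amount": "10.00"},
    {"transaction_id": "t2", "amount": "5.00"},
]
# Hypothetical third-party dataset keyed by transaction ID.
merchant_data = {"t1": {"merchant_category": "groceries"}}

result = enrich(transactions, merchant_data, key="transaction_id")
```

Records without a match in the external dataset pass through unchanged, so enrichment never drops source data.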

