On-Chain Data Infrastructure

Every transaction on a public blockchain is permanently recorded and freely readable. This creates a unique situation in financial markets: complete trade-level transparency. In TradFi, getting tick-level exchange data requires expensive subscriptions (Bloomberg, Refinitiv, direct exchange feeds). In DeFi, it’s all public — but raw blockchain data is nearly unusable without significant processing.

The Data Pipeline

The architecture mirrors a CDC (Change Data Capture) pipeline that any data engineer would recognize:

Blockchain nodes (source of truth)
  → Indexers (decode raw bytes into structured events)
    → Columnar store (Parquet, DuckDB, Trino)
      → SQL query engine
        → Dashboards / APIs

Each blockchain transaction contains:

Sender, receiver, program (smart contract) addresses
Instruction data (encoded function calls and parameters)
Logs / events emitted by the smart contract during execution
State changes (account balance diffs)

The raw format is compact binary. Indexers decode this using the smart contract’s IDL (Interface Definition Language) — analogous to deserializing Protobuf or Avro with a schema.

Key Platforms

DeFiLlama

An open-source DeFi analytics aggregator tracking TVL, revenue, fees, and volumes across chains and protocols.

Architecture: Per-protocol adapters (open source on GitHub) define how to compute metrics from on-chain data. Adapters run on a schedule, feeding a time-series database exposed via REST API.
Strengths: Standardized cross-protocol comparison, open methodology, good for aggregate trends.
Limitations: Adapter quality varies. Metrics definitions (what counts as “revenue” vs. “fees”) can differ from protocol self-reporting.
URL pattern: defillama.com/protocol/<name>

Dune Analytics

A SQL query engine over decoded blockchain data. Think of it as BigQuery for blockchains.

Architecture: Raw chain data is decoded into structured tables using contract ABIs/IDLs. Tables live in a columnar warehouse (DuckDB/Trino). Users write SQL queries and compose them into dashboards.
Strengths: Full SQL expressiveness. Anyone can write queries — and critically, anyone can read the query behind any chart. This makes analysis reproducible and auditable.
Limitations: Query quality depends on the author. Popular dashboards are community-vetted but not formally audited. Schema changes when Dune updates their data models can break queries.
URL pattern: dune.com/<author>/<dashboard-name>

Other Notable Platforms

Flipside Crypto — similar to Dune, different query engine
The Graph — decentralized indexing protocol (subgraphs)
Helius / Shyft — Solana-specific APIs with decoded transaction data
Nansen — adds wallet labeling (identify which wallets belong to known entities, funds, MEV bots)

How to Read a Dune Dashboard

When you encounter a Dune dashboard (e.g., the adam_tehc pump.fun dashboards), here’s what to look for:

Click “Edit query” on any chart to see the SQL — this tells you exactly what’s being measured and how
Check the FROM tables — pumpfun.trades, solana.transactions, etc. These are Dune’s decoded tables
Check date filters — dashboards may show all-time, last 30 days, or custom ranges
Check for hardcoded addresses — queries often filter by specific program addresses (pump.fun’s Solana program ID)
Note the last refresh time — Dune queries are not real-time; they run on a schedule or on-demand

Citing On-Chain Data

When referencing on-chain data in articles or analysis:

Link to the specific dashboard or query
State the access date (data changes continuously)
Note whether the source is a community dashboard (Dune) or an open-source adapter (DeFiLlama)
For critical claims, cross-reference multiple sources
Prefer linking to the query itself over the chart — queries are the “source code” of the analysis

Connection to Engineering Experience

If you’ve built event-driven systems with Kafka, Spark, and columnar stores, the on-chain data stack is immediately familiar. The main differences:

Immutability is absolute — blockchain data cannot be corrected or backfilled (unlike a Kafka topic with compaction)
Schema is encoded in smart contracts — the IDL is the schema, and it’s immutable once deployed (though programs can be upgraded on Solana)
Every “event” is a financial transaction — there’s no separation between the event log and the ledger
Latency is block-time — on Solana, ~400ms; on Ethereum, ~12 seconds

Companion Resources

DeFiLlama pump.fun protocol page
Dune dashboards by adam_tehc: pump.fun overview, creator earnings, PumpSwap metrics

Questions to consider: How would you design an alerting system on top of this data? What’s the equivalent of a “materialized view” in the on-chain data world? If you were building a credit scoring system for DeFi wallets, what on-chain signals would you use?

Learning Finance

Explorer

On-Chain Data Infrastructure

On-Chain Data Infrastructure

The Data Pipeline

Key Platforms

DeFiLlama

Dune Analytics

Other Notable Platforms

How to Read a Dune Dashboard

Citing On-Chain Data

Connection to Engineering Experience

Companion Resources

Graph View

Table of Contents

Backlinks