On-Chain Data Infrastructure

Every transaction on a public blockchain is permanently recorded and freely readable. This creates a unique situation in financial markets: complete trade-level transparency. In TradFi, getting tick-level exchange data requires expensive subscriptions (Bloomberg, Refinitiv, direct exchange feeds). In DeFi, it’s all public — but raw blockchain data is nearly unusable without significant processing.

The Data Pipeline

The architecture mirrors a CDC (Change Data Capture) pipeline that any data engineer would recognize:

Blockchain nodes (source of truth)
  → Indexers (decode raw bytes into structured events)
    → Columnar store (Parquet, DuckDB, Trino)
      → SQL query engine
        → Dashboards / APIs

Each blockchain transaction contains:

  • Sender, receiver, program (smart contract) addresses
  • Instruction data (encoded function calls and parameters)
  • Logs / events emitted by the smart contract during execution
  • State changes (account balance diffs)

The raw format is compact binary. Indexers decode this using the smart contract’s IDL (Interface Definition Language) — analogous to deserializing Protobuf or Avro with a schema.

Key Platforms

DeFiLlama

An open-source DeFi analytics aggregator tracking TVL, revenue, fees, and volumes across chains and protocols.

  • Architecture: Per-protocol adapters (open source on GitHub) define how to compute metrics from on-chain data. Adapters run on a schedule, feeding a time-series database exposed via REST API.
  • Strengths: Standardized cross-protocol comparison, open methodology, good for aggregate trends.
  • Limitations: Adapter quality varies. Metrics definitions (what counts as “revenue” vs. “fees”) can differ from protocol self-reporting.
  • URL pattern: defillama.com/protocol/<name>

Dune Analytics

A SQL query engine over decoded blockchain data. Think of it as BigQuery for blockchains.

  • Architecture: Raw chain data is decoded into structured tables using contract ABIs/IDLs. Tables live in a columnar warehouse (DuckDB/Trino). Users write SQL queries and compose them into dashboards.
  • Strengths: Full SQL expressiveness. Anyone can write queries — and critically, anyone can read the query behind any chart. This makes analysis reproducible and auditable.
  • Limitations: Query quality depends on the author. Popular dashboards are community-vetted but not formally audited. Schema changes when Dune updates their data models can break queries.
  • URL pattern: dune.com/<author>/<dashboard-name>

Other Notable Platforms

  • Flipside Crypto — similar to Dune, different query engine
  • The Graph — decentralized indexing protocol (subgraphs)
  • Helius / Shyft — Solana-specific APIs with decoded transaction data
  • Nansen — adds wallet labeling (identify which wallets belong to known entities, funds, MEV bots)

How to Read a Dune Dashboard

When you encounter a Dune dashboard (e.g., the adam_tehc pump.fun dashboards), here’s what to look for:

  1. Click “Edit query” on any chart to see the SQL — this tells you exactly what’s being measured and how
  2. Check the FROM tablespumpfun.trades, solana.transactions, etc. These are Dune’s decoded tables
  3. Check date filters — dashboards may show all-time, last 30 days, or custom ranges
  4. Check for hardcoded addresses — queries often filter by specific program addresses (pump.fun’s Solana program ID)
  5. Note the last refresh time — Dune queries are not real-time; they run on a schedule or on-demand

Citing On-Chain Data

When referencing on-chain data in articles or analysis:

  • Link to the specific dashboard or query
  • State the access date (data changes continuously)
  • Note whether the source is a community dashboard (Dune) or an open-source adapter (DeFiLlama)
  • For critical claims, cross-reference multiple sources
  • Prefer linking to the query itself over the chart — queries are the “source code” of the analysis

Connection to Engineering Experience

If you’ve built event-driven systems with Kafka, Spark, and columnar stores, the on-chain data stack is immediately familiar. The main differences:

  • Immutability is absolute — blockchain data cannot be corrected or backfilled (unlike a Kafka topic with compaction)
  • Schema is encoded in smart contracts — the IDL is the schema, and it’s immutable once deployed (though programs can be upgraded on Solana)
  • Every “event” is a financial transaction — there’s no separation between the event log and the ledger
  • Latency is block-time — on Solana, ~400ms; on Ethereum, ~12 seconds

Companion Resources

  • DeFiLlama pump.fun protocol page
  • Dune dashboards by adam_tehc: pump.fun overview, creator earnings, PumpSwap metrics

Questions to consider: How would you design an alerting system on top of this data? What’s the equivalent of a “materialized view” in the on-chain data world? If you were building a credit scoring system for DeFi wallets, what on-chain signals would you use?