Data Modeling Approach

Joining industry standards with innovative, user-first architectures

Our Data Modeling Approach

If it happens on-chain, we've got it for 26+ blockchains and protocols.

At a high level for a blockchain you'll see 3 - 5 schemas:

  • Core

  • DeFi

  • Gov

  • NFT

  • Price

and some custom curations for some large protocols like AAVE.

The goal is to align ecosystems to a consistent pattern, e.g., <chain>.<core>.fact_transactions

The STAR schema classifies tables as either dimension or fact tables.

Fact tables

Fact tables store observations or events, and can be blocks, transactions, transfers, logs, etc.

A transfer transaction would have facts like the transaction hash, the token being transferred, and the amount. It would not have the reason a transfer is occurring, e.g., that the transfer is actually part of a trade. But it would have the recipient address, which contains clues (it may be a liquidity pool address, for example).

Dimension tables

Dimension ("dim") tables describe entities — the things you analyze.

Entities can include labels, prices, decimals, tags, etc. The liquidity pool the swap uses involves smart contracts with dimensions like the platform (e.g., Uniswap), the fee the pool charges (e.g., 0.3%), and details about the tokens in the pool (e.g., AAVE and ETH).

The key here is that:

  • Facts support summarization ("what is the amount of ETH transferred to the pool address in this transaction?")

  • Dimensions support filtering and grouping ("Which pool address is which DEX platform?")

EZ tables

We also create new curated tables we call "EZ".

  • EZ combine Facts and Dimensions to make easy to filter and aggregate tables for common insights ("What is the Estimated USD Volume of swaps involving Wrapped Ether across Uniswap v3 vs Curve over the last 30 days?")

SELECT 
     platform,
     sum(amount_in_usd) as usd_volume
FROM ethereum.defi.ez_dex_swaps 
WHERE platform IN ('uniswap-v3', 'curve')
     -- Wrapped Ether
 AND (token_in = '0xc02aaa39b223fe8d0a0e5c4f27ead9083c756cc2'
     OR 
     token_out = '0xc02aaa39b223fe8d0a0e5c4f27ead9083c756cc2'
     )
 AND block_timestamp >= current_date - 30
GROUP BY platform
ORDER BY usd_volume DESC;

Last updated