Hey, I’m Mike Kuykendall — 20+ years as a software engineer, former USAF Staff Sergeant, dad of two, and now the guy behind Delta Zero Labs (@_MikeKuykendall on X).
For years I lived the RPC life. Want every ERC20 transfer for a whale wallet since 2020? Fire up a script that loops eth_getLogs with 10,000-block ranges, handle rate limits, retry on 429s, paginate, dedupe, decode the ABI yourself, store it somewhere… rinse and repeat for DEX swaps, oracle ticks, liquidations.
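That chunking dance alone is a surprising amount of code. Here's a minimal sketch of just the block-window splitting (a hypothetical helper; the fetch, retry, and decode plumbing would all wrap around it):

```python
def chunk_ranges(start_block: int, end_block: int, size: int = 10_000):
    """Split [start_block, end_block] into inclusive windows of at most `size` blocks."""
    windows = []
    lo = start_block
    while lo <= end_block:
        hi = min(lo + size - 1, end_block)
        windows.append((lo, hi))
        lo = hi + 1
    return windows

# Scanning a few years of mainnet history means hundreds of windows,
# each one a separate rate-limited eth_getLogs round trip.
windows = chunk_ranges(11_000_000, 19_000_000)
print(len(windows))  # 801
```

And that's before you've handled a single 429, reorg, or ABI quirk.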
One weekend project turned into a $400/month Alchemy bill and a 14 TB archive node eating my SSD. I swore there had to be a better way.
There is.
I built (and now sell) production-grade, fully decoded, genesis-to-tip Parquet datasets for Ethereum mainnet, BSC, and Sepolia. One download. One file (or clean partitioned set). Zero RPCs forever. Query 7.77 billion BSC events or 334 million Sepolia events in seconds on your laptop with DuckDB or Polars.
This isn’t another indexer or subgraph. This is the data you would have extracted yourself — but already done, decoded, classified by signal_type, compressed, and ready to own forever.
In this post I’m giving you the exact hand-holding walkthrough I wish I had two years ago: grab the sample, query it locally, and scale the same queries to the full genesis-to-tip sets.
Let’s kill the RPC tax together.
Quick reality check (you already know this, but let’s quantify the pain): endless eth_getLogs loops, rate limits, retries, ABI decoding, and a monthly bill that only grows.
I got tired of it. So I wrote a zero-RPC extraction engine (patent-pending Fused Semantic Execution — FSE). It reads raw chain data once, decodes every event into clean columns, tags it with signal_type, and dumps it straight to Parquet.
Result? Genesis-to-tip datasets for Ethereum mainnet, BSC (7.77 billion events), and Sepolia (334 million events), all delivered as Parquet: columnar, insanely compressed (5–10× smaller than CSV), predicate-pushdown ready, and compatible with every modern data tool.
Every row is one decoded event. The core columns you’ll use every day include block_number, timestamp, contract_address, from_address, to_address, amount, and signal_type (the full 19-column spec is in the download docs):
One row = one crystal-clear, analysis-ready record. No ABI decoding in your code ever again.
Go to the official sample:
https://huggingface.co/datasets/MikeKuykendall/ethereum-signals-sample
Or Kaggle mirror: https://www.kaggle.com/datasets/mikekuykendall/ethereum-onchain-signals
Download the Parquet file (~5–10 MB, 10,000 stratified rows covering every signal_type).
I recommend DuckDB — it’s magical for this.
pip install duckdb pandas pyarrow
# or just brew install duckdb on Mac
Open a Jupyter notebook or just the DuckDB CLI.
Python + Pandas (for small exploration)
import pandas as pd
df = pd.read_parquet("ethereum_signals_sample.parquet")
print(df.shape) # (10000, 19)
print(df['signal_type'].value_counts())
print(df.head())
DuckDB SQL (this is where the magic happens: it streams huge Parquet files instead of loading them into memory)
-- Launch DuckDB
duckdb
-- Read the sample directly
SELECT * FROM read_parquet('ethereum_signals_sample.parquet') LIMIT 10;
“How many ERC20 transfers has this address done?”
SQL
SELECT COUNT(*) AS transfers,
       SUM(amount) AS total_volume
FROM read_parquet('your_full_dataset.parquet')
WHERE (from_address = '0x1234...'
       OR to_address = '0x1234...')
  AND signal_type = 'ERC20_Transfer';
“Top 10 tokens by transfer count in 2024”
SQL
SELECT contract_address,
       COUNT(*) AS tx_count
FROM read_parquet('your_full_dataset.parquet')
WHERE signal_type = 'ERC20_Transfer'
  AND timestamp >= 1704067200  -- Jan 1 2024 (UTC)
  AND timestamp < 1735689600   -- Jan 1 2025 (UTC)
GROUP BY contract_address
ORDER BY tx_count DESC
LIMIT 10;
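Those raw epoch constants are easy to fat-finger; deriving them in Python keeps the query honest:

```python
from datetime import datetime, timezone

# Unix timestamps (UTC) for the year boundaries used in the query above.
start_2024 = int(datetime(2024, 1, 1, tzinfo=timezone.utc).timestamp())
start_2025 = int(datetime(2025, 1, 1, tzinfo=timezone.utc).timestamp())
print(start_2024, start_2025)  # 1704067200 1735689600
```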
“All Uniswap V3 swaps for a specific pool, with per-swap execution price”
SQL
SELECT timestamp,
       amount0,
       amount1,
       sqrtPriceX96,
       (amount1::DOUBLE / NULLIF(amount0, 0)) AS price
FROM read_parquet('full_dataset.parquet')
WHERE signal_type = 'UniswapV3_Swap'
  AND contract_address = '0x88e6a0c2ddd26feeb64f039a2c41296fcb3f5640' -- USDC/ETH 0.05%
ORDER BY block_number DESC
LIMIT 1000;
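If you need an absolute quote rather than the amount ratio, Uniswap V3’s sqrtPriceX96 encodes sqrt(token1/token0) in Q64.96 fixed point. A minimal conversion sketch (the decimals values assume the USDC/ETH pool above: USDC = 6, WETH = 18):

```python
import math

Q96 = 2 ** 96

def sqrt_price_x96_to_price(sqrt_price_x96: int, decimals0: int, decimals1: int) -> float:
    """Convert a raw sqrtPriceX96 to a human-readable token1-per-token0 price."""
    raw_price = (sqrt_price_x96 / Q96) ** 2          # token1 raw units per token0 raw unit
    return raw_price * 10 ** (decimals0 - decimals1)

# Round trip: an ETH price of 2,500 USDC implies this sqrtPriceX96...
raw = (1 / 2500) * 10 ** (18 - 6)                    # WETH wei per USDC base unit
x96 = int(math.sqrt(raw) * Q96)

# ...and converting back (then inverting token1-per-token0 into USDC per ETH)
# recovers the quote.
usd_per_eth = 1 / sqrt_price_x96_to_price(x96, decimals0=6, decimals1=18)
print(round(usd_per_eth, 2))  # 2500.0
```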
“Chainlink price ticks for ETH/USD over time” (perfect for backtesting)
SQL
SELECT date_trunc('day', to_timestamp(timestamp)) AS day,
       AVG(price) AS avg_eth_price
FROM read_parquet('full_dataset.parquet')
WHERE signal_type = 'Chainlink_PriceUpdate'
  AND contract_address = '0x5f4eC3Df9cbd43714FE2740f5E3616155c5b8419' -- ETH/USD
GROUP BY day
ORDER BY day;
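One backtesting gotcha: Chainlink aggregators report answers as scaled integers, and USD feeds like ETH/USD use 8 decimals (the aggregator’s decimals() value). If your price column carries the raw answer rather than a pre-scaled value, scale it before averaging:

```python
FEED_DECIMALS = 8  # decimals() for the ETH/USD aggregator; most USD feeds use 8

def raw_answer_to_usd(raw_answer: int, decimals: int = FEED_DECIMALS) -> float:
    """Scale a raw Chainlink aggregator answer to a float USD price."""
    return raw_answer / 10 ** decimals

print(raw_answer_to_usd(250012345678))  # 2500.12345678
```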
Pro tip: DuckDB can read partitioned folders too:
SQL
SELECT COUNT(*) FROM read_parquet('mainnet_parquets/*.parquet');
It automatically uses predicate pushdown — filtering on block_number or signal_type skips 99% of the data on disk. You’ll query billions of rows faster than most people can scroll Twitter.
Storage note: The BSC full set is big but compressible and runs fine on a 64 GB RAM machine. For bigger workflows just spin up a cheap Hetzner box with 128 GB RAM and DuckDB still smokes cloud warehouses on cost.
One buyer told me: “I cancelled my $1,200/month RPC plan the same day the dataset landed.”
| Approach | Cost | Speed for 3yr History | Maintenance | Ownership |
|---|---|---|---|---|
| Raw RPC loops | $200–2000/mo | Hours–days | Constant | You rebuild |
| Subgraphs/The Graph | Free–paid | Fast but incomplete | Provider risk | No |
| Cryo self-extract | Your time + node | Days to extract | Ongoing | Yes |
| Delta Zero Parquet | One-time $999+ | Seconds | Zero | Forever |
First 1,000 buyers on mainnet/BSC get 25% off with code EARLY25 at checkout.
Questions? DM me on X @_MikeKuykendall or join the Telegram channel linked on the site. I answer every single one — this is a veteran-owned solo operation and I actually care that you win with the data.
Stop paying rent for data that should be yours.
Download the sample. Run the queries. Then never call another eth_getLogs again.
See you on-chain (offline).
— Mike Kuykendall, Delta Zero Labs

P.S. The next chain drop is coming soon. Want early access + custom signal types? DM me.
I Ditched RPC Hell for Good: Your Complete Genesis-to-Tip Parquet Handbook for Lightning-Fast… was originally published in Coinmonks on Medium, where people are continuing the conversation by highlighting and responding to this story.


