# Feedback
If you encounter incorrect, outdated, or confusing documentation on any page, submit feedback:
POST https://docs.sqd.dev/feedback
```json
{
"path": "/current-page-path",
"feedback": "Description of the issue"
}
```
Only submit feedback when you have something specific and actionable to report.
# Home
Source: https://docs.sqd.dev/en/home
Build blockchain data pipelines with SQD — Portal HTTP APIs across 225+ chains, Squid and Pipes SDKs in TypeScript, and managed Cloud hosting for indexers.
The new blockchain data standard
High-Throughput Blockchain Data Access. Finally, an Alternative to RPC
Portal is a high-performance HTTP API for querying blockchain data at scale, with native finality, real-time streaming, and deep historical access. Query arbitrary block ranges in a single request with automatic reorg handling.
Arbitrary block ranges
Query any block range in a single request, no manual pagination needed,
even on high-throughput chains
Native finalization
Automatic chain-specific finality & reorg handling. Skip the
custom rollback logic.
Streaming responses
Ingest high-throughput data with a constant memory footprint
No vendor lock-in
Self-host or use decentralized infrastructure. Run the same code
anywhere.
# Getting Started with Portal
Source: https://docs.sqd.dev/en/portal/migration
Choose your SQD Portal migration path — migrate Cloud squids, develop locally with the Portal API, or self-host a Portal instance for full control.
Portal provides access to blockchain data from the permissionless SQD Network. Whether you're migrating existing squids or setting up Portal for the first time, choose the path that matches your deployment scenario.
## Choose Your Setup Path
Select the option that best describes your situation:
Migrate your Cloud squids from gateways to Portal for improved performance and stability.
Migrate EVM SDK-based squids running locally or self-hosted from gateways to Portal.
Migrate Solana SDK-based squids from gateways and RPC to Portal.
Migrate your EVM indexer from RPC-based real-time data to Portal API.
Set up Portal locally for development and testing with your own devnet or testnet.
Run your own Portal instance for complete control over your data infrastructure.
## Cloud Migration by Network Type
If you're using SQD Cloud, choose your network type:
Migrate squids running on EVM or Substrate networks to Cloud Portal.
Migrate Solana squids to Portal with real-time data support.
## Why Migrate to Portal?
Migrating to Portal provides significant benefits over traditional gateways and centralized services.
* **Reduced reliance on centralized services:** The permissionless SQD Network consists of [over 2500](https://arbiscan.io/address/0x36e2b147db67e76ab67a4d07c293670ebefcae4e#readContract#F6) nodes [run by independent operators](subsquid-network/worker)
* **Improved stability:** With a total capacity of roughly 2 PB, the permissionless SQD Network provides significant redundancy
* **Improved speed:** Portals use available bandwidth more effectively than gateways. Data fetching is 5-10 times faster in our tests
* **Future-proof:** All future development will focus on portals and the permissionless SQD Network
## Additional Resources
Complete Portal API documentation with examples and field definitions.
Learn about Portal architecture and core capabilities.
# Choosing Your SQD Tool
Source: https://docs.sqd.dev/en/sdk/options-comparison
Compare the Portal API, Pipes SDK, and Squid SDK to choose the right SQD blockchain indexing tool — features, complexity, and use-case fit.
This guide helps you understand the differences between SQD's three offerings and choose the right tool for your project.
## Overview: Three Ways to Access Blockchain Data
SQD provides three tools for working with blockchain data, each serving different use cases:
1. **Portal API** - Direct HTTP access to raw blockchain data
2. **Pipes SDK** - Lightweight streaming library for custom data pipelines
3. **Squid SDK** - Complete framework with built-in PostgreSQL and GraphQL
## How They Work Together
* **Portal** provides the raw blockchain data through HTTP API
* **SDKs** (Pipes and Squid) use Portal as their data source and add:
* Event and transaction decoding
* Type-safe data transformation
* Batch processing capabilities
* Database persistence
* Real-time data ingestion
Both SDKs use Portal under the hood. When you use an SDK, you're still accessing blockchain data through Portal; the SDK just makes it easier to work with.
## Raw API vs SDKs
Portal and SDKs serve different use cases in the blockchain data indexing pipeline.
### Portal: Direct Data Access
Portal provides raw blockchain data access through a simple HTTP API. You query specific blocks, transactions, logs, or traces and receive the raw data directly.
```bash Portal Query Example theme={"system"}
curl --compressed -X POST 'https://portal.sqd.dev/datasets/ethereum-mainnet/stream' \
-H 'Content-Type: application/json' \
-d '{
"type": "evm",
"fromBlock": 18000000,
"toBlock": 18001000,
"fields": {
"log": {
"address": true,
"topics": true,
"data": true
}
},
"logs": [{
"address": ["0xA0b86991c6218b36c1d19D4a2e9Eb0cE3606eB48"]
}]
}'
```
```python Portal Python Example theme={"system"}
import requests
import json
url = "https://portal.sqd.dev/datasets/ethereum-mainnet/stream"
response = requests.post(url, json={
"type": "evm",
"fromBlock": 18000000,
"toBlock": 18001000,
"fields": {"log": {"address": True, "topics": True}},
"logs": [{"address": ["0xA0b86991c6218b36c1d19D4a2e9Eb0cE3606eB48"]}]
})
for line in response.text.strip().split('\n'):
data = json.loads(line)
# Process raw blockchain data
```
### SDKs: Data Transformation Frameworks
SDKs (Pipes and Squid) build on top of Portal to provide data transformation, decoding, and persistence capabilities.
```typescript Pipes SDK Example theme={"system"}
import { createTarget } from "@subsquid/pipes";
import { evmPortalSource, EvmQueryBuilder } from "@subsquid/pipes/evm";
const queryBuilder = new EvmQueryBuilder()
.addFields({ block: { number: true, hash: true }, log: { data: true } })
.addLog({
request: { address: ["0xa0b86991c6218b36c1d19d4a2e9eb0ce3606eb48"] },
range: { from: 20000000, to: 20001000 }
});
const source = evmPortalSource({
portal: "https://portal.sqd.dev/datasets/ethereum-mainnet",
query: queryBuilder
});
const target = createTarget({
write: async ({ read }) => {
for await (const { data } of read()) {
// Transform and persist to your database
}
}
});
await source.pipeTo(target);
```
```typescript Squid SDK Example theme={"system"}
import { EvmBatchProcessor } from "@subsquid/evm-processor";
import { TypeormDatabase } from "@subsquid/typeorm-store";
import * as usdcAbi from "./abi/usdc"; // typegen-generated ABI module (path assumed)
import { Transfer } from "./model"; // TypeORM entity generated from the schema (path assumed)
const processor = new EvmBatchProcessor()
.setGateway("https://v2.archive.subsquid.io/network/ethereum-mainnet")
.addLog({
address: ["0xa0b86991c6218b36c1d19d4a2e9eb0ce3606eb48"],
topic0: [usdcAbi.events.Transfer.topic]
});
const db = new TypeormDatabase();
processor.run(db, async (ctx) => {
const transfers: Transfer[] = [];
for (let block of ctx.blocks) {
for (let log of block.logs) {
let { from, to, value } = usdcAbi.events.Transfer.decode(log);
transfers.push(new Transfer({ from, to, value }));
}
}
await ctx.store.insert(transfers);
});
```
### Detailed Comparison
| Aspect | Portal | SDKs (Pipes/Squid) |
| ----------------------- | --------------------------------------- | ---------------------------------------- |
| **Purpose** | Raw data retrieval | Data transformation and persistence |
| **Output** | Raw blockchain data (JSON) | Processed data in databases/APIs |
| **Performance** | 10-50x faster than RPC | Additional processing overhead |
| **Setup Complexity** | Minimal (HTTP requests only) | Project scaffolding, TypeScript required |
| **Data Decoding** | Manual (you handle ABI decoding) | Built-in ABI decoding |
| **Database** | Bring your own | PostgreSQL (Squid), custom (Pipes) |
| **GraphQL API** | Not included | Auto-generated (Squid only) |
| **Type Safety** | Limited (JSON responses) | Full TypeScript support |
| **State Management** | Manual | Built-in with fork handling |
| **Best For** | Analytics, data lakes, custom pipelines | dApps, APIs, traditional backends |
| **Languages Supported** | Any (HTTP API) | TypeScript |
| **Learning Curve** | Low (HTTP + your language) | Medium (TypeScript + framework concepts) |
### When to Use Portal
Stream raw data directly into analytics platforms like ClickHouse, BigQuery, or Snowflake
Build data pipelines in Python, Go, Rust, or any language with HTTP support
Populate data warehouses with raw blockchain data for long-term analysis
Quick experiments without setting up a full indexing framework
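As a quick illustration, here is a minimal TypeScript sketch that calls the stream endpoint directly with `fetch` and no SDK, reusing the dataset URL and filters from the curl example above. How you process each line depends on the fields you request.
```typescript Portal fetch sketch theme={"system"}
// Sketch only: query the Portal stream endpoint over plain HTTP
const response = await fetch("https://portal.sqd.dev/datasets/ethereum-mainnet/stream", {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    type: "evm",
    fromBlock: 18000000,
    toBlock: 18001000,
    fields: { log: { address: true, topics: true, data: true } },
    logs: [{ address: ["0xA0b86991c6218b36c1d19D4a2e9Eb0cE3606eB48"] }],
  }),
});

// The response body is newline-delimited JSON: one parsed item per line
for (const line of (await response.text()).trim().split("\n")) {
  const data = JSON.parse(line);
  // Process raw blockchain data (shape depends on the requested fields)
}
```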
### When to Use SDKs
Build GraphQL APIs that power decentralized applications
Decode events, track relationships, and maintain state across blocks
Deploy scalable indexers with built-in monitoring and deployment tools
Leverage TypeScript for compile-time safety and better developer experience
## Pipes SDK vs Squid SDK
Both SDKs consume data from Portal but offer different levels of abstraction and flexibility.
### Architecture Differences
**Streaming Library Approach**
Pipes SDK is a lightweight streaming library that gives you maximum flexibility:
* You define the data flow using a pipe-and-target pattern
* You bring your own database and persistence logic
* You control exactly how data is processed and stored
* No CLI tools or scaffolding - integrate into existing projects
```typescript theme={"system"}
// Minimal setup - you control everything
const source = evmPortalSource({ portal: url, query });
const target = createTarget({ write: yourCustomLogic });
await source.pipeTo(target);
```
**Complete Framework Approach**
Squid SDK is a full-featured framework with built-in conventions:
* CLI tools for project scaffolding and management
* PostgreSQL database with automatic migrations
* Auto-generated GraphQL API from schema
* Built-in deployment tools for SQD Cloud
```typescript theme={"system"}
// Opinionated structure with built-in features
const processor = new EvmBatchProcessor()
.setGateway(url)
.addLog({ ... });
const db = new TypeormDatabase();
processor.run(db, async (ctx) => {
// Framework handles state, rollbacks, persistence
});
```
### Feature Comparison Matrix
| Feature | Pipes SDK | Squid SDK |
| --------------------- | ----------------------------- | ------------------------------------- |
| **Type** | Streaming library | Complete framework |
| **Installation** | `npm install @subsquid/pipes` | `npm i -g @subsquid/cli` |
| **Project Setup** | Manual integration | CLI scaffolding (`sqd init`) |
| **Database** | Bring your own (any) | PostgreSQL (built-in) |
| **GraphQL API** | Not included | Auto-generated from schema |
| **Migrations** | You implement | Auto-generated |
| **Type Generation** | Manual | Auto-generated from ABI and schema |
| **Event Decoding** | Built-in codec | Built-in codec |
| **State Rollbacks** | You implement | Built-in fork handling |
| **CLI Tools** | None | Full CLI suite |
| **Cloud Deployment** | Manual | `sqd deploy` command |
| **Local Development** | Standard Node.js | Docker Compose included |
| **Data Targets** | Custom (you implement) | TypeORM, BigQuery, CSV, JSON, Parquet |
| **Real-time Support** | Yes (streaming) | Yes (unfinalized blocks) |
| **Best For** | Custom pipelines, flexibility | Full-stack dApps, rapid development |
| **Learning Curve** | Lower (simpler API) | Higher (more concepts) |
| **Bundle Size** | Smaller | Larger (includes framework) |
| **Customization** | Maximum flexibility | Opinionated but extensible |
### Decision Matrix
You should use Pipes SDK if:
* ✅ You want maximum flexibility and control
* ✅ You're integrating into an existing codebase
* ✅ You want to use a specific database (MongoDB, ClickHouse, etc.)
* ✅ You don't need a GraphQL API
* ✅ You prefer lightweight dependencies
* ✅ You want to build custom data pipelines
* ✅ You're experienced with database design and management
**Example use cases:**
* Real-time dashboards with custom databases
* Data pipelines feeding multiple systems
* Microservices that need blockchain data
* Custom analytics engines
You should use Squid SDK if:
* ✅ You want a complete solution with minimal setup
* ✅ You need a GraphQL API
* ✅ You're building a dApp backend
* ✅ You want automatic type generation
* ✅ You prefer PostgreSQL
* ✅ You need built-in deployment tools
* ✅ You want comprehensive documentation and examples
**Example use cases:**
* dApp backends with GraphQL APIs
* NFT marketplaces
* DeFi analytics platforms
* Governance tools
## Why We Built Pipes SDK
Pipes SDK represents a fundamental rethinking of how blockchain indexing SDKs should work, informed by years of experience with Squid SDK 1.0 and feedback from the developer community.
### Challenges with Squid SDK 1.0
While Squid SDK 1.0 successfully served many production use cases, we identified several architectural limitations that hindered developer experience and community adoption:
Although designed with modularity in mind, many components became tightly coupled in practice. This made it extremely difficult to replace or extend parts of the system without understanding the entire SDK's internal workings, creating a steep learning curve for developers attempting customization.
Insufficient documentation and internal test coverage made introducing changes
risky and time-consuming. Developers faced significant friction when trying to
extend or modify the SDK's behavior.
The SDK was built to function as a standalone process, making it challenging
to embed into existing applications or workflows. This architectural decision
limited flexibility for teams wanting to integrate blockchain indexing into
their existing codebases.
The framework enforced a very opinionated development style with limited room
for customization. Developers often found themselves constrained by the SDK's
assumptions rather than supported by its abstractions.
The absence of a built-in testing framework for client applications increased
friction for developers who needed to validate their integrations and business
logic.
These combined limitations significantly hindered external contributions, resulting in minimal meaningful engagement from the developer community.
### Pipes SDK Design Goals
Pipes SDK was built from the ground up to address these challenges and provide a modern, flexible foundation for blockchain data indexing:
Enable developers to concentrate on application-specific business logic rather than dealing with low-level blockchain implementation details.
Promote code sharing and maintainability by extracting common functionality
into reusable, composable packages.
Provide ready-to-use extensions for common tasks including Portal caching
layers, factory contract handling, and database integrations (PostgreSQL,
ClickHouse, Kafka).
Design a plugin system that makes it easy for developers to create and share
extensions, fostering a vibrant ecosystem of community contributions.
Include first-class support for custom metrics, profiling tools, and
centralized logging services to help developers monitor and optimize their
indexers.
Ensure compatibility with modern JavaScript runtimes like Bun for faster development and improved performance.
Pipes SDK maintains backward compatibility with Squid SDK's data sources
(Portal) while offering a completely redesigned developer experience focused
on flexibility, composability, and ease of use.
## Key Benefits by Tool
### Portal API Benefits
* **Language Agnostic** - Use any programming language with HTTP support
* **Minimal Setup** - Start querying in minutes with simple HTTP requests
* **Maximum Control** - Full control over data processing and storage
* **High Performance** - 10-50x faster than RPC for historical data
### Pipes SDK Benefits
* **Lightweight** - Minimal dependencies and bundle size
* **Flexible** - Use any database or data target
* **Streaming** - Built-in backpressure handling and memory efficiency
* **Type-Safe** - Full TypeScript support with auto-generated types
### Squid SDK Benefits
* **Complete Solution** - Everything you need in one framework
* **Rapid Development** - CLI tools and scaffolding for quick starts
* **Auto-Generated APIs** - GraphQL API generated from your schema
* **Production Ready** - Built-in deployment, monitoring, and scaling tools
* **Comprehensive Tooling** - Hot reload, migrations, type generation
## Next Steps
Now that you understand the differences, choose your path:
Start querying raw blockchain data in minutes
Build a lightweight data pipeline
Create a full-featured indexer with GraphQL API
# SQD SDK Overview
Source: https://docs.sqd.dev/en/sdk/overview
Choose how to access blockchain data with SQD — direct HTTP queries via the Portal API, lightweight Pipes SDK pipelines, or full Squid SDK GraphQL stacks.
## Choose Your Data Access Method
SQD offers three ways to access blockchain data. Each is designed for different use cases and developer preferences.
Direct HTTP access to raw blockchain data. Query using simple HTTP requests in any programming language.
Lightweight TypeScript streaming library for custom data pipelines with
automatic decoding and type safety.
Complete TypeScript framework with built-in PostgreSQL and auto-generated GraphQL API.
## Quick Comparison
**What it is:** Language-agnostic HTTP API for raw blockchain data
**Best for:** Analytics, data lakes, custom pipelines in any language
**Key advantage:** 10-50x faster than RPC with minimal setup
**What it is:** TypeScript streaming library with flexible architecture
**Best for:** Custom pipelines, existing codebases, non-PostgreSQL databases
**Key advantage:** Full control over data flow with minimal dependencies
**What it is:** Full-stack TypeScript framework with database and API
**Best for:** dApp backends, GraphQL APIs, rapid development
**Key advantage:** Built-in PostgreSQL + auto-generated GraphQL API
## Next Steps
See feature matrices, code examples, and technical trade-offs between Portal API, Pipes SDK, and Squid SDK.
Follow the quickstart to set up Portal API and run your first query.
**Enterprise Custom Development**
For enterprise clients requiring custom indexer development, please contact our team to discuss your specific requirements. [Schedule a Consultation →](https://calendly.com/t-tyrie-subsquid/30min)
# Factory transformers
Source: https://docs.sqd.dev/en/sdk/pipes-sdk/evm/guides/advanced-topics/factory-transformers
Index dynamic contracts with the factory pattern
Use the factory pattern when you need to index events from contracts that are deployed dynamically by a known factory contract — for example, Uniswap V3 pools created by the `UniswapV3Factory`.
The examples below use typegen-generated ABI modules. See [Specifying events](../basic-development/handling-events#specifying-events) for how to generate them from a JSON ABI.
## Basic factory
Track events from contracts created by a factory. The `factory()` helper discovers child contracts from the factory's creation events and maintains the address list in a local SQLite database.
```ts theme={"system"}
import { evmPortalStream, evmDecoder, factory, factorySqliteDatabase } from "@subsquid/pipes/evm";
import { createTarget } from "@subsquid/pipes";
import * as factoryAbi from "./abi/uniswap-v3-factory";
import * as poolAbi from "./abi/uniswap-v3-pool";
await evmPortalStream({
portal: "https://portal.sqd.dev/datasets/ethereum-mainnet",
outputs: evmDecoder({
range: { from: 12369621 },
contracts: factory({
address: "0x1f98431c8ad98523631ae4a59f267346ea31f984",
event: factoryAbi.events.PoolCreated,
parameter: "pool",
database: factorySqliteDatabase({ path: "./uniswap-v3-pools.sqlite" }),
}),
events: { swap: poolAbi.events.Swap },
}),
}).pipeTo(createTarget({
write: async ({ logger, read }) => {
for await (const { data } of read()) {
logger.info(`Parsed ${data.swap.length} swaps`);
}
},
}));
```
## Filtering factory events
To narrow which child contracts are tracked, pass an `event` object with a `params` field. Only creation events matching the specified parameter values are stored — unmatched contracts are ignored at both the portal and the local database level.
```ts theme={"system"}
contracts: factory({
address: "0x1f98431c8ad98523631ae4a59f267346ea31f984",
event: {
event: factoryAbi.events.PoolCreated,
params: {
token0: "0xc02aaa39b223fe8d0a0e5c4f27ead9083c756cc2", // WETH
},
},
parameter: "pool",
database: factorySqliteDatabase({ path: "./uniswap-v3-weth-pools.sqlite" }),
})
```
**Filter rules:**
* Only **indexed parameters** can be used for filtering.
* Multiple parameters are combined with AND logic.
* Passing an **array** of values for a parameter matches any of them (OR logic), as in the sketch below.
* Address matching is case-insensitive.
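A minimal sketch of the array form, matching pools whose `token0` is either of two tokens (addresses reused from the examples on this page):
```ts theme={"system"}
contracts: factory({
  address: "0x1f98431c8ad98523631ae4a59f267346ea31f984",
  event: {
    event: factoryAbi.events.PoolCreated,
    params: {
      // An array matches any of the listed values (OR logic); matching is case-insensitive
      token0: [
        "0xc02aaa39b223fe8d0a0e5c4f27ead9083c756cc2", // WETH
        "0xa0b86991c6218b36c1d19d4a2e9eb0ce3606eb48", // USDC
      ],
    },
  },
  parameter: "pool",
  database: factorySqliteDatabase({ path: "./uniswap-v3-weth-usdc-pools.sqlite" }),
})
```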
```ts theme={"system"}
import { evmPortalStream, evmDecoder, factory, factorySqliteDatabase } from "@subsquid/pipes/evm";
import { createTarget } from "@subsquid/pipes";
import * as factoryAbi from "./abi/uniswap-v3-factory";
import * as poolAbi from "./abi/uniswap-v3-pool";
const WETH = "0xc02aaa39b223fe8d0a0e5c4f27ead9083c756cc2";
await evmPortalStream({
portal: "https://portal.sqd.dev/datasets/ethereum-mainnet",
outputs: evmDecoder({
range: { from: 12369621 },
contracts: factory({
address: "0x1f98431c8ad98523631ae4a59f267346ea31f984",
event: {
event: factoryAbi.events.PoolCreated,
params: { token0: WETH },
},
parameter: "pool",
database: factorySqliteDatabase({ path: "./uniswap-v3-weth-pools.sqlite" }),
}),
events: { swap: poolAbi.events.Swap },
}),
}).pipeTo(createTarget({
write: async ({ logger, read }) => {
for await (const { data } of read()) {
logger.info(`Parsed ${data.swap.length} swaps from WETH pools`);
}
},
}));
```
## Including factory event data
`DecodedEvent` carries a `.factory` field with the creation event. Use it when you need to include factory context (e.g. pool token addresses) alongside each decoded event.
```ts theme={"system"}
import {
evmPortalStream, evmDecoder, factory, DecodedEvent, factorySqliteDatabase,
} from "@subsquid/pipes/evm";
import { createTarget } from "@subsquid/pipes";
import * as factoryAbi from "./abi/uniswap-v3-factory";
import * as poolAbi from "./abi/uniswap-v3-pool";
function addFactoryMetadata(event: DecodedEvent) {
return {
...event.event,
blockNumber: event.block.number,
factoryEvent: event.factory?.event,
};
}
const decoder = evmDecoder({
range: { from: 12369621 },
contracts: factory({
address: "0x1f98431c8ad98523631ae4a59f267346ea31f984",
event: factoryAbi.events.PoolCreated,
parameter: "pool",
database: factorySqliteDatabase({ path: "./uniswap-v3-pools.sqlite" }),
}),
events: { swap: poolAbi.events.Swap, mint: poolAbi.events.Mint },
}).pipe(({ swap, mint }) => ({
swap: swap.map(addFactoryMetadata),
mint: mint.map(addFactoryMetadata),
}));
await evmPortalStream({
portal: "https://portal.sqd.dev/datasets/ethereum-mainnet",
outputs: decoder,
}).pipeTo(createTarget({
write: async ({ logger, read }) => {
for await (const { data } of read()) {
for (const s of data.swap) {
logger.info({
pool: s.factoryEvent?.pool,
token0: s.factoryEvent?.token0,
token1: s.factoryEvent?.token1,
amount0: s.amount0.toString(),
amount1: s.amount1.toString(),
});
}
}
},
}));
```
## Multiple factories
Pass separate `evmDecoder` outputs to track contracts from different factory addresses in a single pipeline.
```ts theme={"system"}
import {
evmPortalStream, evmDecoder, factory, factorySqliteDatabase,
} from "@subsquid/pipes/evm";
import { createTarget } from "@subsquid/pipes";
import * as uniswapV3FactoryAbi from "./abi/uniswap-v3-factory";
import * as uniswapV2FactoryAbi from "./abi/uniswap-v2-factory";
import * as poolAbi from "./abi/pool";
await evmPortalStream({
portal: "https://portal.sqd.dev/datasets/ethereum-mainnet",
outputs: {
v3: evmDecoder({
range: { from: 12369621 },
contracts: factory({
address: "0x1f98431c8ad98523631ae4a59f267346ea31f984",
event: uniswapV3FactoryAbi.events.PoolCreated,
parameter: "pool",
database: factorySqliteDatabase({ path: "./v3-pools.sqlite" }),
}),
events: { swap: poolAbi.events.Swap },
}),
v2: evmDecoder({
range: { from: 10000835 },
contracts: factory({
address: "0x5C69bEe701ef814a2B6a3EDD4B1652CB9cc5aA6f",
event: uniswapV2FactoryAbi.events.PairCreated,
parameter: "pair",
database: factorySqliteDatabase({ path: "./v2-pairs.sqlite" }),
}),
events: { swap: poolAbi.events.Swap },
}),
},
}).pipeTo(createTarget({
write: async ({ logger, read }) => {
for await (const { data } of read()) {
logger.info({ v3Swaps: data.v3.swap.length, v2Swaps: data.v2.swap.length });
}
},
}));
```
## Pre-indexing factory (experimental)
Pre-populate the factory database before the main pipeline to ensure all historical child contracts are known before live indexing begins.
This is an experimental feature. The pre-indexing request is limited and this approach won't work for thousands of addresses.
```ts theme={"system"}
import {
evmPortalStream, evmDecoder, factory, factorySqliteDatabase,
} from "@subsquid/pipes/evm";
import { createTarget } from "@subsquid/pipes";
import * as factoryAbi from "./abi/uniswap-v3-factory";
import * as poolAbi from "./abi/uniswap-v3-pool";
const factoryDb = factorySqliteDatabase({ path: "./uniswap-v3-pools.sqlite" });
// Step 1: pre-index historical pool creations
await evmPortalStream({
portal: "https://portal.sqd.dev/datasets/ethereum-mainnet",
outputs: evmDecoder({
range: { from: 12369621, to: 20000000 },
contracts: ["0x1f98431c8ad98523631ae4a59f267346ea31f984"],
events: { poolCreated: factoryAbi.events.PoolCreated },
}),
}).pipeTo(createTarget({
write: async ({ logger, read }) => {
for await (const { data } of read()) {
for (const event of data.poolCreated) {
await factoryDb.add(event.event.pool);
logger.info(`Added pool: ${event.event.pool}`);
}
}
},
}));
// Step 2: run main pipeline with populated factory database
await evmPortalStream({
portal: "https://portal.sqd.dev/datasets/ethereum-mainnet",
outputs: evmDecoder({
range: { from: 20000000 },
contracts: factory({
address: "0x1f98431c8ad98523631ae4a59f267346ea31f984",
event: factoryAbi.events.PoolCreated,
parameter: "pool",
database: factoryDb,
}),
events: { swap: poolAbi.events.Swap },
}),
}).pipeTo(createTarget({
write: async ({ logger, read }) => {
for await (const { data } of read()) {
logger.info(`Processed ${data.swap.length} swaps`);
}
},
}));
```
# Data freshness monitoring
Source: https://docs.sqd.dev/en/sdk/pipes-sdk/evm/guides/advanced-topics/latency-monitoring
Compare Portal data freshness against external RPC providers
The `evmRpcLatencyWatcher` subscribes to RPC endpoints via WebSocket and measures when blocks arrive at the Portal versus when they appear at the RPC endpoints.
The measured values reflect client-side network conditions. For RPC endpoints, only the arrival time of blocks is measured—this does not capture the node's internal processing or response latency if queried directly. Results represent end-to-end delays as experienced by the client, not pure Portal or RPC processing performance.
```ts theme={"system"}
import { formatBlock } from "@subsquid/pipes";
import { evmDecoder, evmPortalSource, evmRpcLatencyWatcher } from "@subsquid/pipes/evm";
import { metricsServer } from "@subsquid/pipes/metrics/node";
async function main() {
const stream = evmPortalSource({
portal: "https://portal.sqd.dev/datasets/base-mainnet",
outputs: evmDecoder({ range: { from: 'latest' }, events: {} }),
metrics: metricsServer({ port: 9090 }),
}).pipe(
evmRpcLatencyWatcher({
rpcUrl: ["https://base.drpc.org", "https://base-rpc.publicnode.com"],
}).pipe((data, { metrics }) => {
if (!data) return;
// Update Prometheus metrics for each RPC endpoint
for (const rpc of data.rpc) {
metrics
.gauge({
name: "rpc_latency_ms",
help: "RPC Latency in ms",
labelNames: ["url"],
})
.set({ url: rpc.url }, rpc.portalDelayMs);
}
return data;
})
);
for await (const { data } of stream) {
if (!data) continue;
console.log(`Block: ${formatBlock(data.number)} / ${data.timestamp}`);
console.table(data.rpc);
}
}
void main()
```
## Output format
Each data freshness measurement includes the following fields, sketched as a TypeScript shape below:
* `url`: RPC endpoint URL
* `receivedAt`: Timestamp when the RPC endpoint received the block
* `portalDelayMs`: Milliseconds between RPC arrival and Portal availability
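A rough TypeScript sketch of that shape (field names are taken from the list above; the exact type exported by the SDK is not shown here):
```ts theme={"system"}
// Hypothetical shape, for illustration only
interface RpcFreshnessSample {
  url: string             // RPC endpoint URL
  receivedAt: Date        // when the RPC endpoint received the block
  portalDelayMs: number   // milliseconds between RPC arrival and Portal availability
}

interface FreshnessReport {
  number: number          // block number
  timestamp: number       // block timestamp
  rpc: RpcFreshnessSample[]
}
```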
```
Block: 36,046,611 / Fri Sep 26 2025 14:29:29 GMT+0400
┌───┬─────────────────────────────────┬──────────────────────────┬───────────────┐
│ │ url │ receivedAt │ portalDelayMs │
├───┼─────────────────────────────────┼──────────────────────────┼───────────────┤
│ 0 │ https://base.drpc.org │ 2025-09-26T10:29:29.134Z │ 646 │
│ 1 │ https://base-rpc.publicnode.com │ 2025-09-26T10:29:29.130Z │ 642 │
└───┴─────────────────────────────────┴──────────────────────────┴───────────────┘
```
## Custom metrics integration
Export data freshness metrics to Prometheus or other monitoring systems.
```ts expandable theme={"system"}
import { solanaPortalSource, solanaRpcLatencyWatcher, SolanaQueryBuilder } from '@subsquid/pipes/solana'
import { Registry, Counter } from 'prom-client'
const registry = new Registry()
const latencyCounter = new Counter({
name: 'portal_latency_ms_total',
help: 'Total Portal latency in milliseconds',
registers: [registry],
})
const stream = solanaPortalSource({
portal: 'https://portal.sqd.dev/datasets/solana-mainnet',
outputs: new SolanaQueryBuilder()
.addFields({ block: { number: true, hash: true, timestamp: true } })
.includeAllBlocks({ from: 'latest' })
.build(),
}).pipe(
solanaRpcLatencyWatcher({
rpcUrl: ['https://api.mainnet-beta.solana.com'],
}).pipe({
transform: (data) => {
if (!data) return
// Record latency metrics
for (const rpc of data.rpc) {
latencyCounter.inc(rpc.portalDelayMs)
}
return data
},
}),
)
// Expose metrics endpoint
import http from 'http'
const server = http.createServer(async (req, res) => {
if (req.url === '/metrics') {
res.setHeader('Content-Type', registry.contentType)
res.end(await registry.metrics())
} else {
res.end('OK')
}
})
server.listen(9090)
for await (const { data } of stream) {
// Process blocks
}
```
# Logging
Source: https://docs.sqd.dev/en/sdk/pipes-sdk/evm/guides/advanced-topics/logging
Customize logging with Pino-compatible transports
[evmPortalSource()](../../reference/basic-components/source) accepts a Pino-compatible `logger`, allowing you to integrate custom log transports and send logs to external services like GCP Cloud Logging, Sentry, or any other Pino-compatible destination.
## Basic custom logger
Pass a custom logger to the source to configure logging for your entire pipeline.
```ts theme={"system"}
import { createTarget } from "@subsquid/pipes";
import { evmPortalSource } from "@subsquid/pipes/evm";
import pino from "pino";
async function main() {
const transport = pino.transport({
target: "pino-pretty",
options: {
colorize: true,
translateTime: "HH:MM:ss",
},
});
const source = evmPortalSource({
portal: "https://portal.sqd.dev/datasets/ethereum-mainnet",
logger: pino(transport),
});
const target = createTarget({
write: async ({ logger, read }) => {
for await (const { data } of read()) {
logger.info({ count: data.length }, "Processed batch");
}
},
});
await source.pipeTo(target);
}
void main()
```
## Integration with cloud services
You can use any Pino transport to send logs to cloud services. Pass the configured logger to the source.
```ts theme={"system"}
import { createTarget } from "@subsquid/pipes";
import { evmPortalSource } from "@subsquid/pipes/evm";
import pino from "pino";
async function main() {
const transport = pino.transport({
target: "@google-cloud/logging-pino",
options: {
projectId: "your-project-id",
logName: "pipes-indexer",
},
});
const source = evmPortalSource({
portal: "https://portal.sqd.dev/datasets/ethereum-mainnet",
logger: pino(transport),
});
const target = createTarget({
write: async ({ logger, read }) => {
for await (const { data } of read()) {
logger.info(
{
blocksProcessed: data.blocks?.length,
eventsCount: data.transfer?.length,
},
"Batch processed"
);
}
},
});
await source.pipeTo(target);
}
void main()
```
```ts theme={"system"}
import { createTarget } from "@subsquid/pipes";
import { evmPortalSource } from "@subsquid/pipes/evm";
import pino from "pino";
async function main() {
const transport = pino.transport({
target: "pino-sentry-transport",
options: {
sentry: {
dsn: process.env.SENTRY_DSN,
environment: "production",
},
level: "error", // Only send errors to Sentry
},
});
const source = evmPortalSource({
portal: "https://portal.sqd.dev/datasets/ethereum-mainnet",
logger: pino(transport),
});
const target = createTarget({
write: async ({ logger, read }) => {
for await (const { data } of read()) {
try {
await processData(data);
logger.info({ count: data.length }, "Batch processed");
} catch (error) {
logger.error({ error, data }, "Failed to process batch");
}
}
},
});
await source.pipeTo(target);
}
void main()
```
```ts theme={"system"}
import { createTarget } from "@subsquid/pipes";
import { evmPortalSource } from "@subsquid/pipes/evm";
import pino from "pino";
async function main() {
const transport = pino.transport({
targets: [
{
target: "pino-pretty",
options: { colorize: true },
level: "info",
},
{
target: "@google-cloud/logging-pino",
options: { projectId: "your-project-id" },
level: "info",
},
{
target: "pino-sentry-transport",
options: { sentry: { dsn: process.env.SENTRY_DSN } },
level: "error",
},
],
});
const source = evmPortalSource({
portal: "https://portal.sqd.dev/datasets/ethereum-mainnet",
logger: pino(transport),
});
const target = createTarget({
write: async ({ logger, read }) => {
for await (const { data } of read()) {
logger.info({ count: data.length }, "Processed batch");
}
},
});
await source.pipeTo(target);
}
void main()
```
The `ctx.logger` in transformers and targets is the same logger instance passed to the source. Configure logging at the source level, then use `ctx.logger` throughout your pipeline for consistent logging.
# Metrics
Source: https://docs.sqd.dev/en/sdk/pipes-sdk/evm/guides/advanced-topics/metrics
Track custom Prometheus metrics in EVM pipes
Pipes SDK can expose a Prometheus-compatible metrics server. You can customize it to add counters, gauges, histograms, and summaries.
```ts theme={"system"}
import { commonAbis, evmDecoder, evmPortalStream } from "@subsquid/pipes/evm";
import { metricsServer } from "@subsquid/pipes/metrics/node";
async function main() {
const stream = evmPortalStream({
id: 'evm-decoder',
portal: 'https://portal.sqd.dev/datasets/ethereum-mainnet',
outputs: evmDecoder({
range: {
from: 'latest',
},
events: {
transfers: commonAbis.erc20.events.Transfer,
},
}),
metrics: metricsServer({
port: 9090
}), // equivalent to metricsServer(), as 9090 is the default port
})
for await (const { data, ctx } of stream) {
// Add custom counter metric
ctx.metrics
.counter({
name: "my_transfers_counter",
help: "Number of processed transactions",
})
.inc(data.transfers.length);
}
}
void main()
```
Access metrics at `http://localhost:9090/metrics` to verify they're being exposed correctly.
```
# HELP my_transfers_counter Number of processed transactions
# TYPE my_transfers_counter counter
my_transfers_counter 218598
```
Use Grafana dashboards to visualize block processing rate, error rates, and latency trends from your Prometheus metrics.
## Available metric types
You can create different types of Prometheus metrics:
```ts theme={"system"}
for await (const { data, ctx } of stream) {
// Counter - monotonically increasing value
ctx.metrics.counter({ name: "events_total", help: "Total events" }).inc();
// Gauge - value that can go up or down
ctx.metrics
.gauge({ name: "queue_size", help: "Current queue size" })
.set(queueSize);
// Histogram - observations with configurable buckets
ctx.metrics
.histogram({ name: "batch_size", help: "Batch size distribution" })
.observe(data.transfers.length);
}
```
Expose metrics with `metricsServer()` on your source, then visualize them with [Pipes UI](../basic-development/pipes-ui).
See the [Profiling](./profiling) guide for the built-in per-batch profiler exposed on the same metrics endpoint.
# Profiling
Source: https://docs.sqd.dev/en/sdk/pipes-sdk/evm/guides/advanced-topics/profiling
Measure where time is spent in an EVM pipe
Pipes SDK ships a built-in per-batch profiler. It records how long each part of the pipeline takes. When a [`metricsServer()`](../../reference/utility-components/metrics-server) is attached to the source, the profiler output is served as JSON at `http://localhost:<port>/profiler` (9090 by default) and rendered live in [Pipes UI](../basic-development/pipes-ui).
## Enabling the profiler
The profiler is **on by default** when `process.env.NODE_ENV !== 'production'` and **off** otherwise. Override explicitly with `profiler: true` or `profiler: false` on the source.
```ts theme={"system"}
evmPortalStream({
portal: 'https://portal.sqd.dev/datasets/ethereum-mainnet',
outputs: /* ... */,
metrics: metricsServer({ port: 9090 }),
profiler: true, // force on — useful in production
})
```
## Interpreting the output
The pipeline is represented as a tree. Each node reports how much time was spent in that stage of a batch. A typical tree looks like:
```
batch
├── fetch data
├── apply transformers
│ ├── track progress
│ └── EVM decoder
├── clickhouse
│ ├── data handler
│ ├── insert cursor
│ └── cleanup cursors
└── metrics processing
```
## Custom spans
Wrap any code in your target or transformer to make it appear as a tree node. `ctx.profiler.start()` is a no-op when the profiler is disabled, so the instrumentation is safe to leave in place.
```ts theme={"system"}
onData: async ({ data, ctx }) => {
const span = ctx.profiler.start('my measure')
await myDataProcessing(data)
span.end()
},
```
The named span appears under its parent node in the tree:
```
batch
...
├── clickhouse
│ ├── data handler
│ │ └── my measure
...
```
# Railway deployment
Source: https://docs.sqd.dev/en/sdk/pipes-sdk/evm/guides/advanced-topics/railway-deployment
Deploying a Pipes SDK project to Railway
This guide walks you through deploying a Pipes SDK indexer to [Railway](https://railway.app). You'll end up with three services running together: your indexer, a database (PostgreSQL or ClickHouse), and the Pipe UI dashboard.
## Prerequisites
* A [Railway account](https://railway.app)
* Your project pushed to a **public or private GitHub repository**
* Your project built with `pipes init` (which generates a `Dockerfile` and `docker-compose.yaml`)
***
## Option A — Drag & Drop via the Railway Dashboard
The quickest way to get started is to drop your `docker-compose.yaml` directly onto the Railway project canvas.
### Step 1 — Create a new project on Railway
1. Go to [railway.app/new](https://railway.app/new) and click **Empty Project**.
### Step 2 — Drag and drop your `docker-compose.yaml`
1. Open your project canvas.
2. Drag the `docker-compose.yaml` file from your project root and drop it anywhere on the canvas.
Railway parses the file and creates a service for each entry. For a typical Pipes SDK project this produces:
| Service | Image / Source |
| -------------------------- | ---------------------------------- |
| Your indexer | Built from your local `Dockerfile` |
| `postgres` or `clickhouse` | Official Docker images |
### Step 3 — Link the indexer service to your GitHub repository
1. Click the indexer service card on the canvas.
2. Go to **Settings → Source** and choose **GitHub Repo**.
3. Select your repository and branch.
Railway will now redeploy automatically on every push.
### Step 4 — Add the Pipe UI service
1. Click **+ New Service** on the canvas.
2. Choose **Docker Image** and enter `iankguimaraes/pipe-ui:latest`.
3. Go to the service's **Variables** tab and add:
```
METRICS_SERVER_URL=${{Pipes.RAILWAY_PRIVATE_DOMAIN}}:9090
```
Replace `Pipes` with the actual name Railway assigned to your indexer service.
### Step 5 — Generate public domains
For each service that needs a public URL:
1. Click the service card.
2. Go to **Settings → Networking → Public Networking**.
3. Click **Generate Domain** and set the correct port (`3000` for Pipe UI, `5432` for PostgreSQL, `8123` for ClickHouse).
***
## Option B — Railway CLI
If you prefer the terminal, the Railway CLI gives you full control over every service and environment variable.
### Step 1 — Install the Railway CLI
```bash theme={"system"}
# macOS / Linux
curl -fsSL https://railway.app/install.sh | sh
# or via npm
npm install -g @railway/cli
```
### Step 2 — Log in to Railway
```bash theme={"system"}
railway login
```
This opens a browser window for OAuth authentication. After approving, the CLI is authenticated for the current session.
### Step 3 — Initialize the Railway project
Run this from the root of your indexer project:
```bash theme={"system"}
railway init --name "your-project-name"
```
Use the same name as the `name` field in your `package.json`. This creates a new project on Railway and links the current directory to it.
### Step 4 — Add the database service
**If your project uses PostgreSQL** (projects with `drizzle.config.ts`):
```bash theme={"system"}
railway add -d postgres
```
Railway provisions a managed PostgreSQL instance and injects a `DATABASE_URL` variable automatically.
**If your project uses ClickHouse:**
```bash theme={"system"}
railway add \
--service Clickhouse \
--image clickhouse/clickhouse-server:latest \
--variables CLICKHOUSE_DB=pipes \
--variables CLICKHOUSE_USER=default \
--variables CLICKHOUSE_PASSWORD=password
```
### Step 5 — Add the indexer service
Replace `your-org/your-repo` with your actual GitHub repository slug.
**PostgreSQL project:**
```bash theme={"system"}
railway add \
--service Pipes \
--repo your-org/your-repo \
--variables "DB_CONNECTION_STR=\${{Postgres.DATABASE_URL}}"
```
**ClickHouse project:**
```bash theme={"system"}
railway add \
--service Pipes \
--repo your-org/your-repo \
--variables "CLICKHOUSE_URL=http://\${{Clickhouse.RAILWAY_PRIVATE_DOMAIN}}:8123" \
--variables CLICKHOUSE_DB=pipes \
--variables CLICKHOUSE_USER=default \
--variables CLICKHOUSE_PASSWORD=password
```
> **Note on variable syntax:** `${{ServiceName.VARIABLE}}` is Railway's cross-service reference syntax. The shell requires escaping the `$` as `\$` when passing it through the CLI; Railway resolves it at runtime.
The `--repo` flag links this service to your GitHub repository and enables automatic deployments on every push to the default branch.
### Step 6 — Add the Pipe UI dashboard
```bash theme={"system"}
railway add \
--service PipeUI \
--image iankguimaraes/pipe-ui:latest \
--variables "METRICS_SERVER_URL=\${{Pipes.RAILWAY_PRIVATE_DOMAIN}}:9090"
```
The Pipe UI connects to your indexer's metrics endpoint (port 9090, exposed by the indexer at runtime).
### Step 7 — Generate public domains
Give each service a publicly accessible URL:
**PostgreSQL project:**
```bash theme={"system"}
# Expose the database
railway domain --service Postgres --port 5432
# Expose the UI
railway domain --service PipeUI --port 3000
```
**ClickHouse project:**
```bash theme={"system"}
# Expose the database
railway domain --service Clickhouse --port 8123
# Expose the UI
railway domain --service PipeUI --port 3000
```
### Step 8 — Open the Railway dashboard
```bash theme={"system"}
railway open
```
This opens your project in the Railway web UI where you can monitor deployments, view logs, and manage environment variables.
***
## Service Architecture
```
┌──────────────────────────────────────────────────────┐
│ Railway Project │
│ │
│ ┌─────────────┐ private network │
│ │ Database │◄────────────────────┐ │
│ │ (Postgres │ │ │
│ │ /Clickhouse)│ │ │
│ └──────┬──────┘ │ │
│ │ public domain (optional) │ │
│ ┌──────┴──────┐ │
│ │ Indexer │ │
│ │ (Pipes) │ │
│ └──────┬──────┘ │
│ │ :9090 metrics │
│ ┌──────▼──────┐ │
│ │ Pipe UI │ │
│ │ (PipeUI) │ │
│ └──────┬──────┘ │
│ │ public domain │
└──────────────────────────────────────┼───────────────┘
▼
Browser / API
```
Services communicate over Railway's private network using `${{ServiceName.RAILWAY_PRIVATE_DOMAIN}}` references. Only the UI (and optionally the database) need public domains.
***
## Environment Variables Reference
### Indexer (PostgreSQL)
| Variable | Value |
| ------------------- | ---------------------------- |
| `DB_CONNECTION_STR` | `${{Postgres.DATABASE_URL}}` |
### Indexer (ClickHouse)
| Variable | Value |
| --------------------- | ---------------------------------------------------- |
| `CLICKHOUSE_URL` | `http://${{Clickhouse.RAILWAY_PRIVATE_DOMAIN}}:8123` |
| `CLICKHOUSE_DB` | `pipes` |
| `CLICKHOUSE_USER` | `default` |
| `CLICKHOUSE_PASSWORD` | `password` |
### ClickHouse service
| Variable | Value |
| --------------------- | ---------- |
| `CLICKHOUSE_DB` | `pipes` |
| `CLICKHOUSE_USER` | `default` |
| `CLICKHOUSE_PASSWORD` | `password` |
### Pipe UI
| Variable | Value |
| -------------------- | ---------------------------------------- |
| `METRICS_SERVER_URL` | `${{Pipes.RAILWAY_PRIVATE_DOMAIN}}:9090` |
***
## Dockerfile Overview
Your project's `Dockerfile` (generated by `pipes init`) uses a two-stage build:
1. **Builder stage** — installs dependencies with `pnpm`, compiles TypeScript to `dist/`.
2. **Runner stage** — copies only the production build, runs migrations (PostgreSQL only), then starts the indexer.
The indexer exposes **port 9090** for metrics, which Pipe UI connects to.
```dockerfile theme={"system"}
EXPOSE 9090
CMD ["sh", "-lc", "pnpm db:generate && pnpm db:migrate && node dist/index.js"]
# (PostgreSQL only; ClickHouse projects skip the migration step)
```
***
## Troubleshooting
**Indexer fails to start — cannot connect to database**
The database service may not be healthy yet. Railway starts services in parallel; the indexer's health-check retry logic should handle this, but you can also set a startup delay under **Settings → Deploy → Start Command**.
**`${{...}}` variables show as literal strings**
Cross-service references are resolved at deploy time. Make sure both services are in the same Railway project and the referenced service name matches exactly (case-sensitive).
**ClickHouse connection refused**
Confirm `CLICKHOUSE_URL` uses the private domain (`RAILWAY_PRIVATE_DOMAIN`), not a public URL, and that port 8123 is correct.
**Pipe UI shows no data**
Check that `METRICS_SERVER_URL` points to the private domain of the indexer service and that port 9090 is included.
# Stateful transforms
Source: https://docs.sqd.dev/en/sdk/pipes-sdk/evm/guides/advanced-topics/stateful-transforms
Six approaches to maintaining state across pipe batches, with trade-offs and examples
A stateful transform is any step in your pipeline that produces output based on more than the current batch — for example, running balances, sliding-window aggregates, or enrichment lookups. This page surveys the available approaches and when to choose each one.
## Before you add state
The cleanest solution is often to emit raw events and let the downstream database derive the state at query time. ClickHouse materialized views and Postgres views both work for this. If your logic can be expressed as SQL and you can tolerate slightly higher query latency, prefer this over transformer state — it eliminates crash recovery and fork handling entirely on the transformer side.
If your logic is hard to express in SQL, or if the derived state must be pre-computed before reaching the target, read on.
## At a glance
| Approach | State lives in | Persistence across restarts | Fork handling | Extra infra | Best for |
| ------------------------------------------------------- | -------------- | ------------------------------ | ----------------------- | ------------- | -------------------------------------------- |
| [A. Pure in-RAM](#a-pure-in-ram) | JS heap | rebuilt from portal on startup | `fork()` callback | none | sliding windows, candles, rolling aggregates |
| [B. ClickHouse MVs](#b-clickhouse-materialized-views) | ClickHouse | ✓ | `sign = -1` rows | ClickHouse | SQL-expressible analytics |
| [C. SQLite transformer](#c-sqlite-transformer) | Local file | ✓ (delta table) | `fork()` callback | none | moderate state |
| [D. Postgres/Drizzle target](#d-postgresdrizzle-target) | Postgres | ✓ | ✓ automatic | Postgres | atomic state + output, Postgres target |
| [E. Apache Flink](#e-apache-flink) | Flink cluster | ✓ | via compensating events | Kafka + Flink | TB-scale distributed state |
| [F. External KV store](#f-external-kv-store) | Redis / Valkey | ✓ (with AOF/RDB) | `fork()` callback | Redis | µs-latency lookups, multi-process state |
***
## A. Pure in-RAM
Keep state in a JavaScript `Map` or array inside the transformer closure. No external storage is involved.
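A minimal sketch of the idea, assuming a decoded batch that carries a `transfer` array and using the `createTransformer` helper that appears in approach D (import path assumed):
```typescript theme={"system"}
import { createTransformer } from '@subsquid/pipes' // import path assumed

// State lives in the closure: per-sender transfer counts for the lifetime of the process
const countsBySender = new Map<string, number>()

const senderCounts = createTransformer({
  transform: (data: { transfer: { from: string }[] }) => {
    for (const t of data.transfer) {
      countsBySender.set(t.from, (countsBySender.get(t.from) ?? 0) + 1)
    }
    // Pass the batch through, enriched with a derived value
    return { ...data, uniqueSenders: countsBySender.size }
  },
})
```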
**When to use:**
* State can be derived from a bounded window of recent blocks (e.g., last N blocks or last M seconds).
* You can afford to replay that window on restart (warm-up time ∝ window size).
* State loss is contained: at most one window's worth of history needs to be replayed.
**When not to use:**
* State grows without bound (e.g., all-time ERC-20 balances). Use [Postgres approach D](#d-postgresdrizzle-target) instead — specifically the in-memory + Postgres mirror sub-approach.
* The warm-up window is too large to replay quickly on every restart.
### The warm-up pattern
When the process restarts, the target's cursor tells you where the pipeline left off. The in-RAM state is gone. To rebuild it, call `portal.getStream()` in the `start()` callback — the raw portal client API, independent of the main stream:
```typescript theme={"system"}
start: async ({ portal, state, logger }) => {
if (!state.current) return // first ever run: start empty
const warmupFrom = Math.max(state.initial, state.current.number - LOOKBACK_BLOCKS)
if (warmupFrom >= state.current.number) return
for await (const { blocks } of portal.getStream({
type: 'evm',
fromBlock: warmupFrom,
toBlock: state.current.number,
fields: { block: { number: true }, log: { data: true } },
logs: [{ address: [CONTRACT], topic0: [EVENT_TOPIC] }],
})) {
for (const block of blocks) {
for (const log of block.logs) {
// rebuild in-RAM state from block and log fields
}
}
}
}
```
`portal` is a live `PortalClient` already connected to the dataset. The warm-up query's `toBlock` is the saved cursor, so it terminates immediately after the pipeline resumes from `cursor + 1`. Multiple in-RAM transformers each run their own `start()` warm-up in parallel (the SDK calls child `start()` callbacks concurrently).
### Fork handling
`target.fork()` fires first (ClickHouse `onRollback` or drizzle snapshot rollback), then the transformer's `fork()` callback. At that point the database already reflects pre-fork state. In `fork()`, drop in-RAM entries for blocks beyond the rollback cursor:
```typescript theme={"system"}
fork: async (cursor, { logger }) => {
recentEntries = recentEntries.filter(e => e.blockNumber <= cursor.number)
}
```
### Composability
Use the same `initQueue`/`WriteQueue` pattern as the Postgres examples. For ClickHouse targets the queue holds closures over `ClickhouseStore` instead of a Postgres `Transaction`:
```typescript theme={"system"}
type CHS = { insert(params: { table: string; values: unknown[]; format: string }): Promise<void> }
class WriteQueue {
  private ops: Array<(store: CHS) => Promise<void>> = []
  push(op: (store: CHS) => Promise<void>): void { this.ops.push(op) }
  async flush(store: CHS): Promise<void> { for (const op of this.ops) await op(store) }
}
```
See [`13.stateful-transform-in-ram.example.ts`](https://github.com/subsquid-labs/pipes-sdk-docs/blob/master/src/advanced/evm/13.stateful-transform-in-ram.example.ts) for the full implementation: a rolling ~1-hour transfer volume tracker for the SQD token on Arbitrum, with portal warm-up, fork handling, and the WriteQueue composability pattern targeting ClickHouse.
***
## B. ClickHouse materialized views
Write raw events to a base table; let ClickHouse compute derived state via materialized views (MVs). The transformer is stateless — it only emits events, not pre-computed state.
**When to use:**
* Your aggregation logic is expressible in SQL.
* ClickHouse is already your target.
* You want derived state updated automatically without any transformer code.
**When not to use:**
* Logic requires imperative iteration (e.g., order-dependent simulation).
* Each MV chain adds latency — avoid long dependency chains for latency-sensitive consumers.
* Very frequent writes on lightweight data: prefer plain (non-materialized) views if you have spare CPU on the database machine.
### The core limitation: MVs see only new rows
A materialized view fires when new rows are inserted into its source table. Its `SELECT` clause only operates on the **newly inserted batch**, not the full table. Running totals like cumulative balance cannot be written directly in the MV `SELECT`.
### Workaround: auxiliary aggregating tables
Maintain a separate "current state" table using `AggregatingMergeTree` with `argMaxState`. The MV reads this table alongside the new rows to resolve the latest value before the current batch:
```sql theme={"system"}
-- Stores latest balance per pool (AggregatingMergeTree = efficient upsert)
CREATE TABLE current_balances (
pool_address String,
token_a_balance_raw AggregateFunction(argMax, Int256, Tuple(DateTime, UInt16, UInt16)),
token_b_balance_raw AggregateFunction(argMax, Int256, Tuple(DateTime, UInt16, UInt16))
) ENGINE = AggregatingMergeTree()
ORDER BY pool_address;
-- MV that keeps current_balances up to date
CREATE MATERIALIZED VIEW current_balances_mv TO current_balances AS
SELECT
pool_address,
argMaxState(token_a_balance_raw, (timestamp, transaction_index, log_index)) AS token_a_balance_raw,
argMaxState(token_b_balance_raw, (timestamp, transaction_index, log_index)) AS token_b_balance_raw
FROM balances_history
GROUP BY pool_address;
```
A downstream MV that needs the running balance queries `current_balances` with `argMaxMerge()`:
```sql theme={"system"}
latest_pool_balances AS (
SELECT
pool_address,
argMaxMerge(token_a_balance_raw) AS balance_token_a_raw,
argMaxMerge(token_b_balance_raw) AS balance_token_b_raw
FROM current_balances
WHERE pool_address IN (SELECT pool_address FROM unique_pools_to_insert)
GROUP BY pool_address
)
```
### Temporal joins with ASOF JOIN
When you need "the most recent price before each event", use `ASOF JOIN`:
```sql theme={"system"}
SELECT ...
FROM liquidity_events_raw ml
ASOF JOIN latest_prices wp
ON wp.pool_address = ml.pool_address
AND wp.ts_num + wp.transaction_index * 100_000 + wp.log_index
<= ml.ts_num + ml.transaction_index * 100_000 + ml.log_index
WHERE ml.protocol = 'uniswap_v4'
```
The `ASOF JOIN` selects the latest row in `latest_prices` whose ordering key is ≤ the current event's key — effectively "last price before this log".
### Fork rollback
ClickHouse is non-transactional. Use `CollapsingMergeTree` with a `sign` column: insert `sign = 1` rows on the way forward and `sign = -1` rows to cancel them on rollback. Your `onRollback` handler computes which blocks to cancel and inserts the negating rows.
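A minimal sketch of the negating insert, reusing the `ClickhouseStore`-style `insert()` shape from approach A; the table and column names are illustrative, and fetching the affected rows is left to your `onRollback` handler:
```typescript theme={"system"}
type EventRow = { block_number: number; pool_address: string; amount: string; sign: 1 | -1 }

// Insert a sign = -1 copy of every row written beyond the rollback cursor;
// CollapsingMergeTree collapses matching (+1, -1) pairs during merges.
async function cancelRowsAfterFork(
  store: { insert(params: { table: string; values: unknown[]; format: string }): Promise<void> },
  rowsBeyondCursor: EventRow[], // rows with block_number > rollback cursor, fetched beforehand
) {
  const negating = rowsBeyondCursor.map((row) => ({ ...row, sign: -1 as const }))
  await store.insert({ table: 'liquidity_events_raw', values: negating, format: 'JSONEachRow' })
}
```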
See [`pipes-sqdgn-dex-example/pipes/evm/liquidity/liquidity.sql`](https://github.com/subsquid-labs/pipes-sqdgn-dex-example/blob/master/pipes/evm/liquidity/liquidity.sql) for a full production SQL schema: `liquidity_events_raw` as the base table, `CollapsingMergeTree` for rollback, `AggregatingMergeTree` + `argMaxState` for current pool balances, an `ASOF JOIN` MV for V4 liquidity, and separate V2/V3/V4 MV chains targeting `balances_history`.
***
## C. SQLite transformer
Keep transformer state in a local SQLite database. The transformer reads and writes SQLite; the downstream target (typically ClickHouse) receives the pre-computed rows.
**When to use:**
* State is too large for RAM.
* You need random-access lookups (e.g., "current balance of address X") that would be slow as a linear scan of in-RAM arrays.
* You're not using Postgres as your target (otherwise see [approach D](#d-postgresdrizzle-target) for better atomicity).
* The indexer runs on persistent infrastructure (SQLite file must survive restarts).
**When not to use:**
* The indexer runs on ephemeral infrastructure (containers, spot VMs). SQLite is lost on restart.
* State requires complex analytical SQL (window functions, multi-table joins) — consider DuckDB as a drop-in alternative with full analytical query support.
### The delta table pattern
SQLite and the downstream target commit separately — a crash between them leaves the two out of sync. A `balance_deltas` table records the net change per address per block, allowing `rollbackTo(blockNumber)` to invert any set of blocks atomically:
```typescript theme={"system"}
// Schema — balances and processed_blocks are the tables read by rollbackTo() below
db.exec(`
  CREATE TABLE IF NOT EXISTS balances (
    address TEXT PRIMARY KEY,
    balance TEXT NOT NULL             -- bigint stored as a string
  );
  CREATE TABLE IF NOT EXISTS processed_blocks (
    block_number INTEGER PRIMARY KEY
  );
  CREATE TABLE IF NOT EXISTS balance_deltas (
    address TEXT NOT NULL,
    block_number INTEGER NOT NULL,
    delta TEXT NOT NULL,              -- net balance change for this address in this block
    PRIMARY KEY (address, block_number)
  )
`)

function rollbackTo(blockNumber: number) {
  db.transaction(() => {
    // Net out all deltas recorded after the rollback point, per address
    const deltas = db.prepare('SELECT address, delta FROM balance_deltas WHERE block_number > ?').all(blockNumber) as { address: string; delta: string }[]
    const net = new Map<string, bigint>()
    for (const { address, delta } of deltas) net.set(address, (net.get(address) ?? 0n) + BigInt(delta))
    // Subtract the net change from each affected balance
    for (const [address, delta] of net) {
      const row = db.prepare('SELECT balance FROM balances WHERE address = ?').get(address) as { balance: string } | undefined
      db.prepare('INSERT INTO balances (address, balance) VALUES (?, ?) ON CONFLICT(address) DO UPDATE SET balance = excluded.balance')
        .run(address, ((row ? BigInt(row.balance) : 0n) - delta).toString())
    }
    // Drop the now-inverted deltas and the processed-block markers
    db.prepare('DELETE FROM balance_deltas WHERE block_number > ?').run(blockNumber)
    db.prepare('DELETE FROM processed_blocks WHERE block_number > ?').run(blockNumber)
  })()
}
```
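For completeness, a hedged sketch of how `transform()` might record these deltas alongside the balance updates. The helper and the shape of the decoded `transfers` batch are illustrative assumptions; the table names match the schema above:
```typescript theme={"system"}
// Illustrative helper (not part of the SDK): apply one balance change and log it per block
function applyDelta(address: string, blockNumber: number, delta: bigint) {
  const row = db.prepare('SELECT balance FROM balances WHERE address = ?').get(address) as { balance: string } | undefined
  db.prepare('INSERT INTO balances (address, balance) VALUES (?, ?) ON CONFLICT(address) DO UPDATE SET balance = excluded.balance')
    .run(address, ((row ? BigInt(row.balance) : 0n) + delta).toString())

  // Accumulate the net change for this (address, block) so rollbackTo() can invert it later
  const prev = db.prepare('SELECT delta FROM balance_deltas WHERE address = ? AND block_number = ?').get(address, blockNumber) as { delta: string } | undefined
  db.prepare('INSERT INTO balance_deltas (address, block_number, delta) VALUES (?, ?, ?) ON CONFLICT(address, block_number) DO UPDATE SET delta = excluded.delta')
    .run(address, blockNumber, ((prev ? BigInt(prev.delta) : 0n) + delta).toString())
}

// Inside transform(): one SQLite transaction per batch, then mark the blocks as processed
db.transaction(() => {
  for (const t of transfers) {  // `transfers` is the decoded batch — illustrative
    applyDelta(t.from, t.blockNumber, -t.value)
    applyDelta(t.to, t.blockNumber, t.value)
    db.prepare('INSERT OR IGNORE INTO processed_blocks (block_number) VALUES (?)').run(t.blockNumber)
  }
})()
```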
### Crash recovery in start()
Compare the SQLite high-water mark with the pipeline cursor. If SQLite is ahead, roll back to match:
```typescript theme={"system"}
start: async ({ state }) => {
const sqliteLastBlock = db.prepare('SELECT MAX(block_number) as m FROM processed_blocks').get().m ?? null
const pipelineLastBlock = state.current?.number ?? null
if (sqliteLastBlock !== null && (pipelineLastBlock === null || sqliteLastBlock > pipelineLastBlock)) {
rollbackTo(pipelineLastBlock ?? -1) // SQLite crashed ahead of cursor — roll back
} else if (sqliteLastBlock !== pipelineLastBlock) {
throw new Error(`State mismatch: SQLite=${sqliteLastBlock}, cursor=${pipelineLastBlock}. Delete the SQLite file to rebuild.`)
}
}
```
### Historical-only variant
If you're indexing only finalized data and will never see forks, drop the delta table and accept that a crash requires rebuilding from scratch:
```typescript theme={"system"}
start: async ({ state }) => {
  const sqliteLastBlock = db.prepare('SELECT MAX(block_number) as m FROM processed_blocks').get().m ?? null
  const pipelineLastBlock = state.current?.number ?? null
  if (sqliteLastBlock !== pipelineLastBlock) {
    throw new Error(`Delete ${SQLITE_DB_PATH} to rebuild.`)
  }
}
```
* [`09.stateful-transform-on-sqlite.example.ts`](https://github.com/subsquid-labs/pipes-sdk-docs/blob/master/src/advanced/evm/09.stateful-transform-on-sqlite.example.ts) — full delta-table implementation with fork and crash recovery
* [`10.stateful-transform-on-sqlite-no-forks.example.ts`](https://github.com/subsquid-labs/pipes-sdk-docs/blob/master/src/advanced/evm/10.stateful-transform-on-sqlite-no-forks.example.ts) — simpler historical-only variant
***
## D. Postgres/Drizzle target
When your target is already Postgres, the `drizzleTarget` can commit transformer state and output rows inside the **same serializable transaction** as the cursor save. This gives the strongest atomicity guarantees of any approach: a crash between `transform()` and the cursor save is impossible because both commit together.
**When to use:**
* Postgres is your output target.
* You want zero crash recovery code (atomicity handles it automatically).
* Fork rollback should be automatic (drizzleTarget installs snapshot triggers).
**When not to use:**
* Your target is ClickHouse or another non-Postgres database.
* State is too large for Postgres (rare).
### The WriteQueue / initQueue pattern
Multiple stateful transformers all need to write inside the same transaction. The `WriteQueue` collects their write closures; `initQueue` wraps each batch in a `Piped` with a fresh queue; `onData` flushes everything:
```typescript theme={"system"}
class WriteQueue {
  private ops: Array<(tx: Transaction) => Promise<void>> = []
  push(op: (tx: Transaction) => Promise<void>): void { this.ops.push(op) }
  async flush(tx: Transaction): Promise<void> { for (const op of this.ops) await op(tx) }
}

function initQueue<T>() {
  return createTransformer<T, Piped<T>>({
    transform: (data) => ({ payload: data, writes: new WriteQueue() }),
  })
}

// Pipeline:
stream
  .pipe(initQueue())
  .pipe(transformerA(db))
  .pipe(transformerB(db))
  .pipeTo(drizzleTarget({
    db,
    tables: [tableA, tableB],
    onData: async ({ tx, data }) => { await data.writes.flush(tx) },
  }))
```
`onData` stays a one-liner regardless of how many transformers are chained.
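For illustration, a hedged sketch of what one such transformer might look like. `Piped`, `WriteQueue`, and `createTransformer` come from the pattern above, while the `DecodedBatch` type, `transfersTable`, and the field names are assumptions:
```typescript theme={"system"}
// Illustrative stateful transformer: it never writes directly, only queues closures
// that drizzleTarget will run inside the shared transaction.
const transferWriter = createTransformer<Piped<DecodedBatch>, Piped<DecodedBatch>>({
  transform: (data) => {
    const rows = data.payload.transfers.map((t) => ({
      from: t.from,
      to: t.to,
      value: t.value.toString(),
    }))
    data.writes.push(async (tx) => {
      if (rows.length) await tx.insert(transfersTable).values(rows)
    })
    return data // pass Piped<T> through unchanged so downstream transformers see the same batch
  },
})
```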
### Sub-approach 1 — Stateless transform (per-batch DB reads)
`transform()` reads current state from Postgres (the last committed snapshot), computes the delta, and pushes write closures to the queue. No in-RAM Map survives between batches.
* ✓ No RAM limit on state size
* ✓ Zero fork handling code (snapshot triggers on `tables` cover rollback)
* ✗ One `SELECT … WHERE address IN (…)` per batch
See [`11.stateful-transforms-postgres-stateless.example.ts`](https://github.com/subsquid-labs/pipes-sdk-docs/blob/master/src/advanced/evm/11.stateful-transforms-postgres-stateless.example.ts): two transformers (`BalanceTransformer` + `TransferCountTransformer`) reading from Postgres each batch and writing atomically via WriteQueue.
### Sub-approach 2 — In-memory + Postgres mirror
`start()` loads the full state into in-RAM Maps. `transform()` reads/writes the Maps with no DB round trips per batch. `fork()` reloads the Maps from Postgres after drizzleTarget commits the snapshot rollback.
* ✓ No per-batch DB reads — all reads from memory after startup
* ✓ Fast for large batches with many distinct keys
* ✗ Full state must fit in RAM
* ✗ Startup time is O(state size)
* ✗ `fork()` callbacks required to resync Maps after rollback
See [`12.stateful-transforms-postgres-in-memory.example.ts`](https://github.com/subsquid-labs/pipes-sdk-docs/blob/master/src/advanced/evm/12.stateful-transforms-postgres-in-memory.example.ts): same two transformers with in-RAM Maps loaded from Postgres at startup and reloaded on fork.
For both sub-approaches, all state tables must be listed in `drizzleTarget`'s `tables` array. This installs PostgreSQL snapshot triggers that roll them back automatically on a blockchain reorg. The `onStart` callback can run `CREATE TABLE IF NOT EXISTS` for quick setup; in production, use [drizzle-kit migrations](https://orm.drizzle.team/docs/migrations) instead.
***
## E. Apache Flink
[Apache Flink](https://flink.apache.org) is a distributed stateful stream-processing framework. The Pipes SDK acts as a data source feeding Flink via Kafka or a direct connector.
**When to use:**
* State is too large for a single machine (terabytes).
* Your problem requires stateful joins across multiple independent streams (e.g., correlate DEX trades with lending liquidations across different chains).
* You need exactly-once semantics across multiple heterogeneous sinks.
**When not to use:**
* Single-node deployments — the operational overhead (JVM runtime, cluster management, ZooKeeper or KRaft, checkpoint storage) is only justified when the problem genuinely requires distributed state.
**Architecture:** The Pipes SDK emits raw events to Kafka (one topic per event type). On a blockchain fork, it emits compensating rows (e.g., `sign = -1`) that Flink sees as normal data and can handle with a subtract-and-recompute pattern. Flink manages its own checkpoints; crash recovery is handled entirely by Flink.
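As a rough illustration of the source side, a hedged sketch of a `createTarget` that publishes decoded events to Kafka with `kafkajs`. The topic name, the `data.transfers` output field, the message shape, and the choice to consume finalized data only (so no compensating rows are needed) are all assumptions:
```typescript theme={"system"}
import { Kafka } from 'kafkajs'
import { createTarget } from '@subsquid/pipes'

const kafka = new Kafka({ clientId: 'pipes-feed', brokers: ['localhost:9092'] })
const producer = kafka.producer()

const kafkaTarget = createTarget({
  write: async ({ read }) => {
    await producer.connect()
    for await (const { data, ctx } of read()) {
      await producer.send({
        topic: 'erc20-transfers', // illustrative topic
        messages: data.transfers.map((t: any) => ({
          // sign = 1 marks a forward row; on a fork you would re-emit affected rows
          // with sign = -1 so Flink can subtract them
          value: JSON.stringify({ sign: 1, ...t }, (_, v) => (typeof v === 'bigint' ? v.toString() : v)),
        })),
      })
      console.log('published up to block', ctx.stream.state.current?.number)
    }
  },
})
```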
***
## F. External KV store
Use Redis, Valkey, or a similar key-value store as a fast external state backend.
**When to use:**
* Multiple parallel pipeline instances must share state (horizontal scaling of the indexer).
* Per-key lookups must complete in under 1 ms (e.g., enriching 50 k events per second with metadata from a 100 M-entry map that doesn't fit in RAM).
**When not to use:**
* A single-process indexer is sufficient — adding Redis increases operational complexity for no benefit.
* You need transactional state + output commits (use approach D instead).
**Fork handling:** The transformer's `fork()` callback must delete or undo the Redis keys written for rolled-back blocks. Keep a per-block write log (similar to the SQLite delta table) to know which keys to revert.
**Crash safety:** Redis is not durable by default. Enable AOF or RDB persistence, or treat Redis purely as a warm cache and accept that a Redis restart requires a replay from the pipeline cursor.
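A hedged sketch of the per-block write log, assuming `ioredis`; the key naming scheme is illustrative:
```typescript theme={"system"}
import Redis from 'ioredis'

const redis = new Redis() // localhost:6379 by default

// During transform(): write the value and remember what it replaced, keyed by block
async function setWithLog(blockNumber: number, key: string, value: string) {
  const prev = await redis.get(key)
  await redis
    .multi()
    .set(key, value)
    .rpush(`writelog:${blockNumber}`, JSON.stringify({ key, prev }))
    .exec()
}

// In the transformer's fork() callback: undo everything written above the rollback cursor
async function revertAbove(rollbackBlock: number, lastWrittenBlock: number) {
  for (let b = lastWrittenBlock; b > rollbackBlock; b--) {
    const entries = await redis.lrange(`writelog:${b}`, 0, -1)
    for (const raw of entries.reverse()) {
      const { key, prev } = JSON.parse(raw)
      if (prev === null) await redis.del(key)
      else await redis.set(key, prev)
    }
    await redis.del(`writelog:${b}`)
  }
}
```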
***
## Fork callbacks and crash recovery
The fork handling responsibilities differ by approach:
| Approach | `fork()` needed in transformer | How DB state is rolled back |
| ------------------- | ------------------------------------- | ---------------------------------- |
| A. In-RAM | ✓ — prune entries > cursor | n/a (state is RAM-only) |
| B. ClickHouse MVs | ✗ — handled in `onRollback` | `sign = -1` rows via `onRollback` |
| C. SQLite | ✓ — calls `rollbackTo(cursor.number)` | `rollbackTo()` reverts delta table |
| D. Postgres/drizzle | ✗ — automatic | snapshot triggers via `tables` |
| E. Flink | ✗ — compensating events | Flink checkpoint rollback |
| F. External KV | ✓ — revert write log | manual key deletion |
**Ordering guarantee:** `target.fork()` always fires before transformer `fork()` callbacks. By the time your transformer's `fork()` runs, the target (ClickHouse `onRollback`, drizzleTarget snapshot rollback) has already committed the database rollback. It is safe to read the database in `fork()`.
**Crash recovery** (approaches A, B, C only): a crash between the transformer's store write and the target's cursor save leaves state ahead of the pipeline cursor. Handle this in `start()` by comparing your local high-water mark to `state.current`:
```typescript theme={"system"}
start: async ({ state }) => {
const localLastBlock = /* read your checkpoint */
const pipelineLastBlock = state.current?.number ?? null
if (localLastBlock !== null && (pipelineLastBlock === null || localLastBlock > pipelineLastBlock)) {
rollbackTo(pipelineLastBlock ?? -1) // crash recovery: undo ahead-of-cursor writes
}
}
```
Approaches D (Postgres/drizzle) and E (Flink) are immune to this problem: state and cursor commit atomically.
***
## Composing multiple stateful transformers
When multiple stateful transformers write to the same target, they must not each independently call the target's write API. Use the `WriteQueue` / `initQueue` pattern to collect all writes and flush them in a single `onData` call:
1. **`initQueue()`** wraps the raw batch in `Piped<T>` with a fresh `WriteQueue`. Place it as the first `.pipe()`.
2. **Each transformer** receives `Piped<T>`, pushes closures to `writes`, and returns `Piped<T>` unchanged.
3. **`onData`** calls `data.writes.flush(store_or_tx)` — a one-liner that scales to any number of transformers.
```typescript theme={"system"}
type Piped<T> = { payload: T; writes: WriteQueue }

function initQueue<T>() {
  return createTransformer<T, Piped<T>>({
    transform: (data) => ({ payload: data, writes: new WriteQueue() }),
  })
}
```
Because every domain transformer takes `Piped<T>` as input and produces `Piped<T>` as output, none of them assume a fixed position in the chain — they are all order-independent and can be added or removed without touching the others.
For Postgres targets, `WriteQueue` closures take a `Transaction`; for ClickHouse, they take the structural `CHS` type shown in approach A. The pattern is identical in both cases.
# Cursor management
Source: https://docs.sqd.dev/en/sdk/pipes-sdk/evm/guides/architecture-deep-dives/cursor-management
How pipelines track progress and resume after restarts
A cursor records the last successfully processed block. Built-in targets (ClickHouse, Drizzle) handle persistence automatically. When using `createTarget` directly you own the full lifecycle.
## The cursor object
```typescript theme={"system"}
type BlockCursor = {
number: number // stream resumes from number + 1
hash?: string // block hash — used as parentBlockHash for fork detection
timestamp?: number // block timestamp in seconds
}
```
`hash` is the fork detection tripwire: the SDK sends `parentBlockHash = cursor.hash` in each portal request. An absent hash silently skips fork detection for that request. See [cursor semantics](./fork-handling#5-cursor-semantics) for the full picture.
## Startup: range.from and stored cursors
`range.from` in the decoder sets where the stream begins on a first run — before any cursor exists:
```typescript theme={"system"}
evmDecoder({
range: { from: 'latest' }, // chain head
// range: { from: 20_000_000 }, // block number
// range: { from: '2024-01-01' }, // ISO date string
// range: { from: new Date() }, // Date object
})
```
Once a cursor is stored, `range.from` is ignored — the stream resumes from `cursor.number + 1`.
## The stream id
The `id` on `evmPortalStream` is the primary key for all stored state:
```typescript theme={"system"}
evmPortalStream({ id: 'my-pipeline', ... })
```
Both built-in targets use it to isolate state records, so multiple streams can share one physical table. **Never rename an active stream's id** — the stored cursor is keyed on it, and renaming causes the pipeline to restart from `range.from`.
## ClickHouse target
`clickhouseTarget` saves the cursor after every successful `onData` call and resolves fork and crash-recovery callbacks automatically.
```typescript theme={"system"}
clickhouseTarget({
client,
settings: {
id: 'my-stream', // overrides evmPortalStream id (default: 'stream')
table: 'sync', // state table name (default: 'sync')
database: 'default', // ClickHouse database
maxRows: 10_000, // cursor rows to keep per stream id (default: 10,000)
},
onData: ...,
onRollback: ...,
})
```
**`onRollback` is called in two situations:**
* `type: 'offset_check'` — on every startup when a cursor exists. ClickHouse is non-transactional: a crash between `onData` and the cursor save leaves rows newer than the saved cursor. Delete them here. See [non-transactional databases](./fork-handling#4-state-rollback-atomicity).
* `type: 'blockchain_fork'` — when the portal signals a reorg. The rollback cursor is resolved automatically from stored history; your callback only needs to delete rows after `safeCursor.number`.
The same implementation typically serves both:
```typescript theme={"system"}
onRollback: async ({ store, safeCursor }) => {
await store.removeAllRows({
tables: ['my_table'],
where: `block_number > {n:UInt32}`,
params: { n: safeCursor.number },
})
},
```
**State table.** Each row stores the cursor, the last finalized block, and the unfinalized block history used for fork recovery. Rows beyond `maxRows` are pruned every 25 saves. Set `maxRows` to cover your network's worst-case reorg depth — see [rollback depth](./fork-handling#3-rollback-depth-and-history-limits).
## Drizzle target
`drizzleTarget` saves the cursor inside the same PostgreSQL transaction as the data write — fully atomic, no crash-recovery pass needed.
```typescript theme={"system"}
drizzleTarget({
db: drizzle(DB_URL),
tables: [transfersTable], // every table onData writes to — required
settings: {
state: {
id: 'my-stream',
schema: 'public',
table: 'sync',
unfinalizedBlocksRetention: 1000, // cursor rows to keep (default: 1,000)
},
transaction: { isolationLevel: 'serializable' }, // default
},
onData: async ({ tx, data }) => {
await tx.insert(transfersTable).values(...)
},
})
```
**`tables` is required** for every table written in `onData`. At startup the target installs a PostgreSQL trigger on each listed table; the trigger copies the pre-change row into a `__snapshots` table (keyed by block number and primary key). On a fork the target replays these snapshots in reverse, restoring pre-fork state automatically. Writing to a table not in `tables` raises a runtime error.
Snapshotting only fires for blocks at or above the current finalized head — historical blocks can never be reorged.
**Advisory lock.** Every batch acquires `pg_try_advisory_xact_lock(hashtext(id))` inside the transaction, preventing concurrent writers on the same stream. Two `drizzleTarget` instances sharing the same `id` will serialize correctly; two with different `id`s run independently.
**Retention.** Snapshot rows below `min(current, finalizedHead) - unfinalizedBlocksRetention` are deleted every 25 batches. Set this to cover your network's worst-case reorg depth.
**Rollback hooks.** `onBeforeRollback` and `onAfterRollback` receive `{ tx, cursor }` and run inside the fork transaction. Use them to perform additional cleanup that the snapshot mechanism cannot cover (e.g., rows in tables not tracked by `tables`).
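For example, a hedged sketch of `onAfterRollback` cleaning up a side table that the snapshot mechanism does not cover. This assumes the hooks are passed as top-level options alongside `onData`; `dailyStatsTable`, its `blockNumber` column, and the `data.transfers` field are illustrative:
```typescript theme={"system"}
import { gt } from 'drizzle-orm'

drizzleTarget({
  db,
  tables: [transfersTable],
  onData: async ({ tx, data }) => {
    await tx.insert(transfersTable).values(data.transfers)
  },
  onAfterRollback: async ({ tx, cursor }) => {
    // dailyStatsTable is maintained outside the snapshot mechanism (illustrative),
    // so its rows above the rollback point must be removed manually
    await tx.delete(dailyStatsTable).where(gt(dailyStatsTable.blockNumber, cursor.number))
  },
})
```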
## Async iterator
When consuming a pipeline with `for await...of` instead of `pipeTo`, the native `[Symbol.asyncIterator]()` always calls `read()` with no cursor — it has no way to accept one. The stream therefore starts from `range.from` on every run.
**Finalized streams.** If the stream only consumes already-finalized blocks (no forks possible), rebuilding the stream with `range.from` set to the stored cursor is sufficient:
```typescript theme={"system"}
let cursor = loadCursor() // BlockCursor | undefined
const stream = evmPortalStream({
id: 'my-pipeline',
portal: '...',
outputs: evmDecoder({
// cursor.number is the last processed block; resume from the next one
range: { from: cursor ? cursor.number + 1 : 0 },
}),
})
for await (const { data, ctx } of stream) {
await processData(data)
saveCursor(ctx.stream.state.current) // { number, hash, timestamp }
}
```
Save `ctx.stream.state.current` — the full `BlockCursor` of the batch's last block — not just the number. The `hash` is needed if you later switch to real-time or need the cursor as a fork anchor.
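A hedged sketch of the `loadCursor`/`saveCursor` helpers used above, persisting the full cursor to a local JSON file; the file path is illustrative:
```typescript theme={"system"}
import { existsSync, readFileSync, writeFileSync } from 'node:fs'
import type { BlockCursor } from '@subsquid/pipes'

const CURSOR_FILE = './cursor.json'

function loadCursor(): BlockCursor | undefined {
  if (!existsSync(CURSOR_FILE)) return undefined
  return JSON.parse(readFileSync(CURSOR_FILE, 'utf8'))
}

function saveCursor(cursor: BlockCursor | undefined) {
  if (!cursor) return
  writeFileSync(CURSOR_FILE, JSON.stringify(cursor)) // keeps number, hash, and timestamp
}
```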
**Real-time streams.** Setting `range.from` to a stored number loses the block hash. On restart the first request carries no `parentBlockHash`, so fork detection is silently disabled for that request. For real-time streams, use the `pipeToIterator` helper from the [async iteration tab of the fork handling guide](./fork-handling), which accepts an `initialCursor` and passes it directly to `read()` inside `pipeTo`:
```typescript theme={"system"}
const stream = pipeToIterator(
evmPortalStream({ id: 'my-pipeline', portal: '...', outputs: evmDecoder({ range: { from: 'latest' } }) }),
loadCursor(), // full BlockCursor with hash — passed to read(), not range.from
onFork,
)
for await (const { data, ctx } of stream) {
await processData(data)
saveCursor(ctx.stream.state.current)
}
```
`pipeToIterator` preserves `parentBlockHash` across fork rounds because it uses `pipeTo` internally. On a fresh first run, pass `undefined` as `initialCursor` and the stream begins from `range.from` as normal.
## Custom cursor management
When using `createTarget` directly, you own the full cursor lifecycle.
At the start of `write`, fetch the stored cursor and pass it to `read`:
```typescript theme={"system"}
write: async ({ read }) => {
const cursor = await db.getLatestCursor()
for await (const { data, ctx } of read(cursor)) {
// ...
}
}
```
After processing each batch, persist the cursor together with the fork-recovery state:
```typescript theme={"system"}
await db.transaction(async (tx) => {
await writeData(tx, data)
await tx.saveCursor({
cursor: ctx.stream.state.current,
rollbackChain: ctx.stream.state.rollbackChain,
finalized: ctx.stream.head.finalized,
})
})
```
For **transactional stores** (Postgres): save all three fields in the same transaction as the data write. For **non-transactional stores** (ClickHouse): write data first, cursor last, and implement a startup check that detects and corrects any data written after the last cursor save. See [state rollback atomicity](./fork-handling#4-state-rollback-atomicity).
The `fork` callback and the algorithm for resolving rollback cursors from stored history are covered in detail in the [fork handling guide](./fork-handling).
Example pipelines:
* A minimal example showing manual cursor passing in `createTarget`
* A full pipeline with `onRollback` and `onData`
* A full pipeline including a GraphQL API
# Fork handling
Source: https://docs.sqd.dev/en/sdk/pipes-sdk/evm/guides/architecture-deep-dives/fork-handling
Handle blockchain forks and rollbacks in real-time streams
When consuming a real-time stream near the chain head, the portal can detect that the client's view of the chain has diverged from the canonical chain — a situation known as a fork or reorg. The portal signals this with an HTTP 409 response containing a sample of blocks from the new canonical chain. Your code must find the highest block that both chains agree on, roll back any state written after that point, and replay from there.
Fork handling is only needed for real-time streams (`range.from: 'latest'`). Historical streams consume already-finalized data and never produce forks. See [Fork detection scope](#7-fork-detection-scope-real-time-streams-only) below.
The SDK provides two patterns for consuming a stream. Both use the same state-tracking logic; they differ in how the fork signal is delivered.
If your pipeline includes a [stateful transformer](../advanced-topics/stateful-transforms#fork-callbacks-in-stateful-transformers), it must also implement a `fork` callback to roll back its own state in lockstep with the target.
The `pipeTo(createTarget({write, fork}))` pattern keeps fork handling completely separate from batch processing. The SDK catches the 409 internally and calls `fork()` with the portal's consensus block sample; `write()` never sees the interruption and continues iterating batches without restarting.
Two variables span the lifetime of the stream:
```typescript theme={"system"}
let recentUnfinalizedBlocks: BlockCursor[] = []
let finalizedHighWatermark: BlockCursor | undefined
```
`recentUnfinalizedBlocks` is the local history of unfinalized blocks used to find the common ancestor during a fork. `finalizedHighWatermark` tracks the highest finalized block ever seen — stored as a full `BlockCursor` (number **and** hash) so it can double as a rollback cursor when needed. Both must be declared outside `pipeTo` so `fork()` can access them.
Inside `write()`, append each batch's unfinalized blocks to the local history:
```typescript theme={"system"}
ctx.stream.state.rollbackChain.forEach((bc) => {
recentUnfinalizedBlocks.push(bc)
})
```
`ctx.stream.state.rollbackChain` contains only the blocks from **this batch** that are above the current finalized head — it is a per-batch delta, not a full snapshot. Always append to the end; never replace or reorder.
After collecting history, prune blocks that are now finalized and cap the queue:
```typescript theme={"system"}
if (ctx.stream.head.finalized) {
if (!finalizedHighWatermark || ctx.stream.head.finalized.number > finalizedHighWatermark.number) {
finalizedHighWatermark = ctx.stream.head.finalized
}
recentUnfinalizedBlocks = recentUnfinalizedBlocks.filter(b => b.number >= finalizedHighWatermark!.number)
}
recentUnfinalizedBlocks = recentUnfinalizedBlocks.slice(-1000) // keep at most the last 1000 entries
```
Portal instances behind a load balancer can report different finalized heads. Using the **maximum** seen so far (the high-water mark) prevents the pruning threshold from moving backwards when the stream reconnects to a lagging instance. See [consideration 6](#6-load-balanced-portals-and-a-non-monotonic-finalized-head) for details.
`fork()` receives `previousBlocks` — the portal's current-chain sample — and must return the last good block cursor, or `null` if recovery is impossible:
```typescript theme={"system"}
fork: async (newConsensusBlocks) => {
const rollbackIndex = findRollbackIndex(recentUnfinalizedBlocks, newConsensusBlocks)
if (rollbackIndex >= 0) {
recentUnfinalizedBlocks.length = rollbackIndex + 1
return recentUnfinalizedBlocks[rollbackIndex]
}
if (finalizedHighWatermark &&
newConsensusBlocks.every(b => b.number < finalizedHighWatermark!.number)) {
recentUnfinalizedBlocks = recentUnfinalizedBlocks.filter(b => b.number <= finalizedHighWatermark!.number)
return finalizedHighWatermark
}
return null
}
```
Three cases: (1) a common ancestor is found in local history — truncate and return it; (2) all `previousBlocks` fall below the finalized high-water mark, meaning the portal's sample doesn't reach local history — return the high-water mark cursor; (3) no recovery possible — return `null`, which surfaces a `ForkCursorMissingError`.
```typescript theme={"system"}
import { BlockCursor, createTarget } from '@subsquid/pipes'
import { evmPortalStream, evmDecoder, commonAbis } from '@subsquid/pipes/evm'
async function main() {
let recentUnfinalizedBlocks: BlockCursor[] = []
let finalizedHighWatermark: BlockCursor | undefined
await evmPortalStream({
id: 'forks',
portal: 'https://portal.sqd.dev/datasets/ethereum-mainnet',
outputs: evmDecoder({
contracts: ['0xa0b86991c6218b36c1d19d4a2e9eb0ce3606eb48'], // USDC
events: { transfer: commonAbis.erc20.events.Transfer },
range: { from: 'latest' }
}),
})
.pipeTo(createTarget({
write: async ({read}) => {
for await (const {data, ctx} of read(recentUnfinalizedBlocks[recentUnfinalizedBlocks.length-1])) {
console.log(`Got ${data.transfer.length} transfers`)
ctx.stream.state.rollbackChain.forEach((bc) => { recentUnfinalizedBlocks.push(bc) })
if (ctx.stream.head.finalized) {
if (!finalizedHighWatermark || ctx.stream.head.finalized.number > finalizedHighWatermark.number) {
finalizedHighWatermark = ctx.stream.head.finalized
}
recentUnfinalizedBlocks = recentUnfinalizedBlocks.filter(b => b.number >= finalizedHighWatermark!.number)
}
recentUnfinalizedBlocks = recentUnfinalizedBlocks.slice(-1000)
}
},
fork: async (newConsensusBlocks) => {
const rollbackIndex = findRollbackIndex(recentUnfinalizedBlocks, newConsensusBlocks)
if (rollbackIndex >= 0) {
recentUnfinalizedBlocks.length = rollbackIndex + 1
return recentUnfinalizedBlocks[rollbackIndex]
}
if (finalizedHighWatermark &&
newConsensusBlocks.every(b => b.number < finalizedHighWatermark!.number)) {
recentUnfinalizedBlocks = recentUnfinalizedBlocks.filter(b => b.number <= finalizedHighWatermark!.number)
return finalizedHighWatermark
}
return null
}
}))
}
main().then(() => { console.log('\ndone') })
function findRollbackIndex(chainA: BlockCursor[], chainB: BlockCursor[]): number {
let aIndex = 0, bIndex = 0, lastCommonIndex = -1
while (aIndex < chainA.length && bIndex < chainB.length) {
const a = chainA[aIndex], b = chainB[bIndex]
if (a.number < b.number) { aIndex++; continue }
if (a.number > b.number) { bIndex++; continue }
if (a.hash !== b.hash) return lastCommonIndex
lastCommonIndex = aIndex; aIndex++; bIndex++
}
return lastCommonIndex
}
```
The native `[Symbol.asyncIterator]()` on a `PortalSource` cannot handle forks that require multiple 409 rounds. After a fork, the only option with native iteration is to re-create the stream — but the re-created stream's first request carries no `parentBlockHash`, so the portal cannot detect whether the client is still on the wrong chain and will not send the second 409.
The root cause: `pipeTo`'s internal `read()` generator maintains a `cursor` variable across fork rounds. After `target.fork()` returns a rollback cursor it sets `cursor = forkedCursor` before re-entering `self.read(cursor)`, keeping `parentBlockHash` populated on every subsequent request. The native async iterator calls `this.read()` with no cursor and has no equivalent mechanism.
**Workaround:** wrap `pipeTo` in a helper called `pipeToIterator` that bridges its push-based `write()` into a pull-based iterator via a single-item queue with producer acknowledgement. This preserves the `for await...of` interface while using `pipeTo`'s cursor-tracking machinery internally.
Same two variables as the `pipeTo` approach — no extra `resumeCursor` needed, since `pipeTo` handles cursor updates internally:
```typescript theme={"system"}
let recentUnfinalizedBlocks: BlockCursor[] = []
let finalizedHighWatermark: BlockCursor | undefined
```
The fork callback passed to `pipeToIterator` is identical to `fork()` in the `pipeTo` example — the same three-case logic, the same state mutations:
```typescript theme={"system"}
async (newConsensusBlocks) => {
const rollbackIndex = findRollbackIndex(recentUnfinalizedBlocks, newConsensusBlocks)
if (rollbackIndex >= 0) {
recentUnfinalizedBlocks.length = rollbackIndex + 1
return recentUnfinalizedBlocks[rollbackIndex]
}
if (finalizedHighWatermark &&
newConsensusBlocks.every(b => b.number < finalizedHighWatermark!.number)) {
recentUnfinalizedBlocks = recentUnfinalizedBlocks.filter(b => b.number <= finalizedHighWatermark!.number)
return finalizedHighWatermark
}
return null
}
```
The SDK awaits this callback before resuming the stream, so `recentUnfinalizedBlocks` is safe to mutate here without additional locking.
Pass the source, the initial cursor, and the fork callback to `pipeToIterator`, then iterate normally:
```typescript theme={"system"}
const stream = pipeToIterator(source, recentUnfinalizedBlocks.at(-1), onFork)
for await (const {data, ctx} of stream) {
// batch processing — identical to the pipeTo example
}
```
```typescript theme={"system"}
// WORKAROUND — see explanation above the tab
function pipeToIterator<T>(
  source: { pipeTo(t: ReturnType<typeof createTarget>): Promise<void> },
  initialCursor: BlockCursor | undefined,
  onFork: (previousBlocks: BlockCursor[]) => Promise<BlockCursor | null>
): AsyncIterableIterator<{ data: T; ctx: any }> {
type Slot =
| { k: 'batch'; v: { data: T; ctx: any } }
| { k: 'end' }
| { k: 'error'; err: unknown }
const queue: Slot[] = []
let consumerWake: (() => void) | null = null
let producerAck: (() => void) | null = null
const wake = () => { consumerWake?.(); consumerWake = null }
;(source.pipeTo as any)(createTarget({
write: async ({ read }: any) => {
for await (const batch of read(initialCursor)) {
queue.push({ k: 'batch', v: batch })
wake()
await new Promise(r => { producerAck = r })
}
queue.push({ k: 'end' })
wake()
},
fork: onFork,
})).catch((err: unknown) => { queue.push({ k: 'error', err }); wake() })
return {
async next(): Promise<IteratorResult<{ data: T; ctx: any }>> {
if (!queue.length) await new Promise(r => { consumerWake = r })
const slot = queue.shift()!
if (slot.k === 'end') return { done: true, value: undefined as any }
if (slot.k === 'error') throw slot.err
producerAck?.(); producerAck = null
return { done: false, value: slot.v }
},
[Symbol.asyncIterator]() { return this },
}
}
```
```typescript theme={"system"}
import { BlockCursor, createTarget } from '@subsquid/pipes'
import { evmPortalStream, evmDecoder, commonAbis } from '@subsquid/pipes/evm'
// WORKAROUND — pipeToIterator defined above (see implementation expandable)
async function main() {
let recentUnfinalizedBlocks: BlockCursor[] = []
let finalizedHighWatermark: BlockCursor | undefined
const stream = pipeToIterator(
evmPortalStream({
id: 'forks-async',
portal: 'https://portal.sqd.dev/datasets/ethereum-mainnet',
outputs: evmDecoder({
contracts: ['0xa0b86991c6218b36c1d19d4a2e9eb0ce3606eb48'], // USDC
events: { transfer: commonAbis.erc20.events.Transfer },
range: { from: 'latest' }
}),
}),
recentUnfinalizedBlocks.at(-1),
async (newConsensusBlocks) => {
const rollbackIndex = findRollbackIndex(recentUnfinalizedBlocks, newConsensusBlocks)
if (rollbackIndex >= 0) {
recentUnfinalizedBlocks.length = rollbackIndex + 1
return recentUnfinalizedBlocks[rollbackIndex]
}
if (finalizedHighWatermark &&
newConsensusBlocks.every(b => b.number < finalizedHighWatermark!.number)) {
recentUnfinalizedBlocks = recentUnfinalizedBlocks.filter(b => b.number <= finalizedHighWatermark!.number)
return finalizedHighWatermark
}
recentUnfinalizedBlocks.length = 0
return null
}
)
for await (const {data, ctx} of stream) {
console.log(`Got ${data.transfer.length} transfers`)
ctx.stream.state.rollbackChain.forEach((bc: BlockCursor) => { recentUnfinalizedBlocks.push(bc) })
if (ctx.stream.head.finalized) {
if (!finalizedHighWatermark || ctx.stream.head.finalized.number > finalizedHighWatermark.number) {
finalizedHighWatermark = ctx.stream.head.finalized
}
recentUnfinalizedBlocks = recentUnfinalizedBlocks.filter(b => b.number >= finalizedHighWatermark!.number)
}
recentUnfinalizedBlocks = recentUnfinalizedBlocks.slice(-1000)
}
}
main().then(() => { console.log('\ndone') })
function findRollbackIndex(chainA: BlockCursor[], chainB: BlockCursor[]): number {
let aIndex = 0, bIndex = 0, lastCommonIndex = -1
while (aIndex < chainA.length && bIndex < chainB.length) {
const a = chainA[aIndex], b = chainB[bIndex]
if (a.number < b.number) { aIndex++; continue }
if (a.number > b.number) { bIndex++; continue }
if (a.hash !== b.hash) return lastCommonIndex
lastCommonIndex = aIndex; aIndex++; bIndex++
}
return lastCommonIndex
}
```
## The common-ancestor search
Both approaches use the same merge-sort scan. Given two ascending-sorted arrays of `BlockCursor` — local history and the portal's `previousBlocks` — `findRollbackIndex` returns the index in local history of the last entry that both chains agree on (same block number **and** hash):
```typescript theme={"system"}
function findRollbackIndex(chainA: BlockCursor[], chainB: BlockCursor[]): number {
let aIndex = 0, bIndex = 0, lastCommonIndex = -1
while (aIndex < chainA.length && bIndex < chainB.length) {
const a = chainA[aIndex], b = chainB[bIndex]
if (a.number < b.number) { aIndex++; continue }
if (a.number > b.number) { bIndex++; continue }
if (a.hash !== b.hash) return lastCommonIndex // chains diverged here
lastCommonIndex = aIndex; aIndex++; bIndex++
}
return lastCommonIndex
}
```
The scan advances the pointer for the lower-numbered entry until both point to the same block number. A hash mismatch means the chains diverged at this number; `lastCommonIndex` holds the last agreement point. Returning `-1` means no common ancestor was found in the sample.
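A tiny worked example (hashes shortened for readability): local history and the portal sample agree at block 101 and diverge at 102, so the function returns the index of block 101 in local history.
```typescript theme={"system"}
const local: BlockCursor[] = [
  { number: 100, hash: '0xaaa' },
  { number: 101, hash: '0xbbb' },
  { number: 102, hash: '0xccc' },
]
const portalSample: BlockCursor[] = [
  { number: 101, hash: '0xbbb' },
  { number: 102, hash: '0xddd' }, // different hash — the chains diverged here
]
findRollbackIndex(local, portalSample) // => 1; local[1] (block 101) is the rollback cursor
```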
## Edge cases and considerations
**Empty history at stream start.** The rollback chain is built batch-by-batch from `ctx.stream.state.rollbackChain`. Until the first batch arrives the history is empty. A fork arriving before any batch has been processed means `fork()` will find no common ancestor and must return `null`, which the SDK turns into a fatal error. For a long-running process this window is typically acceptable, but it matters for freshly started consumers.
**History gaps from fast-moving finalization.** `rollbackChain` in each batch contains only the blocks from *that batch* that are strictly above the current finalized head. A block that was already at or below the finalized head when its batch was fetched will never appear in any rollback chain and will therefore be absent from history. This can leave gaps in the number sequence. Algorithms that assume a contiguous history will fail; always match by both number *and* hash.
**No finalized-head info in a batch.** When `batch.head.finalized` is absent, no history is accumulated. On networks or portal deployments that do not yet surface finality data, the rollback chain stays empty indefinitely. On such networks fork recovery is impossible unless unfinalized blocks are tracked through another mechanism.
**Ascending order, match by hash *and* number.** The API spec requires matching on both. Matching only by number is wrong — different chains can have the same block number. The array is ordered ascending (lowest number first); the last entry is the most recent block the portal knows about.
**`previousBlocks` may have no overlap with local history.** The portal sends a bounded sample. If `findRollbackIndex` finds no agreement point at all (returns -1) and no HWM fallback applies, fork recovery is impossible — return `null`. The SDK will surface a `ForkCursorMissingError`. Do not silently roll back to block 0 or crash.
**Multiple consecutive 409s converge to the common ancestor.** This case is distinct from the no-overlap case above: when `findRollbackIndex` *does* find an overlap point, the stream rolls back there and resumes. If the true common ancestor is deeper still — because the `previousBlocks` sample only reached partway — the portal detects another mismatch and sends a fresh 409 with an older window, this time closer to the true ancestor. The stream converges over several rounds. `fork()` must be idempotent across these calls; truncating the history array in place handles this correctly, since each call receives a shorter local history. Database-backed approaches must also handle re-entrant rollback calls.
**Fork deeper than your history.** If you cap rollback history (e.g. to 1000 blocks), a reorg deeper than the cap is unrecoverable. Choose the cap based on the worst-case reorg depth for your target network. Ethereum mainnet finalizes within \~64 blocks (\~2 epochs), but PoW or pre-finality networks can reorg much deeper. Fail loudly rather than silently replaying from block 0.
**The finalized block as the last-resort anchor.** Keep the current finalized block *in* your rollback history even though it is technically not unfinalized. It is the guaranteed safe floor: the portal will never ask you to roll back past it. Having it available means `fork()` can always return a valid cursor for the deepest possible reorg. Pruning with `number > finalized` instead of `number >= finalized` removes this anchor and makes very deep reorgs unrecoverable.
**History that never gets pruned.** If the portal never sends a finalized head, rollback history will grow without bound. Apply a block-count cap as a secondary safeguard.
**Business state and rollback-chain history must be rolled back atomically.** For databases with transactions (Postgres), both must be updated in the same transaction — a crash between the two leaves `fork()` computing the wrong rollback point.
**For non-transactional databases (ClickHouse), atomicity is not achievable; use a crash-recovery callback instead.** Write application data first, write the rollback-chain checkpoint second. A crash after data but before the checkpoint save leaves the checkpoint pointing to the previous batch. On every restart, before the stream resumes, the checkpoint cursor should be read and used to purge any rows written after it — this closes the gap. This is how the Pipes SDK ClickHouse target works: `onRollback` is invoked with `type: 'offset_check'` on every startup so user code can delete the partial batch. Because ClickHouse `DELETE`s are asynchronous and unsafe under concurrent writes, the SDK inserts tombstone rows (`sign = -1`) via `CollapsingMergeTree` instead of issuing true deletes; queries that need to see only live rows must use the `FINAL` modifier.
**Rolling back spans multiple batches.** A single reorg can invalidate data written across many batches. Your rollback mechanism must undo *all* rows/documents written after the rollback point, not just the last batch.
**Idempotency of re-processing.** After a rollback the stream replays blocks from the rollback cursor forward. Write logic that is not idempotent (e.g. unconditional INSERT instead of UPSERT, incrementing a counter instead of setting it) will corrupt state on replay. Design writes so they are safe to run more than once for the same block.
**Side effects that cannot be rolled back.** Database writes can be undone; emails, webhook calls, and Kafka publishes cannot. Either defer all external side effects until the block is finalized, or build a separate reconciliation layer. Treating unfinalized state as permanent is the most common source of production incidents in real-time blockchain consumers.
**The cursor returned from `fork()` is inclusive.** Return the last block you consider good; the SDK resumes from `cursor.number + 1`. Off-by-one errors cause either duplicate re-processing or skipped blocks.
**The cursor hash must be set.** The SDK sends `parentBlockHash = cursor.hash` in the next request so the portal can detect the next fork. A cursor with a missing hash silently disables fork detection for that request.
**The cursor in `write()`'s `read()` call is only the initial startup cursor.** `pipeTo()` handles post-fork cursor updates inside the `read()` generator; `write()` runs continuously through forks and is never restarted by the SDK. The cursor you pass to `read()` is only relevant if `write()` is re-invoked by an external retry mechanism. For in-memory implementations the cursor is effectively always `undefined`.
**Process restart loses in-memory rollback history.** An in-memory rollback chain survives forks but not process restarts. After a restart you have no history. For services that must survive restarts, persist the rollback chain alongside application state and restore it on startup. See [Cursor management](./cursor-management) for patterns.
**The `X-Sqd-Finalized-Head-Number` header can go backwards.** Portal instances behind a load balancer can be at different heights. When a reconnected stream lands on a lagging instance, the `finalized` value in `batch.head.finalized` may be lower than what was previously reported. Do not use the current batch's finalized number as a pruning threshold directly.
**Treat the finalized head as a high-water mark.** Maintain the highest finalized number seen across all batches and key all pruning on that value. For database-backed implementations this is critical: a DELETE keyed on the current (possibly lower) finalized number will over-retain rows on some batches, and under-retain them if the logic is structured the other way.
**A 409 from a lagging instance may have `previousBlocks` entirely below the high-water mark.** Two cases:
* *All* of `previousBlocks` are strictly below the high-water mark. The lagging instance's sample doesn't reach local history. Because the high-water mark is truly final, every correct instance agrees on it: the fork is somewhere *above* it. Return the high-water mark cursor. This requires storing the finalized head as a full `BlockCursor` (number **and** hash), not just a number — the hash is needed for the next request's `parentBlockHash`.
* Some of `previousBlocks` are at or above the high-water mark but no hash match is found. This is a genuine inconsistency at a height the client already considers final. Return `null` and surface the error.
**Forks only occur in the real-time (unfinalized) portion of the stream.** The `/finalized-stream` endpoint never returns a 409. Fork handling is only needed when consuming the `/stream` endpoint with `fromBlock` near or at the chain head. If your range is bounded and entirely in the past, you will never see a fork.
**`parentBlockHash` is the tripwire.** Every request to the portal includes the hash of the last block the client has seen. A mismatch triggers a 409. Anything that disrupts this — starting from a cursor with a wrong or missing hash, replaying from a checkpoint that has drifted from the chain — will produce spurious fork events.
**`rollbackChain` is per-batch, not cumulative.** It contains only the blocks in *this batch* that are above the current finalized head. Treat it as a delta to append to running history, not as a full snapshot of the current unfinalized chain.
**Blocks near the finality boundary move between finalized and unfinalized.** A block that appears in one batch's `rollbackChain` may be at or below the finalized head in the next batch. The pruning filter must remove these once they are finalized, or rollback history will slowly fill with blocks that can never be the subject of a reorg.
**Empty `rollbackChain` is valid.** It means either (a) the batch contained no blocks above the finalized head, or (b) the finalized head was unknown. Do not treat an empty rollback chain as an error.
**Both arrays must be in ascending order.** The merge-sort scan breaks silently if either array is unsorted. Local history is ascending if you always append to the end; `previousBlocks` from the portal is ascending by protocol convention. After a rollback, the truncated history remains ascending.
**Gaps in block numbers do not break correctness, only efficiency.** A gap (e.g. blocks 100, 101, 103 — 102 missing because it was already finalized) means a fork at 102 resolves by rolling back to 101. The extra re-processing of 102 is harmless because finalized blocks are immutable.
**Duplicate entries break the scan.** If the same block number appears more than once with different hashes in your history, the scan may report the wrong common ancestor. UPSERT rather than INSERT when persisting rollback chain entries to a store.
**Hash comparison requires both sides to be non-null.** `BlockCursor.hash` is optional in the type system. If either side is `undefined`, `undefined !== "0x..."` evaluates to `true`, which looks like a fork on a block that may be fine. Always verify hashes are present before comparing.
**`fork()` is called synchronously relative to the batch stream.** The SDK awaits `fork()` before resuming the stream. No new batches arrive while `fork()` is running. It is safe to mutate shared state inside `fork()` without additional locking.
**`write()` and `fork()` share mutable state without synchronization.** This is safe only because the SDK never calls them concurrently. If you introduce background workers or async tasks that also read or write rollback state, you must add explicit synchronization.
**The order in which you update rollback history and application state matters.** If you update application state first and crash before updating rollback history, the next restart will not know how far to roll back. Prefer database transactions that update both atomically, or update rollback history first so a crash leaves you conservative — you can always re-process a block you have already seen.
# Pipe anatomy
Source: https://docs.sqd.dev/en/sdk/pipes-sdk/evm/guides/basic-development/anatomy
How a pipe is put together
An EVM pipe made with SQD's Pipes SDK consists of:
* A **source** - typically made with `evmPortalSource()`. Can have one or more outputs.
* **Queries** - tell the source which data has to be retrieved to compute each output. A query is defined by a chain call terminated by `.build()`. Here's an example:
```ts theme={"system"}
evmQuery()
.addFields({
block: { timestamp: true },
log: { address: true, transactionHash: true },
})
.addLog({
topic0: [ TRANSFER_TOPIC ]
})
.build()
```
* **Per-query transforms** (optional) - you can pass data from each query through a chain of simple transforms:
```ts theme={"system"}
query
.pipe(data => data.map(item => ({
funkyNumber: item.header.timestamp + item.header.number,
...item
})))
.pipe(someOtherSimpleTransformCallback)
```
The source object streams the data produced by each chain of transforms as the value of the corresponding output field.
* Making utils that return reusable **query-transform combos** is a very useful pattern. In particular, on EVM it is often convenient to keep retrieval and decoding of event logs in a single module. You can easily make such combos with the `evmDecoder()` function - see the [Handling events](./handling-events) guide.
* **Whole pipe transformers** (optional) - use this if you need to compute something based on data originating from multiple queries, or if you need access to per-batch context (cursor, logger, profiler, fork callbacks). Use `createTransformer()` so the SDK can thread cursor and rollback information ([1](../architecture-deep-dives/cursor-management), [2](../architecture-deep-dives/fork-handling)) through your transform:
```ts theme={"system"}
import { createTransformer } from '@subsquid/pipes'
const enrichTransfers = createTransformer<
{ transfers: Transfer[]; approvals: Approval[] },
{ events: EnrichedEvent[] }
>({
transform: ({ transfers, approvals }, ctx) => {
ctx.logger.info({ batch: ctx.stream.state.current?.number }, 'enriching')
return {
events: [
...transfers.map((t) => ({ kind: 'transfer' as const, ...t })),
...approvals.map((a) => ({ kind: 'approval' as const, ...a })),
],
}
},
})
evmPortalStream({ /* ... */ }).pipe(enrichTransfers)
```
* Pipe termination: a plain async iterator or a **target**.
* If you use the pipe as an async iterator it will throw exceptions if the underlying chain is experiencing reorgs, see [Fork handling](../architecture-deep-dives/fork-handling).
* We offer two targets out of the box:
* Postgres via Drizzle
* ClickHouse
You can make your own using [`createTarget()`](../../reference/basic-components/target/create-target).
# Developing pipes
Source: https://docs.sqd.dev/en/sdk/pipes-sdk/evm/guides/basic-development/flow
Typical development workflow for pipes
Begin by making sure you know
* which on-chain data items (transactions, event logs, traces, etc.) you need;
* how you are going to transform these items into usable data, based on your business logic;
* what is your preferred mode of consuming the transformed data.
Use [Pipes CLI](../../quickstart) to quickly generate a starter project. The process of getting from here to a useful data pipeline follows directly from [Pipe anatomy](./anatomy).
## Adding queries and developing transforms
Here are some things to keep in mind as you're developing the heart of your pipeline.
### Writing maintainable pipes
Recall that there are two kinds of transforms in Pipes SDK:
* **Per-query transforms** work on subsets of raw data. They can be bundled with their queries, making them logically self-contained and easily reusable.
* **Whole pipe transforms** process the outputs of all per-query transforms at the same time. This unlocks arbitrary data combinations, but it also means that each such transform might need to be changed whenever any of the upstream transforms changes.
For maximum maintainability you'll have to balance the following two objectives:
1. Aim to push as much of your business logic as possible into per-query transforms. Make the code of any whole pipe transforms as simple as possible.
2. Use ready-made, validated query-transform combos. For example, [evmDecoder()](./handling-events) fetches and decodes contract event logs, supports factory-discovered contracts, and indexed parameter filtering. It'll often be preferable to build your pipeline out of such modules, even if that happens to make whole-pipe transforms slightly more complicated.
### Stateful transforms
It's often the case that you need access to some part of the previously processed data to do the transform. For example, to compute a running ERC20 balance from transfers you need to know its value preceding the current transfer. There are multiple ways to accomplish this in Pipes SDK, each with its advantages and disadvantages. Consult the [Stateful transforms](../advanced-topics/stateful-transforms) guide.
## Writing data
### Postgres and ClickHouse
If you need your transformed data in Postgres or ClickHouse, you should already have a basic configuration generated by [Pipes CLI](../../quickstart).
If you're working with real-time data, **it is very important to**
* **on Postgres** when adding or removing any relevant tables: update the list of tables in the target configuration;
* **on ClickHouse** when any data dependencies or structure of the stored data changes: update the `onRollback()` callback.
Consult the [Postgres via Drizzle](./targets/postgres-drizzle) and [ClickHouse](./targets/clickhouse) guides.
### Plain iterator
A complete pipeline without a `.pipeTo` is a valid async iterator.
* The pipeline will produce some logs by default. Disable them by setting `logger: false` when creating the data source. If you're looking to convert an existing standalone pipe into a module in a larger program and wish to get rid of any side effects, consult the [Running bare bones](./running-bare-bones) guide.
* If you're working with unfinalized data (default setting of the source), the iterator will throw `ForkException`s on blockchain reorgs. You should catch these and process them correctly. Consult the [fork handling guide](../architecture-deep-dives/fork-handling) for details.
Alternatively, configure the data source to use final data only:
```ts theme={"system"}
const source = evmPortalSource({
portal: {
url: '',
finalized: true,
},
...
})
```
* By default, the pipeline is stateless: when re-created it'll restart from the earliest block relevant to any of the queries. If you want the pipeline to persist its sync state between restarts, you'll have to manage the state by yourself. See [Cursor management](../architecture-deep-dives/cursor-management).
### Developing your own target
Use the [createTarget() function](../../reference/basic-components/target/create-target).
* If you're working with unfinalized data (default setting of the source), you must define a fork handler callback. Consult the [fork handling guide](../architecture-deep-dives/fork-handling) for details.
Alternatively, configure the data source to use final data only:
```ts theme={"system"}
const source = evmPortalSource({
portal: {
url: '',
finalized: true,
},
...
})
```
* If you want your pipeline to preserve its sync state between restarts, you'll have to manage this state in your `write` callback. See [Cursor management](../architecture-deep-dives/cursor-management).
# Handling contract events
Source: https://docs.sqd.dev/en/sdk/pipes-sdk/evm/guides/basic-development/handling-events
Fetching and decoding EVM event logs with evmDecoder
`evmDecoder()` bundles an EVM log query with a decoding transform into a single reusable module. Pass the result as an output to `evmPortalStream`:
```ts theme={"system"}
import { commonAbis, evmDecoder, evmPortalStream } from '@subsquid/pipes/evm'
const stream = evmPortalStream({
portal: 'https://portal.sqd.dev/datasets/ethereum-mainnet',
outputs: {
transfers: evmDecoder({
range: { from: '0' },
contracts: ['0xa0b86991c6218b36c1d19d4a2e9eb0ce3606eb48'],
events: { transfer: commonAbis.erc20.events.Transfer },
}),
},
})
```
`evmDecoder()` can:
* Fetch from **specific contracts** — pass an array of addresses to `contracts`. Omit it entirely to receive matching events from every contract on-chain.
* Filter by **indexed parameters** — instead of a bare event, supply `{ event, params }` to select only logs where specific indexed arguments match (see the sketch below this list).
* Dynamically discover contracts via **factories** — pass a `contractFactory()` to `contracts` instead of a static list. See the [Factory guide](../advanced-topics/factory-transformers).
* Handle decode errors with a custom **`onError` callback** instead of letting them propagate.
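A hedged sketch of indexed-parameter filtering. The exact shape of `params` — here assumed to be a map from indexed argument name to allowed values — is an assumption; consult the `evmDecoder()` reference for the authoritative form:
```ts theme={"system"}
// Assumption: params maps indexed argument names to allowed values
evmDecoder({
  range: { from: 'latest' },
  contracts: ['0xa0b86991c6218b36c1d19d4a2e9eb0ce3606eb48'],
  events: {
    transfersToTreasury: {
      event: commonAbis.erc20.events.Transfer,
      params: { to: ['0x0000000000000000000000000000000000000001'] }, // illustrative address
    },
  },
})
```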
See the [evmDecoder() reference](../../reference/utility-components/evm-decoder) for all parameters.
## Specifying events
The `events` parameter maps output field names to event specifications. There are three ways to obtain an event specification.
### `commonAbis`
`commonAbis` is a built-in collection of ABI modules for common token standards. Currently it ships one module, `erc20`:
```ts theme={"system"}
import { commonAbis, evmDecoder } from '@subsquid/pipes/evm'
evmDecoder({
range: { from: 'latest' },
contracts: ['0xa0b86991c6218b36c1d19d4a2e9eb0ce3606eb48'],
events: {
transfers: commonAbis.erc20.events.Transfer,
approvals: commonAbis.erc20.events.Approval,
},
})
```
See the [`commonAbis` reference](../../reference/utility-components/evm-decoder#commonabis) for the full list of available events and functions.
### Typegen modules
`@subsquid/evm-typegen` generates TypeScript ABI modules from JSON ABIs. Each generated module exports typed `events` and `functions` objects, translating Solidity types to TypeScript — event argument types are statically known at compile time, so you get precise type checking and IDE autocompletion across the entire pipeline.
**Install the tool:**
```bash theme={"system"}
npm install -D @subsquid/evm-typegen
```
**Generate a module from a local JSON ABI file:**
```bash theme={"system"}
npx squid-evm-typegen src/abi your-contract.json
```
This creates `src/abi/your-contract.ts`. The tool also accepts a contract address (requires specifying `--chain-id`) or an arbitrary URL.
Use events from a generated module exactly as with `commonAbis`:
```ts theme={"system"}
import * as usdcAbi from './abi/usdc'
evmDecoder({
range: { from: 'latest' },
contracts: ['0xa0b86991c6218b36c1d19d4a2e9eb0ce3606eb48'],
events: {
transfers: usdcAbi.events.Transfer,
approvals: usdcAbi.events.Approval,
},
})
```
### Raw JSON via `defineAbi()`
`defineAbi()` converts a JSON ABI array to a subsquid ABI module at runtime, with no code generation step. This is the quickest route — useful for one-off scripts or prototypes — but it comes at a cost: when the ABI is loaded from an external JSON file, event argument fields are typed as `any`, since TypeScript cannot inspect the runtime JSON value at compile time.
```ts theme={"system"}
import { defineAbi, evmDecoder } from '@subsquid/pipes/evm'
import erc20Json from './erc20.json'
const erc20 = defineAbi(erc20Json) // event args are `any`
evmDecoder({
range: { from: 'latest' },
contracts: ['0xa0b86991c6218b36c1d19d4a2e9eb0ce3606eb48'],
events: { transfers: erc20.events.Transfer },
})
```
`defineAbi()` also accepts Hardhat and Foundry artifact objects — it reads the `abi` field automatically:
```ts theme={"system"}
import artifact from './artifacts/MyContract.json'
const myContract = defineAbi(artifact) // reads artifact.abi
```
If you define the ABI inline with `as const`, TypeScript can infer the exact decoded types for scalar fields:
```ts theme={"system"}
const erc20 = defineAbi([
{
type: 'event',
name: 'Transfer',
inputs: [
{ indexed: true, name: 'from', type: 'address' },
{ indexed: true, name: 'to', type: 'address' },
{ indexed: false, name: 'value', type: 'uint256' },
],
},
] as const)
// erc20.events.Transfer.decode() returns { from: string, to: string, value: bigint }
```
For projects where full type safety matters end-to-end, prefer the [typegen route](#typegen-modules) instead.
# Pipes UI
Source: https://docs.sqd.dev/en/sdk/pipes-sdk/evm/guides/basic-development/pipes-ui
Live dashboard for monitoring a running pipe
Pipes UI is a local web dashboard that connects to a running pipe and visualises its progress, speed, portal query, and profiler breakdown. It reads the metrics server that the SDK exposes on the pipe process — nothing needs to be deployed or hosted.
## Expose metrics on the pipe
Attach [`metricsServer()`](../advanced-topics/metrics) to the source. It listens on `localhost:9090` by default.
```ts theme={"system"}
import { evmPortalStream, evmDecoder, commonAbis } from '@subsquid/pipes/evm'
import { metricsServer } from '@subsquid/pipes/metrics/node'
evmPortalStream({
id: 'my-pipe', // shows up in the dashboard as the pipe name
portal: 'https://portal.sqd.dev/datasets/ethereum-mainnet',
outputs: evmDecoder({
profiler: { name: 'transfers' }, // labels this span in the profiler tree
range: { from: 'latest' },
events: { transfers: commonAbis.erc20.events.Transfer },
}),
metrics: metricsServer(), // exposes /metrics, /stats, /profiler, /health on :9090
})
```
Start the pipe as usual (`ts-node`, `bun`, compiled JS, etc.).
## Run the dashboard
In a second terminal:
```bash theme={"system"}
npx @subsquid/pipes-ui@alpha
```
The UI is served at `http://localhost:3000` and polls the metrics server at `http://localhost:9090`. Open the URL in a browser; the page auto-refreshes once the pipe starts producing batches.
## What it shows
Per pipe (keyed by the `id` passed to the source):
* chain / dataset, with the inferred chain kind (EVM, Solana, …)
* progress: current block, target block, percent complete, ETA
* throughput: blocks/s and bytes/s over the last 30 samples
* the serialised portal query (helpful for reviewing what your decoder actually asked for)
* memory usage of the pipe process and SDK version
When [profiling is on](../advanced-topics/profiling) (the default in non-production environments), the UI also renders the per-batch span tree — useful for seeing which stage (`fetch data`, `apply transformers`, a named decoder, your own `ctx.profiler.start('…')` spans) is dominating batch time. Decorate spans you want to track with `profiler: { name: '…' }` on transformers and decoders.
Any [custom metrics](../advanced-topics/metrics) you register via `ctx.metrics.counter()`, `.gauge()`, `.histogram()`, or `.summary()` show up on the pipe's `/metrics` endpoint (as Prometheus text). The dashboard does not render arbitrary custom series — if you need charts for your own metrics, scrape `/metrics` with Prometheus and graph with Grafana.
The full list of HTTP endpoints served by the metrics process (useful for ad-hoc `curl` inspection) is in the [metricsServer reference](../../reference/utility-components/metrics-server#endpoints).
## Troubleshooting
* **"Failed to reach metrics server"** on the UI — the pipe is not running, or `metricsServer()` is not attached to the source, or it listens on a non-default port. Start the pipe first, then reload the dashboard.
* **UI shows no pipes** — the source config is missing an `id`. Add `id: 'my-pipe'` to the source options.
* **Profiler tab is empty** — the pipe has `profiler: false` set on the source, or `NODE_ENV=production` (the default is to enable profiling only outside production). Set `profiler: true` on the source to force it on. See [Profiling](../advanced-topics/profiling).
# Running bare bones
Source: https://docs.sqd.dev/en/sdk/pipes-sdk/evm/guides/basic-development/running-bare-bones
Using a pipe as a plain async iterator without extra services
By default, `evmPortalSource` activates a console logger. Passing `metrics` and `progress` enables those services. To embed a pipe into external code with no side effects, disable them all:
```ts theme={"system"}
import { commonAbis, evmDecoder, evmPortalStream } from '@subsquid/pipes/evm'
const stream = evmPortalStream({
portal: 'https://portal.sqd.dev/datasets/ethereum-mainnet',
outputs: evmDecoder({
range: { from: 0 },
events: { transfers: commonAbis.erc20.events.Transfer },
}),
logger: false, // disable all log output
profiler: false, // profiler disabled under all circumstances
// omit `metrics` — no metrics server
// omit `progress` — no progress reporting
})
for await (const { data } of stream) {
// data.transfers is available here
console.log(data.transfers.length)
}
```
The source becomes a plain async iterable that yields `{ data, ctx }` per batch. `ctx.logger` is a no-op when `logger: false`.
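If you also need batch metadata, destructure `ctx` alongside `data`. A minimal sketch; the `ctx.stream.state` fields are documented in the Transformer reference:
```ts theme={"system"}
for await (const { data, ctx } of stream) {
  // ctx.stream.state.current is the latest block cursor of this batch
  console.log(`block ${ctx.stream.state.current.number}: ${data.transfers.length} transfers`)
}
```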
# ClickHouse
Source: https://docs.sqd.dev/en/sdk/pipes-sdk/evm/guides/basic-development/targets/clickhouse
Store pipe output in ClickHouse
Install the ClickHouse Node.js client:
```bash theme={"system"}
npm install @clickhouse/client
```
At a glance, the pipeline looks like this:
```ts theme={"system"}
import { createClient } from '@clickhouse/client'
import { clickhouseTarget } from '@subsquid/pipes/targets/clickhouse'
await evmPortalSource({ ... }).pipeTo(
clickhouseTarget({
client: createClient({ url: 'http://localhost:8123' }),
onData: async ({ store, data }) => {
store.insert({ table: 'transfers', values: data.transfers.map(...), format: 'JSONEachRow' })
},
onRollback: async ({ store, safeCursor }) => {
await store.removeAllRows({ tables: ['transfers'], where: `block_number > ${safeCursor.number}` })
},
}),
)
```
## Table design
Use `CollapsingMergeTree` with a `sign Int8 DEFAULT 1` column. This engine enables efficient fork rollbacks: to cancel rows, the target re-inserts them with `sign = -1` and ClickHouse merges the pair during background processing.
```sql theme={"system"}
CREATE TABLE IF NOT EXISTS transfers (
block_number UInt32 CODEC(DoubleDelta, ZSTD),
transaction_hash String,
log_index UInt16,
from_address LowCardinality(FixedString(42)),
to_address LowCardinality(FixedString(42)),
value UInt256,
sign Int8 DEFAULT 1
) ENGINE = CollapsingMergeTree(sign)
ORDER BY (block_number, transaction_hash, log_index);
```
Design notes:
* Apply `DoubleDelta + ZSTD` codecs to monotonically increasing columns such as block numbers and timestamps.
* Use `LowCardinality` for columns whose values repeat heavily, such as contract and wallet addresses, to reduce storage and speed up filtering.
* Store 256-bit integers as `UInt256`; serialize JavaScript `BigInt` values to strings before insertion.
Create the table in `onStart` using `store.command()`:
```ts theme={"system"}
onStart: async ({ store }) => {
await store.command({ query: `CREATE TABLE IF NOT EXISTS transfers ( ... )` })
}
```
## `onData`
Call `store.insert()` to queue an insert. The call is non-blocking — inserts fire concurrently and are fully flushed when the target closes:
```ts theme={"system"}
onData: async ({ store, data }) => {
store.insert({
table: 'transfers',
values: data.transfers.map((t) => ({
block_number: t.block.number,
transaction_hash: t.rawEvent.transactionHash,
log_index: t.rawEvent.logIndex,
from_address: t.event.from,
to_address: t.event.to,
value: t.event.value.toString(),
})),
format: 'JSONEachRow',
})
}
```
## `onRollback`
Implement `onRollback` to handle blockchain forks. It is invoked in two situations:
* `type: 'offset_check'` — on startup, when a saved cursor is found, to discard writes from a previous crashed or partial run
* `type: 'blockchain_fork'` — when the stream detects a chain reorganisation
Use `store.removeAllRows()` to cancel rows past the safe point. For `CollapsingMergeTree` tables this re-inserts matching rows with `sign = -1`:
```ts theme={"system"}
onRollback: async ({ store, safeCursor }) => {
await store.removeAllRows({
tables: ['transfers'],
where: `block_number > ${safeCursor.number}`,
})
}
```
## Complete example
```ts expandable theme={"system"}
import { commonAbis, evmDecoder, evmPortalSource } from '@subsquid/pipes/evm'
import { clickhouseTarget } from '@subsquid/pipes/targets/clickhouse'
import { createClient } from '@clickhouse/client'
const client = createClient({ url: 'http://localhost:8123' })
await evmPortalSource({
portal: 'https://portal.sqd.dev/datasets/ethereum-mainnet',
outputs: evmDecoder({
range: { from: 'latest' },
events: { transfers: commonAbis.erc20.events.Transfer },
}),
}).pipeTo(
clickhouseTarget({
client,
onStart: async ({ store }) => {
await store.command({
query: `
CREATE TABLE IF NOT EXISTS transfers (
block_number UInt32 CODEC(DoubleDelta, ZSTD),
transaction_hash String,
log_index UInt16,
from_address LowCardinality(FixedString(42)),
to_address LowCardinality(FixedString(42)),
value UInt256,
sign Int8 DEFAULT 1
) ENGINE = CollapsingMergeTree(sign)
ORDER BY (block_number, transaction_hash, log_index)
`,
})
},
onData: async ({ store, data }) => {
store.insert({
table: 'transfers',
values: data.transfers.map((t) => ({
block_number: t.block.number,
transaction_hash: t.rawEvent.transactionHash,
log_index: t.rawEvent.logIndex,
from_address: t.event.from,
to_address: t.event.to,
value: t.event.value.toString(),
})),
format: 'JSONEachRow',
})
},
onRollback: async ({ store, safeCursor }) => {
await store.removeAllRows({
tables: ['transfers'],
where: `block_number > ${safeCursor.number}`,
})
},
}),
)
```
## Docker setup
```yaml docker-compose.yml theme={"system"}
services:
  clickhouse:
    image: clickhouse/clickhouse-server:latest
    ports:
      - "8123:8123"
      - "9000:9000"
    environment:
      CLICKHOUSE_DB: default
      CLICKHOUSE_USER: default
      CLICKHOUSE_PASSWORD: default
    volumes:
      - clickhouse-data:/var/lib/clickhouse

volumes:
  clickhouse-data:
```
```bash theme={"system"}
docker compose up -d
```
See the [clickhouseTarget reference](../../../reference/basic-components/target/clickhouse) for the full API.
# Postgres via Drizzle
Source: https://docs.sqd.dev/en/sdk/pipes-sdk/evm/guides/basic-development/targets/postgres-drizzle
Store pipe output in PostgreSQL using Drizzle ORM
Install Drizzle ORM and the PostgreSQL driver:
```bash theme={"system"}
npm install drizzle-orm pg
npm install -D drizzle-kit @types/pg
```
At a glance, the pipeline looks like this:
```ts theme={"system"}
await evmPortalSource({ ... }).pipeTo(
drizzleTarget({
db: drizzle('postgresql://...'),
tables: [transfersTable],
onData: async ({ tx, data }) => {
for (const batch of batchForInsert(data.transfers)) {
await tx.insert(transfersTable).values(batch.map(...))
}
},
}),
)
```
## Schema
Define your tables with Drizzle ORM. Every table needs a primary key, and every table written to in [`onData`](#ondata) must appear in [`tables`](#tables).
```ts theme={"system"}
import { integer, numeric, pgTable, primaryKey, varchar } from 'drizzle-orm/pg-core'
const transfersTable = pgTable('transfers', {
blockNumber: integer().notNull(),
logIndex: integer().notNull(),
from: varchar({ length: 42 }).notNull(),
to: varchar({ length: 42 }).notNull(),
value: numeric({ mode: 'bigint' }).notNull(),
}, (t) => [primaryKey({ columns: [t.blockNumber, t.logIndex] })])
```
## `onData` and `batchForInsert`
`onData` runs inside a serializable transaction. Use `batchForInsert` to split data arrays into chunks that fit within PostgreSQL's 32,767-parameter limit — chunk size is calculated automatically from the number of columns:
```ts theme={"system"}
import { batchForInsert, drizzleTarget } from '@subsquid/pipes/targets/drizzle/node-postgres'
onData: async ({ tx, data }) => {
for (const batch of batchForInsert(data.transfers)) {
await tx.insert(transfersTable).values(
batch.map((d) => ({
blockNumber: d.block.number,
logIndex: d.rawEvent.logIndex,
from: d.event.from,
to: d.event.to,
value: d.event.value,
})),
)
}
}
```
Pass an explicit second argument to `batchForInsert` to cap chunk size:
```ts theme={"system"}
for (const batch of batchForInsert(data.transfers, 100)) { ... }
```
## `tables`
Every table written to in `onData` must be listed in `tables`. At startup, the target installs PostgreSQL trigger functions on these tables to track row-level changes for automatic fork handling. Inserting into an unlisted table throws at runtime.
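As a sketch, with a hypothetical second `approvalsTable` next to the transfers table:
```ts theme={"system"}
drizzleTarget({
  db,
  // Every table written to in onData must be listed here;
  // inserting into an unlisted table throws at runtime.
  tables: [transfersTable, approvalsTable],
  onData: async ({ tx, data }) => {
    // ... inserts into transfersTable and approvalsTable ...
  },
})
```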
## Schema migrations
Use [Drizzle Kit](https://orm.drizzle.team/docs/kit-overview) to generate and apply migrations:
```bash theme={"system"}
npx drizzle-kit generate
npx drizzle-kit migrate
```
Alternatively, run migrations automatically on startup via `onStart`:
```ts theme={"system"}
import { migrate } from 'drizzle-orm/node-postgres/migrator'
drizzleTarget({
db,
tables: [...],
onStart: async ({ db }) => {
await migrate(db, { migrationsFolder: './drizzle' })
},
onData: async ({ tx, data }) => { ... },
})
```
## Rollback handling
Fork handling is fully automatic. Each batch runs inside a transaction that snapshots row-level changes. When the stream detects a fork, the target replays those snapshots in reverse to restore the pre-fork state.
Use `onBeforeRollback` and `onAfterRollback` to run custom logic around a rollback. Both callbacks receive the Drizzle transaction and the `cursor` (`BlockCursor`) to which state was rolled back:
```ts theme={"system"}
drizzleTarget({
db,
tables: [...],
onBeforeRollback: async ({ tx, cursor }) => { /* e.g. log or acquire an external lock */ },
onAfterRollback: async ({ tx, cursor }) => { /* e.g. invalidate a cache */ },
onData: async ({ tx, data }) => { ... },
})
```
## Complete example
```ts expandable theme={"system"}
import { commonAbis, evmDecoder, evmPortalSource } from '@subsquid/pipes/evm'
import { batchForInsert, drizzleTarget } from '@subsquid/pipes/targets/drizzle/node-postgres'
import { drizzle } from 'drizzle-orm/node-postgres'
import { integer, numeric, pgTable, primaryKey, varchar } from 'drizzle-orm/pg-core'
const transfersTable = pgTable('transfers', {
blockNumber: integer().notNull(),
logIndex: integer().notNull(),
from: varchar({ length: 42 }).notNull(),
to: varchar({ length: 42 }).notNull(),
value: numeric({ mode: 'bigint' }).notNull(),
}, (t) => [primaryKey({ columns: [t.blockNumber, t.logIndex] })])
await evmPortalSource({
portal: 'https://portal.sqd.dev/datasets/ethereum-mainnet',
outputs: evmDecoder({
range: { from: 0 },
events: { transfers: commonAbis.erc20.events.Transfer },
}),
}).pipeTo(
drizzleTarget({
db: drizzle('postgresql://postgres:postgres@localhost:5432/postgres'),
tables: [transfersTable],
onData: async ({ tx, data }) => {
for (const batch of batchForInsert(data.transfers)) {
await tx.insert(transfersTable).values(
batch.map((d) => ({
blockNumber: d.block.number,
logIndex: d.rawEvent.logIndex,
from: d.event.from,
to: d.event.to,
value: d.event.value,
})),
)
}
},
}),
)
```
See the [drizzleTarget reference](../../../reference/basic-components/target/postgres-drizzle) for the full API.
# Quickstart
Source: https://docs.sqd.dev/en/sdk/pipes-sdk/evm/quickstart
Bootstrap a Pipes SDK project
# Using with AI
The fastest way to get an AI coding agent productive on a Pipes SDK project is to install the official [Pipes SDK Agent Skill](/en/ai/agent-skills#pipes-sdk-skill):
```bash theme={"system"}
npx skills add subsquid-labs/skills/pipes-sdk
```
The skill activates automatically on tasks like *"create an indexer for Uniswap V3 swaps"* or *"my indexer is syncing slowly, help me optimize it"*. It covers scaffolding, runtime error diagnosis, sync tuning, and data-quality checks.
Pair the skill with one or both MCP servers so the agent can read live data and look things up:
* [Portal MCP server](/en/ai/mcp-server) — 29 tools for querying blocks, transactions, logs, instructions, and analytics across 225+ datasets. No API key.
* [Documentation MCP server](/en/ai/mcp-server-docs) — search and retrieve these docs from inside the agent.
If you'd rather feed docs into a model directly, the static [`llms.txt`](/llms.txt) (index) and [`llms-full.txt`](/llms-full.txt) (full content) files are kept in sync with the site. See the [AI Development overview](/en/ai/ai-development) for the full menu.
# Scaffolding with Pipes CLI
`pipes-cli` is a work in progress.
In a few minutes, you'll have a running pipe that indexes USDC token transfers on Ethereum mainnet into a local PostgreSQL database.
## Prerequisites
* Node.js 22.15+
* `pnpm`
* Docker (for the bundled PostgreSQL container)
## Initialize the project
Run the CLI in the directory where you want the project folder to land:
```bash theme={"system"}
pnpx @subsquid/pipes-cli@1.0.0-alpha.4 init
```
The CLI prompts for the project folder name, package manager (please stick to `pnpm` for now), sink (please use `ClickHouse` or `Postgres`), network type, network, and template; then installs dependencies and writes a runnable project.
You can supply a JSON config instead of filling the prompts manually. Here's the configuration for USDC token transfers mentioned above:
```bash theme={"system"}
pnpx @subsquid/pipes-cli@1.0.0-alpha.4 init --config '{
"projectFolder": "usdc-example",
"packageManager": "pnpm",
"sink": "postgresql",
"networkType": "evm",
"network": "ethereum-mainnet",
"templates": [
{
"templateId": "erc20Transfers",
"params": {
"contractAddresses": ["0xa0b86991c6218b36c1d19d4a2e9eb0ce3606eb48"],
"range": { "from": "latest" }
}
}
]
}'
```
`--config` also accepts a path to a JSON file.
To inspect the full config schema, run:
```bash theme={"system"}
pnpx @subsquid/pipes-cli@1.0.0-alpha.4 init --schema
```
## Run the pipeline
The generated project ships with a `docker-compose.yml` that brings up the sink database and the pipeline together:
```bash theme={"system"}
cd usdc-example
docker compose --profile with-pipeline up
```
For an iterative dev loop, run the database in Docker and the pipeline locally:
```bash theme={"system"}
docker compose up -d # Postgres on :5432
pnpm run db:migrate # apply the generated migration
pnpm run dev # tsx src/index.ts
```
Either way, rows start landing in the `erc20_transfers` table within a minute.
## What was generated
The project layout:
```
usdc-example/
├── src/
│ ├── index.ts # the pipe — source, decoder, target
│ ├── schemas.ts # Drizzle table definitions
│ └── utils/
├── migrations/ # SQL migrations generated by drizzle-kit
├── docker-compose.yml # Postgres + optional pipeline service
├── Dockerfile
├── drizzle.config.ts
├── package.json
├── .env # DB_CONNECTION_STR — points at local Postgres
└── README.md
```
The pipe lives in `src/index.ts`. The decoder block defines what data to extract and applies a light transform:
```ts theme={"system"}
const erc20Transfers = evmDecoder({
profiler: { name: 'erc20-transfers' },
range: { from: 'latest' },
contracts: ['0xa0b86991c6218b36c1d19d4a2e9eb0ce3606eb48'],
events: { transfers: commonAbis.erc20.events.Transfer },
}).pipe(({ transfers }) =>
transfers.map((transfer) => ({
blockNumber: transfer.block.number,
txHash: transfer.rawEvent.transactionHash,
logIndex: transfer.rawEvent.logIndex,
timestamp: transfer.timestamp.getTime(),
from: transfer.event.from,
to: transfer.event.to,
value: transfer.event.value,
tokenAddress: transfer.contract,
})),
)
```
This query-transform combo asks the Portal for ERC20 `Transfer` logs from the USDC contract, decodes them, and (in the `.pipe` step) reshapes each one into a row matching the Drizzle table. See the [Pipe anatomy](./guides/basic-development/anatomy) and [Handling contract events](./guides/basic-development/handling-events) guides for more info on `evmDecoder()`.
The `main()` function wires the decoder to a [drizzleTarget](./reference/basic-components/target/postgres-drizzle):
```ts theme={"system"}
export async function main() {
await evmPortalSource({
id: 'ethereum-usdc-pipe',
portal: 'https://portal.sqd.dev/datasets/ethereum-mainnet',
outputs: { erc20Transfers },
}).pipeTo(
drizzleTarget({
db: drizzle(env.DB_CONNECTION_STR),
tables: [erc20TransfersTable],
onData: async ({ tx, data }) => {
for (const values of chunk(data.erc20Transfers)) {
await tx.insert(erc20TransfersTable).values(values)
}
},
}),
)
}
```
The `id` is a per-pipeline identifier — keep it stable so the [target's cursor](./guides/architecture-deep-dives/cursor-management) survives restarts. See [evmPortalSource](./reference/basic-components/source) for the full source API and [Pipe anatomy](./guides/basic-development/anatomy) for how the pieces fit together.
## Other examples
Tracks every pool created by the Uniswap V3 factory and indexes each pool's `Swap` events. The generated decoder uses [factory transformers](./guides/advanced-topics/factory-transformers) with a SQLite-backed pool registry.
```bash theme={"system"}
pnpx @subsquid/pipes-cli@1.0.0-alpha.4 init --config '{
"projectFolder": "uniswapv3-swaps",
"packageManager": "pnpm",
"sink": "postgresql",
"networkType": "evm",
"network": "ethereum-mainnet",
"templates": [
{
"templateId": "uniswapV3Swaps",
"params": {
"factoryAddress": "0x1f98431c8ad98523631ae4a59f267346ea31f984",
"range": { "from": "latest" }
}
}
]
}'
```
The `custom` template generates ABI bindings and decoder wiring from an event list you provide. Drop in any contract and event set.
```bash theme={"system"}
pnpx @subsquid/pipes-cli@1.0.0-alpha.4 init --config '{
"projectFolder": "aave-supply-withdraw",
"packageManager": "pnpm",
"sink": "postgresql",
"networkType": "evm",
"network": "ethereum-mainnet",
"templates": [
{
"templateId": "custom",
"params": {
"contracts": [
{
"contractAddress": "0x87870Bca3F3fD6335C3F4ce8392D69350B4fA4E2",
"contractName": "AaveV3Pool",
"contractEvents": [
{
"anonymous": false,
"inputs": [
{ "indexed": true, "name": "reserve", "type": "address" },
{ "indexed": false, "name": "user", "type": "address" },
{ "indexed": true, "name": "onBehalfOf", "type": "address" },
{ "indexed": false, "name": "amount", "type": "uint256" },
{ "indexed": true, "name": "referralCode", "type": "uint16" }
],
"name": "Supply",
"type": "event"
},
{
"anonymous": false,
"inputs": [
{ "indexed": true, "name": "reserve", "type": "address" },
{ "indexed": true, "name": "user", "type": "address" },
{ "indexed": true, "name": "to", "type": "address" },
{ "indexed": false, "name": "amount", "type": "uint256" }
],
"name": "Withdraw",
"type": "event"
}
],
"range": { "from": "latest" }
}
]
}
}
]
}'
```
The CLI ships two built-in EVM templates — `erc20Transfers` and `uniswapV3Swaps` — plus the open-ended `custom` template.
# Query builder
Source: https://docs.sqd.dev/en/sdk/pipes-sdk/evm/reference/basic-components/query-builder
API reference for EvmQueryBuilder
`EvmQueryBuilder` assembles a typed portal query from a field selection and one or more data-request clauses. Pass the result of `.build()` to `outputs` (or to `.pipe()`) on an EVM source. The resulting object is consumed by [`evmPortalStream()`](./source) and by the [evmDecoder](../utility-components/evm-decoder), which composes on top of it.
## `evmQuery()`
Returns a fresh `EvmQueryBuilder`.
```ts theme={"system"}
import { evmQuery } from '@subsquid/pipes/evm'
const query = evmQuery()
.addFields({
block: { timestamp: true },
log: { address: true, topics: true, data: true },
})
.addLog({
range: { from: 20_000_000 },
request: { topic0: ['0xddf252ad…'] },
})
.build()
```
## `EvmQueryBuilder`
```ts theme={"system"}
class EvmQueryBuilder<F> {
  addFields(fields: Subset<F>): EvmQueryBuilder<F>
  addLog(options: RequestOptions): this
  addTransaction(options: RequestOptions): this
  addTrace(options: RequestOptions): this
  addStateDiff(options: RequestOptions): this
  addRange(range: PortalRange): this
  merge(other?: EvmQueryBuilder<any>): this
  build(opts?: { setupQuery?: SetupQueryFn<F> }): QueryAwareTransformer<F>
}
```
The generic parameter `F` narrows the block type produced by the stream — only fields explicitly selected with `.addFields()` appear on the decoded records, at both compile and runtime.
### `RequestOptions`
```ts theme={"system"}
type RequestOptions<R> = { range: PortalRange; request: R }
type PortalRange = { from?: number | string | 'latest' | Date; to?: number | string | Date }
```
`PortalRange.from` defaults to `0`; a `Date` or numeric timestamp is resolved to a block number via the portal at query start. `'latest'` is resolved to the current head.
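For example, all of the following are valid ranges:
```ts theme={"system"}
// Block numbers, dates, and 'latest' all follow the PortalRange shape above
const numeric = { from: 20_000_000, to: 20_100_000 }
const byDate = { from: new Date('2024-06-01'), to: new Date('2024-06-08') } // resolved to block numbers at query start
const tail = { from: 'latest' } // resolved to the current head at query start
```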
***
## `.addFields(fields)`
Add to the field selection. Repeated calls are merged recursively. Block hash and number are returned regardless of selection.
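For instance, a minimal sketch using fields from the tables below; the two calls merge into a single selection:
```ts theme={"system"}
import { evmQuery } from '@subsquid/pipes/evm'

const query = evmQuery()
  .addFields({ block: { timestamp: true } })
  // merged recursively with the selection above
  .addFields({ block: { gasUsed: true }, log: { address: true, topics: true, data: true } })
```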
### `block`
| Field | Type |
| ------------------ | -------------------------- |
| `number` | `number` (always returned) |
| `hash` | `string` (always returned) |
| `parentHash` | `string` |
| `timestamp` | `number` (Unix seconds) |
| `transactionsRoot` | `string` |
| `receiptsRoot` | `string` |
| `stateRoot` | `string` |
| `logsBloom` | `string` |
| `sha3Uncles` | `string` |
| `extraData` | `string` |
| `miner` | `string` |
| `nonce` | `string` |
| `mixHash` | `string` |
| `size` | `number` |
| `gasLimit` | `bigint` |
| `gasUsed` | `bigint` |
| `difficulty` | `bigint` |
| `totalDifficulty` | `bigint?` |
| `baseFeePerGas` | `bigint` |
| `blobGasUsed` | `bigint` |
| `excessBlobGas` | `bigint` |
| `l1BlockNumber` | `number?` (L2 only) |
### `transaction`
| Field | Type |
| ------------------------------------------------------------------------------------------------------------ | ------------------------------------ |
| `transactionIndex` | `number` |
| `hash` | `string` |
| `nonce` | `bigint` |
| `from` | `string` |
| `to` | `string?` |
| `input` | `string` |
| `value` | `bigint` |
| `gas` | `bigint` |
| `gasPrice` | `bigint` |
| `maxFeePerGas` | `bigint?` |
| `maxPriorityFeePerGas` | `bigint?` |
| `v`, `r`, `s`, `yParity` | `bigint`/`string`/`string`/`number?` |
| `chainId` | `number?` |
| `sighash` | `string?` (first 4 bytes of `input`) |
| `contractAddress` | `string?` (for create transactions) |
| `gasUsed` | `bigint` |
| `cumulativeGasUsed` | `bigint` |
| `effectiveGasPrice` | `bigint` |
| `type` | `number` |
| `status` | `number` (0 = fail, 1 = success) |
| `blobVersionedHashes` | `string[]?` |
| `l1Fee`, `l1FeeScalar`, `l1GasPrice`, `l1GasUsed`, `l1BlobBaseFee`, `l1BlobBaseFeeScalar`, `l1BaseFeeScalar` | L2 fee metadata |
### `log`
| Field | Type |
| ------------------ | ---------- |
| `logIndex` | `number` |
| `transactionIndex` | `number` |
| `transactionHash` | `string` |
| `address` | `string` |
| `data` | `string` |
| `topics` | `string[]` |
### `trace`
Trace records are a tagged union over `type`. Each type has its own action and (optionally) result sub-objects; the field selection is flat with `create…`/`call…`/`suicide…`/`reward…` prefixes. Shared:
| Field | Type |
| ------------------ | --------------------------------------------- |
| `type` | `'create' \| 'call' \| 'suicide' \| 'reward'` |
| `transactionIndex` | `number` |
| `traceAddress` | `number[]` |
| `subtraces` | `number` |
| `error` | `string \| null` |
| `revertReason` | `string?` |
Type-specific (appears only on that trace `type`):
| Field | Action/result | Type |
| ---------------------------------------------------------------------------------------- | -------------- | ---------------------------------------------------------------------------- |
| `createFrom`, `createValue`, `createGas`, `createInit` | create action | `string` / `bigint` / `bigint` / `string` |
| `createResultGasUsed`, `createResultCode`, `createResultAddress` | create result | `bigint` / `string?` / `string` |
| `callCallType`, `callFrom`, `callTo`, `callValue`, `callGas`, `callInput`, `callSighash` | call action | `string` / `string` / `string` / `bigint?` / `bigint` / `string` / `string?` |
| `callResultGasUsed`, `callResultOutput` | call result | `bigint` / `string?` |
| `suicideAddress`, `suicideRefundAddress`, `suicideBalance` | suicide action | `string` / `string` / `bigint` |
| `rewardAuthor`, `rewardValue`, `rewardType` | reward action | `string` / `bigint` / `string` |
### `stateDiff`
State diffs are a tagged union over `kind` (`'+'` = add, `'-'` = delete, `'*'` = change, `'='` = no-change).
| Field | Type |
| ------------------ | --------------------------------------------------------- |
| `transactionIndex` | `number` |
| `address` | `string` |
| `key` | `'balance' \| 'code' \| 'nonce' \| string` (storage slot) |
| `kind` | `'+' \| '-' \| '*' \| '='` |
| `prev` | `string` (present for `-`, `*`) |
| `next` | `string` (present for `+`, `*`) |
***
## `.addLog(options)`
Filter logs. Values within a single list field are combined with OR; filters across different fields are combined with AND. Request options:
| Field | Type | Meaning |
| ---------------------------- | ---------- | ------------------------------------------------------- |
| `address` | `string[]` | Emitting contract addresses (lowercase). |
| `topic0` | `string[]` | First topic (typically the event signature hash). |
| `topic1`, `topic2`, `topic3` | `string[]` | Remaining indexed topics. |
| `transaction` | `boolean` | Also fetch the parent transaction of each matching log. |
| `transactionTraces` | `boolean` | Also fetch traces of parent transactions. |
| `transactionLogs` | `boolean` | Also fetch all logs from parent transactions. |
| `transactionStateDiffs` | `boolean` | Also fetch state diffs caused by parent transactions. |
An empty `request: {}` matches every log in the range.
```ts theme={"system"}
evmQuery().addLog({
range: { from: 20_000_000, to: 20_000_100 },
request: {
address: ['0xa0b86991c6218b36c1d19d4a2e9eb0ce3606eb48'],
topic0: ['0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef'],
transaction: true,
},
})
```
## `.addTransaction(options)`
Filter transactions.
| Field | Type | Meaning |
| ------------ | ---------- | ------------------------------------------------------------------- |
| `from` | `string[]` | Sender addresses (lowercase). |
| `to` | `string[]` | Recipient addresses (lowercase). |
| `sighash` | `string[]` | First 4 bytes of `input` (e.g. `0xa9059cbb` for ERC-20 `transfer`). |
| `type` | `number[]` | Transaction type (0 legacy, 2 EIP-1559, 3 blob, etc.). |
| `logs` | `boolean` | Also fetch emitted logs. |
| `traces` | `boolean` | Also fetch execution traces. |
| `stateDiffs` | `boolean` | Also fetch state diffs. |
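A minimal sketch, reusing the USDC contract address from earlier examples:
```ts theme={"system"}
evmQuery().addTransaction({
  range: { from: 20_000_000 },
  request: {
    to: ['0xa0b86991c6218b36c1d19d4a2e9eb0ce3606eb48'],
    sighash: ['0xa9059cbb'], // ERC-20 transfer(address,uint256)
    logs: true, // also fetch the logs emitted by matching transactions
  },
})
```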
## `.addTrace(options)`
Filter execution traces.
| Field | Type | Meaning |
| ---------------------- | ------------------------------------------------- | ---------------------------------------------- |
| `type` | `('create' \| 'call' \| 'suicide' \| 'reward')[]` | Restrict to given trace types. |
| `createFrom` | `string[]` | Creator address for `create` traces. |
| `callFrom` | `string[]` | Caller address for `call` traces. |
| `callTo` | `string[]` | Callee address for `call` traces. |
| `callSighash` | `string[]` | `input` sighash for `call` traces. |
| `suicideRefundAddress` | `string[]` | Refund address for `suicide` traces. |
| `rewardAuthor` | `string[]` | Author of `reward` traces. |
| `transaction` | `boolean` | Fetch parent transactions of matching traces. |
| `transactionLogs` | `boolean` | Fetch all logs emitted by parent transactions. |
| `subtraces` | `boolean` | Fetch all subtraces of matching traces. |
| `parents` | `boolean` | Fetch parent traces of matching traces. |
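A sketch along the same lines, restricted to `call` traces into the same contract:
```ts theme={"system"}
evmQuery().addTrace({
  range: { from: 20_000_000 },
  request: {
    type: ['call'],
    callTo: ['0xa0b86991c6218b36c1d19d4a2e9eb0ce3606eb48'],
    transaction: true, // also fetch the parent transactions
  },
})
```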
## `.addStateDiff(options)`
Filter storage diffs.
| Field | Type | Meaning |
| ------------- | ------------------------------ | --------------------------------------------------------- |
| `address` | `string[]` | Contract or account addresses (lowercase). |
| `key` | `string[]` | Storage keys or pseudo-keys (`balance`, `code`, `nonce`). |
| `kind` | `('+' \| '-' \| '*' \| '=')[]` | Type of change. |
| `transaction` | `boolean` | Fetch parent transactions. |
## `.addRange(range)`
Push a range-only request with no filters. Mostly useful to bound the stream or in combination with `includeAllBlocks` set elsewhere.
```ts theme={"system"}
evmQuery().addRange({ from: 20_000_000, to: 20_001_000 })
```
## `.merge(other)`
Merge another builder's requests and fields in-place. Overlapping ranges are reconciled at build time.
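For example, two independently assembled builders can be combined before building:
```ts theme={"system"}
import { evmQuery } from '@subsquid/pipes/evm'

const transfers = evmQuery().addLog({
  range: { from: 20_000_000 },
  request: { topic0: ['0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef'] },
})
const approvals = evmQuery().addLog({
  range: { from: 20_500_000 },
  request: { topic0: ['0x8c5be1e5ebec7d5bd14f71427d1e84f3dd0314c0f7b2291e5b200ac8c7c3b925'] },
})
// Pulls the approvals requests and fields into transfers;
// overlapping ranges are reconciled at build time.
transfers.merge(approvals)
```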
## `.build(opts?)`
Return a `QueryAwareTransformer` suitable for use as a source output.
```ts theme={"system"}
evmPortalStream({
portal: 'https://portal.sqd.dev/datasets/ethereum-mainnet',
outputs: { transfers: evmQuery().addLog({/*…*/}).build() },
})
```
`opts.setupQuery` is an advanced hook called when the query is finalized: it receives `{ query, logger }` and can mutate `query` (e.g. merge additional requests from runtime data). Default behaviour is to merge `this` into the stream's root query.
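A minimal sketch based on the `{ query, logger }` shape described above:
```ts theme={"system"}
evmQuery()
  .addLog({ range: { from: 20_000_000 }, request: {} })
  .build({
    setupQuery: ({ query, logger }) => {
      // Inspect (or mutate) the finalized portal query before it is sent
      logger.info({ query }, 'final portal query')
    },
  })
```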
***
## See also
* [Source](./source) — how queries attach to the stream.
* [evmDecoder](../utility-components/evm-decoder) — typed wrapper that emits a pre-built query plus event/function decoding.
* [Handling contract events](../../guides/basic-development/handling-events) — higher-level guide.
* [Portal API (EVM) OpenAPI](/en/portal/evm/api) — raw wire protocol this builder serialises to.
# Source
Source: https://docs.sqd.dev/en/sdk/pipes-sdk/evm/reference/basic-components/source
API reference for EVM Portal source
The source component connects to SQD Portal and streams blockchain data to your pipeline. It's the starting point for all Pipes SDK data flows.
## evmPortalSource
Create a Portal source for EVM chains.
```ts theme={"system"}
evmPortalSource(config: EvmPortalSourceConfig): Source
```
**Parameters:**
* `id`: (required) Pipeline ID. Must be unique within any infra shared with other pipelines (DB, logging sinks etc).
* `portal`: (required) Portal API URL or config object.
* String: `"https://portal.sqd.dev/datasets/ethereum-mainnet"`
* Object: `{ url: string, finalized?: boolean }`. When `finalized: true` is set the stream will consist of finalized blocks only and none of the [fork handling machinery](../../guides/architecture-deep-dives/fork-handling) will be required.
* `outputs`: (required) A single query + transformer chain combo, or a record of named outputs.
* `cache`: (optional) Portal cache instance. If supplied, saves portal responses locally and reuses them when the pipeline re-runs.
* `logger`: (optional) A pino-compatible `Logger` instance or a log level string. Accepted level values: `'fatal'`, `'error'`, `'warn'`, `'info'`, `'debug'`, `'trace'`, `'silent'`, `false`, `null`. Passing `false` or `null` silences all log output. When omitted, a default console logger is used.
* `metrics`: (optional) `metricsServer()` instance for exposing Prometheus metrics.
* `progress`: (optional) Options for progress tracking.
* `profiler`: (optional) Enable the built-in per-batch profiler. See [Profiling](../../guides/advanced-topics/profiling).
**Example:**
```ts theme={"system"}
import { evmPortalSource } from "@subsquid/pipes/evm";
import { portalSqliteCache } from "@subsquid/pipes/portal-cache/node";
const source = evmPortalSource({
id: "ethereum-transfers",
portal: "https://portal.sqd.dev/datasets/ethereum-mainnet",
outputs: evmDecoder({
range: { from: 20000000 },
events: { transfers: commonAbis.erc20.events.Transfer },
}),
cache: portalSqliteCache({ path: "./cache.sqlite" }),
});
```
### Finalized Blocks
You can configure the source to only receive finalized blocks:
```ts theme={"system"}
const source = evmPortalSource({
portal: {
finalized: true,
url: 'https://portal.sqd.dev/datasets/ethereum-mainnet'
}
});
```
Using finalized blocks eliminates the need for rollback handlers in your targets, simplifying the logic of your pipeline.
## Pipe methods
### pipe()
Chain a single [whole-pipe transformer](./transformer) to the source.
```ts theme={"system"}
source.pipe(transformer)
```
The returned value behaves exactly like the source.
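For example, a sketch that chains a whole-pipe transformer built with `createTransformer` (see the [Transformer reference](./transformer); the import path from the package root is an assumption here):
```ts theme={"system"}
import { createTransformer } from '@subsquid/pipes' // import path assumed

const withLogging = source.pipe(
  createTransformer({
    transform: (data, ctx) => {
      ctx.logger.info({ blocks: ctx.batch.blocksCount }, 'batch received')
      return data
    },
  }),
)
```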
See also: [Stateful transformers](../../guides/advanced-topics/stateful-transforms).
### pipeTo()
Connect the pipeline to a [target](./target).
```ts theme={"system"}
source.pipeTo(target)
```
This is a terminal operation: you cannot continue piping after calling this method.
If you want your stream to resume on restarts and properly handle unfinalized data, make sure that the target [manages cursors](../../guides/architecture-deep-dives/cursor-management) and [handles forks](../../guides/architecture-deep-dives/fork-handling) correctly.
### \*[Symbol.asyncIterator]()
Use the pipeline as an async iterator:
```ts theme={"system"}
for await (const { data } of stream) {
// ... do something with data ...
}
```
On blockchain forks this will throw `ForkException`s; see [Fork handling](../../guides/architecture-deep-dives/fork-handling).
# clickhouseTarget
Source: https://docs.sqd.dev/en/sdk/pipes-sdk/evm/reference/basic-components/target/clickhouse
ClickHouse target for Pipes SDK
See the [ClickHouse guide](../../../guides/basic-development/targets/clickhouse) for usage examples, table design, and setup instructions.
```ts theme={"system"}
import { clickhouseTarget } from '@subsquid/pipes/targets/clickhouse'
```
## `clickhouseTarget`
```ts theme={"system"}
clickhouseTarget({
client: ClickHouseClient,
onStart?: (ctx: { store: ClickhouseStore; logger: Logger }) => unknown | Promise,
onData: (ctx: { store: ClickhouseStore; data: T; ctx: Ctx }) => unknown | Promise,
onRollback?: (ctx: {
type: 'offset_check' | 'blockchain_fork'
store: ClickhouseStore
safeCursor: BlockCursor
/** @deprecated Use `safeCursor` instead. */
cursor: BlockCursor
}) => unknown | Promise,
settings?: Settings,
})
```
| Parameter | Required | Description |
| ------------ | -------- | ---------------------------------------------------------------------------- |
| `client` | Yes | Client from `@clickhouse/client`. |
| `onStart` | No | Runs once before processing starts. Use for table creation or other setup. |
| `onData` | Yes | Called for each batch. |
| `onRollback` | No | Called on startup (`'offset_check'`) and on each fork (`'blockchain_fork'`). |
| `settings` | No | Configuration for the internal cursor state table. See `Settings` below. |
**`Settings`:**
| Field | Default | Description |
| ---------- | ---------------------------- | ------------------------------------------------------- |
| `database` | Client's configured database | ClickHouse database for the state table. |
| `table` | `'sync'` | Name of the state table. |
| `id` | `'stream'` | Stream identifier within the state table. |
| `maxRows` | `10000` | Maximum rows retained per stream id in the state table. |
## `ClickhouseStore` methods
| Method | Description |
| ------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------- |
| `store.insert(params)` | Queues an insert. Non-blocking — returns a `Promise` but need not be awaited inside `onData`; inserts are flushed when the target closes. |
| `store.query(params)` | Passthrough to `client.query()`. |
| `store.command(params)` | Passthrough to `client.command()`. |
| `store.removeAllRows({ tables, where, params? })` | Cancels rows matching `where` by re-inserting them with `sign = -1`. Requires `CollapsingMergeTree`. |
| `store.removeAllRowsByQuery({ table, query, params? })` | Like `removeAllRows`, but uses a custom `SELECT` to identify the rows to cancel. |
| `store.executeFiles(dir)` | Executes all `.sql` files found in `dir`. |
# createTarget
Source: https://docs.sqd.dev/en/sdk/pipes-sdk/evm/reference/basic-components/target/create-target
API reference for createTarget
Build a custom data sink. A target drains batches from the pipe and is responsible for persisting them.
```ts theme={"system"}
createTarget(config: Target): Target
```
**Config fields:**
* `write`: (required) Async function `({ read, logger }) => Promise`. Iterate the stream by calling `read()` and consuming `{ data, ctx }` batches. The function returns when the stream ends.
* `fork`: (optional) `(previousBlocks: BlockCursor[]) => Promise`. Called when the source detects a chain reorg. Return the last safe cursor to roll back to, or `null` if no common ancestor can be determined (the stream will throw). See [Fork handling](../../../guides/architecture-deep-dives/fork-handling). You don't need this callback when the source is configured to [read only finalized blocks](../source#finalized-blocks).
## The `write` context
```ts theme={"system"}
type WriteCtx<T> = {
  read: (cursor?: BlockCursor) => AsyncIterableIterator<{ data: T; ctx: BatchContext }>
  logger: Logger
}
```
| Field | Description |
| -------- | ---------------------------------------------------------------------------------------------------------------------- |
| `read` | Opens an async iterator over pipeline batches. Pass `cursor` to resume from a specific block the target has persisted. |
| `logger` | Pino-compatible logger scoped to this target. |
## Per-batch context (`ctx`)
Each `{ data, ctx }` yielded by `read()` carries the same `BatchContext` that transformers receive. Fields:
| Field | Type | Description |
| ---------------------------- | --------------------------- | ----------------------------------------------------------------------------------------------------------- |
| `id` | `string` | Pipeline ID — the `id` passed to `evmPortalStream()`. |
| `logger` | `Logger` | Batch-scoped logger. |
| `metrics` | `Metrics` | Prometheus metrics registry. See [Metrics](../../../guides/advanced-topics/metrics). |
| `profiler` | `Profiler` | Open a span with `ctx.profiler.start('label')`. See [Profiling](../../../guides/advanced-topics/profiling). |
| `stream.dataset` | `ApiDataset` | Dataset metadata. |
| `stream.head.finalized` | `BlockCursor \| undefined` | Current finalized head. |
| `stream.head.latest` | `BlockCursor \| undefined` | Current unfinalized head. |
| `stream.state.initial` | `number` | First block number the stream was configured to read. |
| `stream.state.last` | `number` | Last block number the stream intends to read. |
| `stream.state.current` | `BlockCursor` | Latest block in this batch. |
| `stream.state.rollbackChain` | `BlockCursor[]` | Tail of unfinalized cursors subject to rollback. |
| `stream.progress` | `ProgressEvent['progress']` | Progress metrics when `progress` is enabled. |
| `stream.query` | `{ url, hash, raw }` | Portal query details for the batch. |
| `batch.blocksCount` | `number` | Number of blocks in this batch. |
| `batch.bytesSize` | `number` | Compressed payload size received from the portal. |
| `batch.requests` | `Record` | Map of HTTP status code → count of responses that produced this batch. |
| `batch.lastBlockReceivedAt` | `Date` | Wall-clock time the last block was received. |
## Example
```ts theme={"system"}
const target = createTarget({
write: async ({ read, logger }) => {
for await (const { data, ctx } of read()) {
const span = ctx.profiler.start('save')
await database.save(data)
span.end()
logger.info(
{ block: ctx.stream.state.current.number, rows: ctx.batch.blocksCount },
'saved batch',
)
}
},
fork: async (previousBlocks) => {
// Return a cursor from your persisted state; null to fail hard.
return previousBlocks[previousBlocks.length - 1] ?? null
},
})
```
## Resuming from a persisted cursor
Stateful targets typically persist a cursor and resume from it on restart:
```ts theme={"system"}
createTarget({
write: async ({ read, logger }) => {
const lastSaved = await database.getCursor() // BlockCursor | undefined
for await (const { data, ctx } of read(lastSaved)) {
await database.save(data)
await database.saveCursor(ctx.stream.state.current)
}
},
})
```
# drizzleTarget
Source: https://docs.sqd.dev/en/sdk/pipes-sdk/evm/reference/basic-components/target/postgres-drizzle
PostgreSQL target for Pipes SDK via Drizzle ORM
See the [Postgres via Drizzle guide](../../../guides/basic-development/targets/postgres-drizzle) for usage examples and setup instructions.
```ts theme={"system"}
import { drizzleTarget } from '@subsquid/pipes/targets/drizzle/node-postgres'
```
## `drizzleTarget`
```ts theme={"system"}
drizzleTarget({
db: NodePgDatabase,
tables: Table[] | Record,
onStart?: (ctx: { db: NodePgDatabase }) => Promise,
onData: (ctx: { tx: Transaction; data: T; ctx: Ctx }) => Promise,
onBeforeRollback?: (ctx: { tx: Transaction; cursor: BlockCursor }) => Promise | unknown,
onAfterRollback?: (ctx: { tx: Transaction; cursor: BlockCursor }) => Promise | unknown,
settings?: {
state?: StateOptions
transaction?: {
isolationLevel?: 'read uncommitted' | 'read committed' | 'repeatable read' | 'serializable'
}
},
})
```
| Parameter | Required | Description |
| ------------------------------------- | -------- | ----------------------------------------------------------------------------------------------- |
| `db` | Yes | Drizzle `NodePgDatabase` instance. Must expose `$client` (a `pg` Pool or Client). |
| `tables` | Yes | Tables tracked for automatic fork rollback. All tables written to in `onData` must appear here. |
| `onStart` | No | Runs once before processing starts. Receives `{ db }`. |
| `onData` | Yes | Called for each batch inside a serializable transaction. |
| `onBeforeRollback` | No | Called inside the rollback transaction before snapshots are replayed. |
| `onAfterRollback` | No | Called inside the rollback transaction after snapshots are replayed. |
| `settings.state` | No | Configuration for the internal cursor state table. See `StateOptions` below. |
| `settings.transaction.isolationLevel` | No | Transaction isolation level. Defaults to `'serializable'`. |
**`StateOptions`:**
| Field | Default | Description |
| ---------------------------- | ---------- | --------------------------------------------------------------------- |
| `schema` | `'public'` | PostgreSQL schema for the state table. |
| `table` | `'sync'` | Name of the state table. |
| `id` | `'stream'` | Stream identifier within the state table. |
| `unfinalizedBlocksRetention` | `1000` | Number of unfinalized blocks retained in state for rollback purposes. |
## `batchForInsert`
```ts theme={"system"}
import { batchForInsert } from '@subsquid/pipes/targets/drizzle/node-postgres'
```
```ts theme={"system"}
function batchForInsert<T>(data: readonly T[], size?: number): Generator<T[]>
```
Splits an array into chunks that fit within PostgreSQL's 32,767-parameter limit. Chunk size is `Math.floor(32767 / columnsPerRecord)` by default. Pass `size` to set a smaller cap; values exceeding the computed maximum are silently clamped.
`chunk` is a deprecated alias for `batchForInsert`.
# Transformer
Source: https://docs.sqd.dev/en/sdk/pipes-sdk/evm/reference/basic-components/transformer
API reference for createTransformer and pipe transforms
## createTransformer
Construct a whole-pipe transformer.
```ts theme={"system"}
createTransformer<I, O>(config: TransformerOptions<I, O>): Transformer<I, O>
```
**Config fields:**
* `transform`: (required) `(data: I, ctx: BatchContext) => O | Promise<O>`. Called once per batch.
* `start`: (optional) `(ctx: StartCtx) => void | Promise<void>`. Called once when the pipe starts. Use this to load state, warm up caches, or query the portal for historical data before the main stream begins.
* `stop`: (optional) `(ctx: StopCtx) => void | Promise<void>`. Called once when the pipe stops.
* `fork`: (optional) `(cursor: BlockCursor, ctx: Ctx) => void | Promise<void>`. Called before the next batch whenever the source detects a chain reorg. `cursor` identifies the last safe block. See [Fork handling](../../guides/architecture-deep-dives/fork-handling).
* `profiler`: (optional) `{ name: string; hidden?: boolean }`. Overrides the transformer's node name in the [profiler](../../guides/advanced-topics/profiling) tree.
**Example:**
```ts theme={"system"}
const transformer = createTransformer({
transform: async (data, ctx) => {
ctx.logger.info({ block: ctx.stream.state.current.number }, 'batch')
return data.map((b) => b.logs)
},
})
```
## Context variables
Each callback receives a context object. The fields differ by callback.
### `transform(data, ctx: BatchContext)`
`ctx` is the full per-batch context. Fields:
| Field | Type | Description |
| ---------- | -------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `id` | `string` | Pipeline ID — the `id` passed to `evmPortalStream()`. |
| `logger` | `Logger` | Pino-compatible logger scoped to this batch. Defaults to the source-level logger. |
| `metrics` | `Metrics` | Prometheus metrics registry. Use `ctx.metrics.counter()`, `.gauge()`, `.histogram()`, `.summary()` to register and update custom metrics. See [Metrics](../../guides/advanced-topics/metrics). |
| `profiler` | `Profiler` | Open a span with `ctx.profiler.start('label')`. See [Profiling](../../guides/advanced-topics/profiling). |
| `stream` | `BatchStreamContext` | Per-stream state (see below). |
| `batch` | `BatchMetadata` | Per-batch volume info (see below). |
#### `ctx.stream: BatchStreamContext`
| Field | Type | Description |
| --------------------- | --------------------------- | ----------------------------------------------------------------------------------------- |
| `dataset` | `ApiDataset` | Dataset metadata returned by the portal (chain name, genesis, tier). |
| `head.finalized` | `BlockCursor \| undefined` | Current finalized head known to the portal, if advertised. |
| `head.latest` | `BlockCursor \| undefined` | Current unfinalized head. |
| `state.initial` | `number` | First block number the stream was configured to read. |
| `state.last` | `number` | Last block number the stream intends to read (often `Infinity`). |
| `state.current` | `BlockCursor` | Latest block in this batch. Cursor has `{ number, hash?, timestamp? }`. |
| `state.rollbackChain` | `BlockCursor[]` | Unfinalized-chain tail — cursors the stream will need to roll back if a fork is detected. |
| `progress` | `ProgressEvent['progress']` | Progress metrics when `progress` is configured on the source; otherwise undefined. |
| `query` | `{ url, hash, raw }` | Debug info for the portal query feeding this batch. |
#### `ctx.batch: BatchMetadata`
| Field | Type | Description |
| --------------------- | ------------------------ | ----------------------------------------------------------------------- |
| `blocksCount` | `number` | Number of blocks in this batch. |
| `bytesSize` | `number` | Compressed payload size received from the portal. |
| `requests` | `Record` | Map of HTTP status code → number of responses that produced this batch. |
| `lastBlockReceivedAt` | `Date` | Wall-clock time the last block was received. |
### `start(ctx: StartCtx)`
Fired once, before any batch. Use to warm up caches or run one-off queries.
| Field | Type | Description |
| --------------- | -------------------------- | --------------------------------------------------------------------------- |
| `id` | `string` | Pipeline ID. |
| `logger` | `Logger` | Same as in `BatchContext`. |
| `metrics` | `Metrics` | Same as in `BatchContext`. |
| `portal` | `PortalClient` | Live portal client. Use `portal.getStream(query)` for warm-up reads. |
| `state.initial` | `number` | First block the stream was configured to read. |
| `state.current` | `BlockCursor \| undefined` | Cursor persisted by the previous run, if any. `undefined` on a fresh start. |
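A minimal sketch of a `start` hook using these fields:
```ts theme={"system"}
createTransformer({
  start: async ({ logger, state }) => {
    if (state.current) {
      logger.info({ resumeFrom: state.current.number }, 'resuming from a persisted cursor')
    } else {
      logger.info({ firstBlock: state.initial }, 'fresh start')
    }
  },
  transform: (data) => data,
})
```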
### `fork(cursor, ctx: Ctx)`
Fired before the next batch whenever a reorg is detected. `cursor` is the last block to keep; drop state produced for anything after it.
| Field | Type | Description |
| ---------- | ---------- | -------------------------- |
| `logger` | `Logger` | Same as in `BatchContext`. |
| `profiler` | `Profiler` | Same as in `BatchContext`. |
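For example, a transformer that keeps in-memory state could reset it here; a sketch:
```ts theme={"system"}
createTransformer({
  transform: (data) => data,
  fork: async (cursor, ctx) => {
    // Drop any state derived from blocks after the last safe cursor
    ctx.logger.warn({ safeBlock: cursor.number }, 'fork detected, resetting in-memory state')
  },
})
```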
### `stop(ctx: StopCtx)`
Fired once when the pipe stops.
| Field | Type | Description |
| -------- | -------- | -------------------------- |
| `logger` | `Logger` | Same as in `BatchContext`. |
# evmDecoder
Source: https://docs.sqd.dev/en/sdk/pipes-sdk/evm/reference/utility-components/evm-decoder
Decode smart contract events as a pipe
See the [Handling contract events](../../guides/basic-development/handling-events) guide for usage examples and event specification routes.
## evmDecoder
Returns a query-transformer combo that instructs the [source](../basic-components/source) to fetch and decode smart contract event logs.
```ts theme={"system"}
evmDecoder(config: EvmDecoderConfig): Transformer
```
**Parameters:**
* `range`: Block range `{ from: number | 'latest', to?: number }` (required)
* `contracts`: Array of contract addresses or a [factory](./factory) (optional — omit to receive events from all contracts)
* `events`: Map of event names to ABI event objects or `{ event, params }` filter objects (required)
* `profiler`: Profiler config `{ name: string }` used to label the transformer in profiling data (optional)
* `onError`: Error handler (optional)
**Example:**
```ts theme={"system"}
import { evmDecoder, commonAbis } from "@subsquid/pipes/evm";
const decoder = evmDecoder({
range: { from: 20000000, to: 20100000 },
contracts: ["0xa0b86991c6218b36c1d19d4a2e9eb0ce3606eb48"],
events: {
transfer: commonAbis.erc20.events.Transfer,
},
});
await evmPortalSource({
portal: "https://portal.sqd.dev/datasets/ethereum-mainnet",
outputs: decoder,
}).pipeTo(target);
```
## Decoded event structure
Each entry in the output arrays is a `DecodedEvent` object:
| Field | Type | Description |
| ----------- | ---------------------------------- | ------------------------------------------------------------------------------------------------------------------ |
| `event` | decoded event type | Decoded event data fields |
| `contract` | `string` | Address of the contract that emitted the event |
| `block` | `{ number: number, hash: string }` | Block number and hash |
| `timestamp` | `Date` | Block timestamp |
| `rawEvent` | `Log` | Raw log with `address`, `topics`, `data`, `transactionHash`, `logIndex`, `transactionIndex` |
| `factory` | `{ contract, blockNumber, event }` | Present only when using a [factory](./factory); carries the factory deployment event that discovered this contract |
## commonAbis
```ts theme={"system"}
import { commonAbis } from '@subsquid/pipes/evm'
```
`commonAbis` is a built-in collection of typed ABI modules. See the [Handling contract events guide](../../guides/basic-development/handling-events#commonabis) for usage examples.
### commonAbis.erc20
**Events:**
| | Signature |
| ---------------------------------- | ------------------------------------------------------------------------- |
| `commonAbis.erc20.events.Transfer` | `Transfer(address indexed from, address indexed to, uint256 value)` |
| `commonAbis.erc20.events.Approval` | `Approval(address indexed owner, address indexed spender, uint256 value)` |
**Functions:**
| | Signature |
| ----------------------------------------- | ----------------------------------------------------------------- |
| `commonAbis.erc20.functions.name` | `name() → string` |
| `commonAbis.erc20.functions.symbol` | `symbol() → string` |
| `commonAbis.erc20.functions.decimals` | `decimals() → uint8` |
| `commonAbis.erc20.functions.totalSupply` | `totalSupply() → uint256` |
| `commonAbis.erc20.functions.balanceOf` | `balanceOf(address _owner) → uint256` |
| `commonAbis.erc20.functions.allowance` | `allowance(address _owner, address _spender) → uint256` |
| `commonAbis.erc20.functions.transfer` | `transfer(address _to, uint256 _value) → bool` |
| `commonAbis.erc20.functions.approve` | `approve(address _spender, uint256 _value) → bool` |
| `commonAbis.erc20.functions.transferFrom` | `transferFrom(address _from, address _to, uint256 _value) → bool` |
# evmRpcLatencyWatcher
Source: https://docs.sqd.dev/en/sdk/pipes-sdk/evm/reference/utility-components/evm-rpc-latency-watcher
Compare block arrival at Portal vs RPC endpoints
Subscribe to RPC endpoints via WebSocket and measure when blocks arrive at the Portal versus when they appear at the RPC. Use this transformer to monitor relative latency.
```ts theme={"system"}
import { evmPortalSource, evmRpcLatencyWatcher } from "@subsquid/pipes/evm";
const stream = evmPortalSource({
portal: "https://portal.sqd.dev/datasets/base-mainnet",
outputs: evmDecoder({ range: { from: 'latest' }, events: {} }),
}).pipe(
evmRpcLatencyWatcher({
rpcUrl: ["https://base.drpc.org", "https://base-rpc.publicnode.com"],
})
);
for await (const { data } of stream) {
if (!data) continue;
console.table(data.rpc); // url, receivedAt, portalDelayMs
}
```
**Parameters:**
* `rpcUrl`: Array of RPC WebSocket or HTTP URLs to compare against Portal
**Output:** Each batch includes an `rpc` array with `url`, `receivedAt`, and `portalDelayMs` per endpoint.
Measured values include client-side network latency. Results are end-to-end delays as seen by the client, not pure Portal or RPC processing performance.
# factory
Source: https://docs.sqd.dev/en/sdk/pipes-sdk/evm/reference/utility-components/factory
Track dynamically created contracts
Track dynamically created contracts with [evmDecoder()](./evm-decoder).
```ts theme={"system"}
factory(config: FactoryConfig): Factory
```
**Parameters:**
* `address`: Factory contract address or array of addresses (required)
* `event`: Factory creation event ABI or filtered event object (required)
* **Simple format**: `AbiEvent` - Capture all factory events
* **Filtered format**: `{ event: AbiEvent, params: {...} }` - Filter by indexed parameters
Events should be specified using [the same approach as `evmDecoder()` itself uses](../../guides/basic-development/handling-events#specifying-events).
* `parameter`: Name of the event parameter holding the child contract address, or an extractor function `(event) => string` (required)
* `database`: A [factory store](#contractfactorystore) that persists the list of known child contracts (required)
**Example:**
```ts theme={"system"}
import { factory, contractFactoryStore } from "@subsquid/pipes/evm";
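// factoryAbi below is assumed to be a typed ABI module for the factory contract
// (e.g. generated from the Uniswap V3 factory ABI)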
const factoryInstance = factory({
address: "0x1f98431c8ad98523631ae4a59f267346ea31f984",
event: factoryAbi.events.PoolCreated,
parameter: "pool",
database: contractFactoryStore({ path: "./pools.sqlite" }),
});
```
**Filtered factory events:**
```ts theme={"system"}
factory({
address: "0x1f98431c8ad98523631ae4a59f267346ea31f984",
event: {
event: factoryAbi.events.PoolCreated,
params: {
token0: "0xc02aaa39b223fe8d0a0e5c4f27ead9083c756cc2", // WETH
},
},
parameter: "pool",
database: contractFactoryStore({ path: "./weth-pools.sqlite" }),
});
```
Only **indexed event parameters** can be used in the `params` object. In other words, the parameter values must be available as event topics. [Reference](https://docs.soliditylang.org/en/latest/contracts.html#events).
## contractFactoryStore
Create an SQLite factory database: an object used to persist the list of child contracts in a fork-aware way. For now, only SQLite-based factory databases are supported.
```ts theme={"system"}
contractFactoryStore(config: { path: string }): FactoryDatabase
```
# metricsServer
Source: https://docs.sqd.dev/en/sdk/pipes-sdk/evm/reference/utility-components/metrics-server
Expose Prometheus metrics and the live-stats API from your pipe
Start a metrics server on the pipe process. Required by [Pipes UI](../../guides/basic-development/pipes-ui) and by anything that scrapes Prometheus (Grafana, Alertmanager, etc.).
```ts theme={"system"}
import { metricsServer } from "@subsquid/pipes/metrics/node";
import { evmPortalSource } from "@subsquid/pipes/evm";
evmPortalSource({
// ...
metrics: metricsServer({ port: 9090 }),
// ...
});
```
**Parameters:**
* `port`: HTTP port for the server (default: `9090`).
## Endpoints
`metricsServer()` serves four HTTP endpoints on the configured port. All of them are also useful for ad-hoc inspection, e.g. with `curl`.
| Path | Content |
| ----------- | ----------------------------------------------------------------------------------------------------------------------------------- |
| `/stats` | JSON — per-pipe progress, speed, portal query, SDK version. This is what [Pipes UI](../../guides/basic-development/pipes-ui) polls. |
| `/metrics` | Prometheus text — built-in `sqd_*` series plus any custom metrics you registered. Scrape this from Prometheus. |
| `/profiler` | JSON — recent per-batch span trees. See [Profiling](../../guides/advanced-topics/profiling). Empty when profiling is disabled. |
| `/health` | Responds with `ok`. |
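For quick ad-hoc checks you can also poll the endpoints from a Node script instead of `curl`. A minimal sketch, assuming the default port `9090`:
```ts theme={"system"}
// Fetch the live-stats endpoint once and print per-pipe progress.
const stats = await fetch("http://localhost:9090/stats").then((res) => res.json());
console.log(stats);
```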
## Custom metrics
Register counters, gauges, histograms, and summaries via `ctx.metrics` in [whole-pipe transformers](../basic-components/transformer), [targets](../basic-components/target), or when consuming the pipe as an async iterator. See the [Metrics guide](../../guides/advanced-topics/metrics).
## See also
* [Pipes UI](../../guides/basic-development/pipes-ui) — visual dashboard that consumes `/stats` and `/profiler`.
* [Metrics](../../guides/advanced-topics/metrics) — walkthrough for exposing Prometheus metrics and adding custom series.
* [Profiling](../../guides/advanced-topics/profiling) — interpreting the `/profiler` span tree.
# portalSqliteCache
Source: https://docs.sqd.dev/en/sdk/pipes-sdk/evm/reference/utility-components/sqlite-cache
SQLite cache for Portal responses
Create SQLite cache for Portal responses. Use with `evmPortalSource` to cache Portal API responses locally.
```ts theme={"system"}
portalSqliteCache(config: { path: string }): PortalCache
```
**Example:**
```ts theme={"system"}
import { portalSqliteCache } from "@subsquid/pipes/portal-cache/node";
import { evmPortalSource, evmDecoder } from "@subsquid/pipes/evm";
const source = evmPortalSource({
portal: "https://portal.sqd.dev/datasets/ethereum-mainnet",
outputs: evmDecoder({ range: { from: 0 }, events: {} }),
cache: portalSqliteCache({ path: "./cache.sqlite" }),
});
```
Import from `@subsquid/pipes/portal-cache/node` instead of `@subsquid/pipes/portal-cache`.
### When to Use
* Development iteration
* Testing pipelines
* Repeated processing of the same block ranges
# Why Pipes SDK?
Source: https://docs.sqd.dev/en/sdk/pipes-sdk/evm/why-pipes-sdk
And when you might want to use it
Pipes SDK is also available for Solana
Pipes SDK is a TypeScript library for retrieving blockchain data from SQD Portals and transforming it. It features:
* **All features of the SQD Portal API:**
* Data is downloaded in big chunks and at a high speed.
* It is filtered on the server side - you only download what you need.
* Real-time data is supported.
* Information on blockchain reorganizations and finality is available; in standard modules these are handled automatically.
* **Being a library**: although Pipes SDK can be used to build full-featured blockchain indexers, it is easy to embed it into larger applications, microservices, or data processing workflows.
* **Reusable modules:** data filters can be bundled with transformation logic, and the resulting modules can be mixed and matched. For example, you can create modules that give you decoded Uniswap-style swaps and ERC-20 transfers, then just plug them into your pipe whenever you need either.
* **Simplicity of extension:** we've made adding new modules as simple as possible.
* This includes data sinks: adding support for your database, data lake or message queue is no longer a hassle.
## When to Use Pipes SDK
1. You want common protocols (Uniswap, ERC20, ERC721/1155 etc) handled for you. If all of your data is like that, you can start uploading it into your database in minutes.
2. You need deep customization of any part of the data pipeline.
## When to Use Alternatives
1. If you want to use Portal data in a non-JS app, consider using
* [Raw Portal API](/en/portal/evm/api) for all languages.
2. Consider using [Squid SDK](/en/sdk/squid-sdk/evm) if:
* You're making a self-contained Web3 data service such as a GraphQL API.
* You're looking for an indexing framework similar to TheGraph, Ponder or Envio.
# sqd CLI cheatsheet
Source: https://docs.sqd.dev/en/sdk/squid-sdk/how-to-start/cli-cheatsheet
Cheatsheet of commonly used sqd CLI commands — init, deploy, logs, secrets, gateways — with examples for everyday Squid SDK development workflows.
# Squid CLI cheatsheet
The [`sqd` CLI tool](/en/sdk/squid-sdk/squid-cli) has [built-in aliasing](/en/sdk/squid-sdk/squid-cli/commands-json) that picks up the commands defined in `commands.json` in the project root. In all [squid templates](/en/sdk/squid-sdk/how-to-start/squid-development#templates) this file is pre-populated with some handy scripts briefly described below.
One can always inspect the available commands defined in `commands.json` with
```
sqd --help
```
The commands defined by `commands.json` will appear in the `SQUID COMMANDS` help sections.
Before using the `sqd` CLI tool, make sure all the project dependencies are installed:
```sh theme={"system"}
npm i
```
### Building the squid
```sh theme={"system"}
sqd build Build the squid project
sqd clean Delete all build artifacts
```
### Running the squid
Both `sqd up` and `sqd down` assume that the `docker compose` command is supported and the `docker` daemon is running. Modify the definitions in `commands.json` accordingly if `docker-compose` should be used instead.
```
sqd up Start a local PG database
sqd down Drop the local PG database
sqd run [PATH] Run all the services defined in squid.yaml locally
sqd serve Start the GraphQL server
sqd serve:prod Start the GraphQL API server with caching and limits
```
### DB migrations
Read [TypeORM Migration generation](/en/sdk/squid-sdk/resources/tools/migrations-gen) for details.
```
sqd migration:apply apply pending migrations
sqd migration:generate generate the migration for the schema defined in schema.graphql
sqd migration:clean clean the db/migrations folder
```
### Code generation
Consult [TypeORM Model generation](/en/sdk/squid-sdk/resources/tools/model-gen) for TypeORM model generation details, and [Type-safe decoding](https://docs.subsquid.io/sdk/resources/tools/typegen/) for type generation.
Depending on the template, `sqd typegen` is aliased to a different typegen tool specific to the chain type and thus has different usage. Consult `sqd typegen --help` for details.
```
sqd codegen Generate TypeORM entities from schema.graphql
sqd typegen Generate data access classes for an ABI file(s) in the ./abi folder
```
# Environment set up
Source: https://docs.sqd.dev/en/sdk/squid-sdk/how-to-start/development-environment-set-up
Prepare your development environment for Squid SDK — install Node, Docker, sqd CLI, and required tools for building TypeScript blockchain indexers.
### Node.js
To install Node.js:
* **Linux**: use the package manager of your distro or the [official binaries](https://nodejs.org/en/download).
* **macOS**: use the [official binaries](https://nodejs.org/en/download) or [Homebrew](https://nodejs.org/en/download/package-manager/#alternatives-2).
* **Windows**: the best bet is to leverage WSL2 and follow [this guide](https://docs.microsoft.com/en-us/windows/dev-environment/javascript/nodejs-on-wsl). The [official installer](https://nodejs.org/en/download) also works.
Make sure that your Node.js installation is v16 or newer. To check an existing installation, run:
```bash theme={"system"}
node --version
```
### Squid CLI
Follow [these instructions](/en/sdk/squid-sdk/squid-cli/installation).
### Git
Squid CLI uses [Git](https://git-scm.com) to retrieve templates. To install it:
* **Linux**: use the package manager of your distro.
* **macOS**: use the [official installer](https://sourceforge.net/projects/git-osx-installer/) or any of the [alternative approaches](https://git-scm.com/download/mac).
* **Windows**: use [Git for Windows](https://git-scm.com/download/win).
### Docker
Most squids [use a database](/en/sdk/squid-sdk/resources/persisting-data/typeorm) to store the processed data. Install Docker to conveniently manage local squid databases.
* **Linux**: [here are the instructions](https://docs.docker.com/engine/install/ubuntu/#install-using-the-repository) for installing Docker on Ubuntu; on other distros, consider using their package managers.
* **macOS**: install the [Desktop version](https://docs.docker.com/desktop/mac/install/).
* **Windows**: install the [Desktop version](https://docs.docker.com/desktop/windows/install/).
# Project structure
Source: https://docs.sqd.dev/en/sdk/squid-sdk/how-to-start/layout
Squid SDK project folder layout — entry points, schema file, processor source, migrations, manifest, environment, and conventions for clean structure.
# Squid project structure
All files and folders except `package.json` are optional.
* `package.json` -- Configuration file for dependencies and the build script (invoked with `npm run build`). Hard requirement for deploying to [SQD Cloud](/en/cloud).
* `package-lock.json` OR `yarn.lock` OR `pnpm-lock.yaml` -- Dependencies shrinkwrap. Required for [Cloud](/en/cloud) deployment, except for squids that [override the dependencies installation command](/en/cloud/reference/manifest#cmd).
* `tsconfig.json` -- Configuration of `tsc`. Required for most squids.
* [Deployment manifest](/en/cloud/reference/manifest) (`squid.yaml` by default) -- Definitions of squid services used for running it locally with [`sqd run`](/en/sdk/squid-sdk/squid-cli/run) and deploying to [SQD Cloud](/en/cloud).
* `.squidignore` -- Files and patterns to be excluded when sending the squid code to the [Cloud](/en/cloud). When not supplied, some files will still be omitted: see the [reference page](/en/cloud/reference/squidignore) for details.
* `schema.graphql` -- [The schema definition file](/en/sdk/squid-sdk/reference/schema-file). Required if your squid [stores its data in PostgreSQL](/en/sdk/squid-sdk/resources/persisting-data/typeorm).
* `/src` -- The TypeScript source code folder for the squid processor.
* `/src/main.ts` -- The entry point of the squid processor process. Typically, contains a `processor.run()` call.
* `/src/processor.ts` -- Processor object ([EVM](/en/sdk/squid-sdk/reference/processors/evm-batch) or [Substrate](/en/sdk/squid-sdk/reference/processors/substrate-batch)) definition and configuration.
* `/src/model/generated` -- The folder for the TypeORM entities generated from `schema.graphql`.
* `/src/model` -- The module exporting the entity classes.
* `/src/server-extension/resolvers` -- A folder for [user-defined GraphQL resolvers](/en/sdk/squid-sdk/reference/openreader-server/configuration/custom-resolvers) used by [OpenReader](/en/sdk/squid-sdk/reference/openreader-server).
* `/src/types` -- A folder for types generated by the Substrate [typegen](/en/sdk/squid-sdk/resources/tools/typegen) tool for use in data decoding.
* `/src/abi` -- A folder for modules generated by the EVM [typegen](/en/sdk/squid-sdk/resources/tools/typegen) tool containing type definitions and data decoding boilerplate code.
* `/db` -- The designated folder with the [database migrations](/en/sdk/squid-sdk/resources/persisting-data/typeorm).
* `/lib` -- The output folder for the compiled squid code.
* `/assets` -- A designated folder for custom user-provided files (e.g. static data files to seed the squid processor with).
* `/abi` -- A designated folder for JSON ABI files used as input by the EVM [typegen](/en/sdk/squid-sdk/resources/tools/typegen).
* `docker-compose.yml` -- A Docker compose file for local runs. Has a Postgres service definition by default.
* `.env` -- Defines environment variables used by `docker-compose.yml` and when the squid is run locally.
* `typegen.json` -- The config file for the Substrate [typegen](/en/sdk/squid-sdk/resources/tools/typegen) tool.
* `commands.json` -- [User-defined scripts](/en/sdk/squid-sdk/squid-cli/commands-json) picked up by [Squid CLI](/en/sdk/squid-sdk/squid-cli/commands-json). See also the [CLI cheatsheet](/en/sdk/squid-sdk/how-to-start/cli-cheatsheet).
# Development flow
Source: https://docs.sqd.dev/en/sdk/squid-sdk/how-to-start/squid-development
Recommended Squid SDK development workflow — model the schema, write the processor, run migrations, iterate locally, and deploy to SQD Cloud.
This page is a definitive end-to-end guide into practical squid development. It uses templates to simplify the process. Check out [Squid from scratch](/en/sdk/squid-sdk/how-to-start/squid-from-scratch) for a more educational barebones approach.
Feel free to also use the template-specific `sqd` scripts defined in [`commands.json`](/en/sdk/squid-sdk/squid-cli/commands-json) to simplify your workflow. See [sqd CLI cheatsheet](/en/sdk/squid-sdk/how-to-start/cli-cheatsheet) for a short intro.
## Prepare the environment
* Node v16.x or newer
* Git
* [Squid CLI](/en/sdk/squid-sdk/squid-cli/installation)
* Docker (if your squid will store its data to PostgreSQL)
See also the [Environment set up](/en/sdk/squid-sdk/how-to-start/development-environment-set-up) page.
## Understand your technical requirements
Consider your business requirements and find out
1. How the data should be delivered. Options:
* [PostgreSQL](/en/sdk/squid-sdk/resources/persisting-data/typeorm) with an optional [GraphQL API](/en/sdk/squid-sdk/resources/serving-graphql) - can be real-time
* [file-based dataset](/en/sdk/squid-sdk/resources/persisting-data/file) - local or on S3
* [Google BigQuery](/en/sdk/squid-sdk/resources/persisting-data/bigquery)
2. What data should be delivered
3. What are the technologies powering the blockchain(s) in question. Supported options:
* Ethereum Virtual Machine (EVM) chains like [Ethereum](https://ethereum.org) - [supported networks](/en/data/evm)
* [Substrate](https://substrate.io)-powered chains like [Polkadot](https://polkadot.network) and [Kusama](https://kusama.network) - [supported networks](/en/data/substrate)
Note that you can use SQD via [RPC ingestion](/en/sdk/squid-sdk/resources/unfinalized-blocks) even if your network is not listed.
4. What exact data should be retrieved from blockchain(s)
5. Whether you need to mix in any [off-chain data](/en/sdk/squid-sdk/resources/external-api)
#### Example requirements
Suppose you want to train a prototype ML model on all trades done on Uniswap Polygon since the v3 upgrade.
1. A delay of a few hours typically won't matter for training, so you may want to deliver the data as files for easier handling.
2. The output could be a simple list of swaps, listing pair, direction and token amounts for each.
3. Polygon is an EVM chain.
4. All the required data is contained within `Swap` events emitted by the pair pool contracts. Uniswap deploys these [dynamically](/en/sdk/squid-sdk/resources/evm/factory-contracts), so you will also have to capture `PoolCreated` events from the factory contract to know which `Swap` events are coming from Uniswap and map them to pairs.
5. No off-chain data will be necessary for this task.
Suppose you want to make a website that shows the image and ownership history for ERC721 NFTs from a certain Ethereum contract.
1. For this application it makes sense to deliver a GraphQL API.
2. Output data might have `Token`, `Owner` and `Transfer` database tables / [entities](/en/sdk/squid-sdk/reference/schema-file/entities), with e.g. `Token` supplying all the fields necessary to show ownership history and the image.
3. Ethereum is an EVM chain.
4. Data on token mints and ownership history can be derived from `Transfer(address,address,uint256)` EVM event logs emitted by the contract. To render images, you will also need token metadata URLs that are only available by [querying the contract state](/en/sdk/squid-sdk/resources/tools/typegen/state-queries) with the `tokenURI(uint256)` function.
5. You'll need to retrieve the off-chain token metadata (usually from IPFS).
Suppose you want to create a BigQuery dataset with Kusama native tokens transfers.
1. The delivery format is BigQuery.
2. A single table with `from`, `to` and `amount` columns may suffice.
3. Kusama is a Substrate chain.
4. The required data is available from `Transfer` events emitted by the `Balances` pallet. Take a look at our [Substrate data sourcing miniguide](/en/sdk/squid-sdk/resources/substrate/data-sourcing-miniguide) for more info on how to figure out which pallets, events and calls are necessary for your task.
5. No off-chain data will be necessary for this task.
## Start from a template
Although it is possible to [compose a squid from individual packages](/en/sdk/squid-sdk/how-to-start/squid-from-scratch), in practice it is usually easier to start from a template.
* A minimal template intended for developing EVM squids. Indexes ETH burns.
```bash theme={"system"}
sqd init my-squid-name -t evm
```
* A starter squid for indexing ERC20 transfers.
```bash theme={"system"}
sqd init my-squid-name -t https://github.com/subsquid-labs/squid-erc20-template
```
* Classic [example Subgraph](https://github.com/graphprotocol/example-subgraph) after a [migration](/en/sdk/squid-sdk/resources/migrate/migrate-subgraph) to SQD.
```bash theme={"system"}
sqd init my-squid-name -t gravatar
```
* A template showing how to [combine data from multiple chains](/en/sdk/squid-sdk/resources/multichain). Indexes USDC transfers on Ethereum and Binance.
```bash theme={"system"}
sqd init my-squid-name -t multichain
```
* USDC transfers -> local CSV
```bash theme={"system"}
sqd init my-squid-name -t https://github.com/subsquid-labs/file-store-csv-example
```
* USDC transfers -> local Parquet
```bash theme={"system"}
sqd init my-squid-name -t https://github.com/subsquid-labs/file-store-parquet-example
```
* USDC transfers -> CSV on S3
```bash theme={"system"}
sqd init my-squid-name -t https://github.com/subsquid-labs/file-store-s3-example
```
* USDC transfers -> BigQuery dataset
```bash theme={"system"}
sqd init my-squid-name -t https://github.com/subsquid-labs/squid-bigquery-example
```
* Native events emitted by Substrate-based chains
```bash theme={"system"}
sqd init my-squid-name -t substrate
```
* ink! smart contracts
```bash theme={"system"}
sqd init my-squid-name -t ink
```
* Frontier EVM contracts on Astar and Moonbeam
```bash theme={"system"}
sqd init my-squid-name -t frontier-evm
```
After retrieving the template of choice install its dependencies:
```bash theme={"system"}
cd my-squid-name
npm i
```
Test the template locally. The procedure varies depending on the data sink:
1. Launch a PostgreSQL container with
```bash theme={"system"}
docker compose up -d
```
2. Build the squid with
```bash theme={"system"}
npm run build
```
3. Apply the DB migrations with
```bash theme={"system"}
npx squid-typeorm-migration apply
```
4. Start the squid processor with
```bash theme={"system"}
node -r dotenv/config lib/main.js
```
You should see output that contains lines like these:
```bash theme={"system"}
04:11:24 INFO sqd:processor processing blocks from 6000000
04:11:24 INFO sqd:processor using archive data source
04:11:24 INFO sqd:processor prometheus metrics are served at port 45829
04:11:27 INFO sqd:processor 6051219 / 18079056, rate: 16781 blocks/sec, mapping: 770 blocks/sec, 544 items/sec, eta: 12m
```
5. Start the GraphQL server by running
```bash theme={"system"}
npx squid-graphql-server
```
in a separate terminal, then visit the [GraphiQL console](http://localhost:4350/graphql) to verify that the GraphQL API is up.
When done, shut down and erase your database with `docker compose down`.
1. (for the S3 template only) Set the credentials and prepare a bucket for your data as described in the [template README](https://github.com/subsquid-labs/file-store-s3-example/blob/main/README.md).
2. Build the squid with
```bash theme={"system"}
npm run build
```
3. Start the squid processor with
```bash theme={"system"}
node -r dotenv/config lib/main.js
```
The output should contain lines like these:
```bash theme={"system"}
04:11:24 INFO sqd:processor processing blocks from 6000000
04:11:24 INFO sqd:processor using archive data source
04:11:24 INFO sqd:processor prometheus metrics are served at port 45829
04:11:27 INFO sqd:processor 6051219 / 18079056, rate: 16781 blocks/sec, mapping: 770 blocks/sec, 544 items/sec, eta: 12m
```
After a short while you should see a `./data` folder populated with indexer data. A local output folder looks like this:
```bash theme={"system"}
$ tree ./data/
./data/
├── 0000000000-0007242369
│ └── transfers.tsv
├── 0007242370-0007638609
│ └── transfers.tsv
...
└── status.txt
```
Create a dataset with your BigQuery account, then follow the [template README](https://github.com/subsquid-labs/squid-bigquery-example/blob/master/README.md).
## The bottom-up development cycle
The advantage of this approach is that the code remains buildable at all times, making it easier to catch issues early.
### I. Regenerate the task-specific utilities
Retrieve JSON ABIs for all contracts of interest (e.g. from Etherscan), taking care to get ABIs for implementation contracts and not [proxies](/en/sdk/squid-sdk/resources/evm/proxy-contracts) where appropriate. Assuming that you saved the ABI files to `./abi`, you can then regenerate the utilities with
```bash theme={"system"}
npx squid-evm-typegen ./src/abi ./abi/*.json --multicall
```
Alternatively, if you would like the tool to retrieve the ABI from Etherscan on your behalf, you can run e.g.
```bash theme={"system"}
npx squid-evm-typegen \
src/abi \
0xdAC17F958D2ee523a2206206994597C13D831ec7#usdt
```
The utility classes will become available at `src/abi`.
See also [EVM typegen code generation](/en/sdk/squid-sdk/resources/tools/typegen/generation).
Follow the respective reference configuration pages of each typegen tool:
* [Substrate typegen configuration](/en/sdk/squid-sdk/resources/tools/typegen/generation)
* [ink! typegen configuration](/en/sdk/squid-sdk/resources/tools/typegen/generation)
These squids use both Substrate typegen *and* EVM typegen. To generate all the required utilities, [configure the Substrate part](/en/sdk/squid-sdk/resources/tools/typegen/generation), then save all relevant JSON ABIs to `./abi`, then run
```bash theme={"system"}
npx squid-evm-typegen ./src/abi ./abi/*.json --multicall
```
followed by
```bash theme={"system"}
npx squid-substrate-typegen ./typegen.json
```
### II. Configure the data requests
Data requests are [customarily](/en/sdk/squid-sdk/how-to-start/layout) defined at `src/processor.ts`. The details depend on the network type:
Edit the definition of `const processor` to
1. Use a data source appropriate for your chain and task.
* It is possible to [use RPC](/en/sdk/squid-sdk/reference/processors/evm-batch/general#set-rpc-endpoint) as the only data source, but [adding](/en/sdk/squid-sdk/reference/processors/evm-batch/general#set-gateway) a [SQD Network](/en/data/evm) data source will make your squid sync much faster.
* RPC is a hard requirement if you're building a real-time API.
* If you're using RPC as one of your data sources, make sure to [set the number of finality confirmations](/en/sdk/squid-sdk/reference/processors/evm-batch/general#set-finality-confirmation) so that [hot blocks ingestion](/en/sdk/squid-sdk/resources/unfinalized-blocks) works properly.
2. Request all [event logs](/en/sdk/squid-sdk/reference/processors/evm-batch/logs), [transactions](/en/sdk/squid-sdk/reference/processors/evm-batch/transactions), [execution traces](/en/sdk/squid-sdk/reference/processors/evm-batch/traces) and [state diffs](/en/sdk/squid-sdk/reference/processors/evm-batch/state-diffs) that your task requires, with any necessary related data (e.g. parent transactions for event logs).
3. [Select all data fields](/en/sdk/squid-sdk/reference/processors/evm-batch/field-selection) necessary for your task (e.g. `gasUsed` for transactions).
See [reference documentation](/en/sdk/squid-sdk/reference/processors/evm-batch) for more info and [processor configuration showcase](/en/sdk/squid-sdk/examples) for a representative set of examples.
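For illustration, here is a minimal sketch of such a configuration. The gateway URL, the `RPC_ETH_HTTP` variable, the USDT contract address and the generated `./abi/erc20` module are assumptions made for this example:
```ts theme={"system"}
import { EvmBatchProcessor } from '@subsquid/evm-processor'
import * as erc20abi from './abi/erc20'

const processor = new EvmBatchProcessor()
  // SQD Network gateway for a fast historical sync
  .setGateway('https://v2.archive.subsquid.io/network/ethereum-mainnet')
  // chain RPC for real-time (hot) blocks
  .setRpcEndpoint({ url: process.env.RPC_ETH_HTTP, rateLimit: 10 })
  .setFinalityConfirmation(75)
  // request Transfer event logs of one contract, plus their parent transactions
  .addLog({
    address: ['0xdAC17F958D2ee523a2206206994597C13D831ec7'],
    topic0: [erc20abi.events.Transfer.topic],
    transaction: true,
  })
  // select the extra data fields needed by the transformation code
  .setFields({ transaction: { gasUsed: true } })
```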
Edit the definition of `const processor` to
1. Use a data source appropriate for your chain and task
* [Use](/en/sdk/squid-sdk/reference/processors/substrate-batch/general#set-gateway) a [SQD Network gateway](/en/data/substrate) whenever it is available. [RPC](/en/sdk/squid-sdk/reference/processors/evm-batch/general#set-rpc-endpoint) is still required in this case.
* For networks without a gateway use just the RPC.
2. Request all [events](/en/sdk/squid-sdk/reference/processors/substrate-batch/data-requests#events) and [calls](/en/sdk/squid-sdk/reference/processors/substrate-batch/data-requests#calls) that your task requires, with any necessary related data (e.g. parent extrinsics).
3. If your squid indexes any of the following:
* an [ink! contract](/en/sdk/squid-sdk/resources/substrate/ink)
* an EVM contract running on the [Frontier EVM pallet](/en/sdk/squid-sdk/resources/substrate/frontier-evm)
* [Gear messages](/en/sdk/squid-sdk/resources/substrate/gear)
then you can use some of the [specialized data requesting methods](/en/sdk/squid-sdk/reference/processors/substrate-batch/data-requests#specialized-setters) to retrieve data more selectively.
4. [Select all data fields](/en/sdk/squid-sdk/reference/processors/substrate-batch/field-selection) necessary for your task (e.g. `fee` for extrinsics).
See [reference documentation](/en/sdk/squid-sdk/reference/processors/substrate-batch) for more info. Processor config examples can be found in the tutorials:
* [general Substrate](/en/sdk/squid-sdk/tutorials/substrate)
* [ink!](/en/sdk/squid-sdk/tutorials/ink)
* [Frontier EVM](/en/sdk/squid-sdk/tutorials/frontier-evm)
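For quick orientation, here is a minimal sketch along the same lines. The Kusama gateway URL, the `RPC_KUSAMA_WS` variable and the `Balances.Transfer` request are assumptions made for this example:
```ts theme={"system"}
import { SubstrateBatchProcessor } from '@subsquid/substrate-processor'

const processor = new SubstrateBatchProcessor()
  // SQD Network gateway for a fast historical sync
  .setGateway('https://v2.archive.subsquid.io/network/kusama')
  // chain RPC - still required even when a gateway is used
  .setRpcEndpoint({ url: process.env.RPC_KUSAMA_WS, rateLimit: 10 })
  // request Balances.Transfer events along with their parent extrinsics
  .addEvent({
    name: ['Balances.Transfer'],
    extrinsic: true,
  })
  // select the extra data fields needed by the transformation code
  .setFields({ extrinsic: { fee: true } })
```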
### III. Decode and normalize the data
Next, change the batch handler to decode and normalize your data.
In templates, the batch handler is defined at the [`processor.run()`](/en/sdk/squid-sdk/reference/processors/architecture#processorrun) call in `src/main.ts` as an inline function. Its sole argument `ctx` contains:
* at `ctx.blocks`: all the requested data for a batch of blocks
* at `ctx.store`: the means to save the processed data
* at `ctx.log`: a [`Logger`](/en/sdk/squid-sdk/reference/logger)
* at `ctx.isHead`: a boolean indicating whether the batch is at the current chain head
* at `ctx._chain`: the means to access RPC for [state calls](#external-data)
This structure ([reference](/en/sdk/squid-sdk/reference/processors/architecture#batch-context)) is common for all processors; the structure of `ctx.blocks` items varies.
Each item in `ctx.blocks` contains the data for the requested logs, transactions, traces and state diffs for a particular block, plus some info on the block itself. See [EVM batch context reference](/en/sdk/squid-sdk/reference/processors/evm-batch/context-interfaces).
Use the `.decode` methods from the [contract ABI utilities](#typegen) to decode events and transactions, e.g.
```ts theme={"system"}
import * as erc20abi from './abi/erc20'
processor.run(db, async ctx => {
for (let block of ctx.blocks) {
for (let log of block.logs) {
if (log.topics[0]===erc20abi.events.Transfer.topic) {
let {from, to, value} = erc20abi.events.Transfer.decode(log)
}
}
}
})
```
See also the [EVM data decoding](/en/sdk/squid-sdk/resources/tools/typegen/decoding).
Each item in `ctx.blocks` contains the data for the requested events, calls and, if requested, any related extrinsics; it also has some info on the block itself. See [Substrate batch context reference](/en/sdk/squid-sdk/reference/processors/substrate-batch/context-interfaces).
Use the `.is()` and `.decode()` functions to decode the data for each runtime version, e.g. like this:
```ts theme={"system"}
import {events} from './types'
processor.run(db, async ctx => {
for (let block of ctx.blocks) {
for (let event of block.events) {
if (event.name == events.balances.transfer.name) {
let rec: {from: string; to: string; amount: bigint}
if (events.balances.transfer.v1020.is(event)) {
let [from, to, amount] = events.balances.transfer.v1020.decode(event)
rec = {from, to, amount}
}
else if (events.balances.transfer.v1050.is(event)) {
let [from, to, amount] = events.balances.transfer.v1050.decode(event)
rec = {from, to, amount}
}
else if (events.balances.transfer.v9130.is(event)) {
rec = events.balances.transfer.v9130.decode(event)
}
else {
throw new Error('Unsupported spec')
}
}
}
}
})
```
See also the [Substrate data decoding](/en/sdk/squid-sdk/resources/tools/typegen/decoding).
You can also decode the data of certain pallet-specific events and transactions with specialized tools:
* use the utility classes made with `@subsquid/ink-typegen` to [decode events emitted by ink! contracts](/en/sdk/squid-sdk/resources/tools/typegen/decoding)
* use the [`@subsquid/frontier` utils](/en/sdk/squid-sdk/reference/frontier) and the [EVM typegen](/en/sdk/squid-sdk/resources/tools/typegen/decoding) to decode event logs and transactions of EVM contracts
### (Optional) IV. Mix in external data and chain state calls output
If you need external (i.e. non-blockchain) data in your transformation, take a look at the [External APIs and IPFS](/en/sdk/squid-sdk/resources/external-api) page.
If any of the on-chain data you need is unavailable from the processor or inconvenient to retrieve with it, you have the option to get it via [direct chain queries](/en/sdk/squid-sdk/resources/tools/typegen/state-queries).
### V. Prepare the store
At `src/main.ts`, change the [`Database`](/en/sdk/squid-sdk/resources/persisting-data/overview) object definition to accept your output data. The methods for saving data will be exposed by `ctx.store` within the [batch handler](/en/sdk/squid-sdk/reference/processors/architecture).
1. Define the schema of the database (and the [core schema of the OpenReader GraphQL API](/en/sdk/squid-sdk/reference/openreader-server/api) if it is used) at [`schema.graphql`](/en/sdk/squid-sdk/reference/schema-file).
2. Regenerate the TypeORM model classes with
```bash theme={"system"}
npx squid-typeorm-codegen
```
The classes will become available at `src/model`.
3. Compile the models code with
```bash theme={"system"}
npm run build
```
4. Ensure that the squid has access to a blank database. The easiest way to do so is to start PostgreSQL in a Docker container with
```bash theme={"system"}
docker compose up -d
```
If the container is running, stop it and erase the database with
```bash theme={"system"}
docker compose down
```
before issuing `docker compose up -d` again.
The alternative is to connect to an external database. See [this section](/en/sdk/squid-sdk/reference/store/typeorm#database-connection-parameters) to learn how to specify the connection parameters.
5. Regenerate a migration with
```bash theme={"system"}
rm -r db/migrations
```
```bash theme={"system"}
npx squid-typeorm-migration generate
```
You can now use the async functions [`ctx.store.upsert()`](/en/sdk/squid-sdk/reference/store/typeorm#upsert) and [`ctx.store.insert()`](/en/sdk/squid-sdk/reference/store/typeorm#insert), as well as various [TypeORM lookup methods](/en/sdk/squid-sdk/reference/store/typeorm#typeorm-methods) to access the database.
See the `typeorm-store` [guide](/en/sdk/squid-sdk/resources/persisting-data/typeorm) and [reference](/en/sdk/squid-sdk/reference/store/typeorm) for more info.
Filesystem dataset writing, as performed by the `@subsquid/file-store` package and its extensions, stores the data into one or more flat tables. The exact table definition format depends on the output file format.
1. Decide on the file format you're going to use:
* [Parquet](/en/sdk/squid-sdk/reference/store/file/parquet)
* [CSV](/en/sdk/squid-sdk/reference/store/file/csv)
* [JSON/JSONL](/en/sdk/squid-sdk/reference/store/file/json)
If your template does not have any of the necessary packages, install them.
2. Define any tables you need at the `tables` field of the `Database` constructor argument:
```ts theme={"system"}
import { Database } from '@subsquid/file-store'
const dbOptions = {
tables: {
FirstTable: new Table(/* ... */),
SecondTable: new Table(/* ... */),
// ...
},
// ...
}
processor.run(new Database(dbOptions), async ctx => { // ...
```
3. Define the destination filesystem via the `dest` field of the `Database` constructor argument. Options:
* local folder - use `LocalDest` from `@subsquid/file-store`
* S3-compatible file storage service - install `@subsquid/file-store-s3` and use [`S3Dest`](/en/sdk/squid-sdk/reference/store/file/s3-dest)
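Putting the `tables` and `dest` pieces together, here is a minimal sketch assuming the local-folder option; `Table` comes from the format-specific package chosen in step 1 and `processor` is the object configured earlier:
```ts theme={"system"}
import { Database, LocalDest } from '@subsquid/file-store'

const db = new Database({
  tables: {
    FirstTable: new Table(/* ... */),
  },
  // write the output files into a local ./data folder
  dest: new LocalDest('./data'),
  // flush buffered rows to disk once they reach roughly 20 MB
  chunkSizeMb: 20,
})

processor.run(db, async ctx => { /* ... */ })
```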
Once you're done, you'll be able to enqueue data rows for saving using the `write()` and `writeMany()` methods of the table objects exposed by the context store:
```ts theme={"system"}
ctx.store.FirstTable.writeMany(/* ... */)
ctx.store.SecondTable.write(/* ... */)
```
The store will write the files automatically as soon as the buffer reaches the size set by the `chunkSizeMb` field of the `Database` constructor argument, or at the end of the batch if a call to [`setForceFlush()`](/en/sdk/squid-sdk/resources/persisting-data/file#setforceflush) was made anywhere in the batch handler.
See the `file-store` [guide](/en/sdk/squid-sdk/resources/persisting-data/file) and the [reference pages of its extensions](/en/sdk/squid-sdk/reference/store/file).
Follow the [guide](/en/sdk/squid-sdk/resources/persisting-data/bigquery).
### VI. Persist the transformed data to your data sink
Once your data is [decoded](#batch-handler-decoding), optionally [enriched with external data](#external-data) and transformed the way you need it to be, it is time to save it.
For each batch, create all the instances of all TypeORM model classes at once, then save them with the minimal number of calls to `upsert()` or `insert()`, e.g.:
```ts theme={"system"}
import { TypeormDatabase } from '@subsquid/typeorm-store'
import { EntityA, EntityB } from './model'

processor.run(new TypeormDatabase(), async ctx => {
  const aEntities: Map<string, EntityA> = new Map() // id -> entity instance
  const bEntities: EntityB[] = []
  for (let block of ctx.blocks) {
    // fill the containers aEntities and bEntities
  }
  await ctx.store.upsert([...aEntities.values()])
  await ctx.store.insert(bEntities)
})
```
It will often make sense to keep the entity instances in maps rather than arrays to make it easier to reuse them when defining instances of other entities with [relations](/en/sdk/squid-sdk/reference/schema-file/entity-relations) to the previous ones. The process is described in more detail in the [step 2 of the BAYC tutorial](/en/sdk/squid-sdk/tutorials/bayc/step-two-deriving-owners-and-tokens).
If you perform any [database lookups](/en/sdk/squid-sdk/reference/store/typeorm#typeorm-methods), try to do so in batches and make sure that the entity fields that you're searching over are [indexed](/en/sdk/squid-sdk/reference/schema-file/indexes-and-constraints).
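For example, inside the batch handler you can collect all the ids that the batch will need first, then fetch the corresponding entities with a single query. A minimal sketch, with a hypothetical `Owner` entity:
```ts theme={"system"}
import { In } from 'typeorm'
import { TypeormDatabase } from '@subsquid/typeorm-store'
import { Owner } from './model'

processor.run(new TypeormDatabase(), async ctx => {
  // collect every Owner id the batch will touch...
  const ownerIds = new Set<string>()
  for (let block of ctx.blocks) {
    for (let log of block.logs) {
      // ...e.g. from decoded Transfer events
    }
  }
  // ...then retrieve them all with one indexed query instead of one query per item
  const owners: Map<string, Owner> = new Map(
    (await ctx.store.findBy(Owner, { id: In([...ownerIds]) })).map(o => [o.id, o])
  )
})
```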
See also the [patterns](/en/sdk/squid-sdk/resources/batch-processing#patterns) and [anti-patterns](/en/sdk/squid-sdk/resources/batch-processing#anti-patterns) sections of the Batch processing guide.
You can enqueue the transformed data for writing whenever convenient without any sizeable impact on performance.
At low output data rates (e.g. if your entire dataset is in tens of Mbytes or under) take care to call [`ctx.store.setForceFlush()`](/en/sdk/squid-sdk/resources/persisting-data/file#setforceflush) when appropriate to make sure your data actually gets written.
You can enqueue the transformed data for writing whenever convenient without any sizeable impact on performance. The actual data writing will happen automatically at the end of each batch.
## The top-down development cycle
The [bottom-up development cycle](#bottom-up-development) described above is convenient for initial squid development and for trying out new things, but it has a disadvantage: the means of saving the data are not yet available when you first write the data decoding/transformation code. That makes it necessary to come back to that code later, which is somewhat inconvenient, e.g. when adding new squid features incrementally.
The alternative is to do the same steps in a different order:
1. [Update the store](#store)
2. If necessary, [regenerate the utility classes](#typegen)
3. [Update the processor configuration](#processor-config)
4. [Decode and normalize the added data](#batch-handler-decoding)
5. [Retrieve any external data](#external-data) if necessary
6. [Add the persistence code for the transformed data](#batch-handler-persistence)
## GraphQL options
[Store your data to PostgreSQL](/en/sdk/squid-sdk/resources/persisting-data/typeorm), then consult [Serving GraphQL](/en/sdk/squid-sdk/resources/serving-graphql) for options.
## Scaling up
If you're developing a large squid, make sure to use [batch processing](/en/sdk/squid-sdk/resources/batch-processing) throughout your code.
A common mistake is to write handlers for individual event logs or transactions: for updates that require data retrieval, this results in lots of small database lookups and ultimately in poor syncing performance. Instead, collect all the relevant data and process it at once. A simple architecture of that type is discussed in the [BAYC tutorial](/en/sdk/squid-sdk/tutorials/bayc).
You should also check the [Cloud best practices page](/en/cloud/resources/best-practices) even if you're not planning to deploy to [SQD Cloud](/en/cloud) - it contains valuable performance-related tips.
Many issues commonly arising when developing larger squids are addressed by the third party [`@belopash/typeorm-store` package](https://github.com/belopash/squid-typeorm-store). Consider using it.
For complete examples of complex squids take a look at the [Giant Squid Explorer](https://github.com/subsquid-labs/giant-squid-explorer) and [Thena Squid](https://github.com/subsquid-labs/thena-squid) repos.
## Next steps
* Learn about [batch processing](/en/sdk/squid-sdk/resources/batch-processing).
* Learn how squids deal with [unfinalized blocks](/en/sdk/squid-sdk/resources/unfinalized-blocks).
* [Use external APIs and IPFS](/en/sdk/squid-sdk/resources/external-api) in your squid.
* See how squids should be set up for the [multichain setting](/en/sdk/squid-sdk/resources/multichain).
* Deploy your squid [on your own infrastructure](/en/sdk/squid-sdk/resources/self-hosting) or to [SQD Cloud](/en/cloud).
# Indexer from scratch
Source: https://docs.sqd.dev/en/sdk/squid-sdk/how-to-start/squid-from-scratch
Build a Squid SDK indexer from scratch — learn how to compose evm-processor, typeorm-store, and graphql-server NPM packages without a starter template.
Here's an example of how SDK packages can be combined into a working indexer (called a *squid*).
This page goes through all the technical details to make the squid architecture easier to understand. If you would like to get to a working indexer ASAP, [bootstrap from a template](/en/sdk/squid-sdk/how-to-start/squid-development#templates).
## USDT transfers API
**Pre-requisites**: NodeJS 20.x or newer, Docker.
Suppose the task is to track transfers of USDT on Ethereum, then save the resulting data to PostgreSQL and serve it as a GraphQL API. From this description we can immediately put together a list of [packages](/en/sdk/squid-sdk/overview):
* `@subsquid/evm-processor` - for retrieving Ethereum data
* the triad of `@subsquid/typeorm-store`, `@subsquid/typeorm-codegen` and `@subsquid/typeorm-migration` - for saving data to PostgreSQL
We also assume the following choice of *optional* packages:
* `@subsquid/evm-typegen` - for decoding Ethereum data and useful constants such as event topic0 values
* `@subsquid/evm-abi` - as a peer dependency for the code generated by `@subsquid/evm-typegen`
* `@subsquid/graphql-server` / [OpenReader](/en/sdk/squid-sdk/reference/openreader-server)
To make the indexer, follow these steps:
* create `package.json`
```bash theme={"system"}
npm init
```
* add `.gitignore`
```bash title=".gitignore" theme={"system"}
node_modules
lib
```
```bash theme={"system"}
npm i dotenv typeorm @subsquid/evm-processor @subsquid/typeorm-store @subsquid/typeorm-migration @subsquid/graphql-server @subsquid/evm-abi
```
```bash theme={"system"}
npm i typescript @subsquid/typeorm-codegen @subsquid/evm-typegen --save-dev
```
```json title="tsconfig.json" theme={"system"}
{
"compilerOptions": {
"rootDir": "src",
"outDir": "lib",
"module": "commonjs",
"target": "es2020",
"esModuleInterop": true,
"skipLibCheck": true,
"experimentalDecorators": true,
"emitDecoratorMetadata": true
}
}
```
Define the schema for both the database and the core GraphQL API in [`schema.graphql`](/en/sdk/squid-sdk/reference/schema-file):
```graphql title="schema.graphql" theme={"system"}
type Transfer @entity {
id: ID!
from: String! @index
to: String! @index
value: BigInt!
}
```
```bash theme={"system"}
npx squid-typeorm-codegen
```
The TypeORM classes are now available at `src/model/index.ts`.
* create `.env` and `docker-compose.yaml` files
```bash title=".env" theme={"system"}
DB_NAME=squid
DB_PORT=23798
RPC_ETH_HTTP=https://rpc.ankr.com/eth
```
```yaml title="docker-compose.yaml" theme={"system"}
services:
db:
image: postgres:15
environment:
POSTGRES_DB: "${DB_NAME}"
POSTGRES_PASSWORD: postgres
ports:
- "${DB_PORT}:5432"
```
* start the database container
```bash theme={"system"}
docker compose up -d
```
* compile the TypeORM classes
```bash theme={"system"}
npx tsc
```
* generate the migration file
```bash theme={"system"}
npx squid-typeorm-migration generate
```
* apply the migration with
```bash theme={"system"}
npx squid-typeorm-migration apply
```
[Generate utility classes](/en/sdk/squid-sdk/resources/tools/typegen/generation) for decoding [USDT contract](https://etherscan.io/address/0xdac17f958d2ee523a2206206994597c13d831ec7) data based on its ABI.
* Create an `./abi` folder:
```bash theme={"system"}
mkdir abi
```
* Find the ABI at the ["Contract" tab of the contract page on Etherscan](https://etherscan.io/address/0xdAC17F958D2ee523a2206206994597C13D831ec7#code). Scroll down a bit to find the "Contract ABI" section.
* Copy the ABI, then paste to a new file at `./abi/usdt.json`.
* Run the utility classes generator:
```bash theme={"system"}
npx squid-evm-typegen src/abi ./abi/*
```
The utility classes are now available at `src/abi/usdt.ts`
Tie all the generated code together with a `src/main.ts` executable with the following code blocks:
* Imports
```ts theme={"system"}
import { EvmBatchProcessor } from '@subsquid/evm-processor'
import { TypeormDatabase } from '@subsquid/typeorm-store'
import * as usdtAbi from './abi/usdt'
import { Transfer } from './model'
```
* [`EvmBatchProcessor`](/en/sdk/squid-sdk/reference/processors/evm-batch) object definition
```ts theme={"system"}
const processor = new EvmBatchProcessor()
.setGateway('https://v2.archive.subsquid.io/network/ethereum-mainnet')
.setRpcEndpoint({
url: process.env.RPC_ETH_HTTP,
rateLimit: 10
})
.setFinalityConfirmation(75) // 15 mins to finality
.addLog({
address: [ '0xdAC17F958D2ee523a2206206994597C13D831ec7' ],
topic0: [ usdtAbi.events.Transfer.topic ]
})
```
* [`TypeormDatabase`](/en/sdk/squid-sdk/reference/store/typeorm) object definition
```ts theme={"system"}
const db = new TypeormDatabase()
```
* A call to [`processor.run()`](/en/sdk/squid-sdk/reference/processors/architecture#processorrun) with an inline definition of the [batch handler](/en/sdk/squid-sdk/reference/processors/architecture#batch-context)
```ts theme={"system"}
processor.run(db, async ctx => {
const transfers: Transfer[] = []
for (let block of ctx.blocks) {
for (let log of block.logs) {
let {from, to, value} = usdtAbi.events.Transfer.decode(log)
transfers.push(new Transfer({
id: log.id,
from, to, value
}))
}
}
await ctx.store.insert(transfers)
})
```
Note how supplying a `TypeormDatabase` to the function caused `ctx.store` to be a [PostgreSQL-compatible `Store` object](/en/sdk/squid-sdk/reference/store/typeorm#store-interface).
Compile the project and start the [processor process](/en/sdk/squid-sdk/overview#processor)
```bash theme={"system"}
npx tsc
```
```bash theme={"system"}
node -r dotenv/config lib/main.js
```
In a separate terminal, configure the GraphQL port and start the GraphQL server:
```diff title=".env" theme={"system"}
DB_NAME=squid
DB_PORT=23798
RPC_ETH_HTTP=https://rpc.ankr.com/eth
+GRAPHQL_SERVER_PORT=4350
```
```bash theme={"system"}
npx squid-graphql-server
```
The finished GraphQL API with GraphiQL is available at [localhost:4350/graphql](http://localhost:4350/graphql).
Final code for this mini-tutorial is available in [this repo](https://github.com/subsquid-labs/squid-from-scratch).
The commands listed here are often abbreviated as [custom `sqd` commands](/en/sdk/squid-sdk/squid-cli/commands-json) in squids. If you'd like to do that too you can use the [`commands.json` file of the EVM template](https://github.com/subsquid-labs/squid-evm-template/blob/main/commands.json) as a starter.
# Squid SDK Overview
Source: https://docs.sqd.dev/en/sdk/squid-sdk/overview
Squid SDK architecture overview — processor, batch handlers, schema, and Store roles in building TypeScript blockchain indexers with GraphQL APIs.
A *squid* is an indexing project built with [Squid SDK](https://github.com/subsquid/squid-sdk) to retrieve and process blockchain data from the [SQD Network](/en/network/overview)
(either a permissioned or a decentralized instance). The Squid SDK is a set of open source TypeScript libraries that retrieve, decode, transform and persist the data. It can also make the transformed data available via an API. All stages of the indexing pipeline, from data extraction to transformation to persistence, are performed on [batches of blocks](/en/sdk/squid-sdk/resources/batch-processing) to maximize indexing speed. The modular architecture of the SDK makes it possible to extend indexing projects (squids) with custom plugins and data targets.
## Required squid components
### Processor
*Processor* is the word used for
1. The main NodeJS process of the squid.
2. The main object (`processor`) of this process: its method call `processor.run()` is the entry point.
`processor` objects handle data retrieval and transformation; data persistence is handled by a separate object called [Store](#store). Squid SDK offers two processor classes:
* [`EvmBatchProcessor`](/en/sdk/squid-sdk/reference/processors/evm-batch) via the [`@subsquid/evm-processor`](https://www.npmjs.com/package/@subsquid/evm-processor) NPM package - for Ethereum-compatible [networks](/en/data/evm)
* [`SubstrateBatchProcessor`](/en/sdk/squid-sdk/reference/processors/substrate-batch) via [`@subsquid/substrate-processor`](https://www.npmjs.com/package/@subsquid/substrate-processor) - for [networks](/en/data/substrate) based on [Substrate](https://substrate.io) such as [Polkadot](https://polkadot.network)
### Store
A *store* is an object that processors use to persist their data. SQD offers three store classes:
* [`TypeormStore`](/en/sdk/squid-sdk/resources/persisting-data/typeorm) for saving data to PostgreSQL, via
* [`@subsquid/typeorm-store`](https://www.npmjs.com/package/@subsquid/typeorm-store)
* [`@subsquid/typeorm-codegen`](https://www.npmjs.com/package/@subsquid/typeorm-codegen) (a code generator, install with `--save-dev`)
* [`@subsquid/typeorm-migration`](https://www.npmjs.com/package/@subsquid/typeorm-migration)
Install all three packages to use this store.
* [`file-store`](/en/sdk/squid-sdk/resources/persisting-data/file) via [`@subsquid/file-store`](https://www.npmjs.com/package/@subsquid/file-store) - for saving data to filesystems. It is a modular system with a variety of [extensions](/en/sdk/squid-sdk/reference/store/file) for various formats and destinations.
* [`bigquery-store`](/en/sdk/squid-sdk/resources/persisting-data/bigquery) via [`@subsquid/bigquery-store`](https://www.npmjs.com/package/@subsquid/bigquery-store) - for saving data to [Google BigQuery](https://cloud.google.com/bigquery).
You can mix and match any store class with any processor class.
## Optional squid components
### Typegen
A *typegen* is a tool for generating utility code for technology-specific operations such as decoding. Here are the typegens available:
* [`squid-evm-typegen`](/en/sdk/squid-sdk/resources/tools/typegen/generation) via [`@subsquid/evm-typegen`](https://www.npmjs.com/package/@subsquid/evm-typegen):
* decodes smart contract data
* handles direct calls to contract methods
* exposes useful constants such as event topics and function signature hashes
The generated code depends on [`@subsquid/evm-abi`](https://www.npmjs.com/package/@subsquid/evm-abi), SQD's own high performance, open source EVM codec.
* [`squid-substrate-typegen`](/en/sdk/squid-sdk/resources/tools/typegen/generation) via [`@subsquid/substrate-typegen`](https://www.npmjs.com/package/@subsquid/substrate-typegen):
* general purpose pallet data decoding (aware of runtime versions)
* handles direct storage queries
* [`squid-ink-typegen`](/en/sdk/squid-sdk/resources/tools/typegen/generation) via [`@subsquid/ink-typegen`](https://www.npmjs.com/package/@subsquid/ink-typegen) for decoding the data of [ink!](https://use.ink) contracts
Install these with `--save-dev`.
### GraphQL server
Squids that store their data in PostgreSQL can subsequently make it available as a GraphQL API via a variety of supported servers. See [Serving GraphQL](/en/sdk/squid-sdk/resources/serving-graphql).
Among other alternatives, SQD provides its own server called [OpenReader](/en/sdk/squid-sdk/reference/openreader-server) via the [`@subsquid/graphql-server`](https://www.npmjs.com/package/@subsquid/graphql-server) package. The server runs as a separate process. [Core API](/en/sdk/squid-sdk/reference/openreader-server/api) is automatically derived from the schema file; it is possible to extend it with [custom queries](/en/sdk/squid-sdk/reference/openreader-server/configuration/custom-resolvers) and [basic access control](/en/sdk/squid-sdk/reference/openreader-server/configuration/authorization).
### Misc utilities
* [Squid CLI](/en/sdk/squid-sdk/squid-cli/installation) is a utility for [retrieving squid templates](/en/sdk/squid-sdk/squid-cli/init), managing [chains of commands](https://github.com/subsquid/squid-sdk/tree/master/util/commands) commonly used in development and running [all squid processes at once](/en/sdk/squid-sdk/squid-cli/run). It can also be used for deploying to [SQD Cloud](/en/cloud).
* [`@subsquid/ss58`](https://www.npmjs.com/package/@subsquid/ss58) handles encoding and decoding of SS58 addresses
* [`@subsquid/frontier`](https://www.npmjs.com/package/@subsquid/frontier) decodes events and calls of the [Frontier EVM pallet](https://paritytech.github.io/frontier/frame/evm.html) to make them decodable with [`squid-evm-typegen`](/en/sdk/squid-sdk/resources/tools/typegen/generation)
# Squid SDK Quickstart
Source: https://docs.sqd.dev/en/sdk/squid-sdk/quickstart
5-minute Squid SDK quickstart — build a TypeScript indexer that streams USDC transfers from Ethereum into local Postgres for GraphQL queries.
This is a 5 min quickstart on how to build an indexer using Squid SDK.
The indexer (squid) will:
* Fetch all historical USDC transfers on Ethereum from the SQD Network
* Decode the transfer data
* Save it to a local Postgres database
* Start a GraphQL server with a rich API to query the historical USDC transfers.
## Prerequisites
* (On Windows) [WSL](https://learn.microsoft.com/en-us/windows/wsl/install)
* Node.js v18+
* Git
* Docker (for running Postgres)
Install Squid CLI:
```bash theme={"system"}
npm i -g @subsquid/cli
```
Squid CLI is a multi-purpose utility tool for scaffolding and managing indexers, both locally and in SQD Cloud.
Scaffold the indexer project (squid) from an example repo using Squid CLI:
```bash theme={"system"}
sqd init hello-squid -t https://github.com/subsquid-labs/showcase01-all-usdc-transfers
cd hello-squid
```
The example is just a public GitHub repo with a squid project pre-configured to index USDC.
Inspect the `./src` folder:
* `src/abi/usdc` is a utility module generated from the JSON ABI of the USDC contract. It contains methods for event decoding, direct RPC queries and some useful constants.
* `src/model` contains TypeORM model classes autogenerated from `schema.graphql`. Squids use them to populate Postgres.
`main.ts` is the main executable. In this example, it also contains all the data retrieval configuration:
```ts theme={"system"}
const processor = new EvmBatchProcessor()
// SQD Network gateways are the primary source of blockchain data in
// squids, providing pre-filtered data in chunks of roughly 1-10k blocks.
// Set this for a fast sync.
.setGateway('https://v2.archive.subsquid.io/network/ethereum-mainnet')
// Another data source squid processors can use is chain RPC.
// In this particular squid it is used to retrieve the very latest chain data
// (including unfinalized blocks) in real time. It can also be used to
// - make direct RPC queries to get extra data during indexing
// - sync a squid without a gateway (slow)
.setRpcEndpoint('https://rpc.ankr.com/eth')
// The processor needs to know how many newest blocks it should mark as "hot".
// If it detects a blockchain fork, it will roll back any changes to the
// database made due to orphaned blocks, then re-run the processing for the
// main chain blocks.
.setFinalityConfirmation(75)
// .addXXX() methods request data items. In this case we're asking for
// Transfer(address,address,uint256) event logs emitted by the USDC contract.
//
// We could have omitted the "address" filter to get Transfer events from
// all contracts, or the "topic0" filter to get all events from the USDC
// contract, or both to get all event logs chainwide. We also could have
// requested some related data, such as the parent transaction or its traces.
//
// Other .addXXX() methods (.addTransaction(), .addTrace(), .addStateDiff()
// on EVM) are similarly feature-rich.
.addLog({
range: { from: 6_082_465 },
address: [USDC_CONTRACT_ADDRESS],
topic0: [usdcAbi.events.Transfer.topic],
})
// .setFields() is for choosing data fields for the selected data items.
// Here we're requesting hashes of parent transaction for all event logs.
.setFields({
log: {
transactionHash: true,
},
})
```
The rest of the file is about data processing and storage:
```ts theme={"system"}
// TypeormDatabase objects store the data to Postgres. They are capable of
// handling the rollbacks that occur due to blockchain forks.
//
// There are also Database classes for storing data to files and BigQuery
// datasets.
const db = new TypeormDatabase({supportHotBlocks: true})
// The processor.run() call executes the data processing. Its second argument is
// the handler function that is executed once on each batch of data. Processor
// object provides the data via "ctx.blocks". However, the handler can contain
// arbitrary TypeScript code, so it's OK to bring in extra data from IPFS,
// direct RPC calls, external APIs etc.
processor.run(db, async (ctx) => {
// Create a container to hold the objects that will become rows of the
// usdc_transfer database table while processing the batch. We'll insert them
// all at once at the end, massively saving IO bandwidth.
const transfers: UsdcTransfer[] = []
// The data retrieved from the SQD Network gateway and/or the RPC endpoint
// is supplied via ctx.blocks
for (let block of ctx.blocks) {
// On EVM, each block has four iterables - logs, transactions, traces,
// stateDiffs
for (let log of block.logs) {
if (log.address === USDC_CONTRACT_ADDRESS &&
log.topics[0] === usdcAbi.events.Transfer.topic) {
// SQD's very own EVM codec at work - about 20 times faster than ethers
let {from, to, value} = usdcAbi.events.Transfer.decode(log)
transfers.push(new UsdcTransfer({
id: log.id,
block: block.header.height,
from,
to,
value,
txnHash: log.transactionHash
}))
}
}
}
// Just one insert per batch!
await ctx.store.insert(transfers)
})
```
Install the dependencies and build the squid:
```bash theme={"system"}
npm i
npm run build
```
The processor is a background process that continuously fetches the data, decodes it and stores it in a local Postgres. All the logic is defined in `main.ts` and is fully customizable.
To run the processor, we first start a local Postgres where the decoded data is persisted (the template comes with a Docker compose file):
```bash theme={"system"}
docker compose up -d
```
The processor will connect to Postgres using the connection parameters from `.env`.
Apply database migrations with
```bash theme={"system"}
npx squid-typeorm-migration apply
```
then start the processor with
```bash theme={"system"}
node -r dotenv/config lib/main.js
```
The indexer is now running.
Start the GraphQL API serving the transfers data from Postgres:
```bash theme={"system"}
npx squid-graphql-server
```
The server comes with a GraphQL playground available at [`localhost:4350/graphql`](http://localhost:4350/graphql).
Query the data!
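If you prefer querying from code rather than the playground, here is a minimal sketch (not part of the template) that fetches the latest transfers over HTTP with the built-in `fetch` of Node.js 18+. The `usdcTransfers` query name and the requested fields are assumptions derived from the `UsdcTransfer` model used above:
```ts theme={"system"}
// Minimal sketch: query the local GraphQL API from Node.js 18+ (built-in fetch).
// Assumes the server runs on the default port 4350 and that schema.graphql
// defines a UsdcTransfer entity, so OpenReader exposes a `usdcTransfers` query.
const query = `
  query {
    usdcTransfers(limit: 5, orderBy: block_DESC) {
      id
      from
      to
      value
      txnHash
    }
  }
`

const response = await fetch('http://localhost:4350/graphql', {
  method: 'POST',
  headers: { 'content-type': 'application/json' },
  body: JSON.stringify({ query })
})

const { data } = await response.json()
console.log(data.usdcTransfers)
```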
# AND/OR filters
Source: https://docs.sqd.dev/en/sdk/squid-sdk/reference/openreader-server/api/and-or-filters
Combine GraphQL query filters with AND/OR operators in OpenReader — basic logic combinators for refining entity queries served by Squid SDK.
## Overview
Our GraphQL implementation offers a broad selection of tools for filtering and refining results. One of these is the `where` clause, very common in most database query languages and [explained here](/en/sdk/squid-sdk/reference/openreader-server/api/queries#filter-query-results--search-queries) in detail.
Our GraphQL server implementation adds logical operators to the `where` clause: multiple parameters can be grouped within the same `where` argument using the `AND` and `OR` operators to filter results based on more than one criterion.
Note that the [newer](/en/sdk/squid-sdk/reference/openreader-server/overview#supported-queries) and [more advanced](/en/sdk/squid-sdk/reference/openreader-server/api/paginate-query-results) `{entityName}sConnection` queries support exactly the same format of the `where` argument as the older `{entityName}s` queries used in the examples provided here.
### Example of an `OR` clause:
Fetch a list of `accounts` that either have a balance bigger than a certain amount, or have a specific id.
```graphql theme={"system"}
query {
accounts(
orderBy: balance_DESC,
where: {
OR: [
{balance_gte: "240000000000000000"}
{id_eq: "CksmaBx9rKUG9a7eXwc5c965cJ3QiiC8ELFsLtJMYZYuRWs"}
]
}
) {
balance
id
}
}
```
### Example of `AND` clause:
Fetch a list of `accounts` that have a balance between two specific amounts:
```graphql theme={"system"}
query {
accounts(
orderBy: balance_DESC,
where: {
AND: [
{balance_lte: "240000000000000000"}
{balance_gte: "100000000000000"}
]
}
) {
balance
id
}
}
```
# Cross-relation queries
Source: https://docs.sqd.dev/en/sdk/squid-sdk/reference/openreader-server/api/cross-relation-field-queries
Filter Squid SDK GraphQL queries by fields of related entities — cross-relation predicates in OpenReader for traversing nested data in queries.
# Cross-relation field queries
## Introduction
The [previous section](/en/sdk/squid-sdk/reference/openreader-server/api/nested-field-queries) has already demonstrated that queries can return not just scalars such as a String, but also fields that refer to object or entity types. What's even more interesting is that queries can leverage fields of related objects to filter results.
Let's take this sample schema with two entity types and a one-to-many relationship between them:
```graphql title="schema.graphql" theme={"system"}
type Account @entity {
id: ID!
wallet: String!
balance: Int!
historicalBalances: [HistoricalBalance!] @derivedFrom(field: "account")
}
type HistoricalBalance @entity {
"Unique identifier"
id: ID!
"Related account"
account: Account!
"Balance"
balance: Int!
}
```
With the functionality offered by cross-relation field queries, we could ask for `Account`s that have at least some `historicalBalance`s with a `balance` smaller than a certain threshold:
```graphql theme={"system"}
query MyQuery {
accounts(where: {historicalBalances_some: {balance_lt: "10000000000"}}) {
id
}
}
```
This makes it possible to query based not just on the entity itself but on related entities as well, which is a very powerful feature.
`*_some` is not the only operator available for making cross-relation field queries. A short description of each such operator is provided in the sections below.
## The `*_every` filter
Returns entities for which **all** of the nested entities linked via the related field satisfy the condition. Example:
```graphql theme={"system"}
query MyQuery {
accounts(where: {historicalBalances_every: {balance_lt: "10000000000"}}) {
id
}
}
```
This query will return all `Account`s where **each and every one** of the `HistoricalBalance` entities related to them have a `balance` smaller than the threshold. It is sufficient for a single `HistoricalBalance` to have a `balance` larger than the set value to make sure that the related `Account` is not returned in the query.
## The `*_none` filter
Returns entities for which **none** of the nested entities linked via the related field satisfy the condition. Example:
```graphql theme={"system"}
query MyQuery {
accounts(where: {historicalBalances_none: {balance_lt: "10000000000"}}) {
id
}
}
```
The query will return all `Account`s in which not a single related `HistoricalBalance` has a `balance` smaller than the set threshold.
## The `*_some` filter
Returns entities for which **at least one** of the nested entities linked via the related field satisfies the condition. Example:
```graphql theme={"system"}
query MyQuery {
accounts(where: {historicalBalances_some: {balance_lt: "10000000000"}}) {
id
}
}
```
All `Account`s that have at least some `historicalBalance`s with a `balance` smaller than `10000000000` will be returned. This means that a single `HistoricalBalance` satisfying the condition is sufficient for the related `Account` to become a part of the results.
## `{entityName}sConnection` queries
As always, the `where` argument works for these queries in exactly the same way as it does for the `{entityName}s` queries used in the examples above. For example, this query
```graphql theme={"system"}
query MyQuery {
accountsConnection(orderBy: id_ASC, where: {historicalBalances_some: {balance_lt: "10000000000"}}) {
edges {
node {
id
}
}
}
}
```
will return (in an appropriately shaped response) IDs for all `Accounts` that have at least some `historicalBalance`s with a `balance` smaller than `10000000000`.
# OpenReader GraphQL Intro
Source: https://docs.sqd.dev/en/sdk/squid-sdk/reference/openreader-server/api/intro
Introduction to GraphQL queries served by OpenReader, the Squid SDK GraphQL server — entity queries, filters, sorting, and pagination basics.
At the moment, [Squid SDK GraphQL server](/en/sdk/squid-sdk/reference/openreader-server) can only be used with squids that use Postgres as their target database.
GraphQL is an API query language, and a server-side runtime for executing queries using a custom type system. Head over to the [official documentation website](https://graphql.org/learn/) for more info.
A GraphQL API served by the [GraphQL server](/en/sdk/squid-sdk/reference/openreader-server) has two components:
1. Core API is defined by the [schema file](/en/sdk/squid-sdk/reference/schema-file).
2. Extensions added via [custom resolvers](/en/sdk/squid-sdk/reference/openreader-server/configuration/custom-resolvers).
In this section we cover the core GraphQL API, with short explanations on how to perform GraphQL queries, how to paginate and sort results. This functionality is supported via [OpenReader](https://github.com/subsquid/squid-sdk/tree/master/graphql/openreader), SQD's own implementation of [OpenCRUD](https://www.opencrud.org).
# JSON queries
Source: https://docs.sqd.dev/en/sdk/squid-sdk/reference/openreader-server/api/json-queries
Query entities with object-typed and JSON fields in OpenReader — Squid SDK GraphQL syntax for filtering and selecting nested JSON content.
The possibility of defining JSON objects as fields of a type in a GraphQL schema has been explained in the [schema reference](/en/sdk/squid-sdk/reference/schema-file).
This guide focuses on how to query such objects and how to fully leverage their potential. Let's take the example of this (non-crypto related, for once😁) schema:
```graphql title="schema.graphql" theme={"system"}
type Entity @entity {
id: ID!
a: A
}
type A {
a: String
b: B
}
type B {
a: A
b: String
e: Entity
}
```
It's composed of one entity and two JSON object definitions, used in a "nested" way.
Let's now look at a simple query:
```graphql theme={"system"}
query {
entities(orderBy: id_ASC) {
id
a { a }
}
}
```
This will return a result such as this one (imagining this data exists in the database):
```graphql theme={"system"}
{
entities: [
{id: '1', a: {a: 'a'}},
{id: '2', a: {a: 'A'}},
{id: '3', a: {a: null}},
{id: '4', a: null}
]
}
```
Simply enough, the first two objects have an object of type `A` with some content inside, the third one has an object, but its `a` field is `null` and the fourth one simply does not have an `A` object at all.
# Nested field queries
Source: https://docs.sqd.dev/en/sdk/squid-sdk/reference/openreader-server/api/nested-field-queries
Query Squid SDK entities related to other entities via nested fields in OpenReader — traverse one-to-many and many-to-many relations in GraphQL.
With OpenReader, fields of an Entity that contain fields themselves are shown as nested fields and it is possible to filter these as well. GraphQL queries can traverse related objects and their fields, letting clients fetch lots of related data in one request, instead of making several roundtrips as one would need in a classic REST architecture.
As an example, this query searches for all `accounts` whose balance is bigger than a threshold value, fetching the `id` and `balance` simple fields, as well as the `historicalBalances` **nested field**.
```graphql theme={"system"}
query {
accounts(orderBy: balance_ASC, where: {balance_gte: "250000000000000000"}) {
id
balance
historicalBalances {
balance
date
id
}
}
}
```
A nested field is a list (one account can have multiple `historicalBalances`) of objects with fields of their own. These objects can be filtered, too.
In the following query the `historicalBalances` are filtered in order to only return the balances created after a certain date:
```graphql theme={"system"}
query {
accounts(orderBy: balance_ASC, where: {balance_gte: "250000000000000000"}) {
id
balance
historicalBalances(where: {date_lte: "2020-10-31T11:59:59.000Z"}, orderBy: balance_DESC) {
balance
date
id
}
}
}
```
Note that the [newer](/en/sdk/squid-sdk/reference/openreader-server/overview#supported-queries) and [more advanced](/en/sdk/squid-sdk/reference/openreader-server/api/paginate-query-results) `{entityName}sConnection` queries support exactly the same format of the `where` argument as the older `{entityName}s` queries used in the examples provided here.
# OpenReader Pagination
Source: https://docs.sqd.dev/en/sdk/squid-sdk/reference/openreader-server/api/paginate-query-results
Paginate large GraphQL query results in OpenReader — first/after, last/before cursor pagination, and offset-based syntax for Squid SDK APIs.
# Paginate query results
There are multiple ways to obtain this behavior, let's take a look at a couple of them.
## Cursor based pagination
Cursors are used to traverse across entities of an entity set. They work by returning a pointer ("cursor") to a specific entity which can then be used to fetch the next batch. The batch will start with the entity after the one the cursor points to. For cursor-based pagination, OpenReader follows the [Relay Cursor Connections spec](https://relay.dev/graphql/connections.htm).
Currently, only forward pagination is supported. If your use case requires bidirectional pagination please let us know at our [Telegram channel](https://t.me/HydraDevs).
In SQD GraphQL server, cursor based pagination is implemented with `{entityName}sConnection` queries available for every entity in the input schema. These queries require an explicitly supplied [`orderBy` argument](/en/sdk/squid-sdk/reference/openreader-server/api/sorting), and *the field that is used for ordering must also be requested by the query itself*. Check out [this section](/en/sdk/squid-sdk/reference/openreader-server/api/paginate-query-results#important-note-on-orderby) for a valid query template.
Example: this query fetches a list of videos where `isExplicit` is true and gets their count.
```graphql theme={"system"}
query {
videosConnection(orderBy: id_ASC, where: { isExplicit_eq: true }) {
totalCount
edges {
node {
id
title
}
}
}
}
```
### **Operator `first`**
The `first` operator is used to fetch a specified number of entities from the beginning of the output.
Example: Fetch the first 5 videos:
```graphql theme={"system"}
query Query1 {
videosConnection(orderBy: id_ASC, first: 5) {
edges {
node {
id
title
}
}
}
}
```
### **PageInfo object**
`PageInfo` is a "virtual" entity that can be requested from any `{entityName}sConnection` query (see below). It returns the relevant cursors and some page information:
```graphql theme={"system"}
pageInfo {
startCursor
endCursor
hasNextPage
hasPreviousPage
}
```
### **Operator `after`**
Example: Fetch the first 10 channels, ordered by `createdAt`. Then, in a second query, fetch the next 10 channels:
```graphql theme={"system"}
query FirstBatchQ {
channelsConnection(first: 10, orderBy: createdAt_ASC) {
pageInfo {
endCursor
hasNextPage
}
edges {
node {
id
handle
createdAt
}
}
}
}
query SecondBatchQ {
channelsConnection(first: 10, after: "<endCursor value from the previous query>", orderBy: createdAt_ASC) {
pageInfo {
endCursor
hasNextPage
}
edges {
node {
id
handle
createdAt
}
}
}
}
```
### **Important Note on `orderBy`**
The field chosen for `orderBy` needs to be requested in the query itself. For example, any `after` query must follow this template:
```graphql theme={"system"}
query QueryName {
{entityName}sConnection(after: <cursor value>, orderBy: <field name>_ASC) {
pageInfo {
endCursor
hasNextPage
......
}
edges {
node {
......
}
}
}
}
```
Otherwise, the returned result wouldn't be ordered correctly.
### Examples
An interactive example of using cursor-based pagination can be found in [this repo](https://github.com/subsquid-labs/cursor-pagination-client-example).
## Paginating with `{entityName}s` queries
### Arguments `limit` and `offset`
In a list of entities returned by a query, the `limit` argument specifies how many should be retained, while the `offset` argument specifies how many should be skipped first. Default values are `50` for `limit` and `0` for `offset`.
### **Limit results**
Example: Fetch the first 5 channels:
```graphql theme={"system"}
query {
channels(limit: 5) {
id
handle
}
}
```
### **Limit results from an offset**
Example: Fetch 5 channels from the list of all channels, starting with the 6th one:
```graphql theme={"system"}
query {
channels(limit: 5, offset: 5) {
id
handle
}
}
```
# Entity queries
Source: https://docs.sqd.dev/en/sdk/squid-sdk/reference/openreader-server/api/queries
Run basic GraphQL entity queries against a Squid SDK GraphQL API — single-entity, list, and connection-style queries with filtering and pagination.
# Entity Queries
## Introduction
OpenReader auto-generates queries from the `schema.graphql` file. All entities defined in the schema can be queried over the GraphQL endpoint.
## Exploring queries
Let’s take a look at the different queries you can run using the GraphQL server. We’ll use examples based on a typical channel/video schema.
### Simple entity queries
#### **Fetch list of entities**
Fetch a list of channels:
```graphql theme={"system"}
query {
channels {
id
handle
}
}
```
or, using a [newer](/en/sdk/squid-sdk/reference/openreader-server/overview#supported-queries) and [more advanced](/en/sdk/squid-sdk/reference/openreader-server/api/paginate-query-results) `{entityName}sConnection` query
```graphql theme={"system"}
query {
channelsConnection(orderBy: id_ASC) {
edges {
node {
id
handle
}
}
}
}
```
#### **Fetch an entity using its unique fields**
Fetch a channel by a unique id or handle:
```graphql theme={"system"}
query Query1 {
channelByUniqueInput(where: { id: "1" }) {
id
handle
}
}
query Query2 {
channelByUniqueInput(where: { handle: "Joy Channel" }) {
id
handle
}
}
```
### Filter query results / search queries
#### **The `where` argument**
You can use the `where` argument in your queries to filter results based on some field’s values. You can even use multiple filters in the same `where` clause using the `AND` or the `OR` operators.
For example, to fetch data for a channel named `Joy Channel`:
```graphql theme={"system"}
query {
channels(where: { handle_eq: "Joy Channel" }) {
id
handle
}
}
```
Note that `{entityName}sConnection` queries support exactly the same format of the `where` argument:
```graphql theme={"system"}
query {
channelsConnection(orderBy: id_ASC, where: { handle_eq: "Joy Channel"}) {
edges {
node {
id
handle
}
}
}
}
```
#### **Supported Scalar Types**
SQD supports the following scalar types:
* String
* Int
* Float
* BigInt
* Boolean
* Bytes
* DateTime
#### **Equality Operators (`_eq`)**
`_eq` is supported by all the scalar types.
The following are examples of using this operator on different types:
* Fetch a list of videos where `title` is "Bitcoin"
* Fetch a list of videos where `isExplicit` is "true"
* Fetch a list of videos where `publishedOn` is "2021-01-05"
```graphql theme={"system"}
query Query1 {
videos(where: { title_eq: "Bitcoin" }) {
id
title
}
}
query Query2 {
videos(where: { isExplicit_eq: true }) {
id
title
}
}
query Query3 {
videos(where: { publishedOn_eq: "2021-01-05" }) {
id
title
}
}
```
#### **Greater than or less than operators (`gt`, `lt`, `gte`, `lte`)**
The `_gt` (greater than), `_lt` (less than), `_gte` (greater than or equal to), `_lte` (less than or equal to) operators are available on `Int, BigInt, Float, DateTime` types.
The following are examples of using these operators on different types:
* Fetch a list of videos published before "2021-01-05"
* Fetch a list of channels created at or before block 999
```graphql theme={"system"}
query Query1 {
videos(where: { publishedOn_gte: "2021-01-05" }) {
id
title
}
}
query Query2 {
channels(where: { block_lte: "999" }) {
id
handle
}
}
```
#### **Text search or pattern matching operators (`_contains`, `_startsWith`, `_endsWith`)**
The `_contains`, `_startsWith`, `_endsWith` operators are used for pattern matching on string fields.
Example:
```graphql theme={"system"}
query Query1 {
videos(where: { title_contains: "Bitcoin" }) {
id
title
}
}
query Query2 {
videos(where: { title_endsWith: "cryptocurrency" }) {
id
title
}
}
```
# Union type resolution
Source: https://docs.sqd.dev/en/sdk/squid-sdk/reference/openreader-server/api/resolve-union-types-interfaces
Resolve GraphQL union types and interfaces in Squid SDK OpenReader — use the __typename meta field to discriminate between concrete entity types.
Use cases for [Union types](/en/sdk/squid-sdk/reference/schema-file/unions-and-typed-json) have been discussed in the [schema reference](/en/sdk/squid-sdk/reference/schema-file). Here, we discuss how to query union types.
Let's take this modified schema from the [Substrate tutorial](/en/sdk/squid-sdk/tutorials/substrate):
```graphql title="schema.graphql" theme={"system"}
type Account @entity {
id: ID! #Account address
events: [Event]
}
type WorkReport {
id: ID! #event id
addedFiles: [[String]]
deletedFiles: [[String]]
extrinsicId: String
blockHash: String!
}
type JoinGroup {
id: ID!
owner: String!
extrinsicId: String
blockHash: String!
}
type StorageOrder {
id: ID!
fileCid: String!
extrinsicId: String
blockHash: String!
}
union Event = WorkReport | JoinGroup | StorageOrder
```
Here, an `Event` will have different fields depending on the underlying type. This query demonstrates how to request different fields for each of these types:
```graphql theme={"system"}
query MyQuery {
accounts {
events {
__typename
... on WorkReport {
id
blockHash
extrinsicId
deletedFiles
}
... on JoinGroup {
id
blockHash
extrinsicId
}
... on StorageOrder {
id
blockHash
extrinsicId
}
}
id
}
}
```
The special `__typename` field allows users to discern the returned object type without relying on comparing the sets of regular fields. For example, in the output of the query above `JoinGroup` and `StorageOrder` events can only be distinguished by looking at the `__typename` field. Here is a possible output to illustrate:
```json theme={"system"}
{
"data": {
"accounts": [
{
"events": [
{
"__typename": "WorkReport",
"id": "0000584321-000001-01cdb",
"blockHash": "0x01cdb3cb6fa00f62fd20220104f1d740a53518b63517419da8a89325d065562b",
"extrinsicId": "0000584321-000001-01cdb",
"deletedFiles": []
}
],
"id": "cTKmzHG3RHa1yhujyZpPnNL17p8a48Av3JFwDjpttLcxeSo26"
},
{
"events": [
{
"__typename": "JoinGroup",
"id": "0000584598-000010-d06ec",
"blockHash": "0xd06ec6716e96108e24987ef03d23c857ef3b467dd057d7a32c4e123fe5a8df36",
"extrinsicId": "0000584598-000004-d06ec"
}
],
"id": "cTKqevWRdvbNNAQ3hLxhsNYhQ8pf5YGkYnnVjgjLNiVr4kd7a"
},
{
"events": [
{
"__typename": "StorageOrder",
"id": "0000584627-000013-1fa19",
"blockHash": "0x1fa19ae98731afad853ffd491fcbc0c3dcda6b8b7f5a2d56ac6c4c1eb9e4f95e",
"extrinsicId": "0000584627-000005-1fa19"
}
],
"id": "cTGYF8jvcpnRmgNopqT4nVs5rWHEviAAdRdfNrZE8NFz2Av7B"
}
]
}
}
```
# Sorting
Source: https://docs.sqd.dev/en/sdk/squid-sdk/reference/openreader-server/api/sorting
Sort GraphQL query results in OpenReader with the orderBy argument — sort by entity fields, nested fields, and apply multiple sort keys at once.
## Sort order
The sort order (ascending vs. descending) is set by specifying an `ASC` or `DESC` suffix for the column name in the `orderBy` input object, e.g. `title_DESC`.
### **Sorting entities**
Example: Fetch a list of videos sorted by their titles in an ascending order:
```graphql theme={"system"}
query {
videos(orderBy: title_ASC) {
id
title
}
}
```
or
```graphql theme={"system"}
query {
videos(orderBy: [title_ASC]) {
id
title
}
}
```
### **Sorting entities by multiple fields**
The `orderBy` argument takes an array of fields to allow sorting by multiple columns.
Example: Fetch a list of videos that is sorted by their titles (ascending) and then on their published date (descending):
```graphql theme={"system"}
query {
videos(orderBy: [title_ASC, publishedOn_DESC]) {
id
title
publishedOn
}
}
```
# Access control
Source: https://docs.sqd.dev/en/sdk/squid-sdk/reference/openreader-server/configuration/authorization
Configure authentication and authorization in OpenReader — protect Squid SDK GraphQL APIs with API keys, signed JWTs, and per-entity rules.
To implement access control, define the following function in the designated `src/server-extension/check` module:
```typescript theme={"system"}
import {
RequestCheckContext
} from '../../node_modules/@subsquid/graphql-server/src/check'
export async function requestCheck(
req: RequestCheckContext
): Promise<boolean | string> {
...
}
```
Once defined, this function will be called every time a request arrives. Then,
* if the function returns `true`, the request is processed as usual;
* if the function returns `false`, the server responds with `'{"errors":[{"message":"not allowed"}]}'`;
* if the function returns an `errorString`, the server responds with `` `{"errors":[{"message":"${errorString}"}]}` ``.
The request information such as HTTP headers and GraphQL selections is available in the context. This makes it possible to authenticate the user that sent the query and either allow or deny access. The decision may take the query contents into account, allowing for some authorization granularity.
## `RequestCheckContext`
The context type has the following interface:
```typescript theme={"system"}
RequestCheckContext {
http: {uri: string, method: string, headers: HttpHeaders}
operation: OperationDefinitionNode
operationName: string | null
schema: GraphQLSchema
context: Record<string, any>
model: Model
}
```
Here,
* `http` field contains the low level HTTP info. Information on headers is stored in a `Map` from lowercase header names to values. For example, `req.http.headers.get('authorization')` is the value of the authorization header.
* `operation` is the root [`OperationDefinitionNode`](https://graphql-js.org/api/interface/OperationDefinitionNode) of the tree describing the query. Useful if the authorization decision depends on the query contents.
* `operationName` is the query name.
* `schema` is a [`GraphQLSchema`](https://graphql-js.org/api/class/GraphQLSchema) object.
* `context` holds a [`PoolOpenreaderContext`](https://github.com/subsquid/squid-sdk/blob/master/graphql/openreader/src/db.ts) at `context.openreader`. It can be used to access the database, though this is highly discouraged: the interfaces involved are considered to be internal and are subject to change without notice.
* `model` is an Openreader data [`Model`](https://github.com/subsquid/squid-sdk/blob/master/graphql/openreader/src/model.ts).
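To make this concrete, here is a brief sketch (not taken from the example repos) of a `requestCheck()` that combines these context fields: it lets unauthenticated clients run only a query named `PublicStats` (a hypothetical operation name) and requires a bearer token for everything else:
```ts theme={"system"}
import {
  RequestCheckContext
} from '../../node_modules/@subsquid/graphql-server/src/check'

export async function requestCheck(
  req: RequestCheckContext
): Promise<boolean | string> {
  // allow a single whitelisted operation without authentication
  // ('PublicStats' is a hypothetical operation name)
  if (req.operationName === 'PublicStats') return true
  // everything else requires a valid bearer token
  if (req.http.headers.get('authorization') === 'Bearer 12345') return true
  return 'authorization required'
}
```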
## Sending user data to resolvers
Authentication data such as user name can be passed from `requestCheck()` to a [custom resolver](/en/sdk/squid-sdk/reference/openreader-server/configuration/custom-resolvers) through Openreader context:
```typescript theme={"system"}
export async function requestCheck(req: RequestCheckContext): Promise<boolean | string> {
...
// obtain user name e.g. by decoding the authentication header
let user = ...
// save user name to Openreader context
req.context.openreader.user = user
...
}
```
A custom resolver that retrieves it may look like this:
```typescript theme={"system"}
@Resolver()
export class UserCommentResolver {
constructor(private tx: () => Promise<EntityManager>) {}
@Query(() => [UserCommentCountQueryResult])
async countUserComments(
@Ctx() ctx: any
): Promise<UserCommentCountQueryResult[]> {
let user = ctx.openreader.user
let manager = await this.tx()
let result: UserCommentCountQueryResult[] =
await manager
.getRepository(UserComment)
.query(`
SELECT COUNT(*) as total
FROM user_comment
WHERE "user" = '${user}'
`)
return result
}
@Mutation(() => Boolean)
async addComment(
@Arg('text') comment: string,
@Ctx() ctx: any
): Promise<boolean> {
let user = ctx.openreader.user
let manager = await this.tx()
await manager.save(new UserComment({
id: `${user}-${comment}`,
user,
comment
}))
return true
}
}
```
See full code in [this branch](https://github.com/subsquid-labs/access-control-example/tree/interacting-with-resolver).
This approach does not work with [subscriptions](/en/sdk/squid-sdk/reference/openreader-server/configuration/subscriptions).
## Examples
A simple strategy that authorizes anyone with a `12345` token to perform any query can be implemented with
```typescript title="src/server-extension/check.ts" theme={"system"}
import {
RequestCheckContext
} from '../../node_modules/@subsquid/graphql-server/src/check'
export async function requestCheck(
req: RequestCheckContext
): Promise<boolean> {
return req.http.headers.get('authorization')==='Bearer 12345'
}
```
A more elaborate example with two users authorized to perform different query sets is available in [this repo](https://github.com/subsquid-labs/access-control-example). Another great example of using `requestCheck()` for authorization can be [spotted in the wild](https://github.com/reef-defi/reef-subsquid-processor/tree/master/src/server-extension) in the code of a squid used by [Reef](https://reef.io).
# Caching
Source: https://docs.sqd.dev/en/sdk/squid-sdk/reference/openreader-server/configuration/caching
Enable response caching in OpenReader for faster Squid SDK GraphQL queries — TTL-based cache with cache-busting on schema or data updates.
The GraphQL API server provided by `@subsquid/graphql-server` supports caching via additional flags. It is done on a per-query basis. The whole response is cached for a specified amount of time (`maxAge`).
To enable caching when deploying to SQD Cloud, add the caching flags to the `serve:prod` command definition at [`commands.json`](/en/sdk/squid-sdk/squid-cli/commands-json), then use that command to run the server in the [deployment manifest](/en/cloud/reference/manifest#deploy). Cloud currently supports only in-memory cache.
For example, the snippets below will deploy a GraphQL API server with a `100Mb` in-memory cache and an invalidation time of `5` seconds:
```json title="commands.json" theme={"system"}
...
"serve:prod": {
"description": "Start the GraphQL API server with caching and limits",
"cmd": [ "squid-graphql-server",
"--dumb-cache", "in-memory",
"--dumb-cache-ttl", "5000",
"--dumb-cache-size", "100",
"--dumb-cache-max-age", "5000" ]
}
...
```
```yaml title="squid.yaml" theme={"system"}
# ...
deploy:
# other services ...
api:
cmd: [ "sqd", "serve:prod" ]
```
The full list of caching flags is available via `npx squid-graphql-server --help`. Here are some more details on them:
### `--dumb-cache `
Enables cache, either `in-memory` or `redis`. For `redis`, a Redis connection string must be set by a variable `REDIS_URL`. SQD Cloud deployments currently support only `in-memory` cache.
### `--dumb-cache-size `
Cache max size. Applies only to in-memory cache.
### `--dumb-cache-max-age `
A globally set max age in milliseconds. The cached queries are invalidated after that period of time.
### `--dumb-cache-ttl `
Time-to-live for in-memory cache entries. Applies only to in-memory cache. The entries are eligible for eviction from the cache if not updated for longer than the time-to-live time.
# Custom API extensions
Source: https://docs.sqd.dev/en/sdk/squid-sdk/reference/openreader-server/configuration/custom-resolvers
Extend a Squid SDK GraphQL API with custom resolvers — add aggregations, computed fields, and ad-hoc queries beyond the schema-derived entities.
# Custom GraphQL resolvers
One can extend the GraphQL API generated by OpenReader with custom queries by defining GraphQL [query resolvers](https://www.apollographql.com/docs/apollo-server/data/resolvers/) in the designated module `src/server-extension/resolvers`. **Note that all resolver classes (including any additional types) must be exported by `src/server-extension/resolvers/index.ts`.**
A custom resolver should import [TypeGraphQL](https://typegraphql.com) types and [use annotations](https://typegraphql.com/docs/resolvers.html) provided by the library to define query arguments and return types. If your squid lacks a `type-graphql` dependency, add it with:
```bash theme={"system"}
npm i type-graphql
```
Custom resolvers are normally used in combination with [TypeORM EntityManager](https://typeorm.io/entity-manager-api) for accessing the API server target database. It is automatically injected when defined as a single constructor argument of the resolver.
## Examples
#### Simple entity counter
```typescript theme={"system"}
import { Query, Resolver } from 'type-graphql'
import type { EntityManager } from 'typeorm'
import { Burn } from '../model'
@Resolver()
export class CountResolver {
constructor(private tx: () => Promise<EntityManager>) {}
@Query(() => Number)
async totalBurns(): Promise<number> {
const manager = await this.tx()
return await manager.getRepository(Burn).count()
}
}
```
This example is designed to work with the `evm` template:
1. grab a test squid as described [here](/en/sdk/squid-sdk/how-to-start/squid-development);
2. install `type-graphql`;
3. save the example code to `src/server-extension/resolver.ts`;
4. re-export `CountResolver` at `src/server-extension/resolvers/index.ts`:
```ts theme={"system"}
export { CountResolver } from '../resolver'
```
5. rebuild the squid with `npm run build`;
6. (re)start the GraphQL server with `npx squid-graphql-server`.
The `totalBurns` query will appear in the [GraphiQL playground](http://localhost:4350/graphql).
#### Custom SQL query
```typescript theme={"system"}
import { Arg, Field, ObjectType, Query, Resolver } from 'type-graphql'
import type { EntityManager } from 'typeorm'
import { MyEntity } from '../model'
// Define custom GraphQL ObjectType of the query result
@ObjectType()
export class MyQueryResult {
@Field(() => Number, { nullable: false })
total!: number
@Field(() => Number, { nullable: false })
max!: number
constructor(props: Partial<MyQueryResult>) {
Object.assign(this, props);
}
}
@Resolver()
export class MyResolver {
// Set by dependency injection
constructor(private tx: () => Promise<EntityManager>) {}
@Query(() => [MyQueryResult])
async myQuery(): Promise<MyQueryResult[]> {
const manager = await this.tx()
// execute custom SQL query
const result: MyQueryResult[] = await manager.getRepository(MyEntity).query(
`SELECT
COUNT(x) as total,
MAX(y) as max
FROM my_entity
GROUP BY month`)
return result
}
}
```
#### More examples
Some great examples of `@subsquid/graphql-server`-based custom resolvers can be spotted in the wild in the [Rubick repo](https://github.com/kodadot/rubick/tree/main/src/server-extension/resolvers) by [KodaDot](https://github.com/kodadot).
For more examples of resolvers, see [TypeGraphQL examples repo](https://github.com/MichalLytek/type-graphql/tree/master/examples).
## Logging
To keep logging consistent across the entire GraphQL server, use `@subsquid/logger`:
```ts theme={"system"}
import {createLogger} from '@subsquid/logger'
// using a custom namespace ':my-resolver' for resolver logs
const LOG = createLogger('sqd:graphql-server:my-resolver')
LOG.info('created a dedicated logger for my-resolver')
```
`LOG` here is a [logger object](/en/sdk/squid-sdk/reference/logger) identical to `ctx.log` interface-wise.
## Interaction with global settings
* `--max-response-size` used for [DoS protection](/en/sdk/squid-sdk/reference/openreader-server/configuration/dos-protection) is ignored in custom resolvers.
* [Caching](/en/sdk/squid-sdk/reference/openreader-server/configuration/caching) works on custom queries in exactly the same way as it does on the schema-derived queries.
## Troubleshooting
#### `Reflect.getMetadata is not a function`
Add `import 'reflect-metadata'` on top of your custom resolver module and install the package if necessary.
# OpenReader Server Overview
Source: https://docs.sqd.dev/en/sdk/squid-sdk/reference/openreader-server/overview
OpenReader is the open source GraphQL server bundled with Squid SDK — generates an OpenCRUD-style API from your schema.graphql with caching and DoS protection.
OpenReader is no longer recommended for use in new squid projects [relying on PostgreSQL](/en/sdk/squid-sdk/resources/persisting-data/typeorm). See [Serving GraphQL](/en/sdk/squid-sdk/resources/serving-graphql) to learn about the new options and the [Limitations](#limitations) section to understand our motivation.
OpenReader is a server that presents data of PostgreSQL-powered squids as a GraphQL API. It relies on the [eponymous library](https://github.com/subsquid/squid-sdk/tree/master/graphql/openreader) of the Squid SDK for schema generation. The [schema file](/en/sdk/squid-sdk/reference/schema-file) is used as an input; the resulting API supports [OpenCRUD](https://www.opencrud.org/) queries for the entities defined in the schema.
To start the API server based on `schema.graphql`, install `@subsquid/graphql-server` and run the following in the squid project root:
```bash theme={"system"}
npx squid-graphql-server
```
The `squid-graphql-server` executable supports multiple optional flags to enable [caching](/en/sdk/squid-sdk/reference/openreader-server/configuration/caching), [subscriptions](/en/sdk/squid-sdk/reference/openreader-server/configuration/subscriptions), [DoS protection](/en/sdk/squid-sdk/reference/openreader-server/configuration/dos-protection) etc. Its features are covered in the next sections.
The API server listens on the port defined by the `GQL_PORT` environment variable (default `4350`). The database connection is configured with the env variables `DB_NAME`, `DB_USER`, `DB_PASS`, `DB_HOST` and `DB_PORT`.
In [SQD Cloud](/en/cloud), OpenReader is usually run as the `api:` service in the `deploy:` section of the [Deployment manifest](/en/cloud/reference/manifest).
## Supported queries
The details of the supported OpenReader queries can be found in a separate section [Core API](/en/sdk/squid-sdk/reference/openreader-server/api). Here is a brief overview of the queries generated by OpenReader for each entity defined in the schema file:
* the squid last processed block is available with `squidStatus { height }` query
* a "get one by ID" query with the name `{entityName}ById` for each [entity](/en/sdk/squid-sdk/reference/schema-file/entities) defined in the schema file
* a "get one" query for [`@unique` fields](/en/sdk/squid-sdk/reference/schema-file/indexes-and-constraints), with the name `{entityName}ByUniqueInput`
* Entity queries named `{entityName}sConnection`. Each query offers rich filtering, including [field-level filters](/en/sdk/squid-sdk/reference/openreader-server/api/queries), composite [`AND` and `OR` filters](/en/sdk/squid-sdk/reference/openreader-server/api/and-or-filters), [nested queries](/en/sdk/squid-sdk/reference/openreader-server/api/nested-field-queries), [cross-relation queries](/en/sdk/squid-sdk/reference/openreader-server/api/cross-relation-field-queries) and [Relay-compatible](https://relay.dev/graphql/connections.htm) cursor-based [pagination](/en/sdk/squid-sdk/reference/openreader-server/api/paginate-query-results).
* [Subscriptions](/en/sdk/squid-sdk/reference/openreader-server/configuration/subscriptions) via live queries
* (Deprecated in favor of Relay connections) Lookup queries with the name `{entityName}s`.
[Union and typed JSON types](/en/sdk/squid-sdk/reference/schema-file/unions-and-typed-json) are mapped into [GraphQL Union Types](https://graphql.org/learn/schema/#union-types) with a [proper type resolution](/en/sdk/squid-sdk/reference/openreader-server/api/resolve-union-types-interfaces) with `__typename`.
## Built-in custom scalar types
The OpenReader GraphQL API defines the following custom scalar types:
* `DateTime` entity field values are presented in the ISO format
* `Bytes` entity field values are presented as hex-encoded strings prefixed with `0x`
* `BigInt` entity field values are presented as strings
## Limitations
* RAM usage of [subscriptions](/en/sdk/squid-sdk/reference/openreader-server/configuration/subscriptions) scales poorly under high load, making the feature unsuitable for most production uses. There are currently no plans to fix this issue.
* Setting up custom resolvers for subscriptions is unreasonably hard.
* `@subsquid/graphql-server` depends on the deprecated Apollo Server v3.
# Processor architecture
Source: https://docs.sqd.dev/en/sdk/squid-sdk/reference/processors/architecture
Anatomy of Squid SDK processors — data source, query builder, batch handler, and Store roles in the processor lifecycle and the SQD Network protocol.
The processor service is a Node.js process responsible for data ingestion, transformation and data persisting into the target database. By [convention](/en/sdk/squid-sdk/how-to-start/layout), the processor entry point is at `src/main.ts`. It is run as
```bash theme={"system"}
node lib/main.js
```
For local runs, one normally also loads environment variables from `.env` using `dotenv`:
```bash theme={"system"}
node -r dotenv/config lib/main.js
```
## Processor choice
The Squid SDK currently offers specialized processor classes for EVM (`EvmBatchProcessor`) and Substrate networks (`SubstrateBatchProcessor`). More networks will be supported in the future. By convention, the processor object is defined at `src/processor.ts`.
Navigate to a dedicated section for each processor class:
* [`EvmBatchProcessor`](/en/sdk/squid-sdk/reference/processors/evm-batch)
* [`SubstrateBatchProcessor`](/en/sdk/squid-sdk/reference/processors/substrate-batch)
## Configuration
A processor instance should be configured to define the block range to be indexed, and the selectors of data to be fetched from [SQD Network](/en/network) and/or a node RPC endpoint.
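As a minimal sketch (the endpoint, block range and contract address below are placeholders, not recommendations), an EVM processor configuration could look like this:
```ts theme={"system"}
import {EvmBatchProcessor} from '@subsquid/evm-processor'

export const processor = new EvmBatchProcessor()
  // where to get the data from
  .setGateway('https://v2.archive.subsquid.io/network/ethereum-mainnet')
  // which blocks to index
  .setBlockRange({ from: 17_000_000 })
  // which data items to select (replace with your contract address)
  .addLog({ address: ['0x0000000000000000000000000000000000000000'] })
```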
## `processor.run()`
The actual data processing is done by the `run()` method called on a processor instance (typically at `src/main.ts`). The method has the following signature:
```ts theme={"system"}
run<Store>(
db: Database<Store>,
batchHandler: (
context: DataHandlerContext<Store, F>
) => Promise<void>
): void
```
The `db` parameter defines the target [data sink](/en/sdk/squid-sdk/resources/persisting-data), and `batchHandler` is an `async` `void` function defining the data transformation and persistence logic. It repeatedly receives batches of SQD Network data stored in `context.blocks`, transforms them and persists the results to the target database using the `context.store` interface (more on `context` in the next section).
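A minimal `src/main.ts` along these lines might look as follows (a sketch; it assumes a configured processor object is exported from `src/processor.ts`, as per the convention above):
```ts theme={"system"}
import {TypeormDatabase} from '@subsquid/typeorm-store'
import {processor} from './processor'

processor.run(new TypeormDatabase(), async (ctx) => {
  // the batch handler: transform ctx.blocks and persist the results via ctx.store
  ctx.log.info(`Got a batch of ${ctx.blocks.length} blocks`)
})
```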
To jump straight to examples, see [EVM Processor in action](/en/sdk/squid-sdk/tutorials/batch-processor-in-action) and [Substrate Processor in action](/en/sdk/squid-sdk/tutorials/batch-processor-in-action).
## Batch context
Batch handler takes a single argument of `DataHandlerContext` type:
```ts theme={"system"}
export interface DataHandlerContext<Store, F> {
_chain: Chain
log: Logger
store: Store
blocks: BlockData<F>[]
isHead: boolean
}
```
Here, `F` is the type of the argument of the `setFields()` ([EVM](/en/sdk/squid-sdk/reference/processors/evm-batch/field-selection), [Substrate](/en/sdk/squid-sdk/reference/processors/substrate-batch/field-selection)) processor configuration method. `Store` type is inferred from the `Database` instance passed into the `run()` method.
#### `ctx._chain`
Internal handle for direct access to the underlying chain state via RPC calls. Rarely used directly, but rather by the facade access classes generated by the [typegen tools](/en/sdk/squid-sdk/glossary#typegen).
#### `ctx.log`
The native logger handle. See [Logging](/en/sdk/squid-sdk/reference/logger).
#### `ctx.store`
Interface for the target data sink. See [Persisting data](/en/sdk/squid-sdk/resources/persisting-data).
#### `ctx.blocks`
On-chain data items are grouped into blocks, with each block containing a header and iterables for all supported data item types. Boundary blocks are always included into the `ctx.blocks` iterable with valid headers, even when they do not contain any requested data. It follows that batch context *always* contains at least one block.
The set of iterables depends on the processor type (docs for [EVM](/en/sdk/squid-sdk/reference/processors/evm-batch/context-interfaces)/[Substrate](/en/sdk/squid-sdk/reference/processors/substrate-batch/context-interfaces)). Depending on the data item type, items within the iterables can be canonically ordered by how the data is recorded on-chain (e.g. transactions are ordered but traces are not). The shape of item objects is determined by the processor configuration done via the `.setFields()` method.
An idiomatic use of the context API is to iterate first over blocks and then over each iterable of each block:
```ts title="EVM" theme={"system"}
processor.run(new TypeormDatabase(), async (ctx) => {
for (let block of ctx.blocks) {
for (let log of block.logs) {
// filter and process logs
}
for (let txn of block.transactions) {
// filter and process transactions
}
for (let stDiff of block.stateDiffs) {
// filter and process state diffs
}
for (let trace of block.traces) {
// filter and process execution traces
}
}
})
```
```ts title="Substrate" theme={"system"}
processor.run(new TypeormDatabase(), async (ctx) => {
for (let block of ctx.blocks) {
for (let event of block.events) {
// filter and process events
}
for (let call of block.calls) {
// filter and process calls
}
for (let extrinsic of block.extrinsics) {
// filter and process extrinsics
}
}
})
```
The canonical ordering of `ctx.blocks` enables efficient in-memory data processing. For example, multiple updates of the same entity can be compressed into a single database transaction.
Please be aware that the processor cannot ensure that data not meeting its filters will be excluded from iterables. It only guarantees the inclusion of data that matches the filters. Therefore, it is necessary to filter the data in the batch handler prior to processing.
#### `ctx.isHead`
Is `true` if the processor has reached the chain head. The last block of `ctx.blocks` is then the current chain tip.
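For example (a sketch reusing the processor and database objects from the snippets above), `ctx.isHead` can be used to log the moment the squid catches up with the network:
```ts theme={"system"}
processor.run(new TypeormDatabase(), async (ctx) => {
  // ...transform and persist ctx.blocks here...
  if (ctx.isHead) {
    let tip = ctx.blocks[ctx.blocks.length - 1].header.height
    ctx.log.info(`Reached the chain head, the current tip is block ${tip}`)
  }
})
```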
# Block data for EVM
Source: https://docs.sqd.dev/en/sdk/squid-sdk/reference/processors/evm-batch/context-interfaces
Block data interfaces for EVM indexing with Squid SDK — type definitions for blocks, transactions, logs, traces, and state diffs in the processor context.
`EvmBatchProcessor` follows the common [squid processor architecture](/en/sdk/squid-sdk/overview), in which data processing happens within the [batch handler](/en/sdk/squid-sdk/reference/processors/architecture#processorrun), a function repeatedly called on batches of on-chain data. The function takes a single argument called "batch context". Its structure follows the [common batch context layout](/en/sdk/squid-sdk/reference/processors/architecture#batch-context), with `ctx.blocks` being an array of `BlockData` objects containing the data to be processed, aligned at the block level.
For `EvmBatchProcessor` the `BlockData` interface is defined as follows:
```ts theme={"system"}
export type BlockData<F> = {
header: BlockHeader<F>
transactions: Transaction<F>[]
logs: Log<F>[]
traces: Trace<F>[]
stateDiffs: StateDiff<F>[]
}
```
`F` here is the type of the argument of the [`setFields()`](/en/sdk/squid-sdk/reference/processors/evm-batch/field-selection) processor method.
`BlockData.header` contains the block header data. The rest of the fields are iterables containing four kinds of blockchain data. The canonical ordering within each iterable depends on the data kind:
* `transactions` and `logs` are ordered in the same way as they are within blocks;
* [`stateDiffs`](/en/sdk/squid-sdk/reference/processors/evm-batch/state-diffs) follow the order of transactions that gave rise to them;
* `traces` are ordered in a deterministic but otherwise unspecified way.
The exact fields available in each data item type are inferred from the `setFields()` call argument. They are documented on the [field selection](/en/sdk/squid-sdk/reference/processors/evm-batch/field-selection) page:
* [transactions section](/en/sdk/squid-sdk/reference/processors/evm-batch/field-selection#transactions);
* [logs section](/en/sdk/squid-sdk/reference/processors/evm-batch/field-selection#logs);
* [traces section](/en/sdk/squid-sdk/reference/processors/evm-batch/field-selection#traces);
* [state diffs section](/en/sdk/squid-sdk/reference/processors/evm-batch/field-selection#state-diffs);
* [block header section](/en/sdk/squid-sdk/reference/processors/evm-batch/field-selection#block-headers).
## Example
The handler below simply outputs all the log items emitted by the contract `0x2E645469f354BB4F5c8a05B3b30A929361cf77eC` in [real time](/en/sdk/squid-sdk/resources/unfinalized-blocks):
```ts theme={"system"}
import { TypeormDatabase } from '@subsquid/typeorm-store'
import { EvmBatchProcessor } from '@subsquid/evm-processor'
const CONTRACT_ADDRESS = '0x2E645469f354BB4F5c8a05B3b30A929361cf77eC'.toLowerCase()
const processor = new EvmBatchProcessor()
.setGateway('https://v2.archive.subsquid.io/network/ethereum-mainnet')
.setRpcEndpoint('')
.setFinalityConfirmation(75)
.setBlockRange({ from: 17000000 })
.addLog({
address: [CONTRACT_ADDRESS]
})
.setFields({ // could be omitted: this call does not change the defaults
log: {
topics: true,
data: true
}
})
processor.run(new TypeormDatabase(), async (ctx) => {
for (let c of ctx.blocks) {
for (let log of c.logs) {
if (log.address === CONTRACT_ADDRESS) {
ctx.log.info(log, `Log:`)
}
}
}
})
```
One can experiment with the [`setFields()`](/en/sdk/squid-sdk/reference/processors/evm-batch/field-selection) argument and see how the output changes.
For more elaborate examples, check [EVM Examples](/en/sdk/squid-sdk/examples).
# EVM Field Selection
Source: https://docs.sqd.dev/en/sdk/squid-sdk/reference/processors/evm-batch/field-selection
Fine-tune EVM data requests with setFields() in Squid SDK — choose only the block, transaction, log, trace, and state diff fields you need to index.
#### `setFields(options)`
Set the fields to be retrieved for data items of each supported type. The `options` object has the following structure:
```ts theme={"system"}
{
log?: {...}         // field selector for logs
transaction?: {...} // field selector for transactions
stateDiff?: {...}   // field selector for state diffs
trace?: {...}       // field selector for traces
block?: {...}       // field selector for block headers
}
```
Every field selector is a collection of boolean fields, typically (with a notable exception of [trace field selectors](#traces)) mapping one-to-one to the fields of data items within the batch context [iterables](/en/sdk/squid-sdk/reference/processors/evm-batch/context-interfaces). Defining a field of a field selector of a given type and setting it to true will cause the processor to populate the corresponding field of all data items of that type. Here is a definition of a processor that requests `gas` and `value` fields for transactions:
```ts theme={"system"}
let processor = new EvmBatchProcessor()
.setFields({
transaction: {
gas: true,
value: true
}
})
```
The same fields will be available for all data items of any given type, including nested items. Suppose we used the processor defined above to subscribe to some transactions as well as some logs, and for each log we requested a parent transaction:
```ts theme={"system"}
processor
.addLog({
// some log data requests
transaction: true
})
.addTransaction({
// some transaction data requests
})
```
As a result, `gas` and `value` fields would be available both within the transaction items of the `transactions` iterable of [block data](/en/sdk/squid-sdk/reference/processors/evm-batch/context-interfaces) and within the transaction items that provide parent transaction information for the logs:
```ts theme={"system"}
processor.run(db, async ctx => {
for (let block of ctx.blocks) {
for (let txn of block.transactions) {
let txnGas = txn.gas // OK
}
for (let log of block.logs) {
let parentTxnGas = log.transaction.gas // also OK!
}
}
})
```
Some data fields, like `hash` for transactions, are enabled by default but can be disabled by setting a field of a field selector to `false`. For example, this code will not compile:
```ts theme={"system"}
let processor = new EvmBatchProcessor()
.setFields({
transaction: {
hash: false
}
})
.addTransaction({
// some transaction data requests
})
processor.run(db, async ctx => {
for (let block of ctx.blocks) {
for (let txn of block.transactions) {
let txnHash = txn.hash // ERROR: no such field
}
}
})
```
Disabling unused fields will improve sync performance, as the disabled fields will not be fetched from the SQD Network gateway.
## Data item types and field selectors
Most IDEs support smart suggestions to show the possible field selectors. For VS Code, press `Ctrl+Space`.
Here we describe the data item types as functions of the field selectors. Unless otherwise mentioned, each data item type field maps to the eponymous field of its corresponding field selector. Item fields are divided into three categories:
* Fields that are added independently of the `setFields()` call. These are either fixed or depend on the related data retrieval flags (e.g. `transaction` for logs).
* Fields that can be disabled by `setFields()`. E.g. a `topics` field will be fetched for logs by default, but can be disabled by setting `topics: false` within the `log` field selector.
* Fields that can be requested by `setFields()`. E.g. a `transactionHash` field will only be available in logs if the `log` field selector sets `transactionHash: true`.
### Logs
`Log` data items may have the following fields:
```ts theme={"system"}
Log {
// independent of field selectors
id: string
logIndex: number
transactionIndex: number
block: BlockHeader
transaction?: Transaction
// can be disabled with field selectors
address: string
data: string
topics: string[]
// can be requested with field selectors
transactionHash: string
}
```
See the [block headers section](#block-headers) for the definition of `BlockHeader` and the [transactions section](#transactions) for the definition of `Transaction`.
### Transactions
`Transaction` data items may have the following fields:
```ts theme={"system"}
Transaction {
// independent of field selectors
id: string
transactionIndex: number
block: BlockHeader
// can be disabled with field selectors
from: string
to?: string
hash: string
// can be requested with field selectors
gas: bigint
gasPrice: bigint
maxFeePerGas?: bigint
maxPriorityFeePerGas?: bigint
input: string
nonce: number
value: bigint
v?: bigint
r?: string
s?: string
yParity?: number
chainId?: number
gasUsed?: bigint
cumulativeGasUsed?: bigint
effectiveGasPrice?: bigint
contractAddress?: string
type?: number
status?: number
sighash: string
// limited availability (see below)
l1Fee?: bigint
l1FeeScalar?: number
l1GasPrice?: bigint
l1GasUsed?: bigint
l1BlobBaseFee?: bigint
l1BlobBaseFeeScalar?: number
l1BaseFeeScalar?: number
}
```
`status` field contains the value returned by [`eth_getTransactionReceipt`](https://geth.ethereum.org/docs/interacting-with-geth/rpc/batch): `1` for successful transactions, `0` for failed ones and `undefined` for chains and block ranges not compliant with the post-Byzantium hard fork EVM specification (e.g. blocks 0-4,369,999 on Ethereum).
`type` field is populated similarly. For example, on Ethereum `0` is returned for Legacy txs, `1` for EIP-2930 and `2` for EIP-1559. Other networks may have a different set of types.
See the [block headers section](#block-headers) for the definition of `BlockHeader`.
`l1*` fields can only be requested for networks from [this list](/en/data/evm). Requesting them for other networks may cause HTTP 500 responses.
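As an illustration (a sketch, not part of the original examples), a batch handler could consume the `status` field like this, assuming transactions were requested with `addTransaction()` and `status` was enabled via `setFields({transaction: {status: true}})`:
```ts theme={"system"}
processor.run(new TypeormDatabase(), async (ctx) => {
  for (let block of ctx.blocks) {
    for (let txn of block.transactions) {
      // status is undefined for pre-Byzantium blocks and non-compliant chains
      if (txn.status === 0) {
        ctx.log.warn(`Transaction ${txn.hash} reverted`)
      }
    }
  }
})
```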
### State diffs
`StateDiff` data items may have the following fields:
```ts theme={"system"}
StateDiff {
// independent of field selectors
transactionIndex: number
block: BlockHeader
transaction?: Transaction
address: string
key: 'balance' | 'code' | 'nonce' | string
// can be disabled with field selectors
kind: '=' | '+' | '*' | '-'
prev?: string | null
next?: string | null
}
```
The meaning of the `kind` field values is as follows:
* `'='`: no change has occurred;
* `'+'`: a value was added;
* `'*'`: a value was changed;
* `'-'`: a value was removed.
The values of the `key` field are regular hexadecimal contract storage key strings or one of the special keys `'balance' | 'code' | 'nonce'` denoting ETH balance, contract code and nonce value associated with the state diff.
See the [block headers section](#block-headers) for the definition of `BlockHeader` and the [transactions section](#transactions) for the definition of `Transaction`.
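To make these semantics concrete, here is a sketch (assuming state diffs were requested with `addStateDiff()` and that the `prev`/`next` fields were left enabled) that logs ETH balance changes found in a batch:
```ts theme={"system"}
processor.run(new TypeormDatabase(), async (ctx) => {
  for (let block of ctx.blocks) {
    for (let diff of block.stateDiffs) {
      // 'balance' is one of the special keys; '*' means the value was changed
      if (diff.key === 'balance' && diff.kind === '*') {
        let delta = BigInt(diff.next ?? '0x0') - BigInt(diff.prev ?? '0x0')
        ctx.log.info(`${diff.address}: ETH balance changed by ${delta} wei`)
      }
    }
  }
})
```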
### Traces
Field selection for trace data items is somewhat more involved because its fixed fields `action` and `result` may contain different fields depending on the value of the `type` field. The retrieval of each one of these subfields is configured independently. For example, to ensure that all traces of `'call'` type contain the `.action.gas` field, the processor must be configured as follows:
```ts theme={"system"}
processor.setFields({
trace: {
callGas: true
}
})
```
The full `Trace` type with all its possible (sub)fields looks like this:
```ts theme={"system"}
Trace {
// independent of field selectors
transactionIndex: number
block: BlockHeader
transaction?: Transaction
traceAddress: number[]
type: 'create' | 'call' | 'suicide' | 'reward'
subtraces: number
// can be disabled with field selectors
error: string | null
// can be requested with field selectors
// if (type==='create')
action: {
// request the subfields with
from: string // createFrom: true
value: bigint // createValue: true
gas: bigint // createGas: true
init: string // createInit: true
}
result?: {
gasUsed: bigint // createResultGasUsed: true
code: string // createResultCode: true
address?: string // createResultAddress: true
}
// if (type==='call')
action: {
from: string // callFrom: true
to: string // callTo: true
value: bigint // callValue: true
gas: bigint // callGas: true
sighash: string // callSighash: true
input: string // callInput: true
}
result?: {
gasUsed: bigint // callResultGasUsed: true
output: string // callResultOutput: true
}
// if (type==='suicide')
action: {
address: string // suicideAddress: true
refundAddress: string // suicideRefundAddress: true
balance: bigint // suicideBalance: true
}
// if (type==='reward')
action: {
author: string // rewardAuthor: true
value: bigint // rewardValue: true
type: string // rewardType: true
}
}
```
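Because `action` and `result` vary with `type`, narrow on `type` before reading the subfields. A sketch, assuming the `callTo`, `callSighash` and `createResultAddress` selectors were enabled:
```ts theme={"system"}
processor.run(new TypeormDatabase(), async (ctx) => {
  for (let block of ctx.blocks) {
    for (let trc of block.traces) {
      if (trc.type === 'call') {
        // here trc.action has the 'call' shape
        ctx.log.info(`call to ${trc.action.to}, sighash ${trc.action.sighash}`)
      } else if (trc.type === 'create' && trc.result?.address != null) {
        ctx.log.info(`contract created at ${trc.result.address}`)
      }
    }
  }
})
```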
### Block headers
`BlockHeader` data items may have the following fields:
```ts theme={"system"}
BlockHeader{
// independent of field selectors
hash: string
height: number
id: string
parentHash: string
// can be disabled with field selectors
timestamp: number
// can be requested with field selectors
nonce?: string
sha3Uncles: string
logsBloom: string
transactionsRoot: string
stateRoot: string
receiptsRoot: string
mixHash?: string
miner: string
difficulty?: bigint
totalDifficulty?: bigint
extraData: string
size: bigint
gasLimit: bigint
gasUsed: bigint
baseFeePerGas?: bigint
// limited availability (see below)
l1BlockNumber: number
}
```
The `l1BlockNumber` field can only be requested for networks from [this list](/en/data/evm). Requesting it for other networks may cause HTTP 500 responses.
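A sketch requesting some of the optional header fields (the selection is illustrative):
```ts theme={"system"}
processor.setFields({
  block: {
    size: true,
    gasUsed: true,
    baseFeePerGas: true
  }
})
```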
## A complete example
```ts theme={"system"}
import {EvmBatchProcessor} from '@subsquid/evm-processor'
import * as gravatarAbi from './abi/gravatar'
import * as erc721abi from './abi/erc721'
import {TypeormDatabase} from '@subsquid/typeorm-store'
const gravatarRegistryContract = '0x2e645469f354bb4f5c8a05b3b30a929361cf77ec'
const gravatarTokenContract = '0xac5c7493036de60e63eb81c5e9a440b42f47ebf5'
const processor = new EvmBatchProcessor()
.setGateway('https://v2.archive.subsquid.io/network/ethereum-mainnet')
.setRpcEndpoint('<my_eth_rpc_url>') // replace with your RPC endpoint URL
.setFinalityConfirmation(75)
.setBlockRange({ from: 6_000_000 })
.addLog({
address: [
gravatarRegistryContract
],
topic0: [
gravatarAbi.events.NewGravatar.topic,
gravatarAbi.events.UpdatedGravatar.topic,
]
})
.addTransaction({
to: [
gravatarTokenContract
],
range: { from: 15_500_000 },
sighash: [
erc721abi.functions.setApprovalForAll.sighash
]
})
.setFields({
log: {
topics: true,
data: true,
},
transaction: {
from: true,
input: true,
to: true
}
})
processor.run(new TypeormDatabase(), async (ctx) => {
// Simply output all the items in the batch.
// It is guaranteed to have all the data matching the data requests,
// but not guaranteed to not have any other data.
ctx.log.info(ctx.blocks, "Got blocks")
})
```
# EVM Batch General Settings
Source: https://docs.sqd.dev/en/sdk/squid-sdk/reference/processors/evm-batch/general
General settings for Squid SDK EvmBatchProcessor — configure RPC, archive endpoint, batch size, finality, prometheus metrics, and runtime options.
The method documentation is also available inline and can be accessed via suggestions in most IDEs.
The following setters configure the global settings of `EvmBatchProcessor`. They return the modified instance and can be chained.
Certain configuration methods are required:
* one or both of [`setGateway()`](#set-gateway) and [`setRpcEndpoint()`](#set-rpc-endpoint)
* [`setFinalityConfirmation()`](#set-finality-confirmation) whenever [RPC ingestion](/en/sdk/squid-sdk/resources/unfinalized-blocks) is enabled, namely when
* an RPC endpoint was configured with [`setRpcEndpoint()`](#set-rpc-endpoint)
* RPC ingestion has **NOT** been explicitly disabled by calling [`setRpcDataIngestionSettings({ disabled: true })`](#set-rpc-data-ingestion-settings)
Here's how to choose the data sources depending on your use case:
* If you need real-time data and your network [has a SQD Network gateway](/en/data/evm), use both [`setGateway()`](#set-gateway) and [`setRpcEndpoint()`](#set-rpc-endpoint). The processor will obtain as much data as is currently available from the network, then switch to ingesting recent data from the RPC endpoint.
* If you can tolerate your data being several thousand blocks behind the chain head, do not want to use an RPC endpoint, and your network [has a SQD Network gateway](/en/data/evm), use [`setGateway()`](#set-gateway) only.
* If your EVM network does not have a SQD Network gateway, use [`setRpcEndpoint()`](#set-rpc-endpoint) only. You can use this mode to [work with local development nodes](/en/sdk/squid-sdk/tutorials/evm-local).
* If your squid uses [direct RPC queries](/en/sdk/squid-sdk/resources/tools/typegen/state-queries), then [`setRpcEndpoint()`](#set-rpc-endpoint) is a hard requirement. You can reduce RPC usage by adding a Network data source with [`setGateway()`](#set-gateway). Further, if you can tolerate a latency of a few thousand blocks, you can disable RPC ingestion with [`setRpcDataIngestionSettings({ disabled: true })`](#set-rpc-data-ingestion-settings); in that scenario the RPC is only used for the queries you explicitly make in your code. A sketch of this setup is shown below.
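The following sketch shows that last configuration: the gateway supplies the bulk of the data, RPC ingestion is disabled, and the RPC endpoint is kept for direct queries only (the endpoint URL is illustrative):
```ts theme={"system"}
import {EvmBatchProcessor} from '@subsquid/evm-processor'

const processor = new EvmBatchProcessor()
  .setGateway('https://v2.archive.subsquid.io/network/ethereum-mainnet')
  .setRpcEndpoint('https://rpc.ankr.com/eth') // used only for direct RPC queries
  .setRpcDataIngestionSettings({ disabled: true })
```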
### `setGateway(url: string | GatewaySettings)`
Adds a [SQD Network](/en/network) data source. The argument is either a string URL of a SQD Network gateway or
```ts theme={"system"}
{
url: string // gateway URL
requestTimeout?: number // in milliseconds
}
```
See [EVM gateways](/en/data/evm).
### `setRpcEndpoint(rpc: ChainRpc)`
Adds an RPC data source. If added, it will be used for
* [RPC ingestion](/en/sdk/squid-sdk/resources/unfinalized-blocks) (unless explicitly disabled with [`setRpcDataIngestionSettings()`](#set-rpc-data-ingestion-settings))
* any [direct RPC queries](/en/sdk/squid-sdk/resources/tools/typegen/state-queries) you make in your squid code
A node RPC endpoint can be specified as a string URL or as an object:
```ts theme={"system"}
type ChainRpc = string | {
url: string // http, https, ws and wss are supported
capacity?: number // num of concurrent connections, default 10
maxBatchCallSize?: number // default 100
rateLimit?: number // requests per second, default is no limit
requestTimeout?: number // in milliseconds, default 30_000
  headers?: Record<string, string> // HTTP headers
}
```
Setting `maxBatchCallSize` to `1` disables batching completely.
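For example, an endpoint passed as an object with a rate limit, a custom timeout and an extra header (all values are illustrative):
```ts theme={"system"}
processor.setRpcEndpoint({
  url: 'https://rpc.ankr.com/eth',
  rateLimit: 10,          // at most 10 requests per second
  requestTimeout: 20_000, // 20 seconds
  headers: {'x-api-key': process.env.RPC_API_KEY ?? ''}
})
```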
We recommend using private endpoints for better performance and stability of your squids. For SQD Cloud deployments you can use the [RPC addon](/en/cloud/resources/rpc-proxy). If you use an external private RPC, keep the endpoint URL in a [Cloud secret](/en/cloud/resources/env-variables#secrets).
### `setDataSource(ds: {archive?: string, chain?: ChainRpc})` (deprecated)
Replaced by [`setGateway()`](#set-gateway) and [`setRpcEndpoint()`](#set-rpc-endpoint).
### `setRpcDataIngestionSettings(settings: RpcDataIngestionSettings)`
Specify the [RPC ingestion](/en/sdk/squid-sdk/resources/unfinalized-blocks) settings.
```ts theme={"system"}
type RpcDataIngestionSettings = {
disabled?: boolean
preferTraceApi?: boolean
useDebugApiForStateDiffs?: boolean
debugTraceTimeout?: string
headPollInterval?: number
newHeadTimeout?: number
}
```
Here,
* `disabled`: Explicitly disables data ingestion from an RPC endpoint.
* `preferTraceApi`: By default, [`debug_traceBlockByHash`](https://geth.ethereum.org/docs/interacting-with-geth/rpc/ns-debug#debugtraceblockbyhash) is used to obtain [call traces](/en/sdk/squid-sdk/reference/processors/evm-batch/traces). This flag instructs the processor to utilize [`trace_` methods](https://openethereum.github.io/JSONRPC-trace-module) instead. This setting is only effective for finalized blocks.
* `useDebugApiForStateDiffs`: By default, [`trace_replayBlockTransactions`](https://openethereum.github.io/JSONRPC-trace-module#trace_replayblocktransactions) is used to obtain [state diffs](/en/sdk/squid-sdk/reference/processors/evm-batch/state-diffs) for finalized blocks. This flag instructs the processor to utilize [`debug_traceBlockByHash`](https://geth.ethereum.org/docs/interacting-with-geth/rpc/ns-debug#debugtraceblockbyhash) instead. This setting is only effective for finalized blocks. **WARNING:** this will significantly increase the amount of data retrieved from the RPC endpoint. Expect download rates in the megabytes per second range.
* `debugTraceTimeout`: If set, the processor will pass the `timeout` parameter to [debug trace config](https://geth.ethereum.org/docs/interacting-with-geth/rpc/ns-debug#traceconfig).
* `headPollInterval`: Poll interval for new blocks, in milliseconds. Polling is used to get new blocks over HTTP connections. Default: 5000.
* `newHeadTimeout`: When ingesting from a websocket, this setting specifies the timeout in milliseconds after which the connection will be reset and subscription re-initiated if no new blocks were received. Default: no timeout.
### `setFinalityConfirmation(nBlocks: number)`
Sets the number of blocks after which the processor will consider the consensus data final. Use a value appropriate for your network. For example, for Ethereum mainnet a widely cited value is 15 minutes/75 blocks.
### `setBlockRange({from: number, to?: number})`
Limits the range of blocks to be processed. When the upper bound is specified, the processor will terminate with exit code 0 once it reaches it.
Note that block ranges can also be specified separately for each data request. This method sets global bounds for all block ranges in the configuration.
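For example, a global range combined with a narrower per-request range (block numbers are illustrative):
```ts theme={"system"}
processor
  .setBlockRange({ from: 10_000_000 }) // global lower bound
  .addLog({
    address: ['0x2e645469f354bb4f5c8a05b3b30a929361cf77ec'],
    range: { from: 12_000_000 } // narrower range for this particular request
  })
```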
### `includeAllBlocks(range?: {from: number, to?: number})`
By default, the processor fetches only blocks that contain requested items. This method changes that behavior so that all chain blocks are fetched. Optionally, a block range can be specified for which the setting should be effective.
### `setPrometheusPort(port: string | number)`
Sets the port for the built-in Prometheus health metrics server (serving at `http://localhost:${port}/metrics`). By default, the value of the `PROMETHEUS_PORT` environment variable is used. When it is not set, the processor picks an ephemeral port.
# Event logs
Source: https://docs.sqd.dev/en/sdk/squid-sdk/reference/processors/evm-batch/logs
Subscribe to EVM event log data with addLog() in Squid SDK — filter by contract address, topic, and signature for efficient EVM event indexing.
#### `addLog(options)`
Get event logs emitted by some *or all* contracts in the network. `options` has the following structure:
```typescript theme={"system"}
{
// data requests
address?: string[]
topic0?: string[]
topic1?: string[]
topic2?: string[]
topic3?: string[]
range?: {from: number, to?: number}
// related data retrieval
transaction?: boolean
transactionLogs?: boolean
transactionTraces?: boolean
}
```
Data requests:
* `address`: the set of addresses of contracts emitting the logs. Omit to subscribe to events from all contracts in the network.
* `topicN`: the set of values of topicN.
* `range`: the range of blocks to consider.
Related data retrieval:
* `transaction = true`: the processor will retrieve all parent transactions and add them to the `transactions` iterable within the [block data](/en/sdk/squid-sdk/reference/processors/evm-batch/context-interfaces). Additionally it will expose them via the `.transaction` field of each log item.
* `transactionLogs = true`: the processor will retrieve all "sibling" logs, that is, all logs emitted by transactions that emitted at least one matching log. The logs will be exposed through the regular `logs` block data iterable and via `.transaction.logs` for matching logs.
* `transactionTraces = true`: the processor will retrieve the traces for all transactions that emitted at least one matching log. The traces will be exposed through the regular `traces` block data iterable and via `.transaction.traces`.
Note that logs can also be requested by the [`addTransaction()`](/en/sdk/squid-sdk/reference/processors/evm-batch/transactions) and [`addTrace()`](/en/sdk/squid-sdk/reference/processors/evm-batch/traces) methods as related data.
Selection of the exact data to be retrieved for each log and its optional parent transaction is done with the `setFields()` method documented on the [Field selection](/en/sdk/squid-sdk/reference/processors/evm-batch/field-selection) page. Some examples are available below.
## Examples
1. Fetch `NewGravatar(uint256,address,string,string)` and `UpdatedGravatar(uint256,address,string,string)` event logs emitted by `0x2E645469f354BB4F5c8a05B3b30A929361cf77eC`. For each log, fetch the topics and data. Also fetch the parent transactions along with their inputs.
```ts theme={"system"}
const processor = new EvmBatchProcessor()
.setGateway('https://v2.archive.subsquid.io/network/ethereum-mainnet')
.setRpcEndpoint('<my_eth_rpc_url>') // replace with your RPC endpoint URL
.setFinalityConfirmation(75)
.addLog({
address: ['0x2e645469f354bb4f5c8a05b3b30a929361cf77ec'],
topic0: [
// topic: 'NewGravatar(uint256,address,string,string)'
'0x9ab3aefb2ba6dc12910ac1bce4692cf5c3c0d06cff16327c64a3ef78228b130b',
// topic: 'UpdatedGravatar(uint256,address,string,string)'
'0x76571b7a897a1509c641587568218a290018fbdc8b9a724f17b77ff0eec22c0c',
],
transaction: true
})
.setFields({
log: {
topics: true,
data: true
},
transaction: {
input: true
}
})
```
Typescript ABI modules generated by [`squid-evm-typegen`](/en/sdk/squid-sdk/resources/tools/typegen/state-queries) provide event signatures/topic0 values as constants, e.g.
```ts theme={"system"}
import * as gravatarAbi from './abi/gravatar'
// ...
topic0: [
gravatarAbi.events.NewGravatar.topic,
gravatarAbi.events.UpdatedGravatar.topic,
],
// ...
```
2. Fetch every `Transfer(address,address,uint256)` event on Ethereum mainnet where *topic2* is set to the destination address (a common but [non-standard](https://eips.ethereum.org/EIPS/eip-20) practice) and the destination is `vitalik.eth` a.k.a. `0xd8dA6BF26964aF9D7eEd9e03E53415D37aA96045`. For each log, fetch transaction hash.
```ts theme={"system"}
const processor = new EvmBatchProcessor()
.setGateway('https://v2.archive.subsquid.io/network/ethereum-mainnet')
.setRpcEndpoint('<my_eth_rpc_url>') // replace with your RPC endpoint URL
.setFinalityConfirmation(75)
.addLog({
topic0: [
// topic0: 'Transfer(address,address,uint256)'
'0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef'
],
topic2: [
// vitalik.eth
'0x000000000000000000000000d8da6bf26964af9d7eed9e03e53415d37aa96045'
]
})
.setFields({
log: {
transactionHash: true
}
})
```
Note that the `topic2` value is longer than the usual address format (`0xd8dA6BF26964aF9D7eEd9e03E53415D37aA96045` is 42 characters). Topic filters expect `Bytes32` values, so the string must be 66 characters long.
A quick fix is to strip the `0x` prefix, left-pad the address with zeros to 64 hex characters and prepend `0x` again:
```ts theme={"system"}
const address = '0xd8dA6BF26964aF9D7eEd9e03E53415D37aA96045'
const topic = '0x' + address.slice(2).toLowerCase().padStart(64, '0')
```
# Storage state diffs
Source: https://docs.sqd.dev/en/sdk/squid-sdk/reference/processors/evm-batch/state-diffs
Track EVM storage state changes with addStateDiff() in Squid SDK — subscribe to storage slot writes for any contract address with selective filtering.
State diffs for historical blocks are [currently available](/en/data/evm) from [SQD Network](/en/network) on the same basis as all other data stored there: for free. If you deploy a squid that indexes state diffs [in real-time](/en/sdk/squid-sdk/resources/unfinalized-blocks) to SQD Cloud and use our [RPC addon](/en/cloud/resources/rpc-proxy), the necessary `trace_` or `debug_` RPC calls will be counted alongside all other calls and [the price](/en/cloud/pricing#rpc-requests) will be computed from the total count. There are no surcharges for traces or state diffs.
#### `addStateDiff(options)`
Subscribe to changes in the [contract storage](https://coinsbench.com/solidity-layout-and-access-of-storage-variables-simply-explained-1ce964d7c738). This allows for tracking the contract state changes that are difficult to infer from events or transactions, such as the changes that take into account the output of internal calls. `options` has the following structure:
```typescript theme={"system"}
{
// data requests
address?: string[]
key?: string[]
kind?: ('=' | '+' | '*' | '-')[]
range?: {from: number, to?: number}
// related data retrieval
transaction?: boolean
}
```
The data requests here are:
* `address`: the set of addresses of contracts to track. Leave undefined to subscribe to state changes of all contracts from the whole network.
* `key`: the set of storage keys that should be tracked. Regular hexadecimal contract storage keys and [special keys](/en/sdk/squid-sdk/reference/processors/evm-batch/field-selection#state-diffs) (`'balance'`, `'code'`, `'nonce'`) are allowed. Leave undefined to subscribe to all state changes.
* `kind`: the set of diff kinds that should be tracked. Refer to the [`StateDiff` section](/en/sdk/squid-sdk/reference/processors/evm-batch/field-selection#state-diffs) of data items documentation for an explanation of the meaning of the permitted values.
* `range`: the range of blocks within which the storage changes should be tracked.
Enabling the `transaction` flag will cause the processor to retrieve the transaction that gave rise to each state change and add it to the [`transactions` iterable of block data](/en/sdk/squid-sdk/reference/processors/evm-batch/context-interfaces).
Note that state diffs can also be requested by the [`addTransaction()`](/en/sdk/squid-sdk/reference/processors/evm-batch/transactions) method as related data.
Selection of the exact data to be retrieved for each state diff item and its optional parent transaction is done with the `setFields()` method documented on the [Field selection](/en/sdk/squid-sdk/reference/processors/evm-batch/field-selection) page. Unlike other data items, state diffs do not have any fields that can be enabled, but some can be disabled for improved sync performance.
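A minimal sketch tracking changes of a single storage slot of one contract (the contract address and the slot are illustrative):
```ts theme={"system"}
import {EvmBatchProcessor} from '@subsquid/evm-processor'

const CONTRACT = '0xdac17f958d2ee523a2206206994597c13d831ec7' // illustrative address
const SLOT = '0x0000000000000000000000000000000000000000000000000000000000000001'

const processor = new EvmBatchProcessor()
  .setGateway('https://v2.archive.subsquid.io/network/ethereum-mainnet')
  .addStateDiff({
    address: [CONTRACT],
    key: [SLOT],
    kind: ['*'],       // only value changes
    transaction: true  // also fetch the parent transactions
  })
```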
# Traces
Source: https://docs.sqd.dev/en/sdk/squid-sdk/reference/processors/evm-batch/traces
Retrieve EVM internal call traces with addTrace() in Squid SDK — subscribe to nested contract calls, reverts, and create operations during indexing.
Traces for historical blocks are [currently available](/en/data/evm) from [SQD Network](/en/network) on the same basis as all other data stored there: for free. If you deploy a squid that indexes traces [in real-time](/en/sdk/squid-sdk/resources/unfinalized-blocks) to SQD Cloud and use our [RPC addon](/en/cloud/resources/rpc-proxy), the necessary `trace_` or `debug_` RPC calls will be counted alongside all other calls and [the price](/en/cloud/pricing#rpc-requests) will be computed from the total count. There are no surcharges for traces or state diffs.
#### `addTrace(options)`
Subscribe to [execution traces](https://geth.ethereum.org/docs/interacting-with-geth/rpc/ns-debug#debugtraceblockbyhash). This allows for tracking internal calls. The `options` object has the following structure:
```typescript theme={"system"}
{
// data requests
callTo?: string[]
callFrom?: string[]
callSighash?: string[]
createFrom?: string[]
rewardAuthor?: string[]
suicideRefundAddress?: string[]
type?: string[]
range?: {from: number, to?: number}
// related data retrieval
transaction?: boolean
transactionLogs?: boolean
subtraces?: boolean
parents?: boolean
}
```
The data requests here are:
* `type`: get traces of types from this set. Allowed types are `'create' | 'call' | 'suicide' | 'reward'`.
* `callTo`: get `call` traces *to* the addresses in this set.
* `callFrom`: get `call` traces *from* the addresses in this set.
* `callSighash`: get `call` traces with signature hashes in this set.
* `createFrom`: get `create` traces *from* the addresses in this set.
* `rewardAuthor`: get `reward` traces where block authors are in this set.
* `suicideRefundAddress`: get `suicide` traces where refund addresses are in this set.
* `range`: get traces from transactions from this range of blocks.
Related data retrieval:
* `transaction = true` will cause the processor to retrieve transactions that the traces belong to.
* `transactionLogs = true` will cause the processor to retrieve all logs emitted by transactions that the traces belong to.
* `subtraces = true` will cause the processor to retrieve downstream traces in addition to those that matched the data requests.
* `parents = true` will cause the processor to retrieve upstream traces in addition to those that matched the data requests.
These extra data items will be added to the appropriate iterables within the [block data](/en/sdk/squid-sdk/reference/processors/evm-batch/context-interfaces).
Note that traces can also be requested by the [`addTransaction()`](/en/sdk/squid-sdk/reference/processors/evm-batch/transactions) and [`addLog()`](/en/sdk/squid-sdk/reference/processors/evm-batch/logs) methods as related data.
Selection of the exact data to be retrieved for each trace item is done with the `setFields()` method documented on the [Field selection](/en/sdk/squid-sdk/reference/processors/evm-batch/field-selection) page. Be aware that field selectors for traces do not share their names with the fields of trace data items, unlike field selectors for other data item types. This is due to traces varying their structure depending on the value of the `type` field.
## Examples
### Exploring internal calls of a given transaction
For a [`mint` call to Uniswap V3 Positions NFT](https://etherscan.io/tx/0xf178718219151463aa773deaf7d9367b8408e35a624550af975e089ca6e015ca).
```ts theme={"system"}
import {EvmBatchProcessor} from '@subsquid/evm-processor'
import {TypeormDatabase} from '@subsquid/typeorm-store'
const TARGET_TRANSACTION = '0xf178718219151463aa773deaf7d9367b8408e35a624550af975e089ca6e015ca'
const TO_CONTRACT = '0xc36442b4a4522e871399cd717abdd847ab11fe88' // Uniswap v3 Positions NFT
const METHOD_SIGHASH = '0x88316456' // mint
const processor = new EvmBatchProcessor()
.setGateway('https://v2.archive.subsquid.io/network/ethereum-mainnet')
.setRpcEndpoint('<my_eth_rpc_url>') // replace with your RPC endpoint URL
.setFinalityConfirmation(75)
.setBlockRange({ from: 16962349, to: 16962349 })
.addTransaction({
to: [TO_CONTRACT],
sighash: [METHOD_SIGHASH],
traces: true
})
.setFields({ trace: { callTo: true } })
processor.run(new TypeormDatabase(), async ctx => {
let involvedContracts = new Set<string>()
let traceCount = 0
for (let block of ctx.blocks) {
for (let trc of block.traces) {
if (trc.type === 'call' && trc.transaction?.hash === TARGET_TRANSACTION) {
involvedContracts.add(trc.action.to)
traceCount += 1
}
}
}
console.log(`txn ${TARGET_TRANSACTION} had ${traceCount-1} internal transactions`)
console.log(`${involvedContracts.size} contracts were involved in txn ${TARGET_TRANSACTION}:`)
involvedContracts.forEach(c => { console.log(c) })
})
```
### Grabbing addresses of all contracts ever created on Ethereum
Full code is available in [this branch](https://github.com/subsquid-labs/grab-all-contracts/tree/ascetic). WARNING: will contain addresses of some contracts that failed to deploy.
```ts theme={"system"}
import {EvmBatchProcessor} from '@subsquid/evm-processor'
import {TypeormDatabase} from '@subsquid/typeorm-store'
import {CreatedContract} from './model'
const processor = new EvmBatchProcessor()
.setGateway('https://v2.archive.subsquid.io/network/ethereum-mainnet')
.setFields({
trace: {
createResultAddress: true,
},
})
.addTrace({
type: ['create'],
transaction: true,
})
processor.run(new TypeormDatabase({supportHotBlocks: false}), async (ctx) => {
const contracts: Map<string, CreatedContract> = new Map()
const addresses: Set<string> = new Set()
for (let c of ctx.blocks) {
for (let trc of c.traces) {
if (trc.type === 'create' &&
trc.result?.address != null &&
trc.transaction?.hash !== undefined) {
contracts.set(trc.result.address, new CreatedContract({id: trc.result.address}))
}
}
}
await ctx.store.upsert([...contracts.values()])
})
```
Currently there is no convenient way to check whether a trace had an effect on the chain state, but this feature will be added in future releases.
# EVM Transactions
Source: https://docs.sqd.dev/en/sdk/squid-sdk/reference/processors/evm-batch/transactions
Subscribe to EVM transaction data with addTransaction() in Squid SDK — filter by from, to, signature, and contract address for efficient indexing.
#### `addTransaction(options)`
Get some *or all* transactions on the network. `options` has the following structure:
```typescript theme={"system"}
{
// data requests
from?: string[]
to?: string[]
sighash?: string[]
range?: {from: number, to?: number}
// related data retrieval
logs?: boolean
stateDiffs?: boolean
traces?: boolean
}
```
Data requests:
* `from` and `to`: the sets of addresses of tx senders and receivers. Omit to subscribe to transactions from/to any address.
* `sighash`: [first four bytes](https://ethereum.org/en/developers/docs/transactions/#the-data-field) of the Keccak hash (SHA3) of the canonical representation of the function signature. Omit to subscribe to any transaction.
* `range`: the range of blocks to consider.
Enabling the `stateDiffs`, `traces` and/or `logs` flags will cause the processor to retrieve [state diffs](/en/sdk/squid-sdk/reference/processors/evm-batch/state-diffs), [traces](/en/sdk/squid-sdk/reference/processors/evm-batch/traces) and/or event logs that occurred as a result of each selected transaction. The data will be added to the appropriate iterables within the [block data](/en/sdk/squid-sdk/reference/processors/evm-batch/context-interfaces).
Note that transactions can also be requested by [`addLog()`](/en/sdk/squid-sdk/reference/processors/evm-batch/logs), [`addStateDiff()`](/en/sdk/squid-sdk/reference/processors/evm-batch/state-diffs) and [`addTrace()`](/en/sdk/squid-sdk/reference/processors/evm-batch/traces) as related data.
Selection of the exact data to be retrieved for each transaction and the optional related data items is done with the `setFields()` method documented on the [Field selection](/en/sdk/squid-sdk/reference/processors/evm-batch/field-selection) page. Some examples are available below.
Typescript ABI modules generated by [`squid-evm-typegen`](/en/sdk/squid-sdk/resources/tools/typegen/state-queries) provide function sighashes as constants, e.g.
```ts theme={"system"}
import * as erc20abi from './abi/erc20'
// ...
sighash: [erc20abi.functions.transfer.sighash],
// ...
```
## Examples
1. Request all EVM calls to the contract `0x6a2d262D56735DbA19Dd70682B39F6bE9a931D98`:
```ts theme={"system"}
processor.addTransaction({to: ['0x6a2d262d56735dba19dd70682b39f6be9a931d98']})
```
2. Request all transactions matching sighash of `transfer(address,uint256)`:
```ts theme={"system"}
processor.addTransaction({sighash: ['0xa9059cbb']})
```
3. Request all `transfer(address,uint256)` calls to the specified addresses, from block `6_000_000` onwards and fetch their inputs. Also retrieve all logs emitted by these calls.
```ts theme={"system"}
processor
.addTransaction({
to: [
'0x6a2d262d56735dba19dd70682b39f6be9a931d98',
'0x3795c36e7d12a8c252a20c5a7b455f7c57b60283'
],
sighash: [
'0xa9059cbb'
],
range: {
from: 6_000_000
},
logs: true
})
.setFields({
transaction: {
input: true
}
})
```
4. Mine all transactions to and from Vitalik Buterin's address [`vitalik.eth`](https://etherscan.io/address/vitalik.eth). Fetch the involved addresses, ETH value and hash for each transaction. Get execution traces with the [default fields](/en/sdk/squid-sdk/reference/processors/evm-batch/field-selection#transactions) for outgoing transactions.
```ts theme={"system"}
const VITALIK_ETH = '0xd8dA6BF26964aF9D7eEd9e03E53415D37aA96045'.toLowerCase()
const processor = new EvmBatchProcessor()
.setGateway('https://v2.archive.subsquid.io/network/ethereum-mainnet')
.setRpcEndpoint('<my_eth_rpc_url>') // replace with your RPC endpoint URL
.setFinalityConfirmation(75)
.addTransaction({
to: [VITALIK_ETH]
})
.addTransaction({
from: [VITALIK_ETH],
traces: true
})
.setFields({
transaction: {
from: true,
to: true,
value: true,
hash: true
}
})
processor.run(new TypeormDatabase(), async (ctx) => {
for (let c of ctx.blocks) {
for (let txn of c.transactions) {
if (txn.to === VITALIK_ETH || txn.from === VITALIK_ETH) {
// just output the tx data to console
ctx.log.info(txn, 'Tx:')
}
}
}
})
```
# Entities
Source: https://docs.sqd.dev/en/sdk/squid-sdk/reference/schema-file/entities
Define high-level Squid SDK entities in schema.graphql — entity types, scalar fields, derived fields, and indexes that drive code generation.
Entities are defined by root-level GraphQL types decorated with `@entity`. Entity names and properties are expected to be camelCased; they are converted to `snake_case` for the corresponding database table and column names. The primary key column always maps to an entity field of the special `ID` type, stored as a string (`varchar`). Fields marked with an exclamation mark (`!`) are non-nullable; all other fields are nullable.
The following [scalar types](https://graphql.org/learn/schema/#scalar-types) are supported by the `schema.graphql` dialect:
* `String` (mapped to `text`)
* `Int` (mapped to `int4`)
* `Float` (mapped to `numeric`, ts type `number`)
* `Boolean` (mapped to `bool`)
* `DateTime` (mapped to `timestamptz`, ts type `Date`)
* `BigInt` (mapped to `numeric`, ts type `bigint`)
* `BigDecimal` (mapped to `numeric`, ts type `BigDecimal` of [`@subsquid/big-decimal`](https://www.npmjs.com/package/@subsquid/big-decimal))
* `Bytes` (mapped to `bytea`, ts type `Uint8Array`)
* `JSON` (mapped to `jsonb`, ts type `unknown`)
* Enums (mapped to `text`)
* User-defined scalars (non-entity types). Such properties are mapped as `jsonb` columns.
**Example**
```graphql theme={"system"}
type Scalar @entity {
id: ID!
boolean: Boolean
string: String
enum: Enum
bigint: BigInt
dateTime: DateTime
bytes: Bytes
json: JSON
deep: DeepScalar
}
type DeepScalar {
bigint: BigInt
dateTime: DateTime
bytes: Bytes
boolean: Boolean
}
enum Enum {
A B C
}
```
## Arrays
An entity field can be an array of any scalar type except `BigInt` and `BigDecimal`. It will be mapped to the corresponding Postgres array type. Array elements may be defined as nullable or non-nullable.
**Example**
```graphql theme={"system"}
type Lists @entity {
id: ID!
intArray: [Int!]!
enumArray: [Enum!]
datetimeArray: [DateTime!]
bytesArray: [Bytes!]
listOfListsOfInt: [[Int]]
listOfJsonObjects: [Foo!]
}
enum Enum {
A B C D E F
}
type Foo {
foo: Int
bar: Int
}
```
# Entity relations
Source: https://docs.sqd.dev/en/sdk/squid-sdk/reference/schema-file/entity-relations
Define one-to-many and many-to-many entity relations in Squid SDK schema files — @derivedFrom, @relation, and TypeORM-friendly conventions.
# Entity relations and inverse lookups
The term "entity relation" refers to the situation when an entity instance contains an instance of another entity within one of its fields. Type-wise this means that some entity (called the *owning entity*) has a field of a type that is some other, *non-owning* entity. Within the database, this is implemented as an (automatically indexed) foreign key column within the table mapped to the owning entity. A `fieldName` entity-typed field will map to a column named `field_name_id`.
[One-to-one](https://github.com/typeorm/typeorm/blob/master/docs/one-to-one-relations.md) and [one-to-many](https://github.com/typeorm/typeorm/blob/master/docs/many-to-one-one-to-many-relations.md) relations are supported by TypeORM. The "many" side of a one-to-many relation is always the owning side. Many-to-many relations are modeled as [two one-to-many relations with an explicit join table](#many-to-many-relations).
An entity relation is always unidirectional, but it is possible to request the data of the owning entity from the non-owning one. To do so, define a field decorated with `@derivedFrom` in the schema. This will cause the TypeORM code generated by [`squid-typeorm-codegen`](/en/sdk/squid-sdk/resources/persisting-data/typeorm) and the GraphQL API served by [`squid-graphql-server`](/en/sdk/squid-sdk/reference/openreader-server/overview) to expose a virtual field (that is, **not mapped to a database column**) populated via inverse lookup queries.
The following examples illustrate the concepts.
## One-to-one relations
```graphql theme={"system"}
type Account @entity {
id: ID!
balance: BigInt!
user: User @derivedFrom(field: "account")
}
type User @entity {
id: ID!
account: Account! @unique
username: String!
creation: DateTime!
}
```
The `User` entity references `Account` and owns the one-to-one relation. This is implemented as follows:
* On the database side: the `account` property of the `User` entity maps to the `account_id` foreign key column of the `user` table referencing the `account` table.
* On the TypeORM side: the `account` property of the `User` entity gets decorated with `@OneToOne` and `@JoinColumn`.
* On the GraphQL side: sub-selection of the `account` property is made available in `user`-related queries. Sub-selection of the `user` property is made available in `account`-related queries.
Unlike for the many-to-one case, the codegen will not add a virtual reverse lookup property to the TypeORM code for one-to-one relations. You can add it manually:
```typescript title="src/model/generated/account.model.ts" theme={"system"}
import {OneToOne as OneToOne_} from "typeorm"
@Entity_()
export class Account {
// ...
@OneToOne_(() => User, e => e.account)
user: User
}
```
If you are using this feature, please let us know at [the SquidDevs Telegram channel](https://t.me/HydraDevs).
## Many-to-one/One-to-many relations
```graphql theme={"system"}
type Account @entity {
"Account address"
id: ID!
transfersTo: [Transfer!] @derivedFrom(field: "to")
transfersFrom: [Transfer!] @derivedFrom(field: "from")
}
type Transfer @entity {
id: ID!
to: Account!
from: Account!
amount: BigInt!
}
```
Here `Transfer` owns the two relations and `Account` defines the corresponding inverse lookup properties. This is implemented as follows:
* On the database side: the `from` and `to` properties of the `Transfer` entity map to `from_id` and `to_id` foreign key columns of the `transfer` table referencing the `account` table.
* On the TypeORM side: properties `to` and `from` of the `Transfer` entity class get decorated with `@ManyToOne`. Properties `transfersTo` and `transfersFrom` decorated with `@OneToMany` get added to the `Account` entity class.
* On the GraphQL side: sub-selection of all relation-defined properties is made available in the schema.
## Many-to-many relations
Many-to-many entity relations should be modeled as two one-to-many relations with an explicitly defined join table.
Here is an example:
```graphql theme={"system"}
# an explicit join table
type TradeToken @entity {
id: ID! # This is required, even if useless
trade: Trade!
token: Token!
}
type Token @entity {
id: ID!
symbol: String!
trades: [TradeToken!]! @derivedFrom(field: "token")
}
type Trade @entity {
id: ID!
tokens: [TradeToken!]! @derivedFrom(field: "trade")
}
```
# Indexes and constraints
Source: https://docs.sqd.dev/en/sdk/squid-sdk/reference/schema-file/indexes-and-constraints
Annotate Squid SDK schema fields with @index and @unique to speed up GraphQL queries — match indexes to your most common entity filter patterns.
# Indexes and unique constraints
The lack of indexes is the most common cause of slow API queries.
It is crucial to add database indexes to the entity fields on which you expect filtering and ordering. To add an index to a column, decorate the corresponding entity field with `@index`; in the generated model it will be decorated with [TypeORM `@Index()`](https://typeorm.io/indices#column-indices).
One can additionally decorate the field with `@unique` to enforce uniqueness. It corresponds to the [`@Index({ unique: true })`](https://typeorm.io/indices#unique-indices) TypeORM decorator.
### Example
```graphql theme={"system"}
type Transfer @entity {
id: ID!
to: Account!
amount: BigInt! @index
fee: BigInt! @index @unique
}
```
## Multi-column indices
Multi-column indices are defined on the entity level, with an optional `unique` constraint.
### Example
```graphql theme={"system"}
type Foo @entity @index(fields: ["foo", "bar"]) @index(fields: ["bar", "baz"]) {
  id: ID!
  bar: Int!
  baz: [Enum!]
  foo: String!
}

type Extrinsic @entity @index(fields: ["hash", "block"], unique: true) {
  id: ID!
  hash: String! @unique
  block: String!
}
```
# Interfaces
Source: https://docs.sqd.dev/en/sdk/squid-sdk/reference/schema-file/interfaces
Define queryable GraphQL interfaces in Squid SDK schema files — share fields across entity types and run polymorphic queries via OpenReader.
The schema file supports [GraphQL Interfaces](https://graphql.org/learn/schema/#interfaces) for modelling complex types sharing common traits. Interfaces are annotated with `@query` at the type level and do not affect the database schema, only enriching the GraphQL API queries with [inline fragments](https://graphql.org/learn/queries/#inline-fragments).
Currently, only [OpenReader](/en/sdk/squid-sdk/reference/openreader-server) supports GraphQL interfaces defined in the schema file.
### Examples
```graphql theme={"system"}
interface MyEntity @query {
id: ID!
name: String
ref: Ref
}
type Ref @entity {
id: ID!
name: String
foo: Foo! @unique
bar: Bar! @unique
}
type Foo implements MyEntity @entity {
id: ID!
name: String
ref: Ref @derivedFrom(field: "foo")
foo: Int
}
type Bar implements MyEntity @entity {
id: ID!
name: String
ref: Ref @derivedFrom(field: "bar")
bar: Int
}
type Baz implements MyEntity @entity {
id: ID!
name: String
ref: Ref
baz: Int
}
```
The `MyEntity` interface above enables `myEntities` and `myEntitiesConnection` [GraphQL API queries](/en/sdk/squid-sdk/reference/openreader-server/api) with inline fragments and the `_type`, `__typename` [meta fields](https://graphql.org/learn/queries/#meta-fields):
```graphql theme={"system"}
query {
myEntities(orderBy: [_type_DESC, id_ASC]) {
id
name
ref {
id
name
}
__typename
... on Foo { foo }
... on Bar { bar }
... on Baz { baz }
}
}
```
# Schema file and codegen
Source: https://docs.sqd.dev/en/sdk/squid-sdk/reference/schema-file/intro
Introduction to the Squid SDK schema.graphql file and codegen tool — define entities, run sqd codegen, and generate TypeORM models for storage.
The schema file `schema.graphql` uses a GraphQL dialect to model the target entities and entity relations. The tooling around the schema file is then used to:
* Generate TypeORM entities (with `squid-typeorm-codegen(1)`, see below)
* Generate the database schema from the TypeORM entities (see [db migrations](/en/sdk/squid-sdk/resources/persisting-data/typeorm))
* Optionally, present the target data with a [GraphQL API](/en/sdk/squid-sdk/resources/serving-graphql).
The schema file format is loosely compatible with the [subgraph schema](https://thegraph.com/docs/en/developing/creating-a-subgraph/) file, see [Migrate from subgraph](/en/sdk/squid-sdk/resources/migrate/migrate-subgraph) section for details.
## TypeORM codegen
The [`squid-typeorm-codegen(1)`](https://github.com/subsquid/squid-sdk/tree/master/typeorm/typeorm-codegen) tool is used to generate [TypeORM entity](https://typeorm.io/) classes from the schema defined in `schema.graphql`. Invoke it with
```bash theme={"system"}
npx squid-typeorm-codegen
```
By default the entity classes are generated in `src/model/generated`.
### Example
A `Foo` entity defined in the schema file:
```graphql title="schema.graphql" theme={"system"}
type Foo @entity {
id: ID!
bar: String
baz: BigInt!
}
```
The generated `Foo` entity with TypeORM decorators:
```ts title="src/model/generated/foo.ts" theme={"system"}
import {Entity as Entity_, Column as Column_, PrimaryColumn as PrimaryColumn_} from "typeorm"
import * as marshal from "./marshal"
@Entity_()
export class Foo {
  constructor(props?: Partial<Foo>) {
Object.assign(this, props)
}
@PrimaryColumn_()
id!: string
@Column_("text", {nullable: true})
bar!: string | undefined | null
@Column_("numeric", {transformer: marshal.bigintTransformer, nullable: false})
baz!: bigint
}
```
# Unions and typed JSON
Source: https://docs.sqd.dev/en/sdk/squid-sdk/reference/schema-file/unions-and-typed-json
Model Squid SDK union types and typed JSON fields in schema.graphql — discriminate variants and store structured JSON in Postgres-backed entities.
Complex scalar types can be modelled using typed JSON fields together with union types, yielding type-safe unions.
## Typed JSON
It is possible to define explicit types for JSON fields. The generated entity classes and the GraphQL API will respect the field's type definition, enforcing data integrity.
**Example**
```graphql theme={"system"}
type Entity @entity {
a: A
}
type A {
a: String
b: B
c: JSON
}
type B {
a: A
b: String
e: Entity
}
```
## Union types
One can leverage union types supported both by [Typescript](https://www.typescriptlang.org/docs/handbook/2/everyday-types.html#union-types) and [GraphQL](https://graphql.org/learn/schema/#union-types). The union operator for `schema.graphql` supports only non-entity types, including typed JSON types described above. JSON types, however, are allowed to reference an entity type.
**Example**
```graphql theme={"system"}
type User @entity {
id: ID!
login: String!
}
type Farmer {
user: User!
crop: Int
}
type Degen {
user: User!
bag: String
}
union Owner = Farmer | Degen
type NFT @entity {
name: String!
owner: Owner!
}
```
# bigquery-store
Source: https://docs.sqd.dev/en/sdk/squid-sdk/reference/store/bigquery
Reference for the @subsquid/bigquery-store package — write Squid SDK indexed data into Google BigQuery datasets with batch loading and schema sync.
# `@subsquid/bigquery-store`
See also the [BigQuery guide](/en/sdk/squid-sdk/resources/persisting-data/bigquery).
## Column types
| Column type | Value type | Dataset column type |
| :----------------------------: | :-------------------------------: | :-------------------------------------------------------------------------------------------------------------------------: |
| `String()` | `string` | [STRING](https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#string_type) |
| `Numeric(precision, scale)` | number \| bigint | [NUMERIC(P\[, S\])](https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#parameterized_decimal_type) |
| `BigNumeric(precision, scale)` | number \| bigint | [BIGNUMERIC(P\[, S\])](https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#parameterized_decimal_type) |
| `Bool()` | `boolean` | [BOOL](https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#boolean_type) |
| `Timestamp()` | `Date` | [TIMESTAMP](https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#timestamp_type) |
| `Float64()` | `number` | [FLOAT64](https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#floating_point_types) |
| `Int64()` | number \| bigint | [INT64](https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#integer_types) |
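A minimal definition sketch using the column types above. The dataset name and table layout are illustrative, and the `Database` constructor shape follows the [BigQuery guide](/en/sdk/squid-sdk/resources/persisting-data/bigquery); consult that guide for the exact API:
```ts theme={"system"}
import {BigQuery} from '@google-cloud/bigquery'
import {Column, Database, Table, Types} from '@subsquid/bigquery-store'

const db = new Database({
  bq: new BigQuery(),
  dataset: 'my-project.my_dataset', // illustrative dataset reference
  tables: {
    TransfersTable: new Table('transfers', {
      from: Column(Types.String()),
      to: Column(Types.String()),
      value: Column(Types.BigNumeric(38))
    })
  }
})
```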
# CSV support
Source: https://docs.sqd.dev/en/sdk/squid-sdk/reference/store/file/csv
Write Squid SDK indexed data as CSV files via the Table class — schema, encoding, and per-batch flushing for analytics-friendly file output.
# CSV format support
## `Table` Implementation
The `@subsquid/file-store-csv` package provides a `Table` implementation for writing to CSV files. Use it by [supplying one or more of its instances via the `tables` field of the `Database` constructor argument](/en/sdk/squid-sdk/resources/persisting-data/file#database-options). Constructor of the `Table` implementation accepts the following arguments:
* **`fileName: string`**: the name of the output file in every dataset partition folder.
* **`schema: {[column: string]: ColumnData}`**: a mapping from CSV column names to [`ColumnData` objects](#columns). A mapping of the same keys to data values is the row type used by the [table writer](/en/sdk/squid-sdk/resources/persisting-data/file#table-writer-interface).
* **`options?: TableOptions`**: see [`Table` Options](#table-options).
## Columns
`ColumnData` objects determine how the in-memory data representation of each table column should be serialized. They are made with the `Column` factory function that accepts a column data type and an optional `{nullable?: boolean}` `options` object as arguments.
Column types can be obtained by making the function calls listed below from the `Types` submodule. They determine the type that the [table writer](/en/sdk/squid-sdk/resources/persisting-data/file#table-writer-interface) will expect to find at the corresponding field of data row objects.
| Column type | Type of the data row field |
| :-------------------------------: | :------------------------: |
| `Types.String()` | `string` |
| `Types.Numeric()` | `number` or `bigint` |
| `Types.Boolean()` | `boolean` |
| `Types.DateTime(format?: string)` | `Date` |
| `Types.JSON<T>()` | `T` |
`Types.DateTime` accepts an optional [strftime](https://pubs.opengroup.org/onlinepubs/009695399/functions/strftime.html)-compatible format string. If it is omitted, the dates will be serialized to [ISO strings](https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Date/toISOString).
The type `T` supplied to the generic `Types.JSON<T>()` function must be an object with string keys (i.e. it must extend `{[k: string]: any}`).
## `Table` Options
As its optional final argument, the constructor of `Table` accepts an object that defines table options:
```typescript theme={"system"}
TableOptions {
dialect?: Dialect
header?: boolean
}
```
Here,
* **`dialect`** determines the details of the CSV formatting (see the details below, default: `dialects.excel`)
* **`header`** determines whether a CSV header should be added (default: `true`)
`Dialect` type is defined as follows:
```typescript theme={"system"}
Dialect {
delimiter: string
escapeChar?: string
quoteChar: string
quoting: Quote
lineterminator: string
}
```
where
```typescript theme={"system"}
enum Quote {
ALL, // Put all values in quotes.
MINIMAL, // Only quote strings with special characters.
// A special character is one of the following:
// delimiter, lineterminator, quoteChar.
NONNUMERIC, // Quote strings, booleans, DateTimes and JSONs.
NONE // Do not quote values.
}
```
is the enum determining how the formatted values should be quoted. The quote character is escaped for all values of `quoting`; `Quote.NONE` additionally escapes the rest of the special characters and the escape character.
Two dialect presets are available via the `dialects` object exported by `@subsquid/file-store-csv`:
```typescript theme={"system"}
export let dialects = {
excel: {
delimiter: ',',
quoteChar: '"',
quoting: Quote.MINIMAL,
lineterminator: '\r\n'
},
excelTab: {
delimiter: '\t',
quoteChar: '"',
quoting: Quote.MINIMAL,
lineterminator: '\r\n'
}
}
```
## Example
This saves ERC20 `Transfer` events captured by the processor to TSV (tab-separated values) files. Full squid code is available in [this repo](https://github.com/subsquid-labs/file-store-csv-example).
```typescript theme={"system"}
import {Database, LocalDest} from '@subsquid/file-store'
import {
Column,
Table,
Types,
dialects
} from '@subsquid/file-store-csv'
...
const dbOptions = {
tables: {
TransfersTable: new Table(
'transfers.tsv',
{
from: Column(Types.String()),
to: Column(Types.String()),
value: Column(Types.Numeric())
},
{
dialect: dialects.excelTab,
header: true
}
)
},
dest: new LocalDest('./data'),
chunkSizeMb: 10
}
processor.run(new Database(dbOptions), async (ctx) => {
...
let from: string = ...
let to: string = ...
let value: bigint = ...
ctx.store.TransfersTable.write({ from, to, value })
...
})
```
# JSON support
Source: https://docs.sqd.dev/en/sdk/squid-sdk/reference/store/file/json
Write Squid SDK indexed data as JSON or JSONL files via the Table class — newline-delimited and array variants with schema validation.
# JSON format support
## `Table` Implementation
The `@subsquid/file-store-json` package provides a `Table` implementation for writing to JSON and [JSONL](https://jsonlines.org) files. Use it by [supplying one or more of its instances via the `tables` field of the `Database` constructor argument](/en/sdk/squid-sdk/resources/persisting-data/file#database-options). The `Table` uses a constructor with the following signature:
```typescript theme={"system"}
Table<S>(fileName: string, options?: {lines?: boolean})
```
Here,
* **`S`** is a Typescript type describing the schema of the table data.
* **`fileName: string`** is the name of the output file in every dataset partition folder.
* **`options?: {lines?: boolean}`** are table options. At the moment the only available setting is whether to use JSONL instead of a plain JSON array (default: false).
## Example
This saves ERC20 `Transfer` events captured by the processor to a JSONL file where each line is a JSON serialization of a `{from: string, to: string, value: bigint}` object. Full squid code is available in [this repo](https://github.com/subsquid-labs/file-store-json-example).
```typescript theme={"system"}
import {Database, LocalDest} from '@subsquid/file-store'
import {Table} from '@subsquid/file-store-json'
...
const dbOptions = {
tables: {
TransfersTable: new Table<{
from: string,
to: string,
value: bigint
}>('transfers.jsonl', { lines: true })
},
dest: new LocalDest('./data'),
chunkSizeMb: 10
}
processor.run(new Database(dbOptions), async (ctx) => {
...
let from: string = ...
let to: string = ...
let value: bigint = ...
ctx.store.TransfersTable.write({ from, to, value })
...
})
```
# Parquet support
Source: https://docs.sqd.dev/en/sdk/squid-sdk/reference/store/file/parquet
Write Squid SDK indexed data as Apache Parquet files via the Table class — columnar storage, optional compression, and schema-driven file output.
# Parquet format support
Support for the Parquet format is currently experimental. Contact us at the [SquidDevs Telegram channel](https://t.me/HydraDevs) for support.
## `Table` Implementation
[Apache Parquet](https://parquet.apache.org) is an advanced format for storing tabular data in files. It divides table columns into [column chunks](https://parquet.apache.org/docs/concepts/). Each column chunk is stored contiguously, allowing efficient partial reads of column subsets. Column chunks can also be compressed with column-specific compression algorithms, further enhancing performance. Retrieval relies on metadata appended to the end of a Parquet file. The [metadata standard](https://parquet.apache.org/docs/file-format/metadata/) of Apache Parquet is extremely powerful, enabling all sorts of [extensions](https://parquet.apache.org/docs/file-format/extensibility/). Among other things, metadata contains the schema of the data, making the format self-describing.
The `@subsquid/file-store-parquet` package provides a `Table` implementation for writing to Parquet files. Use it by [supplying one or more of its instances via the `tables` field of the `Database` constructor argument](/en/sdk/squid-sdk/resources/persisting-data/file#database-options). Constructor of the `Table` implementation accepts the following arguments:
* **`fileName: string`**: the name of the output file in every dataset partition folder.
* **`schema: {[column: string]: ColumnData}`**: a mapping from Parquet column names to [`ColumnData` objects](#columns). A mapping of the same keys to data values is the row type used by the [table writer](/en/sdk/squid-sdk/resources/persisting-data/file#table-writer-interface).
* **`options?: TableOptions`**: see [`Table` Options](#table-options).
## Columns
`ColumnData` objects define storage options for each table column. They are made with the `Column` factory function that accepts a column data type and an optional `options: ColumnOptions` object.
Column types can be obtained by making the function calls listed below from the `Types` submodule. They determine the [Parquet type](https://parquet.apache.org/docs/file-format/types/) that will be used to store the data and the type that the [table writer](/en/sdk/squid-sdk/resources/persisting-data/file#table-writer-interface) will expect to find at the corresponding field of data row objects.
| Column type | Logical type | Primitive type | Valid data row object field contents |
| :------------------------------------------: | :----------------------------------------------------------------------------------: | :---------------------------------------------: | :--------------------------------------------------------------------------------------------------------: |
| `Types.String()` | variable length string | `BYTE_ARRAY` | `string` of any length |
| `Types.Binary` `(length?)` | variable or fixed length byte array | `BYTE_ARRAY` or `FIXED_LEN_` `BYTE_ARRAY` | `Uint8Array` of length equal to `length` if it is set or of any length otherwise |
| `Types.Int8()` | 8-bit signed integer | `INT32` | `number` from -128 to 127 |
| `Types.Int16()` | 16-bit signed integer | `INT32` | `number` from -32768 to 32767 |
| `Types.Int32()` | 32-bit signed integer | `INT32` | `number` from -2147483648 to 2147483647 |
| `Types.Int64()` | 64-bit signed integer | `INT64` | `bigint` or `number` from -9223372036854775808 to 9223372036854775807 |
| `Types.Uint8()` | 8-bit unsigned integer | `INT32` | `number` from 0 to 255 |
| `Types.Uint16()` | 16-bit unsigned integer | `INT32` | `number` from 0 to 65535 |
| `Types.Uint32()` | 32-bit unsigned integer | `INT32` | `number` from 0 to 4294967295 |
| `Types.Uint64()` | 64-bit unsigned integer | `INT64` | `bigint` or `number` from 0 to 18446744073709551615 |
| `Types.Float()` | 32-bit floating point number | `FLOAT` | non-`NaN` `number` |
| `Types.Double()` | 64-bit floating point number | `DOUBLE` | non-`NaN` `number` |
| `Types.Boolean()` | boolean value | `BOOLEAN` | `boolean` |
| `Types.Timestamp()` | UNIX timestamp in milliseconds | `INT64` | `Date` |
| `Types.Decimal` `(precision, scale=0)` | decimal with `precision` digits and `scale` digits to the right of the decimal point | `INT32` or `INT64` or `FIXED_LEN_` `BYTE_ARRAY` | `number` or `bigint` or [`BigDecimal`](https://github.com/subsquid/squid-sdk/tree/master/util/big-decimal) |
| `Types.List` `(itemType, {nullable=false})` | a list filled with optionally nullable items of `itemType` column type | - | `Array` of items satisfying `itemType` |
| `Types.JSON<T>()` | JSON object of type `T` | `BYTE_ARRAY` | `Object` of type `T` |
| `Types.BSON<T>()` | BSON object of type `T` | `BYTE_ARRAY` | `Object` of type `T` |
The widest decimals that [PyArrow](https://arrow.apache.org/docs/python/index.html) can read are `Types.Decimal(76)`.
The following column options are available:
```typescript theme={"system"}
ColumnOptions {
nullable?: boolean
compression?: Compression
encoding?: Encoding
}
```
See the [Encoding and Compression](#encoding-and-compression) section for details.
## `Table` Options
As its optional final argument, the constructor of `Table` accepts an object that defines table options:
```typescript theme={"system"}
TableOptions {
compression?: Compression
rowGroupSize?: number
pageSize?: number
}
```
Here,
* **`compression`** determines the file-wide compression algorithm. Per-column settings override this. See [Encoding and Compression](#encoding-and-compression) for the list of available algorithms. Default: `Compression.UNCOMPRESSED`.
* **`rowGroupSize`** determines the approximate uncompressed size of the row group in bytes. Default: `32 * 1024 * 1024`.
* **`pageSize`** determines the approximate uncompressed page size in bytes. Default: `8 * 1024`.
When `rowGroupSize` is less than `pageSize` times the number of columns, the row group size setting is effectively ignored: each row group will contain exactly one roughly `pageSize`-sized page for each column.
## Encoding and Compression
[Encodings](https://parquet.apache.org/docs/file-format/data-pages/encodings/) are set at a per-column basis. At the moment the default and the only supported value is `'PLAIN'`.
[Compression](https://github.com/apache/parquet-format/blob/master/Compression.md) can be set at a per-file or a per-column basis. Available values are
* `'UNCOMPRESSED'` (default)
* `'GZIP'`
* `'LZO'`
* `'BROTLI'`
* `'LZ4'`
## Example
This saves ERC20 `Transfer` events captured by the processor to a Parquet file. All columns except for `from` are `GZIP`ped. Row groups are set to be roughly 300000 bytes in size each, so each row group contains roughly a hundred \~1000-byte pages per column. Full squid code is available in [this repo](https://github.com/subsquid-labs/file-store-parquet-example).
```typescript theme={"system"}
import {Database, LocalDest} from '@subsquid/file-store'
import {
Column,
Table,
Types
} from '@subsquid/file-store-parquet'
...
const dbOptions = {
tables: {
TransfersTable: new Table(
'transfers.parquet',
{
from: Column(
Types.String(),
{
compression: 'UNCOMPRESSED'
}
),
to: Column(Types.String()),
value: Column(Types.Uint64())
},
{
compression: 'GZIP',
rowGroupSize: 300000,
pageSize: 1000
}
)
},
dest: new LocalDest('./data'),
chunkSizeMb: 10
}
processor.run(new Database(dbOptions), async (ctx) => {
...
let from: string = ...
let to: string = ...
let value: bigint = ...
ctx.store.TransfersTable.write({ from, to, value })
...
})
```
# S3 support
Source: https://docs.sqd.dev/en/sdk/squid-sdk/reference/store/file/s3-dest
Upload Squid SDK file-store output to S3-compatible buckets via the Dest class — Squid CSV, JSON, and Parquet streaming to AWS S3 or MinIO.
# S3 destination support
## Overview
Writing to Amazon S3-compatible file storage services such as [AWS](https://aws.amazon.com) and [Filebase](https://filebase.com) is supported via the `S3Dest` class from the `@subsquid/file-store-s3` package. Use it by [setting the `dest` field of the `Database` constructor argument](/en/sdk/squid-sdk/resources/persisting-data/file#database-options) to an `S3Dest` instance. The `S3Dest` constructor accepts the following arguments:
* **`url: string`**: S3 URL in the `s3://bucket/path` format.
* **`optionsOrClient?: S3Client | S3ClientConfig`**: an optional [S3 client](https://docs.aws.amazon.com/AWSJavaScriptSDK/v3/latest/Package/-aws-sdk-client-s3/Class/S3Client/) or [client config](https://docs.aws.amazon.com/AWSJavaScriptSDK/v3/latest/clients/client-s3/interfaces/s3clientconfig.html). By default, a simple config parameterized by environment variables is used:
```typescript theme={"system"}
{
region: process.env.S3_REGION,
endpoint: process.env.S3_ENDPOINT,
credentials: {
accessKeyId: assertNotNull(process.env.S3_ACCESS_KEY_ID),
secretAccessKey: assertNotNull(process.env.S3_SECRET_ACCESS_KEY),
},
}
```
## Example
This saves the processor data in the `transfers-data` folder of the `subsquid-testing-bucket` bucket at the [Filebase](https://filebase.com) service. The service only has one region and one endpoint, so here they are hardcoded to reduce the number of required environment variables and to illustrate how connection parameters can be supplied programmatically. Full squid code is available in [this repo](https://github.com/subsquid-labs/file-store-s3-example).
```typescript theme={"system"}
import {Database} from '@subsquid/file-store'
import {S3Dest} from '@subsquid/file-store-s3'
import {assertNotNull} from '@subsquid/util-internal' // pulled by @subsquid/file-store-s3
...
const dbOptions = {
...
dest: new S3Dest(
's3://subsquid-testing-bucket/transfers-data',
{
region: 'us-east-1',
endpoint: 'https://s3.filebase.com',
credentials: {
accessKeyId: assertNotNull(process.env.S3_ACCESS_KEY_ID),
secretAccessKey: assertNotNull(process.env.S3_SECRET_ACCESS_KEY)
}
}
),
...
}
processor.run(new Database(dbOptions), async (ctx) => {
...
})
```
# typeorm-store
Source: https://docs.sqd.dev/en/sdk/squid-sdk/reference/store/typeorm
Reference for @subsquid/typeorm-store — the optimized TypeORM-based Postgres store that powers Squid SDK indexers with batched upserts and inserts.
# `@subsquid/typeorm-store`
This page describes the interface of the classes from the `@subsquid/typeorm-store` NPM package. If you're looking for a guide on saving squid data to databases and the related workflows, check out the [Saving to PostgreSQL](/en/sdk/squid-sdk/resources/persisting-data/typeorm) page.
## `TypeormDatabase` constructor arguments
The argument of the `TypeormDatabase` class constructor may have the following fields:
* `stateSchema: string`: the name of the [database schema](https://www.postgresql.org/docs/current/sql-createschema.html) that the processor uses to persist its status (hash + height of the highest reached block). Useful for making sure that each processor uses its own state schema when running multiple processors against the same database (e.g. in a multichain setting). Default: `'squid_processor'`.
* `isolationLevel: 'SERIALIZABLE' | 'READ COMMITTED' | 'REPEATABLE READ'`: sets the [transaction isolation level](https://www.postgresql.org/docs/current/transaction-iso.html) of processor transactions. Default: `'SERIALIZABLE'`.
* `supportHotBlocks: boolean`: controls the support for hot blocks. Necessary in all squids that must be able to handle short-lived [blockchain forks](https://en.wikipedia.org/wiki/Fork_\(blockchain\)). That includes all squids that index chain data in near-real time using RPC endpoints. Default: `true`.
* `projectDir: string`: the folder where `TypeormDatabase` will look for the TypeORM model definition (at `lib/model`) and for migrations (at `db/migrations`). Default: `process.cwd()`.
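For instance, one processor of a multichain squid could set up its database like this (a minimal sketch; the schema name is illustrative):
```typescript theme={"system"}
import {TypeormDatabase} from '@subsquid/typeorm-store'

const db = new TypeormDatabase({
  // keeps this processor's status separate from other processors sharing the database
  stateSchema: 'eth_processor',
  // explicit defaults, shown here for illustration
  isolationLevel: 'SERIALIZABLE',
  supportHotBlocks: true
})

// pass the database to the processor as usual:
// processor.run(db, async ctx => { ... })
```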
## `Store` interface
### Batch access methods
#### **`upsert(e: E | E[])`**
Upserts a given entity or entities into the database. **Does not cascade the upsert to the relations.**
```ts theme={"system"}
await ctx.store.upsert([new User({id: 'Bob'}), new User({id: 'Alice'})])
```
#### **`save(e: E | E[])`**
Deprecated alias for [`upsert()`](#upsert).
#### **`insert(e: E | E[])`**
Inserts a given entity or entities into the database. Does not check whether the entities already exist in the database and will fail if a duplicate is inserted. Executes a primitive INSERT operation **without cascading to the relations**.
```ts theme={"system"}
await ctx.store.insert([new User({ id: "Bob" }), new User({ id: "Alice" })]);
```
#### **`remove(e: E | E[] | EntityClass, id?: string | string[])`**
Deletes a given entity or entities from the database. Accepts either entity objects or an entity class together with an ID or a list of IDs. **Does not cascade the deletion**.
```ts theme={"system"}
await ctx.store.remove(User, ["Alice", "Bob"]);
```
### TypeORM methods
For details see [TypeORM EntityManager reference](https://typeorm.io/entity-manager-api).
#### **`get`**
Get an entity by ID.
```ts theme={"system"}
await ctx.store.get(User, "Bob");
```
#### **`count`**
Count the number of entities matching a where filter.
```ts theme={"system"}
await ctx.store.count(User, {
where: {
firstName: "Timber",
},
});
```
#### **`countBy`**
Count the number of entities matching a filter.
```ts theme={"system"}
await ctx.store.countBy(User, { firstName: "Timber" })
```
#### **`find`**
Return a list matching a where filter.
```ts theme={"system"}
await ctx.store.find(User, {
where: {
firstName: "Timber",
},
});
```
#### **`findBy`**
Return a list matching a filter.
```ts theme={"system"}
let accounts = await ctx.store.findBy(Account, {id: In([...accountIds])})
```
#### **`findOne`**
Return the first entity matching a where filter.
```ts theme={"system"}
const timber = await ctx.store.findOne(User, {
where: {
firstName: "Timber",
},
});
```
#### **`findOneBy`**
Return the first entity matching a filter.
```ts theme={"system"}
const timber = await ctx.store.findOneBy(User, { firstName: "Timber" })
```
#### **`findOneOrFail`**
Like `findOne()`, but throws if nothing is found.
```ts theme={"system"}
const timber = await ctx.store.findOneOrFail(User, {
where: {
firstName: "Timber",
},
});
```
#### **`findOneByOrFail`**
Like `findOneBy()`, but throws if nothing is found.
```ts theme={"system"}
const timber = await ctx.store.findOneByOrFail(User, { firstName: "Timber" })
```
### Find Operators
`find()` and `findXXX()` methods support the following operators:
* `In` (contains in array)
* `Not`
* `LessThan`
* `LessThanOrEqual`
* `MoreThan`
* `MoreThanOrEqual`
* `Like`
* `ILike`
* `Between`
* `Any`
* `IsNull`
* `Raw` (raw SQL fragments)
See the details and examples in the [TypeORM `FindOption` docs](https://typeorm.io/find-options#advanced-options).
#### Example
```ts theme={"system"}
let accounts = await ctx.store.findBy(Account, {id: In([...accountIds])})
```
### Joining relations
To load an entity with its relations, use the `relations` field of the `find()` options and specify which relations should be joined:
```ts theme={"system"}
await ctx.store.find(User, {
relations: {
project: true,
},
where: {
project: {
name: "TypeORM",
initials: "TORM",
},
},
});
```
See the [TypeORM find options docs](https://typeorm.io/find-options) for details.
## Database connection parameters
Database connection parameters must be supplied via the following environment variables:
* `DB_HOST` (default `localhost`)
* `DB_PORT` (default `5432`)
* `DB_NAME` (default `postgres`)
* `DB_USER` (default `postgres`)
* `DB_PASS` (default `postgres`)
* `DB_SSL` (default `false`)
* `DB_SSL_REJECT_UNAUTHORIZED` (default `true`)
* `DB_URL` (default `undefined`, see the [DB\_URL section](#db_url))
When deploying to [Cloud](/en/cloud) with the [Postgres addon](/en/cloud/reference/pg) enabled in the [manifest](/en/cloud/reference/manifest), any user-supplied values are overwritten for most of these variables. See [Variable shadowing](/en/cloud/reference/pg#variable-shadowing).
`typeorm-store` also supports the following variables for connecting to databases that require client-side SSL:
* `DB_SSL_CA` - the root certificate in plain text
* `DB_SSL_CA_FILE` - path to a root certificate file
* `DB_SSL_CERT` - client certificate in plain text
* `DB_SSL_CERT_FILE` - path to a client certificate file
* `DB_SSL_KEY` - client key in plain text
* `DB_SSL_KEY_FILE` - path to a client key file
When deploying to [Cloud](/en/cloud), you can set [secrets](/en/cloud/resources/env-variables#secrets) to the contents of a file via stdin:
```bash theme={"system"}
sqd secrets set DB_SSL_CA < ca.crt
```
### `DB_URL`
When set, `DB_URL` takes precedence over all individual variables. Its format is as follows:
```
postgres[ql]://[username[:password]@][host[:port]]/database[?parameter_list]
```
where `parameter_list` is an `&`-separated list of assignments of SSL connection parameters:
* `ssl=(0|1|true|false)`
* `sslmode=(disabled|no-verify|prefer|require|verify-ca|verify-full)`
* `sslcert=`
* `sslkey=`
* `sslrootcert=`
When any value is omitted from the URL, the value of the corresponding individual `DB_*` variable will be used instead. If that is not set, the default will be used.
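For example, a connection to a Postgres instance that requires SSL could be configured with a single variable. The host, credentials and database name below are placeholders:
```
DB_URL=postgresql://squid:secret@db.example.com:5432/squid_db?sslmode=require
```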
# sqd auth
Source: https://docs.sqd.dev/en/sdk/squid-sdk/squid-cli/auth
Log in to SQD Cloud with the sqd auth CLI command — exchange your account credentials for an API token to authenticate sqd deploy and other commands.
Log in to the Cloud
* [`sqd auth`](#sqd-auth-1)
## `sqd auth`
Log in to the Cloud
```
USAGE
$ sqd auth -k [--interactive]
FLAGS
-k, --key= (required) Cloud auth key. Log in to https://app.subsquid.io to create or update your key.
--[no-]interactive Disable interactive mode
```
*See code: [src/commands/auth.ts](https://github.com/subsquid/squid-cli/tree/master/src/commands/auth.ts)*
# sqd autocomplete
Source: https://docs.sqd.dev/en/sdk/squid-sdk/squid-cli/autocomplete
Display sqd CLI shell autocomplete installation instructions — supported shells (bash, zsh, fish, PowerShell) and how to enable command completion.
Display autocomplete installation instructions.
# commands.json
Source: https://docs.sqd.dev/en/sdk/squid-sdk/squid-cli/commands-json
Define custom sqd subcommands and aliases via commands.json in your Squid SDK project — extend the workflow with reusable scripts for build and deploy.
The `sqd` tool automatically discovers and loads any extra commands defined in the `commands.json` file. Here is a sample file demonstrating the available features:
```json theme={"system"}
{ // comments are ok
"$schema": "https://subsquid.io/schemas/commands.json",
"commands": {
"clean": {
"description": "delete all build artifacts",
"cmd": ["rm", "-rf", "lib"]
},
"build": {
"description": "build the project",
"deps": ["clean"], // commands to execute before
"cmd": ["tsc"]
},
"typegen": {
"hidden": true, // Don't show in the overview listing
"workdir": "abi", // change working dir
"command": [
"squid-evm-typegen", // node_modules/.bin is in the PATH
"../src/abi",
{"glob": "*.json"} // cross-platform glob expansion
],
"env": { // additional environment variables
"DEBUG": "*"
}
}
}
}
```
This functionality is managed by the [`@subsquid/commands`](https://github.com/subsquid/squid-sdk/tree/master/util/commands) package.
All [squid templates](/en/sdk/squid-sdk/how-to-start/squid-development#templates) include such a file with a predefined set of useful shortcuts. See [Cheatsheet](/en/sdk/squid-sdk/how-to-start/cli-cheatsheet).
# sqd deploy
Source: https://docs.sqd.dev/en/sdk/squid-sdk/squid-cli/deploy
Deploy new or update an existing squid deployment in the Cloud. Squid name and also optionally slot and/or tag are taken from the provided deployment manifest.
Deploy new or update an existing squid deployment in the Cloud. Squid name and also optionally slot and/or tag are taken from the provided deployment manifest.
* [`sqd deploy SOURCE`](#sqd-deploy-source)
## `sqd deploy SOURCE`
Deploy new or update an existing squid in the Cloud
```
USAGE
$ sqd deploy SOURCE [--interactive]
[-r [/](@|:) | -o | -n | [-s ] | [-t ]]
[-m ] [--hard-reset] [--stream-logs] [--add-tag ]
[--allow-update] [--allow-tag-reassign] [--allow-manifest-override]
ARGUMENTS
SOURCE [default: .] Squid source. Could be:
- a relative or absolute path to a local folder (e.g. ".")
- a URL to a .tar.gz archive
- a github URL to a git repo with a branch or commit tag
FLAGS
-m, --manifest= [default: squid.yaml] Specify the relative local path
to a squid manifest file in the squid working directory
--add-tag= Add a tag to the deployed squid
--allow-manifest-override Allow overriding the manifest during deployment
--allow-tag-reassign Allow reassigning an existing tag
--allow-update Allow updating an existing squid
--hard-reset Perform a hard reset before deploying. This will drop
and re-create all squid resources, including the
database, causing a short API downtime
--[no-]interactive Disable interactive mode
--[no-]stream-logs Attach and stream squid logs after the deployment
SQUID FLAGS
-n, --name= Name of the squid
-r, --reference=[/](@|:) Fully qualified reference of the squid.
It can include the organization, name,
slot, or tag
-s, --slot= Slot of the squid
-t, --tag= Tag of the squid
ORG FLAGS
-o, --org= Code of the organization
DESCRIPTION
Deploy new or update an existing squid in the Cloud
EXAMPLES
// Create a new squid with name provided in the manifest file
$ sqd deploy .
// Create a new squid deployment and override its name to "my-squid-override"
$ sqd deploy . -n my-squid-override
// Update the "my-squid" squid with slot "asmzf5"
$ sqd deploy . -n my-squid -s asmzf5
// Use a manifest file located in ./path-to-the-squid/squid.prod.yaml
$ sqd deploy ./path-to-the-squid -m squid.prod.yaml
// Full paths are also fine
$ sqd deploy /Users/dev/path-to-the-squid -m /Users/dev/path-to-the-squid/squid.prod.yaml
```
*See code: [src/commands/deploy.ts](https://github.com/subsquid/squid-cli/tree/master/src/commands/deploy.ts)*
# sqd gateways
Source: https://docs.sqd.dev/en/sdk/squid-sdk/squid-cli/gateways
Explore SQD Network gateways and data sources for a squid with the sqd gateways CLI command — list endpoints, statuses, and chains supported.
Explore data sources for a squid
* [`sqd gateways list`](#sqd-gateways-list)
## `sqd gateways list`
List available gateways
```
USAGE
$ sqd gateways list [--interactive] [-t ] [-n ] [-c ]
FLAGS
-c, --chain= Filter by chain ID or SS58 prefix
-n, --name= Filter by network name
-t, --type= Filter by network type
--[no-]interactive Disable interactive mode
ALIASES
$ sqd gateways ls
```
*See code: [src/commands/gateways/ls.ts](https://github.com/subsquid/squid-cli/blob/master/src/commands/gateways/ls.ts)*
# sqd init
Source: https://docs.sqd.dev/en/sdk/squid-sdk/squid-cli/init
Initialize a new Squid SDK project from a template or GitHub repo with sqd init — choose EVM, Substrate, Solana, Tron, or Fuel starters and scaffold files.
Setup a new squid project from a template or github repo
* [`sqd init NAME`](#sqd-init-name)
## `sqd init NAME`
Setup a new squid project from a template or github repo
```
USAGE
$ sqd init NAME [--interactive] [-t ] [-d ] [-r]
ARGUMENTS
NAME The squid name. It must contain only alphanumeric or dash ("-") symbols and must not start with "-".
FLAGS
-d, --dir=
The target location for the squid. If omitted, a new folder NAME is created.
-r, --remove
Clean up the target directory if it exists
-t, --template=
A template for the squid. Accepts:
- a github repository URL containing a valid squid.yaml manifest in the root folder
or one of the pre-defined aliases:
- evm A minimal squid template for indexing EVM data.
- abi A template to auto-generate a squid indexing events and txs from a contract ABI
- multichain A template for indexing data from multiple chains
- gravatar A sample EVM squid indexing the Gravatar smart contract on Ethereum.
- substrate A template squid for indexing Substrate-based chains.
- ink A template for indexing Ink! smart contracts
- ink-abi A template to auto-generate a squid from an ink! contract ABI
- frontier-evm A template for indexing Frontier EVM chains, like Moonbeam and Astar.
--[no-]interactive
Disable interactive mode
```
*See code: [src/commands/init.ts](https://github.com/subsquid/squid-cli/tree/master/src/commands/init.ts)*
# Installation
Source: https://docs.sqd.dev/en/sdk/squid-sdk/squid-cli/installation
Install the sqd CLI in your local environment — npm install -g, standalone binaries for macOS/Linux/Windows, and shell autocomplete configuration.
Squid CLI is a command line tool for
* scaffolding new squids from templates
* running SDK tools and scripts defined in `commands.json` in a cross-platform way
* managing squid deployments in [SQD Cloud](/en/cloud) (formerly Aquarium)
The CLI is distributed as an [`npm` package](https://www.npmjs.com/package/@subsquid/cli).
To install Squid CLI, follow the steps below.
First, install the latest version of Squid CLI as a global `npm` package:
```bash theme={"system"}
npm i -g @subsquid/cli@latest
```
Check the version:
```bash theme={"system"}
sqd --version
```
Make sure the output looks like `@subsquid/cli@<version>`.
The next steps are **optional** if you only need to build and run squids locally. An auth key is required to enable the CLI commands that manage [SQD Cloud](/en/cloud) deployments.
Sign in to [Cloud](https://app.subsquid.io/) and obtain (or refresh) your deployment key by clicking on the profile picture > "Deployment key".
Open a terminal window and run
```bash theme={"system"}
sqd auth -k <KEY>
```
Use `sqd --help` to get a list of the available commands and `sqd <command> --help` to get help on the available options for a specific command, e.g.
```bash theme={"system"}
sqd deploy --help
```
# sqd list
Source: https://docs.sqd.dev/en/sdk/squid-sdk/squid-cli/list
List squids deployed to SQD Cloud with the sqd ls command — running, stopped, and archived deployments per organization with their slots and tags.
List squids deployed to the Cloud
* [`sqd list`](#sqd-list-1)
## `sqd list`
List squids deployed to the Cloud
```
USAGE
$ sqd list [--interactive] [--truncate]
[-r [/](@|:) | -o | -n | [-s ] | [-t ]]
FLAGS
--[no-]interactive Disable interactive mode
--[no-]truncate Truncate data in columns: false by default
SQUID FLAGS
-n, --name= Name of the squid
-r, --reference=[/](@|:) Fully qualified reference
of the squid. It can include
the organization, name, slot,
or tag
-s, --slot= Slot of the squid
-t, --tag= Tag of the squid
ORG FLAGS
-o, --org= Code of the organization
ALIASES
$ sqd ls
```
*See code: [src/commands/ls.ts](https://github.com/subsquid/squid-cli/tree/master/src/commands/ls.ts)*
# sqd logs
Source: https://docs.sqd.dev/en/sdk/squid-sdk/squid-cli/logs
Fetch live and historical logs from a squid deployed to SQD Cloud with the sqd logs command — filter by container, time range, slot, and tag.
Fetch logs from a squid deployed to the Cloud
* [`sqd logs`](#sqd-logs-1)
## `sqd logs`
Fetch logs from a squid deployed to the Cloud
```
USAGE
$ sqd logs [--interactive] [--since ] [--search ] [-f | -p ]
[-r [/](@|:) | -o | [-s -n ] | [-t ]]
[-c processor|query-node|api|db-migrate|db...]
[-l error|debug|info|warning...] [--since ]
FLAGS
-c, --container=