BigQuery support

Squids can store their data in BigQuery datasets using the @subsquid/bigquery-store package. Define and use the Database object as follows:

src/main.ts
import {
  Column,
  Table,
  Types,
  Database
} from '@subsquid/bigquery-store'
import {BigQuery} from '@google-cloud/bigquery'

const db = new Database({
  bq: new BigQuery(),
  dataset: 'subsquid-datasets.test_dataset',
  tables: {
    TransfersTable: new Table(
      'transfers',
      {
        from: Column(Types.String()),
        to: Column(Types.String()),
        value: Column(Types.BigNumeric(38))
      }
    )
  }
})

processor.run(db, async ctx => {
  // ...
  let from: string = ...
  let to: string = ...
  let value: bigint | number = ...
  ctx.store.TransfersTable.insert({from, to, value})
})

Here,

  • bq is a BigQuery instance. When constructed without arguments like this, it looks up the GOOGLE_APPLICATION_CREDENTIALS environment variable for the path to a JSON file with authentication details.
  • dataset is the path to the target dataset.
warning

The dataset must be created before running the processor; see the sketch below this list for one way to create it programmatically.

  • tables lists the tables that will be created and populated within the dataset. For every field of the tables object, an eponymous field of the ctx.store object is created; calling insert() or insertMany() on such a field queues the data for writing to the corresponding dataset table (as sketched below). The actual writing happens at the end of the batch, in a single transaction, ensuring dataset integrity.
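
A minimal sketch of batching rows with insertMany(); the event-decoding logic is elided and the row shape matches the transfers table defined above:

processor.run(db, async ctx => {
  // one row per transfer found in the batch
  let rows: {from: string, to: string, value: bigint}[] = []
  for (let block of ctx.blocks) {
    // ... decode transfer events from the block and push rows ...
  }
  // queue all rows at once; the actual write happens
  // at the end of the batch, in a single transaction
  ctx.store.TransfersTable.insertMany(rows)
})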

Tables are made out of statically typed columns. Available types are listed on the reference page.
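
As the warning above notes, the dataset must exist before the processor runs. One way to create it is with the @google-cloud/bigquery client; a minimal sketch, assuming default credentials and the dataset name used above (createTargetDataset is a hypothetical helper name):

import {BigQuery} from '@google-cloud/bigquery'

async function createTargetDataset() {
  // reads credentials from GOOGLE_APPLICATION_CREDENTIALS
  const bq = new BigQuery()
  // create the dataset the squid will write to
  await bq.createDataset('test_dataset')
}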

Deploying to SQD Cloud

We discourage uploading any sensitive data with squid code when deploying to SQD Cloud. To pass your credentials JSON to your squid, create a Cloud secret variable populated with its contents:

sqd secrets set GAC_JSON_FILE < creds.json

Then in src/main.ts write the contents to a file:

import fs from 'fs'

// write the credentials received via the secret to a local file
fs.writeFileSync('creds.json', process.env.GAC_JSON_FILE || '')

Set the GOOGLE_APPLICATION_CREDENTIALS variable and request the secret in the deployment manifest:

squid.yaml
deploy:
  processor:
    env:
      GAC_JSON_FILE: ${{ secrets.GAC_JSON_FILE }}
      GOOGLE_APPLICATION_CREDENTIALS: creds.json
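
Putting the pieces together: the credentials file must be written before the BigQuery client authenticates, so in src/main.ts the writeFileSync call should precede the Database construction. A minimal sketch under these assumptions:

import fs from 'fs'
import {BigQuery} from '@google-cloud/bigquery'
import {Database} from '@subsquid/bigquery-store'

// materialize the secret into the file that
// GOOGLE_APPLICATION_CREDENTIALS points to
fs.writeFileSync('creds.json', process.env.GAC_JSON_FILE || '')

const db = new Database({
  bq: new BigQuery(),  // picks up creds.json via the env variable
  dataset: 'subsquid-datasets.test_dataset',
  tables: {
    // ... table definitions as above ...
  }
})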

Examples

An end-to-end example geared towards local runs can be found in this repo. Look at this branch for an example of a squid made for deployment to SQD Cloud.

Troubleshooting

Transaction is aborted due to concurrent update

This means that your project has an open session that is updating some of the tables used by the squid.

Most commonly, the session is left behind by the squid itself after an unclean termination. You have two options:

  1. If you are not sure whether your squid is the only app that uses sessions to access your BigQuery project, find the faulty session manually and terminate it. See Get a list of your active sessions and Terminate a session by ID.

  2. DANGEROUS If you are absolutely certain that the squid is the only app that uses sessions to access your BigQuery project, you can terminate all the dangling sessions by running

    FOR session IN (
      SELECT
        session_id,
        MAX(creation_time) AS last_modified_time
      FROM `region-us`.INFORMATION_SCHEMA.SESSIONS_BY_PROJECT
      WHERE
        session_id IS NOT NULL
        AND is_active
      GROUP BY session_id
      ORDER BY last_modified_time DESC
    )
    DO
      CALL BQ.ABORT_SESSION(session.session_id);
    END FOR;

    Replace region-us with your dataset's region in the code above.

    You can also enable abortAllProjectSessionsOnStartup and supply datasetRegion in your database config to perform this operation at startup:

    const db = new Database({
      // ...
      abortAllProjectSessionsOnStartup: true,
      datasetRegion: 'region-us'
    })

    This method will cause data loss if, at the moment when the squid starts, some other app happens to be writing data anywhere in the project using the sessions mechanism.