Cloud Bigtable is a wide-column NoSQL database designed for massive workloads that require consistently low latency.
A Bigtable table is sparsely populated and can grow to billions of rows and thousands of columns, allowing you to store petabytes of data. Cloud Bigtable is ideal for time series, AdTech, graph, gaming, fintech, and IoT data.
Setting Up Cloud Bigtable
Bigtable is a managed service, but it is not serverless: you must specify instance details such as cluster configuration and storage type (SSD or HDD) yourself.
When creating a cluster, you must provide a cluster ID, a region and zone, and the number of nodes. Bigtable's performance scales directly with the number of nodes: a five-node cluster, for illustration, can read and write 50,000 rows per second with SSDs, and raising the cluster to ten nodes increases read and write throughput to 100,000 rows per second.
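As a minimal sketch of this setup, the following Python snippet creates a production instance with a single five-node SSD cluster using the google-cloud-bigtable client library; the project, instance, and cluster IDs and the zone are placeholders, not fixed values.

```python
from google.cloud import bigtable
from google.cloud.bigtable import enums

# Placeholder project, instance, and cluster IDs -- substitute your own.
client = bigtable.Client(project="my-project", admin=True)

instance = client.instance(
    "my-instance",
    instance_type=enums.Instance.Type.PRODUCTION,
)

# A five-node SSD cluster; throughput scales with serve_nodes.
cluster = instance.cluster(
    "my-cluster",
    location_id="us-central1-a",  # a zone within the chosen region
    serve_nodes=5,
    default_storage_type=enums.StorageType.SSD,
)

# create() starts a long-running operation; block until it completes.
operation = instance.create(clusters=[cluster])
operation.result(timeout=300)
```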
For multi-regional high availability, users can configure additional clusters in different regions; all data is automatically replicated between the clusters. If a cluster goes down, traffic can fail over to another cluster either manually or automatically. To use replication in a Bigtable instance, you must configure the instance with more than one cluster.
Bigtable instances use application profiles, also known as app profiles, to specify how to route incoming requests and whether single-row transactions are allowed. If an app profile routes all requests to a single cluster, you have to execute failovers manually; if the app profile implements multi-cluster routing, failover is automatic.
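As a sketch of the two routing options (the profile and cluster IDs are assumptions, and the instance is assumed to already have more than one cluster):

```python
from google.cloud.bigtable import Client, enums

client = Client(project="my-project", admin=True)
instance = client.instance("my-instance")

# Multi-cluster routing: requests go to the nearest available cluster,
# so failover is automatic, but single-row transactions are disallowed.
multi = instance.app_profile(
    "multi-cluster-profile",
    routing_policy_type=enums.RoutingPolicyType.ANY,
    description="automatic failover",
)
multi.create()

# Single-cluster routing: all requests hit one cluster, failover is
# manual, and single-row transactions can be permitted.
single = instance.app_profile(
    "single-cluster-profile",
    routing_policy_type=enums.RoutingPolicyType.SINGLE,
    cluster_id="my-cluster",
    allow_transactional_writes=True,
)
single.create(ignore_warnings=True)
```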
When migrating from Hadoop or HBase, you can communicate with Cloud Bigtable through the Cloud Bigtable HBase client for Java. You can also use the cbt command-line utility to interact with Bigtable, as in the examples below.
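A few common cbt invocations look like the following; the project, instance, table, and column-family names are placeholders:

```
cbt -project my-project -instance my-instance createtable my-table
cbt -project my-project -instance my-instance createfamily my-table stats
cbt -project my-project -instance my-instance set my-table device#001 stats:temp=21.5
cbt -project my-project -instance my-instance read my-table prefix=device#
cbt -project my-project -instance my-instance ls
```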
You may also use BigQuery to access Bigtable tables. BigQuery does not store the data itself; instead, Bigtable data is exposed through an external table in BigQuery. As a result, you can keep frequently updated data, such as IoT data, in Bigtable without having to export it to BigQuery for analytical operations.
Schema Design Considerations for Bigtable
Creating tables for Bigtable differs significantly from creating them for relational databases: Bigtable tables are denormalised and can have thousands of columns.
The use case, data access patterns, and the data you intend to store all significantly influence the optimal design of a Cloud Bigtable table. A Bigtable schema covers table features such as row keys, columns, and column families. In general, schemas should be optimised for the queries users intend to run.
Bigtable doesn't support joins or secondary indexes. Instead, data is stored sorted lexicographically by row key, the only indexed column in a Bigtable table. To improve read performance, store similar entities in adjacent rows.
All operations are atomic at the row level (they affect either the entire row or none of it). If this is not taken into account, the database can become inconsistent. To avoid this, collate the data that changes in a single operation into a single row.
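As a minimal sketch (the table, row key, and column names are assumptions), all cells written to one row in a single mutation are committed atomically:

```python
from google.cloud import bigtable

client = bigtable.Client(project="my-project")
table = client.instance("my-instance").table("orders")

# Both cells belong to the same row, so the commit applies either
# all of them or none of them -- Bigtable is atomic at the row level.
row = table.direct_row(b"customer123#order456")
row.set_cell("order", "status", b"shipped")
row.set_cell("order", "shipped_at", b"2021-06-01T12:00:00Z")
row.commit()
```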
Because Bigtable has no secondary indexes, queries are processed using the row key, a row-key prefix, or a range of rows indicated by opening and closing row keys. Any other form of query results in a full table scan, which is far less efficient, so choose the right row key from the start to avoid a time-consuming data migration later.
The following factors make up a good row key (a short sketch follows the list):
Define your row key in line with the queries you'll use to extract data.
Keep your row keys short, as long row keys take up additional memory and reduce performance.
Separate values with delimiters in each row key.
Compose a row key that lets you access a specific range of rows.
Pad integers with leading zeros to ensure proper lexicographic sorting.
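The sketch below ties these guidelines together; the sensor/timestamp schema is an assumption for illustration. The key combines short delimited values with a zero-padded integer, so a range scan can retrieve a contiguous slice of rows:

```python
from google.cloud import bigtable
from google.cloud.bigtable.row_set import RowSet

client = bigtable.Client(project="my-project")
table = client.instance("my-instance").table("sensor-readings")

# Delimited values plus a zero-padded integer keep keys short and
# lexicographically ordered, e.g. "sensor17#eu-west#0001622548800".
def make_row_key(sensor_id: str, region: str, epoch_seconds: int) -> bytes:
    return f"{sensor_id}#{region}#{epoch_seconds:013d}".encode()

# Range scan: all eu-west readings for sensor17 within a time window.
row_set = RowSet()
row_set.add_row_range_from_keys(
    start_key=make_row_key("sensor17", "eu-west", 1_622_505_600),
    end_key=make_row_key("sensor17", "eu-west", 1_622_592_000),
)
for row in table.read_rows(row_set=row_set):
    print(row.row_key)
```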
Import and Export in Bigtable
Cloud Dataflow is used for Bigtable import and export operations.
Using Dataflow templates, data may be exported from Bigtable to Cloud Storage in Avro, SequenceFile, or Parquet format; the same formats can be imported from Cloud Storage back into Bigtable.
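As one illustration (assuming the Google-provided Cloud_Bigtable_to_GCS_Avro template and its documented parameter names; all project, instance, table, and bucket names are placeholders), an Avro export job can be launched with gcloud:

```
gcloud dataflow jobs run bigtable-avro-export \
    --gcs-location gs://dataflow-templates/latest/Cloud_Bigtable_to_GCS_Avro \
    --region us-central1 \
    --parameters bigtableProjectId=my-project,bigtableInstanceId=my-instance,bigtableTableId=my-table,outputDirectory=gs://my-bucket/exports,filenamePrefix=my-table-
```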
Users can also migrate data from another database to Bigtable, export data from Bigtable's table page, and import a CSV file into Bigtable.
Note: If your only concern is protecting against data loss, you can create and restore Bigtable backups.