Use BigQuery Storage API from R.
The main utility is to replace the bigrquery::bq_table_download method.
It supports the BigQueryRead interface. Support for the BigQueryWrite interface may be added in a future release.
The BigQuery Storage API is not rate limited and per-project quotas do not apply. It is an RPC protocol and provides faster downloads for large result sets.
This implementation uses a generated C++ client combined with the arrow R package to transform the raw stream into an R object.
bqs_table_download is the main function of this package. Other functions are helpers to facilitate authentication and debugging.
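To make the replacement concrete, here is a minimal sketch that puts the two download paths side by side. The wrapper names `download_with_rest` and `download_with_storage` are hypothetical and exist only for illustration. The sketch assumes your credentials can read the public table and that `billing_project` names a project you are allowed to bill. Defining the functions downloads nothing; data is only fetched when one of them is called.

```r
# Hypothetical wrappers for illustration; they are not part of either package.

# REST-based download with bigrquery.
download_with_rest <- function() {
  tb <- bigrquery::bq_table(
    "bigquery-public-data", "usa_names", "usa_1910_current"
  )
  bigrquery::bq_table_download(tb)
}

# Storage API download with bigrquerystorage.
# `billing_project` is the project billed for the read session.
download_with_storage <- function(billing_project) {
  bigrquerystorage::bqs_table_download(
    x = "bigquery-public-data:usa_names.usa_1910_current",
    parent = billing_project
  )
}
```

Calling `download_with_storage("your-project-id")` should return the same table contents as `download_with_rest()`, fetched through the Storage API instead of the REST endpoint.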
install.packages("bigrquerystorage")
remotes::install_github("meztez/bigrquerystorage")
# install protoc and grpc
apt-get install -y libgrpc++-dev libprotobuf-dev protobuf-compiler-grpc \
pkg-config
# install grpc, protoc is automatically installed
dnf install -y grpc-devel pkgconf
Please let us know if these instructions do not work any more.
apk add grpc-dev protobuf-dev re2-dev c-ares-dev
Alpine Linux 3.19 and Edge do not work currently, because the installation of the arrow package fails.
Needs the buster-backports repository.
echo "deb https://deb.debian.org/debian buster-backports main" >> \
  /etc/apt/sources.list.d/backports.list && \
  apt-get update && \
  apt-get install -y 'libgrpc\+\+-dev'/buster-backports \
  protobuf-compiler-grpc/buster-backports \
  libprotobuf-dev/buster-backports \
  protobuf-compiler/buster-backports pkg-config
In OpenSUSE 15.4 and 15.5 the version of the grpc package is too old, so installation fails. You can potentially compile a newer version of grpc from source.
In Ubuntu 20.04 the version of the grpc package is too old, so installation fails. You can potentially compile a newer version of grpc from source.
These distros do not have a grpc package. You can potentially compile grpc from source.
If you use Homebrew, you may install the grpc package, plus pkg-config. If you don’t have Homebrew installed, the package will download static builds of the system dependencies during installation. This works with macOS Big Sur, or later, on Intel and Arm64 machines.
brew install grpc pkg-config
From Rtools43, grpc is included in the toolchain.
The package used to automatically download static builds of the system requirements during installation, but this was removed per CRAN policy. Only R 4.3.x (with Rtools43) or later is currently supported.
This basic example downloads a filtered subset of a public table. The BigQuery Storage API requires a billing project.
# Auth is done automatically using Application Default Credentials
# or by reusing bigrquery auth.
# Use the following command once to set it up :
# gcloud auth application-default login --billing-project={project}
library(bigrquery)
library(bigrquerystorage)
# TODO: (developer): Set the project_id variable to your billing project.
# The read session will bill this project. This project can be
# different from the one that contains the table.
project_id <- 'your-project-id'
rows <- bqs_table_download(
  x = "bigquery-public-data:usa_names.usa_1910_current",
  parent = project_id
  # , snapshot_time = Sys.time() # a POSIXct time
  , selected_fields = c("name", "number", "state")
  , row_restriction = 'state = "WA"'
  # , sample_percentage = 50
)
sprintf(
"Got %d unique names in states: %s",
length(unique(rows$name)),
paste(unique(rows$state), collapse = " ")
)
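The row_restriction argument in the example is a plain filter string. When the values to filter on come from R, one way to assemble that string is sketched below. `build_row_restriction` is a hypothetical helper, not part of bigrquerystorage. It assumes the API accepts equality comparisons combined with `OR`, as a SQL WHERE clause would. To avoid guessing at escaping rules, it rejects values that contain double quotes.

```r
# Hypothetical helper (not part of bigrquerystorage): builds a filter string
# such as 'state = "WA" OR state = "OR"' from a column name and values.
build_row_restriction <- function(column, values) {
  stopifnot(
    is.character(column), length(column) == 1,
    is.character(values), length(values) >= 1,
    # Refuse values with embedded double quotes instead of escaping them.
    !any(grepl('"', values, fixed = TRUE))
  )
  # One equality clause per value, joined with OR.
  clauses <- sprintf('%s = "%s"', column, values)
  paste(clauses, collapse = " OR ")
}

restriction <- build_row_restriction("state", c("WA", "OR"))
print(restriction)
```

The resulting string can then be passed as `row_restriction = restriction` in the bqs_table_download call.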
Done using Google Application Default Credentials (ADC) or by recycling bigrquery authentication. Auth will be done automatically the first time a request is made.
bqs_auth()
bqs_deauth()
Does not support AVRO output format. Report any issues to the project issue tracker.
Full gRPC debug trace with bigrquerystorage:::bqs_set_log_verbosity(0).