Introduction to Datastream for BigQuery

23,328 views

Google Cloud Tech

Comments: 35
@googlecloudtech · 2 years ago
What do you think about Datastream for your data capture and replication needs? Let us know in the comments below and subscribe for more Google Cloud tips and tricks → goo.gle/GoogleCloudTech
@Farshad-Ghorbani · 2 years ago
Is there any solution for partitioning data with Datastream?
@yunusarif6963 · 2 years ago
I managed to play around with Datastream with BigQuery as the destination. The problem with this approach is that the tables created are not partitioned. Those of us who do incremental loads from our BigQuery replica into our reports will always have to scan the whole table, which comes at a cost, compared to scanning and querying only the new data in the BigQuery replica.
@terminalrecluse · 2 years ago
Perhaps CLI or programmatic access (not shown in the UI) will allow specifying a partition key.
@Farshad-Ghorbani · 2 years ago
@terminalrecluse I checked the CLI as well, but I couldn't find it there. Is there any solution for partitioning data? @googlecloudtech
@etaimargolin8449 · 2 years ago
Currently, partitions aren't leveraged to optimize the size of the table scan. We are looking into implementing this as a future improvement.
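For readers hitting the same limitation: one workaround described in the Datastream documentation is to pre-create the destination table in BigQuery with the desired partitioning before the stream first writes to it. The sketch below is illustrative only; the dataset, table, and column names are hypothetical, and additional table options (such as a primary key and `max_staleness`) may be required for CDC merging, so check the current Datastream docs before relying on it.

```sql
-- Hypothetical example: pre-create the Datastream destination table with
-- time partitioning so downstream incremental queries can prune partitions.
-- All names here are illustrative; Datastream adds a datastream_metadata
-- column to the tables it manages.
CREATE TABLE `myproject.datastream_ds.orders`
(
  id INT64,
  status STRING,
  updated_at TIMESTAMP,
  datastream_metadata STRUCT<uuid STRING, source_timestamp INT64>
)
PARTITION BY DATE(updated_at);
```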
@dexterkarinyo9144 · 7 months ago
Hey! At 2:31, do you have an example of how you created or set up that connection? Thanks.
@ShahnewazKhan · 2 years ago
When will Datastream for Cloud SQL for PostgreSQL be available?
@david7482 · 2 years ago
It's also available now 🎉 cloud.google.com/datastream/docs/sources-postgresql
@felipe.veloso · 2 years ago
Wow, great news!!!
@vitamin2732 · 2 years ago
If it works as described, it is really COOL. Thanks a lot. @Gabe Weiss, some questions: 1. Are there any limitations for Datastream for BigQuery? 2. I am using Cloud SQL, so it would be great to have a tutorial for that combination. 3. It looks like an AlloyDB competitor, doesn't it? What are the core differences? (I am considering AlloyDB for the new version of our project to avoid streaming analytics data to BigQuery.)
@FuyangLiu · 1 year ago
Is there a way to let a Cloud SQL IAM user (or a service account user) be accepted as a way to connect to the Cloud SQL database?
@gabeweiss · 1 year ago
Not currently, sadly, no. BUT it's something we're thinking about. No promises on the timeline, but it's definitely something we're working on adding.
@felipe.veloso · 2 years ago
It's already in preview for Postgres??? 😮😮
@darshanparmar7961 · 2 years ago
What if records are updated/deleted in the source system (MySQL)? Does it also perform the update/delete in BigQuery, or does it work in append-only mode?
@etaimargolin8449 · 2 years ago
Datastream replicates all UPDATE / INSERT / DELETE operations to the destination. Support for append-only mode is planned for a future release.
@danielcarter9666 · 2 years ago
Can I stream a subset of columns from my source? The CLI help (gcloud datastream streams create --help) suggests yes, but when I specify mysql_columns in the suggested format, gcloud errors out with: ERROR: gcloud crashed (ValidationError): Expected type for field column, found {} (type )
@etaimargolin8449 · 2 years ago
Yes, a subset of columns is supported in both the UI and the API. There was a bug in gcloud around this capability; it should be fixed now.
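For anyone trying this from the CLI, here is a sketch of the source-config file shape for selecting a column subset, based on the Datastream API's MysqlDatabase/MysqlTable/MysqlColumn messages. The database, table, and column names are hypothetical, and the field layout should be verified against your gcloud version's `--mysql-source-config` documentation.

```yaml
# Hypothetical file passed to:
#   gcloud datastream streams create ... --mysql-source-config=FILE
# Each mysqlColumns entry is an object with a "column" key, which matches
# the ValidationError mentioned above (a bare value there is rejected).
includeObjects:
  mysqlDatabases:
    - database: mydb            # hypothetical database name
      mysqlTables:
        - table: customers      # hypothetical table name
          mysqlColumns:         # stream only these columns
            - column: id
            - column: email
```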
@danielcarter9666 · 2 years ago
Through the GUI, when selecting the source objects to replicate, I can use wildcards such as "*.mytable". How do I do this with the CLI? When I describe a stream created through the GUI (gcloud datastream streams describe), the database field is simply missing, but when I try to create a new stream using the same format, gcloud bombs out with "ERROR: gcloud crashed (ParseError): Cannot parse YAML: missing key "database"."
@etaimargolin8449 · 2 years ago
Yes, this is supported: you need to specify an empty database key ("database": "").
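Applying that answer to the "*.mytable" case, the config fragment would look roughly like the sketch below. The table name is hypothetical and the surrounding field names follow the Datastream API's include-objects shape; verify against your gcloud version.

```yaml
# Hypothetical equivalent of the GUI wildcard "*.mytable":
# an empty "database" key matches any database.
includeObjects:
  mysqlDatabases:
    - database: ""              # empty key = match any database
      mysqlTables:
        - table: mytable
```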
@abhinavtripathi970 · 1 year ago
Will it accept schema changes?
@etaimargolin8449 · 1 year ago
Yes, many schema changes are automatically detected and supported, but some changes might not be detected automatically and may result in events from that table being dropped. In that case, Datastream will report the reason the event(s) were dropped, and any missing data can be recovered by backfilling the table.
@analyticshub499 · 2 years ago
Can MariaDB be used instead of MySQL as a source to stream data to BigQuery?
@gabeweiss · 2 years ago
Yes it can! See here for the supported MySQL versions: cloud.google.com/datastream/docs/faq#behavior-and-limitations
@HoaTran-rp4kf · 1 year ago
What happens if I accidentally delete the destination table in BigQuery? How can I restore the table?
@etaimargolin8449 · 1 year ago
Datastream will recreate the table automatically, and the data that was deleted can be recovered by triggering a backfill from the source.
@HoaTran-rp4kf · 1 year ago
Hi @etaimargolin8449, I found that some rows were duplicated in the destination table in BigQuery, and I cannot delete any rows of the table. How can I solve this?
@apvyas80 · 2 years ago
Does it support customer-managed encryption keys (CMEK)?
@etaimargolin8449 · 2 years ago
Datastream supports CMEK for data stored at rest. Support for CMEK on data loaded into BigQuery will be added shortly.
@Ng6666yt · 2 years ago
Why
@rameshyadav1723 · 2 years ago
Can this feature be used to load from BigQuery to Cloud SQL (Postgres) and have real-time streaming for operational purposes? @googlecloudtech
@vitamin2732 · 2 years ago
Why do you need it?
@rameshyadav1723 · 2 years ago
@vitamin2732 There is already an established process where outputs are stored in BQ; now we need to send outputs to Cloud SQL for API consumption. We need both outputs: the one stored in BQ for analytical reporting, and the other for real-time usage through an API. Hope that makes sense. So I'm wondering how to get real-time streaming from BQ to Cloud SQL tables, with an automatic CDC feature.
@vitamin2732 · 1 year ago
@rameshyadav1723 That looks like the wrong architecture... normally you stream from Cloud SQL to BQ.
@ariefhalim5425 · 1 year ago
@rameshyadav1723 I think Datastream's latency is too high for it to be used for a real-time transaction API.