
Introduction to Datastream for BigQuery

21,578 views

Google Cloud Tech

A day ago

Comments: 35
@googlecloudtech • a year ago
What do you think about Datastream for your data capture and replication needs? Let us know in the comments below and subscribe for more Google Cloud tips and tricks → goo.gle/GoogleCloudTech
@Farshad-Ghorbani • a year ago
Is there any solution for partitioning data with Datastream?
@yunusarif6963 • a year ago
I managed to play around with Datastream with BigQuery as the destination. The problem with this approach is that the tables created are not partitioned. Those of us who do incremental loads from our BigQuery replica into our reports will always have to scan the whole table, which comes with a cost, compared to scanning and querying only the new data in the BigQuery replica.
@terminalrecluse • a year ago
Perhaps CLI or programmatic access (not shown in the UI) will allow specifying a partition key.
@Farshad-Ghorbani • a year ago
@terminalrecluse I checked the CLI as well, but I couldn't find it there. Is there any solution for partitioning data? @googlecloudtech
@etaimargolin8449 • a year ago
Currently partitions aren't leveraged for optimizing the size of the table scan. We are looking into implementing this as a future improvement.
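One workaround that comes up for this, sketched below with made-up dataset, table, and column names: let Datastream create the replica table, then rebuild it as a partitioned copy and swap it in. The max_staleness option and the NOT ENFORCED primary key are assumptions meant to keep Datastream's CDC upserts working on the new table; verify against the current Datastream and BigQuery docs before relying on this.

  # Hypothetical sketch: dataset "replica", table "orders", timestamp column "created_at".
  bq query --use_legacy_sql=false '
  CREATE TABLE replica.orders_partitioned
  PARTITION BY DATE(created_at)
  OPTIONS (max_staleness = INTERVAL 15 MINUTE)  -- assumed: keep CDC staleness behavior
  AS SELECT * FROM replica.orders;

  -- CREATE TABLE AS SELECT does not copy the primary key Datastream uses for upserts, so re-add it.
  ALTER TABLE replica.orders_partitioned ADD PRIMARY KEY (order_id) NOT ENFORCED;
  '
  # Afterwards, drop the original table and rename the partitioned copy so the
  # stream continues applying changes into it.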
@vitamin2732 • a year ago
If it works as described, it is really COOOOOOL. Thanks a lot. @Gabe Weiss, some questions: 1. Are there any limitations for Datastream for BigQuery? 2. I am using Cloud SQL, so it would be great to have a tutorial for this combination. 3. It looks like an AlloyDB competitor, isn't it? What are the core differences? (I am thinking about AlloyDB in the new version of our project to avoid streaming analytic data to BigQuery.)
@dexterkarinyo9144 • 3 months ago
Hey! At 2:31, do you have an example of how you created or set up that connection? Thanks.
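For anyone else looking for that step: the source and destination connections shown in the UI can also be created as Datastream connection profiles from the CLI. A rough sketch with placeholder names, hostnames, and credentials (check gcloud datastream connection-profiles create --help for the exact flags in your gcloud version):

  # MySQL source connection profile (all values are placeholders)
  gcloud datastream connection-profiles create my-mysql-profile \
    --location=us-central1 \
    --type=mysql \
    --display-name="my-mysql-profile" \
    --mysql-hostname=10.0.0.5 \
    --mysql-port=3306 \
    --mysql-username=datastream \
    --mysql-password=change-me \
    --static-ip-connectivity

  # BigQuery destination connection profile
  gcloud datastream connection-profiles create my-bq-profile \
    --location=us-central1 \
    --type=bigquery \
    --display-name="my-bq-profile"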
@felipe.veloso • a year ago
It's already in preview for Postgres??? 😮😮
@ShahnewazKhan • a year ago
When will Postgres Cloud SQL support in Datastream be available?
@david7482 • a year ago
It's also available now 🎉 cloud.google.com/datastream/docs/sources-postgresql
@felipe.veloso • a year ago
Wowowoqoqoqo great news!!!
@FuyangLiu • a year ago
Is there a way to let a Cloud SQL IAM user (or a service account user) be accepted as a way to connect to the Cloud SQL db?
@gabeweiss • a year ago
Not currently sadly, no. BUT, it's something we're thinking about. No promises on timeline, but it's definitely something we're working on adding.
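Until then, the usual pattern is a dedicated database user for Datastream on the Cloud SQL instance, roughly like the sketch below for a MySQL source (instance name, user, and password are placeholders; the exact privileges to grant are listed in the Datastream MySQL source documentation):

  # Create a dedicated user on the Cloud SQL instance (placeholders)
  gcloud sql users create datastream \
    --instance=my-instance \
    --password=change-me

  # Then, connected to the database, grant the replication privileges Datastream needs:
  # GRANT REPLICATION SLAVE, REPLICATION CLIENT, SELECT ON *.* TO 'datastream'@'%';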
@HoaTran-rp4kf • a year ago
What happens if I accidentally delete the destination table in BigQuery? How can I restore the table?
@etaimargolin8449 • 11 months ago
Datastream will recreate the table automatically, and the data that was deleted can be recovered by triggering a backfill from the source.
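A backfill for a single table can be triggered from the stream's page in the console, or via the Datastream API; a rough sketch of the REST calls (project, region, stream, and object IDs are placeholders):

  # List the stream's objects to find the object ID of the affected table
  curl -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    "https://datastream.googleapis.com/v1/projects/MY_PROJECT/locations/us-central1/streams/my-stream/objects"

  # Start a backfill job for that object
  curl -X POST -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    "https://datastream.googleapis.com/v1/projects/MY_PROJECT/locations/us-central1/streams/my-stream/objects/OBJECT_ID:startBackfillJob"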
@HoaTran-rp4kf • 10 months ago
Hi @etaimargolin8449, I found out that some rows were duplicated in the destination table in BigQuery. I cannot delete any rows of the table. How can I solve it?
@danielcarter9666 • a year ago
Can I stream a subset of columns from my source? The CLI help (gcloud datastream streams create --help) suggests yes, but when I specify mysql_columns in the suggested format, gcloud errors out with: ERROR: gcloud crashed (ValidationError): Expected type for field column, found {} (type )
@etaimargolin8449 • a year ago
Yes, a subset of columns is supported, in both the UI and the API. There was a bug in gcloud around this capability; it should be fixed now.
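For reference, a sketch of what a column-level include list can look like in the file passed to --mysql-source-config (database, table, and column names are examples; confirm the exact field casing with gcloud datastream streams create --help):

  # mysql_source_config.json (example), selecting only two columns of one table:
  {
    "includeObjects": {
      "mysqlDatabases": [
        {
          "database": "mydb",
          "mysqlTables": [
            {
              "table": "orders",
              "mysqlColumns": [
                { "column": "order_id" },
                { "column": "status" }
              ]
            }
          ]
        }
      ]
    }
  }

  # Referenced when creating the stream (profile names and config files are placeholders):
  gcloud datastream streams create my-stream \
    --location=us-central1 \
    --display-name="my-stream" \
    --source=my-mysql-profile \
    --mysql-source-config=mysql_source_config.json \
    --destination=my-bq-profile \
    --bigquery-destination-config=bq_destination_config.json \
    --backfill-all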
@darshanparmar7961 • a year ago
What if records are updated/deleted in the source system (MySQL)? Does it also perform the update/delete in BigQuery, or does it work in append-only mode?
@etaimargolin8449 • a year ago
Datastream replicates all UPDATE / INSERT / DELETE operations to the destination. Support for append-only mode is planned for a future release.
@apvyas80 • a year ago
Does it support customer-managed encryption keys (CMEK)?
@etaimargolin8449 • a year ago
Datastream supports CMEK for data stored at rest. Support for CMEK on data loaded to BigQuery will be added shortly.
@analyticshub499 • a year ago
Can MariaDB be used instead of MySQL as a source to stream data to BigQuery?
@gabeweiss • a year ago
Yes it can! See here for the supported MySQL versions: cloud.google.com/datastream/docs/faq#behavior-and-limitations
@abhinavtripathi970 • a year ago
Will it accept schema changes?
@etaimargolin8449 • 11 months ago
Yes, many schema changes are automatically detected and supported, but some changes might not be detected automatically and may result in events from that table being dropped. In this case, Datastream will report the reason for the event(s) being dropped, and any missing data can be recovered using a backfill of the table.
@Ng6666yt • a year ago
Why
@danielcarter9666 • a year ago
Through the GUI, when selecting the source objects to replicate, I can use wildcards such as "*.mytable". How do I do this with the CLI? When I describe a stream created through the GUI (gcloud datastream streams describe), the database field is simply missing, but when I try to create a new stream using the same format, gcloud bombs out with "ERROR: gcloud crashed (ParseError): Cannot parse YAML: missing key "database"."
@etaimargolin8449 • a year ago
Yes, this is supported - you need to specify an empty database key ("database": "").
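Concretely, that looks something like the include list below in the file passed to --mysql-source-config, where the empty database value plays the role of the "*.mytable" wildcard (the table name is an example):

  # mysql_source_config.json (example): match "mytable" in any database
  {
    "includeObjects": {
      "mysqlDatabases": [
        {
          "database": "",
          "mysqlTables": [
            { "table": "mytable" }
          ]
        }
      ]
    }
  }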
@rameshyadav1723 • a year ago
Can this feature be used to load from BigQuery to Cloud SQL (Postgres) and have real-time streaming for operational purposes? @googlecloudtech
@vitamin2732 • a year ago
Why do you need it?
@rameshyadav1723 • a year ago
@vitamin2732 There is already an established process where outputs are stored in BQ; now we need to send outputs to Cloud SQL for API consumption. We need both outputs: one stored in BQ for analytical reporting and the other for real-time usage through an API. Hope it makes sense. So I'm wondering how to get real-time streaming from BQ to Cloud SQL tables, with an automatic CDC feature.
@vitamin2732 • a year ago
@rameshyadav1723 It looks like the wrong architecture... normally you need to stream from Cloud SQL to BQ.
@ariefhalim5425 • a year ago
@rameshyadav1723 I think Datastream latency is too slow to be used for a real-time transaction API.