What do you think about Datastream for your data capture and replication needs? Let us know in the comments below and subscribe for more Google Cloud tips and tricks → goo.gle/GoogleCloudTech
@Farshad-Ghorbani • 2 years ago
Is there any solution for partitioning data with Datastream?
@yunusarif6963 • 2 years ago
I managed to play around with Datastream with BigQuery as the destination. The problem with this approach is that the tables it creates are not partitioned. Those of us who do incremental loads from our BigQuery replica into our reports always have to scan the whole table, which comes at a cost, compared to scanning and querying only the new data in the replica.
@terminalrecluse • 2 years ago
Perhaps CLI or programmatic access (not exposed in the UI) allows specifying a partition key.
@Farshad-Ghorbani • 2 years ago
@terminalrecluse I checked the CLI as well, but I couldn't find it there. Is there any solution for partitioning data? @googlecloudtech
@etaimargolin8449 • 2 years ago
Currently partitions aren't leveraged for optimizing the size of the table scan. We are looking into implementing this as a future improvement.
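In the meantime, one possible workaround (a sketch only, not an official Datastream feature) is to maintain a partitioned copy of the replica table yourself and point incremental reports at that copy. The dataset, table, and timestamp column names below are placeholders:

    # Rebuild a date-partitioned copy of the Datastream replica table so
    # downstream queries can prune partitions. mydataset.orders and
    # updated_at are hypothetical names; substitute your own.
    bq query --use_legacy_sql=false '
      CREATE OR REPLACE TABLE mydataset.orders_partitioned
      PARTITION BY DATE(updated_at)
      AS SELECT * FROM mydataset.orders
    '

Note the copy has to be refreshed (e.g., on a schedule), so it trades freshness for cheaper scans.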
@dexterkarinyo9144 • 7 months ago
Hey! At 2:31, do you have an example of how you created or set up that connection? Thanks.
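For reference, a hedged sketch of creating a MySQL connection profile from the CLI (the video does this in the UI). All values are placeholders, and the flag names should be verified against `gcloud datastream connection-profiles create --help` for your gcloud version:

    # Create a Datastream connection profile for a MySQL source.
    # Hostname, credentials, and region below are placeholders.
    gcloud datastream connection-profiles create my-mysql-profile \
      --location=us-central1 \
      --type=mysql \
      --display-name="my-mysql-profile" \
      --mysql-hostname=10.0.0.2 \
      --mysql-port=3306 \
      --mysql-username=datastream \
      --mysql-password=secret \
      --static-ip-connectivity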
@ShahnewazKhan • 2 years ago
When will Datastream for Cloud SQL for PostgreSQL be available?
@david7482 • 2 years ago
It's also available now 🎉 cloud.google.com/datastream/docs/sources-postgresql
@felipe.veloso • 2 years ago
Wow, great news!!!
@vitamin2732 • 2 years ago
If it works as described, it is really COOOOOOL. Thanks a lot. @Gabe Weiss, some questions: 1. Are there any limitations for Datastream for BigQuery? 2. I am using Cloud SQL, so it would be great to have a tutorial for this combination. 3. It looks like an AlloyDB competitor, doesn't it? What are the core differences? (I am thinking about AlloyDB for the new version of our project, to avoid streaming analytics data to BigQuery.)
@FuyangLiu • a year ago
Is there a way to let a Cloud SQL IAM user (or a service account user) be accepted as a way to connect to the Cloud SQL db?
@gabeweiss • a year ago
Sadly, not currently, no. BUT it's something we're thinking about. No promises on a timeline, but it's definitely something we're working on adding.
@felipe.veloso • 2 years ago
It's already in preview for Postgres??? 😮😮
@darshanparmar7961 • 2 years ago
If records are updated/deleted in the source system (MySQL), does it also perform the update/delete in BigQuery, or does it work in append-only mode?
@etaimargolin8449 • 2 years ago
Datastream replicates all UPDATE / INSERT / DELETE operations to the destination. Support for append-only mode is planned for a future release.
@danielcarter9666 • 2 years ago
Can I stream a subset of columns from my source? The CLI help (gcloud datastream streams create --help) suggests yes, but when I specify mysql_columns in the suggested format, gcloud errors out with: ERROR: gcloud crashed (ValidationError): Expected type for field column, found {} (type )
@etaimargolin8449 • 2 years ago
Yes, a subset of columns is supported in both the UI and the API. There was a bug in gcloud around this capability; it should be fixed now.
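For anyone hitting this, a sketch of the shape the column allowlist takes in the source-config file. All database/table/column names are hypothetical, and the field names follow the snake_case format referenced in the error above; verify against your gcloud version's --help:

    # Hypothetical mysql_source_config.yaml restricting the stream to two
    # columns of mydb.mytable. Every name here is a placeholder.
    cat > mysql_source_config.yaml <<'EOF'
    include_objects:
      mysql_databases:
      - database: mydb
        mysql_tables:
        - table: mytable
          mysql_columns:
          - column: id
          - column: status
    EOF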
@danielcarter9666 • 2 years ago
Through the GUI, when selecting the source objects to replicate, I can use wildcards such as "*.mytable". How do I do this with the CLI? When I describe a stream created through the GUI (gcloud datastream streams describe), the database field is simply missing, but when I try to create a new stream using the same format, gcloud bombs out with "ERROR: gcloud crashed (ParseError): Cannot parse YAML: missing key "database"."
@etaimargolin8449 • 2 years ago
Yes, this is supported: you need to specify an empty database key ("database": "").
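A sketch of what that looks like in practice, using the same config-file format as the column example above (names are placeholders; check `gcloud datastream streams create --help` for your version's flags):

    # An empty "database" value stands in for the GUI's "*.mytable" wildcard.
    cat > wildcard_source_config.yaml <<'EOF'
    include_objects:
      mysql_databases:
      - database: ""            # empty string: match any database
        mysql_tables:
        - table: mytable
    EOF
    # Pass the file when creating the stream (the stream's other required
    # flags, such as source/destination connection profiles, are omitted
    # from this sketch):
    gcloud datastream streams create my-stream \
      --location=us-central1 \
      --mysql-source-config=wildcard_source_config.yaml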
@abhinavtripathi970 • a year ago
Will it accept schema changes?
@etaimargolin8449 • a year ago
Yes, many schema changes are automatically detected and supported, but some changes might not be detected automatically and may result in events from that table being dropped. In this case, Datastream will report the reason for the event(s) being dropped, and any missing data can be recovered using a backfill of the table.
@analyticshub499 • 2 years ago
Can MariaDB be used instead of MySQL as a source to stream data to BigQuery?
@gabeweiss • 2 years ago
Yes it can! See here for the supported versions of MySQL: cloud.google.com/datastream/docs/faq#behavior-and-limitations
@HoaTran-rp4kf • a year ago
What happens if I accidentally delete the destination table in BigQuery? How can I restore it?
@etaimargolin8449 • a year ago
Datastream will recreate the table automatically, and the data that was deleted can be recovered by triggering a backfill from the source.
@HoaTran-rp4kf • a year ago
Hi @etaimargolin8449, I found out that some rows were duplicated in the destination table in BigQuery. I cannot delete any rows from the table. How can I solve it?
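One common way to read around duplicates without deleting rows (DML can be restricted on CDC-managed tables) is a deduplicating view. A sketch, assuming an `id` primary key and the `datastream_metadata.source_timestamp` field Datastream writes alongside each row; the table and key names are placeholders:

    # View that keeps one row per key, preferring the most recent change.
    bq query --use_legacy_sql=false '
      CREATE OR REPLACE VIEW mydataset.orders_dedup AS
      SELECT * FROM mydataset.orders
      WHERE TRUE
      QUALIFY ROW_NUMBER() OVER (
        PARTITION BY id
        ORDER BY datastream_metadata.source_timestamp DESC
      ) = 1
    '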
@apvyas80 • 2 years ago
Does it support customer-managed encryption keys (CMEK)?
@etaimargolin8449 • 2 years ago
Datastream supports CMEK for data stored at rest. Support for CMEK on data loaded to BigQuery will be added shortly.
@Ng6666yt • 2 years ago
Why
@rameshyadav1723 • 2 years ago
Can this feature be used to load from BigQuery to Cloud SQL (Postgres) and have real-time streaming for operational purposes? @googlecloudtech
@vitamin2732 • 2 years ago
Why do you need it?
@rameshyadav1723 • 2 years ago
@vitamin2732 There is already an established process where outputs are stored in BQ; now we need to send outputs to Cloud SQL for API consumption. We need both outputs: one stored in BQ for analytical reporting, and the other for real-time usage through an API. Hope it makes sense. So I'm wondering how to get real-time streaming from BQ to Cloud SQL tables with an automatic CDC feature.
@vitamin2732 • a year ago
@rameshyadav1723 It looks like the wrong architecture... normally you stream from Cloud SQL to BQ.
@ariefhalim5425 • a year ago
@rameshyadav1723 I think Datastream's latency is too high for a real-time transactional API.