How to Analyze Data from a REST API with Flink SQL

1,797 views

Confluent

A day ago

Comments: 5
@ConfluentDevXTeam 2 months ago
Hey, it's Lucia! Hopping in here to say that if you're working on transforming, cleaning, and connecting data, we've got resources for you. Head over to Confluent Developer and check out more of our demos: cnfl.io/3X9niaV P.S. Like I said in the video, I'm happy to answer questions in the comments. Chime in!
@Fikusiklol 2 months ago
Is this different from ksqlDB and the stream processing that happens there? Sorry for being ignorant, just genuinely confused :)
@ConfluentDevXTeam 2 months ago
Lucia here. That's a great question! Both ksqlDB and Flink SQL can analyze data that is in Kafka topics. However, ksqlDB is built on Kafka Streams, while Flink SQL is an API offered by Apache Flink. You can read more about Flink SQL here: cnfl.io/4bR5ndv
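[Editor's note: for a flavor of what this looks like in practice, here is a minimal sketch of a continuous Flink SQL query over a Kafka-backed table. The table name flights and the column callsign are hypothetical stand-ins, not names from the video.]

  -- Minimal sketch of a continuous Flink SQL query over a Kafka-backed table.
  -- `flights` and `callsign` are hypothetical names, not from the video.
  -- The result updates continuously as new events arrive on the topic.
  SELECT callsign, COUNT(*) AS position_reports
  FROM flights
  GROUP BY callsign;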
@shintaii84 2 months ago
I'm sorry to say, but I did not learn anything besides that we can create a cloud account… I think I could build a script that does this in a few hours with any DB. So what is unique here? Why not just a cron job running my script, saving to Postgres, or even skipping the save and putting events on the topic line by line?
@DaveTroiano 2 months ago
Hello, demo developer Dave here :) Thanks for tuning in! I would call out these uniqueness points, particularly compared to the "script into any db" idea:

1. Ease of use, both in terms of development and deployment. Connector config plus SQL running in a cloud engine is easier than scripting and having to run that script reliably. To your point, neither approach is very difficult if the goal is just to reach a demo point, but the runtime aspect in particular has many big, hard problems lurking underneath once you go beyond a demo (next point...).

2. All of the other solution hardening that you would face doing this for real is a lot easier with this approach than with rolling your own: resilience against REST API flakiness, fault tolerance in the connector infrastructure, logging, etc. Where do you want to build all of the features you'll probably need post-demo, and how much time do you want to spend developing and maintaining them on your own? In other words, many people could get to the equivalent demo point with a script and Postgres pretty quickly, but the marginal effort needed to harden it would be significantly higher.

3. Ad hoc streaming / real-time analytics. This is mostly a response to the question "why not use any DB?" This demo is about getting started, but it then enables real-time answers to ad hoc questions, say a QoS-type question like "how many aircraft are taxiing *currently*, and what's the avg / max taxiing time in the past minute?" (see the sketch after this comment). A cron job and Postgres might work for batch and for answering these kinds of questions after the fact, but the streaming aspect is unique and the reason to be looking at technologies like Kafka and Flink (many more details on the benefits of Flink in this Confluent Developer course: developer.confluent.io/courses/apache-flink/intro/ ). In this example, it takes seconds to get from "data available via API" to "data reflected in a streaming aggregation," given the latency delay inherent in this particular API. Still pretty snappy, and a difficult time-to-live bar to achieve with a cron job and Postgres.

4. This is more of a "coming soon," but I would expect data quality rules to become available for connectors (not supported as of today). This feature would unlock data quality at the source and help developers proactively address REST APIs changing out from under them. (In my experience, REST APIs can be a bit of a wild west when it comes to format reliability.) More here: docs.confluent.io/cloud/current/sr/fundamentals/data-contracts.html. This would be a demo enhancement when that feature becomes available, but I'm thinking ahead to yet another problem that developers would face building a pipeline like this in production, where opting for a managed quality-control feature beats implementing it yourself.

Cheers 🙂 Dave
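[Editor's note: to make point 3 concrete, here is a rough Flink SQL sketch of that ad hoc taxiing question, using a one-minute tumbling window. Everything named here is a hypothetical stand-in for whatever schema the connector actually lands in the topic: the table flight_events, the columns icao24, status, and taxi_time_sec, and the event_time watermark column.]

  -- A minimal sketch, NOT the demo's actual schema: flight_events, icao24,
  -- status, taxi_time_sec, and event_time are hypothetical stand-ins.
  -- Counts distinct taxiing aircraft and the avg/max taxi time per
  -- one-minute tumbling window over the Kafka-backed table.
  SELECT
    window_start,
    window_end,
    COUNT(DISTINCT icao24) AS taxiing_aircraft,
    AVG(taxi_time_sec)     AS avg_taxi_time_sec,
    MAX(taxi_time_sec)     AS max_taxi_time_sec
  FROM TABLE(
    TUMBLE(TABLE flight_events, DESCRIPTOR(event_time), INTERVAL '1' MINUTE))
  WHERE status = 'TAXIING'
  GROUP BY window_start, window_end;

Because it is a continuous query, the answer to "how many aircraft are taxiing right now?" keeps itself up to date as events arrive, which is the part a cron job against Postgres cannot easily replicate.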