Superb questions from the audience: JSON schema evolution and marking processed files. Two very common problems that are still hard to solve. Great answer from TD, though!
@shijiema6466 6 years ago
It's interesting to see how easily it converts ETL from batch mode to real-time mode. But what I really take away from this is confirmation of the bright future of the relational model and SQL. You can invent new ways to arrange and move data, but when it comes to analyzing it, so far it still has to be flattened (and joined).
@Namelessdad83 7 years ago
What if we don't have a distributed filesystem? What if we use a plain Spark cluster along with Kafka? Can we use ZooKeeper instead of HDFS/S3 for the WAL?