Rather than writing your own task scheduler/runner, consider using one of the open-source HPC tools out there. Slurm with auto-scaling is an absolute beast: it was designed, and is used, to schedule millions of jobs daily for thousands of users on extremely busy, resource-constrained supercomputers around the world (over 60% of supercomputers use it), with job runtimes ranging from sub-second to months. You also get a massive set of other features, such as user/team management, quotas, accounting/budgeting, and flexible scheduling resources/constraints.
@stavetx 1 year ago
Hmm. But maybe the big difference between json+gzip and iceberg+parquet isn't really about Iceberg itself. It's binary Parquet (with metadata in it) vs. text JSON...