Great presentation. It is interesting to see the practical side of running a bunch of LLMs. Ops makes it happen. Coming from the old, really old, school of computing with massive multi-user, time-share systems, it is interesting to see how no matter how much computing changes, aspects of it remain the same. Throughput, latency, caching, and scheduling are still central. All that seems to have changed is the problem domain. We do, indeed, live in interesting times.
@conan_der_barbar 1 year ago
great talk! still waiting for the open source release 👀
@suleimanshehu5839 11 months ago
Please create a video on fine-tuning an MoE LLM, such as Mixtral 8x7B, with LoRA adapters within your framework
@Gerald-iz7mv 8 months ago
hi, do you have any links to benchmarks you can run to measure latency and throughput for different models, frameworks, etc.?
@fastcardlastname3353 1 year ago
This could change the landscape of multi-agent systems if it delivers on what's promised.