The elephant in the room, which was not addressed here, is that the model+jar solution runs on the CPU, not the GPU, so it is 2 to 3 orders of magnitude slower. This is a good start, but that limitation should have been highlighted in the presentation. The Python ecosystem is riddled with bad practices, such as third-party git repos used as dependencies, which make reproducible builds hard to achieve. Python's tooling for observability and debugging is also way behind Java's. I am eagerly waiting for Java to support GPUs. Java will do wonders in the AI ecosystem once the GPU problem is solved.