Let's build a RAG App with Llama2 (Cloudflare Workers AI, Vectorize)

5,456 views

Days ago

In this workshop, Kristian Freeman, Cloudflare Developer Advocate, shows how to build a Retrieval Augmented Generation (RAG) app using Workers AI and Vectorize.
Learn how to build an alternate version of this app using the popular JS framework LangChain: • Build your own data-dr...
Find the source code for this project here: github.com/kri...
Follow CloudflareDev on Twitter: / cloudflaredev
Follow Kristian on Twitter: / kristianf_
Looking for more AI livecoding? Check out Kristian's KZbin channel: / @7dotdev
----------------------------------------
Deploy serverless code instantly across the globe to give it exceptional performance, reliability, and scale, with Cloudflare Workers.
#CloudflareWorkers #DeveloperQuickTakes #Streaming #CloudflareStream #CloudflareTutorials #Developer

Comments: 11

@bishwasmishra6447 1 month ago

Why do we need vectors? Is it for faster database retrieval?

@xiaoyulv8922 6 months ago

In this case, D1 becomes part of the LLM, right?

@danial_hamedi 3 months ago

Seems like we cannot create Vectorize indexes on free plans (=

@DiscipleDown 2 months ago

The Cloudflare documentation on this is confusing. It seems to indicate that a free tier is or will be available. Perhaps it's limited to paid plans since it's still in beta. I think Xata is probably the best free-tier serverless database offering with vector support at the moment.

@JoaoPiovensan 1 month ago

@@DiscipleDown Do you know if Xata has any docs on integrating with Cloudflare?

@DiscipleDown 1 month ago

@@JoaoPiovensan Yes, they do have some docs regarding configuration for sites and workers. If you are using native workers (no framework), it should be very straightforward. However, one thing to keep in mind is that Xata has no native geo-replication support. So if you are building an app that needs to scale globally, you'll have to roll your own solution there. Xata has a decent writeup on how this can be done, and their solution could technically be achieved with workers.

For my use case, though, I went a different route. I'm in the process of writing a middleware in Go, deployed on Fly. This gives me a single endpoint routed to the region closest to the worker making the request. The middleware then proxies that request to the various Xata regions/databases. Eventual consistency is good enough for my use case.

The way I handle writes is by making asynchronous calls to every region using goroutines. When the first database responds with a successful write, we return the UUID (generated by the middleware) to the worker. Any pending writes in other regions continue to be processed. It's a little more complex than I make it out to be, because you also need to log any failed writes and handle them accordingly until all databases are in sync.

On the read side, we make asynchronous requests to the three closest database regions. The first region to respond with data satisfies the request. If all three regions come back with no rows, then we return that empty result.

If you're interested... I may release the source code or potentially launch a hosted service when the middleware is production ready. I haven't decided yet. Still very much beta.
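
A rough sketch of the write and read pattern described in the comment above. It is written in Python rather than the Go the commenter mentions, and it is not their actual middleware. Every name here (`replicated_write`, `raced_read`, `failed_writes`, and the per-region writer/reader callables) is hypothetical; real database client calls would replace the stand-in callables.

```python
import asyncio
import uuid
from typing import Awaitable, Callable, Dict, List, Tuple

# Hypothetical shapes: a writer stores (record_id, payload) in one region and
# raises on failure; a reader returns a list of rows for a key.
RegionWriter = Callable[[str, dict], Awaitable[None]]
RegionReader = Callable[[str], Awaitable[list]]

# (region, record_id) pairs that still need to be retried to reach consistency.
failed_writes: List[Tuple[str, str]] = []

# Strong references so in-flight writes are not garbage collected
# after replicated_write has already returned.
_background: set = set()


async def _write_and_log(region: str, writer: RegionWriter,
                         record_id: str, payload: dict) -> bool:
    """Write to one region; record the failure instead of raising."""
    try:
        await writer(record_id, payload)
        return True
    except Exception:
        failed_writes.append((region, record_id))
        return False


async def replicated_write(writers: Dict[str, RegionWriter], payload: dict) -> str:
    """Fan a write out to every region; return once any region succeeds."""
    record_id = str(uuid.uuid4())  # ID is generated by the middleware
    tasks = []
    for region, writer in writers.items():
        task = asyncio.create_task(_write_and_log(region, writer, record_id, payload))
        _background.add(task)
        task.add_done_callback(_background.discard)
        tasks.append(task)
    for finished in asyncio.as_completed(tasks):
        if await finished:
            # Writes to the other regions keep running on the event loop.
            return record_id
    raise RuntimeError("write failed in every region")


async def raced_read(readers: List[RegionReader], key: str) -> list:
    """Query several regions at once; the first non-empty answer wins."""
    tasks = [asyncio.create_task(reader(key)) for reader in readers]
    try:
        for finished in asyncio.as_completed(tasks):
            try:
                rows = await finished
            except Exception:
                continue  # treat a failed region like an empty one
            if rows:
                return rows
        return []  # every region answered with no rows
    finally:
        for task in tasks:
            task.cancel()  # no-op for tasks that already finished
```

This trades consistency for latency: a read that lands on a region whose write is still pending can miss the row, which is the eventual-consistency caveat the comment mentions. Draining `failed_writes` with retries is left out of the sketch.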

@DiscipleDown 1 month ago

@@JoaoPiovensan Yes, Cloudflare has some documentation on this. If you're using native workers (no framework), it should be straightforward. One thing to keep in mind is that Xata has no native geo-replication support. You would have to roll your own solution in-app for partitioning and replication. Xata has a decent writeup on one way this can be done. Personally, I opted to develop a middleware running on Fly that handles replication and partitioning. So, on the worker side, I just make a simple request to the Fly endpoint, which resolves to the closest Fly region and, by extension, the closest Xata database. Low-latency reads and writes from every region with eventual consistency.

@JoaoPiovensan 1 month ago

@@DiscipleDown That helps a lot, thanks! A little more complex than I would expect (I've never done anything similar to what you did), but okay. If you don't mind me asking, what problems come from going without geo-replication? For learning projects it shouldn't be a big deal, right?