Google DeepMind launched Gemini 2.0 models, including Flash, Flash-Lite, and Pro Experimental. Gemini 2.0 Flash outperforms Gemini 1.5 Pro while being 12x cheaper and offers features like multimodal input and a 1 million token context window. Gemini 2.0 Flash is now generally available, enabling developers to create production applications.
The "Flash Lite" model is noted for its cost-effectiveness. Gemini 2.0 models offer a 1-2 million long context. Gemini 2.0 Flash supports a context of 2 million tokens. The models offer multimodal input and output capabilities.
Cursor added MCP (Model Context Protocol) server integration, enabling enhanced functionality such as calling Perplexity for assistance directly through commands.
Unsloth introduced Dynamic 4-bit Quantization to improve model accuracy while maintaining VRAM efficiency by selectively quantizing parameters. This method enhances the performance of models like DeepSeek and Llama compared to standard quantization techniques, offering a nuanced approach to model compression.
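The core idea, leaving a few accuracy-critical modules un-quantized while the rest go to 4-bit, can be sketched with Hugging Face transformers and bitsandbytes; the skipped module names below are hypothetical, not Unsloth's actual per-model selection.

```python
# Rough sketch of selective ("dynamic") 4-bit quantization: quantize most
# linear layers to 4-bit NF4, but keep a few sensitive modules in higher
# precision. The skipped modules here are illustrative placeholders.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    # Modules to leave un-quantized (hypothetical selection):
    llm_int8_skip_modules=["lm_head", "model.layers.0.mlp.down_proj"],
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B-Instruct",  # any HF causal LM checkpoint
    quantization_config=bnb_config,
    device_map="auto",
)
```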
Andrej Karpathy released a 3h31m YouTube video providing a comprehensive overview of Large Language Models (LLMs), covering stages like pretraining, supervised fine-tuning, and reinforcement learning. He discusses topics such as data, tokenization, Transformer internals, and examples like GPT-2 training and Llama 3.1 base inference.
DeepLearning.AI introduced "How Transformer LLMs Work", a free course offering a deep dive into Transformer architecture, including topics like tokenizers, embeddings, and mixture-of-experts models. The course aims to help learners understand the inner workings of modern LLMs.
The Chain-of-Associated-Thoughts (CoAT) framework enhances LLMs' reasoning abilities by combining Monte Carlo Tree Search with dynamic knowledge integration. The approach aims to produce more comprehensive and accurate responses on complex reasoning tasks.
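A rough sketch of how such a loop could be structured is below; the retriever, generator, and evaluator are stubbed placeholders, and the names are illustrative rather than the paper's API.

```python
# Conceptual sketch of a CoAT-style loop: MCTS over partial reasoning chains,
# where each expansion pulls in "associative memories" (retrieved knowledge)
# before generating the next thought. All stub functions stand in for an
# LLM / retriever; nothing here is the paper's reference implementation.
import math, random

class Node:
    def __init__(self, chain, parent=None):
        self.chain = chain          # list of reasoning steps so far
        self.parent = parent
        self.children = []
        self.visits = 0
        self.value = 0.0

def ucb(node, c=1.4):
    if node.visits == 0:
        return float("inf")
    return node.value / node.visits + c * math.sqrt(
        math.log(node.parent.visits) / node.visits)

def retrieve_associations(chain):      # stub: dynamic knowledge lookup
    return f"[facts relevant to step {len(chain)}]"

def generate_thought(chain, memory):   # stub: LLM proposes the next step
    return f"thought-{len(chain)}-given-{memory}"

def evaluate(chain):                   # stub: LLM/critic scores the chain
    return random.random()

def coat_search(question, iterations=50):
    root = Node([question])
    for _ in range(iterations):
        node = root
        while node.children:                         # selection
            node = max(node.children, key=ucb)
        memory = retrieve_associations(node.chain)   # associate: inject knowledge
        child = Node(node.chain + [generate_thought(node.chain, memory)], node)
        node.children.append(child)                  # expansion
        reward = evaluate(child.chain)               # simulation / scoring
        while child:                                 # backpropagation
            child.visits += 1
            child.value += reward
            child = child.parent
    return max(root.children, key=lambda n: n.visits).chain

print(coat_search("Why does the moon show phases?"))
```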
Ladder-residual is an architecture modification that accelerates the 70B Llama model by ~30% on multi-GPU setups with tensor parallelism when used within Torchtune. The technique, developed at TogetherCompute, overlaps inter-GPU communication with computation, marking a significant stride in distributed inference efficiency.
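Conceptually, each sub-block computes on a residual stream that lags by one module, so the previous module's all-reduce overlaps with the current module's compute. The sketch below illustrates that pattern with torch.distributed async ops; it is a simplified illustration under my reading of the idea, not the paper's reference implementation, and all names are placeholders.

```python
# Conceptual sketch of the ladder-residual idea under tensor parallelism:
# re-route residual connections so the previous sub-block's all-reduce can
# overlap with the current sub-block's compute.
import torch
import torch.distributed as dist

def ladder_layer(x_residual, pending, attn, mlp):
    """
    x_residual : residual stream *before* the previous sub-block's output
                 has been reduced and added.
    pending    : (partial_output, async all-reduce handle) from the previous
                 sub-block, or None for the very first sub-block.
    """
    # 1. Attention runs on the stale residual stream; its all-reduce is
    #    launched asynchronously.
    attn_partial = attn(x_residual)
    attn_work = dist.all_reduce(attn_partial, async_op=True)

    # 2. While that all-reduce is in flight, fold in the previous sub-block's
    #    output, whose communication overlapped with the attention compute.
    if pending is not None:
        prev_partial, prev_work = pending
        prev_work.wait()
        x_residual = x_residual + prev_partial

    # 3. MLP runs on the updated stream; its all-reduce is handed to the next
    #    layer so the overlap pattern repeats.
    mlp_partial = mlp(x_residual)
    mlp_work = dist.all_reduce(mlp_partial, async_op=True)

    attn_work.wait()
    x_residual = x_residual + attn_partial
    return x_residual, (mlp_partial, mlp_work)
```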
A new paper introduces harmonic loss as an alternative to standard cross-entropy loss, claiming improved interpretability and faster convergence in neural networks and LLMs. While some express skepticism about its novelty, others see potential in its ability to shift optimization targets and improve model training dynamics.
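As I understand the paper, harmonic loss replaces dot-product logits plus softmax with Euclidean distances to per-class weight vectors and a "harmonic max" (class probability proportional to 1/distance^n). A minimal PyTorch sketch under that reading, with an illustrative exponent and epsilon:

```python
# Rough sketch of harmonic loss: probabilities come from inverse powers of the
# Euclidean distance between features and per-class weight vectors.
import torch

def harmonic_loss(features, class_weights, targets, n=2.0, eps=1e-9):
    """
    features      : (batch, dim)   penultimate-layer representations
    class_weights : (classes, dim) one weight vector per class
    targets       : (batch,)       integer class labels
    """
    dist = torch.cdist(features, class_weights)               # (batch, classes)
    # Harmonic max: p_i ∝ d_i^{-n}, computed stably in log space.
    log_unnorm = -n * torch.log(dist + eps)
    log_probs = log_unnorm - torch.logsumexp(log_unnorm, dim=-1, keepdim=True)
    return -log_probs[torch.arange(targets.shape[0]), targets].mean()

# Tiny usage example with random tensors.
feats = torch.randn(4, 16)
weights = torch.randn(10, 16, requires_grad=True)
labels = torch.tensor([1, 3, 3, 7])
loss = harmonic_loss(feats, weights, labels)
loss.backward()
```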
There are expectations for a Gemma 3 model with larger context sizes.
The TinyRAG project streamlines RAG implementations using llama-cpp-python and sqlite-vec for ranking, querying, and generating LLM answers.
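A minimal sketch of that kind of pipeline is below; it is not the project's actual code, and the model paths and embedding dimension (384) are placeholder assumptions.

```python
# TinyRAG-style pipeline: embed documents with llama-cpp-python, store vectors
# in SQLite via sqlite-vec, retrieve nearest chunks, and answer from them.
import sqlite3
import sqlite_vec
from sqlite_vec import serialize_float32
from llama_cpp import Llama

embedder = Llama(model_path="all-MiniLM-L6-v2.Q8_0.gguf", embedding=True, verbose=False)
llm = Llama(model_path="llama-3.2-1b-instruct.Q4_K_M.gguf", verbose=False)

def embed(text):
    return embedder.create_embedding(text)["data"][0]["embedding"]

db = sqlite3.connect(":memory:")
db.enable_load_extension(True)
sqlite_vec.load(db)
db.enable_load_extension(False)
db.execute("CREATE VIRTUAL TABLE chunks USING vec0(embedding float[384])")
db.execute("CREATE TABLE texts(id INTEGER PRIMARY KEY, body TEXT)")

docs = ["sqlite-vec stores vectors inside SQLite.",
        "llama-cpp-python runs GGUF models locally."]
for i, doc in enumerate(docs):
    db.execute("INSERT INTO texts(id, body) VALUES (?, ?)", (i, doc))
    db.execute("INSERT INTO chunks(rowid, embedding) VALUES (?, ?)",
               (i, serialize_float32(embed(doc))))

question = "How are the vectors stored?"
rows = db.execute(
    "SELECT rowid FROM chunks WHERE embedding MATCH ? ORDER BY distance LIMIT 2",
    (serialize_float32(embed(question)),)).fetchall()
context = "\n".join(db.execute("SELECT body FROM texts WHERE id = ?", r).fetchone()[0]
                    for r in rows)
answer = llm(f"Context:\n{context}\n\nQuestion: {question}\nAnswer:", max_tokens=128)
print(answer["choices"][0]["text"])
```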
Gemini 2.0 post: blog.google/te...
Unsloth - Dynamic 4-bit Quantization: unsloth.ai/blo...
Deep Dive into LLMs like ChatGPT: • Deep Dive into LLMs li...
How Transformer LLMs Work: www.deeplearni...
CoAT: Chain-of-Associated-Thoughts Framework for Enhancing Large Language Models Reasoning: arxiv.org/abs/...
Ladder-residual: parallelism-aware architecture for accelerating large model inference with communication overlapping: arxiv.org/abs/...
Harmonic Loss Trains Interpretable AI Models: arxiv.org/abs/...
TinyRAG in GH: github.com/wan...