Many of us know scikit-learn for its ability to construct pipelines that can do .fit().predict(). It's an amazing feature for sure. But once you dive into the codebase ... you realise that there is just so much more.
This talk will be an attempt at demonstrating some extra features in scikit-learn, and its ecosystem, that are less well known but deserve to be in the spotlight.
In particular, I hope to discuss these things that scikit-learn can do:
- sparse datasets and models
- larger than memory datasets
- sample weight techniques
- image classification via embeddings
- tabular embeddings/vectorisation
- data deduplication
- pipeline caching
If time allows I may also touch on extra topics.
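
As a rough taste of a few of the topics listed above (sparse data, larger-than-memory training, sample weights and pipeline caching), here is a minimal sketch. The toy corpus, the batch size of four, and the weights are all invented for illustration and are not taken from the talk; the scikit-learn classes themselves are part of the public API.

```python
import tempfile

import numpy as np
from sklearn.feature_extraction.text import HashingVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.pipeline import make_pipeline

# Toy corpus, invented purely for illustration.
texts = ["good movie", "great film", "bad movie", "awful film"] * 5
labels = np.array([1, 1, 0, 0] * 5)

# 1) Sparse data + larger-than-memory training.
#    HashingVectorizer is stateless and returns scipy sparse matrices, so
#    batches can be streamed from disk and fed to partial_fit one at a time.
vectorizer = HashingVectorizer(n_features=2**10)
sgd = SGDClassifier(random_state=0)
for start in range(0, len(texts), 4):  # batch size 4 is an arbitrary choice
    X_batch = vectorizer.transform(texts[start:start + 4])
    y_batch = labels[start:start + 4]
    sgd.partial_fit(X_batch, y_batch, classes=np.array([0, 1]))

# 2) Sample weights: one weight per row, here made up so that later rows
#    count more than earlier ones.
weights = np.linspace(0.5, 1.5, num=len(texts))

# 3) Pipeline caching: with `memory` set, fitted transformers are cached on
#    disk, so refitting with the same data and transformer settings can
#    reuse the earlier work.
with tempfile.TemporaryDirectory() as cache_dir:
    pipe = make_pipeline(TfidfVectorizer(), LogisticRegression(), memory=cache_dir)
    # Fit parameters are routed to a step via the "<step name>__<param>" syntax.
    pipe.fit(texts, labels, logisticregression__sample_weight=weights)
    cached_preds = pipe.predict(["great movie", "awful movie"])
```

Caching pays off mostly during grid search, where the same expensive transformer would otherwise be refit for every candidate setting of the final estimator.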