If you enjoy learning about LLMs, make sure to also watch my tutorial on prompt engineering: kzbin.info/www/bejne/Y3Olpp99gpurfJI
@boredcrow7285 3 months ago
Straight to the point, pretty great! I have a question about SentencePiece: does the model split the corpus down to the character level and then proceed like BPE or WordPiece, instead of first splitting on spaces as is usually done for English?
@datamlistic 3 months ago
Thanks! Yes, SentencePiece treats the space as a stand-alone character, so no space-based pre-tokenization is done there.
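To make the difference concrete, here is a rough sketch in plain Python. It is not the SentencePiece library's code: the function names are made up for illustration, and the real library also normalizes the text, which is skipped here. The only detail taken from SentencePiece itself is the "▁" (U+2581) meta symbol it uses to mark spaces.

```python
# Illustrative sketch only: NOT the SentencePiece library's implementation.
# All function names below are hypothetical.

WS = "\u2581"  # "▁", the meta symbol SentencePiece uses to mark spaces


def whitespace_pretokenize(text):
    """Typical BPE/WordPiece setup: split on spaces first, then into characters.

    The spaces are discarded, so merges can never cross a word boundary.
    """
    return [list(word) for word in text.split()]


def sentencepiece_style_symbols(text):
    """SentencePiece-style setup: keep one raw stream, spaces become a symbol."""
    return list(text.replace(" ", WS))


def detokenize(symbols):
    """Spaces survive as symbols, so the original text can be rebuilt exactly."""
    return "".join(symbols).replace(WS, " ")


text = "hello world"
words = whitespace_pretokenize(text)
symbols = sentencepiece_style_symbols(text)

print(words)                # two separate character lists, one per word
print(symbols)              # one character list with WS between the words
print(detokenize(symbols))  # the original text
```

From there, the subword algorithm (BPE or unigram) runs on the single symbol stream, so "▁" can end up inside learned pieces like any other character. That is also why the same approach works for languages that do not separate words with spaces.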