As many of you know, I have been building a personal AI assistant using Large Language Model transformers. The goal is to have it access documents and internet feeds to take care of different tasks.
This week I have been experimenting with Meta’s newest iteration of their open-source transformer, Llama 3. I downloaded their 8B model and started some experiments. Initial results show that their changes to the EOT (end-of-turn) or stop token cause the model to continuously answer itself. While this leads to some amusing replies, it’s a little less than ideal. I need to go back to their documentation page and update the ‘assistant’ end-of-turn token (Llama 3 uses <|eot_id|>). The GGUF quant of Llama 3 (from Quant) works properly, but I’m not as happy with the decoded results it seems to be giving. I’ll have to continue exploring solutions in LangChain.
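To make the stop-token issue concrete, here is a minimal, dependency-free sketch of what goes wrong: if the serving stack doesn’t treat Llama 3’s <|eot_id|> as a stop token, the generated text runs past the assistant’s turn and the model starts “answering itself.” Truncating at the first stop token recovers the intended reply. The token names are Llama 3’s; the helper function itself is illustrative, not a LangChain or llama.cpp API.

```python
# Llama 3's special tokens; <|eot_id|> marks the end of a turn.
LLAMA3_STOP_TOKENS = ["<|eot_id|>", "<|end_of_text|>"]

def truncate_at_stop(text: str, stop_tokens=LLAMA3_STOP_TOKENS) -> str:
    """Cut generated text at the earliest stop token, if any."""
    cut = len(text)
    for tok in stop_tokens:
        idx = text.find(tok)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]

# Without truncation, the raw stream appears to start a new turn
# and the model keeps talking to itself:
raw = "Sure, here you go.<|eot_id|><|start_header_id|>user<|end_header_id|>..."
print(truncate_at_stop(raw))  # -> "Sure, here you go."
```

In practice you’d pass these strings as the `stop` parameter of whatever LLM wrapper you’re using rather than post-processing by hand, but the effect is the same.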
I have also been comparing Mixtral (8x7B and 8x22B) and Qwen (32B); both have been giving amazing and snappy results on my MacBook M3 Max with 36 GB of RAM. My results are consistent with the benchmarks and prototyping results others are getting.
I’ll end by saying, it’s an exciting time to be experimenting with LLMs. There are so many open source models that are hitting very high marks, and fine tuning and building and chaining RAGs onto the base models is becoming significantly more powerful.
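For anyone curious what “chaining RAG onto a base model” boils down to, here is a minimal, dependency-free sketch: retrieve the most relevant document, then splice it into the prompt before sending it to the model. A real pipeline would use embeddings and a vector store (which LangChain wraps for you); the retriever and every name here are illustrative assumptions, not any library’s API.

```python
# Toy retrieval-augmented-generation (RAG) step: pick the document
# sharing the most words with the query, then build an augmented prompt.

def retrieve(query: str, docs: list[str]) -> str:
    """Return the doc with the highest word overlap with the query."""
    q = set(query.lower().split())
    return max(docs, key=lambda d: len(q & set(d.lower().split())))

def build_prompt(query: str, docs: list[str]) -> str:
    """Splice the retrieved context into the prompt for the base model."""
    context = retrieve(query, docs)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "Llama 3 uses <|eot_id|> as its end-of-turn token.",
    "Mixtral is a sparse mixture-of-experts model.",
]
print(build_prompt("What stop token does Llama 3 use?", docs))
```

The augmented prompt would then go to the base model unchanged, which is why this works on any of the open-source models mentioned above without fine-tuning.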
#llm #qwen #llama3 #langchain #mixtral #ai
Hi. I'm Scott Sullivan, a slave of Christ, author, AI programmer, and animator. I split my time between the countryside of Lancaster, PA, and Northern Italy, near Cinque Terre and La Spezia.
In addition to improving lives through data analytics with my BS in Computer Science, I also published Searching For Me, my first memoir, about my adoption, my search for my biological family, and how it affected my faith.