This post was submitted on 29 Jun 2023

Animal House


Discussion area for the main blog: animal-machine.com. Feel free to comment here to discuss any of my blog posts.

Rules:

  1. Excessive hate speech, such as racism, will not be tolerated.
  2. Excessive self-promotion or advertisement will probably get modded.
  3. Try to be kind where possible. At the very least, be respectful when disagreeing.


This is my step-by-step guide on how to replicate the fine-tuning of the example datasets using axolotl.
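Before diving into the full guide, it may help to see what axolotl is doing conceptually. axolotl is driven by a YAML config and launched from the command line, but the QLoRA setup it performs boils down to roughly the following transformers/peft calls. This is a minimal sketch, not axolotl's actual code; the model id, adapter hyperparameters, and paths are illustrative placeholders.

```python
# Minimal sketch of a QLoRA-style setup, roughly what axolotl's example
# configs wire up for you. NOT axolotl's actual code; hyperparameters
# below are placeholders (axolotl reads them from the YAML config).
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

base_id = "openlm-research/open_llama_3b"  # the base model the post experiments with

# Load the base model 4-bit quantized via bitsandbytes (the "q" in qlora).
bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForCausalLM.from_pretrained(
    base_id, quantization_config=bnb, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

# Attach low-rank adapters; only these small matrices are trained.
lora = LoraConfig(
    r=8,  # assumed rank, purely illustrative
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)
model.print_trainable_parameters()  # a tiny fraction of the full model
```

Training then proceeds over the tokenized example dataset with standard transformers machinery; axolotl handles that plumbing for you from the YAML config.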

Last I checked, the bitsandbytes library copy workaround was still needed, and open-llama-3b was still problematic to quantize, but hopefully those issues will be solved at some point.

What I didn't know when I first wrote the post was that it is possible to load the fine-tuned LoRA file in a frontend like text-generation-webui. I have since updated the text to account for that. There are performance side effects to loading just the qlora adapter in the webui, beyond the penalty to load time. The post should show how fast text inference is with little context, in tokens/s, when using the transformers library with the source model in f16 or quantized to 8-bit and 4-bit, and how fast I can run a merged q4_0 quantization.
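To make that comparison concrete, here is roughly what the two options look like with transformers and peft: loading the base model with the qlora adapter stacked on top, versus merging the adapter into the base weights (which you would then convert and quantize, e.g. to q4_0 with llama.cpp). A minimal sketch only; `adapter_dir` is a hypothetical output path, and exact kwargs may differ by library version.

```python
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "openlm-research/open_llama_3b"  # assumed base model
adapter_dir = "qlora-out"                  # hypothetical path to the finetuned adapter

tokenizer = AutoTokenizer.from_pretrained(base_id)

# Option A: load the base model in f16 and stack the LoRA adapter on top.
# Swap in load_in_8bit=True or load_in_4bit=True to compare quantized speeds.
base = AutoModelForCausalLM.from_pretrained(
    base_id, torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(base, adapter_dir)

# Rough tokens/s measurement with little context.
inputs = tokenizer("Write a short poem about raccoons.", return_tensors="pt").to(base.device)
start = time.time()
out = model.generate(**inputs, max_new_tokens=128)
new_tokens = out.shape[-1] - inputs["input_ids"].shape[-1]
print(f"{new_tokens / (time.time() - start):.1f} tokens/s")

# Option B: merge the adapter into the base weights. The merged model has
# no per-token adapter overhead and can be converted for llama.cpp and
# quantized (e.g. to q4_0) for the fastest runs.
merged = model.merge_and_unload()
merged.save_pretrained("merged-model")
tokenizer.save_pretrained("merged-model")
```

Note that merging only works cleanly when the base model is loaded in full or half precision; merging into a bitsandbytes-quantized model was not supported at the time, as far as I know.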

1 comment
[email protected] 2 points 1 year ago

@InattentiveRaccoon
This is a great guide on fine-tuning with Axolotl! I have been trying to find GitHub projects for fine-tuning llama2 models, and there aren't many complete examples. I was finally able to do it thanks to you!