Hi Owen, thanks for your work. It's really helpful 🙌 🙌
I'm trying to run DoReMi on the full set of 21 datasets to get the mixing weights used to create global_combine_v1, but I'm running into some CUDA version problems. As I understand it, DoReMi is only used to find the weights for the datasets to combine. Could you share the DoReMi weights that went into global_combine_v1, so I can avoid rerunning DoReMi?
Alternatively, if you already have global_combine_v1 itself, it would be great if you could share it. Thanks!
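In the meantime, here's a minimal sketch of how I'd apply the weights once available, assuming hypothetical dataset names and weights and using the Hugging Face `datasets` library (not necessarily how global_combine_v1 was actually built):

```python
# Minimal sketch: mix domains in proportion to DoReMi weights.
# Dataset names and weights below are placeholders, not the actual 21 datasets.
from datasets import load_dataset, interleave_datasets

domain_weights = {
    "dataset_a": 0.5,  # placeholder domains; weights should sum to 1
    "dataset_b": 0.3,
    "dataset_c": 0.2,
}

streams = [
    load_dataset(name, split="train", streaming=True)
    for name in domain_weights
]

# Sample from each domain with probability equal to its DoReMi weight
combined = interleave_datasets(
    streams,
    probabilities=list(domain_weights.values()),
    seed=42,
    stopping_strategy="all_exhausted",
)
```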
I'm thinking along the same lines: using Karpathy's llama2.c instead of Pythia, together with the high-quality synthetic datasets, to build 100-300 MB task-specific models.
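For sizing intuition, a quick back-of-envelope sketch (my own assumption of dense weights at fp32 or fp16, not llama2.c's actual export format): a 100-300 MB checkpoint corresponds to roughly 25-75M parameters at fp32, or 50-150M at fp16.

```python
# Back-of-envelope: checkpoint size scales as parameters x bytes-per-parameter.
# This is a sizing estimate only, not the llama2.c export format.
def checkpoint_size_mb(n_params: float, bytes_per_param: int) -> float:
    return n_params * bytes_per_param / 1e6

for n_params in (25e6, 75e6, 150e6):
    fp32 = checkpoint_size_mb(n_params, 4)  # 4 bytes/param
    fp16 = checkpoint_size_mb(n_params, 2)  # 2 bytes/param
    print(f"{n_params/1e6:.0f}M params: ~{fp32:.0f} MB fp32, ~{fp16:.0f} MB fp16")
```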
Did you publish your pipeline?
Yes, please take a look at https://github.com/emrgnt-cmplxty/sciphi for the data pipeline and https://github.com/emrgnt-cmplxty/SmolTrainer for the training pipeline.