Llama2 Installation Guide for Mac (M1 Chip)
Guide for setting up and running Llama2 on Mac systems with Apple silicon. This repo provides instructions
for installing prerequisites like Python and Git, cloning the necessary repositories, downloading and converting
the Llama models, and finally running the model with example prompts.
Prerequisites
Before starting, ensure your system meets the following requirements:
Python 3.8+ (Python 3.11 recommended):
Check your Python version:
python3 --version
Install Python 3.11 (if needed):
brew install python@3.11
Install Miniconda.
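One common way to get it is via Homebrew (a sketch; the official installer from docs.conda.io works just as well):
brew install --cask miniconda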
Cloning the Llama2 Repository
git clone https://github.com/facebookresearch/llama.git
Clone the llama C++ port repository:
git clone https://github.com/ggerganov/llama.cpp.git
Now, both repositories should be in your llama2 directory.
Inside the llama.cpp directory, build it:
cd llama.cpp
make
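If the build succeeds, the main and quantize binaries used in the later steps should appear in this directory; a quick sanity check:
./main --help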
Requesting Access to Llama Models
- Visit Meta AI Resources.
- Fill in your details in the request form.
- You’ll receive an email with a unique URL to download the models.
Downloading the Models
In your terminal, navigate to the llama directory:
cd llama
Run the download script:
/bin/bash ./download.sh
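Besides the URL, the script will typically ask which weights to download; the 7B chat model (7B-chat) is enough to follow this guide.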
When prompted, enter the custom URL from the email.
Converting the Downloaded Models
Navigate back to the llama.cpp repository:
cd llama.cpp
Create a conda environment named llama2, pinning a Python version so the environment has its own interpreter and pip installs land inside it:
conda create --name llama2 python=3.11
Activate the environment:
conda activate llama2
Install Python dependencies:
python3 -m pip install -r requirements.txt
Convert the model to f16 format:
python3 convert.py --outfile models/7B/ggml-model-f16.bin --outtype f16 ../llama2/llama-2-7b-chat --vocab-dir ../llama2
Note: If you encounter an error about a vocab size mismatch (the model has -1, but tokenizer.model has 32000), update the vocab_size field in params.json under ../llama2/llama-2-7b-chat from -1 to 32000.
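You can make the edit by hand, or apply it with sed (a sketch that assumes the file spells the field exactly as "vocab_size": -1):
sed -i '' 's/"vocab_size": -1/"vocab_size": 32000/' ../llama2/llama-2-7b-chat/params.json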
Quantize the model to reduce its size:
./quantize ./models/7B/ggml-model-f16.bin ./models/7B/ggml-model-q4_0.bin q4_0
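The q4_0 file should be a fraction of the size of the f16 file; you can compare the two with:
ls -lh ./models/7B/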
Running the Model
Execute the following command:
./main -m ./models/7B/ggml-model-q4_0.bin -n 1024 --repeat_penalty 1.0 --color -i -r "User:" -f ./prompts/chat-with-bob.txt
- -m: Path to the model file
- -n: Number of tokens to generate
- --color: Colorize output to distinguish user input from generated text
- -i: Interactive mode
- -r "User:": Reverse prompt; returns control to you when the model emits "User:"
- -f: Path to a prompt file
Now you’re ready to use Llama2!