Llama2 Installation Guide for Mac (M1 Chip)
Guide for setting up and running Llama2 on Mac systems with Apple silicon. This repo provides instructions
for installing prerequisites like Python and Git, cloning the necessary repositories, downloading and converting
the Llama models, and finally running the model with example prompts.
Prerequisites
Before starting, ensure your system meets the following requirements:
Python 3.8+ (Python 3.11 recommended):
Check your Python version:
python3 --version
Install Python 3.11 (if needed):
brew install python@3.11
Install Miniconda.
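One common way to get it is via Homebrew (a sketch; the official installer from docs.conda.io works just as well):
brew install --cask miniconda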
Cloning the Llama2 Repository
git clone https://github.com/facebookresearch/llama.git
Clone the llama C++ port repository:
git clone https://github.com/ggerganov/llama.cpp.git
Now, both repositories should be in your llama2 directory.
Inside the llama.cpp directory, build it:
cd llama.cpp
make
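If the build succeeds, the main and quantize binaries used in the later steps should appear in this directory; a quick sanity check:
./main --help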
Requesting Access to Llama Models
- Visit Meta AI Resources.
- Fill in your details in the request form.
- You’ll receive an email with a unique URL to download the models.
Downloading the Models
In your terminal, navigate to the llama directory:
cd llama
Run the download script:
/bin/bash ./download.sh
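Besides the URL, the script will typically ask which weights to download; the 7B chat model (7B-chat) is enough to follow this guide.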
When prompted, enter the custom URL from the email.
Converting the Downloaded Models
Navigate back to the llama.cpp repository:
cd llama.cpp
Create a conda environment named llama2, pinning a Python version so the environment has its own interpreter and pip installs land inside it:
conda create --name llama2 python=3.11
Activate the environment:
conda activate llama2
Install Python dependencies:
python3 -m pip install -r requirements.txt
Convert the model to f16 format:
python3 convert.py --outfile models/7B/ggml-model-f16.bin --outtype f16 ../llama2/llama-2-7b-chat --vocab-dir ../llama2
Note: If you encounter an error about a vocab size mismatch (the model has -1, but tokenizer.model has 32000), update the vocab_size field in params.json under ../llama2/llama-2-7b-chat from -1 to 32000.
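You can make the edit by hand, or apply it with sed (a sketch that assumes the file spells the field exactly as "vocab_size": -1):
sed -i '' 's/"vocab_size": -1/"vocab_size": 32000/' ../llama2/llama-2-7b-chat/params.json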
Quantize the model to reduce its size:
./quantize ./models/7B/ggml-model-f16.bin ./models/7B/ggml-model-q4_0.bin q4_0
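The q4_0 file should be a fraction of the size of the f16 file; you can compare the two with:
ls -lh ./models/7B/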
Running the Model
Execute the following command:
./main -m ./models/7B/ggml-model-q4_0.bin -n 1024 --repeat_penalty 1.0 --color -i -r "User:" -f ./prompts/chat-with-bob.txt
- -m: Path to the model file
- -n: Number of tokens to generate
- --color: Colorize output to distinguish user input from generated text
- -i: Interactive mode
- -r "User:": Reverse prompt; returns control to you when the model emits "User:"
- -f: Path to a prompt file
Now you’re ready to use Llama2!