# LoRA Adapter Merger - Cog Implementation
This is a Cog implementation for merging LoRA adapters back into their base models. Cog is a tool that packages machine learning models in standard, production-ready containers.
## Files

- `cog.yaml`: Configuration file that defines the Docker environment, Python version, and dependencies
- `predict.py`: Defines the prediction interface with a `Predictor` class that has `setup()` and `predict()` methods (sketched below)
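For orientation, here is a minimal sketch of what a `Predictor` with this interface can look like. The parameter names mirror the input table below; the body is illustrative, not the actual implementation:

```python
from cog import BasePredictor, Input, Path


class Predictor(BasePredictor):
    def setup(self):
        # Runs once when the container starts; a good place for
        # one-time initialization such as importing heavy libraries.
        pass

    def predict(
        self,
        adapter_path: Path = Input(description="LoRA adapter archive (.tar.gz or .zip)"),
        base_model_id: str = Input(default="", description="HuggingFace model ID"),
        merge_method: str = Input(
            default="merged_16bit",
            choices=["merged_16bit", "merged_4bit", "merged_4bit_forced", "lora", "standard"],
        ),
        use_unsloth: bool = Input(default=True),
    ) -> Path:
        # Sketch only: extract the adapter archive, merge it into the
        # base model, re-archive the result, and return the archive path.
        ...
```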
## Installation

First, install Cog:

```bash
# macOS
brew install cog

# Linux/Windows (via curl)
sudo curl -o /usr/local/bin/cog -L "https://github.com/replicate/cog/releases/latest/download/cog_$(uname -s)_$(uname -m)"
sudo chmod +x /usr/local/bin/cog
```
## Usage

### 1. Build the Docker Image

```bash
cog build
```
This builds a Docker image with all dependencies specified in `cog.yaml`.
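For reference, a `cog.yaml` for this kind of setup typically looks something like the following; the package list here is an assumption, not the actual file:

```yaml
build:
  gpu: true
  python_version: "3.11"
  python_packages:
    # Assumed dependencies for LoRA merging; the real file may differ.
    - torch
    - transformers
    - peft
    - unsloth
predict: "predict.py:Predictor"
```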
### 2. Run a Prediction

First, prepare your LoRA adapter files as a `.tar.gz` or `.zip` archive:

```bash
# Create an archive of your adapter files
tar -czf my_adapter.tar.gz adapter_config.json adapter_model.safetensors
```
Then run the prediction:
```bash
# With Unsloth (recommended for Unsloth-trained models)
cog predict \
  -i adapter_path=@my_adapter.tar.gz \
  -i base_model_id="meta-llama/Llama-2-7b-hf" \
  -i merge_method="merged_16bit" \
  -i use_unsloth=true

# Standard PEFT merging
cog predict \
  -i adapter_path=@my_adapter.tar.gz \
  -i base_model_id="meta-llama/Llama-2-7b-hf" \
  -i merge_method="standard" \
  -i use_unsloth=false
```
The output will be a `.tar.gz` file containing the merged model.
### 3. Extract the Merged Model

```bash
tar -xzf output.tar.gz
```
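To sanity-check the merge, you can load the extracted weights with Transformers. A minimal sketch, assuming the archive extracts into a `merged_model/` directory (the actual directory name may differ):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# "merged_model" is an assumed extraction directory; adjust as needed.
model = AutoModelForCausalLM.from_pretrained("merged_model", torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained("merged_model")

inputs = tokenizer("Hello, world!", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```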
## Input Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `adapter_path` | Path | Required | Path to LoRA adapter (as `.tar.gz` or `.zip`) |
| `base_model_id` | String | `""` | HuggingFace model ID (auto-detected if empty) |
| `merge_method` | Choice | `"merged_16bit"` | Method: `merged_16bit`, `merged_4bit`, `merged_4bit_forced`, `lora`, `standard` |
| `use_unsloth` | Boolean | `true` | Use Unsloth for merging (recommended for Unsloth-trained models) |
## Merge Methods

### Unsloth Methods (Recommended for Unsloth-trained models)

- `merged_16bit`: ✅ Best for production use with vLLM and inference engines
- `merged_4bit`: For special cases (DPO training, HuggingFace inference)
- `merged_4bit_forced`: Force 4-bit merge (use with caution)
- `lora`: Save only LoRA adapters without full merge

### Standard Method

- `standard`: Standard PEFT `merge_and_unload()` method
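For reference, the two paths roughly boil down to the following. This is a sketch of the underlying library calls (Unsloth's `save_pretrained_merged` and PEFT's `merge_and_unload()`), not the exact code in `predict.py`; the paths and model ID are placeholders:

```python
# Unsloth path: load the adapter, then save a merged copy.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained("path/to/adapter")
model.save_pretrained_merged("merged_model", tokenizer, save_method="merged_16bit")

# Standard path: attach the adapter with PEFT and fold it into the base weights.
from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
merged = PeftModel.from_pretrained(base, "path/to/adapter").merge_and_unload()
merged.save_pretrained("merged_model")
```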
## Deploy to Replicate

To deploy this model to Replicate:

```bash
# Login to Replicate
cog login

# Push to Replicate
cog push r8.im/your-username/lora-merger
```
Once pushed, you can run predictions via Replicate’s API or web interface.
## API Usage (After Deploying to Replicate)

```python
import replicate

output = replicate.run(
    "your-username/lora-merger:version-id",
    input={
        "adapter_path": "https://url-to-your-adapter.tar.gz",
        "base_model_id": "meta-llama/Llama-2-7b-hf",
        "merge_method": "merged_16bit",
        "use_unsloth": True,
    },
)

# Download the merged model
print(output)  # URL to the merged model .tar.gz
```
## Testing Locally

You can test the Docker container locally:

```bash
# Start a prediction server
cog run -p 5000 python -m cog.server.http

# In another terminal, make a request
curl http://localhost:5000/predictions \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "input": {
      "adapter_path": "https://url-to-adapter.tar.gz",
      "base_model_id": "meta-llama/Llama-2-7b-hf",
      "merge_method": "merged_16bit",
      "use_unsloth": true
    }
  }'
```
## System Requirements
- Docker
- NVIDIA GPU with CUDA support (for GPU acceleration)
- Sufficient disk space for model downloads and merging
## Notes

- The merged model is returned as a `.tar.gz` archive
- Files are automatically cleaned up after prediction completes
- GPU is enabled by default in `cog.yaml` for faster processing
- Unsloth provides optimized merging that's faster and more memory-efficient
## Troubleshooting

**Out of memory errors**: Try a machine with more RAM/VRAM, or use a 4-bit merge method to reduce memory use.

**Unsloth not available**: If the Unsloth installation fails, set `use_unsloth=false` to fall back to standard PEFT merging.

**Base model not found**: Ensure the `base_model_id` is correct and the model is accessible on the HuggingFace Hub.
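A quick way to check whether a model ID resolves (assuming the `huggingface_hub` package is available):

```python
from huggingface_hub import model_info

# Raises an error if the repo doesn't exist or is gated without access.
print(model_info("meta-llama/Llama-2-7b-hf").id)
```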
## License
This implementation is provided as-is. Please ensure you comply with the licenses of the base models and adapters you use.