# LoRA Adapter Merger - Cog Implementation
This is a Cog implementation for merging LoRA adapters back into their base models. Cog is a tool that packages machine learning models in standard, production-ready containers.
## Files

- `cog.yaml`: Configuration file that defines the Docker environment, Python version, and dependencies
- `predict.py`: Defines the prediction interface with a `Predictor` class that has `setup()` and `predict()` methods (sketched below)
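For orientation, here is a minimal sketch of what a `Predictor` with this interface can look like. The parameter names mirror the input table below; the body is illustrative, not the actual implementation:

```python
from cog import BasePredictor, Input, Path


class Predictor(BasePredictor):
    def setup(self):
        # Runs once when the container starts; a good place for
        # one-time initialization such as importing heavy libraries.
        pass

    def predict(
        self,
        adapter_path: Path = Input(description="LoRA adapter archive (.tar.gz or .zip)"),
        base_model_id: str = Input(default="", description="HuggingFace model ID"),
        merge_method: str = Input(
            default="merged_16bit",
            choices=["merged_16bit", "merged_4bit", "merged_4bit_forced", "lora", "standard"],
        ),
        use_unsloth: bool = Input(default=True),
    ) -> Path:
        # Sketch only: extract the adapter archive, merge it into the
        # base model, re-archive the result, and return the archive path.
        ...
```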
## Installation

First, install Cog:

```bash
# macOS
brew install cog

# Linux/Windows (via curl)
sudo curl -o /usr/local/bin/cog -L "https://github.com/replicate/cog/releases/latest/download/cog_$(uname -s)_$(uname -m)"
sudo chmod +x /usr/local/bin/cog
```
## Usage

### 1. Build the Docker Image

```bash
cog build
```
This builds a Docker image with all dependencies specified in `cog.yaml`.
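For reference, a `cog.yaml` for this kind of setup typically looks something like the following; the package list here is an assumption, not the actual file:

```yaml
build:
  gpu: true
  python_version: "3.11"
  python_packages:
    # Assumed dependencies for LoRA merging; the real file may differ.
    - torch
    - transformers
    - peft
    - unsloth
predict: "predict.py:Predictor"
```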
### 2. Run a Prediction

First, prepare your LoRA adapter files as a `.tar.gz` or `.zip` archive:

```bash
# Create an archive of your adapter files
tar -czf my_adapter.tar.gz adapter_config.json adapter_model.safetensors
```
Then run the prediction:
```bash
# With Unsloth (recommended for Unsloth-trained models)
cog predict \
  -i adapter_path=@my_adapter.tar.gz \
  -i base_model_id="meta-llama/Llama-2-7b-hf" \
  -i merge_method="merged_16bit" \
  -i use_unsloth=true

# Standard PEFT merging
cog predict \
  -i adapter_path=@my_adapter.tar.gz \
  -i base_model_id="meta-llama/Llama-2-7b-hf" \
  -i merge_method="standard" \
  -i use_unsloth=false
```
The output will be a `.tar.gz` file containing the merged model.
### 3. Extract the Merged Model

```bash
tar -xzf output.tar.gz
```
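To sanity-check the merge, you can load the extracted weights with Transformers. A minimal sketch, assuming the archive extracts into a `merged_model/` directory (the actual directory name may differ):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# "merged_model" is an assumed extraction directory; adjust as needed.
model = AutoModelForCausalLM.from_pretrained("merged_model", torch_dtype="auto")
tokenizer = AutoTokenizer.from_pretrained("merged_model")

inputs = tokenizer("Hello, world!", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```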
## Input Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `adapter_path` | Path | Required | Path to LoRA adapter (as `.tar.gz` or `.zip`) |
| `base_model_id` | String | `""` | HuggingFace model ID (auto-detected if empty) |
| `merge_method` | Choice | `"merged_16bit"` | Method: `merged_16bit`, `merged_4bit`, `merged_4bit_forced`, `lora`, `standard` |
| `use_unsloth` | Boolean | `true` | Use Unsloth for merging (recommended for Unsloth-trained models) |
## Merge Methods

### Unsloth Methods (Recommended for Unsloth-trained models)

- `merged_16bit`: ✅ Best for production use with vLLM and inference engines
- `merged_4bit`: For special cases (DPO training, HuggingFace inference)
- `merged_4bit_forced`: Force 4-bit merge (use with caution)
- `lora`: Save only LoRA adapters without full merge

### Standard Method

- `standard`: Standard PEFT `merge_and_unload()` method
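For reference, the two paths roughly boil down to the following. This is a sketch of the underlying library calls (Unsloth's `save_pretrained_merged` and PEFT's `merge_and_unload()`), not the exact code in `predict.py`; the paths and model ID are placeholders:

```python
# Unsloth path: load the adapter, then save a merged copy.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained("path/to/adapter")
model.save_pretrained_merged("merged_model", tokenizer, save_method="merged_16bit")

# Standard path: attach the adapter with PEFT and fold it into the base weights.
from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
merged = PeftModel.from_pretrained(base, "path/to/adapter").merge_and_unload()
merged.save_pretrained("merged_model")
```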
## Deploy to Replicate

To deploy this model to Replicate:

```bash
# Login to Replicate
cog login

# Push to Replicate
cog push r8.im/your-username/lora-merger
```
Once pushed, you can run predictions via Replicate’s API or web interface.
## API Usage (After Deploying to Replicate)

```python
import replicate

output = replicate.run(
    "your-username/lora-merger:version-id",
    input={
        "adapter_path": "https://url-to-your-adapter.tar.gz",
        "base_model_id": "meta-llama/Llama-2-7b-hf",
        "merge_method": "merged_16bit",
        "use_unsloth": True,
    },
)

# Download the merged model
print(output)  # URL to the merged model .tar.gz
```
## Testing Locally

You can test the Docker container locally:

```bash
# Start a prediction server
cog run -p 5000 python -m cog.server.http

# In another terminal, make a request
curl http://localhost:5000/predictions \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{
    "input": {
      "adapter_path": "https://url-to-adapter.tar.gz",
      "base_model_id": "meta-llama/Llama-2-7b-hf",
      "merge_method": "merged_16bit",
      "use_unsloth": true
    }
  }'
```
## System Requirements
- Docker
- NVIDIA GPU with CUDA support (for GPU acceleration)
- Sufficient disk space for model downloads and merging
## Notes

- The merged model is returned as a `.tar.gz` archive
- Files are automatically cleaned up after prediction completes
- GPU is enabled by default in `cog.yaml` for faster processing
- Unsloth provides optimized merging that's faster and more memory-efficient
## Troubleshooting

**Out of memory errors**: Try a machine with more RAM/VRAM, or use a 4-bit merge method to reduce memory use.

**Unsloth not available**: If the Unsloth installation fails, set `use_unsloth=false` to fall back to standard PEFT merging.

**Base model not found**: Ensure the `base_model_id` is correct and the model is accessible on the HuggingFace Hub.
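A quick way to check whether a model ID resolves (assuming the `huggingface_hub` package is available):

```python
from huggingface_hub import model_info

# Raises an error if the repo doesn't exist or is gated without access.
print(model_info("meta-llama/Llama-2-7b-hf").id)
```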
## License
This implementation is provided as-is. Please ensure you comply with the licenses of the base models and adapters you use.