deepseek-ai / deepseek-67b-base

DeepSeek LLM, an advanced language model comprising 67 billion parameters. Trained from scratch on a vast dataset of 2 trillion tokens in both English and Chinese

  • Public
  • 65 runs
  • GitHub
  • License

Run time and cost

This model runs on 2x Nvidia A100 (80GB) GPU hardware. We don't yet have enough runs of this model to provide performance information.

Readme

1. Deepseek 67B LLM

Introducing DeepSeek LLM, an advanced language model comprising 67 billion parameters. It has been trained from scratch on a vast dataset of 2 trillion tokens in both English and Chinese. In order to foster research, we have made DeepSeek LLM 7B/67B Base and DeepSeek LLM 7B/67B Chat open source for the research community.

2. Model Summary

deepseek-llm-67b-base is a 67B parameter model with Grouped-Query Attention trained on 2 trillion tokens from scratch. - Home Page: DeepSeek - Repository: deepseek-ai/deepseek-LLM - Chat With DeepSeek LLM: DeepSeek-LLM

3. How to Use

Here give some examples of how to use our model.

Text Completion

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig

model_name = "deepseek-ai/deepseek-llm-67b-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16, device_map="auto")
model.generation_config = GenerationConfig.from_pretrained(model_name)
model.generation_config.pad_token_id = model.generation_config.eos_token_id

text = "An attention function can be described as mapping a query and a set of key-value pairs to an output, where the query, keys, values, and output are all vectors. The output is"
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs.to(model.device), max_new_tokens=100)

result = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(result)

4. License

This code repository is licensed under the MIT License. The use of DeepSeek LLM models is subject to the Model License. DeepSeek LLM supports commercial use.

See the LICENSE-MODEL for more details.

5. Contact

If you have any questions, please raise an issue or contact us at service@deepseek.com.