deepseek-ai / deepseek-67b-base

DeepSeek LLM is an advanced language model comprising 67 billion parameters, trained from scratch on a vast dataset of 2 trillion tokens in both English and Chinese.

  • Public
  • 431 runs
  • 2x A100 (80GB)
  • GitHub
  • License

Input

prompt (string)

Input prompt.

Default: "An attention function can be described as mapping a query and a set of key-value pairs to an output, where the query, keys, values, and output are all vectors. The output is"

max_new_tokens (integer, minimum: 1, maximum: 4096)

The maximum number of tokens the model should generate as output.

Default: 256

temperature (number)

The value used to modulate the next token probabilities.

Default: 0.7

top_p (number)

A probability threshold for generating the output. If < 1.0, only keep the top tokens with cumulative probability >= top_p (nucleus filtering). Nucleus filtering is described in Holtzman et al. (http://arxiv.org/abs/1904.09751).

Default: 0.8

top_k (integer)

The number of highest probability tokens to consider for generating the output. If > 0, only keep the top k tokens with highest probability (top-k filtering). A short sketch of how temperature, top_k, and top_p combine appears after this parameter list.

Default: 50

repetition_penalty (number)

Repetition penalty; values above 1.0 discourage the model from repeating tokens it has already generated.

Default: 1.05
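The following is a minimal, illustrative sketch of how the sampling parameters above are commonly combined when choosing the next token (temperature scaling, then top-k, then top-p filtering). It is not the code this endpoint actually runs, and the helper name sample_next_token is invented for the example.

import torch

def sample_next_token(logits, temperature=0.7, top_k=50, top_p=0.8):
    """Toy next-token sampler: temperature, then top-k, then top-p (nucleus) filtering."""
    # Temperature scaling: values below 1.0 sharpen the distribution, values above flatten it.
    logits = logits / temperature

    # Top-k filtering: keep only the top_k highest-probability tokens.
    if top_k > 0:
        kth_best = torch.topk(logits, top_k).values[..., -1, None]
        logits = logits.masked_fill(logits < kth_best, float("-inf"))

    # Top-p (nucleus) filtering: keep the smallest set of tokens whose
    # cumulative probability reaches top_p.
    if top_p < 1.0:
        sorted_logits, sorted_idx = torch.sort(logits, descending=True)
        sorted_probs = torch.softmax(sorted_logits, dim=-1)
        cumulative = sorted_probs.cumsum(dim=-1)
        outside_nucleus = (cumulative - sorted_probs) >= top_p
        sorted_logits = sorted_logits.masked_fill(outside_nucleus, float("-inf"))
        logits = torch.full_like(logits, float("-inf")).scatter(-1, sorted_idx, sorted_logits)

    probs = torch.softmax(logits, dim=-1)
    return torch.multinomial(probs, num_samples=1)

A repetition_penalty above 1.0 would additionally down-weight tokens that have already been generated before these filters are applied.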

Output

computed as a weighted sum of the values, where the weight assigned to each value is computed by a compatibility function of the query with the corresponding key. The simplest form of attention mechanism is the additive attention mechanism (Bahdanau et al., 2015), which uses the following equation: Where the function softmax is applied to each row of the matrix QK to ensure that the weights add up to one. In this case, the attention vector a is computed as a weighted sum of the values V using the weights WQK. The multiplicative attention mechanism (Luong et al., 2015) is another common form of attention mechanism. It uses the following equation: In this case, the attention vector a is computed as a weighted sum of the values V using the weights WQKV. Attention mechanisms are used in various natural language processing tasks such as machine translation, question answering, and text summarization. They allow the model to focus on different parts of the input sequence depending on the task at hand. For example, in machine translation, the attention mechanism allows the decoder to attend to different parts of the source sentence when generating the target sentence. Attention Mechanism Attention mechanism is a technique used in deep learning models to

Run time and cost

This model costs approximately $0.41 to run on Replicate, or 2 runs per $1, but this varies depending on your inputs. It is also open source and you can run it on your own computer with Docker.

This model runs on 2x Nvidia A100 (80GB) GPU hardware. Predictions typically complete within 147 seconds. The predict time for this model varies significantly based on the inputs.
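To call the hosted model from Python instead of the web form, something like the sketch below should work with the official Replicate client. The input keys are assumed to mirror the parameters listed under Input above, and depending on your client version you may need to pin an explicit model version string.

import replicate  # pip install replicate; requires REPLICATE_API_TOKEN in the environment

output = replicate.run(
    "deepseek-ai/deepseek-67b-base",
    input={
        "prompt": "An attention function can be described as",
        "max_new_tokens": 256,
        "temperature": 0.7,
        "top_p": 0.8,
        "top_k": 50,
        "repetition_penalty": 1.05,
    },
)

# The client may return the generated text as an iterator of chunks; join them before printing.
print(output if isinstance(output, str) else "".join(output))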

Readme

1. DeepSeek 67B LLM

Introducing DeepSeek LLM, an advanced language model comprising 67 billion parameters. It has been trained from scratch on a vast dataset of 2 trillion tokens in both English and Chinese. In order to foster research, we have made DeepSeek LLM 7B/67B Base and DeepSeek LLM 7B/67B Chat open source for the research community.

2. Model Summary

deepseek-llm-67b-base is a 67B parameter model with Grouped-Query Attention trained on 2 trillion tokens from scratch.

  • Home Page: DeepSeek
  • Repository: deepseek-ai/deepseek-LLM
  • Chat With DeepSeek LLM: DeepSeek-LLM
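The summary only names Grouped-Query Attention (GQA), so here is a small illustrative sketch of the idea: queries keep the full set of heads while keys and values use a smaller number of heads shared across query groups, which shrinks the key/value cache. The dimensions and weights below are toy values, not DeepSeek's actual configuration.

import torch

def grouped_query_attention(x, wq, wk, wv, n_heads=8, n_kv_heads=2):
    """Toy GQA: n_heads query heads share n_kv_heads key/value heads."""
    bsz, seqlen, dim = x.shape
    head_dim = dim // n_heads

    q = (x @ wq).view(bsz, seqlen, n_heads, head_dim).transpose(1, 2)
    k = (x @ wk).view(bsz, seqlen, n_kv_heads, head_dim).transpose(1, 2)
    v = (x @ wv).view(bsz, seqlen, n_kv_heads, head_dim).transpose(1, 2)

    # Each group of n_heads // n_kv_heads query heads attends to the same key/value head.
    k = k.repeat_interleave(n_heads // n_kv_heads, dim=1)
    v = v.repeat_interleave(n_heads // n_kv_heads, dim=1)

    scores = (q @ k.transpose(-2, -1)) / head_dim**0.5
    out = torch.softmax(scores, dim=-1) @ v  # weighted sum of the values
    return out.transpose(1, 2).reshape(bsz, seqlen, dim)

# Toy sizes: 8 query heads sharing 2 key/value heads.
dim, n_heads, n_kv_heads = 64, 8, 2
x = torch.randn(1, 10, dim)
wq = torch.randn(dim, dim)
wk = torch.randn(dim, dim * n_kv_heads // n_heads)
wv = torch.randn(dim, dim * n_kv_heads // n_heads)
print(grouped_query_attention(x, wq, wk, wv, n_heads, n_kv_heads).shape)  # torch.Size([1, 10, 64])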

3. How to Use

Here are some examples of how to use our model.

Text Completion

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM, GenerationConfig

model_name = "deepseek-ai/deepseek-llm-67b-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# Load the 67B weights in bfloat16 and shard them across the available GPUs.
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16, device_map="auto")
model.generation_config = GenerationConfig.from_pretrained(model_name)
# The base model has no dedicated pad token, so reuse the end-of-sequence token for padding.
model.generation_config.pad_token_id = model.generation_config.eos_token_id

text = "An attention function can be described as mapping a query and a set of key-value pairs to an output, where the query, keys, values, and output are all vectors. The output is"
inputs = tokenizer(text, return_tensors="pt")
# Move the inputs to the model's device and generate up to 100 new tokens.
outputs = model.generate(**inputs.to(model.device), max_new_tokens=100)

result = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(result)
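To mirror the sampling defaults listed under Input above instead of the repository's short example, the same generate call accepts those settings directly; this is a generic transformers usage note rather than code from the DeepSeek repo.

# Sampling settings matching the defaults listed under "Input" above.
outputs = model.generate(
    **inputs.to(model.device),
    do_sample=True,
    max_new_tokens=256,
    temperature=0.7,
    top_p=0.8,
    top_k=50,
    repetition_penalty=1.05,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))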

4. License

This code repository is licensed under the MIT License. The use of DeepSeek LLM models is subject to the Model License. DeepSeek LLM supports commercial use.

See the LICENSE-MODEL for more details.

5. Contact

If you have any questions, please raise an issue or contact us at service@deepseek.com.