tomasmcm / pandalyst-13b-v1.0

Source: pipizhao/Pandalyst_13B_V1.0 ✦ Quant: TheBloke/Pandalyst_13B_V1.0-AWQ ✦ Pandalyst: A large language model for mastering data analysis using pandas

  • Public
  • 21 runs
  • L40S

Input

Prompt (string, required)

Text prompt to send to the model.

Max tokens (integer)

Maximum number of tokens to generate per output sequence.

Default: 128

Presence penalty (number, minimum: -5, maximum: 5)

Float that penalizes new tokens based on whether they appear in the generated text so far. Values > 0 encourage the model to use new tokens, while values < 0 encourage the model to repeat tokens.

Default: 0

Frequency penalty (number, minimum: -5, maximum: 5)

Float that penalizes new tokens based on their frequency in the generated text so far. Values > 0 encourage the model to use new tokens, while values < 0 encourage the model to repeat tokens.

Default: 0
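Both penalties are commonly applied as additive adjustments to the logits before sampling. The sketch below assumes the widely used OpenAI-style formulation: a token's logit is lowered by `frequency_penalty` times the number of times it has appeared, plus a flat `presence_penalty` if it has appeared at all. The helper `apply_penalties` and the toy vocabulary are hypothetical; this is not the serving code behind this model.

```python
from collections import Counter


def apply_penalties(logits, generated_tokens, presence_penalty=0.0, frequency_penalty=0.0):
    """Return a copy of `logits` adjusted by presence and frequency penalties.

    Hypothetical helper, assuming the OpenAI-style formulation in which
    positive penalties lower the logits of tokens that were already generated
    and negative penalties raise them.
    """
    counts = Counter(generated_tokens)
    adjusted = dict(logits)
    for token, count in counts.items():
        if token in adjusted:
            # Frequency penalty grows with how often the token has appeared.
            adjusted[token] -= frequency_penalty * count
            # Presence penalty is a flat cost for having appeared at least once.
            adjusted[token] -= presence_penalty
    return adjusted


logits = {"pandas": 2.0, "numpy": 1.5, "df": 1.0}
history = ["pandas", "pandas", "df"]
print(apply_penalties(logits, history, presence_penalty=0.5, frequency_penalty=0.25))
# {'pandas': 1.0, 'numpy': 1.5, 'df': 0.25}
```

Tokens that never appeared ("numpy" here) are left untouched, which is why positive values nudge the model toward new tokens.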

Temperature (number, minimum: 0.01, maximum: 5)

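Float that controls the randomness of the sampling. Lower values make the model more deterministic, while higher values make it more random. A temperature of zero would mean greedy sampling, but this input's minimum is 0.01, so the lowest setting only approximates greedy decoding.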

Default: 0.8
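Temperature is conventionally applied by dividing the logits by T before the softmax. The minimal sketch below (a hypothetical pure-Python helper, not this endpoint's implementation) shows that values near the 0.01 minimum put almost all probability on the top token, while larger values flatten the distribution.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing them by `temperature`."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]
for t in (0.01, 0.8, 5.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```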

Top-p (number, minimum: 0.01, maximum: 1)

Float that controls the cumulative probability of the top tokens to consider. Must be in (0, 1]. Set to 1 to consider all tokens.

Default: 0.95
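Top-p (nucleus) filtering keeps the smallest set of most probable tokens whose cumulative probability reaches the threshold, then renormalizes. A minimal sketch, assuming probabilities are given as a token-to-probability dict; the helper name and the example tokens are hypothetical.

```python
def top_p_filter(probs, top_p):
    """Keep the smallest set of highest-probability tokens whose cumulative
    probability reaches `top_p`, then renormalize. `probs` maps token -> prob."""
    if not 0 < top_p <= 1:
        raise ValueError("top_p must be in (0, 1]")
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}


probs = {"mean": 0.5, "median": 0.3, "mode": 0.15, "max": 0.05}
print(top_p_filter(probs, 0.75))
```

With `top_p=0.75`, only "mean" and "median" survive (0.5 + 0.3 = 0.8 crosses the threshold); with `top_p=1` every token is kept.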

Top-k (integer)

Integer that controls the number of top tokens to consider. Set to -1 to consider all tokens.

Default: -1
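Top-k filtering is simpler: keep only the k most probable tokens and renormalize, with -1 disabling the filter as described above. A minimal sketch with a hypothetical helper, assuming the input probabilities are already normalized.

```python
def top_k_filter(probs, top_k):
    """Keep only the `top_k` most probable tokens and renormalize.
    A value of -1 returns the (already normalized) distribution unchanged."""
    if top_k == -1:
        return dict(probs)
    if top_k < 1:
        raise ValueError("top_k must be -1 or a positive integer")
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
    total = sum(p for _, p in ranked)
    return {token: p / total for token, p in ranked}


probs = {"mean": 0.5, "median": 0.3, "mode": 0.15, "max": 0.05}
print(top_k_filter(probs, 2))
print(top_k_filter(probs, -1))
```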

Stop sequences (string)

List of strings that stop the generation when they are generated. The returned output will not contain the stop strings.
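The effect of stop strings can be sketched as truncating the output at the earliest match and dropping the match itself. This is a post-hoc illustration only: a real server checks during generation and stops early, and how this endpoint splits its single string input into a list is not documented here, so the sketch assumes the list is already available. The helper name is hypothetical.

```python
def truncate_at_stop(text, stop_strings):
    """Cut `text` at the earliest occurrence of any stop string.
    The stop string itself is not included in the result."""
    cut = len(text)
    for stop in stop_strings:
        if not stop:
            continue  # skip empty strings, which would match at position 0
        index = text.find(stop)
        if index != -1:
            cut = min(cut, index)
    return text[:cut]


output = "print(df['Wins'].mean())\n# Explanation: the mean number of wins"
print(repr(truncate_at_stop(output, ["# Explanation", "</s>"])))
```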

Output

```python
# Prepare the data: convert "Races" and "Wins" to numerical values, and replace '?' and 'N/A' with NaN
career_summary['Races'] = pd.to_numeric(career_summary['Races'], errors='coerce')
career_summary['Wins'] = pd.to_numeric(career_summary['Wins'], errors='coerce')
career_summary = career_summary.dropna(subset=['Races', 'Wins'])

# Calculate the correlation
correlation = career_summary['Races'].corr(career_summary['Wins'])
print(f"The correlation between the number of races and the number of wins is {correlation}")
```
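The generated code assumes that pandas is imported as `pd` and that a DataFrame named `career_summary` already exists (presumably described in the prompt). The self-contained sketch below runs the same steps on a small, made-up table so the result can be checked; the data and the `Season` column are hypothetical.

```python
import pandas as pd

# Hypothetical toy table standing in for the user's `career_summary` DataFrame,
# with '?' and 'N/A' placeholders mixed into the numeric columns.
career_summary = pd.DataFrame(
    {
        "Season": [2018, 2019, 2020, 2021, 2022, 2023],
        "Races": ["10", "20", "?", "30", "N/A", "40"],
        "Wins": ["1", "2", "5", "3", "0", "4"],
    }
)

# Same steps as the generated code: coerce non-numeric entries to NaN ...
career_summary["Races"] = pd.to_numeric(career_summary["Races"], errors="coerce")
career_summary["Wins"] = pd.to_numeric(career_summary["Wins"], errors="coerce")
# ... drop rows where either value is missing ...
career_summary = career_summary.dropna(subset=["Races", "Wins"])

# ... and compute the Pearson correlation (the default method of Series.corr).
correlation = career_summary["Races"].corr(career_summary["Wins"])
print(f"The correlation between the number of races and the number of wins is {correlation}")
```

In this toy table, the two rows with placeholders are dropped and the remaining values rise together, so the correlation comes out at (or within floating-point error of) 1.0.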

Run time and cost

This model runs on Nvidia L40S GPU hardware. We don't yet have enough runs of this model to provide performance information.

Readme

Pandalyst: A large language model for mastering data analysis using pandas

GitHub repo: https://github.com/pipizhaoa/Pandalyst

What is Pandalyst

  • Pandalyst is a large language model trained specifically to process and analyze data using the pandas library.

How is Pandalyst

  • Pandalyst generalizes well to data tables from different domains and to a wide range of data analysis tasks.

Why is Pandalyst

  • Pandalyst is open source and free to use, and its small parameter count (7B/13B) makes it easy to deploy on a local PC.
  • Pandalyst can handle complex data tables with many columns and rows, so the prompt can include enough context to describe a table in detail.
  • Pandalyst performs competitively: according to the authors' evaluations, it significantly outperforms models of the same size and even surpasses some of the strongest closed-source models.

News

  • 🔥[2023/10/15] Plotting is now supported 📈, and the model is considerably more capable. We released Pandalyst-7B-V1.2, which is fine-tuned from CodeLlama-7b-Python and surpasses ChatGPT-3.5 (2023/06/13), Pandalyst-7B-V1.1, and WizardCoder-Python-13B-V1.0 on our PandaTest_V1.0 benchmark.
  • 🤖️[2023/09/30] We released Pandalyst-7B-V1.1, which is fine-tuned from CodeLlama-7b-Python, achieves 76.1 exec@1 on our PandaTest_V1.0 benchmark, and surpasses WizardCoder-Python-13B-V1.0 and ChatGPT-3.5 (2023/06/13).

| Model | Checkpoint | Support plot | License |
|---|---|---|---|
| 🔥Pandalyst-7B-V1.2 | 🤗 HF Link | ✅ | Llama2 |
| Pandalyst-7B-V1.1 | 🤗 HF Link | ❌ | Llama2 |

Usage and human evaluation

For usage instructions and human evaluation results, see the GitHub repo: https://github.com/pipizhaoa/Pandalyst