I have a parquet file named ah.parquet.
It contains Apple Health data and has the following columns:
- type: Nullable(String)
- value: Nullable(String)
- start: Nullable(DateTime64(6))
- end: Nullable(DateTime64(6))
- created: Nullable(DateTime64(6))
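
As a rough illustration of this schema, one row could be modeled in Python as sketched below. The class name `HealthRecord` and its two helper methods are assumptions made for this example and are not part of the file. Every field is optional because every column is `Nullable`, and `DateTime64(6)` (microsecond precision) maps onto Python's `datetime`. Since `value` is stored as a string, it has to be parsed before any numeric use.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class HealthRecord:
    """Hypothetical model of one row of ah.parquet (all columns are nullable)."""

    type: Optional[str]
    value: Optional[str]
    start: Optional[datetime]
    end: Optional[datetime]
    created: Optional[datetime]

    def numeric_value(self) -> Optional[float]:
        """Parse `value` as a float; return None if it is missing or not numeric."""
        if self.value is None:
            return None
        try:
            return float(self.value)
        except ValueError:
            return None

    def duration_seconds(self) -> Optional[float]:
        """Length of the sample in seconds; None if either bound is missing."""
        if self.start is None or self.end is None:
            return None
        return (self.end - self.start).total_seconds()


# Sample values below are invented for illustration only.
record = HealthRecord(
    type="HKQuantityTypeIdentifierStepCount",
    value="42",
    start=datetime(2024, 1, 1, 8, 0),
    end=datetime(2024, 1, 1, 8, 5),
    created=None,
)
print(record.numeric_value(), record.duration_seconds())
```

Returning `None` instead of raising keeps null or non-numeric rows from stopping a scan over the whole file.
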
| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---|---|---|---|---|
| pandafish-3-7B-32k | 40.85 | 73.57 | 56.3 | 42.17 | 53.22 |

| Task | Version | Metric | Value | | Stderr |
|---|---|---|---|---|---|
| agieval_aqua_rat | 0 | acc | 20.47 | ± | 2.54 |
| | | acc_norm | 20.87 | ± | 2.55 |
| agieval_logiqa_en | 0 | acc | 34.10 | ± | 1.86 |

| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---|---|---|---|---|
| pandafish-2-7b-32k | 40.8 | 73.35 | 57.46 | 42.69 | 53.57 |

| Task | Version | Metric | Value | | Stderr |
|---|---|---|---|---|---|
| agieval_aqua_rat | 0 | acc | 22.05 | ± | 2.61 |
| | | acc_norm | 19.69 | ± | 2.50 |
| agieval_logiqa_en | 0 | acc | 35.94 | ± | 1.88 |

| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---|---|---|---|---|
| dolphin-2.8-mistral-7b-v02 | 38.99 | 72.22 | 51.96 | 40.41 | 50.9 |

| Task | Version | Metric | Value | | Stderr |
|---|---|---|---|---|---|
| agieval_aqua_rat | 0 | acc | 21.65 | ± | 2.59 |
| | | acc_norm | 20.47 | ± | 2.54 |
| agieval_logiqa_en | 0 | acc | 35.79 | ± | 1.88 |

| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---|---|---|---|---|
| Mistral-7B-Instruct-v0.2 | 38.5 | 71.64 | 66.82 | 42.29 | 54.81 |

| Task | Version | Metric | Value | | Stderr |
|---|---|---|---|---|---|
| agieval_aqua_rat | 0 | acc | 23.62 | ± | 2.67 |
| | | acc_norm | 22.05 | ± | 2.61 |
| agieval_logiqa_en | 0 | acc | 36.10 | ± | 1.88 |

| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---|---|---|---|---|
| HeatherSpellGen3 | 44.88 | 76.87 | 78.3 | 49.89 | 62.48 |

| Task | Version | Metric | Value | | Stderr |
|---|---|---|---|---|---|
| agieval_aqua_rat | 0 | acc | 27.56 | ± | 2.81 |
| | | acc_norm | 25.20 | ± | 2.73 |
| agieval_logiqa_en | 0 | acc | 39.02 | ± | 1.91 |

| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---|---|---|---|---|
| pandafish-dt-7b | 45.24 | 77.19 | 78.41 | 49.76 | 62.65 |

| Task | Version | Metric | Value | | Stderr |
|---|---|---|---|---|---|
| agieval_aqua_rat | 0 | acc | 27.95 | ± | 2.82 |
| | | acc_norm | 26.38 | ± | 2.77 |
| agieval_logiqa_en | 0 | acc | 39.32 | ± | 1.92 |

| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---|---|---|---|---|
| HeatherSpellGen2 | 40.73 | 75.43 | 72.75 | 47.12 | 59.01 |

| Task | Version | Metric | Value | | Stderr |
|---|---|---|---|---|---|
| agieval_aqua_rat | 0 | acc | 21.65 | ± | 2.59 |
| | | acc_norm | 20.47 | ± | 2.54 |
| agieval_logiqa_en | 0 | acc | 36.41 | ± | 1.89 |

| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---|---|---|---|---|
| HeatherSpell-7b | 45.65 | 77.24 | 75.75 | 50 | 62.16 |

| Task | Version | Metric | Value | | Stderr |
|---|---|---|---|---|---|
| agieval_aqua_rat | 0 | acc | 28.74 | ± | 2.85 |
| | | acc_norm | 25.98 | ± | 2.76 |
| agieval_logiqa_en | 0 | acc | 39.63 | ± | 1.92 |

| Model | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|---|---|---|---|---|---|
| pandafish-7b | 40 | 74.23 | 53.22 | 40.51 | 51.99 |

| Task | Version | Metric | Value | | Stderr |
|---|---|---|---|---|---|
| agieval_aqua_rat | 0 | acc | 21.65 | ± | 2.59 |
| | | acc_norm | 21.65 | ± | 2.59 |
| agieval_logiqa_en | 0 | acc | 34.10 | ± | 1.86 |
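
The Average column in the summary tables appears to be the unweighted arithmetic mean of the four suite scores. That is an inference from the numbers shown, not something the tables state. A minimal sketch that checks it for pandafish-3-7B-32k, with the helper name `average_score` chosen only for this example:

```python
# Suite scores for pandafish-3-7B-32k, copied from its summary row.
scores = {"AGIEval": 40.85, "GPT4All": 73.57, "TruthfulQA": 56.3, "Bigbench": 42.17}


def average_score(suite_scores):
    """Unweighted mean of the suite scores, rounded to two decimals."""
    values = list(suite_scores.values())
    return round(sum(values) / len(values), 2)


print(average_score(scores))  # 53.22, matching the reported Average
```

Some rows may differ from this in the last digit, presumably because the published averages were computed from unrounded suite scores.
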