diff --git a/README.md b/README.md
index e18cfc7..85610ca 100644
--- a/README.md
+++ b/README.md
@@ -52,7 +52,8 @@ CMMLU是一个综合性的中文评估基准,专门用于评估语言模型在
 | [Bactrian-LLaMA-13B](https://github.com/mbzuai-nlp/bactrian-x) | 27.52 | 32.47 | 32.27 | 35.77 | 31.56 | 31.88 |
 | [MOSS-SFT-16B](https://github.com/OpenLMLab/MOSS) | 27.23 | 30.41 | 28.84 | 32.56 | 28.68 | 29.57 |
 | 尚未开放测试的模型 |
-| [Mengzi-7B](https://www.langboat.com/) | **49.59** | **75.27** | **71.36** | **70.52** | **69.23** | **66.41** |
+| [Galaxy](https://www.zuoyebang.com/) | **69.61** | 74.95 | **78.54** | **77.93** | **73.99** | **74.03** |
+| [Mengzi-7B](https://www.langboat.com/) | 49.59 | **75.27** | 71.36 | 70.52 | 69.23 | 66.41 |
 | [KwaiYii-13B](https://github.com/kwai) | 46.54 | 69.22 | 64.49 | 65.09 | 63.10 | 61.73 |
 | [MiLM-6B](https://github.com/XiaoMi/MiLM-6B/) | 46.85 | 61.12 | 61.68 | 58.84 | 59.39 | 57.17 |
 | [MiLM-1.3B](https://github.com/XiaoMi/MiLM-6B/) | 35.59 | 49.58 | 49.03 | 47.56 | 48.17 | 45.39 |
@@ -77,7 +78,8 @@ CMMLU是一个综合性的中文评估基准,专门用于评估语言模型在
 | [Chinese-LLaMA-13B](https://github.com/ymcui/Chinese-LLaMA-Alpaca)| 26.76 | 26.57 | 27.42 | 28.33 | 26.73 | 27.34 |
 | [MOSS-SFT-16B](https://github.com/OpenLMLab/MOSS) | 25.68 | 26.35 | 27.21 | 27.92 | 26.70 | 26.88 |
 | 尚未开放测试的模型 |
-| [Mengzi-7B](https://www.langboat.com/) | **49.49** | **75.84** | **72.32** | **70.87** | **70.00** | **66.88** |
+| [Galaxy](https://www.zuoyebang.com/) | **69.38** | 75.33 | **78.27** | **78.19** | **73.25** | **73.85** |
+| [Mengzi-7B](https://www.langboat.com/) | 49.49 | **75.84** | 72.32 | 70.87 | 70.00 | 66.88 |
 | [KwaiYii-13B](https://github.com/kwai) | 46.82 | 69.35 | 63.42 | 64.02 | 63.26 | 61.22 |
 | [MiLM-6B](https://github.com/XiaoMi/MiLM-6B/) | 48.88 | 63.49 | 66.20 | 62.14 | 62.07 | 60.37 |
 | [MiLM-1.3B](https://github.com/XiaoMi/MiLM-6B/) | 40.51 | 54.82 | 54.15 | 53.99 | 52.26 | 50.79 |
diff --git a/README_EN.md b/README_EN.md
index 6d1f7ef..cca8e79 100644
--- a/README_EN.md
+++ b/README_EN.md
@@ -55,7 +55,8 @@ The following table displays the performance of models in the five-shot and zero
 | [Bactrian-LLaMA-13B](https://github.com/mbzuai-nlp/bactrian-x) | 27.52 | 32.47 | 32.27 | 35.77 | 31.56 | 31.88 |
 | [MOSS-SFT-16B](https://github.com/OpenLMLab/MOSS) | 27.23 | 30.41 | 28.84 | 32.56 | 28.68 | 29.57 |
 | Not open-source/API models |
-| [Mengzi-7B](https://www.langboat.com/) | **49.59** | **75.27** | **71.36** | **70.52** | **69.23** | **66.41** |
+| [Galaxy](https://www.zuoyebang.com/) | **69.61** | 74.95 | **78.54** | **77.93** | **73.99** | **74.03** |
+| [Mengzi-7B](https://www.langboat.com/) | 49.59 | **75.27** | 71.36 | 70.52 | 69.23 | 66.41 |
 | [KwaiYii-13B](https://github.com/kwai) | 46.54 | 69.22 | 64.49 | 65.09 | 63.10 | 61.73 |
 | [MiLM-6B](https://github.com/XiaoMi/MiLM-6B/) | 46.85 | 61.12 | 61.68 | 58.84 | 59.39 | 57.17 |
 | [MiLM-1.3B](https://github.com/XiaoMi/MiLM-6B/) | 35.59 | 49.58 | 49.03 | 47.56 | 48.17 | 45.39 |
@@ -81,7 +82,8 @@ The following table displays the performance of models in the five-shot and zero
 | [Chinese-LLaMA-13B](https://github.com/ymcui/Chinese-LLaMA-Alpaca)| 26.76 | 26.57 | 27.42 | 28.33 | 26.73 | 27.34 |
 | [MOSS-SFT-16B](https://github.com/OpenLMLab/MOSS) | 25.68 | 26.35 | 27.21 | 27.92 | 26.70 | 26.88 |
 | Not open-source/API models |
-| [Mengzi-7B](https://www.langboat.com/) | **49.49** | **75.84** | **72.32** | **70.87** | **70.00** | **66.88** |
+| [Galaxy](https://www.zuoyebang.com/) | **69.38** | 75.33 | **78.27** | **78.19** | **73.25** | **73.85** |
+| [Mengzi-7B](https://www.langboat.com/) | 49.49 | **75.84** | 72.32 | 70.87 | 70.00 | 66.88 |
 | [KwaiYii-13B](https://github.com/kwai) | 46.82 | 69.35 | 63.42 | 64.02 | 63.26 | 61.22 |
 | [MiLM-6B](https://github.com/XiaoMi/MiLM-6B/) | 48.88 | 63.49 | 66.20 | 62.14 | 62.07 | 60.37 |
 | [MiLM-1.3B](https://github.com/XiaoMi/MiLM-6B/) | 40.51 | 54.82 | 54.15 | 53.99 | 52.26 | 50.79 |