
· 3 min read

If you have doubts about the findings of this report or any other evaluations, how should you reproduce and compare the evaluation results?

Workflow: Login → Create a project → Run the model → Create a report

STEP1: Login

First, log in to the Starwhale platform by clicking Login. If you haven't registered yet, click Sign Up to create an account.

STEP2: Create a project

After successful login, you will be directed to the project list page. Click the Create button in the top right corner to create a new project. Enter the project name and click the Submit button to create the project.

STEP3: Run the models

Go to the Evaluations list page, click the Create button, and then set the parameters.

For example, to reproduce the evaluation result of baichuan2-13b with the cmmlu dataset, refer to the following:

  1. Choose the running resource: A10*24G*2 is recommended;
  2. Select the model: choose the model you want to reproduce, e.g. starwhale/llm-leaderboard/baichuan2-13b/atgoiscm (v1, latest);
  3. Choose the handler: select the option "src.evaluation:evaluation_results";
  4. Choose the dataset: select the option "starwhale/llm-leaderboard/cmmlu/kiwtxza7 (v1, latest)";
  5. Choose the runtime: select the option "starwhale/llm-leaderboard/llm-leaderboard/ickinf6q (v1, latest)";
  6. Advanced configuration: turn off the auto-release switch.

Click Submit to run the model. While the evaluation runs, you can click View Log on the task tab of the evaluation details page to check its status. When the evaluation status is "Successed," you can view the results on the list and details pages.
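To keep a record of exactly what was selected, the choices from this example can be written down as plain data. The sketch below is only an illustration and does not use any Starwhale API: the `ResourceRef` type, the `parse_ref` helper, and the dictionary keys are hypothetical, and it assumes that each reference follows an owner/project/name/version layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResourceRef:
    """One model, dataset, or runtime reference (hypothetical helper type)."""
    owner: str
    project: str
    name: str
    version: str


def parse_ref(text: str) -> ResourceRef:
    """Split 'owner/project/name/version' into its four parts (assumed layout)."""
    parts = text.strip().split("/")
    if len(parts) != 4 or not all(parts):
        raise ValueError(f"expected owner/project/name/version, got {text!r}")
    return ResourceRef(*parts)


# Values are copied from the steps above; the dictionary keys are hypothetical.
evaluation_config = {
    "resource": "A10*24G*2",
    "model": parse_ref("starwhale/llm-leaderboard/baichuan2-13b/atgoiscm"),
    "handler": "src.evaluation:evaluation_results",
    "dataset": parse_ref("starwhale/llm-leaderboard/cmmlu/kiwtxza7"),
    "runtime": parse_ref("starwhale/llm-leaderboard/llm-leaderboard/ickinf6q"),
    "auto_release": False,
}

# Sanity check: the model, dataset, and runtime all come from one project.
projects = {evaluation_config[key].project for key in ("model", "dataset", "runtime")}
print(projects)
```

Saving such a record next to your report makes it easier for someone else to repeat the same run with the same versions.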

STEP4: Compare the evaluation results

To create a report, go to the Report list page and click the Create button in the upper right corner.

Reports support rich text editing. This section focuses on how to compare your evaluation results with those from Starwhale or from other evaluations.

  1. Enter the report title and description;
  2. Type /, then select the Panel option;
  3. Click the Add Evaluation button, select the project, such as "llm-leaderboard", and then check the evaluations you want to add. Click Add to add them to the evaluation list. You can add multiple evaluations from different projects to compare them;
  4. After adding the evaluations, click the Column Management settings icon to set the columns in the evaluation list and their display order. When you hover over a column in the evaluation list, you can pin that column or sort it in ascending or descending order;
  5. Click the Add Chart button and select the chart type, such as Bar Chart, then add the metrics related to accuracy (fuzzy search on metric names is supported). Enter a chart title (optional) and click Submit to display the data as a bar chart for intuitive analysis;
  6. Click the Publish to Project button to publish the report;
  7. If you want to share the report with others, go to the Report list page and turn on the "Share" switch. Anyone who has the report link can then view it.
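The comparison that the panel performs can also be pictured offline. The following is a minimal sketch, not Starwhale code: `select_metrics` and `render` are hypothetical helpers, the keyword filter is a simple stand-in for the metric fuzzy search, and the numbers are placeholders rather than real evaluation results.

```python
def select_metrics(evaluations, keyword):
    """Return {metric: {evaluation: value}} for metrics whose name contains keyword."""
    keyword = keyword.lower()
    table = {}
    for eval_name, metrics in evaluations.items():
        for metric, value in metrics.items():
            if keyword in metric.lower():
                table.setdefault(metric, {})[eval_name] = value
    return table


def render(table, eval_names):
    """Format the selected metrics as tab-separated text, one row per metric."""
    lines = ["metric\t" + "\t".join(eval_names)]
    for metric in sorted(table):
        cells = [
            f"{table[metric][name]:.3f}" if name in table[metric] else "-"
            for name in eval_names
        ]
        lines.append(metric + "\t" + "\t".join(cells))
    return "\n".join(lines)


# Placeholder numbers for illustration only; they are not real evaluation results.
evaluations = {
    "my-evaluation": {"accuracy/overall": 0.5, "elapsed_seconds": 120.0},
    "reference-evaluation": {"accuracy/overall": 0.25, "accuracy/stem": 0.75},
}

table = select_metrics(evaluations, "accuracy")
print(render(table, ["my-evaluation", "reference-evaluation"]))
```

A metric that is missing from one evaluation is shown as "-", so gaps between runs stay visible instead of being silently dropped.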

reproduce-and-compare-evals.gif

These are the instructions for reproducing and comparing evaluations with Starwhale. If you encounter any issues along the way, please send us a private message. You can also visit the Starwhale official website for more information. Thank you for your attention and support.

· 4 min read

Meta's Llama 2 captured the attention of the entire world as soon as it was released. Starwhale has developed model packages for Llama 2-Chat and Llama 2-7b. In just 5 minutes, you can go from scratch to a conversation with Llama 2-Chat on https://cloud.starwhale.cn.

In the future, we will also provide model packages for Llama 2-13b and Llama 2-70b. If you are interested, please stay tuned!

The sections below explain what Llama 2 is, what Starwhale is, and how to use Starwhale to run Llama 2-Chat.

What is Llama 2

The Llama 2 series is a set of large language models that use an optimized autoregressive Transformer architecture. The models have undergone pre-training and fine-tuning and come in three parameter sizes: 7 billion, 13 billion, and 70 billion. Meta also trained a 34 billion parameter version, which has not been released, although its results are reported in the research paper.

Pre-training: Llama 2 was trained on 40% more data than Llama 1, for a total of 2 trillion tokens, and its context length is twice that of Llama 1, reaching 4096 tokens. Llama 2 is well-suited for various natural language generation tasks.
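As a quick check on these figures, the Llama 1 values they imply can be worked out directly. This is back-of-the-envelope arithmetic only, and it assumes that the 40% increase refers to the number of training tokens.

```python
LLAMA2_TOKENS = 2e12      # 2 trillion training tokens, as stated above
INCREASE = 0.40           # assumed to apply to the token count
LLAMA2_CONTEXT = 4096     # context length in tokens

# If Llama 2 used 40% more tokens, Llama 1 used 2T / 1.4.
implied_llama1_tokens = LLAMA2_TOKENS / (1 + INCREASE)
# The context length doubled, so Llama 1's is half of 4096.
implied_llama1_context = LLAMA2_CONTEXT // 2

print(f"{implied_llama1_tokens / 1e12:.2f} trillion tokens")  # 1.43 trillion tokens
print(f"{implied_llama1_context} tokens of context")          # 2048 tokens of context
```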

image

Meta compared Llama 2-70b with closed-source models and found that its performance is close to GPT-3.5 on the MMLU (Massive Multitask Language Understanding) and GSM8K (Grade School Math 8K, a set of grade-school math word problems) benchmarks. However, there is a significant gap on coding benchmarks.

Moreover, on almost all benchmarks, Llama 2-70b performs on par with or even better than Google's PaLM-540b model. However, a considerable gap remains when compared to models like GPT-4 and PaLM-2-L.

image

Fine-tuning: Llama 2-Chat is a version of Llama 2 that has been fine-tuned specifically for chat dialogue scenarios. The fine-tuning process involves using SFT (Supervised Fine-Tuning) and RLHF (Reinforcement Learning from Human Feedback) in an iterative optimization to align better with human preferences and improve safety. The fine-tuning data includes publicly available instruction datasets and over one million newly annotated samples. Llama 2-Chat can be used for chat applications similar to virtual assistants. The image below shows the percentage of violations in single-turn and multi-turn conversations. Compared to the baseline, Llama 2-Chat performs particularly well in multi-turn conversations.

image

What is Starwhale

Starwhale is an MLOps platform that offers a comprehensive solution for the entire machine learning operations process. It enables developers and businesses to efficiently and conveniently manage model hosting, execution, evaluation, deployment, and dataset management. Users can choose from three different versions: Standalone, Server, or Cloud, based on their specific requirements. For more detailed information and instructions on using Starwhale, users can refer to the platform's documentation.

How to use Starwhale to run Llama 2-Chat

Workflow: Login → Create a project → Run the model → Chat with Llama 2

1. Login

First, log in to the Starwhale platform by clicking Login. If you haven't registered yet, click Sign Up to create an account.

2. Create a project

After successful login, you will be directed to the project list page. Click the Create button in the top right corner to create a new project. Enter the project name and click the Submit button to create the project.

image

image

3. Run the model

Go to the job list page and click on the Create task button.

  1) Choose the running resource: you can select A100 80G*1 (recommended) or A10 24G*1.
  2) Select the model: starwhale/public/llama2-7b-chat/ki72ulaf(latest).
  3) Choose the handler: to run the chatbot model, select the default option, evaluation:chatbot.
  4) Choose the runtime: select the default option, built-in.
  5) Advanced configuration: turn on the auto-release switch and set the duration after which the task will be automatically canceled. If you don't set auto-release, you can manually cancel the task after the experiment is completed.

Click on Submit to run the model.

image

4. View the Results and Logs

The job list page allows you to view all the tasks in the project.

image

Click on the Job ID to enter the task details page, and then click on View Logs to see the logs.

The total time taken from task submission to model execution is 5 minutes.

image

Once the execution is successful, return to the task list and click on the Terminal button to open the chatbox page. You can now start a conversation with Llama 2-Chat on the chatbox page.

image

image

These are the instructions on how to use Starwhale Cloud to run Llama 2-Chat. If you encounter any issues during the process, please feel free to leave a private message. You can also visit the Starwhale official website for more information. Thank you for your attention and support.

· One min read
tianwei

Starwhale is an MLOps platform that makes model creation, evaluation, and publication much easier. It aims to be a handy tool for data scientists and machine learning engineers.