Add gemma demo notebook
CharlieFRuan committed Feb 23, 2024
1 parent 7953b79 commit c178da6
Showing 1 changed file with 356 additions and 0 deletions.
356 changes: 356 additions & 0 deletions mlc-llm/models/demo_gemma.ipynb
@@ -0,0 +1,356 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "uLLHBhZ_KVqE"
},
"source": [
"# Demo: Gemma with MLC LLM\n",
"\n",
"Google recently release Gemma: https://blog.google/technology/developers/gemma-open-models/.\n",
"\n",
"This notebook demonstrates how to use the model with MLC LLM: https://llm.mlc.ai/.\n",
"\n",
"For the easiest setup, we recommend trying this out in a Google Colab notebook. Click the button below to get started!\n",
"\n",
"<a target=\"_blank\" href=\"https://colab.research.google.com/github/mlc-ai/notebooks/blob/main/mlc-llm/models/demo_gemma.ipynb\">\n",
" <img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/>\n",
"</a>"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "Vu8opC0QMOZf"
},
"source": [
"## Environment Setup\n",
"\n",
"Let's set up your environment, so you can successfully run the `ChatModule`. First, let's set up the Conda environment which we will be running this notebook in (not required if running in Google Colab).\n",
"\n",
"```bash\n",
"conda create --name mlc-llm python=3.11\n",
"conda activate mlc-llm\n",
"```\n",
"\n",
"**Google Colab:** If you are running this in a Google Colab notebook, be sure to change your runtime to GPU by going to Runtime > Change runtime type and setting the Hardware accelerator to be \"GPU\". Select \"Connect\" on the top right to instantiate your GPU session.\n",
"\n",
"If you are using CUDA, you can run the following command to confirm that CUDA is set up correctly, and check the version number."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "o7vvnPntdgun",
"outputId": "fb05a739-0a5a-4447-b21a-bf21d5cfb537"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Fri Feb 23 18:19:58 2024 \n",
"+---------------------------------------------------------------------------------------+\n",
"| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |\n",
"|-----------------------------------------+----------------------+----------------------+\n",
"| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |\n",
"| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |\n",
"| | | MIG M. |\n",
"|=========================================+======================+======================|\n",
"| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |\n",
"| N/A 46C P8 9W / 70W | 0MiB / 15360MiB | 0% Default |\n",
"| | | N/A |\n",
"+-----------------------------------------+----------------------+----------------------+\n",
" \n",
"+---------------------------------------------------------------------------------------+\n",
"| Processes: |\n",
"| GPU GI CI PID Type Process name GPU Memory |\n",
"| ID ID Usage |\n",
"|=======================================================================================|\n",
"| No running processes found |\n",
"+---------------------------------------------------------------------------------------+\n"
]
}
],
"source": [
"!nvidia-smi"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "qZfQBQExMV-f"
},
"source": [
"Next, let's download the MLC-AI and MLC-Chat nightly build packages. Go to https://mlc.ai/package/ and replace the command below with the one that is appropriate for your hardware and OS."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "yiPuqenodnB8"
},
"outputs": [],
"source": [
"!pip install --pre mlc-ai-nightly-cu122 mlc-chat-nightly-cu122 -f https://mlc.ai/wheels"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "qtRRsOPHM3SE"
},
"source": [
"**Google Colab**: If you are using Colab, you may see the red warnings such as \"You must restart the runtime in order to use newly installed versions.\" For our purpose, simply restart session, and run the next cell after restart.\n",
"\n",
"Let's confirm we have installed the packages successfully!"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "ktNZi8B6M4Md",
"outputId": "908711b7-eaa8-4a1a-88a8-147cb32c58c9"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"tvm installed properly!\n",
"mlc_chat installed properly!\n"
]
}
],
"source": [
"!python -c \"import tvm; print('tvm installed properly!')\"\n",
"!python -c \"import mlc_chat; print('mlc_chat installed properly!')\""
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "EIEtjOVvM9LJ"
},
"source": [
"## Running Gemma with MLC-LLM\n",
"\n",
"Then we can clone gemma weights converted to MLC format from huggingface.\n",
"\n",
"This is the only thing you need. Afterwards, our JIT (just-in-time) compilation will take care of everything for you!\n",
"\n",
"First time running may require more time as we need to compile the model. But afterwards we cache it to `/pathto/.cache/mlc_chat/`, so future runs are faster.\n",
"\n",
"Alternatively, you could also use the following\n",
"\n",
"```python\n",
"!python -m mlc_chat compile gemma-7b-it-q4f16_2-MLC -o gemma-7b-it-q4f16_2-q4f16_2-cuda.so\n",
"\n",
"cm = ChatModule(\"./gemma-7b-it-q4f16_2-MLC\", model_lib_path=\"gemma-7b-it-q4f16_2-q4f16_2-cuda.so\")\n",
"```"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "MHsFX5cwNZQN",
"outputId": "e5ab61f0-37b6-46a8-9ff5-d5782be6495f"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Git LFS initialized.\n"
]
}
],
"source": [
"!git lfs install"
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "isA1NfGFNadt",
"outputId": "26b1a8a6-bcf4-4b8b-a0eb-42e6816d1773"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Cloning into 'gemma-7b-it-q4f16_2-MLC'...\n",
"remote: Enumerating objects: 113, done.\u001b[K\n",
"remote: Counting objects: 100% (110/110), done.\u001b[K\n",
"remote: Compressing objects: 100% (110/110), done.\u001b[K\n",
"remote: Total 113 (delta 0), reused 0 (delta 0), pack-reused 3\u001b[K\n",
"Receiving objects: 100% (113/113), 33.40 KiB | 6.68 MiB/s, done.\n",
"Filtering content: 100% (103/103), 5.54 GiB | 62.53 MiB/s, done.\n"
]
}
],
"source": [
"# This is gemma 7b with 4-bit quantization\n",
"# Any other quantizations/models have the same steps: https://huggingface.co/mlc-ai\n",
"!git clone https://huggingface.co/mlc-ai/gemma-7b-it-q4f16_2-MLC"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"id": "MbxdMhcgfGvk"
},
"outputs": [],
"source": [
"from mlc_chat import ChatModule\n",
"from mlc_chat.callback import StreamToStdout"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {
"id": "QAjm3lTJmsiy"
},
"outputs": [],
"source": [
"cm = ChatModule(\"./gemma-7b-it-q4f16_2-MLC\")"
]
},
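{
"cell_type": "markdown",
"metadata": {},
"source": [
"Now we can chat with the model. `generate` takes a prompt and an optional `progress_callback`; here `StreamToStdout` streams the partial response to standard output as it is produced."
]
},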
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "BEHtDLG9nTx1",
"outputId": "1c7a3037-afd5-406e-ca98-6282a14b2719"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Sure, here's a quick overview of five states in the US:\n",
"\n",
"**1. California:**\n",
"- Capital: Sacramento\n",
"- Largest city: Los Angeles\n",
"- Known for: Golden Gate Bridge, Hollywood, Silicon Valley, and its diverse population.\n",
"\n",
"**2. New York:**\n",
"- Capital: Albany\n",
"- Largest city: New York City\n",
"- Known for: Empire State Building, Times Square, Niagara Falls, and its rich history.\n",
"\n",
"**3. Texas:**\n",
"- Capital: Austin\n",
"- Largest city: Dallas\n",
"- Known for: Its large size, diverse culture, and its strong economy.\n",
"\n",
"**4. Florida:**\n",
"- Capital: Tallahassee\n",
"- Largest city: Jacksonville\n",
"- Known for: Its beautiful beaches, warm climate, and its history as a major naval power.\n",
"\n",
"**5. Alaska:**\n",
"- Capital: Juneau\n",
"- Largest city: Anchorage\n",
"- Known for: Its breathtaking natural beauty, including towering mountains, glaciers, and fjords.\n"
]
}
],
"source": [
"output = cm.generate(\n",
" prompt=\"Tell me about 5 states in the US\",\n",
" progress_callback=StreamToStdout(callback_interval=2),\n",
")"
]
},
{
"cell_type": "code",
"execution_count": 9,
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/"
},
"id": "ElwvKxHSQZe-",
"outputId": "0709daf6-623d-42dd-c790-9b330afc035e"
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"**Sure, here are two more states:**\n",
"\n",
"**6. Nevada:**\n",
"- Capital: Carson City\n",
"- Largest city: Las Vegas\n",
"- Known for: Its casinos, its desert landscapes, and its history as a frontier town.\n",
"\n",
"**7. Idaho:**\n",
"- Capital: Boise\n",
"- Largest city: Boise\n",
"- Known for: Its scenic mountains, its salmon fishing, and its rich Native American heritage.\n"
]
}
],
"source": [
"output = cm.generate(\n",
" prompt=\"Two more please\",\n",
" progress_callback=StreamToStdout(callback_interval=2),\n",
")"
]
},
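{
"cell_type": "markdown",
"metadata": {},
"source": [
"Optionally, we can check runtime speed. This is a minimal sketch assuming your `mlc_chat` build exposes `ChatModule.stats()`, which reports prefill and decode throughput."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Print runtime statistics (e.g. prefill and decode tokens/sec); assumes ChatModule.stats() is available\n",
"print(cm.stats())"
]
},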
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "hvbzb39ZrAVO"
},
"outputs": [],
"source": [
"cm.reset_chat()"
]
}
],
"metadata": {
"accelerator": "GPU",
"colab": {
"gpuType": "T4",
"provenance": []
},
"kernelspec": {
"display_name": "Python 3",
"name": "python3"
},
"language_info": {
"name": "python",
"version": "3.11.6"
}
},
"nbformat": 4,
"nbformat_minor": 0
}
