Introduction
Quick Start
Start the vLLM server
Create a file called `docker-compose.yml`.
```yaml
# Usage with CUDA, requires Nvidia GPU.
services:
  sv:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    image: vllm/vllm-openai:v0.4.0
    command: "--model TheBloke/Mistral-7B-Instruct-v0.2-AWQ --gpu-memory-utilization 0.8 --max-model-len 4096 --quantization awq"
    ports:
      - 8000:8000
    network_mode: host
    ipc: "host"
    volumes:
      - ~/.cache/huggingface:/root/.cache/huggingface
```
Start it with `docker-compose up`.
Note: Check the vLLM docs for more information or help on how to start the server. vLLM 0.4.0 or newer is required.
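Before wiring up the client, you can sanity-check that the server is reachable: vLLM's OpenAI-compatible server exposes a `GET /v1/models` endpoint. A minimal check in TypeScript (assuming Node 18+ or any runtime with a built-in `fetch`):

```ts
// Quick sanity check that the vLLM server from the compose file is reachable.
// GET /v1/models is part of the OpenAI-compatible API that vLLM serves.
const res = await fetch("http://localhost:8000/v1/models");
if (!res.ok) {
  throw new Error(`vLLM server responded with status ${res.status}`);
}
console.log(await res.json()); // should list TheBloke/Mistral-7B-Instruct-v0.2-AWQ
```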
Install @lmscript/client
```bash
npm install @lmscript/client
```
Create a script
Create a file called `main.ts`.
```ts
import { LmScript } from "@lmscript/client";
import { VllmBackend } from "@lmscript/client/backends/vllm";
import { BULLET_LIST_REGEX } from "@lmscript/client/regex";

// Point the backend at the vLLM server started above.
const backend = new VllmBackend({
  url: `http://localhost:8000`,
  template: "mistral",
  model: "TheBloke/Mistral-7B-Instruct-v0.2-AWQ",
});

const model = new LmScript(backend, { temperature: 0.0 });

// Ask for 5 jokes and constrain the generation to a markdown bullet list,
// capturing the result under the name "mdList".
const {
  captured: { mdList },
} = await model
  .user(
    "Tell me a list of 5 jokes. Answer with a markdown list, where each item of the list has a joke, in a single line.",
  )
  .assistant((m) =>
    m.gen("mdList", {
      maxTokens: 128,
      stop: "\n\n",
      regex: BULLET_LIST_REGEX,
    }),
  )
  .run();

console.log(mdList);
```
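Because the generation is constrained by `BULLET_LIST_REGEX`, the captured text should be a plain markdown bullet list, which makes it easy to post-process. A minimal sketch that could be appended to `main.ts` (assuming each item sits on its own line and starts with `-` or `*`):

```ts
// Turn the captured markdown list into a plain array of jokes.
// Assumes one bullet per line, prefixed with "-" or "*".
const jokes = mdList
  .split("\n")
  .map((line) => line.replace(/^\s*[-*]\s*/, "").trim())
  .filter((line) => line.length > 0);
console.log(jokes);
```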
Execute the file
Use your preferred runtime. `@lmscript/client` has no dependencies and runs everywhere. For instance, to run using tsx and node:

```bash
npx tsx main.ts
```
UI Prompt Editor
TODO