llama.cpp

Verified Truly Free

Inference of Meta's LLaMA model (and many other architectures) in plain C/C++. The foundational project that makes it practical to run LLMs on consumer hardware (macOS, Windows, Linux, Android) with high performance.

Tags: Core, Action, Performance, C++

Overview

Provider Type: Local

API Endpoint: http://localhost:8080/v1

Free Tier Highlights: Hardware dependent

Why Choose llama.cpp?

llama.cpp stands out for running quantized GGUF models efficiently on ordinary CPUs and GPUs, with no mandatory external dependencies and acceleration backends such as Metal, CUDA, and Vulkan. Its bundled server exposes an OpenAI-compatible API, so existing OpenAI client code can target a local model by changing only the base URL, and the project's documentation covers building, quantizing, and serving in detail.

Quick Start Guide

1. Download a release or compile from source.
2. Obtain a GGUF model (for example, from Hugging Face).
3. Run ./llama-server -m model.gguf (older releases named the binary ./server).
4. Access it via the API or the built-in Web UI at http://localhost:8080.
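
Step 3 starts an HTTP server on port 8080 by default. A minimal way to confirm it is up before wiring in a client, assuming the /health endpoint that current server builds expose:

health_check.py
import json
import urllib.request

# /health returns {"status": "ok"} once the model has finished loading
# (assumes a reasonably recent llama.cpp server build).
with urllib.request.urlopen("http://localhost:8080/health", timeout=5) as resp:
    print(json.load(resp))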

Available Models

Model Name              ID           Context       Capabilities
Any GGUF Model (Free)   gguf-model   RAM limited   -
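
Because the server simply serves whichever GGUF file it was started with, the "model list" is whatever is currently loaded. A sketch of querying it through the OpenAI-compatible /v1/models route (the id it reports varies by build, often the model file path):

list_models.py
from openai import OpenAI

client = OpenAI(api_key="llama-cpp", base_url="http://localhost:8080/v1")

# Enumerate the model(s) the local server reports.
for model in client.models.list():
    print(model.id)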

Integration Examples

Ready-to-use code snippets for your applications.

main.py
from openai import OpenAI

# Start the server first, e.g.: ./llama-server -m model.gguf
# The API key is a placeholder; the local server ignores it unless it
# was started with --api-key.
client = OpenAI(
    api_key="llama-cpp",
    base_url="http://localhost:8080/v1"
)

response = client.chat.completions.create(
    model="local",  # the server answers with whichever model it loaded
    messages=[
        {"role": "user", "content": "Explain quantum computing"}
    ]
)

print(response.choices[0].message.content)
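
On slower hardware, long generations can take a while, so streaming the reply token by token is often preferable. A sketch of the same call with stream=True, which the server's OpenAI-compatible endpoint supports:

stream.py
from openai import OpenAI

client = OpenAI(api_key="llama-cpp", base_url="http://localhost:8080/v1")

stream = client.chat.completions.create(
    model="local",
    messages=[{"role": "user", "content": "Explain quantum computing"}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta.content  # incremental text, or None
    if delta:
        print(delta, end="", flush=True)
print()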

Free Tier Pricing & Limits

Rate Limit (requests per minute): Hardware dependent

Daily Quota (requests per day): Unlimited

Token Limit (tokens per minute): Unlimited

Monthly Quota: Free, open source
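
Since the only real limit is your hardware, a rough way to see what "hardware dependent" means on your machine is to time a completion and divide generated tokens by wall-clock time. A sketch relying on the usage field of the OpenAI-style response:

throughput.py
import time
from openai import OpenAI

client = OpenAI(api_key="llama-cpp", base_url="http://localhost:8080/v1")

start = time.perf_counter()
response = client.chat.completions.create(
    model="local",
    messages=[{"role": "user", "content": "Write a haiku about CPUs."}],
    max_tokens=128,
)
elapsed = time.perf_counter() - start

# completion_tokens follows the OpenAI response schema.
tokens = response.usage.completion_tokens
print(f"{tokens} tokens in {elapsed:.1f}s -> {tokens / elapsed:.1f} tok/s")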

Use Cases

Embedded AI applications (see the in-process sketch after this list)

High performance local inference

Backend for other tools (Ollama, LM Studio)

Mobile deployment
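
For embedded use you can skip the HTTP server entirely and run the model in-process. A hedged sketch using the third-party llama-cpp-python bindings (a separate package built on llama.cpp; the model path is a placeholder):

embedded.py
# pip install llama-cpp-python
from llama_cpp import Llama

# model_path is a placeholder; n_ctx sets the context window size.
llm = Llama(model_path="./model.gguf", n_ctx=4096)

result = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain quantum computing"}]
)
print(result["choices"][0]["message"]["content"])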

Limitations & Considerations

Command-line interface

Manual model management (a scripted download sketch follows this list)

Requires technical knowledge

Barebones UI
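
Model management is manual, but the download step is easy to script. A sketch using the huggingface_hub package; the repository and file names are placeholders for whichever GGUF you actually want:

download_model.py
# pip install huggingface_hub
from huggingface_hub import hf_hub_download

# repo_id and filename are placeholders -- substitute a real GGUF repo/file.
path = hf_hub_download(
    repo_id="example-org/Example-Model-GGUF",
    filename="example-model.Q4_K_M.gguf",
)
print(path)  # local cache path; pass this to ./llama-server -m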

