Master the art of building AI-powered applications with free resources. Learn optimization, security, prompt engineering, and more.
Essential setup and configuration
Free APIs can fail (rate limits, downtime, network issues). Always build resilient code:
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def call_llm_with_retry(url, headers, data, max_retries=3):
    """Robust API call with exponential backoff."""
    # Configure retry strategy
    session = requests.Session()
    retry = Retry(
        total=max_retries,
        backoff_factor=1,  # Wait 1s, 2s, 4s, etc.
        status_forcelist=[429, 500, 502, 503, 504],
        allowed_methods=["POST"]  # POST is not retried by default, so opt in explicitly
    )
    adapter = HTTPAdapter(max_retries=retry)
    session.mount('https://', adapter)

    try:
        response = session.post(url, headers=headers, json=data, timeout=30)
        response.raise_for_status()
        return response.json()
    except requests.exceptions.HTTPError as e:
        if e.response.status_code == 429:
            print("Rate limit hit. Consider switching providers.")
        raise
    except requests.exceptions.Timeout:
        print("Request timed out. Provider may be slow.")
        raise
    except requests.exceptions.ConnectionError:
        print("Network issue. Check your internet.")
        raise
    finally:
        session.close()

# Usage
result = call_llm_with_retry(url, headers, data)
Don't rely on a single provider. If Groq hits rate limits, automatically fall back to Google AI Studio or OpenRouter so your application keeps responding even when one provider is throttled or down, as sketched below the example .env file.
Example .env file:
GROQ_API_KEY=gsk_xxxxxxxxxxxxxxxxxxxx
GOOGLE_API_KEY=AIzaSyXXXXXXXXXXXXXXXXXXXXXX
OPENROUTER_API_KEY=sk-or-v1-xxxxxxxxxxxxxxxxxxxx
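A minimal fallback sketch, assuming each provider exposes an OpenAI-compatible chat completions endpoint; the PROVIDERS list, endpoint URLs, and model names below are illustrative assumptions, so check each provider's docs for current values:

import os
import requests

# Illustrative provider list - endpoints and model names are assumptions
PROVIDERS = [
    {"name": "groq", "url": "https://api.groq.com/openai/v1/chat/completions",
     "key": os.getenv("GROQ_API_KEY"), "model": "llama-3.3-70b-versatile"},
    {"name": "openrouter", "url": "https://openrouter.ai/api/v1/chat/completions",
     "key": os.getenv("OPENROUTER_API_KEY"), "model": "meta-llama/llama-3.3-70b-instruct"},
]

def call_with_fallback(prompt):
    """Try each provider in order; move on when one is rate limited or down."""
    for provider in PROVIDERS:
        try:
            response = requests.post(
                provider["url"],
                headers={"Authorization": f"Bearer {provider['key']}"},
                json={"model": provider["model"],
                      "messages": [{"role": "user", "content": prompt}]},
                timeout=30,
            )
            response.raise_for_status()
            return response.json()["choices"][0]["message"]["content"]
        except requests.exceptions.RequestException as e:
            print(f"{provider['name']} failed ({e}), trying next provider...")
    raise RuntimeError("All providers failed")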
Maximize your free tier usage
Identical requests don't need fresh API calls. Cache responses aggressively; for repetitive workloads this can cut API calls by 40-70%:
import hashlib
import json
from functools import lru_cache

# Option 1: In-memory cache (simple, fast)
@lru_cache(maxsize=1000)
def cached_llm_call(prompt, model="llama-3.3-70b", temp=0.7):
    # Cached on prompt + parameters (all arguments must be hashable)
    response = call_api(prompt, model, temp)
    return response

# Option 2: Redis cache (persistent, scalable)
import redis

r = redis.Redis(host='localhost', port=6379, decode_responses=True)

def cached_llm_call_redis(prompt, model, temp, ttl=3600):
    # Create cache key
    key = hashlib.md5(f"{prompt}{model}{temp}".encode()).hexdigest()
    # Check cache
    cached = r.get(key)
    if cached:
        print("Cache hit!")
        return json.loads(cached)
    # Cache miss - call API
    response = call_api(prompt, model, temp)
    r.setex(key, ttl, json.dumps(response))  # Cache for ttl seconds (1 hour by default)
    return response
Keep prompts concise and set max_tokens to prevent overly long responses; both the prompt and the completion count against your token quota.
Verbose: "I need you to analyze this product review and tell me if the sentiment is positive, negative, or neutral. Please provide a detailed explanation..."
Concise: "Sentiment (positive/negative/neutral): [review text]"
Instead of making 100 API calls for 100 items, process multiple items in a single request:
# Before: 100 API calls = slow + burns through rate limits
for review in reviews:
    sentiment = llm(f"Sentiment: {review}")

# After: 1 API call = fast
batch = "\n".join([f"{i}. {r}" for i, r in enumerate(reviews)])
result = llm(f"Sentiment for each:\n{batch}")
Tip: Use JSON output format for easy parsing: "Return as JSON: [{"id": 1, "sentiment": "positive"}, ...]"
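A minimal sketch of batching with JSON output, assuming an llm helper that takes a prompt string and returns the model's raw text response:

import json

def batch_sentiment(reviews, llm):
    """Classify many reviews in one call and parse the JSON result."""
    batch = "\n".join(f"{i}. {r}" for i, r in enumerate(reviews))
    prompt = (
        "For each numbered review below, return a JSON array like "
        '[{"id": 0, "sentiment": "positive"}, ...]. Respond with JSON only.\n'
        + batch
    )
    raw = llm(prompt)
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # Models sometimes wrap JSON in prose; fall back to the bracketed span
        start, end = raw.index("["), raw.rindex("]") + 1
        return json.loads(raw[start:end])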
Stay under each provider's per-minute request limit by throttling calls client-side:

import time
from collections import deque

class RateLimiter:
    def __init__(self, max_calls_per_minute=30):
        self.max_calls = max_calls_per_minute
        self.calls = deque()

    def wait_if_needed(self):
        now = time.time()
        # Remove calls older than 1 minute
        while self.calls and self.calls[0] < now - 60:
            self.calls.popleft()
        # If we've hit the limit, wait until the oldest call ages out
        if len(self.calls) >= self.max_calls:
            sleep_time = 60 - (now - self.calls[0])
            print(f"Rate limit reached. Waiting {sleep_time:.1f}s...")
            time.sleep(sleep_time)
            self.calls.popleft()
        self.calls.append(time.time())

# Usage
limiter = RateLimiter(max_calls_per_minute=30)
for prompt in prompts:
    limiter.wait_if_needed()
    result = call_api(prompt)
Get better outputs with smarter prompts
"Write about AI"
"Write a 500-word article explaining how transformers work for a beginner audience"
Add "Let's think step by step" or "Explain your reasoning" for complex tasks:
"Calculate 47 × 23. Think step by step."
✓ Output quality improves by 20-30% on reasoning tasks
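For the prompt above, a good step-by-step response works through the multiplication explicitly rather than guessing, e.g.:

47 × 23 = 47 × 20 + 47 × 3 = 940 + 141 = 1081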
Show the model a few examples of the output you want (few-shot prompting):

Extract structured data from text.

Example 1:
Input: "John Doe, [email protected], age 30"
Output: {"name": "John Doe", "email": "[email protected]", "age": 30}

Example 2:
Input: "Jane Smith, [email protected], age 25"
Output: {"name": "Jane Smith", "email": "[email protected]", "age": 25}

Now extract from: "Bob Wilson, [email protected], age 45"
Protect your applications and data