Stage 03 — Using Claude, GPT & Gemini APIs with Python

From API Keys to Streaming Tool Use  ·  Technical + Hands-On  ·  ⏱ 8–10 hours

Learning Objectives

By the end of this stage you will be able to:

  • Set up Python environments and API keys for Claude, OpenAI, and Gemini
  • Send basic completions, handle responses, and stream output
  • Build multi-turn conversation history correctly
  • Implement proper error handling, retries, and rate limiting
  • Use token counting and cost estimation
  • Call functions/tools from LLM outputs
  • Build a reusable LLM client wrapper class

Setup

Prerequisites

python --version   # Python 3.9+
pip install anthropic openai google-generativeai python-dotenv

API Keys

Create a .env file in your project root (never commit this):

ANTHROPIC_API_KEY=sk-ant-...
OPENAI_API_KEY=sk-proj-...
GOOGLE_API_KEY=AI...

Load them in Python:

from dotenv import load_dotenv
import os

load_dotenv()

anthropic_key = os.getenv("ANTHROPIC_API_KEY")
openai_key = os.getenv("OPENAI_API_KEY")
google_key = os.getenv("GOOGLE_API_KEY")
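
A missing key usually surfaces later as an authentication error, so it can help to check for it at startup. The sketch below is one way to do that; `REQUIRED_KEYS` and `missing_keys` are hypothetical names introduced for illustration, and the function accepts any mapping so it can be tried without touching the real environment.

```python
import os

# Variable names used in the .env file above
REQUIRED_KEYS = ("ANTHROPIC_API_KEY", "OPENAI_API_KEY", "GOOGLE_API_KEY")


def missing_keys(env=None, required=REQUIRED_KEYS):
    """Return the names of required variables that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in required if not (env.get(name) or "").strip()]


# Example with an explicit mapping instead of the real environment
example_env = {"ANTHROPIC_API_KEY": "sk-ant-example", "OPENAI_API_KEY": ""}
print(missing_keys(example_env))  # ['OPENAI_API_KEY', 'GOOGLE_API_KEY']
```

In a real script you would call `missing_keys()` right after `load_dotenv()` and exit with a clear message if the list is not empty.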

Section 1: Anthropic (Claude) API

Basic Completion

import anthropic

client = anthropic.Anthropic()  # Reads ANTHROPIC_API_KEY from env

message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "Explain gradient descent in 3 sentences."}
    ]
)

print(message.content[0].text)
print(f"\nTokens used: {message.usage.input_tokens} in, {message.usage.output_tokens} out")

With System Prompt

message = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1024,
    system="You are a senior Python developer. Be concise. Use type hints. Show examples.",
    messages=[
        {"role": "user", "content": "How do I read a CSV file?"}
    ]
)

print(message.content[0].text)

Multi-Turn Conversation

The API is stateless — you must send the entire conversation history every time.

def chat(client, conversation_history: list[dict], user_message: str, system: str = "") -> str:
    """Send a message and return the assistant's response."""
    conversation_history.append({
        "role": "user",
        "content": user_message
    })
    
    response = client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=2048,
        system=system,
        messages=conversation_history
    )
    
    assistant_message = response.content[0].text
    conversation_history.append({
        "role": "assistant",
        "content": assistant_message
    })
    
    return assistant_message


# Usage
client = anthropic.Anthropic()
history = []
system = "You are a helpful Python tutor."

while True:
    user_input = input("You: ")
    if user_input.lower() in ["quit", "exit"]:
        break
    
    response = chat(client, history, user_input, system)
    print(f"Claude: {response}\n")
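
Because every call resends the full history, input tokens (and cost) grow with each turn. One simple way to bound that growth is to send only a recent window of messages. The sketch below is a minimal version of that idea, not the only approach; `trim_history` and `max_messages` are hypothetical names, and it assumes you want the window to open on a user turn so the alternating user/assistant structure used above is preserved.

```python
def trim_history(history: list[dict], max_messages: int) -> list[dict]:
    """Return the most recent messages, starting on a user turn."""
    if max_messages < 1:
        raise ValueError("max_messages must be at least 1")
    trimmed = history[-max_messages:]
    # Drop leading assistant messages so the window opens with a user turn
    while trimmed and trimmed[0]["role"] != "user":
        trimmed = trimmed[1:]
    return trimmed


history = [
    {"role": "user", "content": "What is a list?"},
    {"role": "assistant", "content": "An ordered, mutable sequence."},
    {"role": "user", "content": "And a tuple?"},
    {"role": "assistant", "content": "An ordered, immutable sequence."},
    {"role": "user", "content": "Which is faster?"},
]
window = trim_history(history, max_messages=4)
print(len(window))  # 3
```

The original list is left untouched, so you can keep the full transcript for logging while sending only the window to the API. Older context is lost to the model, which is the trade-off of this approach.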

Streaming Responses

For long outputs, streaming lets you display text as it is generated instead of waiting for the complete response:

def stream_chat(client, messages: list[dict], system: str = "") -> str:
    """Stream a response and collect it for history."""
    full_response = ""
    
    with client.messages.stream(
        model="claude-sonnet-4-5",
        max_tokens=2048,
        system=system,
        messages=messages
    ) as stream:
        for text in stream.text_stream:
            print(text, end="", flush=True)
            full_response += text
    
    print()  # newline after streaming
    return full_response

Tool Use (Function Calling)

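Tools let the model request calls to functions you define. The model never runs code itself: your program executes the function and sends the result back in the next message.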

import json

# Define available tools
tools = [
    {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "input_schema": {
            "type": "object",
            "properties": {
                "city": {
                    "type": "string",
                    "description": "The city name"
                },
                "unit": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"],
                    "description": "Temperature unit"
                }
            },
            "required": ["city"]
        }
    }
]

# Simulated weather function
def get_weather(city: str, unit: str = "celsius") -> dict:
    # In real code, call a weather API here
    return {"city": city, "temperature": 22, "unit": unit, "condition": "Sunny"}

# Agentic loop: model decides when to call tools
def run_with_tools(client, user_message: str) -> str:
    messages = [{"role": "user", "content": user_message}]
    
    while True:
        response = client.messages.create(
            model="claude-sonnet-4-5",
            max_tokens=1024,
            tools=tools,
            messages=messages
        )
        
        # Check if model wants to use a tool
        if response.stop_reason == "tool_use":
            # Process all tool calls
            tool_results = []
            for content_block in response.content:
                if content_block.type == "tool_use":
                    tool_name = content_block.name
                    tool_input = content_block.input
                    tool_use_id = content_block.id
                    
                    # Execute the tool
                    if tool_name == "get_weather":
                        result = get_weather(**tool_input)
                    else:
                        result = {"error": f"Unknown tool: {tool_name}"}
                    
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": tool_use_id,
                        "content": json.dumps(result)
                    })
            
            # Add assistant response and tool results to history
            messages.append({"role": "assistant", "content": response.content})
            messages.append({"role": "user", "content": tool_results})
        
        else:
            # Model is done — return final text response
            return response.content[0].text


result = run_with_tools(client, "What's the weather like in Tokyo and London?")
print(result)
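
The if/else dispatch inside the loop above gets unwieldy once you have more than a couple of tools. A common alternative is a name-to-function registry. The sketch below assumes that pattern; `TOOL_REGISTRY` and `execute_tool` are hypothetical names, and `get_weather` is repeated here only so the example runs on its own.

```python
import json
from typing import Callable


def get_weather(city: str, unit: str = "celsius") -> dict:
    """Stand-in for the simulated weather function above."""
    return {"city": city, "temperature": 22, "unit": unit, "condition": "Sunny"}


# Maps the tool name the model sends to the Python function that implements it
TOOL_REGISTRY: dict[str, Callable[..., dict]] = {"get_weather": get_weather}


def execute_tool(name: str, tool_input: dict) -> str:
    """Run a registered tool and return a JSON string for the tool_result block."""
    func = TOOL_REGISTRY.get(name)
    if func is None:
        return json.dumps({"error": f"Unknown tool: {name}"})
    try:
        return json.dumps(func(**tool_input))
    except TypeError as exc:
        # Typically raised when the model supplies missing or unexpected arguments
        return json.dumps({"error": f"Bad arguments for {name}: {exc}"})


print(execute_tool("get_weather", {"city": "Tokyo"}))
```

Returning the error as JSON rather than raising gives the model a chance to correct its arguments on the next turn.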

Section 2: OpenAI (GPT) API

Basic Completion

from openai import OpenAI

client = OpenAI()  # Reads OPENAI_API_KEY from env

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain gradient descent in 3 sentences."}
    ],
    max_tokens=512,
    temperature=0.7
)

print(response.choices[0].message.content)
print(f"Tokens: {response.usage.prompt_tokens} in, {response.usage.completion_tokens} out")

Structured Output (JSON Mode)

import json

response = client.chat.completions.create(
    model="gpt-4o",
    response_format={"type": "json_object"},
    messages=[
        {"role": "system", "content": "Extract entities from the text. Return JSON with keys: people (list), organizations (list), locations (list), dates (list)."},
        {"role": "user", "content": "Elon Musk founded SpaceX in 2002 in Hawthorne, California. He later acquired Twitter in October 2022."}
    ]
)

data = json.loads(response.choices[0].message.content)
print(data)
# {'people': ['Elon Musk'], 'organizations': ['SpaceX', 'Twitter'], 
#  'locations': ['Hawthorne, California'], 'dates': ['2002', 'October 2022']}
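
JSON mode constrains the output to syntactically valid JSON, but it does not guarantee that the keys you asked for are present or that each value has the type you expect. A small validation step is a reasonable safeguard. In this sketch, `EXPECTED_KEYS` and `parse_entities` are hypothetical names, and falling back to an empty list for missing or malformed values is an assumption you may want to replace with stricter handling.

```python
import json

EXPECTED_KEYS = ("people", "organizations", "locations", "dates")


def parse_entities(raw: str) -> dict:
    """Parse the model's JSON text and normalise it to the expected keys."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Model did not return valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("Expected a JSON object at the top level")
    # Missing or non-list values fall back to an empty list
    return {k: data[k] if isinstance(data.get(k), list) else [] for k in EXPECTED_KEYS}


print(parse_entities('{"people": ["Ada Lovelace"], "dates": "1843"}'))
```

With the API response above you would call `parse_entities(response.choices[0].message.content)` in place of the bare `json.loads`.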

Function Calling

tools = [
    {
        "type": "function",
        "function": {
            "name": "search_database",
            "description": "Search the product database by query",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "Search query"},
                    "category": {"type": "string", "enum": ["laptops", "phones", "tablets"]},
                    "max_price": {"type": "number", "description": "Maximum price in USD"}
                },
                "required": ["query"]
            }
        }
    }
]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Find me a laptop under $1000"}],
    tools=tools,
    tool_choice="auto"
)

# Check if a tool was called
if response.choices[0].finish_reason == "tool_calls":
    tool_call = response.choices[0].message.tool_calls[0]
    function_name = tool_call.function.name
    function_args = json.loads(tool_call.function.arguments)
    print(f"Model wants to call: {function_name}({function_args})")
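
The snippet above stops once the model has asked for a tool. To get a final answer, you run the function yourself and send its result back as a message with role "tool" that references the call's id. The sketch below only builds the follow-up message list, so it runs without an API key; `build_tool_followup` is a hypothetical helper, and the assistant turn is written as a plain dict standing in for the message returned by the API.

```python
import json


def build_tool_followup(messages: list, assistant_message: dict,
                        tool_call_id: str, result: dict) -> list:
    """Return a new message list: history + assistant tool request + tool result."""
    return messages + [
        assistant_message,
        {"role": "tool", "tool_call_id": tool_call_id, "content": json.dumps(result)},
    ]


messages = [{"role": "user", "content": "Find me a laptop under $1000"}]

# Stand-in for the assistant turn that requested the tool call
assistant_message = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_123",  # illustrative id; real ids come from the API
        "type": "function",
        "function": {
            "name": "search_database",
            "arguments": '{"query": "laptop", "max_price": 1000}',
        },
    }],
}

followup = build_tool_followup(
    messages, assistant_message, "call_123", {"results": ["Example Laptop, $899"]}
)
print(json.dumps(followup[-1], indent=2))
```

Passing `followup` as `messages` in a second `client.chat.completions.create` call (with the same `tools`) lets the model turn the tool output into a natural-language answer.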

Streaming

stream = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Write a haiku about recursion."}],
    stream=True
)

for chunk in stream:
    # Some chunks carry no choices or no text delta; skip them
    if chunk.choices and chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()

Section 3: Google Gemini API

Basic Completion

import google.generativeai as genai
import os

genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))

model = genai.GenerativeModel("gemini-1.5-flash")

response = model.generate_content("Explain the CAP theorem in simple terms.")
print(response.text)

With System Instruction

model = genai.GenerativeModel(
    model_name="gemini-1.5-flash",
    system_instruction="You are a database expert. Use examples from PostgreSQL and MongoDB."
)

response = model.generate_content("What's the difference between ACID and BASE consistency models?")
print(response.text)

Multi-Turn Chat

model = genai.GenerativeModel("gemini-1.5-flash")
chat = model.start_chat(history=[])

# Send messages
response = chat.send_message("What is Kubernetes?")
print("Gemini:", response.text)

response = chat.send_message("How does it compare to Docker Swarm?")
print("Gemini:", response.text)

# The chat object keeps the history for you (one entry per message, not per turn)
print("\nConversation length:", len(chat.history), "messages")

Processing Long Documents

Gemini 1.5 models accept very long contexts (on the order of 1M tokens), which is especially useful for large documents:

with open("long_document.txt", "r") as f:
    document = f.read()

model = genai.GenerativeModel("gemini-1.5-pro")

response = model.generate_content([
    f"Document:\n{document}\n\nQuestions: 1) What is the main argument? 2) List the 5 key claims. 3) What evidence is weakest?"
])
print(response.text)

Section 4: Error Handling and Resilience

Production code must handle API failures gracefully.

Common Errors

Error                | Cause              | Solution
AuthenticationError  | Invalid API key    | Check key, reload env
RateLimitError       | Too many requests  | Exponential backoff
APIConnectionError   | Network issue      | Retry with backoff
BadRequestError      | Invalid parameters | Log and fix the request
APIStatusError       | 5xx server error   | Retry with backoff

Exponential Backoff Pattern

import time
import random
import anthropic

def call_with_retry(
    client: anthropic.Anthropic,
    messages: list[dict],
    system: str = "",
    max_retries: int = 5,
    base_delay: float = 1.0
) -> str:
    """Call Claude API with exponential backoff retry."""
    
    for attempt in range(max_retries):
        try:
            response = client.messages.create(
                model="claude-sonnet-4-5",
                max_tokens=2048,
                system=system,
                messages=messages
            )
            return response.content[0].text
            
        except anthropic.RateLimitError as e:
            if attempt == max_retries - 1:
                raise
            # Get retry-after header if available
            delay = float(e.response.headers.get("retry-after", base_delay * (2 ** attempt)))
            jitter = random.uniform(0, delay * 0.1)
            print(f"Rate limited. Waiting {delay + jitter:.1f}s (attempt {attempt + 1}/{max_retries})")
            time.sleep(delay + jitter)
            
        except anthropic.APIConnectionError:
            if attempt == max_retries - 1:
                raise
            delay = base_delay * (2 ** attempt) + random.uniform(0, 1)
            print(f"Connection error. Retrying in {delay:.1f}s...")
            time.sleep(delay)
            
        except anthropic.APIStatusError as e:
            if e.status_code >= 500:
                if attempt == max_retries - 1:
                    raise
                delay = base_delay * (2 ** attempt)
                time.sleep(delay)
            else:
                raise  # 4xx errors should not be retried
    
    raise RuntimeError("Max retries exceeded")
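
To see what the retry loop above actually waits, it helps to compute the delay schedule on its own. The sketch below isolates that arithmetic: the delay doubles with each attempt and a small random jitter is added so that many clients do not retry in lockstep. `backoff_delay` is a hypothetical helper, and the `max_delay` cap is an added assumption that the loop above does not apply.

```python
import random


def backoff_delay(attempt: int, base_delay: float = 1.0, max_delay: float = 30.0,
                  jitter: float = 0.1, rng=random.random) -> float:
    """Delay before retry number `attempt` (0-based): capped exponential plus jitter."""
    delay = min(max_delay, base_delay * (2 ** attempt))
    # Jitter adds between 0 and `jitter` times the delay
    return delay + delay * jitter * rng()


# With jitter disabled the schedule is deterministic
print([backoff_delay(a, jitter=0.0) for a in range(6)])
# [1.0, 2.0, 4.0, 8.0, 16.0, 30.0]
```

Without a cap, attempt 10 would wait over 17 minutes at a 1-second base, which is rarely what an interactive application wants.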

Section 5: Token Counting and Cost Estimation

Understanding token usage is essential for cost control in production.

import anthropic

client = anthropic.Anthropic()

def estimate_cost(input_tokens: int, output_tokens: int, model: str = "claude-sonnet-4-5") -> float:
    """Estimate Claude API cost in USD."""
    # Prices per million tokens (as of 2025; verify current prices)
    pricing = {
        "claude-opus-4-6": {"input": 15.00, "output": 75.00},
        "claude-sonnet-4-5": {"input": 3.00, "output": 15.00},
        "claude-haiku-4-5": {"input": 1.00, "output": 5.00},
    }
    
    if model not in pricing:
        return 0.0
    
    p = pricing[model]
    cost = (input_tokens / 1_000_000 * p["input"]) + (output_tokens / 1_000_000 * p["output"])
    return cost

def count_tokens(client: anthropic.Anthropic, messages: list[dict], system: str = "") -> int:
    """Count tokens before making a request."""
    response = client.messages.count_tokens(
        model="claude-sonnet-4-5",
        system=system,
        messages=messages
    )
    return response.input_tokens

# Example: estimate before sending
messages = [{"role": "user", "content": "Explain the Raft consensus algorithm in detail with examples."}]
token_count = count_tokens(client, messages)
estimated_cost = estimate_cost(token_count, 1000)  # Assume 1000 output tokens
print(f"Estimated input tokens: {token_count}")
print(f"Estimated cost: ${estimated_cost:.4f}")
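
The same per-token arithmetic scales to a whole workload: multiply the per-request token counts by the number of requests, then apply the per-million-token prices. The sketch below shows that calculation; `estimate_batch_cost` is a hypothetical helper, and the workload figures are made up for illustration. The prices are the Sonnet rates from the table above, which you should verify before relying on them.

```python
def estimate_batch_cost(num_requests: int, input_tokens_each: int, output_tokens_each: int,
                        input_price_per_mtok: float, output_price_per_mtok: float) -> float:
    """Estimate the total USD cost of a batch of similar requests."""
    total_in = num_requests * input_tokens_each
    total_out = num_requests * output_tokens_each
    return (total_in / 1_000_000 * input_price_per_mtok
            + total_out / 1_000_000 * output_price_per_mtok)


# 1,000 requests, each with 2,000 input tokens and 500 output tokens
# Input:  2,000,000 tokens at $3.00 per million  = $6.00
# Output:   500,000 tokens at $15.00 per million = $7.50
cost = estimate_batch_cost(1_000, 2_000, 500, 3.00, 15.00)
print(f"${cost:.2f}")  # $13.50
```

Note that output tokens dominate here even though there are four times fewer of them, because they are priced five times higher.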

Section 6: Building a Reusable LLM Client

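Here's a starting point for a wrapper that exposes one interface across providers. Only the Claude client retries on failure; add the same logic to the other clients before relying on this in production: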

from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Optional
import anthropic
from openai import OpenAI
import time
import random


@dataclass
class Message:
    role: str  # "user" or "assistant"
    content: str


@dataclass  
class LLMResponse:
    content: str
    input_tokens: int
    output_tokens: int
    model: str
    cost_usd: float


class BaseLLMClient(ABC):
    """Abstract base class for LLM clients."""
    
    @abstractmethod
    def complete(
        self,
        messages: list[Message],
        system: str = "",
        max_tokens: int = 2048,
        temperature: float = 0.7
    ) -> LLMResponse:
        pass
    
    def chat(self, conversation: list[Message], user_message: str, **kwargs) -> LLMResponse:
        """Add user message and get response, updating conversation in place."""
        conversation.append(Message(role="user", content=user_message))
        response = self.complete(conversation, **kwargs)
        conversation.append(Message(role="assistant", content=response.content))
        return response


class ClaudeClient(BaseLLMClient):
    """Anthropic Claude client with retry logic."""
    
    PRICING = {
        "claude-sonnet-4-5": {"input": 3.00, "output": 15.00},
        "claude-haiku-4-5": {"input": 1.00, "output": 5.00},
    }
    
    def __init__(self, model: str = "claude-sonnet-4-5", max_retries: int = 3):
        self.client = anthropic.Anthropic()
        self.model = model
        self.max_retries = max_retries
    
    def complete(
        self,
        messages: list[Message],
        system: str = "",
        max_tokens: int = 2048,
        temperature: float = 0.7
    ) -> LLMResponse:
        api_messages = [{"role": m.role, "content": m.content} for m in messages]
        
        for attempt in range(self.max_retries):
            try:
                response = self.client.messages.create(
                    model=self.model,
                    max_tokens=max_tokens,
                    system=system,
                    messages=api_messages,
                    temperature=temperature
                )
                
                input_tok = response.usage.input_tokens
                output_tok = response.usage.output_tokens
                p = self.PRICING.get(self.model, {"input": 3.0, "output": 15.0})
                cost = (input_tok / 1e6 * p["input"]) + (output_tok / 1e6 * p["output"])
                
                return LLMResponse(
                    content=response.content[0].text,
                    input_tokens=input_tok,
                    output_tokens=output_tok,
                    model=self.model,
                    cost_usd=cost
                )
                
            except anthropic.RateLimitError:
                if attempt == self.max_retries - 1:
                    raise
                time.sleep(2 ** attempt + random.random())
            except anthropic.APIConnectionError:
                if attempt == self.max_retries - 1:
                    raise
                time.sleep(2 ** attempt)


class OpenAIClient(BaseLLMClient):
    """OpenAI GPT client."""
    
    PRICING = {
        "gpt-4o": {"input": 2.50, "output": 10.00},
        "gpt-4o-mini": {"input": 0.15, "output": 0.60},
    }
    
    def __init__(self, model: str = "gpt-4o"):
        self.client = OpenAI()
        self.model = model
    
    def complete(
        self,
        messages: list[Message],
        system: str = "",
        max_tokens: int = 2048,
        temperature: float = 0.7
    ) -> LLMResponse:
        api_messages = []
        if system:
            api_messages.append({"role": "system", "content": system})
        api_messages.extend([{"role": m.role, "content": m.content} for m in messages])
        
        response = self.client.chat.completions.create(
            model=self.model,
            messages=api_messages,
            max_tokens=max_tokens,
            temperature=temperature
        )
        
        input_tok = response.usage.prompt_tokens
        output_tok = response.usage.completion_tokens
        p = self.PRICING.get(self.model, {"input": 2.5, "output": 10.0})
        cost = (input_tok / 1e6 * p["input"]) + (output_tok / 1e6 * p["output"])
        
        return LLMResponse(
            content=response.choices[0].message.content,
            input_tokens=input_tok,
            output_tokens=output_tok,
            model=self.model,
            cost_usd=cost
        )


# Usage example
def run_benchmark(prompt: str) -> None:
    """Compare responses and costs across providers."""
    
    clients = {
        "Claude Sonnet": ClaudeClient("claude-sonnet-4-5"),
        "GPT-4o": OpenAIClient("gpt-4o"),
    }
    
    for name, client in clients.items():
        try:
            messages = [Message(role="user", content=prompt)]
            response = client.complete(messages)
            print(f"\n{'='*50}")
            print(f"Model: {name}")
            print(f"Tokens: {response.input_tokens} in / {response.output_tokens} out")
            print(f"Cost: ${response.cost_usd:.4f}")
            print(f"Response: {response.content[:200]}...")
        except Exception as e:
            print(f"{name} error: {e}")


run_benchmark("What are the three most important concepts in distributed systems? Be concise.")
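
One benefit of a shared interface is that you can swap in an offline stand-in while developing or testing, so you spend no tokens and need no API keys. The sketch below assumes that approach. `FakeClient` is a hypothetical class, the two dataclasses are repeated only so the example runs on its own, and whitespace word counts stand in for real token counts.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str
    content: str


@dataclass
class LLMResponse:
    content: str
    input_tokens: int
    output_tokens: int
    model: str
    cost_usd: float


class FakeClient:
    """Offline stand-in exposing the same complete() signature as the wrappers."""

    def __init__(self, reply: str = "stub reply"):
        self.reply = reply
        self.calls = []  # one snapshot of the message list per call

    def complete(self, messages, system="", max_tokens=2048, temperature=0.7) -> LLMResponse:
        self.calls.append(list(messages))
        # Word counts are a rough placeholder, not real token counts
        words_in = sum(len(m.content.split()) for m in messages)
        return LLMResponse(self.reply, words_in, len(self.reply.split()), "fake", 0.0)


client = FakeClient("Consensus, replication, partitioning.")
response = client.complete([Message(role="user", content="Name three concepts.")])
print(response.content, response.input_tokens, response.output_tokens)
```

Because `FakeClient` records every call in `calls`, a test can also assert on exactly what your code would have sent to the provider.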

Section 7: Image and Vision

Both Claude and GPT-4o support image inputs.

import base64
from pathlib import Path

def encode_image(image_path: str) -> tuple[str, str]:
    """Encode image to base64 and detect media type."""
    path = Path(image_path)
    suffix = path.suffix.lower()
    
    media_types = {".jpg": "image/jpeg", ".jpeg": "image/jpeg", 
                   ".png": "image/png", ".gif": "image/gif", ".webp": "image/webp"}
    media_type = media_types.get(suffix, "image/jpeg")
    
    with open(image_path, "rb") as f:
        data = base64.standard_b64encode(f.read()).decode("utf-8")
    
    return data, media_type

# Claude vision
def analyze_image_claude(client: anthropic.Anthropic, image_path: str, question: str) -> str:
    image_data, media_type = encode_image(image_path)
    
    response = client.messages.create(
        model="claude-sonnet-4-5",
        max_tokens=1024,
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "image",
                        "source": {
                            "type": "base64",
                            "media_type": media_type,
                            "data": image_data,
                        }
                    },
                    {"type": "text", "text": question}
                ]
            }
        ]
    )
    return response.content[0].text
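
For comparison, the OpenAI Chat Completions API takes an inline image as a data URL inside an `image_url` content part rather than as a separate base64 source object. The sketch below only builds the message, so it runs without an API key; `build_openai_image_message` is a hypothetical helper, and the three-byte payload is a placeholder rather than a real image.

```python
import base64


def build_openai_image_message(image_bytes: bytes, media_type: str, question: str) -> dict:
    """Build a user message carrying one inline image as a base64 data URL."""
    encoded = base64.standard_b64encode(image_bytes).decode("utf-8")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {"type": "image_url", "image_url": {"url": f"data:{media_type};base64,{encoded}"}},
        ],
    }


# Placeholder bytes; in practice read them from the image file in binary mode
message = build_openai_image_message(b"abc", "image/png", "What is in this image?")
print(message["content"][1]["image_url"]["url"])  # data:image/png;base64,YWJj
```

You would pass `[message]` as `messages` to `client.chat.completions.create` with a vision-capable model such as the `gpt-4o` used earlier.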

Checkpoint Assessment

  1. Why must you send the entire conversation history with each API call?
  2. What is the difference between temperature=0 and temperature=1?
  3. When should you use streaming vs. standard completions?
  4. A production app is hitting rate limits under load. Name two architectural solutions.
  5. You need to call a weather API from within a Claude conversation. Which API feature enables this? Sketch the call sequence in pseudocode.
  6. Estimate the cost of processing 10,000 documents, each with 500 tokens of content and generating 200 tokens of output, using Claude Sonnet.

Project: Multi-Provider Summarizer

Build a Python script that:

  1. Accepts a URL or file path as a command-line argument
  2. Extracts the text content (use requests + beautifulsoup4 for URLs)
  3. Sends it to Claude for a structured summary: title, key points (3–5), sentiment, reading time estimate
  4. Formats the output as clean markdown
  5. Logs token usage and cost to a CSV file for tracking

Bonus: Add a --compare flag that runs all three providers and outputs a side-by-side comparison.


Key Vocabulary

Term                        | Definition
Stateless API               | Each call is independent; no memory of previous calls
Function calling / Tool use | LLM generating structured function call requests
Streaming                   | Receiving response tokens in real-time as they're generated
Rate limiting               | API throttling to prevent abuse; requires backoff
Exponential backoff         | Retry strategy doubling wait time between attempts
JSON mode                   | API feature forcing JSON output format
Vision / multimodal         | Passing images alongside text in API calls

What's Next

Stage 4 covers Retrieval-Augmented Generation (RAG) — how to give LLMs access to your own documents, databases, and real-time information using vector embeddings and semantic search.
