LLMfy - Technical Features Overview

Introduction
LLMfy is a Python framework designed to streamline the development of applications powered by large language models. It provides unified abstractions across multiple LLM providers, workflow orchestration, vector storage, tool calling, and utility functions — all in a single modular package.
Installation:
```shell
pip install llmfy
```
Install with optional dependencies:
```shell
pip install "llmfy[openai]"   # OpenAI support
pip install "llmfy[bedrock]"  # AWS Bedrock support
pip install "llmfy[faiss]"    # Vector store support
pip install "llmfy[redis]"    # Redis checkpointer
pip install "llmfy[sql]"      # SQL checkpointer
```
Requirements: Python >= 3.11
Multi-Provider LLM Support
LLMfy provides a unified interface for working with different LLM providers. Currently supported providers are OpenAI and AWS Bedrock.
OpenAI
```python
from llmfy import OpenAIModel, OpenAIConfig, LLMfy, Message, Role

config = OpenAIConfig(temperature=0.7)
llm = OpenAIModel(model="gpt-4o-mini", config=config)
ai = LLMfy(llm, system_message="You are a helpful assistant.")

messages = [Message(role=Role.USER, content="What is Python?")]
response = ai.invoke(messages)
print(response.result.content)
```
AWS Bedrock
```python
from llmfy import BedrockModel, BedrockConfig, LLMfy, Message, Role

config = BedrockConfig(temperature=0.7)
llm = BedrockModel(model="amazon.nova-pro-v1:0", config=config)
ai = LLMfy(llm, system_message="You are a helpful assistant.")

messages = [Message(role=Role.USER, content="What is Python?")]
response = ai.invoke(messages)
print(response.result.content)
```
Switching providers requires only changing the model and config — the rest of the code stays the same.
System Message Templating
LLMfy supports dynamic system prompts using {{variable}} template placeholders. Variables are injected at invocation time via input_variables.
```python
ai = LLMfy(
    llm,
    system_message="You are a {{role}} expert. Answer about {{topic}}.",
    input_variables=["role", "topic"],
)

messages = [Message(role=Role.USER, content="Explain the basics.")]
response = ai.invoke(messages, role="Python", topic="decorators")
```
Multi-Turn Conversation
Use chat() for multi-turn conversations with automatic message history management.
```python
ai = LLMfy(llm, system_message="You are a helpful assistant.")

# First turn
messages = [Message(role=Role.USER, content="What is machine learning?")]
response = ai.chat(messages)
print(response.result.content)

# Second turn (history is preserved)
follow_up = [Message(role=Role.USER, content="Give me an example.")]
response = ai.chat(follow_up)
print(response.result.content)
```
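Conceptually, "automatic message history management" means each turn is appended to a running list that is replayed on the next call. The sketch below is a plain-Python approximation of that idea, using dicts instead of LLMfy's Message/Role types; it is illustrative, not LLMfy's actual implementation.

```python
# Simplified sketch of what chat()-style history management does.
class ChatHistory:
    def __init__(self, system_message: str):
        self.messages = [{"role": "system", "content": system_message}]

    def add_user(self, content: str):
        self.messages.append({"role": "user", "content": content})

    def add_assistant(self, content: str):
        self.messages.append({"role": "assistant", "content": content})

history = ChatHistory("You are a helpful assistant.")
history.add_user("What is machine learning?")
history.add_assistant("Machine learning is ...")
history.add_user("Give me an example.")  # the second turn sees the full history
print(len(history.messages))  # 4 messages accumulated
```

Because the full list is sent each turn, the model can resolve follow-ups like "Give me an example." against the earlier exchange.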
Streaming
All invocation methods have streaming variants for real-time token delivery.
```python
# Single-turn streaming
for chunk in ai.invoke_stream(messages):
    print(chunk, end="", flush=True)

# Multi-turn streaming
for chunk in ai.chat_stream(messages):
    print(chunk, end="", flush=True)
```
Tool Calling (Function Calling)
LLMfy provides a @Tool() decorator that automatically registers Python functions as callable tools for the LLM.
Define Tools
```python
from llmfy import Tool

@Tool()
def get_weather(city: str):
    """Get the current weather for a given city."""
    # Your logic here
    return f"The weather in {city} is sunny, 25°C."

@Tool()
def search_database(query: str, limit: int = 5):
    """Search the database for relevant records."""
    # Your logic here
    return [{"id": 1, "name": "Result"}]
```
Register and Use Tools
```python
ai = LLMfy(llm, system_message="You are a helpful assistant with tool access.")

# Register tools
ai.register_tool([get_weather, search_database])

messages = [Message(role=Role.USER, content="What's the weather in Tokyo?")]
response = ai.invoke_with_tools(messages)
print(response.result.content)
```
The @Tool() decorator automatically extracts the function schema (name, description, parameters, types) and formats it for the selected provider (OpenAI or Bedrock).
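The exact schema format LLMfy emits is not shown here, but the general idea of signature-based schema extraction can be sketched with the standard-library inspect module. The following is a simplified approximation of such a decorator's internals (not LLMfy's actual code), producing an OpenAI-style JSON schema:

```python
import inspect

# Map Python annotations to JSON Schema type names
TYPE_MAP = {str: "string", int: "integer", float: "number", bool: "boolean"}

def extract_schema(func) -> dict:
    """Build an OpenAI-style tool schema from a function's signature and docstring."""
    sig = inspect.signature(func)
    properties, required = {}, []
    for name, param in sig.parameters.items():
        properties[name] = {"type": TYPE_MAP.get(param.annotation, "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)  # no default value -> required parameter
    return {
        "name": func.__name__,
        "description": (func.__doc__ or "").strip(),
        "parameters": {"type": "object", "properties": properties, "required": required},
    }

def get_weather(city: str):
    """Get the current weather for a given city."""
    return f"The weather in {city} is sunny, 25°C."

schema = extract_schema(get_weather)
# schema["name"] == "get_weather"; "city" is a required string parameter
```

This is why docstrings and type hints matter when defining tools: they become the description and parameter types the model sees.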
ToolRegistry
For advanced tool management, use ToolRegistry to register tools separately and access them in workflows.
```python
from llmfy import ToolRegistry

registry = ToolRegistry([get_weather, search_database], llm)

# Get tool definitions for API calls
tool_definitions = registry.get_tools()

# Execute a tool by name
result = registry.execute("get_weather", {"city": "Tokyo"})
```
Multi-Modal Content
LLMfy supports multi-modal inputs including text, images, documents, and videos.
```python
from llmfy import Message, Role, Content, ContentType

# Text + image message
message = Message(
    role=Role.USER,
    content=[
        Content(type=ContentType.TEXT, data="What's in this image?"),
        Content(type=ContentType.IMAGE, data="base64_encoded_image_data"),
    ],
)
response = ai.invoke([message])
```
FlowEngine - Workflow Orchestration
FlowEngine is a state machine for building complex, multi-step LLM workflows with nodes, edges, and conditional routing.
Basic Workflow
```python
from llmfy import FlowEngine, START, END, WorkflowState

# Define workflow with initial state schema
workflow = FlowEngine({"messages": [], "result": ""})

# Define node functions
async def process_input(state: WorkflowState) -> dict:
    messages = state.get("messages", [])
    response = ai.invoke(messages)
    return {"result": response.result.content}

async def validate_output(state: WorkflowState) -> dict:
    result = state.get("result", "")
    # validation logic
    return {"result": result}

# Add nodes
workflow.add_node("process", process_input)
workflow.add_node("validate", validate_output)

# Define edges
workflow.add_edge(START, "process")
workflow.add_edge("process", "validate")
workflow.add_edge("validate", END)

# Execute
result = await workflow.invoke({"messages": [Message(role=Role.USER, content="Hello")]})
```
Conditional Routing
```python
def route_decision(state: WorkflowState) -> str:
    result = state.get("result", "")
    if "error" in result:
        return "retry"
    return END

workflow.add_conditional_edge("validate", ["retry", END], route_decision)
workflow.add_edge("retry", "process")
```
Workflow Visualization
Generate a visual diagram of your workflow graph.
```python
from IPython.display import Image, display

graph_url = workflow.get_diagram_url()
display(Image(url=graph_url))
```
Streaming in FlowEngine
Stream results from workflow nodes in real-time.
```python
from llmfy import FlowEngineStreamType

async for stream_response in workflow.invoke_stream(initial_state):
    if stream_response.type == FlowEngineStreamType.NODE:
        print(f"Node: {stream_response.node_name}")
    # Handle streaming chunks here
```
Checkpointing - State Persistence
FlowEngine supports checkpointing for persisting workflow state across sessions.
In-Memory Checkpointer
```python
from llmfy import InMemoryCheckpointer

checkpointer = InMemoryCheckpointer()
workflow = FlowEngine({"messages": []}, checkpointer=checkpointer)

# State is automatically saved after each node execution
result = await workflow.invoke(initial_state, session_id="user-123")
```
Redis Checkpointer
```python
from llmfy import RedisCheckpointer

checkpointer = RedisCheckpointer(host="localhost", port=6379)
workflow = FlowEngine({"messages": []}, checkpointer=checkpointer)
```
SQL Checkpointer
Supports PostgreSQL, MySQL, and SQLite with both async and sync drivers.
```python
from llmfy import SQLCheckpointer

# PostgreSQL (async)
checkpointer = SQLCheckpointer(connection_string="postgresql+asyncpg://user:pass@host/db")

# SQLite (sync)
checkpointer = SQLCheckpointer(connection_string="sqlite:///checkpoints.db")

workflow = FlowEngine({"messages": []}, checkpointer=checkpointer)
```
Vector Store (FAISS)
LLMfy includes a built-in FAISS-based vector store for semantic search and retrieval.
Setup
```python
from llmfy import FAISSVectorStore, OpenAIEmbedding, Document

# Initialize embedding model
embedding = OpenAIEmbedding(model="text-embedding-3-small")

# Create vector store
vector_store = FAISSVectorStore(embedding_client=embedding, dimension=1536)
```
Add Documents
```python
documents = [
    Document(content="Python is a programming language.", metadata={"source": "wiki"}),
    Document(content="Machine learning uses statistical models.", metadata={"source": "textbook"}),
]
vector_store.add_documents(documents)
```
Search
```python
results = vector_store.search(query="What is Python?", top_k=3)
for doc, score in results:
    print(f"Score: {score:.4f} | {doc.content}")
```
Persistence
```python
# Save index
vector_store.save_index("./my_index")

# Load index
vector_store.load_index("./my_index")
```
Index Types
FAISSVectorStore supports multiple index types that are automatically selected based on dataset size:
- Flat - Exact search, best for small datasets
- HNSW - Approximate search with high recall
- IVFFlat - Inverted file index for large datasets
- IVFPQ - Product quantization for very large datasets
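The exact size thresholds LLMfy uses for automatic selection are not documented here, but the decision logic can be sketched as a simple tiered heuristic. The cutoffs below are illustrative assumptions, not LLMfy's actual values:

```python
def select_index_type(num_vectors: int) -> str:
    """Pick a FAISS index type by dataset size.

    The cutoffs are illustrative assumptions, not LLMfy's actual values.
    """
    if num_vectors < 10_000:
        return "Flat"      # exact search is cheap at this scale
    if num_vectors < 100_000:
        return "HNSW"      # graph-based approximate search, high recall
    if num_vectors < 1_000_000:
        return "IVFFlat"   # inverted file index for large datasets
    return "IVFPQ"         # product quantization for very large datasets
```

The trade-off being automated is recall versus memory and latency: exact search is only affordable on small collections, while quantized indexes sacrifice some recall to keep very large collections searchable.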
Embeddings
LLMfy provides embedding model abstractions for both OpenAI and AWS Bedrock.
```python
from llmfy import OpenAIEmbedding, BedrockEmbedding

# OpenAI
embedding = OpenAIEmbedding(model="text-embedding-3-small")
vectors = embedding.embed(["Hello world", "Another text"])

# Bedrock
embedding = BedrockEmbedding(model="amazon.titan-embed-text-v2:0")
vectors = embedding.embed(["Hello world", "Another text"])
```
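Semantic search over these vectors ultimately reduces to comparing embeddings by similarity. A minimal cosine-similarity sketch in plain Python (independent of any provider or of FAISS's optimized implementations) shows the core computation:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Vectors pointing the same way score 1.0; orthogonal vectors score 0.0
print(cosine_similarity([1.0, 0.0], [2.0, 0.0]))  # 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0
```

A vector store indexes many such embeddings so the top-k most similar documents can be found without comparing the query against every vector one by one.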
Text Utilities
Text Chunking
Split large texts into overlapping chunks for embedding and retrieval.
```python
from llmfy import chunk_text

chunks = chunk_text(
    text="Your long document text here...",
    chunk_size=500,
    chunk_overlap=50,
)
# Returns a list of chunk strings
```
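To make the chunk_size/chunk_overlap mechanics concrete, here is a character-based sliding window in plain Python. LLMfy's actual splitter may be smarter (e.g. respecting word or sentence boundaries); this only illustrates how the overlap parameter keeps context shared between adjacent chunks:

```python
def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size chunks where consecutive chunks
    share chunk_overlap characters."""
    step = chunk_size - chunk_overlap  # advance by size minus overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = sliding_window_chunks("abcdefghij", chunk_size=4, chunk_overlap=2)
print(chunks)  # ['abcd', 'cdef', 'efgh', 'ghij', 'ij']
```

The overlap matters for retrieval: a sentence that straddles a chunk boundary still appears intact in at least one chunk.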
Markdown Chunking
Split markdown documents by header levels while preserving structure.
```python
from llmfy import chunk_markdown_by_header

chunks = chunk_markdown_by_header(
    markdown_text="# Title\n## Section 1\nContent...\n## Section 2\nMore content...",
    max_level=2,
)
# Returns a list of MdChunkResult with header hierarchy and content
```
Text Preprocessing
Clean text for optimal embedding quality.
```python
from llmfy import clean_text_for_embedding

cleaned = clean_text_for_embedding("  Some messy   text with extra   spaces  ")
```
Message Trimming
Manage conversation history to fit within token limits.
```python
from llmfy import trim_messages, safe_trim_messages, count_tokens_approximately

# Count approximate tokens
token_count = count_tokens_approximately(messages)

# Trim messages to fit a token limit
trimmed = trim_messages(messages, max_tokens=4000)

# Safe trim (preserves the system message and the latest user message)
trimmed = safe_trim_messages(messages, max_tokens=4000)
```
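The exact heuristic behind count_tokens_approximately is not specified here. A common rule of thumb for English text is roughly 4 characters per token, which can be sketched as follows (an assumption for illustration, not LLMfy's exact formula):

```python
def approx_token_count(text: str) -> int:
    """Rough token estimate using the common ~4 characters/token heuristic.
    This is an illustrative assumption, not LLMfy's exact formula."""
    return max(1, len(text) // 4)

print(approx_token_count("Hello world"))  # 11 chars -> 2
```

Approximate counts are usually sufficient for trimming decisions, since the goal is to stay safely under the model's context window rather than hit an exact number.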
Usage Tracking
Track token usage and estimate costs across requests.
```python
from llmfy import llmfy_usage_tracker

# After invocation, access usage data
response = ai.invoke(messages)
usage = llmfy_usage_tracker.get()

print(f"Input tokens: {usage.input_tokens}")
print(f"Output tokens: {usage.output_tokens}")
print(f"Estimated cost: ${usage.total_cost:.6f}")
```
The tracker supports provider-specific pricing for both OpenAI and Bedrock models.
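Cost estimation of this kind reduces to multiplying token counts by per-model, per-direction prices. A sketch of the arithmetic (the prices below are placeholders for illustration, not real rates for any model):

```python
# Placeholder prices in USD per million tokens -- NOT real rates.
PRICING = {
    "example-model": {"input": 0.15, "output": 0.60},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate request cost from token counts and a per-model price table."""
    price = PRICING[model]
    return (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000

cost = estimate_cost("example-model", input_tokens=1_000, output_tokens=500)
print(f"${cost:.6f}")  # $0.000450
```

Keeping the price table per-provider and per-model is what lets a tracker report comparable cost figures across OpenAI and Bedrock requests.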
Exception Handling
LLMfy provides a structured exception hierarchy for granular error handling.
```python
from llmfy import (
    LLMfyException,
    RateLimitException,
    AuthenticationException,
    TimeoutException,
    ContentFilterException,
)

try:
    response = ai.invoke(messages)
except RateLimitException:
    # Handle rate limiting (retry with backoff)
    pass
except AuthenticationException:
    # Handle auth errors
    pass
except ContentFilterException:
    # Handle safety filter triggers
    pass
except TimeoutException:
    # Handle request timeouts
    pass
except LLMfyException as e:
    # Catch-all for any LLMfy error
    print(f"Error: {e}")
```
Available exceptions:
| Exception | Description |
|---|---|
| LLMfyException | Base exception for all LLMfy errors |
| RateLimitException | API rate limit exceeded |
| QuotaExceededException | Usage quota exceeded |
| TimeoutException | Request timeout |
| InvalidRequestException | Invalid request parameters |
| AuthenticationException | Authentication failure |
| PermissionDeniedException | Insufficient permissions |
| ModelNotFoundException | Model not available |
| ServiceUnavailableException | Service temporarily down |
| ContentFilterException | Safety filter triggered |
| ModelErrorException | Model processing error |
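The rate-limit case above typically pairs with exponential backoff. A generic retry wrapper, independent of LLMfy (the retry counts and delays are illustrative choices, not library defaults):

```python
import time

def retry_with_backoff(func, retriable=(Exception,), max_retries=3, base_delay=1.0):
    """Call func, retrying on retriable exceptions with exponential backoff."""
    for attempt in range(max_retries + 1):
        try:
            return func()
        except retriable:
            if attempt == max_retries:
                raise  # out of retries: surface the original exception
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
```

With LLMfy this might be used as `retry_with_backoff(lambda: ai.invoke(messages), retriable=(RateLimitException,))`, so that transient rate-limit errors are retried while authentication or content-filter errors fail fast.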
Helper Nodes for FlowEngine
LLMfy provides pre-built helper nodes for common workflow patterns.
tools_node
Automatically executes tool calls from LLM responses within a workflow.
```python
from llmfy import tools_node, ToolRegistry

# Assumes a registry built earlier, e.g.:
# tool_registry = ToolRegistry([get_weather, search_database], llm)

async def handle_tools(state: WorkflowState) -> dict:
    messages = tools_node(
        messages=state.get("messages", []),
        registry=tool_registry,
    )
    return {"messages": messages}
```
tools_stream_node
Streaming variant of tools_node for real-time tool execution feedback.
```python
from llmfy import tools_stream_node

async def handle_tools_stream(state: WorkflowState):
    async for chunk in tools_stream_node(
        messages=state.get("messages", []),
        registry=tool_registry,
    ):
        yield chunk
```
Summary
LLMfy brings together the core building blocks for LLM-powered applications into a single, cohesive framework:
| Feature | Description |
|---|---|
| Multi-Provider | Unified API for OpenAI and AWS Bedrock |
| Tool Calling | @Tool() decorator with automatic schema extraction |
| FlowEngine | State machine workflow orchestration |
| Checkpointing | In-memory, Redis, and SQL state persistence |
| Vector Store | FAISS-based semantic search |
| Embeddings | OpenAI and Bedrock embedding models |
| Streaming | Real-time token delivery across all methods |
| Multi-Modal | Text, image, document, and video inputs |
| Text Utilities | Chunking, preprocessing, and message trimming |
| Usage Tracking | Token counting and cost estimation |
| Exception Handling | Structured error hierarchy |
For more details, visit the documentation or the GitHub repository.