LLMfy - Technical Features Overview

Generative AI

2026-03-18

9 min read

irufano


Introduction

LLMfy is a Python framework designed to streamline the development of applications powered by large language models. It provides unified abstractions across multiple LLM providers, workflow orchestration, vector storage, tool calling, and utility functions — all in a single modular package.

Installation:

bash
pip install llmfy

Install with optional dependencies:

bash
pip install llmfy[openai]       # OpenAI support
pip install llmfy[bedrock]      # AWS Bedrock support
pip install llmfy[faiss]        # Vector store support
pip install llmfy[redis]        # Redis checkpointer
pip install llmfy[sql]          # SQL checkpointer

Requirements: Python >= 3.11

Multi-Provider LLM Support

LLMfy provides a unified interface for working with different LLM providers. Currently supported providers are OpenAI and AWS Bedrock.

OpenAI

python
from llmfy import OpenAIModel, OpenAIConfig, LLMfy, Message, Role

config = OpenAIConfig(temperature=0.7)
llm = OpenAIModel(model="gpt-4o-mini", config=config)

ai = LLMfy(llm, system_message="You are a helpful assistant.")

messages = [Message(role=Role.USER, content="What is Python?")]
response = ai.invoke(messages)
print(response.result.content)

AWS Bedrock

python
from llmfy import BedrockModel, BedrockConfig, LLMfy, Message, Role

config = BedrockConfig(temperature=0.7)
llm = BedrockModel(model="amazon.nova-pro-v1:0", config=config)

ai = LLMfy(llm, system_message="You are a helpful assistant.")

messages = [Message(role=Role.USER, content="What is Python?")]
response = ai.invoke(messages)
print(response.result.content)

Switching providers requires only changing the model and config — the rest of the code stays the same.

System Message Templating

LLMfy supports dynamic system prompts using {{variable}} template placeholders. Variable names are declared via input_variables, and their values are supplied as keyword arguments at invocation time.

python
ai = LLMfy(
    llm,
    system_message="You are a {{role}} expert. Answer about {{topic}}.",
    input_variables=["role", "topic"],
)

messages = [Message(role=Role.USER, content="Explain the basics.")]
response = ai.invoke(messages, role="Python", topic="decorators")
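
Under the hood, this kind of templating reduces to simple placeholder substitution. A minimal sketch of the idea (illustrative only, not llmfy's actual implementation):

```python
import re

def render_template(template: str, **variables) -> str:
    """Replace {{name}} placeholders with the given variables.

    Raises KeyError if a placeholder has no matching variable.
    """
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"Missing input variable: {name}")
        return str(variables[name])

    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)

system = render_template(
    "You are a {{role}} expert. Answer about {{topic}}.",
    role="Python",
    topic="decorators",
)
print(system)  # You are a Python expert. Answer about decorators.
```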

Multi-Turn Conversation

Use chat() for multi-turn conversations with automatic message history management.

python
ai = LLMfy(llm, system_message="You are a helpful assistant.")

# First turn
messages = [Message(role=Role.USER, content="What is machine learning?")]
response = ai.chat(messages)
print(response.result.content)

# Second turn (history is preserved)
follow_up = [Message(role=Role.USER, content="Give me an example.")]
response = ai.chat(follow_up)
print(response.result.content)
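
The history management behind chat() can be pictured as an append-only message list that is replayed on every call. A toy sketch of that pattern (assumed behavior, not llmfy internals):

```python
from dataclasses import dataclass, field

@dataclass
class SimpleMessage:
    role: str
    content: str

@dataclass
class ChatSession:
    """Accumulates history so every turn sees all previous turns."""
    system_message: str
    history: list = field(default_factory=list)

    def chat(self, user_content: str, generate) -> str:
        self.history.append(SimpleMessage("user", user_content))
        # The system prompt plus the full history is sent to the model.
        reply = generate(self.system_message, self.history)
        self.history.append(SimpleMessage("assistant", reply))
        return reply

# A stub "model" that reports how many user turns it has seen.
def echo_model(system, history):
    return f"reply #{sum(1 for m in history if m.role == 'user')}"

session = ChatSession("You are a helpful assistant.")
print(session.chat("What is machine learning?", echo_model))  # reply #1
print(session.chat("Give me an example.", echo_model))        # reply #2
```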

Streaming

All invocation methods have streaming variants for real-time token delivery.

python
# Single turn streaming
for chunk in ai.invoke_stream(messages):
    print(chunk, end="", flush=True)

# Multi-turn streaming
for chunk in ai.chat_stream(messages):
    print(chunk, end="", flush=True)

Tool Calling (Function Calling)

LLMfy provides a @Tool() decorator that automatically registers Python functions as callable tools for the LLM.

Define Tools

python
from llmfy import Tool

@Tool()
def get_weather(city: str):
    """Get the current weather for a given city."""
    # Your logic here
    return f"The weather in {city} is sunny, 25°C."

@Tool()
def search_database(query: str, limit: int = 5):
    """Search the database for relevant records."""
    # Your logic here
    return [{"id": 1, "name": "Result"}]

Register and Use Tools

python
ai = LLMfy(llm, system_message="You are a helpful assistant with tool access.")

# Register tools
ai.register_tool([get_weather, search_database])

messages = [Message(role=Role.USER, content="What's the weather in Tokyo?")]
response = ai.invoke_with_tools(messages)
print(response.result.content)

The @Tool() decorator automatically extracts the function schema (name, description, parameters, types) and formats it for the selected provider (OpenAI or Bedrock).
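
Schema extraction of this kind can be built with the standard inspect module. A simplified sketch of the idea (not llmfy's actual code):

```python
import inspect

TYPE_MAP = {str: "string", int: "integer", float: "number", bool: "boolean"}

def extract_schema(func) -> dict:
    """Build a JSON-schema-like tool definition from a function signature."""
    signature = inspect.signature(func)
    properties, required = {}, []
    for name, param in signature.parameters.items():
        properties[name] = {"type": TYPE_MAP.get(param.annotation, "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)  # no default value -> required parameter
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "parameters": {"type": "object", "properties": properties, "required": required},
    }

def search_database(query: str, limit: int = 5):
    """Search the database for relevant records."""
    ...

schema = extract_schema(search_database)
print(schema["name"])                    # search_database
print(schema["parameters"]["required"])  # ['query']
```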

ToolRegistry

For advanced tool management, use ToolRegistry to register tools separately and access them in workflows.

python
from llmfy import ToolRegistry

registry = ToolRegistry([get_weather, search_database], llm)

# Get tool definitions for API calls
tool_definitions = registry.get_tools()

# Execute a tool by name
result = registry.execute("get_weather", {"city": "Tokyo"})

Multi-Modal Content

LLMfy supports multi-modal inputs including text, images, documents, and videos.

python
from llmfy import Message, Role, Content, ContentType

# Text + Image message
message = Message(
    role=Role.USER,
    content=[
        Content(type=ContentType.TEXT, data="What's in this image?"),
        Content(type=ContentType.IMAGE, data="base64_encoded_image_data"),
    ],
)

response = ai.invoke([message])
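
The base64_encoded_image_data placeholder above can be produced with the standard base64 module:

```python
import base64

def encode_image(image_bytes: bytes) -> str:
    """Encode raw image bytes as a base64 string for the API payload."""
    return base64.b64encode(image_bytes).decode("ascii")

# In practice: encode_image(open("photo.png", "rb").read())
encoded = encode_image(b"\x89PNG\r\n")
```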

FlowEngine - Workflow Orchestration

FlowEngine is a state machine for building complex, multi-step LLM workflows with nodes, edges, and conditional routing.

Basic Workflow

python
from llmfy import FlowEngine, START, END, WorkflowState

# Define workflow with initial state schema
workflow = FlowEngine({"messages": [], "result": ""})

# Define node functions
async def process_input(state: WorkflowState) -> dict:
    messages = state.get("messages", [])
    response = ai.invoke(messages)
    return {"result": response.result.content}

async def validate_output(state: WorkflowState) -> dict:
    result = state.get("result", "")
    # validation logic
    return {"result": result}

# Add nodes
workflow.add_node("process", process_input)
workflow.add_node("validate", validate_output)

# Define edges
workflow.add_edge(START, "process")
workflow.add_edge("process", "validate")
workflow.add_edge("validate", END)

# Execute
result = await workflow.invoke({"messages": [Message(role=Role.USER, content="Hello")]})
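
The node/edge model above boils down to a small async state machine: follow edges from START, call each node with the current state, and merge the returned dict back in. A stripped-down sketch of that loop (illustrative, not FlowEngine itself):

```python
import asyncio

START, END = "__start__", "__end__"

class MiniFlow:
    """Tiny state machine: nodes update state, edges decide what runs next."""
    def __init__(self, initial_state: dict):
        self.state = dict(initial_state)
        self.nodes, self.edges = {}, {}

    def add_node(self, name, func):
        self.nodes[name] = func

    def add_edge(self, source, target):
        self.edges[source] = target

    async def invoke(self, inputs: dict) -> dict:
        self.state.update(inputs)
        current = self.edges[START]
        while current != END:
            update = await self.nodes[current](self.state)
            self.state.update(update)  # merge node output into state
            current = self.edges[current]
        return self.state

async def process(state):  return {"result": state["text"].upper()}
async def validate(state): return {"result": state["result"] + "!"}

flow = MiniFlow({"text": "", "result": ""})
flow.add_node("process", process)
flow.add_node("validate", validate)
flow.add_edge(START, "process")
flow.add_edge("process", "validate")
flow.add_edge("validate", END)

final = asyncio.run(flow.invoke({"text": "hello"}))
print(final["result"])  # HELLO!
```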

Conditional Routing

python
def route_decision(state: WorkflowState) -> str:
    result = state.get("result", "")
    if "error" in result:
        return "retry"
    return END

workflow.add_conditional_edge("validate", ["retry", END], route_decision)
workflow.add_edge("retry", "process")
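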

Workflow Visualization

Generate a visual diagram of your workflow graph.

python
from IPython.display import Image, display

graph_url = workflow.get_diagram_url()
display(Image(url=graph_url))

Streaming in FlowEngine

Stream results from workflow nodes in real-time.

python
from llmfy import FlowEngineStreamType

async for stream_response in workflow.invoke_stream(initial_state):
    if stream_response.type == FlowEngineStreamType.NODE:
        print(f"Node: {stream_response.node_name}")
    # Handle streaming chunks

Checkpointing - State Persistence

FlowEngine supports checkpointing for persisting workflow state across sessions.

In-Memory Checkpointer

python
from llmfy import InMemoryCheckpointer

checkpointer = InMemoryCheckpointer()
workflow = FlowEngine({"messages": []}, checkpointer=checkpointer)

# State is automatically saved after each node execution
result = await workflow.invoke(initial_state, session_id="user-123")

Redis Checkpointer

python
from llmfy import RedisCheckpointer

checkpointer = RedisCheckpointer(host="localhost", port=6379)
workflow = FlowEngine({"messages": []}, checkpointer=checkpointer)

SQL Checkpointer

Supports PostgreSQL, MySQL, and SQLite with both async and sync drivers.

python
from llmfy import SQLCheckpointer

# PostgreSQL (async)
checkpointer = SQLCheckpointer(connection_string="postgresql+asyncpg://user:pass@host/db")

# SQLite (sync)
checkpointer = SQLCheckpointer(connection_string="sqlite:///checkpoints.db")

workflow = FlowEngine({"messages": []}, checkpointer=checkpointer)

Vector Store (FAISS)

LLMfy includes a built-in FAISS-based vector store for semantic search and retrieval.

Setup

python
from llmfy import FAISSVectorStore, OpenAIEmbedding, Document

# Initialize embedding model
embedding = OpenAIEmbedding(model="text-embedding-3-small")

# Create vector store
vector_store = FAISSVectorStore(embedding_client=embedding, dimension=1536)

Add Documents

python
documents = [
    Document(content="Python is a programming language.", metadata={"source": "wiki"}),
    Document(content="Machine learning uses statistical models.", metadata={"source": "textbook"}),
]

vector_store.add_documents(documents)

Search

python
results = vector_store.search(query="What is Python?", top_k=3)
for doc, score in results:
    print(f"Score: {score:.4f} | {doc.content}")
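
At its core, semantic search is nearest-neighbor lookup over embedding vectors. A brute-force cosine-similarity sketch in plain Python (FAISS does the same job far more efficiently at scale):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def search(query_vec, doc_vecs, top_k=3):
    """Return (index, score) pairs of the top_k most similar documents."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]

docs = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
results = search([1.0, 0.1], docs, top_k=2)
```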

Persistence

python
# Save index
vector_store.save_index("./my_index")

# Load index
vector_store.load_index("./my_index")

Index Types

FAISSVectorStore supports multiple index types that are automatically selected based on dataset size:

  • Flat - Exact search, best for small datasets
  • HNSW - Approximate search with high recall
  • IVFFlat - Inverted file index for large datasets
  • IVFPQ - Product quantization for very large datasets
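
A size-based selection rule like the one described can be sketched as a simple threshold function; the thresholds below are illustrative, not llmfy's actual cutoffs:

```python
def choose_index_type(num_vectors: int) -> str:
    """Pick a FAISS index type by dataset size (illustrative thresholds)."""
    if num_vectors < 10_000:
        return "Flat"     # exact search is cheap enough
    if num_vectors < 100_000:
        return "HNSW"     # graph-based approximate search, high recall
    if num_vectors < 1_000_000:
        return "IVFFlat"  # cluster first, then search a few cells
    return "IVFPQ"        # compress vectors with product quantization

print(choose_index_type(500))        # Flat
print(choose_index_type(2_000_000))  # IVFPQ
```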

Embeddings

LLMfy provides embedding model abstractions for both OpenAI and AWS Bedrock.

python
from llmfy import OpenAIEmbedding, BedrockEmbedding

# OpenAI
embedding = OpenAIEmbedding(model="text-embedding-3-small")
vectors = embedding.embed(["Hello world", "Another text"])

# Bedrock
embedding = BedrockEmbedding(model="amazon.titan-embed-text-v2:0")
vectors = embedding.embed(["Hello world", "Another text"])

Text Utilities

Text Chunking

Split large texts into overlapping chunks for embedding and retrieval.

python
from llmfy import chunk_text

chunks = chunk_text(
    text="Your long document text here...",
    chunk_size=500,
    chunk_overlap=50,
)
# Returns list of chunk strings
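
Fixed-size overlapping chunking is a short loop: advance by chunk_size - chunk_overlap each step so consecutive chunks share an overlap. A character-level sketch of the idea (llmfy's chunk_text may split on token or sentence boundaries instead):

```python
def simple_chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into chunk_size-character chunks overlapping by chunk_overlap."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    return [
        text[i:i + chunk_size]
        for i in range(0, max(len(text) - chunk_overlap, 1), step)
    ]

chunks = simple_chunk_text("abcdefghij", chunk_size=4, chunk_overlap=2)
print(chunks)  # ['abcd', 'cdef', 'efgh', 'ghij']
```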

Markdown Chunking

Split markdown documents by header levels while preserving structure.

python
from llmfy import chunk_markdown_by_header

chunks = chunk_markdown_by_header(
    markdown_text="# Title\n## Section 1\nContent...\n## Section 2\nMore content...",
    max_level=2,
)
# Returns list of MdChunkResult with header hierarchy and content
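
Header-based splitting walks the document line by line and starts a new chunk at each header up to max_level. A simplified sketch (llmfy's MdChunkResult carries a richer header hierarchy than this flat version):

```python
def split_markdown_by_header(markdown_text: str, max_level: int = 2) -> list[dict]:
    """Split markdown into {'header', 'content'} chunks at headers of level <= max_level."""
    chunks, current = [], {"header": "", "content": ""}
    for line in markdown_text.splitlines():
        stripped = line.lstrip("#")
        level = len(line) - len(stripped)  # number of leading '#'
        if 0 < level <= max_level:
            if current["header"] or current["content"]:
                chunks.append(current)     # close the previous chunk
            current = {"header": stripped.strip(), "content": ""}
        else:
            current["content"] += line + "\n"
    chunks.append(current)
    return chunks

chunks = split_markdown_by_header("# Title\n## Section 1\nContent...\n## Section 2\nMore...")
print([c["header"] for c in chunks])  # ['Title', 'Section 1', 'Section 2']
```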

Text Preprocessing

Clean text for optimal embedding quality.

python
from llmfy import clean_text_for_embedding

cleaned = clean_text_for_embedding("  Some   messy    text  with extra   spaces  ")

Message Trimming

Manage conversation history to fit within token limits.

python
from llmfy import trim_messages, safe_trim_messages, count_tokens_approximately

# Count approximate tokens
token_count = count_tokens_approximately(messages)

# Trim messages to fit token limit
trimmed = trim_messages(messages, max_tokens=4000)

# Safe trim (preserves system message and latest user message)
trimmed = safe_trim_messages(messages, max_tokens=4000)
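
A common approximation is roughly 4 characters per token for English text; trimming then drops the oldest non-system messages until the budget fits. An illustrative sketch of that strategy (llmfy's heuristics may differ):

```python
def approx_tokens(text: str) -> int:
    """Rough heuristic: ~4 characters per token for English text."""
    return max(1, len(text) // 4)

def simple_trim(messages: list[dict], max_tokens: int) -> list[dict]:
    """Drop oldest non-system messages until the total fits max_tokens."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    while rest and sum(approx_tokens(m["content"]) for m in system + rest) > max_tokens:
        rest.pop(0)  # discard the oldest turn first
    return system + rest

history = [
    {"role": "system", "content": "You are helpful."},
    {"role": "user", "content": "x" * 400},  # ~100 tokens
    {"role": "user", "content": "y" * 40},   # ~10 tokens
]
trimmed = simple_trim(history, max_tokens=20)
print([m["role"] for m in trimmed])  # ['system', 'user']
```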

Usage Tracking

Track token usage and estimate costs across requests.

python
from llmfy import llmfy_usage_tracker

# After invocation, access usage data
response = ai.invoke(messages)

usage = llmfy_usage_tracker.get()
print(f"Input tokens: {usage.input_tokens}")
print(f"Output tokens: {usage.output_tokens}")
print(f"Estimated cost: ${usage.total_cost:.6f}")

The tracker supports provider-specific pricing for both OpenAI and Bedrock models.
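
Cost estimation of this kind is just token counts multiplied by per-model prices. A sketch with placeholder prices (check your provider's current pricing; these numbers are for illustration only):

```python
# Illustrative per-1M-token prices in USD (NOT guaranteed current pricing).
PRICING = {
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for one request."""
    price = PRICING[model]
    return (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000

cost = estimate_cost("gpt-4o-mini", input_tokens=1_000, output_tokens=500)
print(f"${cost:.6f}")  # $0.000450
```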

Exception Handling

LLMfy provides a structured exception hierarchy for granular error handling.

python
from llmfy import (
    LLMfyException,
    RateLimitException,
    AuthenticationException,
    TimeoutException,
    ContentFilterException,
)

try:
    response = ai.invoke(messages)
except RateLimitException:
    # Handle rate limiting (retry with backoff)
    pass
except AuthenticationException:
    # Handle auth errors
    pass
except ContentFilterException:
    # Handle safety filter triggers
    pass
except TimeoutException:
    # Handle request timeouts
    pass
except LLMfyException as e:
    # Catch-all for any LLMfy error
    print(f"Error: {e}")

Available exceptions:

| Exception | Description |
| --- | --- |
| LLMfyException | Base exception for all LLMfy errors |
| RateLimitException | API rate limit exceeded |
| QuotaExceededException | Usage quota exceeded |
| TimeoutException | Request timeout |
| InvalidRequestException | Invalid request parameters |
| AuthenticationException | Authentication failure |
| PermissionDeniedException | Insufficient permissions |
| ModelNotFoundException | Model not available |
| ServiceUnavailableException | Service temporarily down |
| ContentFilterException | Safety filter triggered |
| ModelErrorException | Model processing error |
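
A typical response to a rate-limit error is retry with exponential backoff. A generic sketch using a stand-in exception class (substitute llmfy's RateLimitException in real code):

```python
import time

class RateLimitError(Exception):
    """Stand-in for a provider rate-limit error (e.g. HTTP 429)."""

def invoke_with_backoff(call, max_retries: int = 3, base_delay: float = 1.0):
    """Retry `call` on rate-limit errors, doubling the delay each attempt."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except RateLimitError:
            if attempt == max_retries:
                raise  # out of retries, surface the error
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...

# Demo: a call that fails twice, then succeeds.
attempts = {"count": 0}
def flaky_call():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RateLimitError("429")
    return "ok"

print(invoke_with_backoff(flaky_call, base_delay=0.001))  # ok
```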

Helper Nodes for FlowEngine

LLMfy provides pre-built helper nodes for common workflow patterns.

tools_node

Automatically executes tool calls from LLM responses within a workflow.

python
from llmfy import tools_node, ToolRegistry

async def handle_tools(state: WorkflowState) -> dict:
    messages = tools_node(
        messages=state.get("messages", []),
        registry=tool_registry,
    )
    return {"messages": messages}

tools_stream_node

Streaming variant of tools_node for real-time tool execution feedback.

python
from llmfy import tools_stream_node

async def handle_tools_stream(state: WorkflowState):
    async for chunk in tools_stream_node(
        messages=state.get("messages", []),
        registry=tool_registry,
    ):
        yield chunk

Summary

LLMfy brings together the core building blocks for LLM-powered applications into a single, cohesive framework:

| Feature | Description |
| --- | --- |
| Multi-Provider | Unified API for OpenAI and AWS Bedrock |
| Tool Calling | @Tool() decorator with automatic schema extraction |
| FlowEngine | State machine workflow orchestration |
| Checkpointing | In-memory, Redis, and SQL state persistence |
| Vector Store | FAISS-based semantic search |
| Embeddings | OpenAI and Bedrock embedding models |
| Streaming | Real-time token delivery across all methods |
| Multi-Modal | Text, image, document, and video inputs |
| Text Utilities | Chunking, preprocessing, and message trimming |
| Usage Tracking | Token counting and cost estimation |
| Exception Handling | Structured error hierarchy |

For more details, visit the documentation or the GitHub repository.
