LLMfy - Technical Features Overview

Introduction
LLMfy is a Python framework designed to streamline the development of applications powered by large language models. It provides unified abstractions across multiple LLM providers, workflow orchestration, vector storage, tool calling, and utility functions — all in a single modular package.
Installation:
```shell
pip install llmfy
```
Install with optional dependencies:
```shell
pip install "llmfy[openai]"   # OpenAI support
pip install "llmfy[bedrock]"  # AWS Bedrock support
pip install "llmfy[faiss]"    # Vector store support
pip install "llmfy[redis]"    # Redis checkpointer
pip install "llmfy[sql]"      # SQL checkpointer
```
Requirements: Python >= 3.11
Multi-Provider LLM Support
LLMfy provides a unified interface for working with different LLM providers. Currently supported providers are OpenAI and AWS Bedrock.
OpenAI
```python
from llmfy import OpenAIModel, OpenAIConfig, LLMfy, Message, Role

config = OpenAIConfig(temperature=0.7)
llm = OpenAIModel(model="gpt-4o-mini", config=config)
ai = LLMfy(llm, system_message="You are a helpful assistant.")

messages = [Message(role=Role.USER, content="What is Python?")]
response = ai.invoke(messages)
print(response.result.content)
```
AWS Bedrock
```python
from llmfy import BedrockModel, BedrockConfig, LLMfy, Message, Role

config = BedrockConfig(temperature=0.7)
llm = BedrockModel(model="amazon.nova-pro-v1:0", config=config)
ai = LLMfy(llm, system_message="You are a helpful assistant.")

messages = [Message(role=Role.USER, content="What is Python?")]
response = ai.invoke(messages)
print(response.result.content)
```
Switching providers requires only changing the model and config — the rest of the code stays the same.
System Message Templating
LLMfy supports dynamic system prompts using {{variable}} template placeholders. Variables are injected at invocation time via input_variables.
```python
ai = LLMfy(
    llm,
    system_message="You are a {{role}} expert. Answer about {{topic}}.",
    input_variables=["role", "topic"],
)

messages = [Message(role=Role.USER, content="Explain the basics.")]
response = ai.invoke(messages, role="Python", topic="decorators")
```
Multi-Turn Conversation
Use chat() for multi-turn conversations with automatic message history management.
```python
ai = LLMfy(llm, system_message="You are a helpful assistant.")

# First turn
messages = [Message(role=Role.USER, content="What is machine learning?")]
response = ai.chat(messages)
print(response.result.content)

# Second turn (history is preserved)
follow_up = [Message(role=Role.USER, content="Give me an example.")]
response = ai.chat(follow_up)
print(response.result.content)
```
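Conceptually, "automatic message history management" means each turn is appended to a running list that is replayed on the next call. The sketch below is a plain-Python approximation of that idea, using dicts instead of LLMfy's Message/Role types; it is illustrative, not LLMfy's actual implementation.

```python
# Simplified sketch of what chat()-style history management does.
class ChatHistory:
    def __init__(self, system_message: str):
        self.messages = [{"role": "system", "content": system_message}]

    def add_user(self, content: str):
        self.messages.append({"role": "user", "content": content})

    def add_assistant(self, content: str):
        self.messages.append({"role": "assistant", "content": content})

history = ChatHistory("You are a helpful assistant.")
history.add_user("What is machine learning?")
history.add_assistant("Machine learning is ...")
history.add_user("Give me an example.")  # the second turn sees the full history
print(len(history.messages))  # 4 messages accumulated
```

Because the full list is sent each turn, the model can resolve follow-ups like "Give me an example." against the earlier exchange.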
Streaming
All invocation methods have streaming variants for real-time token delivery.
```python
# Single-turn streaming
for chunk in ai.invoke_stream(messages):
    print(chunk, end="", flush=True)

# Multi-turn streaming
for chunk in ai.chat_stream(messages):
    print(chunk, end="", flush=True)
```
Tool Calling (Function Calling)
LLMfy provides a @Tool() decorator that automatically registers Python functions as callable tools for the LLM.
Define Tools
```python
from llmfy import Tool

@Tool()
def get_weather(city: str):
    """Get the current weather for a given city."""
    # Your logic here
    return f"The weather in {city} is sunny, 25°C."

@Tool()
def search_database(query: str, limit: int = 5):
    """Search the database for relevant records."""
    # Your logic here
    return [{"id": 1, "name": "Result"}]
```
Register and Use Tools
```python
ai = LLMfy(llm, system_message="You are a helpful assistant with tool access.")

# Register tools
ai.register_tool([get_weather, search_database])

messages = [Message(role=Role.USER, content="What's the weather in Tokyo?")]
response = ai.invoke_with_tools(messages)
print(response.result.content)
```
The @Tool() decorator automatically extracts the function schema (name, description, parameters, types) and formats it for the selected provider (OpenAI or Bedrock).
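The exact schema format LLMfy emits is not shown here, but the general idea of signature-based schema extraction can be sketched with the standard-library inspect module. The following is a simplified approximation of such a decorator's internals (not LLMfy's actual code), producing an OpenAI-style JSON schema:

```python
import inspect

# Map Python annotations to JSON Schema type names
TYPE_MAP = {str: "string", int: "integer", float: "number", bool: "boolean"}

def extract_schema(func) -> dict:
    """Build an OpenAI-style tool schema from a function's signature and docstring."""
    sig = inspect.signature(func)
    properties, required = {}, []
    for name, param in sig.parameters.items():
        properties[name] = {"type": TYPE_MAP.get(param.annotation, "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)  # no default value -> required parameter
    return {
        "name": func.__name__,
        "description": (func.__doc__ or "").strip(),
        "parameters": {"type": "object", "properties": properties, "required": required},
    }

def get_weather(city: str):
    """Get the current weather for a given city."""
    return f"The weather in {city} is sunny, 25°C."

schema = extract_schema(get_weather)
# schema["name"] == "get_weather"; "city" is a required string parameter
```

This is why docstrings and type hints matter when defining tools: they become the description and parameter types the model sees.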
ToolRegistry
For advanced tool management, use ToolRegistry to register tools separately and access them in workflows.
```python
from llmfy import ToolRegistry

registry = ToolRegistry([get_weather, search_database], llm)

# Get tool definitions for API calls
tool_definitions = registry.get_tools()

# Execute a tool by name
result = registry.execute("get_weather", {"city": "Tokyo"})
```
Multi-Modal Content
LLMfy supports multi-modal inputs including text, images, documents, and videos.
```python
from llmfy import Message, Role, Content, ContentType

# Text + image message
message = Message(
    role=Role.USER,
    content=[
        Content(type=ContentType.TEXT, data="What's in this image?"),
        Content(type=ContentType.IMAGE, data="base64_encoded_image_data"),
    ],
)
response = ai.invoke([message])
```
FlowEngine - Workflow Orchestration
FlowEngine is a state machine for building complex, multi-step LLM workflows with nodes, edges, and conditional routing.
Basic Workflow
```python
from llmfy import FlowEngine, START, END, WorkflowState

# Define workflow with initial state schema
workflow = FlowEngine({"messages": [], "result": ""})

# Define node functions
async def process_input(state: WorkflowState) -> dict:
    messages = state.get("messages", [])
    response = ai.invoke(messages)
    return {"result": response.result.content}

async def validate_output(state: WorkflowState) -> dict:
    result = state.get("result", "")
    # validation logic
    return {"result": result}

# Add nodes
workflow.add_node("process", process_input)
workflow.add_node("validate", validate_output)

# Define edges
workflow.add_edge(START, "process")
workflow.add_edge("process", "validate")
workflow.add_edge("validate", END)

# Execute
result = await workflow.invoke({"messages": [Message(role=Role.USER, content="Hello")]})
```
Conditional Routing
```python
def route_decision(state: WorkflowState) -> str:
    result = state.get("result", "")
    if "error" in result:
        return "retry"
    return END

workflow.add_conditional_edge("validate", ["retry", END], route_decision)
workflow.add_edge("retry", "process")
```
Workflow Visualization
Generate a visual diagram of your workflow graph.
```python
from IPython.display import Image, display

graph_url = workflow.get_diagram_url()
display(Image(url=graph_url))
```
Streaming in FlowEngine
Stream results from workflow nodes in real-time.
```python
from llmfy import FlowEngineStreamType

async for stream_response in workflow.invoke_stream(initial_state):
    if stream_response.type == FlowEngineStreamType.NODE:
        print(f"Node: {stream_response.node_name}")
    # Handle streaming chunks here
```
Checkpointing - State Persistence
FlowEngine supports checkpointing for persisting workflow state across sessions.
In-Memory Checkpointer
```python
from llmfy import InMemoryCheckpointer

checkpointer = InMemoryCheckpointer()
workflow = FlowEngine({"messages": []}, checkpointer=checkpointer)

# State is automatically saved after each node execution
result = await workflow.invoke(initial_state, session_id="user-123")
```
Redis Checkpointer
```python
from llmfy import RedisCheckpointer

checkpointer = RedisCheckpointer(host="localhost", port=6379)
workflow = FlowEngine({"messages": []}, checkpointer=checkpointer)
```
SQL Checkpointer
Supports PostgreSQL, MySQL, and SQLite with both async and sync drivers.
```python
from llmfy import SQLCheckpointer

# PostgreSQL (async)
checkpointer = SQLCheckpointer(connection_string="postgresql+asyncpg://user:pass@host/db")

# SQLite (sync)
checkpointer = SQLCheckpointer(connection_string="sqlite:///checkpoints.db")

workflow = FlowEngine({"messages": []}, checkpointer=checkpointer)
```
Vector Store (FAISS)
LLMfy includes a built-in FAISS-based vector store for semantic search and retrieval.
Setup
```python
from llmfy import FAISSVectorStore, OpenAIEmbedding, Document

# Initialize embedding model
embedding = OpenAIEmbedding(model="text-embedding-3-small")

# Create vector store
vector_store = FAISSVectorStore(embedding_client=embedding, dimension=1536)
```
Add Documents
```python
documents = [
    Document(content="Python is a programming language.", metadata={"source": "wiki"}),
    Document(content="Machine learning uses statistical models.", metadata={"source": "textbook"}),
]
vector_store.add_documents(documents)
```
Search
```python
results = vector_store.search(query="What is Python?", top_k=3)
for doc, score in results:
    print(f"Score: {score:.4f} | {doc.content}")
```
Persistence
```python
# Save index
vector_store.save_index("./my_index")

# Load index
vector_store.load_index("./my_index")
```
Index Types
FAISSVectorStore supports multiple index types that are automatically selected based on dataset size:
- Flat - Exact search, best for small datasets
- HNSW - Approximate search with high recall
- IVFFlat - Inverted file index for large datasets
- IVFPQ - Product quantization for very large datasets
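The exact size thresholds LLMfy uses for automatic selection are not documented here, but the decision logic can be sketched as a simple tiered heuristic. The cutoffs below are illustrative assumptions, not LLMfy's actual values:

```python
def select_index_type(num_vectors: int) -> str:
    """Pick a FAISS index type by dataset size.

    The cutoffs are illustrative assumptions, not LLMfy's actual values.
    """
    if num_vectors < 10_000:
        return "Flat"      # exact search is cheap at this scale
    if num_vectors < 100_000:
        return "HNSW"      # graph-based approximate search, high recall
    if num_vectors < 1_000_000:
        return "IVFFlat"   # inverted file index for large datasets
    return "IVFPQ"         # product quantization for very large datasets
```

The trade-off being automated is recall versus memory and latency: exact search is only affordable on small collections, while quantized indexes sacrifice some recall to keep very large collections searchable.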
Embeddings
LLMfy provides embedding model abstractions for both OpenAI and AWS Bedrock.
```python
from llmfy import OpenAIEmbedding, BedrockEmbedding

# OpenAI
embedding = OpenAIEmbedding(model="text-embedding-3-small")
vectors = embedding.embed(["Hello world", "Another text"])

# Bedrock
embedding = BedrockEmbedding(model="amazon.titan-embed-text-v2:0")
vectors = embedding.embed(["Hello world", "Another text"])
```
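Semantic search over these vectors ultimately reduces to comparing embeddings by similarity. A minimal cosine-similarity sketch in plain Python (independent of any provider or of FAISS's optimized implementations) shows the core computation:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Vectors pointing the same way score 1.0; orthogonal vectors score 0.0
print(cosine_similarity([1.0, 0.0], [2.0, 0.0]))  # 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0
```

A vector store indexes many such embeddings so the top-k most similar documents can be found without comparing the query against every vector one by one.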
Text Utilities
Text Chunking
Split large texts into overlapping chunks for embedding and retrieval.
```python
from llmfy import chunk_text

chunks = chunk_text(
    text="Your long document text here...",
    chunk_size=500,
    chunk_overlap=50,
)
# Returns a list of chunk strings
```
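To make the chunk_size/chunk_overlap mechanics concrete, here is a character-based sliding window in plain Python. LLMfy's actual splitter may be smarter (e.g. respecting word or sentence boundaries); this only illustrates how the overlap parameter keeps context shared between adjacent chunks:

```python
def sliding_window_chunks(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into fixed-size chunks where consecutive chunks
    share chunk_overlap characters."""
    step = chunk_size - chunk_overlap  # advance by size minus overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = sliding_window_chunks("abcdefghij", chunk_size=4, chunk_overlap=2)
print(chunks)  # ['abcd', 'cdef', 'efgh', 'ghij', 'ij']
```

The overlap matters for retrieval: a sentence that straddles a chunk boundary still appears intact in at least one chunk.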
Markdown Chunking
Split markdown documents by header levels while preserving structure.
```python
from llmfy import chunk_markdown_by_header

chunks = chunk_markdown_by_header(
    markdown_text="# Title\n## Section 1\nContent...\n## Section 2\nMore content...",
    max_level=2,
)
# Returns a list of MdChunkResult with header hierarchy and content
```
Text Preprocessing
Clean text for optimal embedding quality.
```python
from llmfy import clean_text_for_embedding

cleaned = clean_text_for_embedding("  Some messy   text with extra   spaces  ")
```
Message Trimming
Manage conversation history to fit within token limits.
```python
from llmfy import trim_messages, safe_trim_messages, count_tokens_approximately

# Count approximate tokens
token_count = count_tokens_approximately(messages)

# Trim messages to fit a token limit
trimmed = trim_messages(messages, max_tokens=4000)

# Safe trim (preserves the system message and the latest user message)
trimmed = safe_trim_messages(messages, max_tokens=4000)
```
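The exact heuristic behind count_tokens_approximately is not specified here. A common rule of thumb for English text is roughly 4 characters per token, which can be sketched as follows (an assumption for illustration, not LLMfy's exact formula):

```python
def approx_token_count(text: str) -> int:
    """Rough token estimate using the common ~4 characters/token heuristic.
    This is an illustrative assumption, not LLMfy's exact formula."""
    return max(1, len(text) // 4)

print(approx_token_count("Hello world"))  # 11 chars -> 2
```

Approximate counts are usually sufficient for trimming decisions, since the goal is to stay safely under the model's context window rather than hit an exact number.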
Usage Tracking
Track token usage and estimate costs across requests.
```python
from llmfy import llmfy_usage_tracker

# After invocation, access usage data
response = ai.invoke(messages)
usage = llmfy_usage_tracker.get()

print(f"Input tokens: {usage.input_tokens}")
print(f"Output tokens: {usage.output_tokens}")
print(f"Estimated cost: ${usage.total_cost:.6f}")
```
The tracker supports provider-specific pricing for both OpenAI and Bedrock models.
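Cost estimation of this kind reduces to multiplying token counts by per-model, per-direction prices. A sketch of the arithmetic (the prices below are placeholders for illustration, not real rates for any model):

```python
# Placeholder prices in USD per million tokens -- NOT real rates.
PRICING = {
    "example-model": {"input": 0.15, "output": 0.60},
}

def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate request cost from token counts and a per-model price table."""
    price = PRICING[model]
    return (input_tokens * price["input"] + output_tokens * price["output"]) / 1_000_000

cost = estimate_cost("example-model", input_tokens=1_000, output_tokens=500)
print(f"${cost:.6f}")  # $0.000450
```

Keeping the price table per-provider and per-model is what lets a tracker report comparable cost figures across OpenAI and Bedrock requests.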
Exception Handling
LLMfy provides a structured exception hierarchy for granular error handling.
```python
from llmfy import (
    LLMfyException,
    RateLimitException,
    AuthenticationException,
    TimeoutException,
    ContentFilterException,
)

try:
    response = ai.invoke(messages)
except RateLimitException:
    # Handle rate limiting (retry with backoff)
    pass
except AuthenticationException:
    # Handle auth errors
    pass
except ContentFilterException:
    # Handle safety filter triggers
    pass
except TimeoutException:
    # Handle request timeouts
    pass
except LLMfyException as e:
    # Catch-all for any LLMfy error
    print(f"Error: {e}")
```
Available exceptions:
| Exception | Description |
|---|---|
| LLMfyException | Base exception for all LLMfy errors |
| RateLimitException | API rate limit exceeded |
| QuotaExceededException | Usage quota exceeded |
| TimeoutException | Request timeout |
| InvalidRequestException | Invalid request parameters |
| AuthenticationException | Authentication failure |
| PermissionDeniedException | Insufficient permissions |
| ModelNotFoundException | Model not available |
| ServiceUnavailableException | Service temporarily down |
| ContentFilterException | Safety filter triggered |
| ModelErrorException | Model processing error |
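The rate-limit case above typically pairs with exponential backoff. A generic retry wrapper, independent of LLMfy (the retry counts and delays are illustrative choices, not library defaults):

```python
import time

def retry_with_backoff(func, retriable=(Exception,), max_retries=3, base_delay=1.0):
    """Call func, retrying on retriable exceptions with exponential backoff."""
    for attempt in range(max_retries + 1):
        try:
            return func()
        except retriable:
            if attempt == max_retries:
                raise  # out of retries: surface the original exception
            time.sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
```

With LLMfy this might be used as `retry_with_backoff(lambda: ai.invoke(messages), retriable=(RateLimitException,))`, so that transient rate-limit errors are retried while authentication or content-filter errors fail fast.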
Helper Nodes for FlowEngine
LLMfy provides pre-built helper nodes for common workflow patterns.
tools_node
Automatically executes tool calls from LLM responses within a workflow.
```python
from llmfy import tools_node, ToolRegistry

# Assumes a registry built earlier, e.g.:
# tool_registry = ToolRegistry([get_weather, search_database], llm)

async def handle_tools(state: WorkflowState) -> dict:
    messages = tools_node(
        messages=state.get("messages", []),
        registry=tool_registry,
    )
    return {"messages": messages}
```
tools_stream_node
Streaming variant of tools_node for real-time tool execution feedback.
```python
from llmfy import tools_stream_node

async def handle_tools_stream(state: WorkflowState):
    async for chunk in tools_stream_node(
        messages=state.get("messages", []),
        registry=tool_registry,
    ):
        yield chunk
```
Summary
LLMfy brings together the core building blocks for LLM-powered applications into a single, cohesive framework:
| Feature | Description |
|---|---|
| Multi-Provider | Unified API for OpenAI and AWS Bedrock |
| Tool Calling | @Tool() decorator with automatic schema extraction |
| FlowEngine | State machine workflow orchestration |
| Checkpointing | In-memory, Redis, and SQL state persistence |
| Vector Store | FAISS-based semantic search |
| Embeddings | OpenAI and Bedrock embedding models |
| Streaming | Real-time token delivery across all methods |
| Multi-Modal | Text, image, document, and video inputs |
| Text Utilities | Chunking, preprocessing, and message trimming |
| Usage Tracking | Token counting and cost estimation |
| Exception Handling | Structured error hierarchy |
For more details, visit the documentation or the GitHub repository.