Advanced Features: Natural Language Topic Search
The Natural Language Categorization & Search (Experimental) feature in Infinite Image Browsing (IIB) provides a powerful way to organize and retrieve images based on their semantic similarity using advanced AI models. It groups images by the meaning of their prompts and allows users to search using natural language queries, similar to a Retrieval-Augmented Generation (RAG) system.
This feature is currently experimental, and its results can vary depending on the quality of the embedding and chat models used, as well as the completeness of the image's prompt metadata.
Purpose
The primary purpose of this feature is to:
- Automatically group images into thematic categories based on the semantic similarity of their generation prompts.
- Enable natural language search capabilities, allowing users to find images by describing what they are looking for in a sentence, rather than relying on exact keyword matches.
- Reduce the manual effort required for organizing large collections of AI-generated images.
How End-Users Can Utilize It
- Open the Feature: Navigate to "Natural Language Categorization & Search (Experimental)" from the startup page of IIB.
- Select Scope: Click on "Scope" and choose one or more folders from your QuickMovePaths. These selected folders define the dataset for analysis.
- Categorize: Click "Refresh" to initiate the process. IIB will generate topic cards representing the semantic clusters within your selected scope. This step involves vectorizing image prompts and clustering them.
- Search: Once categorization is complete, type a natural-language query into the search bar. Click "Search" to retrieve and display images semantically similar to your query.
How It Works (Simple Explanation)
The process involves several stages, leveraging AI models for semantic understanding and organization:
- Prompt Extraction & Normalization:
  - The system reads the image's EXIF data (or `.txt` files for videos) to extract the generation prompt, focusing on the content that appears before "Negative prompt:".
  - Optionally, "boilerplate" terms (e.g., quality parameters, photography settings) are removed from the prompt to emphasize the core subject and theme, helping the system focus on semantic topics rather than generic descriptors. This normalization is controlled by the `IIB_PROMPT_NORMALIZE` and `IIB_PROMPT_NORMALIZE_MODE` environment variables.
- Embeddings:
  - The cleaned prompt text is sent to an OpenAI-compatible `/embeddings` API endpoint.
  - The API returns a numerical vector (embedding) that represents the semantic meaning of the prompt.
  - These vectors are stored in the `image_embedding` table of the SQLite database, linked to their respective image IDs.
- Clustering:
  - An online centroid-sum clustering algorithm groups similar embedding vectors (and thus semantically similar prompts) into clusters.
  - A post-merge step consolidates highly similar clusters that were initially separated.
  - Optionally, members of very small ("noise") clusters are reassigned to larger, more relevant clusters when their similarity exceeds a specified threshold.
- Title Generation (LLM):
  - For each identified cluster, a representative set of prompt snippets is sent to an OpenAI-compatible `/chat/completions` API endpoint.
  - A Large Language Model (LLM) is instructed to generate a concise, human-readable title and a few descriptive keywords for the cluster, returned as structured JSON output.
  - The generated titles and keywords are stored in the `topic_title_cache` table for quick retrieval.
- Retrieval:
  - When a user submits a natural-language search query, the query is first converted into an embedding using the same embedding model.
  - The query embedding is then compared against all image embeddings within the selected scope using cosine similarity.
  - Images are ranked by similarity score, and the top K most relevant results are returned.
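The clustering and retrieval stages above can be sketched end-to-end in a few lines. This is a simplified, dependency-free illustration rather than IIB's actual implementation: the function names, the example threshold of 0.8, and the plain-Python vector math are all illustrative.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def online_cluster(vectors, threshold=0.8):
    """Online centroid clustering sketch: each vector joins the most
    similar existing centroid above `threshold`, or starts a new cluster.
    Centroids are kept as running sums divided by member counts
    (hence "centroid-sum"). Returns lists of member indices."""
    sums, counts, members = [], [], []
    for idx, vec in enumerate(vectors):
        best, best_sim = None, threshold
        for c, (s, n) in enumerate(zip(sums, counts)):
            centroid = [x / n for x in s]
            sim = cosine(vec, centroid)
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            sums.append(list(vec)); counts.append(1); members.append([idx])
        else:
            sums[best] = [x + y for x, y in zip(sums[best], vec)]
            counts[best] += 1
            members[best].append(idx)
    return members

def search(query_vec, image_vecs, top_k=3):
    """Retrieval sketch: rank images by cosine similarity to the
    query embedding and return the top-K indices."""
    scored = [(cosine(query_vec, v), i) for i, v in enumerate(image_vecs)]
    scored.sort(reverse=True)
    return [i for _, i in scored[:top_k]]
```

In practice the post-merge and noise-reassignment steps described above run after this first pass, and the embeddings come from the configured `/embeddings` endpoint rather than toy vectors.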
Caching & Incremental Updates
To optimize performance and reduce repeated API calls (and associated costs), IIB employs a robust caching mechanism:
1) Embedding Cache (`image_embedding`)
- Location: The `image_embedding` table in the SQLite database; each entry is keyed by `image_id`.
- Skip Rule (Incremental Update): An image's embedding is skipped (not recomputed) if all of the following hold:
  - The `model` used for embedding is the same.
  - The `text_hash` of the (normalized) prompt text is the same.
  - An existing vector (`vec`) is already present.
- Re-vectorization Cache Key: The `text_hash` is computed as `sha256(f"{normalize_version}:{prompt_text}")`. The `normalize_version` is an internal fingerprint derived from the prompt normalization rules, so the relevant embeddings are regenerated whenever those rules change.
- Force Rebuild: Pass `force=true` to `build_iib_output_embeddings`, or `force_embed=true` to `cluster_iib_output_job_start`, to force a rebuild of embeddings.
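The skip rule above can be sketched as follows. The `text_hash` formula matches the one described; the `needs_embedding` helper and its row layout are hypothetical, shown only to make the rule concrete.

```python
import hashlib

def text_hash(prompt_text: str, normalize_version: str) -> str:
    """Embedding cache key: sha256(f"{normalize_version}:{prompt_text}").
    It changes whenever the normalization rules or the prompt change."""
    return hashlib.sha256(
        f"{normalize_version}:{prompt_text}".encode("utf-8")
    ).hexdigest()

def needs_embedding(row, model, prompt_text, normalize_version):
    """Hypothetical helper: `row` is the cached entry (or None) with
    keys 'model', 'text_hash', 'vec'. Returns True when the embedding
    must be recomputed per the skip rule."""
    if row is None or row.get("vec") is None:
        return True
    return not (row.get("model") == model and
                row.get("text_hash") == text_hash(prompt_text, normalize_version))
```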
2) Title Cache (`topic_title_cache`)
- Location: The `topic_title_cache` table, keyed by `cluster_hash`.
- Hit Rule: When `use_title_cache=true` (the default) and `force_title=false`, previously generated titles and keywords for a cluster are reused.
- Cache Key (`cluster_hash`): This hash is derived from:
  - The member image IDs (sorted) belonging to the cluster.
  - The embedding `model`, `threshold`, and `min_cluster_size` used for clustering.
  - The `title_model` and output language for the title.
  - The normalization fingerprint (`normalize_version`) and mode.
- Force Title Regeneration: Set `force_title=true` to bypass the title cache and regenerate titles with the LLM.
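A sketch of how such a `cluster_hash` could be computed from the inputs listed above. The exact serialization IIB uses is internal; the JSON-based canonicalization here is an assumption, but it demonstrates the key property: any change to membership or to any listed parameter invalidates the cached title.

```python
import hashlib
import json

def cluster_hash(member_ids, model, threshold, min_cluster_size,
                 title_model, lang, normalize_version, mode) -> str:
    """Hypothetical title-cache key: hash a canonical serialization of
    the sorted member IDs plus every parameter that affects the title."""
    payload = json.dumps({
        "members": sorted(member_ids),            # order-insensitive membership
        "model": model,
        "threshold": threshold,
        "min_cluster_size": min_cluster_size,
        "title_model": title_model,
        "lang": lang,
        "normalize": f"{normalize_version}:{mode}",
    }, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```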
Configuration (Environment Variables)
The following environment variables are crucial for configuring the Natural Language Categorization & Search feature. They typically need to be set in your .env file in the extension's root directory.
- `OPENAI_BASE_URL`:
  - Description: The base URL for your OpenAI-compatible API endpoint (e.g., `https://your-host/v1`).
  - Default: `https://api.openai.com/v1`
  - Impact: Required for any AI functionality, including embeddings and chat completions.
- `OPENAI_API_KEY`:
  - Description: Your API key for accessing OpenAI-compatible services.
  - Default: Not configured
  - Impact: Essential for authenticating requests to the AI API.
- `AI_MODEL`:
  - Description: The default chat model for general AI tasks; also the fallback for topic title generation when `TOPIC_TITLE_MODEL` is not explicitly set.
  - Default: `gpt-4o-mini`
  - Impact: Influences the quality and cost of LLM-based operations.
- `EMBEDDING_MODEL`:
  - Description: The embedding model used to convert image prompts into numerical vectors; critical for semantic search and clustering.
  - Default: `text-embedding-3-small`
  - Impact: Determines the quality of semantic similarity and clustering results.
- `TOPIC_TITLE_MODEL`:
  - Description: The chat model used to generate human-readable titles and keywords for image clusters.
  - Default: Value of `AI_MODEL`
  - Impact: Directly affects the quality of auto-generated topic titles and keywords.
- `IIB_PROMPT_NORMALIZE`:
  - Description: Enables or disables normalization of image prompts before they are embedded. Normalization removes common boilerplate (e.g., quality descriptors, camera settings) so the embedding focuses on the prompt's core semantic theme.
  - Default: `true`
  - Values: `1`, `true`, `yes`, `on` (enable) / `0`, `false`, `no`, `off` (disable)
- `IIB_PROMPT_NORMALIZE_MODE`:
  - Description: Sets the strictness of prompt normalization when `IIB_PROMPT_NORMALIZE` is enabled.
    - `balanced` (recommended): Removes generic boilerplate terms while retaining some discriminative style words (e.g., "scientific illustration," "documentary," "film grain").
    - `theme_only`: Applies a more aggressive removal of descriptors, leaving primarily subject/theme nouns.
  - Default: `balanced`
  - Impact: Affects how prompts are distilled for semantic analysis, influencing clustering results.
- `IIB_TAG_GRAPH_MAX_TAGS_FOR_LLM`:
  - Description: Upper limit on the number of individual tags sent to the LLM when generating the higher-level abstract layers of the tag graph; keeps LLM prompt size and latency manageable for very large tag sets.
  - Default: `500`
- `IIB_TAG_GRAPH_TOPK_TAGS_FOR_LLM`:
  - Description: Number of top (most frequent) tags considered as LLM input during hierarchical tag graph generation. If the total number of unique tags exceeds this, only the top K are used.
  - Default: `500`
- `IIB_TAG_GRAPH_LLM_TIMEOUT_SEC`:
  - Description: Timeout, in seconds, for requests made to the LLM during tag graph generation.
  - Default: `180`
- `IIB_TAG_GRAPH_LLM_MAX_ATTEMPTS`:
  - Description: Maximum number of retries for a failed LLM request during tag graph generation (e.g., after network issues or API errors).
  - Default: `5`
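Putting these together, a typical `.env` in the extension's root directory might look like the following. The values shown are the documented defaults; the API key is a placeholder you must replace with your own.

```
# .env in the extension's root directory
OPENAI_BASE_URL=https://api.openai.com/v1
OPENAI_API_KEY=sk-...              # replace with your key
AI_MODEL=gpt-4o-mini
EMBEDDING_MODEL=text-embedding-3-small
# TOPIC_TITLE_MODEL=gpt-4o-mini    # optional; falls back to AI_MODEL
IIB_PROMPT_NORMALIZE=true
IIB_PROMPT_NORMALIZE_MODE=balanced
IIB_TAG_GRAPH_MAX_TAGS_FOR_LLM=500
IIB_TAG_GRAPH_TOPK_TAGS_FOR_LLM=500
IIB_TAG_GRAPH_LLM_TIMEOUT_SEC=180
IIB_TAG_GRAPH_LLM_MAX_ATTEMPTS=5
```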
Note: There is no mock fallback for AI calls. If the configured provider or model fails to respond, or returns invalid output, the API will return an error directly.
For more general configuration options, refer to the [Configuration] page.