> ## Documentation Index > Fetch the complete documentation index at: https://pulze.ai/docs/llms.txt > Use this file to discover all available pages before exploring further. # Changelog > Latest updates and improvements to Pulze # Changelog ## October 2025 Integration logs now display pagination controls when there are multiple pages of results. Users can navigate between pages using Previous/Next buttons and see their current position (e.g., "Page 1 of 5"). A results counter shows the range of logs being displayed (e.g., "Showing 1-20 of 87 results"). Pagination controls only appear when there is more than one page of logs available. Added comprehensive filtering and export capabilities for integration logs. Users can now search logs by prompt/response content with debounced input, filter by custom date ranges (24 hours, 7 days, 30 days, or 90 days), adjust page size (10, 25, 50, or 100 items), and export filtered logs to CSV format. The CSV export includes timestamp, model, prompt, response, token counts, costs, latency, and status code for comprehensive log analysis. Added ability to export comprehensive insights reports to CSV format, including summary statistics, daily/weekly/monthly active users, request timelines, top users and spaces, model distribution, and tool usage data. Enhanced chart visualization with improved tooltips that follow mouse position and added informational tooltips explaining DAU (Daily Active Users), WAU (Weekly Active Users), and MAU (Monthly Active Users) metrics. Charts now include proper padding to prevent tooltip cutoff at edges. Fixed an issue where attached images were not properly sent to Grok-4, Grok-4-fast, Grok-4-fast-reasoning, and Grok-4-fast-reasoning models. The fix enables vision support for these Grok models and ensures image content is correctly serialized by excluding empty fields when sending requests to vision-capable models. Images can now be successfully processed by these models according to xAI's vision capabilities documentation. Resolved issues that prevented processing of large files by automatically applying context chunking as a fallback when no models fit the context window. The engine now disables smart\_learn during chunking operations, extracts and combines large content from all messages (over 1000 characters), and creates semantic chunks to reduce token count. This prevents "No models fit" errors and enables successful processing of files that previously exceeded model context limits. Fixed automatic document chunking failing when content exceeded 100,000 characters by disabling SmartLearn for large documents and preventing recursive chunking loops. The system now properly handles documents that exceed 30% of a model's context window (threshold for models with >2000 token limits) by automatically splitting content into manageable chunks without triggering embedding model token limit errors or infinite chunking recursion. Improved the performance and reliability of the Insights page by optimizing database queries to filter data earlier in the process. The queries now filter requests by date range at the database level and only fetch records with relevant fields (tool\_calls and model data), significantly reducing memory usage and preventing out-of-memory errors. Additionally, increased API server memory limit from 3GB to 6GB to handle larger datasets more reliably. Enhanced the organization Insights page to prevent unnecessary API calls by only triggering data refetch when custom date filters are actually applied, not on every keystroke. Added interactive chart tooltips that appear on hover showing precise data points with formatted dates and values. Improved chart visual design with gradient fills, smoother lines, and better hover states for data points. Introduced a comprehensive Insights dashboard for organization admins to analyze usage patterns with default 30-day views and period comparison support. The insights include user activity metrics (DAU/WAU/MAU), monthly breakdowns, model distribution, tool usage statistics, router usage data, top apps, and agent vs user request tracking. Users can now specify custom date ranges and compare different time periods to track organizational trends. Suite X subscription tier now includes 400 daily API requests, a significant increase from the previous limit of 20 requests per day. This 20x improvement provides Suite X subscribers with substantially more capacity for generation API endpoints, enabling more extensive usage of AI features throughout the day. Fixed streaming responses to properly support function/tool calls across all providers including OpenAI, Anthropic, xAI, and Google. The system now automatically enables tool support when tools are provided in requests, correctly converts tool formats between provider specifications (OpenAI to Anthropic/Google formats), and properly handles tool\_choice parameters for each provider. This ensures external agents can receive and execute tool calls in streaming mode with full OpenAI compatibility. Introduced a new context chunking tool that automatically splits large content into manageable chunks when it exceeds 95% of a model's context window. When triggered, content is divided into chunks of up to 100,000 tokens (with 1,000 token overlap, max 100 chunks), and the LLM is automatically instructed to process each chunk sequentially using the context\_chunking tool, then synthesize the results. This enables analysis of documents that would otherwise exceed context limits without manual intervention. Fixed an issue where baseline evaluation runs in the custom router creation dialog were not displaying the correct Pulze logo. Baseline evaluations now properly show the Pulze favicon icon instead of no icon, making it easier to visually identify baseline runs in the evaluation list. Replaced the separate "Dynamic Router" option with a unified model/router selector in the assistant creation dialog and tool configuration. The selector now combines both models and routers in a single dropdown menu. Additionally, improved UI rendering by adding proper overflow handling and flex layout to prevent text truncation issues in long model/router names. Resolved two critical issues in chat processing: (1) Fixed 422 errors caused by context window limits by adjusting the automatic chunking threshold from 95% to 30% of context window to account for tokenizer differences across providers (Anthropic counts \~63% more tokens) and content expansion after plugins run (up to 3x expansion observed). (2) Fixed incomplete chunk processing where AI would stop after the first chunk by adding explicit instructions to continue processing all remaining chunks, ensuring complete analysis of large documents split across multiple chunks. Enhanced the baseline benchmark display to show accurate creation dates for each evaluated model instead of placeholder dates. The system now reads metadata from benchmark.json files to properly display when each baseline model was evaluated, providing better temporal context when comparing model performance. Updated the baseline benchmark ID to use the latest benchmark dataset. Enhanced the Space overview interface to display default model settings including max\_tokens and temperature parameters. Users can now see the default generation settings configured for their Spaces at a glance. Fixed a critical issue where paused evaluation runs could become stuck due to stale resume requests. When pausing an evaluation run, the system now properly clears the resume\_requested\_at field to prevent the run from automatically resuming unintentionally. Additionally, increased the evaluation worker capacity from 1 process/1 thread to 4 processes/8 threads for better performance, and added automated cleanup tasks to detect and remove duplicate evaluation results while maintaining data consistency. Fixed an issue where pausing multi-model evaluation runs only paused child runs in RUNNING state, leaving PENDING or FAILED runs unaffected. Now all non-completed child runs (RUNNING, PENDING, FAILED, etc.) are properly paused when pausing a parent evaluation, ensuring consistent state when resuming. Completed child runs are correctly preserved and skipped during pause operations. Fixed a layout issue where child evaluation runs could overflow their container by replacing 'flex-shrink-0' with 'min-w-0 max-w-full'. This ensures that child run cards (displaying router or model information) properly respect container boundaries and wrap text instead of causing horizontal overflow, improving readability in the evaluation runs interface. ## September 2025 Added two new Anthropic models: claude-sonnet-4.5 (latest alias) and claude-sonnet-4.5-20250929 (fixed version). Both models feature 200K context window, support for vision/image input, function calling, JSON output, and streaming. Models have exceptional agent and coding capabilities with prompt costs of $0.003/1K tokens and completion costs of $0.015/1K tokens. Fixed model failover chain behavior to properly clean up model settings when failover is disabled. Enhanced error display with better handling of network interruptions and JSON parsing errors, now showing user-friendly refresh options. Fixed model router selection logic to properly handle fallback scenarios and maintain selection state when switching between routers and models. Also corrected evaluation page navigation and improved stream connection reliability. Extensive model deprecation schedule affecting multiple providers: Claude 3.5 Sonnet (Oct 22, 2025), Claude 3 Opus (Jan 5, 2026), Gemini 1.5 series (Sept 22, 2025), Google Gemini 2.0 Flash models (Feb 5, 2026), and various models from Together, Groq, Cohere, and Fireworks. Additionally, several models from Anthropic, Fireworks, Mistral, and OpenAI will no longer be pre-selected by default in new spaces. Enhanced the benchmark browser with new bulk selection capabilities, including the ability to select all subjects at once, sample specific subjects, and perform bulk subject sampling with customizable sample sizes and random seeds. Users can now efficiently select multiple benchmark items by subject, with detailed feedback on sample distribution and warnings for subjects with insufficient data. Added support for xAI provider with new logo integration. Introduced a comprehensive evaluation and dataset management system allowing users to create, manage, and run evaluations on their AI models. Users can now create datasets with custom prompts, expected answers, and system instructions, then use evaluation templates with configurable metrics and rater models to assess model performance. The system supports both manual and benchmark datasets, with detailed progress tracking and scoring capabilities. Improved the Avatar component with intelligent fallback handling: displays user initials when profile pictures fail to load, and shows the Pulze favicon for system users. Added better error handling for image loading, smart initials generation from names (using first and last initials), and special styling for system user avatars with dedicated padding and rounded borders. ## August 2025 Claude Sonnet 4.0's context window has been expanded to 1M tokens, with updated pricing for long contexts (>200K tokens): $6/MTok for prompts and $22.50/MTok for completions. Added new Claude Opus 4.1 model with 200K context window ($15/MTok prompt, $75/MTok completion) featuring vision support, function calling, and streaming capabilities. GPT-5 models were also added (details truncated in diff) with support for functions, JSON, vision, and streaming. Improved the reliability of custom data processing by adding a new reprocess endpoint for both app and organization-level data files. The '/refresh' endpoint has been renamed to '/reprocess' for better clarity, and direct database updates have been implemented for more reliable state management. The update includes enhanced handling of synced files and better error messaging for unprocessable files. Added a new 'Process Again' option to reprocess PDFs and web pages that may have failed initial processing. This feature is available both at the organization level and within individual apps through a new action menu. The UI has been updated to show clearer feedback messages during reprocessing attempts, and the 'Refresh' button has been renamed to 'Retry' for clarity. ## June 2025 Changed the model used for generating conversation names from OpenAI's GPT-4-Nano to Groq's LLaMA-3-70B-Instruct. This change aims to improve the quality and speed of auto-generated conversation titles. Enhanced the message sources display with improved text wrapping and overflow handling, making long source names more readable. The main menu is now hidden for FREE tier users, and paid users are redirected to /s after onboarding. Added support for PULZEONE and PULZEXSUITE subscription tiers with streamlined navigation. Added OpenAI's O3-Pro model and its snapshot version (o3-pro-2025-06-10) with a 200K token context window. The model supports streaming, vision, JSON output, function calling, and multiple completions. It features higher compute capabilities for complex problems, with token costs of 0.015¢ for prompt and 0.06¢ for completion tokens. Improved source document handling in chat interface with new document type detection and direct document preview functionality. Users can now click on document sources to open them in a new tab, while non-document sources show detailed information in a popup. Added visual indicators to distinguish between document and non-document sources, with a new document icon for PDF/document sources. Added support for several new AI tools including Gmail read, Slack read, LinkedIn profile access, transcribe audio, and multi-turn image editing capabilities. DALL-E image generation is now disabled by default. The free trial period for new subscriptions has been extended from 3 to 7 days. Default configurations now include more granular human-in-the-loop settings for various tools. Extended the free trial period from 3 to 7 days for Pro plan upgrades. Added McpTool capability to multiple Pro Assistants including Content Writer, Business Analyst, and Research Assistant. Enhanced configurable options for Pro Assistants by adding a new configuration button and MultiTurnImageEditing tool to select assistants like Project Manager and Wellness Coach. ## May 2025 Fixed crashes in the Add Member dialog when handling partially initialized user data. The dialog now properly validates user objects before filtering, displaying, and processing member additions, ensuring stable operation when managing space members with incomplete profile information. Added support for maintaining context across chat conversations by tracking generated artifacts (images, transcripts, documents) and attached files throughout the conversation history. The system now automatically scans previous messages for pulze:// URLs and file references, preserves them in metadata, and makes them available to plugins in subsequent turns. This enables more coherent multi-turn interactions where AI can reference and work with previously generated content or uploaded files. Added a new Multi-turn Image Editing tool that supports interactive image generation and editing. Users can configure the tool with multiple GPT models (including gpt-4o, gpt-4.1, o3) and customize image parameters like size (up to 1536x1024), quality levels (high/medium/low), and background types (transparent/opaque). The tool includes automatic parameter selection and supports various image dimensions with flexible quality settings. Added new endpoints for configuring and managing MCP tool integrations. Users with editor permissions can now configure MCP server credentials and disconnect MCP tool connections through the API. The integration is managed through the linked accounts system, allowing organizations to maintain separate MCP tool configurations per user. Fixed layout issues in the MCP server configuration dialog by adding proper width constraints and flex behavior to prevent text overflow. Updated the promotional banner to reference Claude 4 instead of Claude 3.7, alongside GPT-4.1 and DeepSeek-R1 models. Enhanced error handling for the model scoring system to gracefully handle API failures and connection issues. Now falls back to default model scores (featuring Claude Sonnet 4.0) when the scoring service is unavailable. Also fixed initialization of query plugins including file/URL handling and RAG query rewrite functionality, with improved logging for better troubleshooting. Added four new Claude models: claude-sonnet-4-0, claude-sonnet-4-20250514, claude-opus-4-0, and claude-opus-4-20250514. All models feature 200K token context windows, support for functions, streaming, and vision capabilities (except dated versions). Pricing is set at $0.003/1K tokens for input and $0.015/1K tokens for output. Claude Opus 4 is specifically optimized for coding tasks and complex, long-running workflows. Fixed an issue where Advanced Workflow tool configurations were being automatically overwritten when loaded from the database. The system now properly preserves existing tool configurations, maintains the correct recipe order from the latest version, and automatically handles legacy configurations by converting them to the advanced workflow format when multiple tools are detected. Updated the file search input schema to clarify file reference formatting requirements, making it easier for users to understand the correct syntax. File references now use a simpler format '``' or '``' without requiring quotes or JSON formatting. Also streamlined the plugin credential handling system by removing redundant placeholder code. Added automatic redirection of free-tier users to the /onboarding page whenever they attempt to access other sections of the application. This new feature ensures free users complete the onboarding process before accessing the main application features. The redirect persists until users upgrade from the FREE subscription tier. Added support for integrating with external MCP servers to discover and use their tools within the agent system. This allows connecting to multiple MCP servers simultaneously, automatically discovering their available tools, and using them through a standardized interface. Each MCP server can provide multiple specialized tools with defined input schemas and capabilities that can now be used alongside existing tools. Fixed an issue where tool call input details could overflow beyond the container width. Added horizontal scrolling to tool call input displays, ensuring long JSON content remains accessible without breaking the layout. Fixed an issue where user credentials were not being properly injected into Slack, Gmail, and LinkedIn tools when using default Pulze assistants. Previously, these integrations would fail to authenticate when not using a custom assistant. The fix ensures that user credentials are now correctly loaded and injected for all assistant types, improving reliability of third-party tool integrations. Removed visible JSON debug output that was showing tool configuration details in the Edit Tools dialog. This improves the UI cleanliness by removing technical information that was not meant for end users. Removed email verification requirement for accessing dashboard data, application lists, and logs. Users can now access these features immediately after account creation, without waiting for email verification. Additionally, added new Stripe price mappings for PulzeOne and PulzeXSuite subscription tiers with monthly, quarterly, and yearly billing cycles. Implemented specific subscription tier limits for PulzeOne and PulzeXSuite users. PulzeOne tier is limited to 20 datasources per organization with no special API requests allowed. PulzeXSuite tier allows up to 50 datasources and 20 special API requests per day. Both tiers include seat limit enforcement for organization members. Added new professional assistant avatar 'Buzz' with detailed SVG graphics including animated facial expressions, eye movements, and emotional responses. The avatar features a distinctive black and white color scheme with interactive elements like blinking eyes, dynamic mouth movements, and responsive emotional states. Added support for linking external service accounts to organization members, starting with LinkedIn integration through the partner API. Users can now connect and disconnect their LinkedIn accounts through a new linked accounts management system, with each account type uniquely constrained per organization member. The implementation includes a new database schema for linked accounts and OAuth-based authentication flow. ## April 2025 Fixed the rendering of custom assistant avatar images by properly applying size classes and image formatting. Custom avatars now correctly display in 5 different sizes (sm: 24px, md: 32px, lg: 40px, xl: 56px, 2xl: 72px) with consistent rounded corners and proper scaling. Introduced a new onboarding flow with pro-level assistants and subscription-based access control. Users can now filter assistants by pro-only status, and assistant availability is automatically managed based on subscription tier (Free, PulzeOne, or higher). Organizations with admin privileges can enable/disable assistants globally, with automatic management for Free/PulzeOne subscriptions through a new checkout process. Fixed an issue where tables in model responses could overflow beyond the visible area on smaller screens. Tables now automatically scroll horizontally when their content exceeds the container width, ensuring all data remains accessible while maintaining the visual layout. Added six new OpenAI GPT-4.1 models with 1 million token context windows: gpt-4.1, gpt-4.1-mini, and gpt-4.1-nano (plus their dated variants). All models support functions, streaming, vision, JSON output, and penalties. The models offer different price points, with gpt-4.1-nano being the most cost-effective at 0.0001¢/prompt token and 0.0004¢/completion token, while gpt-4.1 costs 0.0002¢/prompt token and 0.0008¢/completion token. Added support for uploading and handling Microsoft Office spreadsheet (XLSX, XLS) and presentation (PPTX, PPT) file formats. Users can now work with these additional file types alongside existing document formats like PDF, DOC, DOCX, and TXT. Added support for selectively loading specific plugins through request payload, allowing more precise control over which tools are available during interactions. The system now better handles models with function-calling capabilities by automatically configuring appropriate tools. Also fixed validation issues with document IDs in custom data handling by making them optional. Introduced a new data synchronization system that replaces the previous RAG integration with a more flexible connection framework. Added support for tracking connection states through new fields (connection\_id, document\_id, source\_type, last\_synced\_on) and reorganized API endpoints to use '/connect' and '/webhook' instead of '/carbon'. This change provides a more robust foundation for managing external data connections and synchronization. Added Crisp live chat support widget to help users get real-time assistance. Also integrated TikTok analytics pixel tracking for Spaces pages to better understand user engagement and behavior patterns. Users can now set and manage a default assistant for their workspace directly from the space home page. The interface displays the currently selected default assistant with options to remove it, or create a new assistant if none is set. Default assistants will be automatically selected when starting new conversations in the space. Fixed an issue where the status toggle switch in tool details would render incorrectly when accessing tools that don't exist. The fix adds proper key handling to the switch component, ensuring consistent rendering and state management for tool status toggles. ## March 2025 Added support for the LinkedIn Profile plugin and improved how plugin configurations are handled. Plugin configurations can now be managed globally through organization settings instead of individual assistant configurations. This change specifically affects the LinkedIn Profile plugin, which now reads connected account information from global organization configurations. Fixed an issue with partner webhook URL formatting and enhanced LinkedIn Profile tool management by centralizing its configuration at the organization level. Organization administrators can now manage LinkedIn Profile tool settings globally, which will automatically apply to all assistants using this tool. When assistants use the LinkedIn Profile tool, they will inherit organization-level configurations while maintaining the ability to have assistant-specific overrides. Added support for connecting multiple external services (LinkedIn, WhatsApp, Instagram, Messenger, Telegram, Google, Microsoft, IMAP, X) through partner authentication flow. Users can now generate hosted authentication links to connect their accounts with automatic service type mapping and one-hour expiration. The integration includes success/failure redirects and webhook handling for connection status. Improved web search functionality with a configurable 120-second timeout to prevent hanging searches, plus better error handling and connection management. Added support for Exa API integration for advanced search capabilities, with configurable maximum results (default: 20) and custom timeout messages. Search service now includes improved connection health monitoring and handles large response messages up to 100MB. Improved Gmail label processing to support comma-separated label values and enhanced email parsing with more comprehensive recipient information. The plugin now returns structured email data including sender, recipient, thread ID, and truncated message body (limited to 500 characters), making it more robust for handling multiple labels and email metadata. Fixed incorrect OAuth redirect URI handling for the Gmail draft poster tool. The system now correctly constructs the OAuth redirect URL using the appropriate environment-specific API URL (localhost, development, or production) combined with the OAuth callback path. This resolves authentication issues when using the Gmail draft posting feature. Added new Gmail integration allowing users to create email drafts directly through the API. Includes secure OAuth2.0 authentication flow for connecting Gmail accounts with automatic token refresh handling. Users can now authorize access to their Gmail account through a popup window, with the integration storing credentials securely for future use. Introduced a new task scheduling system that allows users to create and manage automated tasks with custom schedules. Tasks can be associated with specific apps or organizations, support plugin integrations, file attachments, and different LLM models. Users can configure task names, prompts, schedule types, and view task status, last run time, and next scheduled run through both app-specific and organization-wide interfaces. Enhanced conversation naming by using the dedicated GPT-4o-mini model, which is now hardcoded for generating conversation titles. This change ensures more consistent and higher-quality conversation names when creating new chats. The model is configured with a 60-token limit to generate concise, relevant titles. Added mobile-friendly layout for Space home page with a collapsible settings panel that automatically hides on mobile devices (screen width \< 768px). Users can now toggle space settings via a floating button, and the assistant search preview has improved text truncation for better mobile display. The settings panel includes default model selection and member management in a more accessible format for small screens. Fixed an issue where the default model selection wasn't properly handling failover models. The system now correctly initializes the default model by first checking available failover models, then falling back to regular models, and finally defaulting to SMART\_MODEL if no others are available. Enhanced the handling of disabled assistants by adding direct navigation to permissions page when clicking disabled assistants. Disabled assistants now show a clearer visual state with 50% opacity and a light red background, plus an improved message '🔒 Assistant is disabled. Admins click here to enable it.' This provides administrators a more intuitive way to enable assistants directly from the grid or search views. Enhanced the assistant creation process to gracefully handle permissions when users aren't organization administrators. The system now properly validates user access permissions before attempting to add assistants to organization configurations, preventing potential errors for non-admin users. Organization administrators can now create assistants that are automatically enabled for their entire organization. When an org admin creates a new assistant, it is immediately added to the organization's configuration with enabled status, eliminating the need for manual activation. This streamlines the assistant deployment workflow for organization administrators. Fixed an issue where assistants were incorrectly enabled by default in the global configuration. Now, assistants are disabled by default unless explicitly enabled in the organization's configuration, with exceptions for draft assistants and those owned by the current application. This ensures the 'Try Now' button works correctly based on proper permission settings. Fixed model selection logic when using assistants to properly respect model configuration. When 'pulze' is specified as the model, the system now correctly falls back to using the assistant's configured model, max tokens, and temperature settings instead of overriding them. This ensures consistent behavior when using dynamic model routing with assistants. ## February 2025 Added a new model selector component that allows users to set a default model for their space, including a smart routing option that automatically selects the best model based on queries. The selector includes model descriptions, tooltips for each option, and provides visual feedback when selections are made. Changes are saved automatically and persist across sessions. Simplified the space home page layout by removing redundant icons and reducing header sizes. Documents and Data sections now have cleaner 'text-lg' headers, and the Recent Documents section was moved to improve spacing with an 'mt-8' margin. The space title and member management sections were reorganized for better visual hierarchy. Added Claude 3.7 to the platform's promotional banner alongside existing o3-mini and DeepSeek-R1 models. The banner on the landing page now displays all three available models to users. Added two new Anthropic models: claude-3-7-sonnet (latest version alias) and claude-3-7-sonnet-20250219, both featuring a 200,000 token context window. These models support function calling, streaming, and chat functionality, with token costs of $0.003/1K for input and $0.015/1K for output tokens. Vision capabilities are not supported. Fixed an issue where Gemini model interactions would fail when processing plugin results in an unexpected format. The system now properly handles both dictionary and non-dictionary plugin results by automatically converting non-dictionary responses into a structured format with an 'original\_prompt' key. This improves reliability when using plugins with Gemini models. Improved error handling in chat completion API calls by adding defensive checks for missing 'tool\_calls' and 'usage\_metadata' attributes. This prevents API failures when these optional attributes are not present in model responses, making the API more robust and reliable. Added streaming support for O1, O1-preview, and O1-mini OpenAI models. Fixed human-in-loop functionality to properly handle plugin names with hyphens. Enhanced security by masking API keys in assistant tool configurations alongside other sensitive credentials like access tokens and consumer secrets. Added a new API Request plugin that enables making HTTP requests to external API endpoints. The plugin supports configurable URLs, HTTP methods (GET, POST, PUT, DELETE), custom headers including API key authentication, and flexible request body handling with JSON validation. Plugin can also combine results from other plugins as input data. Modified how tool availability is determined across the platform. Previously undefined tools were treated as enabled by default, now they are considered disabled until explicitly enabled. This affects tool visibility in assistant creation dialogs and the permissions management interface, providing more consistent tool availability management. The web search plugin now supports chaining results from other plugins by incorporating their output as context. Added ability to configure whether to use previous plugin results through the 'data' configuration array, allowing more sophisticated search queries that build on earlier plugin responses. This enables multi-step reasoning chains where web searches can be informed by context from other plugin executions. Organization API keys are now sorted by creation date (newest first) instead of alphabetically. Added the ability to filter API keys by name using a search parameter. This makes it easier to find specific API keys in organizations with many tokens. All email addresses for organization invitations are now automatically converted to lowercase to prevent duplicate invites and ensure consistent matching. This means invites sent to '[User@Example.com](mailto:User@Example.com)' and '[user@example.com](mailto:user@example.com)' will be treated as the same email, improving the reliability of the invitation system and preventing potential confusion with case-sensitive email addresses. When adding a new tool in the advanced tools configuration, it now automatically selects the newly created tool for editing. Also improved handling of empty instructions and prompts in the model selector configuration to prevent undefined values. Added comprehensive API key management functionality including the ability to create, update, delete, and regenerate API keys with granular permissions. New features include tracking key creation/modification dates, associating keys with specific users via auth0\_id, and improved token access validation. API keys can now be managed individually with custom names and permission sets. Modified how prompts are logged in chat completions to only store the most recent user message instead of the entire conversation history. This improves log readability and fixes issues where system prompts and previous messages were unnecessarily included in logs. The change also includes a slight adjustment to Anthropic model scoring penalties from -0.14 to -0.13. Fixed handling of chat messages containing multiple content parts or lists, ensuring proper concatenation of text content in message processing and scoring. The update improves support for complex message structures in chat conversations by correctly handling both string and list-based content types, with proper text extraction and formatting. Added support for Google's Gemini 2.0 Flash model (gemini-2.0-flash-001) with a 1M token context window. This next-gen model features superior speed, native tool use, multimodal capabilities including vision support, and function calling. The model supports streaming and has token costs of 0.1¢ per 1M prompt tokens and 0.7¢ per 1M completion tokens. Added two new Groq models: LLaMA 3.3 70B Versatile (128K context) for multilingual tasks and LLaMA 3.2 90B Vision Preview (128K context) for image analysis and reasoning. LLaMA 3.2 90B Text Preview has been deprecated as of November 25, 2024. Both new models support streaming, function calling, and chat functionality, with the Vision model adding specific image processing capabilities. Added ability to globally enable or disable specific assistants through organization configuration settings. Each assistant now includes a 'globally\_disabled' flag that can be controlled via the global configuration, allowing organization administrators to centrally manage assistant availability across their organization. This change synchronizes assistant status with global configuration settings when listing or retrieving assistants. Improved the tool configuration interface by replacing simple labels with comprehensive tool information including detailed descriptions and documentation links. Each tool (Model Selector, Add Data, Web Search, etc.) now displays a more informative label and includes a detailed description explaining its functionality. This update makes it easier for users to understand and correctly configure tools when creating assistants. ## January 2025 Added OpenAI's o3-mini model, their latest small reasoning model optimized for science, math, and coding tasks. The model features a 2M token context window, supports streaming, batch API, structured outputs, and function calling. It maintains the same cost efficiency as o1-mini (0.0000044 USD per completion token, 0.0000011 USD per prompt token) while offering improved intelligence. Added support for assigning and managing categories for assistants through the API. Users can now select multiple categories when creating or updating assistants, and categories are organized into groups. The update includes proper database relationships between assistants and categories, with the ability to view, assign, and modify category assignments while maintaining visibility settings. Users can now share direct links to specific assistants via a new sharing interface in the assistant dialog. When clicking a shared link, the application automatically opens the assistant dialog for that specific assistant. The feature includes a copy-to-clipboard button and displays the full URL in a dedicated sharing section at the top of the dialog. Organizations can now globally manage and monitor model usage across all spaces. Added new endpoints to view model status and configuration across spaces, including the ability to see which spaces are using specific models and whether models are globally enabled or disabled. This gives organization admins better visibility and control over model usage at the organization level. Improved the loading experience when viewing member details by replacing the basic 'Loading' text with an animated skeleton placeholder. The skeleton shows the expected layout with pulsing elements representing the member's information fields, providing a smoother and more polished user experience. Added support for DeepSeek-R1-Distill-Llama-70B model on the Groq platform, featuring a 128K token context window. This fine-tuned version of Llama 3.3 70B excels at mathematical reasoning and coding tasks, and supports streaming, multiple completions (n), and penalties. The model is optimized for instant reasoning on GroqCloud™ with competitive pricing at $0.00079 per completion token and $0.00059 per prompt token. Enhanced the read-only state handling in the assistant editor by disabling all interactive elements when in read-only mode. This includes disabling the sharing controls, avatar input field, and edit tools button. The share visibility dropdown now shows a distinct disabled state with a sand-colored background, and the 'Allow duplication' checkbox respects the read-only state. Redesigned the assistant sharing interface with a new consolidated dropdown that clearly shows visibility options (My Space, My Organization, Everyone). Added explicit descriptions for each sharing level and introduced a new 'Allow duplication' toggle that controls whether others can view and duplicate the assistant's configuration. Dialog width is now consistent across all creation steps. Fixed a UI issue where model names and provider logos could overflow their container in the assistant tools editor. Added minimum width constraints to prevent content from breaking layout when model names are long. Enhanced model selection behavior to automatically fall back to the smart model when a previously configured model is not found. This improves reliability when editing assistants by preventing configuration errors due to unavailable models. Also reorganized the assistant editing interface to place Tools section after Persona & Writing Style for better UX flow. Fixed an issue where image previews could break when image URLs contained special characters. The fix properly wraps image URLs in double quotes within the CSS background-image property, ensuring consistent display across all image sources. Integrations section (including Zapier and other AI widgets) is now hidden for free tier users, showing only a preview list of available integrations. Users with billing editor permissions will see an 'Upgrade Now' button to access these features. This change improves the clarity of premium features and provides a direct upgrade path for free tier users. Fixed an issue where tools were being added multiple times to assistants, causing duplicates in the available tools list. Also resolved a bug in human-in-loop functionality that was caused by inconsistent tool name formatting (with dashes vs underscores). The X (Twitter) post tool description has been updated to be more accurate. Added DeepSeek-R1 model with 160K context window, available through Together.ai ($0.007/1K tokens) and Fireworks.ai ($0.008/1K tokens) providers. The model supports streaming, penalties, and multi-completion (n>1) capabilities, but does not support JSON mode or function calling. Also introduced a new categorization system for assistants with predefined groups like Marketing & Sales, Finance & Legal, Operations & HR, Engineering & Support, and Fun & Lifestyle. Added support for DeepSeek-R1 model with a prominent promotional banner at the top of the landing page. The landing page has been redesigned with clearer sections for 'Chat with AI' and 'Automate Tasks', featuring more detailed descriptions and organized feature lists. The UI now emphasizes collaborative workspaces and task automation capabilities. Added new configuration options to the chatbot widget embed code including plugin support and automatic tool selection. Users can now control plugin availability through the 'plugins' array and enable automatic tool selection with the 'auto\_tools' feature flag. The widget also supports customizable send button (➤), placeholder text, and footer branding options. Added support for OpenAI's o1 model, designed for complex reasoning with a 200K token context window. The model features built-in chain-of-thought processing and supports advanced capabilities including function calling, JSON output, vision tasks, and custom penalties. Token costs are set at $0.015/1K for prompt tokens and $0.060/1K for completion tokens. Enhanced the space selection popover's appearance and positioning by adjusting its width to 96 units, increasing the anchor gap to 18px, and adding a 10px offset. The visual design was refined with a larger shadow (shadow-lg) while maintaining the rounded corners and white background. These changes make the dropdown more visually prominent and better positioned relative to its trigger button. ## December 2024 Added support for filtering organization custom data and documents by specific app IDs. Users can now pass an optional list of app\_ids to narrow down results to only show custom data and documents associated with particular applications. This filtering works in conjunction with existing filters like show\_public\_only, show\_org\_only, and search functionality. Users can now upload data files directly from the space home page with support for custom file formats and webpage URLs. New features include a progress indicator during uploads, error handling for invalid file types, and ability to upload multiple files simultaneously. Supports file state tracking (UPLOADING, CREATED, PENDING, DELETING, QUEUED) with visual indicators for processing and error states. Fixed an issue where documents without any versions were not appearing in document lists. The improved query now correctly displays all documents, including those without versions, and properly handles documents with deleted versions by showing their most recent non-deleted version. The modified\_on date now falls back to the document's creation date when no versions exist. Documents created from requests now automatically receive intelligent titles based on their content. The system uses an AI model to analyze the request's response text and generate a relevant, concise title between 5-12 words that captures the main theme of the document. This improves document organization and searchability without requiring manual title input. Added new filtering options to the organization's data table that allow users to filter content by document type. Users can now specifically show only documents or only custom data entries using the new 'show\_documents\_only' and 'show\_custom\_data\_only' filter parameters. This enhancement improves content organization and navigation in the data table view. Documents are now displayed alongside custom data in the global data view, showing metadata like title, modification date, and version information. Users can view document details including associated apps, visibility status, and state through a new API endpoint (/documents/{document_id}). This provides a unified view of both custom data files and documents within the organization. Success toast notifications now automatically dismiss after 8 seconds instead of 20 seconds, providing a more streamlined user experience while still ensuring messages are visible long enough to be read. This change affects all success notifications across the application while maintaining the 3-second duration for other toast types. Added support for Google's experimental Gemini 2.0 Flash model, featuring a massive 1M token context window and multimodal capabilities. This next-generation model supports streaming, function calling, and vision tasks, with improved speed and native tool use. Pricing is set at $0.00000015 per prompt token and $0.0000006 per completion token. Enhanced error reporting when AI model streaming fails by showing the actual error message to users instead of a generic 'an error occurred' message. This provides more specific and helpful feedback about what went wrong during model interactions. Fixed an issue where selectable chat messages could overflow their container width. Messages now properly constrain to their container size with improved layout behavior using the 'min-w-0' and 'grow' CSS properties, ensuring a better visual experience when hovering and selecting messages. Enhanced navigation by making space names clickable in the custom data details and member management interfaces. Users can now directly navigate to spaces by clicking on space names instead of having to manually navigate there. Also improved security on external links by adding noopener/noreferrer attributes. Users can now add, edit, and delete comments on conversations, as well as reply to existing comments. Comments support full CRUD operations with user-specific permissions - only comment authors or admins can modify/delete comments. Each comment tracks metadata including creation time, author, and deletion status. ## November 2024 Added automatic navigation to the conversation thread immediately after sending a new message from the space home page. When users submit a message through the ChatBox component on the space home screen, they will now be redirected to the full conversation view instead of staying on the home page. Added support for uploading MP4 video files by including video/mp4 and video/x-mp4 MIME types to the list of valid file formats. This expands the platform's media handling capabilities beyond audio formats like WAV and WebM. Fixed text overflow issues in two areas: saved prompts now properly truncate after two lines using line-clamp, and long file MIME types are now truncated with an ellipsis. This improves readability and prevents UI layout breaks when displaying long prompt text or file type information. Enhanced the 'Upgrade Now' button functionality by adding click navigation to the billing page and restricting visibility to users with appropriate billing permissions (Admin ALL, Admin Billing, Editor ALL, or Editor Billing). The button appears for accounts using their upload quota (10 uploads for free tier, 100 for paid tier) and directs users to the organization billing page. Introduced Smart Learning feature that learns from highly-rated responses to personalize model selection. When users rate responses positively, the system now stores the prompt, model, and context in a vector database for future optimization. This personalization system automatically removes data from poorly rated responses to continuously improve recommendation accuracy. Improved the Space interface by removing the ChatWrapper component for better performance and adding a maximum width limit (96) to space names in the navigation menu. Chat UI state now resets when switching between spaces, and console logging for image preview errors has been removed for cleaner debugging. Additionally, the chat interface is now wrapped in a ChatProvider component with proper app settings and ID context. Added Pixtral Large (124B) multimodal model from MistralAI with 128K context window support. The model excels at mathematical reasoning, document analysis, and visual data interpretation, with pricing at $0.006/1K tokens for completion and $0.002/1K tokens for prompts. All OctoAI models have been deprecated as of October 31, 2024. The new Pixtral model supports streaming, JSON output, function calling, and vision capabilities. Modified the quality score adjustment for Anthropic models, reducing the penalty from 0.16 to 0.14. This change results in slightly higher quality scores for Anthropic models like Claude while maintaining the relative ranking with other providers. Users can now share conversation messages with team members, which triggers an automatic email notification to the recipient. The email includes the shared message's prompt, response text, space logo, and a direct link to the conversation, along with the sender's name and avatar. This feature integrates with existing team member permissions and space access controls. Refactored the navigation system to use a simplified two-level breadcrumb structure (first and second level) instead of a dynamic stack. This improves menu consistency and prevents navigation items from accumulating. The change affects space navigation, thread navigation, and widget information displays throughout the application. Added ability to edit individual organization member permissions via a new API endpoint. Organization admins and editors can now view individual member details and update permissions for specific members, while maintaining security by only allowing modification of permissions that the editor themselves has access to. Removed unnecessary 'test' label that appeared below space names in the member details permissions interface. This improves the UI clarity by showing only the actual space name without redundant test text. Enhanced UI with improved button styling for selected states, streamlined navigation menu with clearer labels, and redesigned thread notifications with a new unread counter badge. Updated the Members page layout with a cleaner interface, relocated the 'Invite Member' button to the header, and simplified the permissions menu description to 'Manage user access'. Enhanced thread menu visibility by showing full thread names and improving layout up to 500px width. Added loading indicator while comparing AI models, and fixed chat renaming to show current name by default. Thread names are now always visible instead of being hidden on mobile, and member management UI received improved z-index handling to prevent overlay issues. Added scoring support for OpenAI's O1-preview (score: 1.0) and O1-mini (score: 0.9) models. Implemented scoring adjustments that reduce OpenAI model scores by 0.3 and Anthropic model scores by 0.16, affecting model recommendations in the platform. Fixed search functionality to properly include custom data when associated with labels by adding distinct query results and improving the join relationships between custom data tables and label tables. This ensures that searches now correctly return all relevant custom data entries that are connected to specific labels without duplicates. Improved the automatic conversation naming system to generate more consistent and descriptive titles. Each conversation title now starts with a relevant emoji followed by a concise, grammatically correct heading under 50 characters. The system now follows a standardized format (e.g., '🚀 Space Exploration Technologies') to better capture conversation topics. Added a comprehensive file icon system supporting 20+ data sources including Google Drive, Slack, Notion, and more. Introduced a new PlayRing icon and improved file visualization component that automatically renders the appropriate icon based on the content source. This update provides better visual context for different file types and sources in the interface. Fixed layout issues in the conversations view by implementing fixed-width columns and proper text wrapping. Added proper handling for conversation participant avatars with a maximum width of 32 pixels and shrink prevention for action buttons. The conversation list now displays up to 10 conversations per page with improved spacing and border handling. Added two new features: (1) Conversation read status tracking that allows marking conversations as read with timestamps, and (2) A saved-for-later feature that lets users bookmark app requests for future reference. Also improved organization invites by preventing duplicate invitations and memberships through new database constraints. Removed automatic scrolling behavior that was causing the conversation list to jump to the selected chat. Users can now naturally scroll through their conversation history without the view automatically repositioning to the active conversation. ## October 2024 Fixed an issue where O1-class OpenAI models (like GPT-3.5-Turbo and GPT-4) could return empty responses due to restrictive token limits. The maximum completion token limit has been increased to 32,000 tokens, allowing for much longer model responses. Fixed an issue where messages couldn't be properly compared in the chat comparison view due to message ID mismatches. The system now preserves the frontend-generated message ID using a new '\_old\_id' field and uses it as a fallback lookup mechanism, ensuring messages can be correctly referenced and compared even after server synchronization. Improved the billing interface with a larger, more prominent yearly/monthly toggle switch and clearer pricing display. Prices now show per-seat monthly costs (e.g., $28/seat/month for yearly Pro plan, $35/seat/month for monthly) with a visible 20% yearly discount. Added clearer billing cycle indicators and made the pricing toggle more user-friendly with clickable labels. Introduced a new Collections feature that allows users to organize and group conversations and requests within spaces. Users can create named collections with descriptions, add requests to collections, and manage collection items through new API endpoints. Collections are unique within an app and support search functionality. Changed how microphone permissions are checked to be less intrusive by using the Permissions API instead of automatically starting a recording. Users will now only be prompted for microphone access when they actively try to record audio, rather than on page load. Also adds better error handling when microphone access is denied. ## September 2024 Added support for audio input via microphone by enabling WebM audio format (audio/webm) and handling WebM video format conversion. Users can now record audio directly through their microphone in addition to uploading audio files. The system automatically handles format detection and conversion of WebM video formats to their audio equivalents for transcription. Added four new models: Phi-3.5 Vision Instruct (32K context), Llama 3.2 11B Vision Instruct (131K context), Llama 3.2 90B Text Preview (8K context), and Llama 3.2 90B Vision Instruct Turbo (131K context). These models support various capabilities including vision processing, long-form text generation, and advanced reasoning. The vision models enable image captioning, visual question answering, and document analysis, while the text model excels at general knowledge, coding, and multilingual translation. Added support for uploading and attaching various file types to apps, including images (PNG, JPEG, WebP, SVG) and audio files (AAC, FLAC, MP3, M4A, WAV, etc). Files are securely stored with signed URLs and can be referenced in conversations. This feature requires a paid subscription and includes mime-type validation for security. Messages can now only be sent when text input is present, even if files are attached. Previously, messages could be sent with just attached files and no text. This change ensures more intentional message submissions by requiring users to provide text content along with any attachments. Improved the user experience when switching between conversations by adding visual loading indicators. Users will now see animated pulse placeholders for the conversation title and message history while content is loading. The message input is also automatically disabled during conversation switches to prevent premature submissions. Added support for audio file transcription via Gemini and Groq integration. Users can now upload audio files in various formats (mp3, wav, flac, ogg, opus, etc.) and have them automatically transcribed. The system intelligently determines if transcription is needed based on conversation context and handles the transcription process in the background, supporting both streaming and non-streaming responses. Streamlined the in-app tour explanation of the assistant feature, making it more concise and easier to understand. The updated tooltip focuses on key capabilities like browsing, favoriting, and creating personalized assistants, along with controlling their visibility across spaces and organizations. Removed redundant content while maintaining all essential information about assistant customization and switching. Fixed an issue where failed API requests would incorrectly finalize message state, preventing proper retry attempts. The system now preserves the original message state when errors occur, allowing for proper retry functionality and more reliable error recovery. Added support for plugins in chat completion requests through a new 'plugins' field. Users can now specify a list of plugins to enable when making requests to the /chat/completions endpoint. This enhancement allows for dynamic plugin activation on a per-request basis. Updated the product tour to include comprehensive guidance for two major features: AI image analysis and model comparison. Users can now learn how to upload or paste images directly for AI analysis, with detailed instructions for combining text and image queries. The tour also demonstrates the new side-by-side model comparison feature, allowing users to compare responses from different AI models for the same prompt. Both features include step-by-step instructions and specific use cases. Users can now paste images directly from clipboard into chat conversations using Ctrl+V/Cmd+V, supporting PNG, SVG, and JPEG formats. The image upload interface has been enhanced with better error handling, visual feedback for invalid uploads, and improved UI elements including a border around image previews and better z-indexing for delete buttons. Images can be added both through file upload and clipboard paste, with toast notifications for error states. Added support for image inputs in prompts and introduced the Pixtral-12B vision model from MistralAI with 128K context window. Updated vision capabilities for existing models including Claude-3 series, GPT-4 Turbo, and Gemini 1.5 models. The Pixtral-12B model supports streaming, JSON output, and function calling alongside its vision capabilities, with pricing at \$0.0000015 per token. Fixed an issue where web searches could fail silently when search results were missing required text content. The system now properly logs and skips invalid results while continuing the search, preventing searches from breaking when incomplete data is returned. Free plan users can now add one datasource to their organization. Previously, Free plan users could not add any datasources. The playground prompt request endpoint has also been removed from datasource limit enforcement, making it more accessible. Error messages have been updated to be more informative, now specifically suggesting an upgrade to Pro plan when limits are exceeded. Fixed a layout issue where notification banners (Pro Plan, Subscription Expired, Version Check) caused incorrect content height calculations. The fix ensures proper content scaling and prevents overflow issues by adding min-height constraints to the main layout containers. Fixed inconsistent token limit handling for OpenAI models by properly setting max\_tokens parameter for non-O1 models and max\_completion\_tokens for O1 models (o1-mini and o1-preview). Also resolved a label reference formatting issue that affected how application labels were processed and returned in the API response. Fixed validation and handling of file metadata in custom data sources, now properly supporting both URLs and file paths. Updated parameter validation to make file size optional and improved handling of external URLs. Also fixed model validation of custom data references to properly parse database rows. Added function-calling support for 18 major models including Claude 3 series (Opus, Sonnet, Haiku), GPT-4 Turbo variants, Mistral Large, Llama 3.1-70B, and Cohere Command-R models. Models can now automatically detect and execute tool-calling operations based on conversation context. Migration includes updates to conversation handling and API response formats to support the new capabilities. Added support for viewing and previewing AI-generated artifacts (images) in chat conversations. Users can now click on generated image thumbnails to open them in a fullscreen modal view with a dark overlay background. The preview includes a close button and supports clicking outside to dismiss. Added support for OpenAI's O1 series models: o1-preview for complex reasoning tasks and o1-mini optimized for coding and math. Both models feature 128K context windows, support for function calling and JSON output, with October 2023 knowledge cutoff. The o1-preview model costs $0.060/1K completion tokens and $0.015/1K prompt tokens, while o1-mini offers more cost-efficient rates at $0.003/1K completion tokens and $0.012/1K prompt tokens. The AI Chatbot widget now supports extensive visual customization including custom titles, colors, fonts, and branding elements. Users can configure the primary color, secondary color, font family, bot avatar, placeholder text, send button appearance, and footer text. Added support for Google Fonts integration and the ability to customize or remove the 'Powered by' attribution. Added distinct orange highlighting for assistant mentions in the chat editor interface. Assistant references now appear with an orange background (bg-orange-100) and orange border (border-orange-400), providing better visual differentiation from other mention types like data (purple), models (green), and other references (blue). Improved user experience on the assistants page by adding loading states with animated placeholders while content loads, and clear error messages when data fails to load. Popular and new assistants sections now show loading animations and handle error states gracefully with expandable error details. Assistants are no longer displayed in the mention panel when typing '@' in Trypulze. While the assistant mention candidate type remains in the codebase, they are now filtered out from appearing as suggestions to users. This temporarily simplifies the mention experience by focusing on other reference types like labels, messages, and search options. Improved the appearance of the version update notification banner by removing an unnecessary space character and adjusting text spacing. The banner now shows a cleaner 'A new version is available' message with better-aligned underlined text for the update action. Improved the display of collapsed text previews by removing the automatically appended ellipsis (...) from truncated content. This change provides a cleaner interface since truncated text often already includes natural breaks or ellipsis when needed. Streamlined the organization settings update process by focusing on essential display information (organization display name and logo) rather than billing and address details. Removes validation of billing email domains and billing customer updates, making the settings page more focused and easier to use. Added a new notification banner that automatically checks for frontend updates every 10 seconds. When a new version is available, users will see a blue banner with a clickable 'refresh' link to update their application to the latest version. This helps ensure users are always running the most recent version without manually checking for updates. Improved the visual appearance of tables in markdown responses by adding consistent borders, alternating row colors, and proper padding. Tables now feature light sand-colored backgrounds for even rows and headers, with uniform cell padding and border styling for better readability. Improved how long assistant descriptions are displayed by adding a collapsible text component that shows a 'More/Less' toggle when text exceeds 350 characters. Added a new assistant details dialog that automatically displays when descriptions are longer than 4 lines. Users can now easily read long assistant descriptions without cluttering the interface, with proper line break handling and formatted text display. Added ServiceNow as a new integration option, including the ServiceNow logo asset and integration capabilities. This expands the available integrations alongside existing options like Salesforce, OneDrive, and Zendesk. Improved handling of generated artifacts (like DALL-E images) by implementing a new dynamic URL signing system. Images are now stored in a dedicated artifacts bucket with organization and app-specific paths, and URLs are generated on-demand using a new 'pulze://' protocol. This change provides more secure and organized access to generated content while maintaining long-term accessibility. Enhanced the thread assistant preview layout with a centered, width-constrained design (max-width: 3xl). Fixed button interaction behavior to properly handle blur effects when clicked. Also improved mention candidate type displays for image generation, file search, and web search references. Added ability for users to rate responses using thumbs up and thumbs down buttons. This feature introduces new SVG icons (thumbs-up.svg and thumbs-down.svg) and enhances the Button component with automatic blur functionality after clicking. The rating system allows users to provide direct feedback on response quality. Enhanced user tracking by maintaining UTM parameters (like utm\_source, utm\_medium, utm\_campaign) across the authentication flow. When users log in, their marketing attribution data is now preserved and automatically restored after successful authentication, ensuring accurate campaign tracking and analytics. Fixed an issue where the chat interface wouldn't properly reset and re-render when switching between different applications (appIds). The fix ensures that the chat context and conversation state are properly reset by forcing a full re-render of the ChatProvider component when the appId changes. Added a minimum height of 20 units to the editable chat input box while maintaining the maximum height of 40% viewport height. This improves usability by preventing the input area from collapsing too small when empty. Fixed several issues with file handling in custom data management: improved URL-safe filename encoding for uploads, fixed file refresh logic for external URLs, and corrected RAG webhook processing for malformed objects. Now properly handles filenames with special characters and more reliably processes file refreshes for both local and external data sources. Added support for GitHub Flavored Markdown through the remark-gfm v4.0.0 package. Users can now use advanced markdown features like tables, task lists, strikethrough text, autolinks, and footnotes when writing or viewing markdown content in the application. Improved handling of files uploaded through RAG by adding better validation and filtering of malformed objects. Added support for handling missing file statistics gracefully and introduced new database indices for better performance. System now automatically removes invalid or malformed files and provides clearer error messages when files cannot be refreshed. Introduced RAG integration for enhanced document and data management capabilities. The update adds support for external file storage, URL management, and automatic sync tracking through new database fields like object\_id and data\_source\_type. Users can now store external URLs and track document modifications with automatic timestamp updates. Enhanced the image generation experience by adding a dedicated loading state with spinner animation when DALL·E 3 is generating images. Users can now see a visual loading indicator in a square container, and after generation, they have the option to regenerate images using a new 'Generate new image' button. Analytics tracking for image generation attempts has also been added. Added a new status indicator that shows 'Syncing Failed' in red text when a file encounters synchronization issues. This provides clearer feedback when file synchronization encounters problems, distinguishing sync failures from general processing failures. Enhanced the user menu with direct access to important resources, including Privacy Policy, Terms of Service, and Documentation links. Users can now quickly access these pages directly from the dropdown menu, with each link opening in a new tab. Added support for generating images using DALL-E 3 directly within chat completions. Users can trigger image generation using the 'image-generation-dalle' command, which creates 1024x1024 images with standard quality. Generated images are automatically stored in S3 with 7-day signed URLs for access. Simplified the display text for image generation mentions in the editor, changing from 'image-generation-dalle' to just 'image'. This makes the interface cleaner and more user-friendly when referencing DALL-E generated images in the editor. Fixed an issue where conversations would completely stop when encountering an error in a parent message. Now, the conversation can continue by automatically falling back to the previous valid message in the chat history, allowing users to keep chatting even if one message fails. Added a new Integrations panel in the chat interface, accessible via a dedicated button with DataFlow icon. This feature is available exclusively for Pro plan subscribers and can be toggled from the sidebar. The panel maintains consistent dimensions (400px width) with other sidebar components like Members and Models panels. Fixed an issue where selecting the dynamic/smart model option wasn't properly handled in the assistant creation and editing interfaces. Now correctly sets the model\_id to null (instead of undefined) when users choose the dynamic model option, ensuring proper backend compatibility and consistent model selection behavior. Fixed tooltip behavior in the chat interface's Members panel. Free tier users now see a 'Unlock in Pro plan' tooltip when hovering over the disabled Members button, and users can't remove themselves with a clearer tooltip explanation. Added a new notification banner that encourages free tier users to upgrade to the Pro plan. The banner appears at the top of both mobile and desktop interfaces, displaying a clickable message that directs users to the billing page. Users can dismiss the banner by clicking the close icon, and it remains hidden until the next session. Added a new Members panel to the chat interface, accessible via a dedicated button in the side navigation. Users can now toggle the Members panel visibility alongside existing Data and Models panels. The panel displays team member information and is designed with a consistent 400px width layout matching other side panels. The Create Assistant dialog now supports cloning existing assistants by pre-populating all fields including name, avatar, description, visibility settings, persona, instructions, greeting, writing style, and model settings. This allows users to easily create new assistants based on existing ones while retaining all configuration options with the ability to modify them before creation. Added a new SpaceSelection component that allows users to search and select spaces with an improved dropdown interface. The component includes space names, creation dates, creator avatars, infinite scrolling for large lists, and supports filtering for accessible spaces. Also improved the RangeSlider component with better disabled state styling and cursor handling. Introduced a new 'make purge' command that completely removes all Docker containers, images, volumes, and orphans for a full environment reset. The existing 'make clean' command has been optimized to only remove local images (instead of all images), making it significantly faster for routine cleanup while preserving cached images from external registries. Enhanced the assistant management system to automatically delete assistants when their last version is removed. When deleting assistant versions, the system now checks if any versions remain, and if not, automatically removes the parent assistant and associated favorites. This prevents orphaned assistant records and provides cleaner workspace management. When viewing an assistant from a different space, users can now click on the organization/slug identifier to navigate directly to the assistant's owning space. This navigation link appears with a purple underline and external link icon when the assistant belongs to the current organization but is being viewed from a different space. Added a new guided tour system that introduces users to the @ reference functionality and improved the mention panel UI. The mention panel now appears as a portal with better positioning and increased z-index (110) for improved visibility. The maximum width has been set to 650px for better readability, and the panel includes enhanced tooltips explaining space organization and data security features. Added official integrations with Make.com and Zapier, enabling users to create automated workflows with Pulze.ai. Users can now generate API keys specifically for these platforms and connect to hundreds of other apps and services. The integration includes direct invitation links to both Make.com and Zapier platforms for seamless setup. Increased the default maximum token limit from 2048 to 4096 tokens across the platform, including assistant versions and model settings. This change allows for longer conversations and completions by default without requiring manual configuration. The update affects both new assistants and the default behavior when max\_tokens is not explicitly specified. ## August 2024 Updated Cohere Command-R Plus to the newer 08-2024 version with adjusted pricing (2.5e-6 per prompt token, 10e-6 per completion token). Command-R model was also updated to version 08-2024 with new pricing (0.15e-6 per prompt token, 0.6e-6 per completion token) and had its deprecation status removed. Standardized the width of input fields across all dialog boxes, including organization creation, space creation/renaming, chat renaming, label editing, and member invitation forms. Replaced fixed minimum widths (300px) with responsive full-width inputs that automatically adjust to their container size, providing a more consistent and adaptable user interface. Improved readability of assistant descriptions in the mention suggestions dropdown by limiting them to a maximum of 3 lines of text plus the name line. This prevents long descriptions from taking up too much space while still providing relevant information. Improved the assistant list UI to better highlight default and favorite assistants. Added 'current\_app\_default' flag to clearly indicate which assistant is set as default for the current app, and enhanced sorting to prioritize default assistants followed by favorites. Also improved shared assistant visibility with better annotation handling. Improved the Assistants interface by adding organization ownership context and refined sorting capabilities. Assistants now display their owner organization's name and ID, with clearer indicators for whether an assistant belongs to the current organization or app. The sorting algorithm has been enhanced to better handle popular assistants based on favorites and modified dates. Added the experimental Gemini 1.5 Flash model with 1M token context window, supporting streaming but not JSON, functions, or penalties. Updated Gemini 1.5 Pro Experimental (formerly Pro-exp-0801) and adjusted token pricing for Gemini 1.5 Flash models to 0.000075¢ per prompt token and 0.0003¢ per completion token. Assistants now display additional metadata including favorite status, app ownership, and organization ownership. List views have been updated to better organize assistants by showing favorites first, followed by app-owned assistants, then organization-owned assistants, and finally public assistants. Also improved assistant reference handling for access tokens with proper auth0\_id filtering. Improved the descriptive text in the Assistants page with more engaging and clear section subtitles. The 'Popular' section now reads 'Our community's favorite assistants. ❤️' and the 'New and updated' section displays 'Freshly baked assistants, hot off the press.' Relocated the model whitelist management to a dedicated side panel, making it easier to select and customize which AI models are available for your conversations. Added a new cube icon for model management and improved the model display component to show clearer provider logos and names. This change includes a new onboarding tour step to help users discover and understand the model whitelist feature. Added new view parameter to sort shared assistants by either 'newest' (default) or 'popular'. Popular view sorts assistants by the number of favorites, while newest view sorts by the most recently modified published version. Both sorting options maintain alphabetical ordering by name as a secondary sort. Enhanced the ordering of assistants in list views to show a more logical and user-friendly sequence. Assistants are now ordered by priority: favorites appear first, followed by assistants owned by the current app, then assistants shared within the organization, and finally publicly shared assistants. Within each priority level, assistants are sorted alphabetically by name. Introduced a new Assistants feature that allows creating and managing AI assistants with versioning support. Users can now configure assistants with custom personas, instructions, greetings, writing styles, and sample interactions. Each assistant can be published with specific model settings (temperature, max tokens) and visibility controls. Added support for favoriting assistants and storing conversation-specific settings. Added AI21 Labs' Jamba 1.5 Large model, featuring a 256K context window and support for function calling, JSON output, and streaming. This state-of-the-art hybrid SSM-Transformer model offers up to 2.5X faster inference than comparable models and is optimized for business use cases. The older Jamba Instruct model has been deprecated. Added error handling to display a toast notification when creating a new space fails. Users will now see a visible error message 'Creating space failed!' instead of a silent failure, making it clearer when there's an issue during space creation. Search functionality has been expanded to look for matches in both file names and label names when filtering custom data. Users can now find their data by searching for either the file name or any associated label names, making data discovery more flexible and comprehensive. Simplified the organization subscription validation process by removing the legacy billing system and trial balance enforcement. Users will now receive clearer subscription status messages, and the billing information endpoint has been streamlined to support the new subscription model. This change affects how subscription status is checked and displayed but does not impact actual subscription features or limits. Introduced a new access token management system for API authentication, replacing the legacy API key pattern. Users can now create and manage multiple access tokens per app with granular permissions through new endpoints '/access-tokens'. The system includes automatic token generation during app creation and the ability to list all tokens for an app, with improved security through permission-based access controls. Enhanced conversation organization by sorting threads by most recently modified within their respective time groups (Today, Yesterday, Last Week, This Year, and Older). This improvement ensures the most recently active conversations appear first in each section, making it easier to find recent discussions. Added ability to dismiss the 'Start Tour' prompt box that appears for new users in their first workspace. Users can now either start the guided tour or close the prompt using a new dismiss button in the top-right corner. The tour state is properly reset when dismissed, preventing the prompt from reappearing. Added an interactive product tour feature using react-joyride v2.8.2 to help new users learn the platform's interface and features. This introduces step-by-step guidance and tooltips that can walk users through key functionality of the application. Chat conversations in the sidebar are now sorted by when they were last modified instead of creation date. This change ensures your most recently active conversations appear at the top, making it easier to find and resume recent chats. Added comprehensive event tracking across the application using Google Analytics and Twitter Pixel. Users' interactions are now tracked for actions including copying code blocks, switching organizations, creating new organizations, sending chat messages, using reference systems (@), and upgrading plans. Also added Twitter conversion tracking to better measure payment confirmations and sign-ups. Implemented subscription-based access control across key API endpoints including app creation, chat completions, playground access, and organization management. Users now have daily request limits based on their subscription tier (Free, Pro, etc.) with accurate usage tracking. Organizations are limited to one free tier per billing email, and request counts are tracked per organization on a daily basis. Fixed a bug where empty system prompts were being passed to language models unnecessarily. The system now only injects the system prompt from space settings when instructions are actually present and not empty. This change also removes the automatic inheritance of max\_tokens and temperature from app settings in playground requests. Introduced a new billing system for Spaces that supports tiered subscriptions with configurable seats (1-100 per organization). Organizations can now manage their Spaces subscription through a new billing portal, including features like automatic tax handling, subscription upgrades/downgrades, and customizable seat quantities. The system integrates with Stripe for payment processing and includes automatic subscription status tracking. Added support for CSV file ingestion in the RAG system. CSV files are now automatically converted to HTML table format during processing, with each row rendered as a complete table containing both headers and values. This enables better structured data representation and improves retrieval accuracy for tabular information. Improved the web search and URL ingestion functionality to automatically follow HTTP redirects when fetching content. This ensures that URLs which have moved or redirect to other locations will be properly scraped instead of failing, making the RAG system more robust when processing web content. Completions now inherit default values from space settings when not explicitly provided in the request. This includes automatically applying the space's max tokens, temperature, and system instructions. Previously, these settings had to be manually specified in each request. Added GPT-4o-2024-08-06, a new OpenAI model variant with 128,000 token context window that supports structured outputs, streaming, function calling, and JSON mode. The model features improved token pricing at $0.015/1K tokens for completion and $0.005/1K tokens for prompts. This model version specializes in larger output token counts compared to previous versions. Organization administrators can now manage members in spaces regardless of their space-specific permissions. This includes editing permissions and removing members, even if they are listed as the current user in that space. A new blue info banner indicates when an org admin is managing a space with elevated permissions. Added Google's experimental Gemini 1.5 Pro (gemini-1.5-pro-exp-0801) model with a 2M token context window. The model supports streaming and chat capabilities, with token costs of $0.00350/1K prompt tokens and $0.0105/1K completion tokens. Also updated the context window size to 2M tokens for existing gemini-1.5-pro and gemini-1.5-pro-001 models. Added new API endpoints and permissions to support the TryPulze.com Space Widget integration. This includes new endpoints for listing app models, retrieving conversations, and managing labels with access token support. Enhanced security with granular permissions for model listing, conversation access, and label management through the Space Widget. ## July 2024 Introduced a new access token system allowing programmatic access to apps with fine-grained permissions. Applications can now create and manage access tokens with specific permissions like data retrieval, conversation creation, and custom data management. This includes a new endpoint for retrieving token permissions and enhanced security through hashed token storage. Chat completion requests now automatically inject system instructions from app settings when no system message is provided. This allows organizations to set default system prompts at the app level that will be prepended to all chat conversations that don't explicitly include a system message, ensuring consistent behavior across interactions. Changed permission level required for deleting conversations from Viewer to Editor access. Users now need Editor-level permissions to delete conversations within an app, providing better access control and data security. Enhanced the app settings API to include current user permissions when retrieving app details. Users can now see their specific access levels (admin or custom permissions) within each app space. This change improves permission transparency and helps users understand their access rights within the platform. Fixed a permissions issue where organization admins were being required to have full admin permissions instead of just app admin permissions to access applications. Organization users with admin:app permissions can now properly access all apps within their organization without requiring broader admin privileges. Added a user-friendly error state with visual feedback when users attempt to access a Space they don't have permission for or that doesn't exist. The new UI shows a frown face icon and clear error message explaining the access issue, replacing the previous basic loading state. Also added loading skeletons for conversations and better error handling throughout the Space access flow. Added a helpful explanation and step-by-step instructions when attempting to add members to a Space with no available users. The new interface explains that users must first be invited to the organization before they can be added to a Space, and provides clear instructions for both inviting organization members and adding them to Spaces. Also improved the spacing of the permissions selection badge. The Fireworks AI Qwen2-72B Instruct model (fireworks/qwen2-72b-instruct) will be deprecated on August 12, 2024. Users should plan to transition to alternative models before this date. Added clearer error messages when a requested model's context window is too small for the input. Instead of a generic 'not allowed' message, users now receive a specific message indicating that their request is too large for the model's context window, helping them better understand and resolve the issue. Organizations now get better default names when created: empty organization names are automatically set to 'org-{id}', and empty display names default to 'My Organization'. This improves the initial setup experience and ensures organizations always have meaningful identifiers. The changes apply to both new organizations and retroactively updates existing organizations with empty names. Streamlined the organization creation workflow by removing automatic Hubspot synchronization that previously ran in the background. Organizations are now created faster with the same core functionality including Stripe customer creation and trial balance setup. The change maintains existing features like currency matching and free trial balance allocation. When users create a new personal organization, the system now automatically creates their first workspace named 'My First Space'. This improvement streamlines the onboarding experience by eliminating the need for users to manually create their first workspace after organization setup. Enhanced the chat interface by adding a new default user avatar icon that appears when messages don't have an associated Auth0 ID. The placeholder now shows a person icon in a white circular border instead of the previous empty sand-colored background, improving visual consistency and user recognition in chat conversations. Improved chat navigation by automatically synchronizing the URL with the conversation ID when sending the first message in a chat. When users start a new conversation, the browser URL will now update to reflect the correct conversation ID, making it easier to share or bookmark specific chat sessions. Increased the token estimation safety margin from 1.5% to 15% when checking if prompts fit within a model's context window. This helps prevent failed requests by being more conservative in estimating token counts, especially for longer prompts and complex content. Changed the default workspace spaces view to show all available spaces instead of only showing accessible spaces. This fixes an issue where the first space wouldn't be created properly due to filtering restrictions. Users will now see all workspace spaces by default when accessing the spaces page. Fixed an issue with app visibility filters where users couldn't properly see accessible apps in shared organizations. Organization admins now correctly see all apps, while other users only see apps they have explicit permissions for. Also improved permission inheritance logic where org-level admin permissions now properly cascade to all apps within the organization. Added the ability to link directly to specific conversation threads through parent\_request\_id. The update includes database changes to track conversation relationships and auth0\_id in requests, enabling better thread navigation and history tracking. Conversations now also automatically include system instructions from app settings when starting a new thread. Added visual indicators and filtering options to show which spaces are accessible to users. Spaces without access permissions are now greyed out and non-clickable, while accessible spaces remain interactive. Users can filter spaces using a new 'Accessible' toggle button alongside the existing 'Mine' filter. This improves visibility of space permissions and helps users quickly find spaces they can access. Added 5 new Fireworks AI models: Llama 3 70B (8K context), Mixtral 8x22B (65K context), Qwen2 72B (32K context), and Llama 3.1 70B/405B (128K context). All new models support streaming, JSON mode, and function calling. Several legacy models have been deprecated: Mixtral 8x7B, Command-R, Claude 3 Sonnet, Gemini Pro, GPT-3.5 Turbo, and Mistral Medium/Small. Added four new Mistral AI models: Mistral Large 2.0 (128K context), Mistral Large Latest (128K context), Mixtral 8x7B Instruct (32K context), and Mixtral 8x22B Instruct (65K context). All models support streaming and JSON output, with Mistral Large and 8x22B also supporting function calling. Pricing varies from 0.7µ¢ to 9µ¢ per token, with separate prompt and completion costs. Fixed an issue where filenames were being truncated in the message sources dialog header. Long filenames will now display in full, improving readability when viewing source documents and files. Changed the default model for local development from llama-3-70b-instruct to gpt-4o-mini, with Claude-3-haiku as a secondary option. Added latency metrics for both models with gpt-4o-mini averaging 1.1ms per token (p50) and Claude-3-haiku at 0.1ms per token (p50). This change optimizes local development performance and model selection. Added Meta's latest Llama 3.1 models via Together and OctoAI providers: llama-3.1-70b-instruct and llama-3.1-405b-instruct. Both models feature a 128K token context window, support for 8 languages (English, French, German, Hindi, Italian, Portuguese, Spanish, Thai), and function calling capabilities. The models include improvements in understanding and instruction following compared to Llama 3.0, with the 405B variant being positioned as the most powerful open-source LLM for complex use cases. GPT-4o-mini is now promoted as the default active model with a score of 1254, replacing GPT-3.5 Turbo and Cohere Command-R as default options. All existing model settings using GPT-3.5 Turbo have been automatically migrated to GPT-4o-mini, and Cohere Command-R settings have been removed. Added support for GPT-4o mini (OpenAI's most advanced small model) with 128K context window. This new multimodal model accepts both text and image inputs, offering higher intelligence than GPT-3.5-turbo at similar speed. Features include function calling, streaming, JSON mode, and vision capabilities, with extremely cost-effective pricing at $0.00015/1K prompt tokens and $0.0006/1K completion tokens. Updated the router's model scoring system to use the latest pulze-wildbench benchmark version dated July 18, 2024 (previously July 10, 2024). This update applies to both development and production environments and may affect how the router selects and ranks AI models for requests based on updated performance benchmarks. Introduced a new file labeling system allowing users to create, manage, and assign colored labels to custom data files. Labels can be created with names (up to 32 characters), descriptions, and custom colors (hex format). Users can bulk update file labels and filter/search files by labels through the new API endpoints. Added Google's Gemma-2-27B-IT model with 8,192 token context window, available through Together AI. This lightweight model supports streaming, penalties, and multiple outputs (n>1). Also updated Qwen2-72B-Chat to be correctly named as Qwen2-72B-Instruct. Token costs are set at \$0.0000008 per token for both prompt and completion. Fine-grained Role-Based Access Control (RBAC) for applications is now enabled by default for all environments. This enhances security by requiring explicit viewer, editor, or admin permissions at both the organization and app levels. Users must have at least viewer:app permissions at the organization level to perform any app-related actions. Improved the model failover chain system with smarter handling of model activation/deactivation. When disabling models, the failover chain is now automatically reset only when necessary, maintaining chain configurations when possible. The system also better handles model candidate selection for apps, with more efficient model whitelisting and validation checks. Improved the @-references panel with clearer loading states, error handling, and helpful placeholder text. Users now see a 'Start typing to show more relevant suggestions' prompt in the search box, and get clear feedback when no results are found. The panel also maintains previous search results while loading new ones for a smoother experience. Fixed an issue where empty messages could be sent by pressing Enter when the editor contained only whitespace. The editor now properly checks for empty content by examining all nodes and their text content, preventing submission of blank messages. Added Cohere's Command-R model to the available model roster and set it as default active. Based on the model scores fixture, Command-R is positioned with a 0.8 routing score alongside Claude 3 Haiku (0.9) and Llama 3 70B Instruct (1.0), making it available for automatic model routing. Improved the handling of RAG (Retrieval Augmented Generation) queries by collecting all search results before processing and introducing a new sorting mechanism. The system now sorts search results by relevance score and includes a new RAG query rewrite plugin that formats retrieved content with proper citations and timestamps. This change enables more comprehensive document retrieval and better organized responses when querying multiple documents. Improved the model comparison interface to only show responses from the same prompt when switching between models. The ModelSwitcher now filters responses based on matching prompts, and comparison views are restricted to responses generated from identical prompts. Added a readOnly state for pinned messages to prevent model switching. Added loading indicators and improved search functionality in the Spaces menu. Users now see clear loading states while content is being fetched, a dedicated 'No Results' message when searches return empty, and smoother menu transitions with fade effects. The search input now updates results in real-time and maintains proper pagination state when clearing searches. Fixed an issue where model comparisons weren't correctly selecting the next response for comparison. The logic now compares responses based on their unique IDs instead of model names, ensuring more accurate response comparisons across different chat messages. Added 2 new models: AI21's Jamba Instruct (256K context) and OctoAI's WizardLM-2-8x22B (65K context). Deprecated numerous legacy models across providers including AI21 Labs (J2 series), Anthropic (Claude 2.x, Claude Instant), OpenAI (older GPT-4 and GPT-3.5 versions), Google (Gemini 1.0, Bison), and Together AI's previous model versions. New default active models include Claude 3 Sonnet/Haiku, Gemini 1.5 Pro/Flash, and various LLaMA-3 70B implementations. Fixed retrieval-augmented generation (RAG) by sanitizing search queries to remove special tokens and improving response accuracy. Also streamlined system prompts by removing redundant language detection instruction and simplifying the template structure. These changes improve the reliability and accuracy of responses when using both file and web search capabilities. Users can now delete conversation threads from their applications. This implements a soft-delete functionality where threads are marked as deleted but preserved in the database. Deleted threads are automatically filtered out from conversation queries and can't be accessed after deletion. Fixed an issue where file search and web search plugin settings weren't being preserved when re-running previous prompts. The system now correctly maintains the original plugin configuration (file search and web search) when users click the re-run button on existing messages. Improved error message display with a new collapsible error component that shows detailed error information. Users can now click to expand/collapse error details, and the system provides more specific error messages instead of generic failures. Also improved handling of stream response errors and authentication token expiration. Improved chat reliability by adding automatic retry functionality when fetching log entries. The system now makes up to 5 retry attempts with 500ms delays between attempts if the initial fetch fails, reducing interruptions in chat conversations when experiencing temporary network issues. Search inputs now support an optional autoFocus property to control automatic focusing behavior. By default, search inputs will automatically focus and scroll into view, but this can now be disabled by setting autoFocus=false. This improves user experience in scenarios like the Spaces Menu where automatic focus is undesirable. Additionally, the mention panel now supports Tab key for selection and displays loading/empty states more clearly. Modified chat interface to consistently display prompt controls even when there are response errors. Previously, prompt controls were hidden when an error occurred, limiting user interaction options. This change improves the user experience by maintaining access to prompt actions regardless of the response status. Fixed an issue where file search and web search plugin settings weren't preserved when comparing responses between different models. The system now correctly maintains the original message's plugin configuration (file search and web search settings) when generating comparison responses. Users can now enable web search and file search plugins individually for each message via the @ menu. Each plugin appears as a removable chip above the message editor, allowing users to toggle them on/off per conversation turn. The plugins' state is preserved and initialized based on the message context. Improved the mention panel interface by integrating file search and web search capabilities directly into the filtering system. Previously, these options were handled separately through showAllFiles and showWebSearch flags. The update streamlines the search experience by consolidating all search types (data, file, and web) into a unified filtering interface with better keyboard navigation and selection handling. Improved prompt template instructions for language matching by making the language detection requirement more explicit. The system now specifically instructs to identify the primary language of the query before responding, ensuring more consistent same-language responses across all interactions. Improved RAG (Retrieval Augmented Generation) functionality to ensure responses are always provided in the same language as the user's query. This enhancement enables more natural multilingual interactions by enforcing language consistency between questions and answers in RAG-based conversations. Updated citation format to appear after sentences rather than immediately after words, making responses more readable. Added virtual whiteboard instruction to help models structure their thoughts, and simplified source template formatting. Citation examples now consistently show citations after complete sentences with proper spacing. Improved the citation system with better formatting and clearer instructions for AI responses. Citations are now displayed with proper markdown formatting, multi-line text is properly quoted, and citation numbers are ordered more logically. The prompt instructions have been refined to ensure more consistent and accurate source citations with explicit examples and stricter formatting rules. Updated the visual feedback when hovering over and selecting mention candidates in the editor interface. Changed the background color from the previous row-hover style to a sand-200 color variant, providing more consistent visual feedback across the application. Enhanced the mention panel with keyboard arrow key navigation and visual selection feedback. Users can now use up/down arrow keys to navigate through mention suggestions, with auto-scrolling to keep the selected item in view. Selected items are highlighted with a hover state background, and Enter key confirms the selection. Augmented RAG (Retrieval-Augmented Generation) responses with UTC timestamps to provide temporal context for each query. This helps users understand when information was retrieved and processed, especially important for time-sensitive queries or when referencing dynamic content. Simplified the syntax for file and URL references in prompts by removing the '@' prefix from pattern matching. File and URL references now use the format '``' or '``' instead of '@``' or '@``'. Improved the model reference list to show only actively configured models for each application, removing inactive or unconfigured models from the results. This change simplifies the model selection experience by displaying only relevant, non-deprecated models that are specifically configured for your application. Added support for @-reference plugins and improved RAG (Retrieval Augmented Generation) functionality through a new plugin system. Users can now list references for apps via a new /references endpoint that returns both custom data and model references. The RAG system has been restructured to use separate file search and web search services, providing more flexible document retrieval capabilities. ## June 2024 Enhanced the playground API to automatically handle streaming responses based on model capabilities. The system now always attempts to use streaming mode first, and gracefully falls back to non-streaming when a model doesn't support it, providing a more consistent experience across different providers. This eliminates unnecessary streaming-related errors and removes provider-specific streaming restrictions. Enhanced chat message readability by making date separator headers sticky at the top while scrolling through messages. Date markers showing 'MM/DD/YYYY' now remain visible at the top of the viewport, making it easier to maintain context when viewing long chat histories. Modified trial settings to provide $50 USD in free credits (up from $20) for new organizations. Production trial period extended to 90 days (up from 21), while test environment trials reduced to 3 days. Trial credits are now only automatically added in local development environments. Enhanced the document retrieval system by increasing the initial search results from 5x the requested amount to 200 documents before reranking, improving the quality of final results. Updated HTML and PDF parsing to preserve structured content (tables, lists) by extracting both plain text and HTML metadata, ensuring richer context for search queries. Also added a proper user agent identifier (PulzeBot/1.0) for web crawling operations. Users can now see email addresses for all app members and member candidates in the app management interface. The members and candidates lists are now automatically sorted alphabetically by name for better organization. This update improves user identification and list navigation when managing app permissions. Improved the app member management interface by adding detailed user information (name, profile picture) to member listings and a new endpoint to view potential members who can be added to an app. Also added member preview functionality to show up to 5 member profile pictures in app listings and added permissions visibility for the current user in app views. Enhanced markdown code block rendering to better handle syntax highlighting for different programming languages. Now explicitly supports Python, JavaScript, Bash, JSON, and plain text, with improved fallback to plain text for unsupported languages. Added overflow scrolling for long code blocks and removed keyboard-style formatting for non-language code segments. Added support for Anthropic's Claude 3.5 Sonnet model (claude-3-5-sonnet-20240620) with a 200,000 token context window. The model supports streaming responses and has token costs of $0.000003 per prompt token and $0.000015 per completion token. Two variants are available: the base model name 'claude-3-5-sonnet' and the specific version 'claude-3-5-sonnet-20240620'. Added support for synchronizing assistant settings with the backend, including customizable instructions, maximum token limits (up to 2000 tokens), and temperature controls (0-1 range, default 0.7). These settings can now be updated and persisted alongside existing weights and policies configurations. Updated the RAG (Retrieval-Augmented Generation) system prompting to enforce a more journalistic tone and stricter source adherence. The system now explicitly requires responses to be derived solely from provided sources, with clearer instructions on citation formatting and a stronger emphasis on avoiding speculation or external knowledge. Changed the 'Start a new thread' button in the empty conversations view from a standard button to an underlined, purple text link. This UI update provides a more subtle and modern look while maintaining the same functionality to create new chat threads. Added responsive mobile design improvements including a new SidePanel component with collapsible header, reorganized navigation with SpacesMenuItem, and a dedicated UserMenu component. Introduced a new Topbar component for mobile view with compact organization switcher and user controls. The layout now adapts between desktop sidebar and mobile topbar views for better usability on smaller screens. Corrected the spacing in the notification toast that appears when selecting a space, removing an extra space before the exclamation mark for better text aesthetics. Improved RAG document retrieval performance by increasing the maximum concurrent connections for fetching chunks from S3 storage from the default (10) to 60. This enhancement allows the system to download multiple document chunks in parallel more efficiently, reducing latency when retrieving search results. The concurrency level is now configurable via the PULZE\_RAG\_DATA\_DOWNLOAD\_MAX\_CONCURRENCY environment variable. Added search functionality and cursor-based pagination for both apps and conversations lists. Users can now search apps by name and filter to show only owned apps. Conversations are now automatically sorted by last modified time, with a new modified\_on column tracking the most recent message in each conversation. Added visual feedback for failed data file processing, showing a red 'Processing Failed' message. Files that are not in 'INDEXED' state can no longer be toggled active/inactive, preventing interaction with files that are still processing or have failed. The file activation switch is now only enabled for successfully indexed files. Users can now customize the name of their spaces when creating them through a new dialog interface. Previously, spaces were created with default names; now users can enter a custom name before creation. The system automatically creates a space called 'My First Space' for new users' first workspace. Added a new UI element that provides users with a logout option when experiencing persistent authentication issues. A clickable 'here' link now appears at the bottom-right of the loading screen, allowing users to manually logout if authentication gets stuck. Also improved error handling for invite code parsing during the organization joining process. Enhanced the organization invitation flow by adding success messages for both accepting and declining invites. Now displays 'Organization joined!' when accepting an invite and 'Organization invite declined!' when declining, followed by an automatic redirect to the dashboard page. Fixed two issues: 1) Model switching now correctly displays all available models and prevents duplicate entries in the model selection dropdown. 2) Added automatic page reload when authentication tokens expire, preventing session-related errors. The model switcher's ranking system now only excludes the top 3 ranked models from the full list, making more models accessible. Improved the organization switcher UI by adding placeholder images with organization initials when no logo is available. The ImagePreview component now supports dynamic placeholders using organization names, displays logos in a consistent size (16x16), and maintains aspect ratio with proper background containment. Organization logos in the switcher now have a unified styling with slate-colored rings and proper scaling. Added standardized response behavior when source documents are not relevant to the user's query. The system will now respond with 'Sorry, I do not have enough information to respond to this query' instead of attempting to generate a potentially inaccurate response from irrelevant sources. Fixed a bug where custom data would be lost when merging sandbox app changes back to the parent app. The code previously attempted to replace custom data during sandbox merging, which could lead to data loss. Now maintains existing custom data integrity during sandbox operations. Enhanced the chat interface with a centered layout (max-width 3xl) and improved message styling. Messages now have a cleaner look with a white background, rounded borders, and better spacing. User prompts and AI responses are more visually distinct with consistent padding and improved avatar alignment. Redesigned the chat interface with a full-width layout and improved message presentation. Messages now span the full width with cleaner spacing, refined message bubbles, and improved visual hierarchy. Additional UI improvements include updated placeholder text from 'conversation' to 'thread', truncated conversation titles in the header, and a more compact header design. Enhanced the Spaces sidebar navigation with a new hover-activated popover menu that displays all available spaces. Users can now quickly view and switch between spaces without leaving their current view, with each space entry showing its name and last modified time. The menu includes hover states and smooth animations for better user experience. Added a new organization switcher component with enhanced visual design, including organization logos, active state indicators, and smoother transitions. Users can now easily switch between organizations with clear visual feedback, see their current active organization highlighted in purple, and access organization management options directly from the switcher menu. URL ingestion now includes a browser user agent (Chrome 58 on Windows) when fetching web content. This improves compatibility with websites that block or restrict requests without proper user agent headers, reducing failed ingestion attempts for RAG document processing. Added dedicated Space Settings page with new navigation menu accessible via a cog icon in space actions. Updated organization management route from '/organization' to '/org' for consistency. Improved Badge component to be more flexible with children props and enhanced the space actions menu with settings and rename options. Added Qwen2-72B-Chat model (131K context window) with support for streaming, penalties, and multiple outputs. Model costs 0.0009¢ per token for both prompt and completion. Additionally, renamed together/mistral-7b-instruct-v0.3 to together/mistral-7b-instruct, and together/mistral-7b-instruct to together/mistral-7b-instruct-v0.2 for better version clarity. Fixed incorrect redirect paths when switching organizations or authenticating by updating URLs from '/spaces' to '/s'. This ensures users are properly redirected to the correct spaces dashboard URL after organization switches and authentication events. Added new organization management functionality with user interface components including member removal and refresh capabilities. Introduced new icons (person-remove.svg and refresh.svg) and placeholders for organization imagery, along with badge components for improved user management interface. The changes include Zod validation library integration (v3.23.8) for enhanced data validation. Introduced a new app-level permissions system with specific roles (viewer, editor, admin) and membership management. Users who create apps are now automatically assigned as app admins, and permissions are enforced at both organization and app levels. This enables more fine-grained access control for app operations like key regeneration, custom data management, and model configuration. Removed the waitlist requirement for signing up to Spaces in production, allowing immediate access for all new users. Also updated authentication infrastructure to use auth.pulze.ai domain for enhanced security and branding consistency. Added support for system instructions in Google Chat models, allowing users to set context and behavior instructions at the conversation level. System instructions are now properly handled for all Gemini models except gemini-1.0-pro-001, where the system message is automatically prepended to the first user message as a workaround. This brings Google Chat models in line with other providers' system instruction capabilities. AI responses now display source documents that were used to generate the answer. Sources appear below each response with document paths and file previews in a grid layout. Users can see which files were referenced, with each source showing a file icon and truncated path name. Removed the flowz feature, including its API endpoints, database tables, and app integration. This change removes the ability to create and manage flow diagrams, validate flows, and associate flows with applications. The feature previously allowed users to define and validate workflow diagrams with app integrations. Expanded payment method listing functionality to retrieve all available payment types for an organization, removing the previous restriction that limited results to card payments only. This change allows organizations to view and manage a wider range of payment methods in their billing settings. Added ability to compare different model responses side-by-side in a split-screen view. Users can now select between multiple model responses for the same prompt, visually compare them in a two-column layout, and easily switch between different model versions using a new model switcher component. The feature includes visual indicators for selected messages and the ability to trigger new comparisons directly from the chat interface. Rebranded application to 'Pulze Spaces' with a new orange circular favicon replacing the default Vite logo. Also improved UI alignment in the Spaces view by adding consistent vertical centering for avatars, names, and action buttons. Improved sidebar navigation with a new collapsible menu design and consistent button styling. Chat interface now shows a cleaner history view with improved message actions, including toast notifications for copy actions and better hover states. Added a new Avatar component with optional badge support and default profile picture fallback. The 'Retrieved Files' button and panel have been renamed to 'Sources' throughout the sandbox interface for better clarity. Additionally, fixed a bug where the Sources button would appear even when no documents were retrieved (RAG not triggered), now correctly hiding when the documents array is empty. Enhanced the model scoring system to use more comprehensive sorting criteria. When multiple models have the same overall score, they are now consistently ordered by additional factors: Quality, Latency, and Cost (in that priority). This provides more stable and predictable model rankings in the API response. Refactored the notebook renaming functionality into a more general 'Space' concept, with improved dialog UI components. The rename dialog now uses a standardized Dialog component and includes better focus management for the input field. This change represents a terminology shift from 'notebooks' to 'spaces' throughout the application. Rebranded 'Notebooks' feature to 'Spaces' throughout the application, including navigation paths and UI elements. Added tooltips for sidebar icons when collapsed and improved error handling in chat functionality. All existing /notebooks URLs now redirect to /spaces, with updated navigation menu items and related UI components. Fixed an overly restrictive default token limit that was set to 200 tokens. The default maximum token limit has been increased to 2000 tokens, allowing for longer message responses in chat conversations. This change also includes refactoring of chat history handling to improve message organization. Added error toast notifications when chat messages fail to send, providing users with immediate feedback about communication issues. Also improved the file upload area by making it clickable, allowing users to trigger the file selector by clicking anywhere in the drop zone in addition to drag-and-drop functionality. Enhanced document retrieval accuracy by implementing a two-phase chunking strategy. Documents are now first split into larger context windows (768 tokens with 128 token overlap), then subdivided into smaller embedding chunks (128 tokens with 32 token overlap). This replaces the previous single-phase approach that used arbitrary merging, resulting in better semantic understanding while maintaining context during retrieval operations. Added automatic retry mechanisms for Qdrant vector store operations (add, delete, retrieve) and Tika PDF text extraction service. Operations now retry with exponential backoff (up to 15 seconds between attempts) for up to 60 seconds when encountering connection or response handling failures. Tika service calls now properly validate response status codes and raise errors on failures (status >= 400), triggering the existing retry logic. Fixed an issue where PDF content extracted from Tika was not being properly parsed into structured nodes before indexing. The fix introduces HTML-based chunking using partition\_html with 'by\_title' strategy, replaces default sub-sentence splitting functions to avoid regex-based splitting that was causing issues, and adds validation to ensure start/end character indices are present for accurate byte range calculations. This resolves problems with PDF text extraction and improves the reliability of document retrieval from S3. Improved the chat interface with more polished UI elements. Message actions (copy and regenerate) are now displayed as icon buttons with tooltips, the model switcher has an improved visual design with a white background, and search inputs across the application now support custom placeholders. Tooltips throughout the application now feature rounded corners for a more modern look. Updated conversation history grouping to use a rolling 7-day window instead of the calendar week. Conversations are now grouped into 'Today', 'Yesterday', and 'Last 7 Days' categories, providing a more intuitive timeline view of chat history. Added new dependencies for URL validation (url-regex-safe) and added new file and link-related icons (FileBlankIcon, LinkIcon) to the icon library. This change appears to be foundational work for upcoming features related to file handling and URL processing in notebooks. Chat conversations are now automatically named using the first few words of the initial prompt (up to 50 characters), instead of being titled 'Untitled Chat'. This makes it easier to identify and locate specific conversations in your chat history. The title is intelligently truncated to avoid breaking words in the middle. Added ability to enable source citations in RAG (Retrieval-Augmented Generation) responses through a new 'citations' feature flag. When enabled, model responses will automatically include citation references (e.g., \[\[citation:1]]) at the end of each sentence to indicate which source documents were used. Citations are formatted consistently and include both single and multiple source references. ## May 2024 Improved RAG (Retrieval Augmented Generation) system with a new document citation format that provides clearer source attribution. Documents are now tagged with sequential citation numbers (e.g., \[\[citation:1]]), and responses include specific citations for each statement. The system also includes improved prompt instructions for more natural, concise responses while maintaining accuracy and comprehensive source references. Introduced a new conversations feature that allows users to maintain persistent chat histories within apps. Users can now create named conversations, view conversation history, and link related requests together using parent-child relationships. The update includes new API endpoints for creating, updating, deleting, and retrieving conversations, plus UI support for managing conversation threads. Updated terminology in the UI from 'conversation' to 'chat' for better consistency and clarity. Changes include renaming the 'Recent Conversations' header to 'Recent Chats' and updating the dialog title from 'Rename conversation' to 'Rename chat'. Updated navigation labels and section titles to use 'Chat History' instead of 'Conversations' for better clarity. Also standardized capitalization of 'Customize Assistant' in the interface. These changes make the UI terminology more intuitive and consistent. Increased the default maximum token limit for chat responses from 200 to 2000 tokens, allowing for significantly longer model responses without manual adjustment. This change provides more comprehensive responses while maintaining the same temperature setting of 0.7. Fixed a z-index issue with the model switcher dropdown menu that could be hidden behind other UI elements. The model selection popup now consistently appears above other page content for better usability. Added new Google models including Gemini-1.0-Pro-001 (32K context), Gemini-1.5-Pro-001 (1M context), and Gemini-1.5-Flash-001 (1M context). Deprecating Gemini-1.5-Flash-Preview and Gemini-1.5-Pro-Preview (June 24, 2024) and setting end dates for Gemini-1.5-Flash-001 and Gemini-1.5-Pro-001 (May 24, 2025). Updated pricing for Gemini Pro models and added aliases 'gemini-1.5-pro' and 'gemini-1.5-flash' for latest stable versions. Added Mistral-7B-Instruct-v0.3 model through Together AI provider with a 32K context window. The model supports chat, streaming, penalties, and multiple outputs (n), with competitive pricing at \$0.0002 per thousand tokens. This instruct-tuned version of Mistral-7B-v0.3 is initially set as inactive by default. Improved RAG (Retrieval Augmented Generation) response quality by refining the prompt structure and instruction clarity. The system now provides more natural, concise answers without referencing the context explicitly, and includes clearer guidelines for the AI to process information step-by-step while maintaining response accuracy. Enhanced the chat interface with smarter auto-scrolling that responds to scroll direction and preserves user scroll position when reading history. Improved user name display in the sidebar to handle long names with truncation. Added edit icon assets and renamed notebook dialog functionality for better UX. Added support for tracking when users accept terms and privacy policies by storing acceptance timestamps from Auth0. Also improved test model visibility logic to show test models only when debug mode is enabled, rather than based on environment. These changes ensure better user consent tracking and clearer test model handling. Added ability to rename notebooks through a new dialog interface, along with react-hot-toast notifications for user feedback. The dialog includes a text input field with auto-focus and integrated API updates to persist name changes. Removed whitelist requirement for user signups in the development environment, while maintaining the waitlist restriction in production. New users can now freely create accounts and organizations when using development instances without needing to be on an approved whitelist. Users can now rename notebooks directly from the chat interface. A new rename option appears in the notebook header menu, opening a dialog where you can enter a new name. The change is immediately reflected across the interface and in the notebooks list. The notebooks page now supports vertical scrolling, allowing users to view all notebook content even when it extends beyond the visible area. Previously, content that extended below the viewport was inaccessible, but now users can scroll through their entire notebooks list and content. Added a dropdown menu for user logout when the sidebar is collapsed, replacing the direct click-to-logout behavior with a proper menu interface. Implemented a page loader component that displays during chat loading instead of plain text. When no notebooks exist, the app now automatically creates a new notebook and redirects to it, eliminating the empty state screen. Streamlined the user interface by temporarily hiding several navigation items including Recipes, Calendar, Collections, Data, and Saved from the sidebar menu. Also removed the Chat tab button and Data icon button from the chat header for a cleaner interface. Additionally, the chat route was restructured to display in a full-height layout, and the Model Settings panel title was updated from 'Edit SMART mode' for better clarity. Implemented toast notifications using react-hot-toast library to provide user feedback when creating or deleting notebooks. When a notebook is created, a success message displays "Using \[notebook name]!" and when deleted, shows "\[notebook name] removed!". Notifications appear in the bottom-right corner with customized durations: 2 seconds for success messages and 5 seconds for errors. The browser tab title has been updated to display 'Frontend v2' instead of the default 'Vite + React + TS' template text. This provides better branding and makes it easier to identify the application when multiple tabs are open in your browser. Fixed an issue where outlined buttons would not consistently display their borders due to CSS specificity conflicts. The border styling now uses the !important flag to ensure outlined buttons always show their border when not selected, improving visual consistency across the interface. Replaced the direct logout button with a dropdown menu in the sidebar user section. The menu now appears when clicking the three-dot icon next to your profile, providing a cleaner interface with a 'Log out' option in a properly styled dropdown menu. This improvement also removes debug information that was previously displayed on the chat page. Added the ability to delete notebooks through a new dropdown menu in the notebook header. The dropdown menu, accessible via a button with chevron icon, includes a delete option with a trash icon that allows users to permanently remove notebooks. This feature utilizes the HeadlessUI Menu component for a polished dropdown experience. The 'New notebook' button on the Notebooks page is now fully functional. Clicking the button creates a new notebook via the /apps/create API endpoint and automatically navigates you to the chat interface for that notebook. The button also includes enhanced visual states with hover, focus, and active styling for better user feedback. The chat input box can no longer be manually resized by dragging the corner. The textarea now maintains its automatic height adjustment based on content while preventing user-initiated resizing, creating a more consistent and streamlined chat interface. The chat input box now automatically receives focus when you open the chat page, allowing you to start typing immediately without clicking into the text field first. Additionally, removed the placeholder message 'How can I help you today?' that previously displayed when no messages were present. Adjusted the spacing and padding in the chat message interface for better visual consistency. User avatars and message content now align more precisely, with profile pictures and AI icons positioned using 3px padding (reduced from 4px), and message text using 3px left padding instead of the previous 16px margin-left layout. The vertical spacing between user name and message content has also been increased from 2px to 4px for improved readability. The chat input box now automatically grows in height as you type multi-line messages, up to a maximum of 50% of the viewport height. This makes it easier to compose and review longer prompts without manually resizing the text area or scrolling within a fixed-size input box. Adjusted the padding of the Send button in the chat box to improve visual alignment and spacing. The button now has consistent horizontal padding (px-3) for better appearance. Also disabled mock store functionality to ensure real data is used in the chat interface. Improved the visual design of the model switcher in chat by replacing the checkmark icon with a more prominent check-circle icon and changing the highlight color from blue to purple for better visibility of the currently selected model. Replaced the placeholder "Page Loader" text with a proper loading screen component featuring an animated Pulze logo. The loader now displays a centered, pulsing Pulze logo (100px) while the application authenticates and initializes, providing better visual feedback during page load. Fixed an issue where disabled button text color could be overridden by other styles, ensuring disabled buttons always display the correct muted text appearance. Also improved the authentication experience by showing a 'Redirecting to login' message when users are not authenticated, and updated the model switcher label from 'recommended for your prompt' to 'recommended for your message' for clarity. Page headers can now display action buttons on the right side. The Notebooks page now features a prominent purple "New notebook" button in the header, making it easier to create notebooks. The Button component also supports a new primary type with purple styling (#986BFF) and enhanced outlined button states with selected styling. The sidebar collapse and expand buttons now have a faded appearance with 60% opacity, making them visually less prominent. This enhancement to the Button component includes a new 'faded' prop that reduces the visual weight of buttons when needed. Additionally, navigation menu item spacing has been increased with more vertical padding for improved readability. Redesigned the chat input box with a dedicated 'Send' button that becomes active when text is entered, replacing the keyboard-only submit method. Updated button component styling with new 'fill' appearance (black background with white text), improved disabled state styling (gray background and text), and refined ghost button styles. The chat textarea now includes an enhanced placeholder text prompting users to 'Ask a question, start a conversation, or type / for commands'. The sidebar navigation can now be collapsed to provide more screen space for the main content area. This change includes updates to the sidebar icons and layout structure to support the collapse/expand functionality, improving the overall user interface flexibility. Added two new Google models: Gemini 1.5 Pro Preview (1M context) for complex reasoning tasks and Gemini 1.5 Flash Preview (1M context) for high-speed operations. Also scheduled deprecations for several Google models: Chat-bison and Text-bison\@002 models (Oct 9, 2024), Text-unicorn\@001 (Nov 30, 2024), and Gemini 1.5 Pro Preview 0409 (June 14, 2024). Both new models support streaming and chat functionality. Removed redundant onClick navigation handler from the Chat button in the header that was causing it to navigate to the same page. The button now correctly displays as selected without triggering unnecessary page reloads when already on the chat page. The sidebar collapse icon now uses a muted ghost color (text-ghost-muted) instead of the default color. This improves visual hierarchy and consistency with the overall sidebar design, making the interface more polished and easier to scan. Improved navigation menu items with rounded corners (rounded-lg) and active state highlighting using custom ghost theme colors. Added click navigation from the chat header 'Notebooks' text, making it easier to return to the notebooks list directly from the chat interface. Overhauled the application's routing system with new layout structure, moving from a single Layout component to a Root component with nested routing. Added comprehensive authentication context using Auth0 that automatically handles login redirects, token management, and API instance creation with authorization headers. Updated the application logo with a larger, redesigned version featuring a new color scheme (dark green background #003C00 with cream text #FFF5DC). Model names now display with their snapshot dates using the @ symbol format (e.g., gpt-4\@2024-05-13). The ModelDisplay component has been enhanced to include an optional 'at' field that appends the snapshot date to the model name when available, providing clearer model version information throughout the interface. The model switcher now includes a search box that allows you to filter models by typing. The search matches against model names and providers (e.g., 'OpenAI gpt-4' becomes searchable as 'openaigpt4'). The dropdown also now displays all available models from both base and custom model settings, sorted alphabetically, making it easier to find and switch between models during conversations. Chat messages now automatically scroll to show the latest content as new messages arrive. When you manually scroll up to view older messages, auto-scrolling pauses and a scroll-to-bottom button appears, allowing you to quickly jump back to the most recent messages. The system intelligently detects when you're near the bottom (within 40 pixels) to resume auto-scrolling. Corrected a typographical error in the Assistant side panel where the label above the model selection dropdown was incorrectly displayed as 'Mode' instead of 'Model'. This fix ensures the UI accurately reflects that users are selecting an AI model, improving clarity and preventing confusion. Implemented dynamic model selection in the Assistant panel that automatically configures failover models. When switching from SMART mode to a specific model, the system now updates the app's failover chain configuration via API. The selected model persists and enables single-model mode, while selecting SMART mode disables the failover chain. Added two new configuration sliders in the chat assistant panel: a Creativity slider (controlling temperature from 0 to 1 in 0.1 increments) and a Max Tokens slider (allowing values up to 32,750 tokens in 50-token steps). Both sliders include visual input fields with improved styling and support for decimal values, giving users fine-grained control over AI response generation parameters. Enhanced the chat interface header with a breadcrumb navigation showing 'Notebooks / \[App Name]'. The header now displays the current notebook name dynamically loaded from the API, with a loading state while fetching. Also removed debug information (prompt and response IDs) from the chat input area for a cleaner interface. Added a new button with a broom icon in the top-right corner of the chat area that allows users to clear the entire conversation with one click. The button appears muted when there are no messages and becomes active when messages are present. Also improved the model selection display by consistently showing model names with their providers throughout the assistant settings panel. Extended the maximum processing time for document ingestion tasks from 30 minutes (1,800,000ms) to 60 minutes (3,600,000ms). This allows larger documents and data sources to be fully processed without timing out, improving reliability for ingesting substantial content into the RAG system. Increased the maximum time limit for document ingestion operations from the default to 30 minutes (1,800,000 milliseconds), allowing larger documents and datasets to be processed without timing out. Also increased production RAG worker replicas from 3 to 5 to improve ingestion throughput and handle more concurrent document processing tasks. Added a new 'SMART' model option alongside existing provider models in the assistant configuration panel. Users can now select between SMART mode (intelligent automatic model routing) and manual model selection from available providers. The model selector displays provider logos and model names in a dropdown menu with improved visual presentation. Fixed an issue where the mode edit button in the Assistant panel was always enabled regardless of the selected mode. The edit button is now properly disabled when manual mode is selected, preventing users from attempting to edit settings that are only applicable to smart mode. Also improved button styling to show a visual disabled state with reduced opacity. Introduced interactive range sliders in the Smart Mode settings panel, allowing users to fine-tune Quality and Speed parameters with precise control. The sliders support custom min/max values, step increments (0.05 intervals), optional value display, and inverted ranges. Users can now adjust model routing weights through an intuitive visual interface with real-time feedback and step markers. Added the ability to set custom instructions for the AI assistant (e.g., 'You're a helpful assistant') through a new text area in the assistant panel. Implemented optimistic UI updates for app configuration changes, which means changes to assistant settings now appear instantly before server confirmation, providing a more responsive user experience. Also added React Query DevTools for better debugging capabilities. Added three new icons to the interface: a dollar-circle icon for cost indicators, a speedometer icon for performance/speed metrics, and a stars icon likely for quality or smart features. These icons are now available for use throughout the application, particularly for smart mode editing features and cost/performance visualization. Implemented automatic redirection for users who land on the /auth route after completing authentication. Users are now seamlessly redirected to the /chat page instead of staying on the auth endpoint, providing a smoother post-login experience. Introduced a new Assistant panel that appears as a 400px sidebar on the right side of the chat interface. Users can now toggle the Assistant panel open and closed using a close button in the panel header. The panel includes a dedicated header with the title 'Assistant' and improved layout structure for future assistant configuration options. Added markdown rendering capabilities to the application using react-markdown (v9.0.1) and highlight.js (v11.9.0). Users can now view formatted markdown content with syntax-highlighted code blocks, enabling better display of documentation, comments, and technical content throughout the interface. Added new copy and reload action buttons to chat messages with two new icons (CopyIcon and RepeatIcon). Users can now easily copy message content to clipboard or reload/regenerate responses directly from the message interface. The buttons use a new outlined appearance style with customizable sizes (xs, sm, md, lg). Introduced a new model switcher interface that allows users to change AI models during conversations. The switcher displays recommended models based on the prompt and shows provider logos (via providerLogo function) alongside model names. Users can now easily switch between different AI providers and models mid-conversation, with the currently selected model indicated by a checkmark. The feature includes search functionality and visual indicators for model selection. Fixed the cursor pointer display in the model switcher dropdown to show on the entire clickable row instead of just the inner content area. The cursor now correctly indicates the full clickable area when hovering over model options in the dropdown menu. Implemented a new model selection interface using Headless UI components (@headlessui/react v2.0.3) with floating UI elements for better positioning and accessibility. The model switcher includes support for virtualized lists (@tanstack/react-virtual v3.5.0) for efficient rendering of large model lists, along with improved focus management and keyboard interactions through React Aria utilities. Implemented core conversation feature with support for real-time message streaming using Server-Sent Events. Added state management with Zustand and Immer for conversation handling, TanStack Query for data fetching, and Axios for HTTP requests. This enables users to have interactive conversations with streaming responses from the backend API (configured to connect to [http://localhost:8080/v1](http://localhost:8080/v1)). Corrected the scoring scale for all model benchmark scores to use decimal values between 0-1.0 instead of 0-10. This affects 29 models including GPT-4, Claude 3, and various Mistral/Llama models. For example, GPT-4's score was adjusted from 8.8 to 0.88, and Claude-3-opus from 8.1 to 0.81. Added three new language models: GPT-4o (with function calling support), Qwen 1.5 110B Chat (32K context window) via Together AI, and Hermes-2-Pro-Llama-3-8B (32K context window) via OctoAI. The Hermes model is an open-source Llama 3 fine-tune optimized for conversational and reasoning tasks, while Qwen 1.5 110B is a large-scale decoder-only transformer model. Introduced a new chat page with a sticky textarea input at the bottom and scrollable message area. Added Auth0 authentication integration with login/logout buttons and user avatar display in the header. Enhanced header navigation with a new menu icon, clickable Chat button for routing, and improved visual styling with proper spacing and color tokens (ghost-muted, ghost-selected-text, ghost-selected-bg). Implemented a new application header with a navigation bar that includes a sidebar menu toggle, a Chat button with message icon, and quick access icons for AtomAlt and Data features. The header has a fixed height of 56px with border styling and uses a flexbox layout for consistent spacing and alignment across the interface. Code blocks rendered in markdown content now display a copy button, making it easier to copy code snippets to your clipboard. Previously, the copy button was only available on standalone code blocks, but now appears on all code blocks within markdown content as well. Updated token pricing for OctoAI models across different sizes: 7B/8B models ($0.15), 13B models ($0.20), Mixtral-8x7B ($0.45), 32B/34B models ($0.75), 70B models ($0.90), and Mixtral-8x22B ($1.20). Several models will be deprecated on May 13, 2024: OctoAI's CodeLlama (7B, 13B, 34B) and Llama 2 (13B, 70B). Additionally, removed the Groq/Llama-2-70B-chat model. Deprecated Groq's LLaMA-2-70B-Chat model. Updated token pricing for OctoAI models, with all 7B and 8B models (including Mistral, Code Llama, Llama 2, Llama Guard, and Llama 3) now priced at \$0.15 per million tokens for both prompt and completion. Removed Groq's LLaMA-2-70B Chat model (groq/llama-2-70b-chat) and its associated settings from the available model options. Users who were using this model will need to switch to an alternative model. ## April 2024 Added two new Llama 3 models from OctoAI: llama-3-8b-instruct (8K context) and llama-3-70b-instruct (8K context). Both models support streaming, JSON output, and custom sampling parameters. The 70B model is set as default active and offers higher performance at a higher cost (0.6/1.9µ¢ per token vs 0.1/0.25µ¢ for 8B). These models are instruction-tuned for dialogue and reportedly outperform many existing open-source chat models. The default number of top similarity search results returned when querying model scores has been increased from 5 to 10, providing more comprehensive scoring information. This change is now configurable via a new --top-k command-line flag, allowing users to customize the number of results based on their needs. The Learning Hub page now uses a two-column layout that displays tutorial videos alongside their corresponding step-by-step instructions. This layout improvement makes it easier to follow along with training courses on topics like Prompt Engineering, Creative Writing, and Model Comparison by allowing users to view both the video and instructions simultaneously without scrolling. Fixed an issue where the file drop area's background color would not render correctly when the application was built for production. The background color now properly displays as a semi-transparent pulze-300 color when not dragging, and switches to a blue background when dragging files over the drop zone. Changed the file drop area visual styling from purple to blue colors for better consistency. When dragging files over the drop zone, the border and background now display in blue (blue-v2-500) instead of purple, and the border width has been standardized from 2px to 1px throughout the component for a cleaner appearance. Introduced a new Learning Hub accessible from the Tools section in the sidebar, featuring educational video content on prompt engineering. The hub includes beginner-level tutorials covering prompt engineering introduction, creative writing for short stories, poetry generation, and additional training materials with embedded Loom video lessons in an expandable accordion interface. Added OctoAI's Mixtral-8x22B Instruct model with a 64K context window, supporting chat, streaming, JSON output, and penalties. Updated specifications for existing OctoAI models: Mistral-7B (32K context), Mixtral-8x7B, CodeLlama-7B/13B (16K context) with revised token pricing. The new Mixtral-8x22B model excels in multiple languages including English, French, Italian, German, and Spanish, with strong mathematics and coding capabilities. Modified sandbox application creation to always inherit custom data from parent apps by setting use\_parent\_custom\_data to true. This ensures consistent data handling between original apps and their sandbox versions, improving testing and development workflows. Fixed the custom data file upload interface to properly display a drag-and-drop area when dragging files over the page, with visual feedback showing a purple dashed border during drag operations. Also fixed the InModalOpener component to properly trigger onClose callbacks when modals are closed, ensuring cleanup actions execute correctly. The drag-and-drop area now correctly handles file drops and dispatches events properly even when rendered within dialogs. Removed the legacy GPU node pool configuration that used NVIDIA Tesla T4 accelerators in favor of the newer L4 GPU instances. This cleanup affects both development and production environments, removing T4 nodes that were previously deployed in us-west1-a and us-west1-b with local SSD storage. Users should now utilize the gpu-l4 node pools with g2-standard-4 machines for GPU-accelerated workloads. The sandbox datasources section now supports drag-and-drop file uploads with real-time upload percentage display. Files can be dragged directly into the sandbox area, with invalid file types automatically rejected and filtered out. A visual drop zone indicator appears during drag operations to guide users. You can now upload files to custom data by dragging and dropping them directly onto the files list. A visual drop zone with a purple dashed border appears when dragging files over the area, making it clear where to drop your files. This complements the existing file upload functionality with a more intuitive interface. Added Llama-3-70B-Instruct model through Together AI and Groq providers to the default active model list. This expands the availability of one of Meta's largest and most capable instruction-tuned models across multiple providers. Added support for two new Llama 3 models from Groq: llama-3-8b-instruct and llama-3-70b-instruct. Both models feature an 8,192 token context window, support for streaming and chat completions. The 8B model costs $0.05/million prompt tokens and $0.1/million completion tokens, while the 70B model costs $0.59/million prompt tokens and $0.79/million completion tokens. Updated terminology in the application selector component from 'Apps' to 'Projects' for better clarity. The label in the log filters and component tests now displays 'Projects' instead of 'Apps', making the interface more intuitive and aligned with standard project management terminology. Enhanced the custom data upload interface with individual progress tracking for each file being uploaded. Files and URLs are now uploaded in parallel instead of as a single batch, with separate progress indicators and success/error handling for each item. Added file preview functionality and improved error messages to show which specific uploads succeeded or failed. Introduced new features for managing custom data in apps, including the ability for sandbox apps to use parent app's custom data through the 'use\_parent\_custom\_data' setting. Added new endpoints for listing, updating, and batch deleting custom data files, with improved file metadata management and search capabilities. Custom data files now have an 'active' status for better state management. Added support for two new Llama 3 instruction-tuned models through Together AI: llama-3-8b-instruct (8K context) and llama-3-70b-instruct (8K context). Both models support streaming, multiple outputs (n), and custom penalties. Pricing is set at $0.0002 per 1K tokens for 8B and $0.0009 per 1K tokens for 70B. The Mixtral-8x22B Instruct model from Together AI provider has been removed from the default active models list. This change affects model availability in the default configuration, though the model may still be accessible if explicitly enabled. Added two new large language models: Mixtral-8x22B Instruct (65K context) and WizardLM-2 8x22B (65K context) through Together.ai. Mixtral-8x22B Instruct is enabled by default while WizardLM-2 is available as an optional model. Both models support streaming and penalties, with token pricing at \$0.0012 per 1K tokens. Additionally, increased the OctoAI Mixtral-8x22B fine-tuned model's context window to 65K tokens. Added Gemini 1.5 Pro Preview (1M token context window) and Gemini 1.0 Pro-002 models. Updated pricing for all Gemini models with text-only capabilities costing $0.000000125/token for input and $0.000000375/token for output. Gemini 1.5 Pro Preview features higher pricing ($0.0000025/token input, $0.0000075/token output) and supports multimodal inputs including image, audio, video, and PDF files. Significantly enhanced the speed and accuracy of document retrieval in the RAG engine by adding a cross-encoder reranking service (BAAI/bge-reranker-base model) running on GPU infrastructure, and implementing Redis caching (1GB Standard HA instance with LRU eviction policy) to store frequently accessed results. This optimization reduces latency when retrieving relevant documents for AI-powered responses. Fixed an issue in compare mode where prompts were being sent to all available models simultaneously. Now limits concurrent model requests to the top 3 ranked models to prevent performance issues and improve response handling. This change applies to both standard and alternative response loading paths in the sandbox. Updated the default active model list to include Claude 3 series (Haiku, Opus, Sonnet), Mistral's new models (Small, Medium, Large), and latest versions of GPT-4 Turbo (2024-04-09) and GPT-3.5 Turbo (0125). Also includes Cohere Command-R/R+, Mixtral-8x7b-instruct variants from Groq, OctoAI, and Together.ai, plus Together's DBRX-instruct model. Added three new OctoAI models: Hermes 2 Pro Mistral 7B (32K context), Qwen 1.5 32B Chat (32K context), and Mixtral 8x22B Finetuned (32K context). Updated specifications for existing OctoAI models: Mistral-7B-Instruct and Mixtral-8x7B-Instruct received token cost updates, while CodeLlama models' context windows were set to 16K. All new models support streaming, JSON mode, and custom sampling parameters. Added support for OpenAI's latest GPT-4 Turbo models: 'gpt-4-turbo' and 'gpt-4-turbo-2024-04-09'. Both models feature a 128,000 token context window and support for vision capabilities, function calling, streaming, JSON mode, and custom sampling parameters. The models are priced at $0.01/1K tokens for prompts and $0.03/1K tokens for completions. Enhanced playground model ranking to return scores for all available whitelisted models instead of being limited to just 3 models. The scoring system now uses percentile normalization for cost, quality, and latency metrics, providing more comprehensive model comparison data. This allows users to see scoring and ranking information for all applicable models when making model selection decisions. Resolved an issue where the RAG worker would crash from out-of-memory (OOMKill) errors during vector store operations. Reduced the batch size from 32 to 16 when processing documents to lower memory consumption and prevent worker crashes. Fixed sandbox chat not initializing with the correct failover and weight settings from the app configuration. Also corrected the model score display logic so that quality ratings now show 'N/A' with grayed-out styling when models haven't been scored yet, and score badges only appear when the score is non-zero, preventing misleading score information from being displayed. Fixed an issue where the sandbox environment would not properly refresh when switching between different sandbox applications. The sandbox provider now correctly remounts when changing to a different sandbox app ID, ensuring a clean state for each sandbox instance. Fixed the broom icon in the sandbox chat header by replacing it with a properly rendered SVG component. The clear conversation button now correctly disables when there are no messages instead of checking for root conversation state. Also removed debug output that was accidentally displaying auto-scroll state in the chat header. Corrected the visual appearance of the creativity/temperature slider in the sandbox interface by updating its color from the incorrect red-500 to the proper red-v2-500 theme color. This ensures consistent styling with the application's design system. Fixed multiple issues in the sandbox interface: added proper loading skeleton states while messages are being processed, improved error handling to display user-friendly error messages with a retry button instead of raw error text, and implemented retry functionality for failed responses. Users can now click a retry link when errors occur to resubmit their request without starting over. Added support for three new Together.ai models: DBRX-Instruct (32K context), DeepSeek Coder 67B (4K context), and Qwen 1.5-32B (32K context). DBRX-Instruct is a mixture-of-experts model optimized for few-turn interactions, while DeepSeek Coder 67B specializes in code-related tasks. All models support streaming and penalty parameters, with competitive pricing ranging from 0.8e-6 to 1.2e-6 per token. Updated the model router to version pulze-v0.1-20240409-alpha1, which adds support for the dbrx-instruct model. This new version replaces pulze-v0.1-20240405-alpha1 and expands the available model options for API users across both development and production environments. Upgraded the model router to a newer version (pulze-v0.1-20240405-alpha1, released April 5, 2024) from the previous version (pulze-v0.1-20240330-alpha1, released March 30, 2024). This update includes improvements to the model scoring and routing logic for better performance and accuracy in selecting the optimal model for each request. Enhanced the RAG system prompt to prevent explicit context citations and redundant source mentions. The prompt now instructs the model to naturally incorporate context information without directly referencing it, while still maintaining the requirement to stay within provided context boundaries. Enhanced model routing accuracy by using the last message in conversation history for route selection instead of the full conversation context. This change optimizes model selection by focusing on the most recent user query, particularly important for multi-turn conversations where the latest message is most relevant for determining the appropriate model. Enhanced the custom data file download endpoint to properly handle URL-type data sources by redirecting users directly to the original URL. Previously, all data sources were treated as file references, but now the system intelligently routes URL data sources to their original locations while maintaining signed URL generation for file-based sources. Users can now provide URLs alongside file uploads when creating custom knowledge bases for their apps. The system accepts both file uploads and URLs simultaneously, with URLs being automatically scraped and processed as HTML content. Each URL is stored separately and can be refreshed by re-uploading, providing more flexibility in maintaining up-to-date knowledge bases. Added support for GPT-3.5 Turbo 0125 model with 16,385 token context window, featuring improved format accuracy and fixes for non-English function calls. The model supports streaming, JSON mode, and function calling. Additionally updated the base GPT-3.5 Turbo model's context window to 16,385 tokens. Pricing is set at $0.0005/1K prompt tokens and $0.0015/1K completion tokens. Added support for Cohere's Command-R+ model, which features a 128,000 token context window and is optimized for complex RAG workflows and multi-step tool use. The model supports streaming, penalties, and multiple completions (n), with token costs of $0.003/1K for prompt tokens and $0.015/1K for completion tokens. Added Nous Hermes 2 Mistral 7B DPO model (32K context window) with improved benchmark performance across AGIEval, BigBench Reasoning, GPT4All, and TruthfulQA. Deprecated over 50 Together AI models including WizardCoder 15B, Falcon series, Llama 2 series, Qwen series, and various other models. Also updated Qwen 1.5 72B Chat's context window to 32K tokens. Added support for securely downloading custom data files uploaded to apps through signed URLs. Files are now accessible through a new endpoint '/custom-data/{app_id}/files/{file_id}' which generates temporary signed URLs valid for 10 minutes, ensuring secure access to uploaded content while preventing unauthorized downloads. ## March 2024 Added support for Claude-3 Haiku (200K context window) and Cohere Command-R (128K context window). Claude-3 Haiku is Anthropic's fastest model optimized for near-instant responses, while Command-R is Cohere's new instruction-following model with improved capabilities for code generation, RAG, and tool use. Additionally, Cohere's Command-Light model now supports chat completion API. Fixed the document splitter to use explicit default values instead of relying on unspecified defaults. Documents are now split into chunks of 500 tokens with a 200-token overlap between consecutive chunks, ensuring consistent and predictable behavior when processing documents for retrieval-augmented generation. Removed the text categorization system that automatically classified prompts into 20 predefined categories (like Arts & Science, History). This change simplifies the model selection architecture by removing the category-based routing and knowledge graph dependencies. API functionality remains the same, but model selection no longer uses categorical classification. Improved the accuracy of model selection by fixing how latency and cost scores are calculated during model ranking. Previously, the scoring system was using inverted values incorrectly, which could lead to suboptimal model selections. The fix ensures that models with lower latency and cost are properly prioritized during the selection process. Fixed an issue where RAG context size was being incorrectly calculated due to only using the last message instead of the full conversation history. Now properly processes the entire message history and converts it to the correct prompt format before model ranking, ensuring more accurate context size calculations and better model recommendations. Fixed an issue where the signup whitelist validation was case-sensitive when matching email addresses. The system now uses case-insensitive matching (ILIKE) for email verification, ensuring users can sign up regardless of email case formatting. Fixed a bug in the signup whitelist validation that was preventing authorized users from registering in production. The fix corrects the parameter passing in the whitelist verification function, ensuring that whitelisted email addresses are properly recognized during the signup process. Fixed an issue with the signup whitelist validation where users couldn't register even when whitelisted. The fix corrects the function call to properly check if an email is on the whitelist before allowing registration in the production environment. Non-whitelisted users will still be directed to join the waitlist. Fixed an issue with the signup whitelist validation by correcting the reference to the signup whitelist checker (crud\_signup\_whitelist). This ensures proper validation of new user registrations against the whitelist in the production environment, maintaining controlled access during the waitlist period. Introduced a new signup whitelist system that allows specific email addresses to register accounts in production, even when general signups are closed. This feature adds granular control over user registration through a new signup\_whitelist database table that stores approved email addresses. Non-whitelisted users will be directed to join the waitlist. Improved error handling when models return unexpected responses, specifically for cases where Mistral returns a 200 status code but no completion tokens, or when responses contain an error finish reason. This ensures more accurate health monitoring and error reporting for model API calls. Enhanced the model monitoring system to track more detailed latency metrics, including p50, p90, and p99 percentiles for per-token latency. Removed the legacy Redis-based EWMA latency tracking in favor of more accurate percentile-based measurements. This change provides more reliable performance monitoring and optimization capabilities. Improved model ranking in the playground by incorporating RAG (Retrieval-Augmented Generation) context when custom data is present. The system now hydrates prompts with relevant context from your custom data before ranking models, resulting in more accurate model recommendations based on the full context of your queries. Fixed streaming responses across all providers (OpenAI, Anthropic, Google, Cohere) to properly end with a 'data: \[DONE]' message. This change ensures clients can reliably detect when a streaming response has completed. Additionally updated the playground endpoint to use the correct 'text/event-stream' content type for Server-Sent Events. Fixed an issue where the rerun button in the sandbox chat could be enabled even when there were no chat items in the conversation history. The button is now properly disabled when the chat is empty (items.length === 0), preventing users from attempting to rerun prompts when there's no conversation context available. Added five new rating icons to enhance the visual appearance of the explanation box: rocket, stars, speedometer-fast, face-neutral, and thumbs-down. These icons complement the existing thumbs-up icon to provide more expressive rating options in the user interface. Model scores now display with color-coded backgrounds that reflect performance levels: gray for N/A (0), gradual color progression for scores 0.5-0.97, and a vibrant gradient (orange/pink/purple) for top scores above 0.97. The 'highlighted' prop was removed in favor of this standardized, score-based visual system. Score displays are now consistent across all views including the sandbox response comparison and explanation popovers. Fixed multiple issues with the explanation popover that appears when hovering over model scores. The popover now correctly positions itself and stays open when moving between the score badge and the "Why this model?" link. Upgraded the floating-ui library from react-dom to the full react package (v0.26.9) to enable better popover interaction handling and prevent premature closing. Fixed how the system handles API key issues and model retries. When a provider's API keys are unavailable or invalid, the system now smoothly skips to the next available model instead of failing. Also improved the retry logic to better handle temporary failures and continue with alternative models when appropriate. Added a new feature that provides transparency into model selection decisions by exposing detailed scoring metrics (Quality, Latency, Cost) for each model candidate. The system now tracks and returns whether each model was actively scored, and includes context about similar historical prompts to explain model recommendations. This helps users understand why specific models are suggested for their use case. Improved model filtering and selection logic to better handle both explicit model requests and failover chains. When users specify a model, the system now properly includes all failover models in the candidate list while maintaining the original selection preferences. The scoring and ranking system has been streamlined to provide more accurate model recommendations while respecting project whitelists. Fixed two issues with RAG (Retrieval Augmented Generation) functionality: 1) RAG now properly works with sandbox applications by checking custom data against the parent app, and 2) RAG engine now gracefully handles cases where no relevant document chunks are found, returning the original query prompt instead of failing. These changes improve reliability when using custom data with sandbox environments. Enhanced model scoring system to properly evaluate RAG (Retrieval-Augmented Generation) hydrated prompts. The system now correctly processes and scores both standard prompts and chat messages when using RAG, ensuring more accurate evaluation of model performance with retrieved context. This includes preserving the hydrated prompt in both regular prompt and chat message formats for consistent scoring. Upgraded the internal model scoring system from pulze-v0.1-20240312 to pulze-v0.1-20240313-alpha1. This alpha version includes improvements to model selection and routing algorithms that may affect which models are recommended for your requests. Test log entries in the logs view no longer display with a yellow warning background color. Previously, rows marked as test entries were highlighted with a yellow background (bg-alert-warning-light), but this visual distinction has been temporarily removed for a cleaner, more uniform appearance in the logs table. Improved RAG (Retrieval Augmented Generation) functionality with better handling of both chat and non-chat completion requests. The system now intelligently retrieves relevant documents based on the most recent query in chat history or the full prompt for non-chat requests. Additionally, updated the Q\&A system prompt template to generate more natural responses that seamlessly incorporate retrieved information without explicitly referencing sources. Improved RAG (Retrieval Augmented Generation) engine to trigger whenever an organization has custom data in any app, not just the current app. Previously, RAG was only triggered when the specific app being queried had custom data. This change makes custom data accessible across all apps within an organization, providing more consistent and comprehensive responses. Improved the playground sandbox functionality with new app model configuration capabilities. Test models are now non-streamable by default, and the app model configuration system has been updated to better handle sandbox environments. The update includes simplified app creation with default model weights and policies, and improved handling of sandbox modes for testing and development. Fixed how system messages are handled in Anthropic chat conversations by merging system instructions into the first user message rather than sending them separately. This improves compatibility with Anthropic's chat models and ensures system instructions are properly incorporated into the conversation context. Fixed an issue where emoji characters and other special Unicode characters would cause errors or display incorrectly in Google AI model responses. The fix improves UTF-8 character handling in the streaming response parser for both Google AI and Google Chat models, ensuring proper decoding of all Unicode characters including emojis. Modified default model selection weights to prioritize quality (1.0) over cost (0.0) and latency (0.0). This change makes quality the sole default consideration when automatically selecting language models, which should result in higher quality responses out of the box. Updated token pricing for AI21 Labs J2 series models. J2-Ultra now costs $0.002 per prompt token and $0.01 per completion token, J2-Mid costs $0.00025 per prompt token and $0.00125 per completion token, and J2-Light costs $0.0001 per prompt token and $0.0005 per completion token. Fixed an issue where the API would crash when language models returned unexpected finish reasons. Instead of raising an exception, the API now logs these cases and continues processing, improving reliability when working with models that may return non-standard completion statuses. Fixed an issue where the sandbox sidebar would automatically collapse on the first visit to the page. The sidebar now remains expanded by default when no previous user preference is stored, instead of defaulting to a collapsed state. Additionally, improved spacing in the compared response view by adding left padding. Fixed an issue where incomplete JSON chunks from Google model responses could cause the application to crash. The system now gracefully handles partial response chunks from both Google's text completion and chat completion APIs, ensuring more reliable and stable interactions with Google's language models. Enhanced the custom data upload functionality to store files in Google Cloud Storage (GCS) instead of directly in the database. Files are now organized in a structured format (uploads/org\_id/app\_id/filename) within a dedicated RAG component bucket, with duplicate file uploads prevented based on size comparison. The system now marks uploaded files with a 'pending' status for subsequent indexing. Updated token pricing for the Cohere Command model. Input (prompt) tokens now cost $0.0005 per 1K tokens (down from previous rate), and output (completion) tokens cost $0.0015 per 1K tokens. Fixed textarea auto-grow functionality to properly adjust height when content changes. The auto-grow now triggers whenever the textarea value changes using React's useEffect hook instead of on specific keyboard events. Also fixed the positioning of clear and end icons in textareas to appear in the top-right corner instead of vertically centered, and added max-height constraints (50vh) to prevent textareas from growing too large. Improved visual consistency of copy and delete icons in the Playground interface. Updated icon dimensions from 24x24 to 20x20 pixels with refined SVG paths for better rendering. Added new dark-themed variants (copy-dark.svg and delete-dark.svg) to support different UI themes, ensuring better visibility across light and dark modes. Updated model performance scores across the LLM lineup, including significant adjustments for newer models like Claude 3 Opus (8.1), Claude 3 Sonnet (8.2), and GPT-4-0125-preview (9.0). Added new models Gemma-7b-it and adjusted scoring methodology to prepare for upcoming 'Why this model?' feature integration. Added new sandbox environment capabilities with dedicated test request tracking. Database changes now include an 'is\_test' flag to mark requests performed from sandbox apps, enabling better separation between testing and production usage. This update improves the app settings structure with clearer model organization, including separate tracking of active base models, custom models, and failover chains. Removed the `ignore_unsupported_features` policy option that previously allowed requests to proceed even when using unsupported model features. The API will now always validate that requested features are supported by the chosen model, providing clearer error messages when incompatible features are requested. Temporarily disabled new user registrations in the production environment. New users attempting to sign up will now receive a message directing them to join the waitlist. Existing users are unaffected and can continue to use the platform normally. Added support for Gemma 7B Instruct model on Groq, featuring an 8,192 token context window and streaming capability. The model is open-source and based on the same technology used to create Gemini models. Additionally, updated the context window for Together AI's Mistral-7B-Instruct-v0.1 to 8,192 tokens. Enabled streaming capabilities for all MistralAI models by switching to OpenAI's implementation. This update removes support for 'n' concurrent completions but adds streaming functionality, allowing for real-time response generation with MistralAI models. Database schema has been updated to reflect these capability changes. Updated token pricing and features for Cohere Command and Command-Light models. Command model now costs $0.001/1K prompt tokens and $0.002/1K completion tokens, while Command-Light costs $0.0003/1K prompt tokens and $0.0006/1K completion tokens. Both models now support the 'n' parameter for generating multiple completions. Added support for streaming responses when using Cohere language models, allowing for real-time text generation. The implementation handles streamed chunks of text with proper error handling and includes support for temperature, max tokens, frequency penalty, and presence penalty parameters. Note that logit bias, top\_p, and best\_of parameters are not supported by Cohere's API. Added support for Google's Gemma-7B model through Pulze's self-hosted infrastructure. The model features a 4,096 token context window and is available at no cost (0 tokens/request). This open-source model is configured for completion-only tasks (no chat, streaming, or function calling support) and is integrated through Pulze's infrastructure. Added support for streaming responses from Google AI models using server-side events (SSE). This implementation allows real-time, token-by-token streaming responses with all supported parameters including temperature, top\_p, max tokens, and presence/frequency penalties. The feature maintains compatibility with OpenAI's streaming format while adding proper safety attribute handling and usage calculation for Google's models. Fixed an issue where unchecked model providers were not automatically collapsing in the app settings interface. Also corrected the provider counter to accurately display the number of active base models and custom models separately, rather than showing a combined total. Updated Google's chat-bison-32k model from version 001 to 002. The model no longer supports generating multiple completions (n>1) in a single request. This change affects both new and existing model versions to maintain consistency with Google's API capabilities. Fixed an issue where custom app names weren't being properly set during app creation. Previously, the app description field was incorrectly used instead of the name field. Now, users can properly set custom names for their apps during creation, with random names still being generated as a fallback when no name is provided. Fixed the Edit Profile page to display user profile images of any size instead of blocking images larger than 1 megapixel. Also improved the authentication provider information banner to support additional provider types (Google2 and test tokens) and enhanced the copy-to-clipboard functionality with better tooltips that show 'Copied to clipboard' after successful copying. Fixed a bug where playground messages were not properly cleared when changing apps or contexts. The issue was caused by missing the chatItems dependency in the component's dependency array, which prevented the playground from resetting when the conversation state changed. Modified model scoring algorithm to handle unscored models more effectively. Instead of zeroing out scores for models without quality metrics, the system now incorporates their latency and cost metrics into the final ranking. This change ensures more balanced model selection based on all available performance metrics, even when quality scores are unavailable. Users can now create sandbox versions of their apps for testing and configuration purposes. Each user can have their own sandbox instance of an app, allowing safe experimentation with settings and configurations without affecting the production app. Added new fields including app name, description, and sandbox relationships in the database schema. Removed support for NVIDIA Tesla A100 GPU node pools in both production and development environments. The following configurations are no longer available: single A100 (a2-highgpu-1g), dual A100 (a2-highgpu-2g), quad A100 (a2-highgpu-4g), and octa A100 (a2-highgpu-8g) machine types in the us-west1-b location. Users should utilize alternative GPU options such as NVIDIA L4 for their workloads. The 'Sandbox' tab in the app management details section has been renamed to 'Playground' for better clarity. This changes the navigation label and URL routing, but the functionality remains the same. Users will now see 'Playground' instead of 'Sandbox' when viewing application details. Moved the Sandbox tab to the first position in the app management navigation and removed the Playground tab. Temporarily hidden the Sandbox Mode toggle switch while preserving the underlying functionality for future use. Added copy-to-clipboard functionality for model namespaces to make it easier to reference model names. Also updated test configurations to use qwen1.5-0.5b-chat model instead of mistral-tiny and increased test timeout from 10 to 20 seconds. Reordered the App management detail tabs to prioritize Flowz, Playground, and Sandbox before Logs. Added a new Smart Router icon to the icon library and enabled icons to be displayed in card selector components. Updated the Installation panel to include a link to the Sandbox tab for easier navigation between related features. Enhanced the model scoring algorithm by switching from min-max to quantile normalization for cost and quality metrics. This change provides more balanced and robust scoring across models by reducing the impact of outliers, resulting in more reliable model recommendations based on cost, quality, and latency preferences. Fixed an issue where requests to Anthropic's API included additional message properties that could cause compatibility issues. Messages are now properly formatted to only include the required 'role' and 'content' fields when making requests to Anthropic's models, ensuring better API compatibility. Modified the model optimization presets to produce more deterministic results. The cost preset now prioritizes cost at 1.0 with minimal quality consideration (0.1), the latency preset focuses solely on speed at 1.0 with minimal quality (0.1), and the quality preset maximizes quality at 1.0 with slight latency consideration (0.1). These adjusted weights reduce the influence of competing factors, making model selection more predictable and aligned with the chosen optimization goal. Updated the Pulze model scoring system to version pulze-v0.1-20240305 (released March 5, 2024). This version update applies to both development and production API environments and may include improvements to model performance evaluation and routing decisions. The /pricing route has been updated to redirect users directly to the home page (/). Users visiting the pricing page will now automatically be taken to the main landing page instead. Added support for OpenAI's GPT-4 Turbo Preview (128K context window) and Anthropic's Claude-3 models: Opus and Sonnet (both with 200K context windows). Claude-3 Opus offers highest performance for complex tasks with 15/75μ¢ token pricing, while Sonnet provides balanced performance at 3/15μ¢ per token. Additionally, Claude-2 models were updated to use the messages API format, with claude-2 now pointing to claude-2.1 as the default target. Fixed an issue where response scores in the Playground were not being calculated correctly due to incorrect request type handling. The fix updates the abstract provider engine to use the correct request table schema, ensuring accurate response scoring for all playground interactions. All models from the Replicate provider have been deprecated as of March 4, 2024. These models will no longer be available for use through the API. This change affects all previously available Replicate-hosted models. Improved streaming support verification to provide clearer error messages for unsupported providers. Streaming is now explicitly supported for OpenAI, Groq, Together, and OctoAI providers, with helpful error messages for other providers indicating future support is planned. The system now checks both model-level streaming capability and provider-level support. Fixed streaming responses to consistently show the full model namespace (e.g., 'test/openai' instead of just 'test') in streaming chunks. This improvement ensures users see the complete and accurate model identifier throughout the entire streaming response, making it easier to track which model is generating the output. Model labels in the UI now automatically include version information (e.g., "model\@version") when the model has an associated version tag. This makes it easier to distinguish between different versions of the same model at a glance. The version information is appended with an @ symbol after the model name. Enhanced model scoring system to use provider-specific latency benchmarks instead of a default high value. This improvement makes model selection more accurate by using real-world latency data from each provider, or falling back to provider-specific maximums when exact data isn't available. Also added quantile normalization for latency scores to better handle variations between different providers. Added support for streaming responses from language models across OpenAI, TogetherAI, and Groq providers. The feature works with both chat completions and text completions endpoints, allowing real-time token-by-token responses. Streaming responses are automatically handled with proper background task cleanup and token consumption tracking. Enhanced the model scoring mechanism to better handle different Pulze synthetic models (PULZE, PULZE\_V0, PULZE\_V01) with specialized scoring strategies. The update includes a more robust scoring system that considers quality, latency, and cost weights when selecting models, and properly handles both synthetic and fully qualified models. This results in more accurate model selection based on user-specified preferences. Introduced a new tool that helps convert and optimize token costs between different units (per 1K or 1M tokens) into precise decimal values without approximation errors. The tool accepts multiple price inputs (including scientific notation like 1.5e-6) and automatically calculates the optimal decimal places needed for accurate per-token pricing. This enables more precise cost tracking and billing calculations for token usage. Fixed a bug in the app update hook where optimistic UI updates were using the old value instead of the new value. This ensures that when updating app settings, the interface immediately reflects the correct updated state rather than temporarily showing stale data before the server response arrives. ## February 2024 Added 9 Replicate models including Llama 2 (7B/13B/70B base and chat variants), Mistral-7B, Mistral-7B-Instruct v0.2 (32K context), and Mixtral-8x7B-Instruct (32K context). Also added Mistral Large model and deprecated older Replicate models (mistral-7b-instruct-v0.1, mistral-7b-openorca, codellama-13b). New models support streaming and most feature extended context windows up to 32K tokens. Enhanced the model scoring system to support both namespace and direct model name matching when determining model quality scores. Updated to use a new modelscores server endpoint (port 8888) with a simplified scoring response format that provides direct model-to-score mappings rather than the previous complex temperature-based scoring system. Added support for Groq as a new model provider with two powerful models: Llama-2-70B-Chat (4,096 token context) and Mixtral-8x7B-Instruct (32,768 token context). Both models support streaming and multiple completions (n>1), with Mixtral offering significantly larger context window and lower token costs. These models are integrated through Groq's OpenAI-compatible API. Fixed an issue where the API would crash when Gemini Pro blocked a response due to safety filters. The system now properly handles cases where the model returns no text content and missing token counts, returning an empty response instead of throwing an error. Deprecated gooseai/gpt-neo-20b, gooseai/gpt-j-6b, and huggingface/falcon-40b-instruct models. Standardized model names for CodeLLama series (removing 'hf' suffix) and LLaMA variants. Added GPT-4 Turbo with 128K context window, supporting functions, JSON output, and streaming. Model includes competitive pricing at $0.01/1K prompt tokens and $0.03/1K completion tokens. Fixed an issue where Failover App Mode was too restrictive with model selection. Users can now specify any model in failover chains, including custom configurations. The error message has been improved to clearly indicate when no failover models are selected versus when no models fit the context window requirements. Additionally, the test provider now properly handles model namespace resolution in failover scenarios. Added support for dynamic time-based variables in prompts using date, time, and datetime placeholders. When used in prompts, these variables are automatically replaced with the current date (YYYY-MM-DD), time (HH:MM:SS), and datetime (YYYY-MM-DDTHH:MM:SS) respectively. This allows for creating prompts that include current temporal information without manual updates. Enhanced token estimation accuracy by adding a 1.5% safety margin to prevent context window errors. Optimized performance by caching the token encoder, reducing request latency by 250ms. This improvement helps ensure more reliable model selection based on context size requirements while maintaining faster response times. Enhanced model information display to show which base model and prompt custom models are derived from. This adds transparency by exposing the parent\_id, parent model reference, and associated prompt\_id in the model data schema, helping users better understand their custom model configurations. Removed access restrictions that previously limited certain models (like PULZE\_V01) to internal users only. All users can now access these previously restricted models, including the new Knowledge Graph (KG) model, regardless of their organization status. Enhanced error reporting by standardizing error codes across the platform and adding specific error codes like 'E\_REQUEST\_COST\_EXCEEDED' for cost control limits and 'E\_RL\_PULZE' for capacity limits. Increased Flowz execution limit from 10 to 20 iterations and added automated monitoring for potential infinite loops. Also improved internal error tracking for loop detection in Flowz. The copy-to-clipboard functionality now displays a green checkmark icon for 1 second after successfully copying content, providing immediate visual confirmation to users. This replaces the previous behavior where only a toast notification was shown (and only in some cases). The improvement applies to all copy actions throughout the application including API keys and prompt IDs. Cost values in the logs table now display with 8 decimal places instead of 6, providing more precise cost tracking for API calls. This improved precision is especially useful for tracking costs of very inexpensive API calls where minute differences matter. Updated Python integration examples to use the newer OpenAI SDK pattern with the `OpenAI` client class instead of deprecated module-level methods. The code now initializes a client with `OpenAI(api_key=..., base_url=...)` and uses `client.completions.create()` and `client.chat.completions.create()` instead of `openai.completions.create()`. Also updated the default model identifier from 'pulze-v0' to 'pulze' and corrected the base URL to include a trailing slash. Fixed an issue where Google Chat model responses were incorrectly labeled with 'user' role instead of 'assistant' role in the API response. This ensures that model-generated content is properly attributed in chat conversations. The fix also includes database schema updates to handle Together AI model namespaces more efficiently. Fixed an issue where Together AI completion responses weren't properly handling choice indexes. Also improved OpenAI error handling by changing from ServiceUnavailableError to UnprocessableEntityError, and ensured proper JSON response parsing for OpenAI completions. The fix ensures more reliable response handling and better error reporting for both providers. Added support for 9 new open-source models via OctoAI, including Llama-2 (13B, 70B), CodeLlama (7B, 13B, 34B, 70B), Mistral-7B, Mixtral-8x7B, and Nous-Hermes-2-Mixtral. All models support streaming, JSON output, and penalty configurations. Notable context windows include 32K tokens for Mixtral models and 16K for CodeLlama-34B, with competitive pricing starting from \$0.000025 per token. Updated OpenAI SDK implementation to version 1.12.0, adopting new API patterns like openai.chat.completions.create() instead of the deprecated openai.ChatCompletion.create(). Changes include updated error handling patterns (e.g., BadRequestError instead of InvalidRequestError) and base URL configuration using base\_url instead of api\_base. This update maintains compatibility with both OpenAI and GooseAI providers. Fixed email notification behavior when updating organization spending limits. When soft or hard spending limits are modified, the system now automatically resets the alert status, allowing notifications to be sent again when crossing the new thresholds. This ensures organizations receive proper notifications when reaching their updated spending limits. Expanded A100 GPU infrastructure by adding new node pool configurations with 4-GPU (a2-highgpu-4g) and 8-GPU (a2-highgpu-8g) machines in both development and production environments. Production now includes the complete range of A100 configurations: single GPU (a2-highgpu-1g), 2-GPU (a2-highgpu-2g), 4-GPU, and 8-GPU variants, all deployed in us-west1-b with corresponding local SSD counts matching GPU counts. Development environment also received the 4-GPU and 8-GPU configurations, with minimum node count requirements removed from existing 1-GPU and 2-GPU pools. Added two new GPU-enabled node pool configurations to the API infrastructure: a single A100 GPU node pool (a2-highgpu-1g machine type with 1 local SSD, minimum 1 node) and a dual A100 GPU node pool (a2-highgpu-2g machine type with 2 nvidia-tesla-a100 accelerators and 2 local SSDs, minimum 2 nodes). Both node pools are deployed in the us-west1-b location to support GPU-accelerated workloads. Fixed the Select component dropdown to properly constrain its maximum width while maintaining a minimum width of 340px. This prevents the dropdown from extending beyond viewport boundaries when displaying long option text, improving the component's visual layout and usability. Added new test models including 'test-model-default', 'test-model-1' through 'test-model-5', 'test-model-repeat', and specialized test models for error cases and context window testing. These models simulate LLM behavior without making actual API calls, with configurable token usage (up to 1M tokens), customizable error responses, and a small-context (20 tokens) model for testing limits. Includes a model scheduled for deprecation (test-model-will-deprecate) on Dec 31, 2099, and an already deprecated model (test-model-deprecated) from Feb 2, 2022. The initial splash loader now stores its progress state at the window level, ensuring that if the app reloads or navigates during startup, the progress bar resumes from where it left off instead of restarting from 0%. This enhancement provides better visual feedback during app initialization and reduces confusion when page transitions occur during loading. Updated pricing units and token costs for multiple models. GPT-4 preview models now show correct token costs (adjusted by 10x), Together AI models have updated pricing (adjusted by 1000x), and GPT-3.5 Turbo models now show accurate costs (0.0005/0.0015 per 1K tokens). Added specific price units (tokens/characters) for each model provider, with Google models now explicitly billed per character. Fixed API request errors with Google's Gemini Pro model by adjusting safety thresholds from 'BLOCK\_NONE' to 'BLOCK\_ONLY\_HIGH' and removing unsupported frequency/presence penalties. The model endpoint was also updated from streaming to standard content generation, improving reliability and request success rates. Fixed issue with log level configuration not working properly and added support for the 'critical' log level. The logging system now correctly responds to level changes through the internal API endpoint, supporting debug, info, warn, error, and critical levels. Also improved logging implementation across the codebase for better consistency. Added support for configuring model failover chains in applications, allowing users to specify a prioritized sequence of backup models. Users can now define multiple models in a failover chain with specific priority orders, and the system will automatically try alternative models if the primary model fails. This feature can be enabled/disabled per app and includes validation to prevent duplicate models in the chain. Fixed an issue where the autocomplete dropdown would not open when clicking directly on the field. The component now uses a Combobox.Button wrapper to properly handle click events and trigger the dropdown menu. Additionally, added an OutsideClickDetector to properly manage focus state when clicking outside the component. Toast notifications now feature a darker, semi-transparent background (dark gray with 80% opacity) with white text and rounded corners for better visual hierarchy and readability. The App Management Settings page now includes enhanced documentation showing that weights and policies can be configured per-request via headers (X-Pulze-Weights and X-Pulze-Policies), with direct links to documentation. External links now support an optional icon prop to display an arrow (↗) indicator. Success messages for settings updates are now more specific, distinguishing between weight, policy, and benchmark model changes. Fixed an issue where the app details page wouldn't automatically refresh after updating failover model settings. Now when you save changes to your failover model priority order, the app data immediately updates to reflect the new configuration. Additionally, toast notification positioning has been standardized to bottom-center across all pages for better consistency. Added support for Together AI as a new provider and improved compatibility with providers that don't specify choice indices in their responses. The system now automatically assigns index values to response choices, ensuring consistent behavior across all providers including Together AI. Fixed an issue where only active base models were being shown in the documentation and model monitoring tools, instead of the complete list of available base models. Now correctly displays all base models alongside Pulze synthetic models in the documentation table and includes them in monitoring scripts. Fixed two issues with Gemini Pro responses: messages are now correctly attributed to 'assistant' role instead of 'user', and the API properly handles cases where finishReason is not present in responses. Added safety checks that raise clear error messages when responses are blocked due to safety filters. All models from the MosaicML provider will be deprecated on February 29, 2024. Users should migrate their applications to alternative models before this date. This deprecation affects all MosaicML models available through the API. Deprecating the replicate/dolly-v2-12b model effective February 1, 2024. Additionally, introduced a new failover chain system that allows models to be called in a priority order when primary models fail, bypassing the SMART router. Apps can now configure multiple backup models with specific priority levels that will be automatically attempted in sequence. Enhanced Replicate model integration by adding support for frequency\_penalty, presence\_penalty, and top\_p parameters. Also improved max\_tokens handling by ensuring consistent parameter names (max\_length, max\_new\_tokens, max\_tokens) and fixed temperature parameter to default to 0.75 when set to 0. These changes enable finer control over model outputs when using Replicate provider. Fixed an issue where the prompt token calculator was constantly fetching data from the server whenever any prompt data changed. The calculator now only triggers when the actual prompt text changes, significantly reducing unnecessary server requests and improving performance during prompt editing. Improved the model creation interface by extracting the card selection UI into a reusable CardSelector component. The new component supports configurable columns, nullable selections, disabled states with custom labels, and automatic first-card selection. Also added a confirmation dialog when closing the custom model creation modal to prevent accidental loss of changes. Introduced a new Autocomplete component for the frontend that provides real-time search filtering, keyboard navigation, and customizable dropdown positioning. The component includes features like clearable input, error state handling, helper text support, and automatic dropdown placement. Built on @headlessui/react Combobox with virtualization support via @tanstack/react-virtual for handling large datasets efficiently. Removed the July 5th, 2023 knowledge cutoff date restriction from Pulze and Pulze-v0 synthetic LLM expert models. Both models maintain their 8,191 token context windows but can now access more recent knowledge. Pulze remains the latest synthetic model while Pulze-v0 continues as the first synthetic LLM expert model. Added support for Google's language models including Gemini Pro (32K context) and PaLM 2 models: text-bison (8K context), text-bison-32k (32K context), chat-bison (8K context), and chat-bison-32k (32K context). Several models are marked for deprecation on July 6, 2024, including text-bison\@001, text-bison-32k\@002, and chat-bison\@001. All models support streaming and penalties, with token costs of $0.0000005 per completion token and $0.0000025 per prompt token. Added support for Together.ai's foundation model lineup including Yi-34B-Chat (4K context), DeepSeek Coder 33B (16K context), DiscoLM Mixtral 8x7B (32K context), Platypus2-70B (4K context), MythoMax-L2-13B (4K context), and other variants. Models feature varying capabilities with all supporting streaming and temperature controls, while specialized models like DeepSeek Coder focus on programming tasks. Enhanced dashboard readability by adding units to metric values. Requests now show '5 requests', latency shows values in seconds (e.g., '0.5s'), costs display with dollar signs (e.g., '\$10.50'), and errors show counts like '2 errors'. This improvement makes dashboard statistics more intuitive and easier to understand at a glance. API responses now include the original finish reason from the model (like 'length', 'stop', 'content\_filter') instead of always returning 'stop'. This provides more accurate information about why a model stopped generating its response. The splash screen now displays a visual progress bar that automatically animates while the application loads. The progress bar uses an exponential decay algorithm (0.96 multiplier with 40ms intervals) to create a smooth loading animation that fills to 100% once initialization completes, providing better visual feedback during app startup. The Apps page is now the default landing page when accessing the platform (previously Dashboard). The navigation menu has been reordered with Apps moved to the top position, followed by Dashboard, Prompts, Models, and Logs. The Dashboard icon was also updated from a home icon to a statistics/chart icon to better reflect its purpose. Added a search bar to filter models in the Model Pricing page, allowing users to quickly find specific models by name or provider. The search functionality uses the same filtering logic as other model tables and displays a 'no results' message when no models match the search query. This feature was previously only available on other model tables and is now available on the pricing page as well. Tables can now display custom React components when they have no data to show, in addition to the existing string-based labels. The `noRows` prop (renamed from `noRowsLabel`) now accepts either a string for simple messages or a JSX.Element for custom components like interactive empty states. This enables more sophisticated no-data experiences, such as the NoResults component that shows filtered vs. unfiltered empty states with search reset functionality. Corrected a naming inconsistency throughout the frontend where 'Models Pricing' was used instead of the singular 'Model Pricing'. This fix updates the sidebar menu item label, icon references, URL route from '/models-pricing' to '/model-pricing', and internal type definitions to use consistent singular form naming. The dialog for changing app names in the app management detail view can now be dismissed by clicking outside the modal or pressing the Escape key. Previously, users were required to either save changes or use the close button to exit the dialog. Improved the frontend model search functionality to search across multiple fields including namespace, description, and URL. Users can now find models by searching for terms in any of these fields, making it easier to discover models based on their descriptions or repository URLs, not just their namespace. Flowz can now be previewed directly from the Apps Table List without requiring the full App Management context. The FlowzComponent and FlowzModal have been refactored to accept the app object as a direct prop instead of relying on AppManagementContext, enabling readonly Flowz previews from any table view where apps are listed. Enhanced the Retrieval-Augmented Generation (RAG) system by upgrading the keyword extraction model from Mistral Tiny to GPT-3.5 Turbo. This change in the optimized path should provide more accurate keyword extraction while maintaining efficient performance. Added user notification when attempting to use streaming responses. The system now explicitly informs users that streaming is not yet supported by Pulze (even for models that support it natively) and suggests checking back later. Also added validation to check if requested models support streaming and function/tool calls before processing requests. Improved error handling for API key validation with more descriptive error messages. Invalid API keys now return a consistent OpenAI-style error format with clearer guidance, including a link to get valid API keys. API keys must start with 'sk-' prefix and will return standardized error responses if invalid. Redesigned the Base Models table layout with improved visual organization. Model deprecation messages now appear below descriptions instead of in a separate column, providing a cleaner and more intuitive interface. The table structure has been streamlined by removing empty columns and reorganizing content for better readability. Set explicit maximum token limit of 4000 tokens for Slack bot responses, ensuring more comprehensive answers while maintaining reasonable response lengths. This improvement helps prevent truncated messages while optimizing the balance between detailed responses and Slack message constraints. Fixed a typo in the MosaicML provider implementation where 'max\_new\_tokens' parameter was incorrectly spelled as 'max\_new\_tokes'. This bug was preventing proper token length control for MosaicML model completions. Added support for integrating Slack workspaces with Pulze through a dedicated Slack App. The integration allows teams to install the Pulze app, storing team-specific access tokens, bot IDs, and enterprise settings. This feature includes proper authentication handling and prevents the Slack app from being accidentally deleted like other playground apps. Fixed an issue where single message chat completions were being unnecessarily converted into tagged prompts with role labels (\[USER]:, \[ASSISTANT]:, etc.). Now, single messages are kept in their original format, while multi-turn conversations maintain proper role tagging structure. This improves prompt clarity and maintains more natural conversation flow for single-message interactions. The Playground link has been removed from the landing page header navigation and footer sections. Users will no longer see the Playground option in these navigation areas, though the Playground feature itself may still be accessible through other routes in the application. Fixed an issue with organization settings page navigation where the default route redirect was incorrectly configured. The navigation logic has been moved to the proper level in the routing hierarchy to ensure users are correctly redirected to the default organization settings tab when accessing the organization settings section. ## January 2024 Users can now create and manage custom models based on existing base models through new API endpoints. The feature includes the ability to specify custom model names and descriptions, with automatic namespace generation to ensure uniqueness. Custom models can be created, listed, and deleted at the organization level, with full integration into existing app configurations through custom model settings. Added support for GPT-4-0125-preview model with 128K context window, offering improved task completion and reduced 'laziness'. Updated token pricing for GPT-4-1106-preview (0.001¢ prompt, 0.003¢ completion) and GPT-3.5-turbo-instruct (0.0015¢ prompt). The new GPT-4 model supports functions, streaming, JSON mode, and penalty parameters. Added the ability to select a benchmark model for your application in the settings page. The benchmark model serves as a baseline for comparing all requests in terms of score and cost savings. Users can now choose from any available base model in their organization, with the selection displayed showing both the provider and model name for easy identification. Tooltips now display with a smaller font size (0.75rem) for improved visual consistency and readability. The 'success' class styling has been removed in favor of direct font size styling, making tooltips more compact and easier to scan at a glance. Fixed model routing logic to prevent automatic model hopping when using non-synthetic models (specific model names), ensuring retries stay with the requested model. Enhanced usage reporting with more precise decimal formatting for billing cycle usage (now shows 2 decimal places). Also improved Anthropic Claude prompt formatting by removing extra space after HUMAN\_PROMPT. Enhanced the tooltip system across the application to use a more robust implementation with improved type safety and consistent behavior. Tooltips now use the ITooltipOptions type instead of plain strings, providing better control over tooltip positioning, timing, and appearance. This change affects multiple UI components including chips, icons, help elements, range sliders, switchers, and tabs, ensuring more reliable and consistent tooltip display throughout the interface. Enhanced model scoring system for Pulze's synthetic models (pulze and pulze-v0.1) by integrating with a new scoring service. The change improves model routing decisions and adds additional access controls to restrict internal models to Pulze employees only. Model scoring is now handled through a dedicated scoring API endpoint. Modified the request handling logic to maintain conversation history for completion models when converting between completion and chat completion formats. This improvement enables traditional completion models to have better context awareness across multiple interactions, similar to chat models. Enhanced handling of failed payments and subscription management with clearer error messages. Updated subscription error message to better explain upgrade/downgrade options, and added automated notifications for failed payments. Now sends internal alerts for payment issues in non-production environments. Fixed a bug where usage tracking emails and alerts weren't consistently sending when organizations crossed token consumption thresholds. The system now properly tracks and notifies users when they reach soft limits (configurable), hard limits, or percentage-based thresholds (e.g., 80% of quota) of their token allocation. Additionally improved logging and tracking of trial accounts' usage. Added six GPT-3.5 models to the platform: gpt-3.5-turbo-1106 (16K context), gpt-3.5-turbo-16k, gpt-3.5-turbo-instruct (4K context), gpt-3.5-turbo-0613, gpt-3.5-turbo-16k-0613, and gpt-3.5-turbo-0301. All models support streaming, JSON mode, function calling, and response penalties. The 0613 and 0301 variants are marked for deprecation on June 13, 2024. Improved Flowz validation by adding client-side validation that runs before making API requests to the server. This provides faster feedback when Flowz configurations contain errors, and error messages are now displayed directly in the modal interface using a dedicated error component instead of only showing toast notifications. Updated the team page to include Jeev Balakrishnan as CTO & Co-Founder, positioned prominently after the CEO. The update also standardized the founder title formatting from 'CEO and Co-Founder' to 'CEO & Co-Founder' for consistency across leadership profiles. Updated the team page to correctly display Fabian Baier's title as 'CEO and Co-Founder' instead of 'CEO and founder'. This change ensures accurate representation of the leadership structure on the company's about page. Corrected the 'Last Invoice Date' timestamp display in the billing usage section. The date was previously showing an incorrect value due to missing timestamp conversion from seconds to milliseconds. Users will now see the accurate date of their last invoice in the usage summary. Enhanced the pricing table interface with informative tooltips that explain each feature in detail. Users can now see additional information about features like app limits, LLM routing, Flowz configuration, custom prompts, fine-tuning capabilities, and model selection routing by hovering over feature names. This makes it easier to understand the specific capabilities included in each subscription tier. Simplified the user onboarding process by removing Auth0 profile editing restrictions and improving terms acceptance flow. Users can now update their profiles regardless of authentication method, and organizations can accept Terms of Service and Privacy Policy in a single step during organization updates. Enhanced the visual spacing of all major headings on the homepage by applying increased line-height (leading-snug) to improve readability. This affects 11 headline elements across the landing page, including the main hero section, feature callouts, and solution descriptions, making text easier to scan and read. Added a new Grafana dashboard called "Pulze Insights" that provides monitoring and analytics for the platform. The dashboard displays key metrics including the number of organizations, monthly active applications (apps with at least one request in the last 30 days), and request volumes over the last 30 days. The dashboard connects to the API database and excludes Model Monitor applications from metrics calculations. The primary call-to-action button on the homepage now displays 'Try for free' instead of 'Try the Playground', and directs users to the signup page rather than a separate playground URL. This change provides clearer messaging about accessing the platform and streamlines the onboarding experience by taking users directly to account creation. Fixed error response format to better match OpenAI's API specification by re-enabling detailed error fields (code, type, message, param). Also improved error handling by using more specific InvalidRequestError instead of generic APIError for cases like deprecated models, unavailable models, and invalid prompt IDs. Fixed the Community Slack invitation link on the homepage that had expired. Users can now successfully join the Pulze AI Community Slack workspace using the new invite link to connect with other community members, share insights, and collaborate. Enhanced Flowz validation by checking diagrams for recursion before creation, preventing potential infinite loops. Also standardized app creation response format by changing 'app\_id' field to 'id' in API responses. The validation now happens earlier in the process, providing better error messages when diagram configurations would cause recursive loops. Enhanced Replicate API integration with better token control by adding both max\_length and max\_new\_tokens parameters. This improves token limit handling and compatibility across different Replicate models. Also added debug mode support controlled by application settings. Updated Flowz validation endpoint to validate specific app configurations via a new '/validate/for-app/{app_id}' endpoint, replacing the previous flowz\_id validation method. Improved error handling and validation for prompts across the application, with more consistent verification of prompt ownership and access permissions. Corrected a spelling error in the subscription management interface where "accpt" was misspelled. The text now correctly reads "I understand and accept that all my remaining free credits will be voided" when users are reviewing subscription terms. Updated billing UI to explicitly state that all remaining free trial credits will be voided when subscribing to a paid plan. The subscription confirmation dialog now displays the current trial credit balance and requires users to acknowledge that these credits will not transfer to the paid subscription. Added helper text explaining that trial credits expire and cannot be carried over. Slack responses now display the original prompt text above the response, providing better context and conversation clarity for users. The prompt appears as a context block with plain text formatting in the Slack message thread. Modified app usage tracking to make the 'since' parameter optional when querying request history, allowing for more flexible usage reporting. Enhanced app schema documentation by adding detailed field descriptions for API keys, organization relationships, and app settings. These changes improve the developer experience and data tracking capabilities. Introduced a new Slack integration that allows users to query Pulze AI directly from Slack using the /askpulze command. Responses are displayed in-channel with rich formatting, including the model used and response latency. The integration features secure request validation, asynchronous processing, and includes Pulze branding in responses with contextual metadata. Updated the Trial subscription tier to include unlimited applications (previously displayed as 'Unlimited', now '∞ Unlimited') and upgraded support level from 'Community' to 'Personalized Support' with customer success access. Also standardized support level naming across tiers, with the Startup tier now showing 'Community Support' instead of 'Community support'. Fixed the visual appearance of disabled and loading buttons by removing the border. Previously, disabled buttons displayed with an unintended border that made them appear inconsistent with the intended design. Disabled buttons now correctly show with no border, gray background (pulze-200), and gray text (pulze-500). Fixed an issue where promocodes with leading or trailing whitespace would fail to apply correctly. The system now automatically trims whitespace from promocode inputs before processing them with the billing adapter, ensuring more reliable coupon redemption. Improved the Flowz validation system to provide more detailed feedback when validation fails, particularly for recursive flows. Users now receive specific information about which app caused validation failures in recursive scenarios, and a new /validate endpoint allows checking Flowz validity without making changes. The update also adds safeguards to prevent invalid Flowz updates by restoring previous valid states automatically. Modified the model filtering system to automatically hide deprecated models from the pricing table and app settings once they reach their deprecation date. Previously deprecated models remained visible in these interfaces even after their end-of-life date. This change helps prevent users from selecting models that are no longer available. Enhanced promotional code system to support percentage-based discounts in addition to fixed amount discounts. When applying promotional codes, the system now correctly handles both types of discounts, with percentage discounts being automatically calculated based on subscription price. The pricing table UI now displays discounted prices when a percentage-based promotion is active. Standardized call-to-action button labels throughout the homepage to use 'Try for free' consistently. Also corrected the features table label from 'Increase time to market' to 'Decrease time to market' to accurately reflect the benefit of faster development with model testing capabilities. Fixed an issue where the billing system would error when trying to calculate discounted subscription prices for users without an active discount. The system now properly checks if a discount exists before attempting to apply percentage-based price reductions. Trial subscriptions now come with a significantly increased token limit of 1 billion tokens, up from the previous 50 million tokens. This 20x increase gives trial users substantially more capacity to test and evaluate the platform while maintaining unlimited app creation capabilities. Fixed an issue where chat completion messages weren't being properly handled in the Flowz engine - now correctly processes the message object from response choices. Also improved iteration feedback by adding proper logging and clearer error messages when flow execution hits the maximum iteration limit. The engine now properly maintains conversation context between iterations. Fixed the subscription pricing display to only show the strikethrough original price when a discount is actually applied (discounted price is less than original price). Previously, the strikethrough indicator could appear even when the discounted price equaled the original price, incorrectly suggesting a discount was available. Redesigned the subscription plans modal to use a responsive grid layout that adapts from 1 column on mobile to 3 columns on desktop (or 4-5 columns for wider screens depending on plan count). Reduced modal padding from 14 to 7 units for better space utilization, removed fixed width constraints on individual plan cards, and adjusted modal minimum width to 1024px on extra-large screens for optimal viewing. Several legacy OpenAI models have been marked as deprecated as of January 4, 2024: text-ada-001, text-babbage-001, text-curie-001, text-davinci-002, and text-davinci-003. These models will no longer appear in the documentation or be available by default for new applications. Existing applications using these models should migrate to newer alternatives. Introduced Flowz, a new visual workflow system that allows users to create, manage, and execute flow-based logic diagrams for their applications. Users can now create workflow diagrams with nodes and connections, attach them to apps, and the Flowz engine will validate and execute the defined logic. This includes new API endpoints for creating and retrieving Flowz (/flowz), database tables for storing flow diagrams as JSON, and an execution engine that processes the visual workflows. Changed the display label for password-based authentication from 'Password' to 'Email and Password' to provide clearer information about the authentication method. This affects how the login type is shown in welcome emails and user profile information. The change also includes internal improvements to ensure user and organization data is properly loaded when sending verification emails to new users. Added validation to prevent users from being added to an organization they are already a member of through the internal tool. The system now checks membership status before processing the add request and returns a clear error message "You are already part of this organization" if the user is already a member, avoiding duplicate memberships and potential data inconsistencies. Welcome emails can now be selectively controlled in development environments. Emails will only be sent if the feature is explicitly enabled or if the user's email contains 'welcome' as a keyword, making it easier to test the email flow without spamming real addresses. The configuration has been renamed from EMAILS\_ENABLED to WELCOME\_EMAILS\_ENABLED for better clarity. Fixed a security issue where users could grant other members permissions they themselves didn't have. Now when updating another user's permissions, you can only add or remove permissions that you currently possess. For example, if you only have 'editor:all' access, you cannot grant or revoke 'admin:all' permissions from other users. The system now sanitizes permission changes to prevent privilege escalation attacks. Removed the strict billing validation check that previously blocked operations when organization billing details were incomplete. Users will no longer encounter the 'You don't have complete billing details' error (HTTP 417) when Stripe ID, billing email, postal code, or country information is missing from their organization settings. Corrected the Alembic downgrade command in the migration README from 'alembic downgrade base' to 'alembic downgrade -1'. The updated command now properly rolls back the database to the previous migration instead of reverting all the way to the initial base state, which is the correct approach for testing migration rollbacks. When adding a payment method, the system now automatically stores billing details (postal code and country) from the payment method if no address information is currently on file. This eliminates the need for separate billing configuration steps and removes billing verification requirements from multiple endpoints, streamlining the payment setup process for new organizations. Enhanced the organization setup process to validate the domain of the billing email address. When updating organization settings, the system now extracts and verifies the domain from the billing email (the part after '@') to ensure it's valid before saving changes. Improved parameter handling across multiple AI providers. Added support for logit\_bias parameter in AI21 Labs, and added support for presence\_penalty, frequency\_penalty, n, logit\_bias, top\_p, and stop parameters in GooseAI. Added comprehensive documentation comments indicating which parameters are supported or not supported for each provider (Anthropic, Cohere, MistralAI, MosaicML), improving API consistency and transparency. The feedback field in request ratings now accepts null values in addition to text strings. Previously, feedback required a string value (defaulting to empty string), but now it can be explicitly set to null when no feedback text is provided alongside a rating. ## December 2023 Subscription tier cards on the homepage now display 'Try for free' button labels for non-authenticated users or users without organization access. This applies to the Startup, Growth, and Scale tiers for both monthly and yearly subscription options, making it clearer to new users that they can try these plans without immediate commitment. Welcome emails now include a reminder showing which authentication method the user signed up with (Google, Github, or Password). This helps users remember their preferred login method when returning to the platform, reducing confusion and login failures. Updated the application name displayed across the platform to show environment-specific branding. Production displays 'Pulze', development shows 'Pulze (DEV)', and local environments show 'Pulze (LOCAL)'. This affects email subjects, email footers, and other user-facing areas where the company name appears, making it easier to identify which environment you're working in. Fixed an edge case where the system would throw an error when all models in a comparison received a score of 0. The normalization function now correctly handles this scenario by preserving the original zero scores instead of failing with a 417 error, allowing the evaluation to complete successfully. Added MistralAI as a new AI provider with support for three API keys configured with 50 requests per minute (RPM) each using least connection load balancing mode. Users can now access MistralAI models through the Pulze API platform alongside existing providers like OpenAI, Anthropic, and others. Added a new `/permissions` endpoint that returns a list of all available permissions in the system. Internal users (belonging to Pulze Seed organization) will also see internal-only permissions in their results. This allows clients to dynamically discover what permissions are available for role and access control management. Welcome emails now gracefully handle cases where users don't have a first name set by using their email address as a fallback. Previously, if a user's first\_name field was empty, the welcome email would display nothing in the greeting. Now it will display the user's email address instead, ensuring a more personalized experience even when full profile information isn't available. Introduced automatic deprecation handling for AI models with date-based filtering. When a model has a scheduled deprecation date, it will automatically be excluded from available models for requests once that date passes. Users will see warning messages for models approaching deprecation in the format 'This model will be deprecated on YYYY/MM/DD. We recommend you disable it for your app before the deprecation date to avoid failed requests.' This helps prevent failed requests by proactively filtering out deprecated models from the selection pool. Added support for MistralAI as a new provider with three models: Mistral Tiny (mistral-tiny) powered by Mistral-7B-v0.2, Mistral Small (mistral-small) powered by Mixtral-8X7B-v0.1 with 12B active parameters, and Mistral Medium (mistral-medium). All models support 32K context windows and are enabled for chat completions with streaming support. Fixed the verify\_model\_features() function to correctly use policies from either app settings or request headers when validating unsupported features. Made max\_tokens parameter optional for AlephAlpha and Replicate providers instead of returning errors when not provided. Added ignore\_unsupported\_features policy to request labels for better tracking. Resolved an issue where the organization request usage tracking function was incorrectly being awaited as an asynchronous operation when it was actually a synchronous function. This fix prevents potential runtime errors and ensures that usage data is properly recorded to the organization table after each API request. Added support for the response\_format parameter in completion requests, allowing users to specify whether the model should output text or structured JSON objects. When using JSON mode (type: "json\_object"), the API will validate that the model supports JSON output and will return an error if attempted with incompatible models. This follows OpenAI's API specification requiring "JSON" to appear in the prompt context when JSON mode is enabled. Fixed a bug in model validation that would incorrectly throw an error when the 'n' or 'best\_of' parameters were set to None. The validation now properly checks if these parameters exist and are greater than 1 before rejecting requests to models that don't support these parameters, preventing false validation failures. Introduced a comprehensive subscription management system with support for trial periods (21 days default), subscription tiers (SCALE, GROWTH, ENTERPRISE), and billing cycles (monthly/yearly). Organizations now have automatic trial tracking, subscription pause/cancellation reasons, and enhanced usage views for token and cost monitoring across applications and organizations. This enables better billing transparency and subscription lifecycle management. Introduced a new policy 'ignore\_unsupported\_features' (defaults to true) that controls how the system handles unsupported model features. When enabled, requests using unsupported features like frequency\_penalty, presence\_penalty, n, or best\_of parameters are processed normally by ignoring the unsupported parameters. When disabled, requests will fail with a FEATURE\_NOT\_SUPPORTED\_BY\_MODEL error if the target model doesn't support the requested features. This provides users with flexibility to either enforce strict feature compatibility checking or allow graceful degradation when using models with limited capabilities. Fixed the API to properly accept prompts formatted as single-item arrays (e.g., \["Say Hello"]) without throwing an error. The error status code for multiple prompts has been changed from 400 Bad Request to 422 Unprocessable Entity to better reflect the validation error. Multiple prompts in an array still correctly return an error as this feature remains unsupported. Fixed an issue where attempting to invite a user who is already a member of the organization would cause an error. The system now checks if the email address belongs to an existing organization member before creating an invitation, and returns a clear error message. This prevents duplicate invitations and improves the user experience when managing organization members. Users can now create custom model variants with pre-configured prompts that automatically wrap user inputs. When creating a model with a prompt, the system establishes a parent-child relationship between the base model and the custom variant, allowing the prompt to be applied automatically before the request is sent to the provider. This enables teams to standardize prompt templates across their organization without requiring users to manually include them in each request. Updated documentation to clarify that gcloud CLI is a required prerequisite for local development. Added instructions for configuring Docker authentication to access Google Cloud artifact registries using the command `gcloud auth configure-docker us-west1-docker.pkg.dev`, which is necessary for pulling required container images. Users can now create custom model configurations that are linked to specific prompts, allowing for reusable model-prompt combinations. These custom models are organization-specific and can be associated with apps. A new endpoint POST /models/with-prompt enables creating these configurations, and DELETE /models/{id}/with-prompt allows removing them. The system now distinguishes between base model settings and prompt-based model settings when retrieving app configurations. Updated the Milvus vector database infrastructure by upgrading the Milvus Operator helm chart from v0.8.0 to v0.8.6, which targets Milvus 2.3.3. This upgrade brings performance improvements, bug fixes, and enhanced stability to the vector database layer used for semantic search and retrieval operations. Fixed an issue where the 'modified\_on' timestamp was not being updated when users edited their prompts. Now when you update a prompt's content, title, or description, the modification timestamp is correctly set to the current date and time, ensuring accurate tracking of when prompts were last changed. ## November 2023 Added two new Anthropic Claude models: Claude 2.0 with 100K context window and Claude 2.1 with 200K context window featuring reduced hallucination rates. Updated the claude-2 alias to point to the latest claude-2.x model with 200K context window. All models now have improved pricing at $8 per million prompt tokens and $24 per million completion tokens (previously \$15 for both). Users can now submit feedback about how they discovered the platform through a new feedback endpoint. This information is automatically synced to the user's HubSpot contact profile in the 'source\_details' field, enabling better understanding of user acquisition channels. Resolved an issue where the token estimation function would crash when encountering special tokens in prompt text. The encoder now correctly handles all special tokens by treating them as actual special tokens rather than regular text, preventing crashes during token counting for model selection and cost estimation. The default rate limit for API requests has been increased from 60 to 200 requests per minute for both organizations and applications. This change provides more headroom for API usage and reduces the likelihood of hitting rate limit errors during normal operations. Fixed an issue where users would encounter an incorrect error response (401 Unauthorized) when attempting to accept an invitation to an organization they already belong to. The system now properly validates organization membership before checking invitation details and returns the correct error status (409 Conflict) with a clear message indicating the user already belongs to the organization. Logs exported to Grafana Loki can now include custom labels from the response metadata. The exporter automatically extracts any labels defined in the response's metadata.labels field and adds them as stream labels in Loki, making it easier to filter and query logs based on custom attributes like 'unit', 'environment', or other user-defined labels. Corrected a typo where 'frequence\_penalty' was misspelled as 'frequence\_penalty' instead of 'frequency\_penalty' in the Cohere provider implementation. This fix ensures the frequency\_penalty parameter is properly passed to Cohere API calls, allowing users to correctly control the penalty applied to frequently used tokens in generated responses. Organizations and applications now have individual rate limit settings stored in the database, defaulting to 60 requests per minute. This replaces the previous hardcoded limit of 9,500 requests per minute and allows for customized rate limiting on a per-org and per-app basis. Rate limits are enforced through Redis and can be configured independently for each organization and application. Fixed an issue where the metrics system would crash when trying to process items with None labels. The label validation and stringification functions now properly handle None values by returning empty dictionaries, preventing errors when custom labels are not provided. Fixed an issue where prompt templates with the prompt placeholder on non-first lines were incorrectly rejected during validation. The regex pattern now supports multiline prompts by adding the DOTALL flag ((?s)), allowing the placeholder to appear anywhere in the template including after newlines and multiple lines of text. Enabled Cross-Origin Resource Sharing (CORS) for external domains to access specific API endpoints including completions, models management, apps configuration, and logs. This allows web applications hosted on external domains to make direct API calls to endpoints like /completions, /models/rank, /models/active, /logs, and others without CORS restrictions, while maintaining security controls for non-API routes. Corrected a parameter name typo in the Aleph Alpha provider where 'frequence\_penalty' was misspelled and has been fixed to 'frequency\_penalty'. This ensures the frequency penalty parameter is properly passed to the Aleph Alpha API, allowing users to correctly control repetition in model responses when using Aleph Alpha models. Added support for OpenAI's GPT-4 Turbo (gpt-4-1106-preview) model with 128K token context window, priced at $0.01 per 1K prompt tokens and $0.03 per 1K completion tokens. This model offers significantly larger context windows compared to previous GPT-4 versions. Also updated the description for Llama 2 70B Chat to be more accurate. Signup validation no longer rejects email addresses based on DNS pingability checks. Previously, the system would reject emails if their domain couldn't be resolved via DNS lookup (socket.gethostbyname), which could falsely reject valid domains experiencing temporary DNS issues. Now only temporary/disposable email domain checks remain, allowing legitimate users with valid but temporarily unreachable domains to sign up successfully. Fixed a critical bug in the delete user from organization endpoint that was incorrectly setting user status to inactive before deletion, causing the operation to fail. Added comprehensive test coverage for user permission updates and user deletion operations, including verification that users can properly update and remove other users based on their role permissions (admin, editor, viewer). Added support for GPT-4 Turbo (gpt-4-1106-preview) with 128K context window, improved instruction following, JSON mode, reproducible outputs, and parallel function calling. Maximum output tokens: 4,096. Additionally, implemented per-model parameter support configuration, enabling models to declare capabilities like function calling, streaming, JSON output, frequency/presence penalties, and n-parameter support, ensuring API requests only use parameters supported by each specific model. The max\_tokens parameter is now optional (defaults to None instead of 16) and will be automatically set to appropriate values based on model requirements. When a model requires max\_tokens but none is provided, the system uses a default of 16 tokens. This change also improves latency metrics by calculating per-token latency based on actual response tokens rather than the requested max\_tokens limit, providing more accurate performance measurements. Added the ability to create and manage AI models through the user interface. Models can now be designated as chat-type models (a new 'is\_chat' field was added), with existing OpenAI models (GPT-4, GPT-3.5-Turbo, and GPT-4-32K) automatically marked as chat models. The model creation system now properly tracks which user added each model using their Auth0 ID, and includes improved error handling for duplicate error codes. OpenAI chat completion requests now support function calling with tools and tool\_choice parameters. Users can define functions that the model can call during conversations, enabling structured outputs and interactive workflows. The implementation includes support for tool definitions with parameters, function call arguments, and tool choice strategies (auto, none, or specific function selection). Prometheus metrics now support custom labels that can be passed through request metadata, allowing users to add their own key-value pairs for better metric filtering and organization. Custom label keys are automatically validated and sanitized to meet Prometheus naming requirements (alphanumeric and underscores only), with values converted to strings. This applies to all three metric types: model latency gauges, app cost gauges, and app usage gauges. Fixed the logs filtering endpoint to properly handle optional date\_to and app\_ids parameters. Previously, these filters were incorrectly applied even when not provided, causing queries to fail or return incorrect results. The labels endpoint now correctly retrieves label keys and their associated values for filtered log searches. Fixed multiple issues with LlamaIndex integration when using OpenAI-compatible format: corrected response format to include required fields (index, model, object type, id, usage), fixed prompt\_id validation bug that was using wrong variable (app\_update.prompt\_id instead of prompt\_id parameter), removed duplicate metadata fields (provider, namespaced\_model), and standardized model/provider references to use namespace format. These fixes ensure LlamaIndex-based RAG queries work correctly with OpenAI SDK clients. Introduced subscription billing capabilities through Stripe integration, currently available in test mode only. The system now tracks token-based pricing for AI models, with separate costs for prompt and completion tokens stored directly in the database. Added subscription management fields to organizations including subscription IDs and end dates, enabling metered billing for model usage. Added a new 'show' parameter to the prompts list endpoint that allows filtering prompts by visibility scope. Users can now filter to view only public prompts ('public'), organization-specific prompts ('org'), or all prompts ('all', default). The public filter also supports an optional 'include\_for\_review' flag to include prompts that are published but pending review or approval. Enhanced the app update functionality to automatically use the prompt ID from the policies object when no top-level prompt ID is provided. The system now checks both the direct prompt\_id field and the policies.prompt\_id field, ensuring the prompt ID is properly applied to the app configuration even when specified only in policies. Introduced a comprehensive prompt review system that allows prompts to be submitted for publication, reviewed, and approved or declined with reasons. Users can now request to make their prompts public, administrators can review and approve/decline submissions with timestamps tracking published\_on, reviewed\_on, and approved\_on dates, and prompts are now organization-scoped with enforcement preventing users from editing or deleting prompts they didn't create. The system also tracks decline reasons when prompts are rejected during review. Fixed multiple issues with prompt management: Apps now validate that assigned prompts exist and belong to the correct organization before saving. Prompt retrieval and updates now return consistent error messages with proper error codes (INVALID\_PROMPT\_ID) instead of generic 404 errors. Prompt deletion now properly checks for associated apps and prevents deletion if the prompt is in use, returning specific app IDs that need to be updated first. Fixed the error response when non-Pulze employees attempt to access internal resources. The system now returns a proper 'ET\_INTERNAL' error code with detailed messaging ('This resource is only accessible for Pulze's internal admins') and includes the organization name in the error details for better debugging. Previously used a generic 401 unauthorized error without proper error categorization. Updated prompts API endpoints to use standard REST conventions. The create prompt endpoint changed from POST /prompts/create to POST /prompts/, and the update prompt endpoint changed from PUT /prompts/update to PUT /prompts/. This change makes the API more consistent with RESTful design patterns while maintaining the same functionality. Added the ability to delete prompts through a new DELETE endpoint. The system now prevents deletion of prompts that are actively being used by any apps, returning error code ET\_0007 with a list of affected app IDs. When a prompt is successfully deleted, all associated apps are cleaned up appropriately with soft-deletion for apps with request history. Introduced support for custom prompt templates that can be associated with apps and applied to requests. Users can now create prompts with a prompt placeholder that dynamically wraps user queries, enabling consistent prompt engineering across requests. The prompt can be set at the app level (prompt\_id field) or overridden per-request via policies, allowing flexible prompt management for different use cases. Significantly increased API rate limits from 50 to 9,500 requests per minute per app and from 150 to 9,500 requests per minute per organization to align with OpenAI's 10,000 RPM Tier 4 limits. This allows for much higher throughput when making requests through the API, reducing rate limit errors for heavy usage scenarios. Users can now regenerate API keys for their apps directly through the UI using a new endpoint (POST /{app_id}/regenerate-key). When regenerated, the app receives a new API key with the 'sk-' prefix while maintaining the same app configuration. This feature requires editor-level permissions and includes improved error handling for invalid API keys. The engine now calculates the required context window based on prompt length and max\_tokens, then only considers models that can accommodate the request. Uses tiktoken's cl100k\_base encoding to estimate token count and filters candidates accordingly. This prevents selection of models with insufficient context windows and provides clearer error messages when no suitable models are available. Introduced a new prompt management feature that allows users to create, retrieve, update, and list prompts within their organization. Users can now store reusable prompts with titles and descriptions, automatically calculate token counts for prompts, and link prompts to applications. The system includes role-based permissions (viewer, editor, admin) for prompt operations and provides a dedicated API endpoint at /prompts for managing prompt templates. ## October 2023 Resolved an issue where dashboard queries were incorrectly filtering active apps. The bug was caused by comparing 'is\_active is True' which would fail when is\_active was NULL, now fixed to properly check 'is\_active' as a boolean condition. This ensures all active apps are properly included in dashboard analytics and results. Added support for advanced OpenAI completion parameters including n (number of completions), logit\_bias, presence\_penalty, frequency\_penalty, top\_p, stop sequences, and best\_of. The balance verification system now accounts for multiple generations per request (when n > 1 or best\_of > 1), providing more accurate cost predictions and preventing requests that would exceed available balance. Transformed request response JSON structure to follow OpenAI API standards. Response fields like 'id', 'usage', 'object', and 'model' are now at the root level instead of nested under 'metadata', making the API more compatible with OpenAI client libraries and tools. A database migration automatically converts existing request logs to the new format. Improved the model selection logic to ensure models are properly scored and ranked even when only one model candidate is requested. This change ensures consistent scoring behavior across all requests, providing better model performance metrics in the response metadata regardless of the number of candidates. The API response format has also been optimized to exclude null fields for cleaner output. The completion API now accepts prompts as either a string or an array format, enabling compatibility with LangChain's prompt formatting. When a single-element array is provided, it is automatically converted to a string. Multi-element arrays are rejected with a clear error message indicating only one prompt value is supported. Internal administrators can now view and edit AI model configurations directly through the UI. This includes updating model properties such as provider, model name, owner, version (@at), GDPR compliance status, open-source status, default active state, public visibility, context window size, URL, and description. Changes to model identifiers automatically regenerate the namespace to maintain consistency across the system. Organizations can now fully remove their Prometheus (prom) and Loki monitoring integrations by clearing the configuration. When an integration update is sent without Prometheus or Loki settings, the system will now properly clear all related fields (endpoint, id, and token) instead of leaving the previous configuration in place. This allows for complete integration removal rather than only supporting updates. Enhanced security by removing the email verification link from the API response when requesting a new verification email. The verification link is now only sent via email and no longer exposed in the API response payload, reducing the risk of link exposure through API logs or client-side code. Email validation now checks if domains are actually valid and reachable, not just whether they're temporary. When inviting users to organizations or validating email addresses, the system now verifies that the domain exists using DNS resolution, providing clearer error messages like 'Domain @example is invalid' for non-existent domains and 'Domains from @example are not allowed' for temporary email providers. Restructured model management by moving all model configurations (including GPT-4, GPT-3.5, Claude, PaLM, Llama 2, and other models) from hardcoded definitions to a dedicated database table. This change enables dynamic model management and per-app model settings, allowing for more flexible model availability and configuration without requiring code deployments. Fixed an issue in the name guessing function where single-word names (names without spaces) would cause errors. Previously, the function attempted to access the second element of a split name array using index \[1], which would fail for single-word names. Now uses pop() to correctly extract the last word, handling both single-word and multi-word names properly. The organization integrations endpoint has been renamed from `/integration` to `/integrations` for better API consistency. Added field validation to ensure integration credentials (id, token, and endpoint) are not empty strings when configuring Prometheus and Loki integrations. This prevents configuration errors from invalid or missing integration parameters. Organizations can now configure external monitoring integrations through a new API endpoint. Administrators can set up Prometheus and Loki integrations by providing endpoint URLs, authentication tokens, and integration IDs. This enables organizations to connect their monitoring and logging infrastructure directly to the platform. Organizations can now integrate with Grafana Cloud for metrics and logs export. The integration adds support for Prometheus remote\_write protocol and Loki for log aggregation. New organization-level configuration fields allow setting Prometheus endpoints, IDs, and tokens, as well as Loki endpoints, IDs, and tokens for secure data export to Grafana Cloud monitoring services. Ray clusters are now automatically shut down and cleaned up immediately after jobs finish execution. This is configured with shutdownAfterJobFinishes enabled and ttlSecondsAfterFinished set to 0, ensuring resources are released promptly and reducing infrastructure costs for Ray-based workloads. Fixed issues with organization creation to properly handle address fields including city, and improved HubSpot integration to prevent duplicate contacts by checking for existing contacts by email before creation. The system now correctly marks users as existing platform users in HubSpot and uses the pulze\_name field instead of the generic name field for organization tracking. Corrected the maximum token limits for several Llama models: CodeLlama-13b now supports up to 16,384 tokens (previously 2,048), and Llama-2-70b-chat now supports up to 4,096 tokens (previously 2,048). Additionally, updated token request limits for Claude models to 100,000 tokens (previously 4,096) in the knowledge graph seed data. These changes allow users to process longer inputs and outputs with these models. Fixed a bug that prevented updating organization address information (address\_1, address\_2, address\_city, address\_zip, address\_state, address\_country) when submitting a full organization update. The endpoint now correctly accepts and processes all address fields regardless of the update type. Also relaxed validation to allow optional values for expense\_synced\_at and pending\_expense fields. Organization creation now requires display names and org names to be at least 4 characters long, with automatic whitespace trimming. New organizations no longer have auto-generated names or placeholder logos - instead they start with empty values, forcing users to set meaningful names through the UI. This ensures better data quality and more intentional organization naming. Fixed a critical security vulnerability in the logs endpoint where users could potentially access logs from applications belonging to other organizations. The system now properly verifies that all requested app IDs belong to the same organization and that the user has permission to access that organization's data before returning any logs. New users are now required to complete organization setup after registration. Personal organizations are created with empty display names that must be filled in, and users can update organization details during initial setup without requiring editor permissions. The organization name format has been changed to include a timestamp (e.g., 'org-2343252343423-{timestamp}') to ensure uniqueness. Implemented support for applying discount coupons and promotional codes to billing accounts, with coupon details (name, ID, and discount amount) now displayed in payment information. Added 3D Secure authentication flow for payment method verification, where users are automatically redirected to their bank's verification page when required by their card. The card verification process now uses Stripe's SetupIntent API instead of charging a verification fee, providing a smoother onboarding experience without temporary charges. Added support for creating alerts based on Google Cloud Monitoring metrics. Includes a pre-configured alert for high backend latency that triggers when 99th percentile latency for HTTPS load balancer backends exceeds 10 seconds for 5 minutes. The monitoring system now supports both namespace-scoped Rules and cluster-wide GlobalRules for flexible alert configuration across different scopes. Payment processing via Stripe has been upgraded from test credentials to live production credentials. All Stripe transactions will now process real payments instead of test payments. This enables the platform to accept actual customer payments and handle production payment workflows. Model names can now include an optional owner prefix, allowing more specific model identification like 'anthropic/claude-2' or 'meta/llama-2'. The system now correctly parses and matches models with owner prefixes, ensuring proper model selection when the owner namespace is specified in requests. Added comprehensive knowledge graph seed data (dated 2023-10-07) containing performance metrics and pricing information for 54+ AI models across 8 providers including AI21 Labs (j2-ultra, j2-mid, j2-light with 8191 token limits), Aleph Alpha (luminous-supreme, luminous-supreme-control, luminous-base-control with 1990 token limits), and others. Each model includes category-specific performance scores across 20 different domains (Arts & Crafts, Technology & Gadgets, Business & Finance, etc.), pricing per token in USD, latency metrics, and availability status. Enhanced the Mistral-7B-OpenOrca model integration to automatically format prompts with the correct chat template markers when they are not already formatted. This ensures proper model behavior without requiring users to manually add template formatting to their prompts. Fixed a compatibility issue with the mistral-7b-openorca model on Replicate provider. The model now correctly receives input using the 'message' parameter instead of 'prompt', allowing it to process requests properly. This change ensures the model works as expected without breaking other Replicate models. Corrected the model identifier from 'mosaicml/llama2-70B-chat' to 'mosaicml/llama2-70b-chat' by fixing the uppercase 'B' to lowercase 'b' in the 70B parameter designation. This ensures proper model naming consistency in the knowledge graph seed data and may resolve issues with model lookups that expect the correct lowercase identifier. Corrected the model path for Mistral 7B OpenOrca on Replicate from 'a16z-infra/mistral-7b-openorca' to 'nateraw/mistral-7b-openorca'. This fixes a copy error that would have prevented users from accessing this model with the correct repository path. Added support for Mistral-7B-OpenOrca (nateraw/mistral-7b-openorca), a fine-tuned version of Mistral-7B-v0.1 trained on the OpenOrca dataset. This model is available through the Replicate provider with a 4K token context window and costs \$0.000045 per token. Also updated the Mistral-7B-Instruct-v0.1 model configuration to increase its max token context from 2K to 4K tokens and improved its description. Increased maximum token context for MosaicML models to match their actual capabilities. The mpt-30b-instruct model now supports 8,192 tokens (up from 2,048), and the llama2-70b-chat model now supports 4,096 tokens (up from 2,048). Users can now generate longer completions and work with larger contexts when using these models. Added two new MosaicML models available for scoring: MPT-30B-Instruct (30B parameters, 8,192-token context length, trained on datasets including Databricks Dolly-15k, HH-RLHF, CompetitionMath, and others) and Llama2-70B-Chat (70B parameters, 4,096-token context length, Meta's dialog-optimized model trained on 2T tokens with 1M+ human annotations). Also updated the MPT-7B-Instruct model description to reflect it as a 6.7B parameter instruction-finetuned model and corrected its pricing from $0.0000005 to $0.00000005 per token. Added support for MosaicML as a new AI model provider with automatic load balancing across three API keys using least connection mode. Each key is configured with a rate limit of 3,500 requests per minute (RPM) for optimal throughput and reliability. Users can now see whether their email address has been verified in their account settings. Added a new endpoint (/verify-email/request) that allows users to manually request a new verification email if their email is not yet verified. The system now tracks email verification status in user profiles and prevents sending duplicate verification emails to already-verified addresses. Added support for Mistral-7B-Instruct-v0.1, a 7-billion-parameter language model available through Replicate (a16z-infra/mistral-7b-instruct-v0.1). The model has an estimated 2048 token context window and uses the Dolly tokenizer as an approximation for token counting. App descriptions are now required to have at least 1 character when updating an app. Previously, empty descriptions were incorrectly accepted, which could result in apps without proper identification. The API now returns a 422 Unprocessable Entity error when attempting to update an app with an empty description. Improved security for user profile updates by retrieving the auth0\_id from the authentication token instead of the request payload. This prevents users from potentially modifying other users' profiles by manipulating the auth0\_id in the request. Additionally, profile editing is now restricted to only Auth0-authenticated users (excluding social login profiles). Users can now update their profile information including first name, last name, and profile picture through a new PUT endpoint. The update synchronizes changes across Auth0, the database, and HubSpot, ensuring profile consistency across all systems. Additionally, the email verification endpoint has been renamed from '/update-user' to '/verify-email' for better clarity. Fixed the rank playground feature to always set max\_switch\_model\_retries to 0, preventing automatic model switching when ranking models. This ensures that model rankings are tested independently without fallback behavior, regardless of app settings or header policies. Also improved validation to require at least one message in playground requests and better error handling for invalid request IDs. Changed the HTTP status code returned from 204 (No Content) to 411 (Length Required) when the system exhausts all retry attempts without generating a valid response. The error message now clearly states '(no answer generated)' instead of 'Empty response'. Additionally, the default policy for switching between models when requests fail has been reduced from 3 retry attempts to 1, meaning the system will now try a maximum of 2 models (original + 1 fallback) before returning an error. ## September 2023 Fixed an issue with the LlamaIndex integration where payload data and headers were not being properly initialized before processing requests. The fix ensures that `populate_payload_data` is called before `process` in all API endpoints (chat completions, completions, and playground), and adds validation to prevent processing without payload data. This resolves potential errors when using LlamaIndex for custom data retrieval and document querying. Restructured app configuration by separating model weights, policies, and general settings into distinct fields. The previous single 'app\_settings' field has been renamed to 'weights' for model selection preferences, while new 'policies' field (based on LLMModelPolicies schema) controls model behavior constraints, and a new 'settings' field stores general app configuration. This allows for more granular control over app behavior and model selection strategies. Also improved file upload validation to prevent duplicate files with identical sizes from being uploaded. The Playground now automatically optimizes internal requests by default (optimize\_internal\_requests=1). This improvement should result in better performance and efficiency when using the Playground feature, as internal API calls will be optimized without requiring manual configuration. Enhanced the LlamaIndex-based document querying engine with a custom keyword extraction template that better identifies relevant keywords while avoiding stopwords. The engine now uses Claude Instant v1 as the default model for fast document indexing operations. Added comprehensive error handling that returns '(no answer)' with HTTP 417 status when the engine fails to generate a response, instead of silently failing. Enhanced the metrics proxy to support Prometheus series queries (api/v1/series) and label name value queries (api/v1/label/**name**/values) with proper filtering and autocompletion. These new endpoints enable users to query time series metadata and metric names that start with 'pulze\_' prefix, improving the metrics exploration and query building experience. The filtering logic now handles metrics without names and applies key-based access control across all query types including vector operations like sum, rate, and count. Added support for MosaicML as a new AI model provider. Users can now access MosaicML-hosted models through the API with support for temperature, top\_p, and max\_tokens parameters. The integration includes automatic token usage calculation using the GPT-NeoX-20B tokenizer and cost tracking per request. Fixed a TypeError that occurred when the metrics proxy received malformed or unexpected data formats from Prometheus. The fix adds validation to ensure metric data is properly structured as a dictionary with expected keys before filtering, preventing crashes when encountering invalid metric formats. Additionally, improved error handling now provides clearer error messages and logging when receiving invalid JSON responses, empty content, or missing required fields like 'data' and 'result' in the response structure. Fixed an issue where LlamaIndex queries on custom data could return empty responses. The system now retries up to the configured max\_same\_model\_retries limit when an empty response is received, and returns a fallback message '(no answer was generated)' with a 417 status code if all retries are exhausted. This ensures users always receive a meaningful response instead of empty results when querying their custom data. Introduced a parent-child hierarchy for logs by adding a parent\_id field to the request table. This allows logs to be organized in nested structures, enabling better tracking of related requests and sub-requests. The change includes database schema updates and modifications to the log retrieval system to support hierarchical log views. The date\_to filter parameter is now optional when querying logs and dashboard statistics. When not provided, the system automatically defaults to the current time (UTC). This simplifies API requests where users want to retrieve data up to the present moment without manually specifying the end date. Added a new metrics API endpoint (/metrics/prometheus-proxy) that proxies requests to Prometheus for monitoring data. The endpoint automatically filters metrics to show only those with the 'pulze\_' prefix that belong to your specific app based on your API key, ensuring you only see metrics relevant to your application. Supports both GET and POST requests for querying Prometheus data. Restructured the billing API by moving payment-related operations to a new dedicated `/billing/payments` endpoint. This architectural improvement separates payment method management (retrieving, adding, and managing payment cards) from other billing operations, making the API more organized and maintainable. Users will now interact with a cleaner API structure for managing their payment methods and viewing billing information including Stripe payment methods, setup intents, and organization credit balance. Refactored the billing system to follow Stripe's best practices. New users now receive a free starting balance in their account (configured per currency). Added comprehensive payment method validation including minimum charge verification (\$0.50) and balance tracking. Improved payment method deletion with safety checks to prevent removing the last payment method. Changed billing information viewing from admin-only to viewer permissions, allowing more team members to see payment details. Fixed an issue where chat prompts were being formatted with incorrect bracket notation (now uses plain role labels like 'user' and 'assistant' instead of '\[user]' and '\[assistant]'). Also resolved a bug where the wrong prompt format was being logged in chat completions, and improved custom header extraction by filtering out additional common headers (host, origin, referrer) that were unnecessarily being stored. Fixed a critical bug in the playground where the temperature parameter was incorrectly using max\_tokens value instead of the actual temperature setting, and weights were not being properly serialized when ranking models. This caused incorrect model recommendations in the playground interface. The fix ensures that model ranking now uses the correct parameters for accurate results. Fixed the ability to resend invitations to users who previously declined - the system now automatically removes the declined invitation and allows a new one to be sent. Also improved the member removal process to properly handle both active organization members and pending invitations, ensuring they are correctly deactivated and deleted from the database. Additionally, the invitation status field is now strictly validated to only accept 'accepted', 'declined', or 'pending' values. Frontend settings now include the organization's complete balance information (credit balance, free balance, spending limits, pending expenses, and billing zip code). This provides users with immediate visibility into their account balance and spending status when accessing the application settings, without requiring a separate API call to the billing endpoint. Introduced granular retry policies that allow independent configuration of same-model retries and model-switching retries. Users can now specify `max_same_model_retries` (attempts with the same model before switching) and `max_switch_model_retries` (attempts with different models after exhausting same-model retries), replacing the previous single `max_retries` parameter. The engine now intelligently rotates through ranked models based on these policies, providing better control over fallback behavior. The custom data upload endpoint now accepts multiple files in a single request instead of just one file. Users can upload multiple files simultaneously to their apps, with each file being processed and stored individually. The API response now includes details about all successfully uploaded files. Organizations created manually through the API no longer receive free signup credits. Only organizations created during user registration receive the initial free balance. This change simplifies the billing system and ensures consistent credit allocation, with pending charges now being synced and processed more reliably through the updated billing system. Fixed an infinite redirect loop that occurred when accessing shared playground conversations. The issue was caused by the optional bearer token authentication incorrectly reading tokens, which has been corrected by properly making the function async and passing the request object. Additionally, improved the error message to clarify when login is required for private conversations. Updated the rank\_models endpoint to return additional metadata for each ranked model, including the full namespace (e.g., 'provider/model\_name') and attribute information (the '@' suffix). This provides more complete model identification information when querying ranked models by score, making it easier to distinguish between different versions or variants of the same model. Activated the billing system for all organizations (previously restricted to internal use only). New organizations now receive $20 USD in free credits upon signup (reduced from $50). The system automatically syncs pending expenses with Stripe when they reach a threshold, and now properly tracks organization-level rate limiting to prevent abuse during the free credit period. Enhanced the label filtering system to support date range filtering and multiple app selection. When retrieving labels and label values, users can now apply the same date and app filters used in other searches, making label filtering consistent with the rest of the dashboard filtering capabilities. The API now uses a unified FilteredSearch schema for better consistency across endpoints. Log timestamps now store millisecond precision instead of second precision, providing more accurate timing information for API requests and responses. The system now uses `time.time_ns() // 1_000_000` to capture timestamps in milliseconds, enabling better tracking and analysis of request latency and timing. A database migration automatically converts existing timestamps to the new millisecond format. Fixed an issue where the max\_tokens parameter was not being properly applied to Replicate API calls. The parameter is now correctly passed within the input object. Additionally, resolved a compatibility issue where passing temperature=0 would fail; the system now defaults to 0.75 (Replicate's default) when temperature is set to 0. Enhanced the logs filtering interface to support multi-column sorting with customizable sort parameters. Users can now sort logs and application lists by multiple fields simultaneously (such as date, description, user information) instead of being limited to a single descending date sort. The sorting parameters can be passed through the API to create more complex query orderings for better data organization and analysis. Fixed an issue where custom labels sent in request headers would fail to parse if no policies were specified alongside them. The system now correctly handles labels independently of policy definitions, preventing request failures when using labels for tracking without associated policies like max\_retries or timeout. Updated the organization API endpoint from `/org` to `/orgs` for better REST API naming consistency. All API calls to organization-related endpoints now use the plural form `/orgs` instead of the singular `/org`. This change affects the API router configuration and test functions. Introduced privacy level settings for API requests, allowing organizations to control data handling policies. Added cost tracking capabilities with a new 'costs\_incurred' flag to distinguish between billable and non-billable requests. Organizations can now receive free credits through a new 'free\_balance' field for managing promotional or trial usage. Improved label filtering to return only unique label keys and values by adding GROUP BY clauses to the database queries. This eliminates duplicate entries when retrieving available label keys and their corresponding values, making the label filtering interface cleaner and more efficient. Introduced a new Playground feature that allows users to test completions and chat completions with model ranking capabilities. The playground includes dedicated endpoints for ranking models based on weights, temperature, and prompts, and provides logs for monitoring requests. Users can now experiment with different settings and see which models perform best for their specific use cases before integrating them into their applications. Enhanced LlamaIndex integration to include file metadata (file names) when indexing documents, which provides better context for AI responses. Added support for multiple response modes (compact\_accumulate, tree\_summarize, refine, simple\_summarize, no\_text, accumulate, compact) configurable via headers, and implemented proper temp directory cleanup to prevent resource exhaustion and hanging issues. Users can now delete custom data files that have been uploaded to their apps. The API endpoint has been updated to allow deletion by file ID (/custom-data/{app_id}/files/{file_id}), and file size tracking has been added to custom data uploads to display how much storage each file uses. This gives users better control over managing their app's data and storage. Custom data files uploaded to apps are now stored directly in the database using a new `app_custom_data` table, replacing the previous filesystem-based storage. This improves data management, backup reliability, and simplifies deployment architecture. Files are temporarily extracted to disk only during query processing and automatically cleaned up afterward. The /completions endpoint now supports LlamaIndex integration for querying custom uploaded documents. When files are uploaded for an app, the endpoint automatically switches to using LlamaIndex with a KeywordTableIndex to query the custom data instead of standard completions. The previous /llama endpoint has been consolidated into the main /completions endpoint, providing a unified interface for both standard and custom-data-powered completions. Added ability to upload custom data files to apps and query them using LlamaIndex integration. Users can now upload files through POST /apps/custom-data/{app_id}, delete files via DELETE endpoint, and view uploaded files with their MIME types when retrieving app details. The completions endpoint now supports custom data retrieval by loading uploaded files from app-specific directories and using LlamaIndex's KeywordTableIndex for semantic search over the custom documents. Added a new `/labels` endpoint that enables autocomplete functionality for filtering logs by labels. Users can now retrieve all available label keys across their logs, or when a specific key is provided, fetch all possible values for that label key. This improves the filtering experience by allowing users to discover and select from existing label keys and values rather than typing them manually. Changed the custom header naming convention for better consistency and clarity. Headers for passing custom configuration are now prefixed with 'Pulze-' (e.g., 'Pulze-Labels', 'Pulze-Weights', 'Pulze-Policies') instead of the previous 'Custom-Labels' format. This standardization makes it clearer which headers are Pulze-specific and improves API consistency across all custom configuration options. Corrected the billing information endpoint to properly return the billing zip code field. Previously, the endpoint was incorrectly configured and wasn't returning the billing\_zip value in the organization spending limits response, which now properly includes this field to match the expected data model. Organizations can now add and manage multiple payment methods for billing. The system automatically detects and prevents duplicate cards using Stripe's card fingerprint verification. Added a new delete endpoint to remove payment methods, and top-up payments can now specify which payment method to use instead of defaulting to a single card. Fixed an issue with the model ranking API endpoint that was failing due to the removal of the 'max\_num\_models' attribute from the request payload. The endpoint now uses a fixed constant of 3 models for ranking instead of accepting a user-configurable parameter, ensuring consistent behavior and preventing errors when requesting ranked model recommendations. Optimized the temporary email domain checking mechanism by switching from a List to a Set data structure when validating against approximately 162,000 disposable email domains. This change reduces lookup time from linear O(n) complexity to constant O(1) complexity, providing near-instantaneous domain validation regardless of list size. The performance improvement is particularly noticeable when processing multiple email validations. Added comprehensive knowledge graph seed data containing performance benchmarks across 20 categories for multiple AI model providers including AI21 Labs (j2-ultra, j2-mid, j2-light with 8191 token limits), Aleph Alpha (luminous-supreme, luminous-supreme-control, luminous-base-control with 1990 token limits), and others. Each model includes detailed metrics such as pricing per token, latency ratings, weight scores, and category-specific performance scores across domains like Arts & Crafts, Technology & Gadgets, Health & Wellness, and more. This data enables better model selection and routing based on use case requirements. Updated the Anthropic Claude v2 model identifier from 'anthropic/claude-v2' to 'anthropic/claude-2' for consistency with Anthropic's naming conventions. This change affects how the model is referenced in API calls. The model maintains its 100K token limit and description as Anthropic's best-in-class offering for complex reasoning tasks. Improved the models API endpoint to return structured model information including provider, model name, and type details instead of plain strings, making it easier to integrate and display model data. Added a new `/models/all` endpoint to retrieve all available models in the platform. API keys now use a configurable prefix (via KEY\_PREFIX setting) instead of hardcoded 'sk-' prefix, and validation ensures keys start with the correct prefix. Fixed an issue where GooseAI API responses with null finish\_reason values would cause errors. The provider now explicitly handles null finish\_reason by converting it to an empty string, ensuring more reliable response processing when using GooseAI models. Introduced a new `/docs-models-table` endpoint that returns an HTML table of available models for documentation purposes. For requests from allowed documentation origins (docs.pulze.ai), the table displays all models including third-party providers. For requests from other origins, only synthetic Pulze models (pulze, pulze-v0) are shown for security. The table includes model names, descriptions, providers, token limits, cutoff dates, and active status with sortable columns. The 'category' field has been removed from the PulzeEngineModelRanking response schema in the ranked models endpoint. This field was previously deprecated and always returned "(deprecated)" as its value. API responses will now only include the 'models' field containing the ranked list of models, making the response cleaner and more straightforward. API request charges are now processed in the background instead of synchronously, improving response times. The system now tracks pending expenses between Stripe syncs with new fields (pending\_expense, expense\_synced\_at, currency) in the organization table. Added the ability to update spending limits (hard/soft) directly through a new API endpoint, and enhanced top-up functionality with background email notifications. The API now consistently returns the request log ID (log\_id) in error responses, making it easier to track and debug failed requests. Previously, when a request failed, the log ID was not included in the error response. This improvement includes latency information in failed request responses and ensures background tasks complete even when errors occur. ## August 2023 Introduced the ability to share playground conversations with others. Users can now generate shareable links for their playground chat sessions, continue from shared conversations, and control visibility settings. The feature includes a new database schema to track shared conversations with unique hashes, titles, and continuation chains, along with a new `/playground` API endpoint to support this functionality. Added support for CodeLlama 13B (replicate/codellama-13b), a 13-billion-parameter Llama model tuned for code completion with 2048 max tokens. The model is initially disabled by default and uses an optimized tokenization approach that doesn't require loading the full LLaMA tokenizer. Added support for Anthropic's Claude 2 model (anthropic/claude-2) with 100,000 token context window. Claude 2 is described as Anthropic's best-in-class offering with superior performance on tasks that require complex reasoning. This update also upgrades the Anthropic SDK to version 0.3.11 with improved error handling for timeouts, connection errors, and rate limits. Added a new Prometheus-compatible metrics endpoint to expose model performance and usage statistics. The ModelMonitor has been updated to use a new response type that better integrates with Prometheus monitoring and alerting systems, enabling improved observability of model API calls, latencies, and error rates. Fixed a security issue where soft-deleted apps (is\_active=False) could still be accessed through API keys, log queries, and shared playground conversations. Now all database queries properly filter out inactive apps, ensuring deleted apps are completely inaccessible. Also added proper 404 error handling for shared playground conversations when no chats are found. Enhanced the model selection algorithm to prioritize models based on their performance in specific task categories (e.g., coding, reasoning, writing). When selecting a model for a particular task, the system now uses the model's category-specific quality score if available, falling back to the average across all categories only when needed. The scoring formula was also improved to properly weight quality (higher is better), latency (lower is better), and cost (lower is better) with normalized values in the 0-1 range. The Playground now supports benchmarking up to 3 models simultaneously, increased from the previous limit of 2 models. This allows users to compare performance and outputs across three different language models in parallel during completion requests, making it easier to evaluate model behavior side-by-side. Added a new `/healthz` endpoint that monitors API service health by checking database and Redis connectivity. The endpoint returns detailed status information including whether each service is active, individual latency measurements for database and Redis operations in milliseconds, total request latency, and current server time. This enables better monitoring and troubleshooting of the API infrastructure. Implemented Playground V2 with the ability to share playground conversations via unique shared IDs. The playground now returns normalized request logs with full conversation history instead of simple message pairs. Added support for custom model weights during app creation and improved model ranking responses with detailed metadata including score breakdowns and reasoning. Multiple API endpoints including logs, models, and app updates can now be accessed using App-specific API Keys in addition to user tokens. When using an App's API Key, log endpoints automatically filter to only show logs for that specific app. Additionally, fixed an issue where model weights would not be properly applied when ranking models - the system now correctly falls back to the app's default settings when custom weights are not provided. Added API key authentication support to the `/models/rank` endpoint, allowing users to retrieve ranked best model recommendations programmatically. The endpoint now accepts playground completion requests with configurable parameters including messages, temperature, max\_tokens, and a new `max_num_models` parameter (default: 2) to control how many top-ranked models are returned with their scores. Previously, this model ranking functionality was only available through other authentication methods. API responses now include detailed model scoring information in the metadata, showing the ranking and scores of the top models considered for each request. This provides transparency into how the system selected the best model, including quality, latency, and overall scores for the top candidates evaluated by the engine. ## July 2023 Implemented a comprehensive email invitation system that sends welcome emails to users invited to join an organization. New users receive an email with an email verification link, while existing verified users receive a welcome email without verification. The system includes Auth0 integration for email verification, custom email templates with organization and inviter details, and automatic handling of verification tickets. Users are redirected to the platform after completing the invitation process. Introduced a comprehensive billing system that allows organizations to manage payment methods and top up account credits through Stripe. Organizations now have a credit balance tracking system, can add and store payment methods (cards), view payment information, and perform top-ups that automatically update their account balance. The system includes support for spending limits (soft and hard) and creates transaction history for all balance changes. New users automatically get a Stripe customer account created during registration. Significantly improved the speed of model selection by optimizing the model transformer initialization. The sentence transformer model (paraphrase-distilroberta-base-v1) is now loaded once globally instead of per-request, reducing latency for prompt classification. Additionally, the scoring algorithm now skips computation entirely when users specify a model directly, further improving response times. Enhanced the Playground's model recommendation engine to consider the complete chat completion request payload when suggesting best models. The engine now receives full context including messages, parameters, and settings to provide more accurate model recommendations tailored to your specific request. Updated invitation error messages to be clearer and more actionable, including specific guidance when invitations are not found, have incorrect status, or belong to different emails. Fixed a bug in name parsing where users with single-word names (no space in full name) would cause errors, now properly handles names without spaces by setting last name as empty string. The terminology for managing API access has been updated from 'Keys' to 'Apps' across the entire application. This includes renaming the /keys API endpoint to /apps, updating database tables and columns (key → app, key\_configuration → app\_settings, key\_id → app\_id), and revising all related UI references. This change provides clearer terminology that better reflects the concept of managing application configurations rather than just API keys. Users can now override default model selection weights (quality, cost, latency) by passing a custom 'weights' object in the custom-labels header. This allows fine-grained control over model selection criteria on a per-request basis, enabling users to optimize for specific priorities like cost-efficiency or response quality without changing their API key configuration. Enhanced the logs endpoint to support filtering capabilities through a new FilterLogsRequest parameter, allowing users to narrow down their log queries. Additionally, strengthened security by enforcing organization-level access control across all key and log operations, ensuring users can only access logs and API keys belonging to their organization. Added a response\_text column to the request table for improved log readability. Added support for two new synthetic model identifiers: 'pulze' and 'pulze-v0'. When users specify either of these models in their API requests, the system automatically routes to the optimal underlying model through Pulze's intelligent selection. These models bypass the standard allowed model list and are available to all API keys, enabling users to leverage Pulze's model routing without specifying a particular provider's model. Users' Terms of Service and Privacy Policy acceptance is now tracked with timestamps. The API now returns the last review dates for both documents through a new /general/settings endpoint, allowing the frontend to display when users last accepted these policies. Added a new /general/accept-terms endpoint to record when users accept either the privacy policy or terms of service. Apps (API keys) created without a description now automatically receive a randomly generated name combining an adjective and noun (e.g., 'autumn\_waterfall', 'silent\_moon'). Previously, Apps could be created with no name, making them difficult to identify. This improvement makes it easier to distinguish between multiple Apps in your organization. Corrected a bug where the Errors and Latency graphs were displaying each other's data. The Errors graph now correctly shows the count of failed requests (status code >= 400), while the Latency graph displays average request latency in seconds. Additionally, the Savings graph now displays positive values instead of negative values, and graph calculations now properly handle datasets with zero values. Switched the model monitoring from gpt-neo-20b to replicate/dolly-v2-12b. Additionally, the falcon-40b-instruct model has been deactivated and is no longer available for use. This change updates the model lineup to use Dolly v2 12B from Replicate as the monitored model. The dashboard now allows the API to control which graphs are displayed and whether data should be shown cumulatively or as individual data points. This includes four main graphs: Requests (blue), Errors (red), Latency (blue), and Cost Savings (green). Users can now toggle between cumulative and non-cumulative views for each graph independently through the API configuration, providing more flexible data visualization options. Added two new AI providers to the knowledge graph: AlephAlpha with luminous-supreme model (1990 token limit, \$0.000038 per token) and Hugging Face integration. The knowledge graph has been updated with new performance metrics across 20 content categories for existing models, showing recalibrated scoring data for better model selection and routing. Refreshed the visual appearance of analytics graphs to use the official Pulze color palette. Requests and Latency graphs now display in Pulze blue (#017EFA), Error graphs in red (#EF4444), and Savings graphs in green (#14BD81) for improved brand consistency and visual clarity. Colored logging is now disabled by default and can be enabled by setting the PULZE\_LOG\_COLOR environment variable to 'True'. Previously, colored output was always enabled, which could cause issues in environments that don't support ANSI color codes. The logger now defaults to plain text formatting unless explicitly configured otherwise. GooseAI models gpt-j-6b and gpt-neo-20b (both with 2048 max tokens) have been disabled by default and are no longer active for use. These models can still be manually enabled if needed, but will not be available in the default model selection. Cleaned up four legacy OpenAI base models (davinci, curie, babbage, and ada) from the knowledge graph seed data. These older base model configurations have been removed while their versioned counterparts (text-davinci-003, text-curie-001, text-babbage-001, and text-ada-001) remain available. This streamlines the model catalog by removing duplicate entries for deprecated model naming conventions. Added a new cumulative cost savings graph that tracks the total amount saved over time by using the platform. The graph displays savings in dollars and accumulates values across the selected time period, providing visibility into overall cost optimization. The requests graph color was also updated from green to blue to differentiate it from the new green savings graph. Added support for Replicate as a new AI provider, including the dolly-v2-12b model (replicate/dolly-v2-12b). This integration includes automatic token counting using the databricks tokenizer, cost calculation, and full completion API support with configurable temperature and max\_tokens parameters. Added a new endpoint '/playground-best-models' that returns the top 2 recommended models based on the engine's analysis of your completion request. Enhanced the API key management system to track newly available models and provide model update information when retrieving or updating keys. Added a new '/merge-models' endpoint that allows enabling or disabling newly available models for specific API keys in bulk. Added support for Huggingface as a new LLM provider, enabling access to the open-source falcon-40b-instruct model. The integration includes automatic token counting using the model's tokenizer, cost calculation, and error handling for endpoint availability. Users can now select Huggingface models through the API alongside existing providers like OpenAI, Anthropic, and Cohere. ## June 2023 User profiles are now automatically updated with the latest information from Auth0 during both sign-in and sign-up. This ensures profile details like name, email, email verification status, and profile picture remain synchronized between authentication provider and the application. Previously, existing users' information was not updated after initial registration. Fixed a bug in the GooseAI provider where usage data was being read from the wrong response object (response\_openai instead of res), which would cause errors when processing completion requests. This ensures that token usage information is correctly extracted from the API response. Fixed a critical bug in the GooseAI provider where the 'choices' key was not being properly extracted from API responses, which would cause request failures. The fix ensures that completion choices are now correctly populated in the response object before calculating tokens and metadata. Improved latency tracking across all LLM providers with standardized measurement. The Aleph Alpha provider now includes latency metrics in response metadata, matching the behavior of other providers like OpenAI, Anthropic, and Cohere. Latency is measured in seconds and rounded to 4 decimal places for consistency. Fixed a calculation error in the dashboard's time interval generation that caused incorrect minute-level data grouping for time periods before 1AM. The issue was caused by calculating the range size in hours and multiplying by 60, instead of directly calculating the range in minutes. This affected minute-granularity charts displaying data for time ranges under 8 hours. Users can now retrieve a specific log entry by its unique ID through the GET /logs/{log_id} endpoint. This enhancement improves log inspection capabilities by allowing direct access to individual log records instead of only viewing paginated lists. The endpoint includes proper authorization checks to ensure users can only access logs from keys within their organization. Fixed date format in statistics graph responses by removing the incorrect '.000Z' timezone suffix. Date fields now return in the format 'YYYY-MM-DDTHH:MM:SS' instead of 'YYYY-MM-DDTHH:MM:SS.000Z', providing more accurate timestamp representation without falsely implying UTC timezone when timezone information wasn't being properly set. Added support for AI21 Labs models (j2-ultra, j2-mid, j2-light with 8191 token context) and Aleph Alpha models (luminous-supreme and luminous-supreme-control with 1990 token context). All models include performance scores across 20 categories including Arts & Crafts, Technology & Gadgets, Business & Finance, and more. Updated knowledge graph to version 2023-06-22 with complete category scoring for intelligent model routing. Added support for AlephAlpha's Luminous model family with 4 models: luminous-supreme (largest, best for creative writing), luminous-supreme-control, luminous-base-control (fastest and cheapest, ideal for classification), and luminous-extended-control. All models support up to 1,990 max tokens and are optimized for different use cases including information extraction, language simplification, classification, and labeling tasks. Fixed an issue where API responses from model endpoints could be missing the 'created' timestamp field. The timestamp generation has been moved from the database layer to the engine provider layer, ensuring all responses include a proper Unix timestamp indicating when the response was created, even when errors occur during request processing. Resolved a critical bug where OpenAI API responses were being overwritten, causing the service to break. The fix ensures that response data from OpenAI's completion and chat completion endpoints is now properly preserved by storing the API response in a separate variable (response\_openai) before extracting choices and usage data into the final response object. This affects both OpenAI text completion and chat completion models. Improved the playground chat interface to display the active optimization goal (e.g., 'Optimizing for: cost' or 'Optimizing for: latency') as a separate label. The weight distribution label now uses an 'info' style instead of 'success' for better visual distinction between the optimization goal and weight values. Expanded API provider support by adding three new vendors: AI21 Labs, Aleph Alpha, and Anthropic. These providers are now included in the API key seeding configuration alongside existing providers (Cohere, GooseAI, and OpenAI), allowing users to configure and use models from these additional vendors through the API. Added support for AI21 Labs (now called AI21 Studio) as a new model provider. Users can now access AI21 Labs models including j2-mid for text completion tasks. The integration includes full support for token counting, cost calculation, and latency tracking with configurable parameters like temperature, top\_p, max\_tokens, and number of results. Added integration with Aleph Alpha as a new AI model provider. Users can now access Aleph Alpha's language models through the API, with full support for completions, token counting, cost calculation, and latency tracking. The provider is now available alongside existing providers (OpenAI, Anthropic, Cohere, GooseAI). Added support for AI21 Labs (now called AI21 Studio) as a new AI provider. Users can now access AI21's J2-Mid model through the platform for text completion tasks. The integration includes full support for model parameters like temperature, top\_p, max\_tokens, and multiple completions (n parameter), along with token usage tracking and cost calculation. Corrected a typo in the Anthropic provider where the temperature parameter was misspelled as 'temperatur', causing the temperature setting to be ignored during API calls. This fix ensures that temperature values are now properly applied to Anthropic model completions, allowing users to correctly control response randomness. Dashboard analytics now support per-minute granularity for time ranges under 8 hours, enabling more detailed monitoring of recent API usage patterns. Added the ability to filter dashboard metrics by specific API keys, allowing users to analyze performance and costs for individual keys. The key creation response now includes the key\_id field for easier reference in filtering. Updated the model knowledge graph to the 2023-06-15 version with enhanced data accuracy. Improved token usage tracking by changing category types from integers to floats for more precise cost calculations and model weight handling. Fixed token usage validation in OpenAI providers to properly handle new usage data fields and log warnings for unknown keys. Added a new API endpoint at /models that allows users to retrieve a list of active models based on their API key permissions and current model availability. This endpoint validates the user's API key and returns models filtered by their account's model settings and the knowledge graph stored in Redis. Improved API response times by configuring the Cloud Run autoscaler to maintain a minimum of 25 instances running at all times. This eliminates cold starts for most requests, ensuring faster and more consistent API response times, especially during periods of low traffic or after idle periods. Changed the API to process requests sequentially (concurrency=1) instead of using async threadpools to resolve thread safety issues with PyTorch model instances and SentenceTransformer. CPU allocation reduced from 4 cores to 1 core per container to match the new single-request processing model. This ensures more stable and reliable request processing, though individual requests may now be handled one at a time per worker instance. Introduced a new playground API endpoint that allows users to test chat completions directly through the UI. The playground now displays comprehensive response metadata including provider and model information, latency metrics, cost estimates, and configurable quality/speed/cost weighting preferences. Users can save and review their playground conversations with detailed labels showing performance characteristics. Switched the prompt categorization system back to the SBERT (Sentence-BERT) model using 'paraphrase-distilroberta-base-v1' for better accuracy, replacing the previous TF-IDF vectorizer approach. Additionally, enabled multi-worker processing in both development and production environments by utilizing all available CPU cores, which significantly improves API throughput and request handling capacity. Introduced a new Dashboard API endpoint that provides comprehensive usage statistics including request counts, token usage, cost savings metrics, and latency data. Added playground functionality with mock fetch capabilities for testing API requests. Enhanced database views to track latency metrics from request metadata for better performance monitoring. Added support for Anthropic's Claude v1 models for text completion requests. The integration includes full token calculation, cost tracking, and usage metrics. Users can now access Claude models through the Pulze API with automatic prompt formatting using Anthropic's HUMAN\_PROMPT and AI\_PROMPT format, with support for configurable temperature and max tokens parameters. Fixed token counting for Cohere API responses when using the text\_completion endpoint. The system was incorrectly looking for tokens in choice\['message']\['content'] instead of choice\['text'], which caused token calculation failures. Also corrected the provider name from 'cohereai' to 'cohere' in the knowledge graph configuration. Standardized all text completion request types from 'completions' to 'text\_completions' across all providers for consistency. Fixed Cohere text completion response format to return 'text' field directly instead of wrapping it in a 'message' object with role and content, aligning with standard text completion response structure. The /logs endpoint now returns the HTTP status code for each request in the response data. This allows users to see the status code (e.g., 200, 400, 500) alongside other request details like prompt, payload, and response, making it easier to track and debug API request outcomes. Fixed a configuration bug where the least\_connection mode setting had trailing whitespace for Cohere, OpenAI, and GooseAI vendor environment variables. This whitespace could have caused the mode to be incorrectly interpreted, potentially affecting load balancing behavior across these API providers. Fixed a critical issue where the API key validation system would crash when attempting to decode Redis values that were missing or returned unexpected data types. The system now gracefully handles these exceptions by logging the error and returning false for invalid keys, preventing service disruptions during rate limit checks. Fixed an issue where Cohere API keys were not being included when seeding the Redis database with provider API keys. The seeding script now properly includes Cohere alongside OpenAI and GooseAI providers. Added verification logging to confirm all three providers (Cohere, OpenAI, GooseAI) are successfully seeded. Renamed the 'cohereai' provider to 'cohere' throughout the system for consistency with the official provider name. This affects the Cohere command and command-light models (4096 token max). Added infrastructure support for Cohere API keys in cloud deployments with least-connection load balancing mode. Added initial support for CohereAI as a new LLM provider, enabling users to access Cohere's language models through the completions API. The integration includes automatic token calculation, cost tracking, latency metrics, and full error handling. Additionally, fixed the scoring algorithm to properly handle inverse metrics for cost and latency optimization when selecting models. Enhanced the log export process to properly exit with error status code 1 when aborting due to missing 'good\_answer' column tables. This ensures that automated scripts and CI/CD pipelines can correctly detect when log export operations fail, preventing silent failures in production environments. Fixed an issue where users with only a personal organization would receive an empty organization list. The API now correctly returns all organizations a user belongs to, including their personal organization, ensuring users always see their available organizations. Users can now rate API request logs with a thumbs up/down and provide textual feedback through a new rating endpoint. Organization profiles can now be updated with comprehensive billing information including address fields (address\_1, address\_2, city, zip, state, country), billing email, and spending limits (soft and hard limits). Personal organizations are restricted from these modifications. When the service is over capacity or hits rate limits, users now receive a clear, actionable error message: 'We are currently over capacity. Please try again later, and if the problem persists, contact [support@pulze.ai](mailto:support@pulze.ai) for further assistance.' Previously, users would see raw technical error messages. Additionally, error details are now logged for better debugging and support. Fixed an issue where rate limit errors (HTTP 429) were not being properly formatted when returned to users. The error detail is now converted to a string format, ensuring error messages are displayed correctly instead of potentially showing object representations. This affects all API endpoints that enforce rate limiting. Updated the error message shown when the service is over capacity from 'We are over capacity' to 'We are currently over capacity' to provide clearer communication about the temporary nature of the issue. This message appears when OpenAI or GooseAI API keys are unavailable, and continues to direct users to contact [support@pulze.ai](mailto:support@pulze.ai) if problems persist. Added proper error handling for when the service exceeds capacity and Redis cannot provide valid API keys. Users now receive a clear HTTP 429 (Too Many Requests) error message stating "We are over capacity. Please try again later, and if the problem persists, contact [support@pulze.ai](mailto:support@pulze.ai) for further assistance" instead of experiencing undefined behavior. This applies to OpenAI and GooseAI API endpoints. The OpenAPI specification endpoint (/api/v1/openapi.json) is now disabled by default to prevent exposing internal API schema details in production. This security enhancement prevents potential attackers from viewing the complete API structure and available endpoints. The endpoint can still be manually enabled in development environments if needed. Added support for a new 'Custom-Labels' header that allows users to attach custom key-value labels to API requests for tracking and categorization purposes. Labels are passed as JSON objects in the header and are returned in the response metadata, enabling users to organize and filter requests by custom dimensions like environment (internal/external), request type, or any other custom attributes. Users can now specify a target model in their API requests using the 'model' parameter, allowing direct model selection while still validating against allowed models for the API key. The engine will use the specified model instead of the automatic scoring system, enabling manual model selection for testing and specific use cases. Added friendly error handling with HTTP 418 status for requests targeting the 'pulze' model name. ## May 2023 Added intelligent load balancing system to distribute requests across multiple API keys for OpenAI and GooseAI providers. The system supports least-connection mode to select API keys with the lowest active request count, respects rate limits (RPM - requests per minute) for each key, and automatically tracks usage counters in Redis. This helps prevent rate limit errors and improves reliability by distributing load across available API keys. API Keys table now supports sorting by multiple columns with ascending or descending order. Users can sort keys by any column (such as creation date, name, or request counts) and specify custom sort orders. The endpoint was changed from GET to POST to accept sorting parameters including column name, sort direction (asc/desc), and whether to enable multiple column sorting. Changed billing endpoint permissions from Viewer to Editor level, requiring Editor access to view billing information across all time periods (minute-by-minute, daily, and monthly usage). Additionally, updated organization admin list visibility to require Admin permissions instead of Viewer permissions, and removed the Admin permission requirement for creating new organizations, allowing any authenticated user to create their first organization. Introduced a new 'cost\_savings' field in API response metadata that calculates potential savings based on provider and usage patterns. Improved cost calculation by computing costs before adding to metadata, ensuring more accurate estimates. The cost savings feature compares actual costs against baseline costs to show users how much they're saving through optimized routing and model selection. Implemented validation logic to verify that models are both allowed for an API key and active in the knowledge graph before processing requests. The system now returns clear error messages when the knowledge graph is unavailable (503 Service Unavailable) or when no valid models are found for an API key (404 Not Found), preventing requests from failing silently and improving debugging capabilities. Enhanced error handling across OpenAI and GooseAI API endpoints to return more accurate HTTP status codes. Rate limit errors now properly return HTTP 429 (Too Many Requests) instead of 404, invalid request errors return HTTP 400 (Bad Request) with detailed error messages, and unexpected errors return HTTP 500 (Internal Server Error). This provides clearer feedback when API calls fail and helps developers debug issues more effectively. Enhanced the API server to properly handle HTTPS connections when deployed behind reverse proxies like Nginx. Added automatic HTTPS redirect middleware to ensure all connections use secure protocols, and configured Uvicorn with forwarded-allow-ips='\*' to correctly process forwarded headers from proxy servers. This replaces the previous custom redirect handling middleware with a more robust solution. Removed the latency column from request tracking and the average latency metric from API key statistics. The API key list view no longer displays average response latency per key, simplifying the metrics shown to focus on request counts (total requests and last week requests) along with status codes. Merged organization members and invites into a single endpoint that displays both existing members and pending invitations together. The API key list now includes creator information (name and picture), request statistics (total requests and last week's requests), and average latency metrics for each key. Also added the ability to soft-delete API keys by marking them as inactive instead of permanently removing them. Fixed an issue where API requests with trailing slashes were being automatically redirected (HTTP 307) to URLs without trailing slashes, which could cause problems with certain API clients. The new middleware now properly handles routes regardless of trailing slashes without performing redirects, ensuring more predictable API behavior. Fixed an issue where API requests without trailing slashes were being automatically redirected, causing problems with certain API calls. The API router now handles URLs consistently regardless of whether they end with a trailing slash, preventing unexpected redirect behavior that could break integrations or cause request failures. Fixed an issue in the chat format converter where messages with roles other than 'user' or 'assistant' would cause errors or be silently dropped. The converter now includes a fallback handler that preserves the content of messages with unrecognized roles, ensuring all chat messages are properly processed. Fixed a bug in the chat message format converter where non-list chat messages would result in an empty string being returned. The function now correctly processes individual chat messages by properly assigning the formatted prompt string to the return variable, ensuring chat messages are converted to the expected format (User: ... / Assistant: ...) even when processing single messages. Fixed organization invitation permissions to properly use the standardized PERMISSIONS.VIEWER.ALL constant instead of the hardcoded 'view:all' string. This ensures invited users receive the correct viewer-level permissions when joining an organization. Additionally improved permission error messages to show both required and actual user permissions for easier debugging. Replaced simple role flags (is\_admin, is\_editor, is\_viewer) with a flexible permission-based system that allows fine-grained access control per resource type. Organization members now have specific permissions (like VIEWER.KEY, EDITOR.KEY) instead of broad role assignments, enabling more precise control over who can view, edit, or manage API keys, logs, and billing information. This change affects all endpoints including keys, logs, billing, and organization management. Implemented a complete user authentication system including a user table with Auth0 integration, post-registration/post-login webhook handlers that automatically create user records, and an organization invitation system. Users can now be invited to organizations via email with tracked invitation statuses (pending, accepted, declined), and the system handles both email/password and social login (Google, GitHub) authentication methods while keeping user data synchronized with Auth0. The Makefile command for cleaning up local development has been renamed from 'make cleanup' to 'make clean' for consistency with standard conventions. Additionally, fixed race condition errors that occurred when running the cleanup command if containers (redis-stack, pulzeai-db, or pulzeai-backend) were not already running - the command now checks if containers exist before attempting to stop them and provides informative messages. Updated role-based access control (RBAC) permissions across multiple endpoints. Billing and logs endpoints now allow viewer-level access (previously required admin/editor), enabling more team members to view usage data without edit permissions. Organization and API key management endpoints adjusted to require editor permissions instead of admin for delete/update operations. The org list endpoint now returns detailed role information (admin, editor, viewer) for each organization the user belongs to. Added multi-tenant organization functionality allowing users to belong to multiple organizations with role-based access control. Organizations now support personal and shared workspaces, with member tracking including last login and join dates. API requests are now scoped to organizations, and billing endpoints enforce organization-based permissions (admin and editor roles) instead of user-level authorization. Introduced a new database seeding script (scripts/seed\_database.sh) to streamline local development environment setup. Developers can now populate their local database with test data by running a single command after starting the development server. The documentation has been updated with clear instructions to run the seeding script as the second step in the local setup process. Updated the frontend CORS allowed origins to use port 5173 instead of port 3000, matching the default Vite development server port. This fixes cross-origin request issues when running the frontend locally. The outdated localhost:3000 and local.auth:3000 endpoints have been removed from the allowed origins list. Enhanced the setup documentation with detailed Poetry installation instructions, including the command and troubleshooting steps for certificate issues. Also clarified that Docker must be installed and running before running the application locally, and corrected the repository URL to the official pulze/api location. Enhanced the latency optimization mode to filter model candidates based on API key model\_settings restrictions before evaluating latency metrics. This ensures that when using latency-optimized requests, only models explicitly allowed in your API key configuration will be considered, preventing failed requests to restricted models and improving response reliability. Improved API authentication security by validating that API tokens are both valid and active before allowing access. Previously, the system only checked if a token existed in the database; now it also verifies the token's `is_active` status, preventing inactive or revoked tokens from being used. Additionally, updated the unauthorized error message from 'Bearer token missing or unknown' to a more concise 'Unauthorized' response for both chat completions and completions endpoints. API keys can now be configured with allowed model routing settings. When validating API keys, the system now returns model\_settings and key\_configuration parameters that control which models each key can access, enabling fine-grained access control and model routing at the key level. Implemented Redis-based rate limiting for API requests with a hardcoded limit of 50 requests per minute per API key. When the rate limit is exceeded, users will receive a 429 HTTP error with details about their current request count. Rate limits are tracked using one-minute time windows and automatically reset each minute. Fixed a security issue where API token scopes and permissions were not being validated during authentication. The system now correctly checks that tokens have the required scopes (space-separated string values) and permissions (list values) before granting access to protected endpoints, preventing unauthorized access to API resources. Added support for customizable model settings and key configurations when creating API keys. Keys now include an is\_active status flag for better lifecycle management. Also added the ability to update existing API keys through a new PUT endpoint, allowing users to modify key settings after creation. Fixed an issue where no error message was returned when the time threshold was exceeded before generating any results. Now displays a clear error message suggesting to increase the time limit from the current value to double (e.g., if limit is 5s, suggests increasing to 10s), along with information about which execution modes were attempted.