### Browser Use Version

0.2.6

### Bug Description, Steps to Reproduce, Screenshots

1. I have a prompt that directs `browser-use` _(`0.2.6`, latest release at the time of writing)_ to go to a specific documentation page and generate code from it.
2. I'm using `gemini-2.5-pro-preview-05-06` _(but I've had the same issue with other models, e.g. `gpt-4o` on an older version of `browser-use`, `0.2.5`)_.
3. I provide an `output_model` that matches the JSON format requested in my prompt.
4. The navigation and extraction go well, but in the step right after the extraction, the agent tries and fails to output a valid structure.
   - When debugging at `if response.get('parsing_error') and 'raw' in response:` in [`browser_use/agent/service.py:1106`](https://github.com/browser-use/browser-use/blob/59215cd1f2c1b925bf259cb9336a983cea619045/browser_use/agent/service.py#L732), I can see that the `parsing_error` is:

     ```bash
     1 validation error for AgentOutput
     action.0.done.data.operations
       Input should be a valid dictionary [type=dict_type, input_value='{\n "AirShopping": {\...n }\\n}\n"\n }\n}', input_type=str]
         For further information visit https://errors.pydantic.dev/2.11/v/dict_type
     ```

     _(I understand that the model is wrongly outputting stringified JSON instead of the JSON itself; this is **not** the bug I'm reporting.)_
5. The issue is that this `if` seemingly ignores the parsing error and simply sets `parsed` to the same `AgentOutput`, starting a **silent** infinite loop of output-parsing failures that runs until the maximum number of steps is reached.

The bug I'm reporting here is that this sequence of events should trigger the normal ❌ failure message, with 3 retries by default, instead of entering an infinite loop.

- Are there only specific conditions for using `output_model`?
- I have read https://github.com/browser-use/browser-use/issues/1587, but I am not using any custom controller actions, only `Controller(output_model=output_model)`.

> _(Related but irrelevant to this bug report: I also don't understand why the LLM fails to output proper JSON, specifically why it stringifies the inner structure. Is that a known Gemini issue?)_

### Failing Python Code

```python
from browser_use import Agent, BrowserProfile, BrowserSession, Controller
from langchain_google_genai import ChatGoogleGenerativeAI
from pydantic import BaseModel


async def run(self, task: str, message_context: str, output_model: type[BaseModel] = None):
    llm = ChatGoogleGenerativeAI(model="gemini-2.5-pro-preview-05-06")
    browser_profile = BrowserProfile(
        wait_for_network_idle_page_load_time=3.0,
        viewport_expansion=-1,
        keep_alive=True,
    )
    browser_session = BrowserSession(
        browser_profile=browser_profile,
    )
    # The structured output model is only registered on the controller (no custom actions).
    controller = Controller(output_model=output_model)
    extend_system_message = (
        "You're a research assistant. Your job is to use your browser_tool to navigate within "
        "your allowed pages to find the answer to the user's question. You decide which pages "
        "to visit (only visit the ones relevant to the user's question) and in what order."
    )
    agent = Agent(
        llm=llm,
        controller=controller,
        browser_session=browser_session,  # use the session configured above
        use_vision=True,
        extend_system_message=extend_system_message,
        message_context=message_context,
        task=task,
    )
    history = await agent.run()
    # final_result() holds the JSON produced by the final `done` action.
    result = history.final_result()
    return output_model.model_validate_json(result)
```

### LLM Model

gemini-2.5-pro

### Operating System & Browser Versions

Ubuntu 20.04

### Full DEBUG Log Output

```shell
INFO     [browser_use.telemetry.service] Anonymized telemetry enabled. See https://docs.browser-use.com/development/telemetry for more information.
INFO     [browser_use.BrowserSession⛶5a96.20] 🌎 Launching new local browser playwright:chromium keep_alive=True user_data_dir= ~/.config/browseruse/profiles/default
INFO     [browser_use.Agent✻5af5 on ⛶5af5.20] 🧠 Starting a browser-use agent 0.2.6 with base_model=models/gemini-2.5-pro-preview-05-06 +tools +vision +memory extraction_model=None
INFO     [browser_use.Agent✻5af5 on ⛶5a96.24] 🚀 Starting task:REDACTED PROMPT DESCRIBING THE CODE TO GENERATED BASED ON THE DOCUMENTATION PAGE TO VISIT, ENDING WITH THE FOLLOWING :
**Output:**
Output the following JSON format :
{
  "search_method": ... The **complete** code for the `search` method ...
  "operations": {
    "OPERATION_NAME": {
      "request": ... The **complete** code for this operation's request class ...
      "response": ... The **complete** code for this operation's response class ...
    }
    ...
  }
}
INFO     [browser_use.BrowserSession⛶5a96.20] ➡️ Page navigation [0]about:blank used 0.0 KB in 3.08s
INFO     [browser_use.Agent✻5af5 on ⛶5a96.24] 📍 Step 1: Evaluating page with 0 interactive elements on: about:blank
====================================================================================================
INFO     [browser_use.Agent✻5af5 on ⛶5a96.24] 🧠 LLM call => ChatGoogleGenerativeAI [✉️ 8 msg, ~9216 tk, 44071 char, 📷 img] => JSON out + 🔨 20 tools (function_calling)
INFO     [browser_use.Agent✻5af5 on ⛶5a96.24] ❓ Eval: Unknown
INFO     [browser_use.Agent✻5af5 on ⛶5a96.24] 🧠 Memory: Starting the task. I need to generate PHP code for a flight search service. The first step is to navigate to the API documentation.
INFO     [browser_use.Agent✻5af5 on ⛶5a96.24] 🎯 Next goal: Navigate to the documentation page to find information about the AirShopping endpoint.
INFO     [browser_use.controller.service] 🔗  Navigated to REDACTED_URL
INFO     [browser_use.Agent✻5af5 on ⛶5a96.24] ☑️ Executed action 1/1: go_to_url(url='REDACTED_URL')
INFO     [browser_use.Agent✻5af5 on ⛶5a96.24] 📍 Step 2: Ran 1 actions in 13.87s: ✅ 1 <browser_use.agent.service.Agent object at 0x7f55a8930590>
INFO     [browser_use.BrowserSession⛶5a96.20] ➡️ Page navigation [0]REDACTED_URL used 14.2 KB in 3.04s
INFO     [browser_use.Agent✻5af5 on ⛶5a96.24] 📍 Step 2: Evaluating page with 49 interactive elements on: REDACTED_URL
====================================================================================================
INFO     [browser_use.Agent✻5af5 on ⛶5a96.24] 🧠 LLM call => ChatGoogleGenerativeAI [✉️ 11 msg, ~10833 tk, 525904 char, 📷 img] => JSON out + 🔨 20 tools (function_calling)
INFO     [browser_use.Agent✻5af5 on ⛶5a96.24] 👍 Eval: Success
INFO     [browser_use.Agent✻5af5 on ⛶5a96.24] 🧠 Memory: Navigated to the API documentation page. The next step is to find the AirShopping endpoint.
INFO     [browser_use.Agent✻5af5 on ⛶5a96.24] 🎯 Next goal: Click on the AirShopping endpoint in the navigation menu to view its documentation.
INFO     [browser_use.controller.service] 🖱️ Clicked button with index 18: 04. AirShopping
INFO     [browser_use.Agent✻5af5 on ⛶5a96.24] ☑️ Executed action 1/1: click_element_by_index(index=18 xpath=None)
INFO     [browser_use.Agent✻5af5 on ⛶5a96.24] 📍 Step 3: Ran 1 actions in 10.73s: ✅ 1 <browser_use.agent.service.Agent object at 0x7f55a8930590>
INFO     [browser_use.BrowserSession⛶5a96.20] ➡️ Page navigation [0]REDACTED_URL used 16.1 KB in 3.04s
INFO     [browser_use.Agent✻5af5 on ⛶5a96.24] 📍 Step 3: Evaluating page with 608 interactive elements on: REDACTED_URL
====================================================================================================
INFO     [browser_use.Agent✻5af5 on ⛶5a96.24] 🧠 LLM call => ChatGoogleGenerativeAI [✉️ 14 msg, ~35227 tk, 335879 char, 📷 img] => JSON out + 🔨 20 tools (function_calling)
INFO     [browser_use.Agent✻5af5 on ⛶5a96.24] 👍 Eval: Success
INFO     [browser_use.Agent✻5af5 on ⛶5a96.24] 🧠 Memory: Navigated to the AirShopping endpoint documentation. Now I need to extract the request and response details.
INFO     [browser_use.Agent✻5af5 on ⛶5a96.24] 🎯 Next goal: Extract the request and response information for the AirShopping endpoint to generate the PHP classes.
INFO     [browser_use.controller.service] 📄  Extracted from page : REDACTED LONG CONTENT OF THE PAGE'S PROPERLY EXTRACTED CONTENT
INFO     [browser_use.Agent✻5af5 on ⛶5a96.24] ☑️ Executed action 1/1: extract_content(goal='Extract request and response details for AirShopping endpoint' include_links=False)
INFO     [browser_use.Agent✻5af5 on ⛶5a96.24] 📍 Step 4: Ran 1 actions in 683.60s: ✅ 1 <browser_use.agent.service.Agent object at 0x7f55a8930590>
INFO     [browser_use.BrowserSession⛶5a96.20] ➡️ Page navigation [0]REDACTED_URL used 16.1 KB in 3.08s
INFO     [browser_use.Agent✻5af5 on ⛶5a96.24] 📍 Step 4: Evaluating page with 608 interactive elements on: REDACTED_URL
====================================================================================================
INFO     [browser_use.Agent✻5af5 on ⛶5a96.24] 🧠 LLM call => ChatGoogleGenerativeAI [✉️ 17 msg, ~53633 tk, 390624 char, 📷 img] => JSON out + 🔨 20 tools (function_calling)
WARNING  [browser_use.agent.message_manager.utils] Failed to parse model output: Expecting value: line 1 column 1 (char 0)
INFO     [browser_use.Agent✻5af5 on ⛶5a96.24] ❓ Eval: Executing action
INFO     [browser_use.Agent✻5af5 on ⛶5a96.24] 🧠 Memory: Using tool call
INFO     [browser_use.Agent✻5af5 on ⛶5a96.24] 🎯 Next goal: Execute AgentOutput
INFO     [browser_use.Agent✻5af5 on ⛶5a96.24] ☑️ Executed action 1/1: unknown()
INFO     [browser_use.Agent✻5af5 on ⛶5a96.24] 📍 Step 5: Ran 1 actions in 2155.07s: ✅ 1 <browser_use.agent.service.Agent object at 0x7f55a8930590>
INFO     [browser_use.BrowserSession⛶5a96.20] ➡️ Page navigation [0]REDACTED_URL used 16.1 KB in 3.05s
INFO     [browser_use.Agent✻5af5 on ⛶5a96.24] 📍 Step 5: Evaluating page with 608 interactive elements on: REDACTED_URL
=================================================================================================================================================================================
INFO     [browser_use.Agent✻5af5 on ⛶5a96.24] 🧠 LLM call => ChatGoogleGenerativeAI [✉️ 19 msg, ~53704 tk, 390624 char, 📷 img] => JSON out + 🔨 20 tools (function_calling)
INFO     [browser_use.Agent✻5af5 on ⛶5a96.24] ❓ Eval: Executing action
INFO     [browser_use.Agent✻5af5 on ⛶5a96.24] 🧠 Memory: Using tool call
INFO     [browser_use.Agent✻5af5 on ⛶5a96.24] 🎯 Next goal: Execute AgentOutput
INFO     [browser_use.Agent✻5af5 on ⛶5a96.24] ☑️ Executed action 1/1: unknown()
INFO     [browser_use.Agent✻5af5 on ⛶5a96.24] 📍 Step 6: Ran 1 actions in 225.30s: ✅ 1 <browser_use.agent.service.Agent object at 0x7f55a8930590>
INFO     [browser_use.BrowserSession⛶5a96.20] ➡️ Page navigation [0]REDACTED_URL used 16.1 KB in 3.03s
INFO     [browser_use.Agent✻5af5 on ⛶5a96.24] 📍 Step 6: Evaluating page with 608 interactive elements on: REDACTED_URL
=================================================================================================================================================================================
INFO     [browser_use.Agent✻5af5 on ⛶5a96.24] 🧠 LLM call => ChatGoogleGenerativeAI [✉️ 21 msg, ~53775 tk, 390624 char, 📷 img] => JSON out + 🔨 20 tools (function_calling)
```
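The behavior expected in this report (a visible failure counted against a retry limit, rather than a silent loop until the step limit) can be illustrated outside browser-use with a small sketch: each raw model output is validated, a validation failure is recorded instead of being silently reused, and the loop gives up after three failures. This is a hypothetical illustration, not browser-use's actual implementation: `Operation`, `SearchOutput` and `run_with_bounded_retries` are made-up names, the fields only mirror the JSON format described in the prompt, and the only library used is pydantic v2. Feeding it an output whose `operations` value is a stringified JSON object reproduces the same `dict_type` error shown in step 4.

```python
import json

from pydantic import BaseModel, ValidationError


# Hypothetical stand-ins for the reporter's output_model (real field names are unknown).
class Operation(BaseModel):
    request: str
    response: str


class SearchOutput(BaseModel):
    search_method: str
    operations: dict[str, Operation]


def run_with_bounded_retries(raw_outputs, max_failures=3):
    """Validate successive raw LLM outputs and give up after max_failures failures.

    Returns (parsed model or None, list of pydantic error types seen).
    """
    failures = []
    for raw in raw_outputs:
        try:
            return SearchOutput.model_validate_json(raw), failures
        except ValidationError as exc:
            # Record the failure instead of silently reusing the unparsed output.
            failures.append(exc.errors()[0]["type"])
            if len(failures) >= max_failures:
                break  # stop early instead of looping until the step limit
    return None, failures


inner = {"AirShopping": {"request": "class Req {}", "response": "class Res {}"}}
# The failure mode from step 4: "operations" arrives as a JSON *string*, not an object.
bad = json.dumps({"search_method": "function search() {}", "operations": json.dumps(inner)})
good = json.dumps({"search_method": "function search() {}", "operations": inner})

attempts = iter([bad] * 10)  # simulate a model that keeps making the same mistake
parsed, errors = run_with_bounded_retries(attempts)
print(parsed, errors)  # None ['dict_type', 'dict_type', 'dict_type']
print(run_with_bounded_retries([bad, good])[0] is not None)  # True
```

Only three of the ten simulated outputs are consumed before the loop stops; a correctly structured output arriving before the limit (such as `good`) is returned as a parsed `SearchOutput`.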
This issue was opened by pascal-gervais-momentum and has received 0 comments.