### Bug Description

I am working on a test-case automation PoC using BrowserUse and Playwright. I have defined the task prompt with detailed steps for each test case, and I have also implemented custom `@controller.action("...")` functions for those individual test cases as a fallback mechanism. So I have 4 test cases defined: each of them has its own `@controller.action("...")` function, and each of them also has its steps spelled out in the task prompt.

The issue is that the AI is inconsistent:

1. Sometimes it uses the custom `@controller.action("...")` functions for all 4 test cases.
2. Sometimes it uses the custom `@controller.action("...")` functions for only 2-3 test cases, and for the rest it follows its own reasoning based on the steps defined in the task prompt (`task = "..."`).

However, I want it to always use the custom `@controller.action("...")` functions whenever they are defined. How do I do that? (A sketch of the behaviour I am after is in the Additional Context section at the end of this report.)

### Reproduction Steps

Define the test cases within the task prompt for the Agent, and implement an individual `@controller.action("...")` function for each of them as a fallback mechanism.

### Code Sample

```python
import asyncio
import json
import subprocess
import time

import requests
from browser_use import Agent, Browser, BrowserConfig, Controller
from browser_use.browser.context import BrowserContext
from langchain_openai import AzureChatOpenAI

# Azure OpenAI credentials, app_url/login_account and the TestCasesSummary
# Pydantic model are defined elsewhere in the project.

controller = Controller()

#-----------------------------------Test Case 1-----------------------------------
@controller.action('Verify that the user can log in to the application using valid credentials.')
async def verify_user_login(browser: BrowserContext):
    page = await browser.get_current_page()
    try:
        pass
    except Exception as e:
        pass

#-----------------------------------Test Case 2-----------------------------------
@controller.action('Verify that the site navigation at the top of the page displays all pages in the application.')
async def verify_site_nav(browser: BrowserContext):
    page = await browser.get_current_page()
    try:
        pass
    except Exception as e:
        pass

#-----------------------------------Test Case 3-----------------------------------
@controller.action('Verify that the user can bulk change the owner status using the Bulk modify button.')
async def bulk_modify_owner_status(browser: BrowserContext):
    page = await browser.get_current_page()
    try:
        pass
    except Exception as e:
        pass

# Function to integrate BrowserUse and Playwright for browser automation and execute defined test cases
async def executeTestCases():
    # Define the prompt/task for the AI agent
    task = f"""
    **AI Agent Task: UI Testing Automation**
    **Objective: Execute defined test cases on the application and summarize the results.**
    ---
    1. Execute each test case exactly once. No retries or reattempts at all.
        - Check if the actual outcomes match the expected results, indicating successful execution.
        - **Test Case 1**
          {{
            "name": "Login Functionality Test",
            "description": "Verify that the user can log in to the application using valid credentials.",
            "steps": [
                "- Navigate to the login URL {app_url}.",
                "- Click 'Log in with Email ID'.",
                "- Select the account {login_account}.",
                "- Wait for Authenticator approval (if prompted).",
                "- Original login page re-appears. Wait for about 10 seconds.",
                "- Verify that the dashboard loads successfully by checking the presence of a 'Sign out' button."
            ],
            "expected_result": "Dashboard loads successfully and the 'Sign out' button is visible."
          }}
        - **Test Case 2**
          {{
            "name": "Navigation Display Test",
            "description": "Verify that the site navigation at the top of the page displays all pages in the application.",
            "steps": [
                "- Observe the navigation bar at the top of the page."
            ],
            "expected_result": "All pages in the application are listed in the navigation bar."
          }}
        - **Test Case 3**
          {{
            "name": "Bulk Modify Owner Status Test",
            "description": "Verify that the user can bulk change the owner status using the 'Bulk modify' button.",
            "steps": [
                "- Select multiple rows in the queue.",
                "- Click the 'Bulk modify' button.",
                "- Change the owner status."
            ],
            "expected_result": "The owner status of the selected rows is updated."
          }}
    2. Do not stop until you have covered each test case exactly once.
    3. If any test case fails, **log an error and move on to the next test case** instead of retrying.
        - No retries or reattempts should be performed for any test case at all.
        - Continue testing with the next test case to ensure all scenarios are evaluated.
    ---
    **Key Requirements & Error Handling**
    - Ensure the dashboard actually loads before executing the test cases.
    - Ensure thorough coverage of all the test cases provided.
    - Validate test outputs against expected results.
    - Handle errors gracefully and log them for review.
    - No retries or reattempts should be performed for any test case at all.
    - Maintain security and confidentiality of any sensitive information.
    """

    # Initialize the AzureChatOpenAI language model with the provided credentials
    llm = AzureChatOpenAI(
        model_name=azure_openai_model,
        openai_api_key=azure_openai_api_key,
        azure_endpoint=azure_openai_endpoint,
        deployment_name=azure_openai_model,
        api_version=azure_openai_api_version
    )
    # print(llm.invoke(task))

    try:
        # Specify the path to our Chrome executable
        chrome_path = r'C:\Program Files\Google\Chrome Beta\Application\chrome.exe'
        user_data_dir = r'C:\Users\ak\Downloads\ChromeUserData'
        chrome_debug_port = 9222

        # Connect to our existing Chrome installation and start Chrome in debugging mode
        subprocess.Popen([chrome_path, f'--remote-debugging-port={chrome_debug_port}', f'--user-data-dir={user_data_dir}'])
        time.sleep(5)  # Wait for Chrome to start

        try:
            response = requests.get(f'http://localhost:{chrome_debug_port}/json/version')
            if response.status_code == 200:
                print("Chrome is running and accessible.")
            else:
                print(f"Unexpected status code: {response.status_code}")
        except Exception as e:
            print(f"Failed to connect to Chrome: {e}")

        # Connect to the browser
        browser = Browser(
            config=BrowserConfig(
                chrome_instance_path=chrome_path,
                remote_debugging_port=chrome_debug_port
            )
        )

        # Initialize the BrowserUse Agent with the defined task, language model, browser, and controller
        agent = Agent(task=task, llm=llm, browser=browser, controller=controller)

        # Run the agent asynchronously and capture the run
        print("Starting BrowserUse agent run...")
        history = await agent.run()
        print("BrowserUse agent run completed.")

        # Save the entire history to a file
        history.save_to_file("agentResults.json")

        # Extract and print the final result from the agent's run history
        result = history.final_result()
        if result:
            # Convert to a JSON string if not already
            if not isinstance(result, str):
                result = json.dumps(result)
            print(f"Result: {result}")

            # Validate and parse the JSON result using Pydantic
            parsed_result: TestCasesSummary = TestCasesSummary.model_validate_json(result)

            report_lines = ["## Test Case Results Summary"]
            # Iterate over each test case to format and print it
            for test_case in parsed_result.test_cases:
                report_lines.append(f"\n### Test Case Title: {test_case.title}")
                report_lines.append("- **Steps Executed:**")
                for step in test_case.steps:
                    report_lines.append(f"  - {step}")
                report_lines.append(f"- **Expected Result:** \n  - {test_case.expected_result}")
                report_lines.append(f"- **Actual Outcome Status:** \n  - {test_case.actual_outcome_status}")
                report_lines.append("- **Actual Outcome Details:**")
                for detail in test_case.actual_outcome_details:
                    report_lines.append(f"  - {detail}")

            # Print the formatted report
            print("\n".join(report_lines))

            # Save the formatted report to a text file
            filename = "test_case_results.txt"
            with open(filename, "w") as file:
                file.write("\n".join(report_lines))
            print(f"Results saved to {filename}")
        else:
            print('No results to display. The agent did not produce any output.')

        # Close the browser instance
        await browser.close()
    except Exception as e:
        print(f"Failed to connect to the browser or execute the task: {e}")


if __name__ == "__main__":
    asyncio.run(executeTestCases())
```

### Version

0.1.41

### LLM Model

Other (specify in description)

### Operating System

Windows 11 64-bit

### Relevant Log Output

```shell

```
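### Additional Context

What I am hoping for is that the agent always dispatches to the registered custom action that matches a test case, and only falls back to the free-form steps if that action itself fails. Below is a minimal sketch of the kind of task wording I have in mind; naming the registered action descriptions verbatim in the prompt and the exact phrasing are my own assumptions, not something taken from the BrowserUse docs. It would slot into `executeTestCases()` in place of the current `task`/`Agent` setup:

```python
# Hypothetical prompt wording (my assumption, not from the BrowserUse docs):
# list the registered action descriptions verbatim and instruct the agent to
# call them instead of re-deriving the steps on its own.
pinned_task = f"""
**AI Agent Task: UI Testing Automation**

For each test case below, you MUST call the registered custom action whose
description matches the test case exactly. Only if that action fails may you
fall back to the manual steps listed for that test case.

- Test Case 1: use the action
  'Verify that the user can log in to the application using valid credentials.'
- Test Case 2: use the action
  'Verify that the site navigation at the top of the page displays all pages in the application.'
- Test Case 3: use the action
  'Verify that the user can bulk change the owner status using the Bulk modify button.'

Manual fallback steps (use ONLY if the matching action fails):
{task}
"""

agent = Agent(task=pinned_task, llm=llm, browser=browser, controller=controller)
```

I do not know whether prompt wording alone can make this deterministic, which is why I am asking whether there is a supported way to force the agent to prefer registered actions.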
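To see which path the agent actually took in a given run, I could also inspect the run history after `agent.run()`. The snippet below assumes the `AgentHistoryList` returned by `agent.run()` exposes an `action_names()` helper and that custom actions appear under their function names; if not, the same information should be recoverable from the `agentResults.json` file written by `history.save_to_file()`:

```python
# Assumption: history (AgentHistoryList) provides action_names(); otherwise,
# parse the JSON written by history.save_to_file("agentResults.json").
executed = history.action_names()
print("Actions executed by the agent:", executed)

# Assumption: registered custom actions show up under their function names.
custom_actions = {"verify_user_login", "verify_site_nav", "bulk_modify_owner_status"}
skipped = custom_actions - set(executed)
if skipped:
    print("Custom actions the agent did NOT call in this run:", skipped)
```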