🟡 Beginner–Intermediate 🖥️ Type: AI Computer Agent / GUI Automation 💸 Free & Open Source (Apache-2.0) ⭐ 29.4k GitHub Stars
What is UI-TARS Desktop & Agent TARS?
UI-TARS Desktop and Agent TARS are two free, open-source AI agent tools made by ByteDance (the company behind TikTok) that let you control your computer using plain English instructions. You type what you want to do, and the AI takes over — it sees your screen, moves the mouse, clicks buttons, types text, and completes tasks on your behalf.
Think of it like hiring a virtual assistant who can actually see your screen and use any app on your computer. You say “Search Google for the latest AI news, open the first three results, and summarise each one” — and it does exactly that, step by step, by taking screenshots and using your mouse and keyboard just like a human would.
There are two products in this family, and you can use either or both:
- UI-TARS Desktop — a downloadable desktop app (like any normal program) that gives the AI full control of your computer and browser. Best for beginners. Download and run, no coding needed for the basic version.
- Agent TARS CLI — a command-line tool aimed at more technical users. One command to install, powerful multimodal agent with a built-in web UI, file handling, shell command execution, and MCP tool integration. Ideal for developers or power users.
Both are built on the UI-TARS model — a vision-language AI that “sees” your screen visually (through screenshots) rather than by reading code. This means it can automate any app or website, even ones that have no public API or automation support.
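To make the "sees your screen" idea concrete, here is a minimal Python sketch of the perceive, decide, act loop that vision-based GUI agents run. It is an illustration under stated assumptions, not UI-TARS code. The `Action` type, the function names and the repeat guard are hypothetical, the "screenshot" is a string, and the "model" is an ordinary function standing in for a vision-language model.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Action:
    kind: str         # hypothetical action names, e.g. "click", "type", "done"
    target: str = ""  # what to click, or what text to type


def run_agent(
    take_screenshot: Callable[[], str],
    decide: Callable[[str, str, List[Action]], Action],
    execute: Callable[[Action], None],
    task: str,
    max_steps: int = 20,
    max_repeats: int = 3,
) -> List[Action]:
    """Perceive, decide, act until the model says "done" or a limit is hit."""
    history: List[Action] = []
    for _ in range(max_steps):
        screen = take_screenshot()              # perceive: only what is visible
        action = decide(task, screen, history)  # decide: the model picks one step
        history.append(action)
        if action.kind == "done":
            break
        recent = history[-max_repeats:]
        if len(recent) == max_repeats and len(set(recent)) == 1:
            break                               # same action N times in a row: stop
        execute(action)                         # act: mouse and keyboard
    return history


# Fake components: a "screen" that changes after one click.
state = {"screen": "login page"}

def fake_screenshot() -> str:
    return state["screen"]

def fake_model(task: str, screen: str, history: List[Action]) -> Action:
    return Action("click", "Sign in") if screen == "login page" else Action("done")

def fake_execute(action: Action) -> None:
    if action.kind == "click":
        state["screen"] = "dashboard"

steps = run_agent(fake_screenshot, fake_model, fake_execute, task="Sign in")
print([a.kind for a in steps])  # ['click', 'done']
```

The step and repeat limits are one simple way to stop an agent that keeps doing the same thing; real agents use their own stopping rules.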
Who is it for?
- People who want to automate repetitive computer tasks — copy-pasting data between apps, filling forms, sorting files, searching the web — without writing any code
- Content creators and marketers who want to automate research, screenshot-gathering, or repetitive browser tasks
- Business users and office workers who spend time on the same workflows every day and want an AI to handle them
- Developers and AI builders who want a powerful open-source agent framework to build or experiment with GUI automation
- Researchers interested in how AI agents interact with computer interfaces — UI-TARS is a state-of-the-art research model as well as a practical tool
- Anyone who’s tried Operator (OpenAI) or Claude Computer Use and wants a free, self-hosted, open-source alternative
- Privacy-conscious users who want an AI agent that runs on their own machine (paired with a self-hosted UI-TARS model, no screen data has to leave the computer)
What makes it special?
- It sees your screen like a human does — doesn’t use code inspection or browser APIs. It takes screenshots and understands what’s visible, so it works on any app, any website, any software — even proprietary tools with no automation support
- Made by ByteDance, one of the world’s top AI labs — the UI-TARS model is a serious research project, not a hobby tool. UI-TARS-2 (September 2025) reached roughly 60% of human-level performance in complex game environments — a meaningful benchmark for real-world automation
- Two ways to use it — a beginner-friendly desktop app (UI-TARS Desktop) and a powerful command-line tool (Agent TARS CLI), so you can start simple and level up as you need more
- Works on Windows and Mac — the desktop app is cross-platform with native installers for both operating systems
- Bring your own AI model — connects to Claude, GPT-4, the UI-TARS model on Hugging Face, or any compatible model. You choose what AI brain runs it
- Agent TARS CLI is one command away — no cloning, no building. Just run `npx @agent-tars/cli@latest` and it launches with a full web UI in your browser
- MCP tool integration — Agent TARS supports MCP (Model Context Protocol) servers so you can connect it to external services like file systems, databases, and APIs
- Built-in sandbox mode — Agent TARS CLI v0.3.0 includes an isolated sandbox environment so the agent can execute code safely without touching your main system
- Real-time visual feedback — as it works, you can watch it take screenshots and see exactly what it’s doing and why
- 29,400+ GitHub stars and actively maintained — one of the fastest-growing open-source agent projects, with regular major releases
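MCP itself is a published protocol built on JSON-RPC 2.0, so the messages an agent sends to an MCP server are plain JSON. The Python sketch below only builds two such requests: `tools/list` to discover tools and `tools/call` to invoke one. It does not open a connection or perform the `initialize` handshake a real session starts with, the `read_file` tool and its arguments are hypothetical, and it does not show how Agent TARS itself is configured to use MCP servers.

```python
import json
from itertools import count
from typing import Optional

_ids = count(1)  # JSON-RPC requests carry an id so replies can be matched up


def mcp_request(method: str, params: Optional[dict] = None) -> str:
    """Serialise one JSON-RPC 2.0 request of the kind MCP clients send."""
    message = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        message["params"] = params
    return json.dumps(message)


# Discover what the server offers, then call one tool.
list_req = mcp_request("tools/list")
# "read_file" and its arguments are HYPOTHETICAL; real servers advertise
# their own tool names and argument schemas in the tools/list reply.
call_req = mcp_request("tools/call", {"name": "read_file", "arguments": {"path": "notes.txt"}})

print(list_req)  # {"jsonrpc": "2.0", "id": 1, "method": "tools/list"}
print(call_req)
```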
Requirements before you start
For UI-TARS Desktop (the easy desktop app)
- Windows 10/11 or macOS 12+ — the app has native installers for both
- 4 GB RAM minimum (8 GB recommended)
- An AI model API key — you need at least one of these to power the agent’s brain:
  - Anthropic Claude — get a key at console.anthropic.com
  - OpenAI GPT — get a key at platform.openai.com
  - UI-TARS model on Hugging Face — the purpose-built model, free to self-host (requires a GPU)
- A web browser installed — Chrome is best for browser automation tasks
For Agent TARS CLI (the command-line tool)
- Node.js 22 or higher — download it from nodejs.org. Check your version with `node --version`
- A terminal / command prompt — Mac: Terminal. Windows: PowerShell or Command Prompt
- An AI model API key (same options as above)
- 500 MB free disk space
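If you like to check before installing, this small Python script tests two of the CLI requirements above: whether a `node` executable is on your PATH, and whether the drive you run it from has roughly 500 MB free. It is a convenience sketch, not part of Agent TARS, and it does not check the Node.js version (`node --version` tells you that directly).

```python
import shutil

REQUIRED_FREE_BYTES = 500 * 1024 * 1024  # ~500 MB, per the list above


def preflight(path: str = ".") -> dict:
    """Check that `node` is on PATH and that the drive holding `path` has ~500 MB free."""
    free = shutil.disk_usage(path).free
    return {
        "node_on_path": shutil.which("node") is not None,
        "free_mb": free // (1024 * 1024),
        "enough_disk": free >= REQUIRED_FREE_BYTES,
    }


for check, value in preflight().items():
    print(f"{check}: {value}")
```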
💡 Which one should I use? If you just want to try it out with no technical setup, use UI-TARS Desktop — download the installer and run it like any normal app. If you’re comfortable with a terminal and want a more powerful, scriptable tool, use Agent TARS CLI.
Step-by-step setup
🖥️ Part 1 — UI-TARS Desktop (easiest, no terminal needed)
Step 1 — Download the installer
Go to the GitHub releases page: github.com/bytedance/UI-TARS-desktop/releases
Under the latest release, download the file for your operating system:
- Windows: download the `.exe` installer
- Mac: download the `.dmg` file
Alternatively, if you have Homebrew on Mac, you can install it with one command:
brew install --cask ui-tars
Step 2 — Install the app
Windows: Double-click the downloaded .exe file and follow the installation wizard. If Windows shows a security warning, click “More info” then “Run anyway” — this is normal for new open-source apps.
Mac: Open the .dmg file and drag the UI-TARS icon into your Applications folder. Then open it from Applications.
Step 3 — Open UI-TARS Desktop and go to Settings
Launch the app. On first run, it will ask you to configure your AI model. Click the Settings (gear) icon.
Step 4 — Configure your AI model
In Settings, fill in the following fields:
- Language: `en`
- VLM Provider: Choose your AI provider:
  - Select Anthropic for Claude
  - Select OpenAI for GPT models
  - Select Hugging Face for UI-TARS-1.5 if you’re self-hosting the purpose-built model
- VLM API Key: Paste your API key here
- VLM Model Name: Enter the model name (e.g. `claude-opus-4-5` for Claude, or `gpt-4o` for OpenAI)
Click Save.
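A frequent cause of "the agent does nothing" is a key pasted under the wrong provider. The Python sketch below is a hypothetical sanity check (UI-TARS Desktop has no such function). It relies on common key prefix conventions (`sk-ant-` for Anthropic, `sk-` for OpenAI, `hf_` for Hugging Face tokens), which are conventions rather than guarantees, so treat a warning as a hint to double-check rather than proof the key is wrong.

```python
from typing import List, Optional


def key_matches_provider(provider: str, key: str) -> Optional[bool]:
    """Guess whether `key` belongs to `provider`; None if the provider is unknown."""
    if provider == "Anthropic":
        return key.startswith("sk-ant-")
    if provider == "OpenAI":
        # Anthropic keys also begin with "sk-", so exclude them explicitly.
        return key.startswith("sk-") and not key.startswith("sk-ant-")
    if provider == "Hugging Face":
        return key.startswith("hf_")
    return None


def check_settings(provider: str, api_key: str, model_name: str) -> List[str]:
    """Return likely problems with the Settings fields (empty list = looks fine)."""
    problems: List[str] = []
    key = api_key.strip()
    if key != api_key:
        problems.append("API key has leading/trailing spaces (a common copy-paste slip)")
    match = key_matches_provider(provider, key)
    if match is None:
        problems.append(f"unknown provider {provider!r}")
    elif not match:
        problems.append(f"key prefix does not match provider {provider!r}")
    if not model_name.strip():
        problems.append("model name is empty")
    return problems


print(check_settings("OpenAI", "sk-ant-api03-abc", "gpt-4o"))
# ["key prefix does not match provider 'OpenAI'"]
```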
Step 5 — Give the app screen access permissions
UI-TARS needs permission to take screenshots and control your mouse and keyboard.
Mac: Go to System Settings → Privacy & Security → Screen Recording. Enable UI-TARS Desktop. Also enable it under Accessibility. Restart the app after granting permissions.
Windows: The app will prompt you for the required permissions when you first run a task. Click Allow when asked.
Step 6 — Give your first instruction
In the main chat box, type what you want the agent to do. Be specific. For example:
Open Chrome, go to BBC News, find the top headline, and tell me what it says.
Open Notepad, write a short to-do list for today, and save it to the Desktop as today.txt
Go to YouTube, search for "open source AI tools 2025", and open the first result.
Press Enter. The agent will start working. You’ll see it take screenshots and execute steps in real time. You can watch everything it does.
Step 7 — Review and stop the agent
Watch the agent work. At any point, you can click Stop to pause or cancel the task. Once it’s done, it will show you a summary of what it did.
⚠️ Safety tip: Don’t run the agent on tasks involving sensitive accounts, passwords, or financial sites until you’re comfortable with how it behaves. Start with simple, low-risk tasks first.
⌨️ Part 2 — Agent TARS CLI (for developers and power users)
Step 1 — Check your Node.js version
node --version
You need v22.x.x or higher. If not, download Node.js 22+ from nodejs.org.
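If you want a script to do this check for you (for example as part of a setup script), here is a minimal Python sketch that runs `node --version` and compares the major version against 22. The helper names are illustrative, not part of any tool.

```python
import subprocess
from typing import Optional

MIN_NODE_MAJOR = 22


def parse_major(version_output: str) -> int:
    """Turn output like 'v22.11.0' into 22."""
    return int(version_output.strip().lstrip("v").split(".")[0])


def installed_node_major() -> Optional[int]:
    """Run `node --version`; return None if Node.js is not on PATH or fails."""
    try:
        result = subprocess.run(["node", "--version"], capture_output=True, text=True, check=True)
    except (FileNotFoundError, subprocess.CalledProcessError):
        return None
    return parse_major(result.stdout)


major = installed_node_major()
if major is None:
    print("Node.js not found: install Node.js 22+ from nodejs.org")
elif major < MIN_NODE_MAJOR:
    print(f"Node.js v{major} is too old: Agent TARS needs v{MIN_NODE_MAJOR}+")
else:
    print(f"Node.js v{major} is fine")
```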
Step 2 — Launch Agent TARS (no install needed with npx)
The easiest way is to run it directly without installing anything permanently:
npx @agent-tars/cli@latest
This downloads and runs Agent TARS in one step. A web UI will open in your browser automatically.
Or, to install it globally so you can run it any time with just agent-tars:
npm install @agent-tars/cli@latest -g
Then run it with:
agent-tars
Step 3 — Open the web UI
Agent TARS automatically opens a browser window at http://localhost:8888 (or similar). This is the chat interface where you give it instructions.
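If no tab opens, you can check whether the web UI's server is up by testing the port. This Python sketch assumes port 8888, as mentioned above; if your terminal prints a different URL, use that port instead. A successful connection only shows that something is listening there, not that it is Agent TARS.

```python
import socket


def is_listening(host: str = "localhost", port: int = 8888, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or host not resolvable
        return False


print("web UI reachable" if is_listening() else "nothing listening on port 8888")
```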
Step 4 — Configure your API key
In the web UI, click Settings and enter your API key and model provider. You can also pass these directly when launching:
agent-tars --provider anthropic --model claude-sonnet-4-6 --apiKey your_anthropic_key
Or set them as environment variables so you don’t have to type them each time:
# Mac / Linux
export ANTHROPIC_API_KEY=your_key_here
# Windows PowerShell
$env:ANTHROPIC_API_KEY="your_key_here"
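If you launch Agent TARS from your own scripts, reading the key from the environment keeps it out of the script file. This is a minimal Python sketch, not an official launcher. The flag set (`--provider`, `--model`, `--apiKey`) follows the launch examples in the Agent TARS README, and the `ANTHROPIC_API_KEY` variable matches the lines above. A key passed as a command-line argument can be visible to other programs that list running processes, which is one reason to prefer the in-app Settings on shared machines.

```python
import os
import shutil
import subprocess
from typing import List


def build_command(provider: str, model: str, api_key: str) -> List[str]:
    """Assemble an agent-tars command line from provider, model and key."""
    return ["agent-tars", "--provider", provider, "--model", model, "--apiKey", api_key]


def launch(provider: str = "anthropic", model: str = "claude-sonnet-4-6") -> None:
    """Read the key from ANTHROPIC_API_KEY and start agent-tars with it."""
    api_key = os.environ.get("ANTHROPIC_API_KEY")
    if not api_key:
        raise SystemExit("ANTHROPIC_API_KEY is not set")
    exe = shutil.which("agent-tars")  # also finds agent-tars.cmd on Windows
    if exe is None:
        raise SystemExit("agent-tars not found; install it with npm install @agent-tars/cli@latest -g")
    cmd = build_command(provider, model, api_key)
    cmd[0] = exe  # use the full path that shutil.which found
    subprocess.run(cmd, check=True)  # blocks until you stop agent-tars
```

Call `launch()` from your own script once the environment variable is set.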
Step 5 — Give Agent TARS a task
In the web UI chat box, type your task. Agent TARS can handle more complex multi-step workflows than the desktop app. For example:
Search the web for the 5 most popular open-source AI tools this week, summarise each one in a bullet point, and save the result to a file called ai-tools.md
It will think through the steps, execute them one by one, and show you the Event Stream (a real-time log of everything it’s doing and why) as it works.
Step 6 — (Optional) Enable Sandbox mode
For tasks that involve running code or shell commands, use the isolated sandbox so the agent can’t accidentally modify your system:
agent-tars --sandbox
This runs the agent inside a safe container. Highly recommended if you’re asking it to execute scripts or terminal commands.
Common errors and fixes
| Error | What it means | How to fix it |
|---|---|---|
| App opens but agent does nothing when given a task | API key is missing, wrong, or the model name is incorrect | Go to Settings and double-check your API key and model name. Make sure you selected the correct VLM Provider to match your key |
| “Screen recording permission denied” on Mac | macOS has not given the app permission to see your screen | Go to System Settings → Privacy & Security → Screen Recording. Add UI-TARS Desktop and enable it. Also check Accessibility. Fully quit and restart the app |
| Windows shows “Windows protected your PC” warning | Windows SmartScreen blocks unsigned apps from new publishers | Click “More info” then “Run anyway.” This is standard for new open-source installers that haven’t yet built a reputation with Microsoft’s system |
| `node: command not found` (Agent TARS CLI) | Node.js is not installed, or its install folder is not on your PATH | Download Node.js 22+ from nodejs.org, then open a new terminal and run `node --version` to confirm |
| Agent gets stuck in a loop or repeats the same action | The task instruction is too vague or the page state isn’t changing as expected | Click Stop. Rephrase your instruction to be more specific. Break large tasks into smaller, clearer steps |
| Agent clicks the wrong element on screen | The model misread the screen layout, or the instruction didn’t say clearly enough which element to use | Add more detail to your instruction: describe exactly where on the screen the button or element is. Use a higher-capability model (e.g. Claude Opus instead of Haiku) for better visual accuracy |
| `npx @agent-tars/cli` fails or hangs | Node.js is below version 22, or the npm cache is corrupted | Run `node --version` (it must report v22 or higher). Also try clearing the npm cache with `npm cache clean --force`, then retry |
| High API costs accumulating quickly | Each screenshot sent to the model costs tokens; complex or long tasks add up | Use a smaller, cheaper model (like GPT-4o-mini or Claude Haiku) for simple tasks. Save larger models for tasks that need high visual accuracy. Set API spending limits on your provider’s dashboard |
Free vs Paid comparison
UI-TARS Desktop and Agent TARS are completely free and open source. Your only costs are the AI model API calls. Here’s a full breakdown:
| Feature | UI-TARS / Agent TARS (Free) | Paid Alternatives (e.g. OpenAI Operator) |
|---|---|---|
| Software cost | $0 — open source | $20–$200+/month |
| AI model cost | Pay-per-use API (or self-host for $0) | Included in subscription |
| Works on any app/software | ✅ Yes, vision-based; no API required for the target app | ⚠️ Browser-only (Operator works inside a remote browser) |
| Runs locally / private | ✅ The app runs on your machine; fully private when paired with a self-hosted model | ❌ Cloud-based |
| Choose your own AI model | ✅ Claude, GPT, UI-TARS, and more | ❌ Locked to one provider |
| Self-host the model for full privacy | ✅ Yes — run UI-TARS model locally | ❌ Not possible |
| Desktop app (no terminal) | ✅ Yes — UI-TARS Desktop | ✅ Web-based UI |
| MCP tool integration | ✅ Yes (Agent TARS CLI) | Limited / varies |
| Sandbox / isolated execution | ✅ Yes (Agent TARS CLI v0.3+) | Sometimes |
| Actively maintained & updated | ✅ Frequent releases from ByteDance | ✅ Yes |
Bottom line: UI-TARS is one of the few free tools that can automate any app on your computer, not just the browser. For light personal use with a cheap model like Claude Haiku or GPT-4o-mini, you can run full computer automation for pennies per session. Self-hosting the UI-TARS model brings API costs to zero, though you’ll need a capable GPU to run it.
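To see where "pennies per session" comes from, here is a rough Python estimator. It uses Anthropic's published rule of thumb for Claude image input, roughly (width × height) / 750 tokens per image; other providers count image tokens differently. The per-million-token prices in the example are placeholders, not real rates, so substitute the current prices from your provider's pricing page. It also assumes only the latest screenshot is sent each step; agents that resend earlier screenshots cost more as a task grows.

```python
def image_tokens(width: int, height: int) -> int:
    """Approximate tokens for one screenshot using Anthropic's (w * h) / 750 rule of thumb."""
    return round(width * height / 750)


def session_cost(steps: int, width: int, height: int,
                 text_tokens_per_step: int, output_tokens_per_step: int,
                 usd_per_m_input: float, usd_per_m_output: float) -> float:
    """Estimated USD for one task, assuming one fresh screenshot per step."""
    input_tokens = steps * (image_tokens(width, height) + text_tokens_per_step)
    output_tokens = steps * output_tokens_per_step
    return (input_tokens * usd_per_m_input + output_tokens * usd_per_m_output) / 1_000_000


# 20 steps on a 1280x800 screen; 500 prompt tokens and 150 reply tokens per step.
# The $1 / $5 per million prices are PLACEHOLDERS, not any provider's real rates.
cost = session_cost(20, 1280, 800, 500, 150, usd_per_m_input=1.0, usd_per_m_output=5.0)
print(f"{image_tokens(1280, 800)} tokens per screenshot, about ${cost:.4f} per task")
# → 1365 tokens per screenshot, about $0.0523 per task
```

The estimate scales linearly with the number of steps, so long, multi-step tasks are where costs add up.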
Alternatives — 3 similar tools
1. Open Interpreter
A free, open-source AI agent that runs on your computer and can execute code, manage files, browse the web, and control applications through a terminal interface. Different approach — it relies more on code execution than visual screen understanding. Very powerful for developers who want a coding-focused AI agent. MIT licensed.
🔗 github.com/OpenInterpreter/open-interpreter
2. Claude Computer Use (by Anthropic)
Anthropic’s own computer-use capability, available via the Claude API. Like UI-TARS, it sees the screen and controls the mouse and keyboard. Requires API access and is pay-per-use, but is considered among the most capable and careful computer-use AI available. No free tier for heavy use — but worth knowing as a benchmark for quality comparison.
🔗 docs.anthropic.com/en/docs/build-with-claude/computer-use
3. AutoHotkey + AI
AutoHotkey is a free, open-source scripting language for Windows that automates mouse clicks, keyboard input, and app interactions. It’s been around for decades and is extremely reliable. Unlike UI-TARS it requires writing scripts manually (no natural language), but it runs entirely locally with no AI API costs and no latency. Combine it with an LLM to generate scripts and you get a surprisingly powerful free automation stack.
🚀 Want more free AI tools like this?
We find, test, and write plain-English setup guides for the best free and open-source AI tools, so you don’t have to dig through GitHub yourself. Browse Free AI Tools at globalaiforce.com/shop →
📸 Follow us for daily AI tool tips and tutorials: instagram.com/globalaiforce