UI-TARS Desktop & Agent TARS: How to Install and Set Up (2026 Guide)

🟡 Beginner–Intermediate · 🖥️ Type: AI Computer Agent / GUI Automation · 💸 Free & Open Source (Apache-2.0) · ⭐ 29.4k GitHub Stars

What is UI-TARS Desktop & Agent TARS?

UI-TARS Desktop and Agent TARS are two free, open-source AI agent tools made by ByteDance (the company behind TikTok) that let you control your computer using plain English instructions. You type what you want to do, and the AI takes over — it sees your screen, moves the mouse, clicks buttons, types text, and completes tasks on your behalf.

Think of it like hiring a virtual assistant who can actually see your screen and use any app on your computer. You say “Search Google for the latest AI news, open the first three results, and summarise each one” — and it does exactly that, step by step, by taking screenshots and using your mouse and keyboard just like a human would.

There are two products in this family, and you can use either or both:

UI-TARS Desktop — a downloadable desktop app (like any normal program) that gives the AI full control of your computer and browser. Best for beginners. Download and run, no coding needed for the basic version.
Agent TARS CLI — a command-line tool aimed at more technical users. One command to install, powerful multimodal agent with a built-in web UI, file handling, shell command execution, and MCP tool integration. Ideal for developers or power users.

Both are designed around vision-language models such as UI-TARS, which “see” your screen visually (through screenshots) rather than by reading code. This means they can automate any app or website, even ones that have no public API or automation support.

Who is it for?

People who want to automate repetitive computer tasks — copy-pasting data between apps, filling forms, sorting files, searching the web — without writing any code
Content creators and marketers who want to automate research, screenshot-gathering, or repetitive browser tasks
Business users and office workers who spend time on the same workflows every day and want an AI to handle them
Developers and AI builders who want a powerful open-source agent framework to build or experiment with GUI automation
Researchers interested in how AI agents interact with computer interfaces — UI-TARS is a state-of-the-art research model as well as a practical tool
Anyone who’s tried Operator (OpenAI) or Claude Computer Use and wants a free, self-hosted, open-source alternative
Privacy-conscious users who want an agent that runs on their own machine (screen data stays fully local only when you self-host the model; with a cloud API key, screenshots are sent to that provider)

What makes it special?

It sees your screen like a human does — doesn’t use code inspection or browser APIs. It takes screenshots and understands what’s visible, so it works on any app, any website, any software — even proprietary tools with no automation support
Made by ByteDance, one of the world’s top AI labs — the UI-TARS model is a serious research project, not a hobby tool. UI-TARS-2 (September 2025) reached roughly 60% of human-level performance in complex game environments — a meaningful benchmark for real-world automation
Two ways to use it — a beginner-friendly desktop app (UI-TARS Desktop) and a powerful command-line tool (Agent TARS CLI), so you can start simple and level up as you need more
Works on Windows and Mac — the desktop app is cross-platform with native installers for both operating systems
Bring your own AI model — connects to Claude, GPT-4, the UI-TARS model on Hugging Face, or any compatible model. You choose what AI brain runs it
Agent TARS CLI is one command away — no cloning, no building. Just npx @agent-tars/cli@latest and it launches with a full web UI in your browser
MCP tool integration — Agent TARS supports MCP (Model Context Protocol) servers so you can connect it to external services like file systems, databases, and APIs
Built-in sandbox mode — Agent TARS CLI v0.3.0 includes an isolated sandbox environment so the agent can execute code safely without touching your main system
Real-time visual feedback — as it works, you can watch it take screenshots and see exactly what it’s doing and why
29,400+ GitHub stars and actively maintained — one of the fastest-growing open-source agent projects, with regular major releases

Requirements before you start

For UI-TARS Desktop (the easy desktop app)

Windows 10/11 or macOS 12+ — the app has native installers for both
4 GB RAM minimum (8 GB recommended)
An AI model API key — you need at least one of these to power the agent’s brain:
- Anthropic Claude — get a key at console.anthropic.com
- OpenAI GPT — get a key at platform.openai.com
- UI-TARS model on Hugging Face — the purpose-built model, free to self-host (requires a GPU)
A web browser installed — Chrome is best for browser automation tasks

For Agent TARS CLI (the command-line tool)

Node.js 22 or higher (download it from nodejs.org). Check your version with node --version
A terminal / command prompt — Mac: Terminal. Windows: PowerShell or Command Prompt
An AI model API key (same options as above)
500 MB free disk space

💡 Which one should I use? If you just want to try it out with no technical setup, use UI-TARS Desktop — download the installer and run it like any normal app. If you’re comfortable with a terminal and want a more powerful, scriptable tool, use Agent TARS CLI.

Step-by-step setup

🖥️ Part 1 — UI-TARS Desktop (easiest, no terminal needed)

Step 1 — Download the installer

Go to the GitHub releases page: github.com/bytedance/UI-TARS-desktop/releases

Under the latest release, download the file for your operating system:

Windows: download the .exe installer
Mac: download the .dmg file

Alternatively, if you have Homebrew on Mac, you can install it with one command:

brew install --cask ui-tars
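
If you would rather let a script tell you which installer applies, the choice above can be sketched in a few lines of shell. This is only a rough sketch and not part of UI-TARS: `installer_for` is a hypothetical helper name, and it assumes `uname -s` reports `Darwin` on macOS and a `MINGW`, `MSYS`, or `CYGWIN` prefix in Unix-style shells on Windows (such as Git Bash).

```shell
#!/bin/sh
# installer_for is a hypothetical helper, not part of UI-TARS Desktop.
# It maps a kernel name (as printed by `uname -s`) to the installer
# file type this guide lists for that operating system.
installer_for() {
  case "$1" in
    Darwin) echo ".dmg" ;;                    # macOS
    MINGW*|MSYS*|CYGWIN*) echo ".exe" ;;      # Unix-style shells on Windows
    *) echo "not covered in this guide" ;;    # anything else, e.g. Linux
  esac
}

echo "Installer type for this machine: $(installer_for "$(uname -s)")"
```

In PowerShell or Command Prompt there is no `uname`, so on Windows it is simpler to just pick the .exe by hand.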

Step 2 — Install the app

Windows: Double-click the downloaded .exe file and follow the installation wizard. If Windows shows a security warning, click “More info” then “Run anyway” — this is normal for new open-source apps.

Mac: Open the .dmg file and drag the UI-TARS icon into your Applications folder. Then open it from Applications.

Step 3 — Open UI-TARS Desktop and go to Settings

Launch the app. On first run, it will ask you to configure your AI model. Click the Settings (gear) icon.

Step 4 — Configure your AI model

In Settings, fill in the following fields:

Language: en
VLM Provider: Choose your AI provider:
- Select Anthropic for Claude
- Select OpenAI for GPT models
- Select Hugging Face for UI-TARS-1.5 if you’re self-hosting the purpose-built model
VLM API Key: Paste your API key here
VLM Model Name: Enter the model name (e.g. claude-opus-4-5 for Claude, or gpt-4o for OpenAI)

Click Save.

Step 5 — Give the app screen access permissions

UI-TARS needs permission to take screenshots and control your mouse and keyboard.

Mac: Go to System Settings → Privacy & Security → Screen Recording. Enable UI-TARS Desktop. Also enable it under Accessibility. Restart the app after granting permissions.

Windows: The app will prompt you for the required permissions when you first run a task. Click Allow when asked.

Step 6 — Give your first instruction

In the main chat box, type what you want the agent to do. Be specific. For example:

Open Chrome, go to BBC News, find the top headline, and tell me what it says.

Open Notepad, write a short to-do list for today, and save it to the Desktop as today.txt

Go to YouTube, search for "open source AI tools 2025", and open the first result.

Press Enter. The agent will start working. You’ll see it take screenshots and execute steps in real time. You can watch everything it does.

Step 7 — Review and stop the agent

Watch the agent work. At any point, you can click Stop to pause or cancel the task. Once it’s done, it will show you a summary of what it did.

⚠️ Safety tip: Don’t run the agent on tasks involving sensitive accounts, passwords, or financial sites until you’re comfortable with how it behaves. Start with simple, low-risk tasks first.

⌨️ Part 2 — Agent TARS CLI (for developers and power users)

Step 1 — Check your Node.js version

node --version

You need v22.x.x or higher. If not, download Node.js 22+ from nodejs.org.
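
If you want to script that check (for example in a setup script), something like the following may work. It is a minimal sketch: `node_version_ok` is a hypothetical helper name, and it assumes `node --version` prints a string in the usual `v22.x.x` form.

```shell
#!/bin/sh
# node_version_ok is a hypothetical helper for this guide.
# It returns 0 if the given version string (e.g. "v22.3.0") has a
# major version of at least 22, and 1 otherwise.
node_version_ok() {
  version="${1#v}"          # strip the leading "v"
  major="${version%%.*}"    # keep everything before the first dot
  case "$major" in
    ''|*[!0-9]*) return 1 ;;   # empty or not a plain number
  esac
  [ "$major" -ge 22 ]
}

if command -v node >/dev/null 2>&1; then
  if node_version_ok "$(node --version)"; then
    echo "Node.js $(node --version) is new enough for Agent TARS CLI"
  else
    echo "Node.js $(node --version) is too old; install version 22 or higher"
  fi
else
  echo "Node.js is not installed; download it from nodejs.org"
fi
```

The comparison only looks at the major version, which is all the requirement above depends on.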

Step 2 — Launch Agent TARS (no install needed with npx)

The easiest way is to run it directly without installing anything permanently:

npx @agent-tars/cli@latest

This downloads and runs Agent TARS in one step. A web UI will open in your browser automatically.

Or, to install it globally so you can run it any time with just agent-tars:

npm install @agent-tars/cli@latest -g

Then run it with:

agent-tars

Step 3 — Open the web UI

Agent TARS automatically opens a browser window at http://localhost:8888 (or similar). This is the chat interface where you give it instructions.

Step 4 — Configure your API key

In the web UI, click Settings and enter your API key and model provider. You can also pass these directly when launching:

agent-tars --provider anthropic --model claude-sonnet-4-6 --apiKey your_anthropic_key

Or set them as environment variables so you don’t have to type them each time:

# Mac / Linux
export ANTHROPIC_API_KEY=your_key_here

# Windows PowerShell
$env:ANTHROPIC_API_KEY="your_key_here"
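
A common stumbling block is launching the agent from a terminal where the variable was never exported. The Mac / Linux sketch below checks for the key before you start; `check_anthropic_key` is a hypothetical helper name, and the script deliberately never prints the key itself.

```shell
#!/bin/sh
# check_anthropic_key is a hypothetical helper for this guide.
# It prints a status message and returns 0 only when ANTHROPIC_API_KEY
# is set to a non-empty value in the current shell.
check_anthropic_key() {
  if [ -z "${ANTHROPIC_API_KEY:-}" ]; then
    echo "ANTHROPIC_API_KEY is not set"
    return 1
  fi
  echo "ANTHROPIC_API_KEY is set"
}

check_anthropic_key || echo "Export the key first, then start agent-tars"
```

To make the variable survive new terminal windows, you would typically add the export line to your shell profile (such as ~/.zshrc or ~/.bashrc).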

Step 5 — Give Agent TARS a task

In the web UI chat box, type your task. Agent TARS can handle more complex multi-step workflows than the desktop app. For example:

Search the web for the 5 most popular open-source AI tools this week, summarise each one in a bullet point, and save the result to a file called ai-tools.md

It will think through the steps, execute them one by one, and show you the Event Stream (a real-time log of everything it’s doing and why) as it works.

Step 6 — (Optional) Enable Sandbox mode

For tasks that involve running code or shell commands, use the isolated sandbox so the agent can’t accidentally modify your system:

agent-tars --sandbox

This runs the agent inside a safe container. Highly recommended if you’re asking it to execute scripts or terminal commands.

Common errors and fixes

Error	What it means	How to fix it
App opens but agent does nothing when given a task	API key is missing, wrong, or the model name is incorrect	Go to Settings and double-check your API key and model name. Make sure you selected the correct VLM Provider to match your key
“Screen recording permission denied” on Mac	macOS has not given the app permission to see your screen	Go to System Settings → Privacy & Security → Screen Recording. Add UI-TARS Desktop and enable it. Also check Accessibility. Fully quit and restart the app
Windows shows “Windows protected your PC” warning	Windows SmartScreen blocks unsigned apps from new publishers	Click “More info” then “Run anyway.” This is standard for new open-source installers that haven’t yet built a reputation with Microsoft’s system
`node: command not found` (Agent TARS CLI)	Node.js is not installed, or it is not on your PATH	Download Node.js 22+ from nodejs.org, then open a new terminal. Run `node --version` after installing to confirm
Agent gets stuck in a loop or repeats the same action	The task instruction is too vague or the page state isn’t changing as expected	Click Stop. Rephrase your instruction to be more specific. Break large tasks into smaller, clearer steps
Agent clicks the wrong element on screen	The model has misread your screen layout and needs more context	Add more detail to your instruction: describe exactly where on the screen the button or element is. Use a higher-capability model (e.g. Claude Opus instead of Haiku) for better visual accuracy
`npx @agent-tars/cli` fails or hangs	Node.js is below version 22, or the npm cache is corrupted	Run `node --version` to confirm it is v22 or higher. If it is, clear the npm cache with `npm cache clean --force` and retry
High API costs accumulating quickly	Each screenshot sent to the model costs tokens; complex or long tasks add up	Use a smaller, cheaper model (like GPT-4o-mini or Claude Haiku) for simple tasks. Save larger models for tasks that need high visual accuracy. Set API spending limits on your provider’s dashboard
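
To get a feel for how screenshot costs add up, you can do the arithmetic before running a long task. The sketch below multiplies steps by tokens per step by price per million tokens. `estimate_cost` is a hypothetical helper, and the numbers in the example are placeholders, not real prices: check your provider's pricing page and your own usage logs for actual values.

```shell
#!/bin/sh
# estimate_cost is a hypothetical helper for this guide.
# Usage: estimate_cost STEPS TOKENS_PER_STEP PRICE_PER_MILLION_TOKENS
# Prints the estimated cost of one task in the same currency as the price.
estimate_cost() {
  LC_ALL=C awk -v steps="$1" -v tokens="$2" -v price="$3" \
    'BEGIN { printf "%.4f\n", steps * tokens * price / 1000000 }'
}

# Placeholder inputs: 40 steps, 1500 tokens per screenshot step,
# and a price of 3 per million input tokens.
estimate_cost 40 1500 3   # prints 0.1800
```

This ignores output tokens and any conversation history that is resent with each step, so treat the result as a lower bound.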

Free vs Paid comparison

UI-TARS Desktop and Agent TARS are completely free and open source. Your only costs are the AI model API calls. Here’s a full breakdown:

Feature	UI-TARS / Agent TARS (Free)	Paid Alternatives (e.g. OpenAI Operator)
Software cost	$0 — open source	$20–$200+/month
AI model cost	Pay-per-use API (or self-host for $0)	Included in subscription
Works on any app/software	✅ Yes (vision-based, no API required for the target app)	⚠️ Varies; many are browser-only (Operator runs in a hosted browser)
Runs locally / private	✅ Yes (the app runs on your machine; screenshots stay local only if you self-host the model)	❌ Cloud-based
Choose your own AI model	✅ Claude, GPT, UI-TARS, and more	❌ Locked to one provider
Self-host the model for full privacy	✅ Yes — run UI-TARS model locally	❌ Not possible
Desktop app (no terminal)	✅ Yes — UI-TARS Desktop	✅ Web-based UI
MCP tool integration	✅ Yes (Agent TARS CLI)	Limited / varies
Sandbox / isolated execution	✅ Yes (Agent TARS CLI v0.3+)	Sometimes
Actively maintained & updated	✅ Frequent releases from ByteDance	✅ Yes

Bottom line: UI-TARS is one of the few free tools that can automate any app on your computer, not just the browser. For light personal use with a cheap model like Claude Haiku or GPT-4o-mini, you can run full computer automation for pennies per session. Self-hosting the UI-TARS model removes API costs entirely, though you need a suitable GPU to run it.

Alternatives — 3 similar tools

1. Open Interpreter

A free, open-source AI agent that runs on your computer and can execute code, manage files, browse the web, and control applications through a terminal interface. It takes a different approach, relying more on code execution than on visual screen understanding, which makes it a strong fit for developers who want a coding-focused AI agent. Licensed under AGPL-3.0.

🔗 github.com/OpenInterpreter/open-interpreter

2. Claude Computer Use (by Anthropic)

Anthropic’s own computer-use capability, available via the Claude API. Like UI-TARS, it sees the screen and controls the mouse and keyboard. Requires API access and is pay-per-use, but is considered among the most capable and careful computer-use AI available. No free tier for heavy use — but worth knowing as a benchmark for quality comparison.

🔗 docs.anthropic.com/en/docs/build-with-claude/computer-use

3. AutoHotkey + AI

AutoHotkey is a free, open-source scripting language for Windows that automates mouse clicks, keyboard input, and app interactions. It’s been around for decades and is extremely reliable. Unlike UI-TARS it requires writing scripts manually (no natural language), but it runs entirely locally with no AI API costs and no latency. Combine it with an LLM to generate scripts and you get a surprisingly powerful free automation stack.

🔗 autohotkey.com
