Skip to main content
πŸŽ“ Claude Code Masterclass Learn AI-assisted development on Udemy β€” plus the companion book on Leanpub & Amazon. Start Learning
GitHub Copilot Token-Based Billing replacing Premium Request Units
AI

GitHub Copilot Token-Based Billing: What Changes on June 1

GitHub Copilot moves from Premium Request Units to token-based billing on June 1, 2026. What this means for costs, model selection, and engineering teams.

LB
Luca Berton
Β· 7 min read

The billing model is changing

Starting June 1, 2026, GitHub Copilot is moving from Premium Request Units (PRU) to Token-Based Billing (TBB). This is not a minor pricing adjustment β€” it fundamentally changes how organizations pay for AI-assisted development and requires engineering leaders to rethink their AI governance strategy.

The shift means customers will start paying their actual compute cost per use, instead of the abstracted PRU model that has been hiding the real token consumption behind multipliers.

If you are managing GitHub Copilot for a team or enterprise, this is the most important change since Copilot launched.

What Premium Request Units were

PRUs abstracted away the complexity of token-based pricing behind a simple multiplier system:

  • Powerful reasoning models like Claude Opus 4.5/4.6 had a 3x multiplier β€” every chat turn cost 3 PRUs
  • Smaller models like GPT-5.4 mini had a 0.33x multiplier β€” three turns for one PRU
  • Base models like GPT-4o and GPT-4.1 were included at 0x PRUs β€” effectively free
  • Auto mode gave a 10% discount by letting GitHub select the optimal model per prompt

Monthly PRU allotments were included in each plan tier, with configurable overage budgets at $0.04 per PRU for groups of users.

The problem: PRUs were hiding massive variations in actual compute cost. A single PRU could represent anywhere from thousands to millions of tokens depending on the model, the conversation length, and the context window size. Users who discovered they could run Claude Opus in fast mode (30x multiplier) could drain an entire team’s budget in a single day.

Why GitHub is making this change

The reason is straightforward: AI infrastructure costs are growing with massive peaks, and the PRU abstraction layer was absorbing losses that are no longer sustainable.

All model vendors β€” Anthropic, OpenAI, Google β€” bill by tokens. GitHub was the intermediary translating token costs into PRUs, and that translation was increasingly unfavorable as:

  • Models got more expensive (Opus 4.7 reasoning costs are significant)
  • Context windows got larger (more input tokens per prompt)
  • Agentic workflows multiplied token consumption (Copilot Cloud Agent running multi-step plans)
  • Engineers started using more expensive models by default without understanding the cost implications

Token-based billing aligns GitHub’s pricing with the actual underlying compute costs.

What tokens actually cost

A token is the fundamental unit of input and output for large language models β€” roughly a partial word or subword. Every token sent to and generated by a model consumes GPU, CPU, RAM, and networking resources in a data center.

The key pricing factors:

  • Cost is calculated per 1 million tokens
  • Input tokens are cheaper than output tokens β€” the model does not need to predict input, only output
  • Cached tokens cost less β€” repeated context from earlier in a conversation can be cached
  • Reasoning effort multiplies output tokens β€” Low/Medium/High/Extra High settings directly impact token count

Here is the reality check: an average chat session consumes 1–3 million tokens, depending on the work and conversation length.

Using the 80/20 rule (80% input, 20% output) for a 1 million token session:

Claude Sonnet 4.6 (moderate cost)

  • 800K input tokens + 200K output tokens
  • Input: ~$2.40 + Output: $3.00 = **$5.40 per session**

Claude Opus 4.7 (high cost)

  • 800K input tokens + 200K output tokens
  • Input: ~$12.00 + Output: $60.00 = **$72.00 per session**

That is a 13x cost difference for the same conversation length. And engineers who default to Opus for every task β€” including simple code completion β€” will generate bills that make finance teams very uncomfortable.

The model selection problem

This is where the change hits engineering culture. Under PRUs, model selection was somewhat abstracted. Engineers knew Opus cost more PRUs, but the actual dollar impact was fuzzy.

Under token-based billing, every model choice has a direct financial impact:

DecisionCost impact
Using Opus instead of Sonnet for a simple refactor10-13x more expensive
Reasoning effort High instead of Low3-5x more output tokens
Sending entire repository as contextInput tokens explode
Long conversation instead of fresh sessionAccumulated context grows every turn
Fast mode Opus 4.6 (30x PRU equivalent)Extreme token consumption

The smart pattern that experienced engineers already follow:

  1. Plan with an expensive reasoning model (Opus 4.7) β€” use it for architecture, complex logic, design decisions
  2. Implement with a cheaper model (Haiku 0.33x, GPT-5 mini) β€” converting an existing plan into code does not require deep reasoning
  3. Review with a mid-tier model (Sonnet) β€” catch issues without premium pricing

This plan-implement separation is the single most important cost optimization pattern for token-based billing.

Editor choice now matters financially

Under token-based billing, your editor’s behavior becomes a cost factor. Different editors send different amounts of context to the model:

  • Repository-level context: Some editors send your entire repository for analysis β€” massive input token counts
  • File-level context: Others send only the open file and immediate dependencies β€” much leaner
  • Conversation history: Long chat sessions accumulate context with every turn β€” costs compound

The choice between VS Code, Visual Studio, JetBrains IDEs, GitHub CLI, and third-party integrations now has direct billing implications because each handles context differently.

What to do before June 1

1. Understand your current usage

Before the switch, establish a baseline. Use GitHub’s REST API for PRU usage reporting:

# Get PRU usage report for your organization
curl -L \
  -H "Accept: application/vnd.github+json" \
  -H "Authorization: Bearer $GITHUB_TOKEN" \
  "https://api.github.com/orgs/{org}/settings/billing/usage?apiVersion=2026-03-10"

Identify your heaviest users β€” the ones burning through PRUs by the 10th of the month. These are the engineers who will generate the largest token bills.

2. Configure user-level budgets

GitHub is releasing user-level budgets. Configure them immediately:

  • Start with each user’s monthly included budget equivalent
  • Set 75% consumption alerts to catch runaway usage early
  • Create a process for requesting overage β€” do not auto-approve unlimited spending

3. Educate engineers on model selection

This is not about restricting access. It is about informed choice:

  • Simple code completion: Use base models (free/cheap) or auto mode
  • Refactoring and implementation: Sonnet-tier models
  • Architecture and complex reasoning: Opus-tier models, but only when the task warrants it
  • Never use Opus for everything β€” the engineer who heard β€œalways use the best model” is about to cost you a fortune

4. Monitor with tooling

The AI Engineering Fluency VS Code extension analyzes local Copilot interaction files across:

  • Visual Studio Code (and Codium derivatives)
  • Visual Studio
  • GitHub Copilot CLI
  • OpenCode, Crush, and other CLIs using your Copilot subscription

It calculates actual token costs based on real usage and supports opt-in upload to cloud storage for team-level analysis.

5. Establish a FinOps practice for AI

Token-based billing means AI compute joins your FinOps practice. Treat it the same way you treat cloud spend:

  • Visibility β€” dashboards showing token consumption by user, team, and model
  • Optimization β€” identifying waste and coaching efficient usage patterns
  • Governance β€” budgets, alerts, and approval workflows for overage

The bigger picture

This change is not unique to GitHub. The entire AI tooling ecosystem is moving toward transparent, token-based pricing as the abstraction layers prove unsustainable:

  • Model providers already bill by tokens
  • Cloud AI services (Azure OpenAI, Bedrock, Vertex AI) bill by tokens
  • GitHub Copilot is now joining that model
  • Every AI IDE will eventually follow

Organizations that build AI cost governance now β€” monitoring, budgets, education, model selection guidelines β€” will be prepared not just for this change, but for the broader shift toward pay-per-token AI infrastructure.

The companies that ignore it will discover in July’s invoice that an unmanaged transition to token-based billing can be very expensive.

Key takeaways

  1. June 1, 2026: PRU billing ends, token-based billing begins
  2. Model choice is now a financial decision β€” Opus vs Sonnet can be 13x cost difference
  3. Plan with expensive models, implement with cheap ones β€” the most important optimization pattern
  4. Editor choice matters β€” different editors send different amounts of context
  5. Configure user-level budgets immediately β€” set alerts at 75% consumption
  6. Establish AI FinOps β€” dashboards, governance, education
  7. Educate engineers before June 1 β€” informed choice, not restriction

Free 30-min AI & Cloud consultation

Book Now