The billing model is changing
Starting June 1, 2026, GitHub Copilot is moving from Premium Request Units (PRU) to Token-Based Billing (TBB). This is not a minor pricing adjustment: it fundamentally changes how organizations pay for AI-assisted development and requires engineering leaders to rethink their AI governance strategy.
The shift means customers will pay for the compute they actually use, instead of going through the abstracted PRU model, which hid real token consumption behind multipliers.
If you are managing GitHub Copilot for a team or enterprise, this is the most important change since Copilot launched.
What Premium Request Units were
PRUs abstracted away the complexity of token-based pricing behind a simple multiplier system:
- Powerful reasoning models like Claude Opus 4.5/4.6 had a 3x multiplier: every chat turn cost 3 PRUs
- Smaller models like GPT-5.4 mini had a 0.33x multiplier: about three turns per PRU
- Base models like GPT-4o and GPT-4.1 were included at a 0x multiplier: effectively free
- Auto mode gave a 10% discount by letting GitHub select the optimal model per prompt
Each plan tier included a monthly PRU allotment, and overage beyond it was billed at $0.04 per PRU, within budgets that could be configured for groups of users.
The problem: PRUs were hiding massive variations in actual compute cost. A single PRU could represent anywhere from thousands to millions of tokens depending on the model, the conversation length, and the context window size. Users who discovered they could run Claude Opus in fast mode (30x multiplier) could drain an entire team's budget in a single day.
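The old multiplier arithmetic can be sketched in a few lines of shell. This is an illustration, not GitHub's billing logic: it assumes the multipliers listed above, models Auto mode as a flat 10% reduction of the chosen model's multiplier, and takes the included allotment as a parameter. The function names are hypothetical.

```bash
# Sketch of PRU accounting under the old model (illustrative only).

# pru_used TURNS MULTIPLIER [auto]
# Prints the PRUs consumed by TURNS chat turns at the given multiplier.
# Passing "auto" applies the assumed 10% Auto mode discount.
pru_used() {
  awk -v t="$1" -v m="$2" -v mode="${3:-manual}" 'BEGIN {
    if (mode == "auto") m = m * 0.9
    printf "%.2f\n", t * m
  }'
}

# overage_cost USED_PRUS INCLUDED_PRUS
# Prints the overage in USD at $0.04 per PRU beyond the included allotment.
overage_cost() {
  awk -v used="$1" -v included="$2" 'BEGIN {
    extra = used - included
    if (extra < 0) extra = 0
    printf "%.2f\n", extra * 0.04
  }'
}
```

For example, `pru_used 10 30` prints `300.00`, so ten fast-mode Opus turns at 30x would consume a 300-PRU allotment, and `overage_cost 500 300` prints `8.00` for the 200 PRUs beyond it.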
Why GitHub is making this change
The reason is straightforward: AI infrastructure costs are rising sharply, with massive demand peaks, and the PRU abstraction layer was absorbing losses that are no longer sustainable.
The major model vendors (Anthropic, OpenAI, Google) all bill by tokens. GitHub was the intermediary translating token costs into PRUs, and that translation became increasingly unfavorable as:
- Models got more expensive (Opus 4.7 reasoning costs are significant)
- Context windows got larger (more input tokens per prompt)
- Agentic workflows multiplied token consumption (Copilot Cloud Agent running multi-step plans)
- Engineers started using more expensive models by default without understanding the cost implications
Token-based billing aligns GitHub's pricing with the actual underlying compute costs.
What tokens actually cost
A token is the fundamental unit of input and output for large language models: roughly a word or a fragment of a word. Every token sent to and generated by a model consumes GPU, CPU, RAM, and networking resources in a data center.
The key pricing factors:
- Cost is calculated per 1 million tokens
- Input tokens are cheaper than output tokens: the model processes the whole input in one parallel pass, but must generate output one token at a time
- Cached tokens cost less: repeated context from earlier in a conversation can be cached
- Reasoning effort multiplies output tokens: Low/Medium/High/Extra High settings directly impact token count
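Together, these factors reduce to one formula: multiply each class of tokens by its per-million rate and add the results. A minimal sketch, assuming list prices are the only inputs (no plan discounts); the rates in the example are Anthropic's published Claude Sonnet prices of $3 input, $0.30 cache read, and $15 output per million tokens, and whether Copilot applies cache discounts the same way is an assumption. `token_cost` is a hypothetical helper.

```bash
# token_cost FRESH_IN CACHED_IN OUTPUT IN_RATE CACHED_RATE OUT_RATE
# Token counts are raw tokens; rates are USD per 1 million tokens.
# Prints the total cost in USD. Rates are inputs, not built-in price data.
token_cost() {
  awk -v fresh="$1" -v cached="$2" -v out="$3" \
      -v in_rate="$4" -v cache_rate="$5" -v out_rate="$6" 'BEGIN {
    cost = (fresh * in_rate + cached * cache_rate + out * out_rate) / 1000000
    printf "%.2f\n", cost
  }'
}
```

`token_cost 800000 0 200000 3 0.30 15` prints `5.40`. Serving 600K of those input tokens from cache (`token_cost 200000 600000 200000 3 0.30 15`) prints `3.78`, while tripling output, as a higher reasoning effort might (`token_cost 800000 0 600000 3 0.30 15`), prints `11.40`.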
Here is the reality check: an average chat session consumes 1 to 3 million tokens, depending on the work and conversation length.
Using the 80/20 rule (80% input, 20% output) for a 1 million token session:
Claude Sonnet 4.6 (moderate cost)
- 800K input tokens + 200K output tokens
- Input: ~$2.40 + Output: ~$3.00 = **$5.40 per session**
Claude Opus 4.7 (high cost)
- 800K input tokens + 200K output tokens
- Input: ~$12.00 + Output: ~$60.00 = **$72.00 per session**
That is a 13x cost difference for the same conversation length. Engineers who default to Opus for every task, including simple code completion, will generate bills that make finance teams very uncomfortable.
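The 80/20 estimate can be reproduced with a short helper. A sketch, assuming the $3 input and $15 output per-million rates implied by the $2.40 and $3.00 Sonnet figures; the Opus rates behind the $72.00 figure are not stated, so for Opus the sketch only compares the two totals. Both function names are hypothetical.

```bash
# session_cost TOTAL_TOKENS IN_RATE OUT_RATE
# Splits a session 80% input / 20% output and prices each part
# (rates in USD per 1M tokens).
session_cost() {
  awk -v total="$1" -v in_rate="$2" -v out_rate="$3" 'BEGIN {
    in_cost  = total * 0.8 * in_rate / 1000000
    out_cost = total * 0.2 * out_rate / 1000000
    printf "input=$%.2f output=$%.2f total=$%.2f\n", in_cost, out_cost, in_cost + out_cost
  }'
}

# cost_ratio A B
# Prints how many times more expensive A is than B.
cost_ratio() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1fx\n", a / b }'
}
```

`session_cost 1000000 3 15` prints `input=$2.40 output=$3.00 total=$5.40`, `cost_ratio 72.00 5.40` prints `13.3x`, and at the top of the 1 to 3 million token range, `session_cost 3000000 3 15` prints `input=$7.20 output=$9.00 total=$16.20`.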
The model selection problem
This is where the change hits engineering culture. Under PRUs, model selection was somewhat abstracted. Engineers knew Opus cost more PRUs, but the actual dollar impact was fuzzy.
Under token-based billing, every model choice has a direct financial impact:
| Decision | Cost impact |
|---|---|
| Using Opus instead of Sonnet for a simple refactor | 10-13x more expensive |
| Reasoning effort High instead of Low | 3-5x more output tokens |
| Sending entire repository as context | Input tokens explode |
| Long conversation instead of fresh session | Accumulated context grows every turn |
| Fast mode Opus 4.6 (30x PRU equivalent) | Extreme token consumption |
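The long-conversation row is worth quantifying, because resent history makes total input grow quadratically with the number of turns. A sketch, assuming every turn resends the fixed context plus the full history with no caching or summarization; `cumulative_input` is a hypothetical helper and the token counts in the example are hypothetical.

```bash
# cumulative_input TURNS FIXED PER_TURN
# FIXED: tokens of fixed context sent every turn (instructions, attached files)
# PER_TURN: tokens each exchange adds to the history (prompt + reply)
# Prints the total input tokens billed across all turns.
cumulative_input() {
  local turns=$1 fixed=$2 per_turn=$3 total=0 k=0
  while [ "$k" -lt "$turns" ]; do
    # Turn k+1 carries the fixed context plus the k earlier exchanges.
    total=$(( total + fixed + k * per_turn ))
    k=$(( k + 1 ))
  done
  echo "$total"
}
```

With 20,000 tokens of fixed context and 5,000 tokens per exchange, `cumulative_input 40 20000 5000` prints `4700000`, whereas two fresh 20-turn sessions (`cumulative_input 20 20000 5000` prints `1350000`) add up to 2,700,000 input tokens for the same 40 turns.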
The smart pattern that experienced engineers already follow:
- Plan with an expensive reasoning model (Opus 4.7): use it for architecture, complex logic, design decisions
- Implement with a cheaper model (Haiku at 0.33x, GPT-5 mini): converting an existing plan into code does not require deep reasoning
- Review with a mid-tier model (Sonnet): catch issues without premium pricing
This plan-implement separation is the single most important cost optimization pattern for token-based billing.
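The effect of the plan-implement split can be estimated by pricing each phase at its own model's rates. A sketch with hypothetical token counts and placeholder rates (USD per million tokens), chosen to show the mechanics rather than to quote current prices for any model; `workflow_cost` is a hypothetical helper.

```bash
# workflow_cost PHASE...
# Each PHASE is "in_tokens:out_tokens:in_rate:out_rate"
# (rates in USD per 1M tokens). Prints the summed cost of all phases in USD.
workflow_cost() {
  printf '%s\n' "$@" | awk -F: '{ total += ($1 * $3 + $2 * $4) / 1000000 }
    END { printf "%.2f\n", total }'
}
```

Planning 200K input / 50K output at 15/75, implementing 500K / 100K at 1/5, and reviewing 100K / 50K at 3/15 (`workflow_cost 200000:50000:15:75 500000:100000:1:5 100000:50000:3:15`) prints `8.80`; running the same 800K input and 200K output entirely at 15/75 (`workflow_cost 800000:200000:15:75`) prints `27.00`.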
Editor choice now matters financially
Under token-based billing, your editor's behavior becomes a cost factor. Different editors send different amounts of context to the model:
- Repository-level context: Some editors send your entire repository for analysis, producing massive input token counts
- File-level context: Others send only the open file and immediate dependencies, which is much leaner
- Conversation history: Long chat sessions accumulate context with every turn, so costs compound
The choice between VS Code, Visual Studio, JetBrains IDEs, GitHub CLI, and third-party integrations now has direct billing implications because each handles context differently.
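To compare what different context scopes cost, a rough token estimate is usually enough. A sketch using the common rule of thumb of about four bytes of English text per token; real tokenizers vary by model and content, so treat the result as an order-of-magnitude figure. `approx_tokens` is a hypothetical helper.

```bash
# approx_tokens < TEXT
# Rough token estimate from the byte count on stdin, assuming ~4 bytes per token.
approx_tokens() {
  local bytes
  bytes=$(wc -c | tr -d ' ')
  echo $(( bytes / 4 ))
}
```

`approx_tokens < path/to/file` (path hypothetical) approximates file-level context, and `git ls-files -z | xargs -0 cat | approx_tokens` approximates sending every tracked file in the repository.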
What to do before June 1
1. Understand your current usage
Before the switch, establish a baseline. Use GitHub's REST API for premium request usage reporting:
```bash
# Get the premium request (PRU) usage report for your organization.
# Set ORG to your organization's login; GITHUB_TOKEN must be allowed
# to read the organization's billing data.
curl -L \
  -H "Accept: application/vnd.github+json" \
  -H "Authorization: Bearer $GITHUB_TOKEN" \
  -H "X-GitHub-Api-Version: 2022-11-28" \
  "https://api.github.com/organizations/$ORG/settings/billing/premium_request/usage"
```
Identify your heaviest users: the ones who have burned through their PRUs by the 10th of the month. These are the engineers who will generate the largest token bills.
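Once usage has been exported, ranking users by consumption before a cutoff day is a small aggregation. A sketch that assumes the data has already been flattened into a hypothetical headerless CSV of `user,day_of_month,prus`; that is not the API's response format, so a conversion step would come first. `heavy_users` is a hypothetical helper.

```bash
# heavy_users CUTOFF_DAY LIMIT < usage.csv
# usage.csv (hypothetical, no header): user,day_of_month,prus
# Sums each user's PRUs up to CUTOFF_DAY and prints the top LIMIT users.
heavy_users() {
  local cutoff=$1 limit=$2
  awk -F, -v cutoff="$cutoff" '$2 + 0 <= cutoff + 0 { sum[$1] += $3 }
    END { for (u in sum) printf "%s,%.2f\n", u, sum[u] }' |
    sort -t, -k2,2nr | head -n "$limit"
}
```

`heavy_users 10 5 < usage.csv` lists the five users with the most PRUs consumed by the 10th of the month, heaviest first.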
2. Configure user-level budgets
GitHub is releasing user-level budgets. Configure them immediately:
- Start with each user's monthly included budget equivalent
- Set 75% consumption alerts to catch runaway usage early
- Create a process for requesting overage; do not auto-approve unlimited spending
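The threshold logic behind a 75% alert is simple enough to reuse in your own reporting. A sketch of that logic only; it does not call GitHub's budget features, and `budget_status` is a hypothetical helper.

```bash
# budget_status SPENT BUDGET [ALERT_PCT]
# Prints ok / alert / over plus the percentage of BUDGET consumed.
# ALERT_PCT defaults to 75, matching the alert threshold suggested above.
# A zero budget is treated as fully consumed.
budget_status() {
  awk -v spent="$1" -v budget="$2" -v alert="${3:-75}" 'BEGIN {
    pct = (budget > 0) ? spent / budget * 100 : 100
    if (pct >= 100)        status = "over"
    else if (pct >= alert) status = "alert"
    else                   status = "ok"
    printf "%s %.0f%%\n", status, pct
  }'
}
```

`budget_status 30 40` prints `alert 75%`, `budget_status 44 40` prints `over 110%`, and an optional third argument changes the alert threshold.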
3. Educate engineers on model selection
This is not about restricting access. It is about informed choice:
- Simple code completion: Use base models (free/cheap) or auto mode
- Refactoring and implementation: Sonnet-tier models
- Architecture and complex reasoning: Opus-tier models, but only when the task warrants it
- Never use Opus for everything: the engineer who heard "always use the best model" is about to cost you a fortune
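This guidance can be written down as an explicit lookup, for example in a helper script the team shares. A sketch; the task categories and tier labels are hypothetical names, not Copilot settings or model identifiers.

```bash
# pick_tier TASK
# Prints the suggested model tier for a task category, following the list above.
pick_tier() {
  case "$1" in
    completion)               echo "base-or-auto" ;;
    refactor|implement)       echo "sonnet-tier" ;;
    architecture|design|plan) echo "opus-tier" ;;
    *)                        echo "auto" ;;  # unrecognized tasks fall back to Auto mode
  esac
}
```

`pick_tier refactor` prints `sonnet-tier`; anything unrecognized falls back to `auto`, a default chosen for this sketch.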
4. Monitor with tooling
The AI Engineering Fluency VS Code extension analyzes local Copilot interaction files across:
- Visual Studio Code (and Codium derivatives)
- Visual Studio
- GitHub Copilot CLI
- OpenCode, Crush, and other CLIs using your Copilot subscription
It calculates actual token costs based on real usage and supports opt-in upload to cloud storage for team-level analysis.
5. Establish a FinOps practice for AI
Token-based billing means AI compute joins your FinOps practice. Treat it the same way you treat cloud spend:
- Visibility: dashboards showing token consumption by user, team, and model
- Optimization: identifying waste and coaching efficient usage patterns
- Governance: budgets, alerts, and approval workflows for overage
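The visibility item starts with grouping spend by user, team, or model. A sketch, assuming usage has been exported to a hypothetical headerless CSV of `user,team,model,input_tokens,output_tokens` and that you supply the per-model rates; `spend_by` is a hypothetical helper and the example rates are placeholders.

```bash
# spend_by FIELD RATES < usage.csv
# usage.csv (hypothetical, no header): user,team,model,input_tokens,output_tokens
# FIELD: column to group by (1 = user, 2 = team, 3 = model)
# RATES: "model=in_rate/out_rate,..." in USD per 1M tokens
# Models missing from RATES are priced at 0, so keep the rate table complete.
spend_by() {
  awk -F, -v f="$1" -v rates="$2" '
    BEGIN {
      n = split(rates, pairs, ",")
      for (i = 1; i <= n; i++) {
        split(pairs[i], kv, "=")
        split(kv[2], r, "/")
        in_rate[kv[1]] = r[1]
        out_rate[kv[1]] = r[2]
      }
    }
    { spend[$f] += ($4 * in_rate[$3] + $5 * out_rate[$3]) / 1000000 }
    END { for (k in spend) printf "%s,%.2f\n", k, spend[k] }' | sort
}
```

`spend_by 2 'sonnet=3/15,haiku=1/5' < usage.csv` prints one `team,dollars` line per team; grouping by column 3 instead gives spend per model.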
The bigger picture
This change is not unique to GitHub. The entire AI tooling ecosystem is moving toward transparent, token-based pricing as the abstraction layers prove unsustainable:
- Model providers already bill by tokens
- Cloud AI services (Azure OpenAI, Bedrock, Vertex AI) bill by tokens
- GitHub Copilot is now joining that model
- Every AI IDE will eventually follow
Organizations that build AI cost governance now (monitoring, budgets, education, model selection guidelines) will be prepared not just for this change, but for the broader shift toward pay-per-token AI infrastructure.
Companies that ignore it will discover on their July invoice that an unmanaged transition to token-based billing can be very expensive.
Key takeaways
- June 1, 2026: PRU billing ends, token-based billing begins
- Model choice is now a financial decision: Opus vs Sonnet can mean a 13x cost difference
- Plan with expensive models, implement with cheap ones: the most important optimization pattern
- Editor choice matters: different editors send different amounts of context
- Configure user-level budgets immediately: set alerts at 75% consumption
- Establish AI FinOps: dashboards, governance, education
- Educate engineers before June 1: informed choice, not restriction