The error message “Sorry, the response hit the length limit. Please rephrase your prompt.” is one of the most common frustrations with Microsoft Copilot (formerly Bing Chat). Here is why it happens and how to fix it.
Why This Happens
Microsoft Copilot caps the number of tokens it can generate in a single response. When your question calls for a long answer (detailed code, comprehensive lists, long explanations), the model hits this ceiling and the output is cut off.
This is not a bug. It is a deliberate limit to manage compute costs and response times.
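To get a feel for when a request might run into this ceiling, a rough estimate can help. The sketch below assumes the common rule of thumb of about four characters per token for English text, which is not Copilot's actual tokenizer. The function names and the 4,000-token figure are illustrative assumptions only.

```python
import math

# Assumption: roughly 4 characters per token, a rule of thumb for English text.
# This is a heuristic, not Copilot's real tokenizer.
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    """Return a rough token count for `text`."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def likely_exceeds(text: str, limit: int) -> bool:
    """Return True if the estimated token count of `text` is above `limit`."""
    return estimate_tokens(text) > limit


if __name__ == "__main__":
    # 25,000 characters of placeholder text standing in for a long answer.
    draft_answer = "word " * 5000
    print(estimate_tokens(draft_answer))
    # The limit of 4000 is a hypothetical value used for illustration.
    print(likely_exceeds(draft_answer, limit=4000))
```

If the estimate for the answer you expect is well above the limit, that is a sign to use one of the fixes below rather than ask for everything in one go.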
Fix 1: Break Your Prompt into Smaller Parts
Instead of asking for everything at once:
Before (hits limit):
“Write a complete Python web application with authentication, database, API endpoints, tests, and deployment instructions”
After (works):
“Write the database models for a Python web app with user authentication”
Then follow up with:
“Now add the API endpoints for the models above”
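If you split requests like this often, the pattern can be written down as a small helper. This is a minimal sketch: `build_prompt_sequence` is a hypothetical function that only builds the prompt strings for you to paste into Copilot one at a time. It does not call any Copilot API.

```python
def build_prompt_sequence(project: str, parts: list[str]) -> list[str]:
    """Turn one large request into a list of smaller prompts, one per part.

    The first prompt asks for the first part; each later prompt asks
    Copilot to build on what it has already produced.
    """
    if not parts:
        return []
    prompts = [f"Write the {parts[0]} for {project}"]
    for part in parts[1:]:
        prompts.append(f"Now add the {part} for the code above")
    return prompts


if __name__ == "__main__":
    sequence = build_prompt_sequence(
        "a Python web app with user authentication",
        ["database models", "API endpoints", "tests", "deployment instructions"],
    )
    for prompt in sequence:
        print(prompt)
```

Sending the prompts one by one keeps each answer small enough to fit in a single response.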
Fix 2: Ask Copilot to Continue
When the response cuts off, simply type:
“Continue from where you left off”
Or:
“Continue”
Copilot will usually pick up where it stopped. You may need to do this two or three times for very long responses.
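It is not always obvious that a response was cut off, especially when you copy it elsewhere. The sketch below is a heuristic check, assuming the response is Markdown-style text: an unclosed code fence or a missing sentence ending suggests truncation. `looks_truncated` is a hypothetical helper, and it can misfire on answers that legitimately end without punctuation.

```python
FENCE = "`" * 3  # Markdown code fence marker


def looks_truncated(response: str) -> bool:
    """Heuristically guess whether a response was cut off mid-output."""
    text = response.rstrip()
    if not text:
        return False
    # An odd number of fence markers means a code block was opened
    # but never closed.
    if text.count(FENCE) % 2 == 1:
        return True
    # A response that ends with a closing fence is treated as complete.
    if text.endswith(FENCE):
        return False
    # Otherwise, expect the text to end like a finished sentence.
    return not text.endswith((".", "!", "?", ")", '"', "'"))


if __name__ == "__main__":
    partial = "Here is the code:\n" + FENCE + "python\nprint(1)"
    if looks_truncated(partial):
        print("Response looks cut off; reply with: Continue")
```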
Fix 3: Start a New Conversation
Copilot conversations accumulate context. Long conversation histories eat into the token budget, leaving less room for responses:
- Click the New Topic button (broom icon)
- Re-ask your question in a fresh conversation
- You will get a longer response with the full token budget available
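The effect of a long history can be illustrated with simple arithmetic. The sketch below uses a simplified model in which the conversation history, the new prompt, and the response all share one context window. That model and every number in it are assumptions for illustration, not documented Copilot behavior.

```python
def remaining_response_budget(
    context_window: int,
    history_tokens: int,
    prompt_tokens: int,
    output_cap: int,
) -> int:
    """Tokens left for the response under a simplified shared-window model."""
    free = context_window - history_tokens - prompt_tokens
    # The response can use neither more than the free space
    # nor more than the per-response output cap.
    return max(0, min(output_cap, free))


if __name__ == "__main__":
    # Hypothetical figures: 8,000-token window, 4,000-token output cap.
    fresh = remaining_response_budget(8000, 0, 100, 4000)
    crowded = remaining_response_budget(8000, 6000, 100, 4000)
    print(fresh)    # new conversation
    print(crowded)  # long conversation history
```

Under these assumed figures, the fresh conversation leaves the full output cap available, while the crowded one leaves less than half of it.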
Fix 4: Request a Specific Format
Structured formats usually take fewer tokens than free-form prose:
“Give me a bullet-point list of the top 10 Kubernetes security practices” (shorter than paragraph form)
“Summarize in a table with columns: Tool, Purpose, License” (compact output)
“Give me just the code, no explanations” (eliminates commentary)
Fix 5: Set Explicit Length Constraints
Tell Copilot how long you want the response:
“In under 500 words, explain…”
“Give me a brief overview (3 paragraphs max) of…”
“List the top 5 most important…”
Fix 6: Switch Conversation Style
Some versions of Microsoft Copilot offer different conversation styles:
- Creative — longer, more detailed responses
- Balanced — default, moderate length
- Precise — shorter, focused responses
Try Creative mode if you need longer outputs, or Precise if you want Copilot to stay concise and avoid hitting the limit.
Fix 7: Use Copilot in Different Contexts
The token limit varies by product:
| Product | Typical Limit | Notes |
|---|---|---|
| Copilot (free web) | ~4,000 tokens output | Most restrictive |
| Copilot Pro | ~8,000 tokens output | Longer responses |
| M365 Copilot | Varies by app | Word/Excel have different limits |
| Copilot in VS Code | ~8,000 tokens | Code-optimized |
| GitHub Copilot Chat | ~4,096 tokens | Context window matters |
If you are on the free tier, upgrading to Copilot Pro roughly doubles the available response length, based on the approximate figures above.
For Developers: GitHub Copilot Chat Limits
If you are hitting this limit in GitHub Copilot Chat in VS Code, keep each request narrowly scoped:
# Use /fix or /explain for targeted responses
/explain this function
/fix the error in this file
# Use @workspace for scoped questions
@workspace how is authentication implemented?
# Break large refactoring into steps
"Refactor the database layer only"
"Now refactor the API routes to use the new database layer"
About the Author
I am Luca Berton, AI and Cloud Advisor. I help teams adopt AI tools productively.