InstructLab is a revolutionary approach to model improvement that democratizes fine-tuning. Instead of requiring massive labeled datasets and GPU farms, you can teach models new skills through a simple YAML-based workflow. In this article, we'll explore the four-step InstructLab process and build a custom domain expert.
InstructLab (named for the LAB method, Large-scale Alignment for chatBots) is an open-source project that enables you to teach large language models new knowledge and skills from a small set of curated seed examples, without massive labeled datasets or dedicated GPU clusters.
The magic lies in its generative approach: instead of manually labeling thousands of examples, InstructLab uses a teacher model to generate diverse, synthetic training data from your seeds.
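To make the generative step concrete, here is a deliberately simplified sketch of the idea: take a human-written seed, prompt a teacher model for variations, and collect the results as training data. This is an illustration only, not InstructLab's actual pipeline; the endpoint, model name, and prompt are assumptions, and the seed is borrowed from the medical-coding skill we define below.
# Illustrative sketch only: this is NOT InstructLab's implementation.
# It assumes some teacher model is reachable behind a hypothetical local
# OpenAI-compatible endpoint, accessed via the openai Python client.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="local-dummy-key")

seed = {
    "question": "Patient reports acute chest pain radiating to left arm with shortness of breath",
    "answer": "Urgent evaluation for acute coronary syndrome (I24) required. Perform ECG and troponin levels immediately.",
}

# Ask the teacher model for new question/answer pairs written in the style of the seed.
prompt = (
    "You write training data for a medical coding assistant.\n"
    f"Seed question: {seed['question']}\n"
    f"Seed answer: {seed['answer']}\n"
    "Write three new, different question/answer pairs in the same style."
)

response = client.chat.completions.create(
    model="teacher-model",  # hypothetical name; use whatever the server exposes
    messages=[{"role": "user", "content": prompt}],
    temperature=0.9,        # higher temperature encourages more diverse generations
)
print(response.choices[0].message.content)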
As detailed in Practical RHEL AI, the InstructLab workflow consists of:
1. Define skills and knowledge in a taxonomy with a handful of seed examples
2. Generate synthetic training data from those seeds
3. Train (fine-tune) a base model on the generated data
4. Serve and test the improved model
Write capability statements and translate them to taxonomy seeds (Chapter 5):
mkdir -p ~/instructlab-project/taxonomy
cd ~/instructlab-project
Define a skill by creating a YAML file:
# taxonomy/domain/healthcare/medical-coding.yaml
version: 2
metadata:
  name: "Medical Diagnosis Assistant"
  description: "Assists with ICD-10 medical coding and diagnosis mapping"
  author: "enterprise-ai-team"
  created: "2025-11-22"
task_description: >
  The task is to map clinical descriptions to ICD-10 diagnostic codes.
seed_examples:
  - question: "A 65-year-old patient presents with persistent headache, nausea, and visual disturbances for 3 days"
    answer: "Possible diagnoses include migraine (G43), tension headache (G44), or intracranial hypertension (G93.2). Recommend neurological consultation."
  - question: "Patient reports acute chest pain radiating to left arm with shortness of breath"
    answer: "Urgent evaluation for acute coronary syndrome (I24) required. Perform ECG and troponin levels immediately."
  - question: "Chronic low back pain with radiation to left leg following a fall"
    answer: "Suggest lumbar strain (M54.5) with possible nerve compression (M51). Consider MRI imaging."
InstructLab generates diverse examples from your seed examples:
# Initialize InstructLab
ilab init
# Generate synthetic data
ilab data generate \
--taxonomy-path taxonomy \
--num-generate-threads 4 \
--output-dir generated_data
This creates thousands of synthetic examples by prompting the teacher model with your seed examples, generating new question/answer pairs in the same style, and filtering out low-quality or near-duplicate generations.
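Before training, it is worth checking how many examples were actually produced and what fields each record carries. Here is a small convenience sketch for a summary view; it is not part of the ilab CLI, the path matches the output directory and filename used in this article, and the exact record schema varies between InstructLab releases.
# Summarize the generated JSONL (sketch; path and field names are assumptions).
import json
from collections import Counter

path = "generated_data/medical-coding_gen.jsonl"
field_counts = Counter()
records = 0

with open(path, encoding="utf-8") as f:
    for line in f:
        record = json.loads(line)
        records += 1
        field_counts.update(record.keys())  # track which fields appear in each record

print(f"{records} synthetic examples")
print("fields seen:", dict(field_counts))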
Inspect generated data:
head -20 generated_data/medical-coding_gen.jsonl
Train a Granite or Mixtral model with your synthetic data:
# Download base model (if not already present)
ilab model download --model granite-7b
# Fine-tune model
ilab model train \
--model granite-7b \
--training-data generated_data/medical-coding_gen.jsonl \
--output-dir fine-tuned-models \
--num-epochs 2 \
--batch-size 8 \
--learning-rate 1e-4
Monitor training progress:
# Watch training logs in real-time
tail -f fine-tuned-models/training.log
Training Configuration Options:
| Parameter | Default | Recommendation |
|---|---|---|
| num_epochs | 1 | 1-3 for good results |
| batch_size | 8 | 4-16 depending on GPU memory |
| learning_rate | 1e-4 | 5e-5 to 5e-4 range |
| warmup_steps | 100 | 5-10% of total steps |
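To apply the warmup recommendation, estimate your total optimizer steps first. A quick worked example, using a hypothetical run of 10,000 generated examples with the default batch size and two epochs:
# Back-of-the-envelope step count for choosing warmup_steps (numbers are illustrative).
num_examples = 10_000  # size of the generated dataset (hypothetical)
batch_size = 8
num_epochs = 2

total_steps = (num_examples // batch_size) * num_epochs
warmup_low, warmup_high = int(total_steps * 0.05), int(total_steps * 0.10)

print(f"total optimizer steps: {total_steps}")                         # 2500
print(f"warmup_steps in the 5-10% range: {warmup_low}-{warmup_high}")  # 125-250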
Deploy the improved model as an OpenAI-compatible API:
# Start the inference server
ilab model serve \
--model-dir fine-tuned-models/best_model \
--port 8000 \
--gpu-layers all
Test your model:
curl http://localhost:8000/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
"model": "granite-medical",
"messages": [
{
"role": "user",
"content": "Patient with severe abdominal pain in upper left quadrant"
}
],
"temperature": 0.7,
"max_tokens": 500
}'
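Because the server exposes an OpenAI-compatible API, you can also call it from Python with the openai client instead of curl. A minimal sketch, assuming the server from the previous step is listening on localhost:8000 and serves the model under the same name used in the curl example; local servers typically ignore the API key, but the client requires one to be set.
# Query the locally served model through its OpenAI-compatible endpoint (sketch).
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused-for-local-serving")

response = client.chat.completions.create(
    model="granite-medical",  # must match the model name the server exposes
    messages=[
        {"role": "user", "content": "Patient with severe abdominal pain in upper left quadrant"}
    ],
    temperature=0.7,
    max_tokens=500,
)
print(response.choices[0].message.content)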
Combine multiple domains in a single fine-tuning run:
# Directory structure
taxonomy/
├── healthcare/
│   ├── medical-coding.yaml
│   └── drug-interactions.yaml
├── finance/
│   ├── portfolio-analysis.yaml
│   └── risk-assessment.yaml
└── legal/
    └── contract-review.yaml
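Before generating data across every domain, it can help to count the seed examples in each skill file. A small sketch, assuming PyYAML is installed and the seed_examples layout shown in the skill files above:
# Count seed examples per skill file across the taxonomy tree (sketch; assumes PyYAML).
from pathlib import Path
import yaml

for skill_file in sorted(Path("taxonomy").rglob("*.yaml")):
    with open(skill_file, encoding="utf-8") as f:
        data = yaml.safe_load(f)
    seeds = data.get("seed_examples", []) if isinstance(data, dict) else []
    print(f"{skill_file}: {len(seeds)} seed examples")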
# Generate data for all skills
ilab data generate --taxonomy-path taxonomy --num-generate-threads 8
Validate your taxonomy before training:
ilab taxonomy validate --taxonomy-path taxonomy
Expected output:
✓ healthcare/medical-coding.yaml: 5 examples, valid structure
✓ finance/portfolio-analysis.yaml: 8 examples, valid structure
✓ legal/contract-review.yaml: 4 examples, valid structure
Total: 17 skills, 87 seed examples
Iteratively improve your model:
# Evaluate model performance
ilab model evaluate \
--model fine-tuned-models/best_model \
--test-data test_examples.jsonl \
--metrics accuracy,bleu,rouge
# Based on results, refine taxonomy and retrain
vim taxonomy/healthcare/medical-coding.yaml
ilab data generate --taxonomy-path taxonomy
ilab model train  # Retrain with improved data
Here's a complete example for a finance domain:
# taxonomy/finance/portfolio-rebalancing.yaml
version: 2
metadata:
  name: "Portfolio Rebalancing Advisor"
  description: "Provides investment rebalancing recommendations"
  author: "finance-team"
task_description: >
  Help investors determine when and how to rebalance their portfolios
  based on market conditions and investment goals.
seed_examples:
  - question: |
      Current portfolio: 70% stocks, 25% bonds, 5% cash
      Target allocation: 60% stocks, 30% bonds, 10% cash
      Market context: Stocks up 15% YTD, bonds stable
      Should I rebalance?
    answer: |
      Yes, rebalance to target allocation. Your stock position has drifted
      significantly above target (+10%). Execute:
      1. Sell 10% of stocks
      2. Allocate 5% of the proceeds to bonds
      3. Keep the remaining 5% in cash
      This captures gains and maintains risk profile.
  - question: |
      30/70 portfolio (30% stocks, 70% bonds)
      Economic outlook: Anticipated rate cuts in Q1
      Performance: Stocks +8%, Bonds -2%
      Action plan?
    answer: |
      With rate cuts expected, consider tactical rebalancing:
      1. Increase stock allocation to 35-40% (anticipated bond weakness)
      2. Maintain high-quality bond allocation
      3. Keep 5-10% cash for opportunities
      Monitor rate cut announcements for execution timing.
To speed up training and reduce memory pressure on multi-GPU systems, enable DeepSpeed:
ilab model train \
--model granite-7b \
--training-data generated_data/training.jsonl \
--use-deepspeed \
--deepspeed-stage 2
For higher-throughput inference, serve the model with the vLLM backend:
ilab model serve \
--model-dir fine-tuned-models/best_model \
--inference-backend vllm
Monitor GPU utilization while training or serving:
watch -n 1 nvidia-smi
Issue: Out of memory during training
# Reduce batch size
ilab model train \
--batch-size 4 \
--gradient-accumulation-steps 2
Issue: Model quality not improving
# Check data quality
ilab data validate --data-file generated_data/training.jsonl
# Increase seed examples
# Edit YAML files to add more diverse examples
ilab data generate --taxonomy-path taxonomy --num-generate-threads 8
Issue: Inference latency too high
# Use quantization
ilab model serve \
--model-dir fine-tuned-models/best_model \
--quantization int8
Now that you understand the InstructLab workflow, you can start teaching models the skills your organization needs.
InstructLab transforms AI from a "magic black box" into a practical tool you can teach, tune, and continuously improve.
Ready to fine-tune? Start with a simple skill, validate your approach, and scale to enterprise domains. The next article will cover deploying models to production with monitoring and governance.
Ready to build custom AI skills for your organization?
Practical RHEL AI provides comprehensive InstructLab coverage, from writing taxonomy seeds through training and serving fine-tuned models.
Practical RHEL AI shows you how to fine-tune models that understand your business, without massive training datasets.
Learn More → | Buy on Amazon →