Book Reference: This article is based on Chapters 4 and 8 of Practical RHEL AI, covering SPDX lineage tracking for enterprise AI governance and compliance.
As AI systems become subject to increasing regulatory scrutiny, organizations need robust methods to track model provenance. SPDX (Software Package Data Exchange) provides a standardized format for documenting the lineage of AI models, from training data to deployed artifacts.
Practical RHEL AI introduces SPDX lineage tracking as a key governance feature, so that enterprises can track, audit, and prove compliance for every model they deploy.
SPDX is an open standard for communicating software bill of materials (SBOM) information. For AI, it extends to:
| Component | SPDX Documentation |
|---|---|
| Base Model | Origin, license, version |
| Training Data | Sources, licenses, transformations |
| Fine-tuning | Parameters, taxonomy, timestamps |
| Deployed Model | Checksums, environment, dependencies |
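
As a rough illustration of the "Deployed Model" row, the sketch below hashes a model artifact and wraps the result in an SPDX 2.3 style package entry. The helper names `sha256_of_file` and `make_package`, the `SPDXRef-DeployedModel` identifier, and the placeholder file are assumptions made for this example; they are not part of RHEL AI or InstructLab.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large model weights never sit fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_package(spdx_id, name, version, artifact_path, license_id="NOASSERTION"):
    """Build one SPDX 2.3 style package entry as a plain dict (hypothetical helper)."""
    return {
        "SPDXID": spdx_id,
        "name": name,
        "versionInfo": version,
        "downloadLocation": "NOASSERTION",
        "filesAnalyzed": False,
        "licenseConcluded": license_id,
        "licenseDeclared": license_id,
        "copyrightText": "NOASSERTION",
        "checksums": [
            {"algorithm": "SHA256", "checksumValue": sha256_of_file(artifact_path)}
        ],
    }


if __name__ == "__main__":
    # A throwaway file stands in for real model weights.
    with tempfile.TemporaryDirectory() as tmp:
        weights = Path(tmp) / "model.safetensors"
        weights.write_bytes(b"placeholder weights")
        package = make_package(
            "SPDXRef-DeployedModel", "example-model", "0.0.1", weights
        )
        print(json.dumps(package, indent=2))
```

Recording the checksum when the package entry is created means a later audit can re-hash the deployed file and compare it against the SBOM.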
{
  "spdxVersion": "SPDX-2.3",
  "dataLicense": "CC0-1.0",
  "SPDXID": "SPDXRef-DOCUMENT",
  "name": "granite-enterprise-v1-model-sbom",
  "documentNamespace": "https://example.com/models/granite-enterprise-v1",
  "creationInfo": {
    "created": "2024-01-15T10:30:00Z",
    "creators": [
      "Organization: Acme Corp",
      "Tool: RHEL AI InstructLab-1.0"
    ]
  },
  "packages": []
}

{
  "SPDXID": "SPDXRef-BaseModel",
  "name": "granite-3b-instruct",
  "versionInfo": "2024.1",
  "supplier": "Organization: IBM",
  "downloadLocation": "https://huggingface.co/ibm/granite-3b-instruct",
  "filesAnalyzed": false,
  "licenseConcluded": "Apache-2.0",
  "licenseDeclared": "Apache-2.0",
  "copyrightText": "Copyright 2024 IBM",
  "checksums": [
    {
      "algorithm": "SHA256",
      "checksumValue": "a1b2c3d4e5f6..."
    }
  ]
}

{
  "SPDXID": "SPDXRef-TrainingData",
  "name": "enterprise-taxonomy-v1",
  "versionInfo": "1.0.0",
  "supplier": "Organization: Acme Corp",
  "downloadLocation": "NOASSERTION",
  "filesAnalyzed": true,
  "licenseConcluded": "LicenseRef-Proprietary",
  "copyrightText": "Copyright 2024 Acme Corp",
  "comment": "Internal taxonomy with 500 skill definitions",
  "annotations": [
    {
      "annotationType": "OTHER",
      "annotator": "Tool: InstructLab",
      "annotationDate": "2024-01-10T14:00:00Z",
      "comment": "Synthetic data generated: 50,000 examples"
    }
  ]
}

{
  "SPDXID": "SPDXRef-FineTunedModel",
  "name": "granite-enterprise-v1",
  "versionInfo": "1.0.0",
  "supplier": "Organization: Acme Corp",
  "downloadLocation": "registry.acme.com/models/granite-enterprise-v1",
  "licenseConcluded": "LicenseRef-Proprietary",
  "checksums": [
    {
      "algorithm": "SHA256",
      "checksumValue": "f6e5d4c3b2a1..."
    }
  ],
  "externalRefs": [
    {
      "referenceCategory": "OTHER",
      "referenceType": "mlflow-run-id",
      "referenceLocator": "runs:/abc123/artifacts/model"
    }
  ]
}

Document how components relate:
{
  "relationships": [
    {
      "spdxElementId": "SPDXRef-FineTunedModel",
      "relatedSpdxElement": "SPDXRef-BaseModel",
      "relationshipType": "DESCENDANT_OF"
    },
    {
      "spdxElementId": "SPDXRef-FineTunedModel",
      "relatedSpdxElement": "SPDXRef-TrainingData",
      "relationshipType": "GENERATED_FROM"
    },
    {
      "spdxElementId": "SPDXRef-DeepSpeedConfig",
      "relatedSpdxElement": "SPDXRef-FineTunedModel",
      "relationshipType": "BUILD_TOOL_OF"
    }
  ]
}

flowchart TB
    subgraph Lineage["Model Lineage Graph"]
        Base["Base Model<br/>Granite"]
        Data["Training Data<br/>Taxonomy"]
        FineTuned["Fine-tuned<br/>Enterprise Model"]
        FineTuned -->|DESCENDANT_OF| Base
        FineTuned -->|GENERATED_FROM| Data
    end

Generate SPDX documents during training:
from instructlab.spdx import SPDXGenerator

# Initialize generator
spdx_gen = SPDXGenerator(
    organization="Acme Corp",
    namespace_base="https://acme.com/models"
)

# After training completes
spdx_document = spdx_gen.generate(
    model_name="granite-enterprise-v1",
    base_model="granite-3b-instruct",
    training_data_path="./taxonomy",
    output_model_path="./output/model",
    training_config="./ds_config.json"
)

# Save document
spdx_gen.save(spdx_document, "./sbom/model-sbom.spdx.json")

# .gitlab-ci.yml
stages:
  - train
  - document
  - validate
  - deploy

generate_sbom:
  stage: document
  script:
    - python generate_spdx.py
    # pyspdxtools is the CLI installed by the spdx-tools Python package
    - pyspdxtools -i sbom/model-sbom.spdx.json
  artifacts:
    paths:
      - sbom/model-sbom.spdx.json

validate_lineage:
  stage: validate
  script:
    - |
      if ! python validate_lineage.py sbom/model-sbom.spdx.json; then
        echo "Lineage validation failed"
        exit 1
      fi

The EU AI Act requires documentation of:
SPDX addresses these through:
{
  "SPDXID": "SPDXRef-RiskAssessment",
  "name": "risk-assessment-v1",
  "annotations": [
    {
      "annotationType": "REVIEW",
      "annotator": "Person: Jane Smith (Risk Officer)",
      "annotationDate": "2024-01-20T09:00:00Z",
      "comment": "Risk level: LIMITED. Approved for production use in customer service scenarios."
    }
  ]
}

| Standard | SPDX Support |
|---|---|
| EU AI Act | Full |
| NIST AI RMF | Partial |
| ISO/IEC 42001 | Full |
| SOC 2 | Partial |
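
The CI pipeline earlier in this article calls a `validate_lineage.py` script without showing it. The following is one possible sketch of such a check, not the book's implementation: it works on the SPDX JSON as a plain dictionary and reports relationship endpoints that no package defines, plus packages without a checksum. The function name `validate_lineage` and the two rules are assumptions chosen for illustration. The sketch does not handle `NONE`, `NOASSERTION`, or cross-document `DocumentRef-` targets.

```python
def validate_lineage(document):
    """Return a list of problems found; an empty list means every check passed."""
    problems = []
    packages = document.get("packages", [])
    known_ids = {"SPDXRef-DOCUMENT"} | {pkg.get("SPDXID") for pkg in packages}

    # Every relationship endpoint should resolve to an element in this document.
    for rel in document.get("relationships", []):
        for key in ("spdxElementId", "relatedSpdxElement"):
            ref = rel.get(key)
            if ref not in known_ids:
                problems.append(
                    f"{rel.get('relationshipType')}: {key} {ref} is not defined"
                )

    # Every package should carry at least one checksum so it can be verified later.
    for pkg in packages:
        if not pkg.get("checksums"):
            problems.append(f"{pkg.get('SPDXID')}: no checksum recorded")

    return problems


if __name__ == "__main__":
    # Hypothetical document: one complete package, one dangling reference.
    example = {
        "packages": [
            {
                "SPDXID": "SPDXRef-FineTunedModel",
                "checksums": [{"algorithm": "SHA256", "checksumValue": "00"}],
            }
        ],
        "relationships": [
            {
                "spdxElementId": "SPDXRef-FineTunedModel",
                "relatedSpdxElement": "SPDXRef-TrainingData",
                "relationshipType": "GENERATED_FROM",
            }
        ],
    }
    for problem in validate_lineage(example):
        print(problem)
```

Returning a list of problems rather than raising on the first one lets a CI job print the full report before it exits with a non-zero status.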
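from spdx_tools.spdx.model import RelationshipType
from spdx_tools.spdx.parser.parse_anything import parse_file


def find_package(document, spdx_id):
    """Return the package with the given SPDX ID, or None."""
    return next((p for p in document.packages if p.spdx_id == spdx_id), None)


def find_relationship(document, spdx_id, relationship_type):
    """Return the ID that spdx_id points to via relationship_type, or None."""
    for rel in document.relationships:
        if rel.spdx_element_id == spdx_id and rel.relationship_type == relationship_type:
            return rel.related_spdx_element_id
    return None


def get_model_ancestry(spdx_file, model_id):
    """Trace model back to its origins."""
    document = parse_file(spdx_file)
    ancestry = []
    seen = set()  # guards against cycles in the relationship graph
    current = model_id
    while isinstance(current, str) and current not in seen:
        seen.add(current)
        package = find_package(document, current)
        if package is None:
            break
        ancestry.append(package)
        # SPDX 2.3 defines no DERIVED_FROM type; DESCENDANT_OF is the closest match
        current = find_relationship(
            document,
            current,
            RelationshipType.DESCENDANT_OF
        )
    return ancestry


# Example usage
ancestry = get_model_ancestry(
    "sbom/model-sbom.spdx.json",
    "SPDXRef-FineTunedModel"
)
for model in ancestry:
    print(f"- {model.name} v{model.version}")

Model Ancestry: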
- granite-enterprise-v1 v1.0.0
- granite-3b-instruct v2024.1
- granite-base-3b v1.0.0 (IBM Research)

models/
├── granite-enterprise-v1/
│   ├── model.safetensors
│   ├── config.json
│   ├── tokenizer.json
│   └── sbom/
│       ├── model-sbom.spdx.json
│       ├── training-data-sbom.spdx.json
│       └── dependencies-sbom.spdx.json

This article covers material from:
Need bulletproof model provenance tracking?
Practical RHEL AI covers SPDX and governance comprehensively:
Practical RHEL AI gives you the tools to track, audit, and prove compliance for every model you deploy.