quality_assurance · education · workflow

Training Llama 3.3 Swallow: A Japanese Sovereign LLM on Amazon SageMaker HyperPod

The Institute of Science Tokyo set out to build a large language model whose Japanese capabilities could surpass those of existing leading models, which required efficient large-scale distributed training infrastructure in the cloud.

How it works
Common implementation structure
How this type of workflow is generally built, generalized across documented cases and not tied to any one vendor's stack. Specific products that implement these stages are listed below.
Stage 1 · Corpus quality filtering
The Swallow Education Classifier extracts educationally valuable content from the Japanese web corpus.
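The filtering step can be pictured as score-then-threshold: each document gets a quality score, and only documents at or above a cutoff stay in the training corpus. The following is a minimal sketch under that assumption. The `Document` type, `filter_by_score`, the keyword-based `toy_score`, and the 0.5 threshold are all hypothetical illustrations; they are not the Swallow Education Classifier or its actual settings, which the summary above does not describe.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator


@dataclass
class Document:
    doc_id: str
    text: str


def filter_by_score(
    docs: Iterable[Document],
    score_fn: Callable[[str], float],
    threshold: float,
) -> Iterator[Document]:
    """Yield only the documents whose score is at least `threshold`."""
    for doc in docs:
        if score_fn(doc.text) >= threshold:
            yield doc


# Hypothetical stand-in scorer (NOT the real classifier): the fraction of
# a small list of "educational" terms that appear in the text.
EDU_TERMS = ("定理", "証明", "歴史", "実験")


def toy_score(text: str) -> float:
    hits = sum(1 for term in EDU_TERMS if term in text)
    return hits / len(EDU_TERMS)


corpus = [
    Document("a", "この定理の証明は実験で確認できる。"),
    Document("b", "今日のセール情報はこちら！"),
]

# The threshold value is an assumption chosen for this toy example.
kept = list(filter_by_score(corpus, toy_score, threshold=0.5))
kept_ids = [doc.doc_id for doc in kept]
print(kept_ids)
```

Passing the scorer in as a function keeps the filter independent of any particular model, so a trained classifier could replace `toy_score` without changing the filtering loop.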
Tools used
Amazon SageMaker HyperPod, Amazon EC2, Amazon S3, Amazon FSx for Lustre, Megatron-LM, Weights & Biases (partner), NCCL, AWS-OFI-NCCL, Amazon Managed Service for Prometheus, Amazon Managed Grafana, DCGM Exporter, EFA Exporter, Elastic Fabric Adapter (EFA), Slurm, AWS CloudFormation, Swallow Education Classifier, Hugging Face, Slack (partner)
Outcome

Llama 3.3 Swallow outperforms GPT-4o, GPT-4o-mini, GPT-3.5, and Qwen2.5-72B on Japanese benchmarks, and the distributed checkpointing system saves checkpoints up to 10 times faster than synchronous approaches.

Results
Time saved: 16 days and 6 hours
Volume: 10 times faster
Source

https://aws.amazon.com/blogs/machine-learning/training-llama-3-3-swallow-a-japanese-sovereign-llm-on-amazon-sagemaker-hyperpod?tag=soumet-20


Grounding & classification
Source type: technical build writeup
38 fields verified against source quotes, 1 dropped as unverifiable.
content generation, document classification, knowledge base, metric backed, named customer, production runtime claimed, source backed, tools described, vendor confirmed, workflow described, education, software, accuracy improvement, cycle time reduction, technical build writeup, quality assurance, data sync enrichment