DeepSpeed

Patient Tools

Read, save, and share this guide

Use these quick tools to make this medical article easier to read, print, save, or share with a family member.

Patient Mode

Understand this article easily

Switch between simple English and easy Bangla patient notes. This is for education and does not replace a doctor consultation.

In recent years, large-scale deep learning models have demonstrated impressive capabilities, excelling at tasks across natural language processing, computer vision, and speech domains. Companies now use these models to power novel AI-driven user experiences across a whole spectrum of applications and industries. However, efficiently training...

For severe symptoms, danger signs, pregnancy, child illness, or sudden worsening, seek urgent medical care.

বাংলা রোগী নোট এখনো যোগ করা হয়নি। পোস্ট এডিটরে “RX Bangla Patient Mode” বক্স থেকে সহজ বাংলা সারাংশ যোগ করুন।

এই তথ্য শিক্ষা ও সচেতনতার জন্য। এটি ডাক্তারি পরীক্ষা, রোগ নির্ণয় বা প্রেসক্রিপশনের বিকল্প নয়।

Article Summary

In recent years, large-scale deep learning models have demonstrated impressive capabilities, excelling at tasks across natural language processing, computer vision, and speech domains. Companies now use these models to power novel AI-driven user experiences across a whole spectrum of applications and industries. However, efficiently training large models with 10’s or 100’s of billions of parameters is difficult—the sheer size of these models requires them to...

Key Takeaways

  • This article explains Enabling state-of-the-art DL Stack on AMD GPU in simple medical language.
  • This article explains Power efficient distributed training of large models in simple medical language.
  • This article explains Democratizing large model training in simple medical language.
  • This article explains Using DeepSpeed on AMD GPUs—Getting started in simple medical language.
Educational health guideWritten for patient understanding and clinical awareness.
Reviewed content workflowUse writer and reviewer profiles for stronger trust.
Emergency safety firstUrgent warning signs are highlighted below.

Seek urgent medical care if you notice

These warning signs are general safety guidance. Local emergency numbers and clinical judgment should always come first.

  • Severe symptoms, breathing difficulty, fainting, confusion, or rapidly worsening illness.
  • New weakness, severe pain, high fever, or symptoms after a serious injury.
  • Any symptom that feels urgent, unusual, or unsafe for the patient.
1

Emergency now

Use emergency care for severe, sudden, rapidly worsening, or life-threatening symptoms.

2

See a doctor

Book a professional medical evaluation if symptoms persist, worsen, recur often, affect daily activities, or occur in a high-risk patient.

3

Learn safely

Use this article to understand possible causes, tests, treatment options, prevention, and questions to ask your clinician.

Before reading

RX Patient Tools

Use these quick guides before reading the article, or return to them when you need help preparing questions for a doctor.

Start here Choose the right pathway for symptoms, reports, medicines, or urgent warning signs. Disease article roadmap Read this topic step by step: meaning, symptoms, warning signs, diagnosis, treatment, prevention, and follow-up. Treatment planner Prepare questions about treatment choices, benefits, risks, side effects, and follow-up. Family & caregiver guide Organize symptoms, reports, medicines, questions, and follow-up safely. Nutrition & diet guide Prepare food, hydration, supplement, and medicine-timing questions safely. Prevention guide Organize risk factors, protective habits, screening, and warning signs. Recovery guide Prepare a safe plan for activity, rehabilitation, warning signs, and follow-up.

In recent years, large-scale deep learning models have demonstrated impressive capabilities, excelling at tasks across natural language processing, computer vision, and speech domains. Companies now use these models to power novel AI-driven user experiences across a whole spectrum of applications and industries. However, efficiently training large models with 10’s or 100’s of billions of parameters is difficult—the sheer size of these models requires them to be distributed across multiple nodes with careful orchestration of compute and communication.

DeepSpeed, as part of Microsoft’s AI at Scale initiative, is a popular open-source library for PyTorch that addresses these difficulties and vastly improves the scale, speed, cost, and usability of large model training and inference. It addresses the scaling challenges by allowing users to easily apply a powerful suite of compute, memory, and communication optimization techniques with minimal code changes. With these techniques, DeepSpeed has enabled training the largest transformer model with 530 billion parameters for language generation and helped speed-up training and inference time by a factor of two times to 20 times for real-life scenarios. It is also integrated into popular training libraries like HuggingFace Transformers and PyTorch Lightning.

Since 2006, AMD has been developing and continuously improving their GPU hardware and software technology for high-performance computing (HPC) and machine learning. Their open software platform, ROCm, contains the libraries, compilers, runtimes, and tools necessary for accelerating compute-intensive applications on AMD GPUs. Today, the major machine learning frameworks (like PyTorch, TensorFlow) have ROCm supported binaries that are fully upstreamed so that users can directly run their code written using these frameworks on AMD Instinct GPU hardware and other ROCm compatible GPU hardware—without any porting effort.

AMD has worked closely with the Microsoft DeepSpeed team to bring the suite of parallelization and optimization techniques for the training of large models efficiently on AMD GPUs supporting ROCm. This unlocks the ability to efficiently train models with hundreds of billions of parameters on a wide choice of GPU hardware and system configurations ranging from a single desktop to a distributed cluster of high-performance AMD Instinct™ MI100/MI200 accelerators.

Enabling state-of-the-art DL Stack on AMD GPU

As an effective enabler of large model training, DeepSpeed provides a suite of powerful parallelism and memory optimizations, such as ZeROZeRO-OffloadZeRO-Infinity, and 3D parallelism, which are crucial for efficient training at massive model scales. With DeepSpeed, model scientists can significantly scale up their model sizes on AMD GPUs well beyond the limits of pure data parallelism. As an example, figure 1 shows model sizes that can be trained using 128 MI100 GPUs (on eight nodes) using different DeepSpeed optimizations. In general, each DeepSpeed optimization enables model scaling of two orders of magnitude compared to the 1.5 billion parameters limits of data parallelism. At the extreme, ZeRO-Infinity powers models with nearly 2 trillion parameters.

DeepSpeed
Figure 1: DeepSpeed enables over three orders of magnitude of model scaling on 128 MI100 GPUs.

To achieve the optimizations in compute, memory, and communication, DeepSpeed makes use of HIP (language/runtime), rocBLAS (for GEMMs), and RCCL (for communication) libraries in the ROCm stack. The following figure shows how DeepSpeed interacts with AMD’s ROCm software stack. It requires a version of PyTorch that is built for ROCm.

DeepSpeed
Figure 2: DeepSpeed and the AMD ROCm stack.

Through careful work across both AMD and Microsoft engineers, we are proud to announce that DeepSpeed v0.6 works natively with ROCm-enabled GPUs. This new release of DeepSpeed uses the same APIs as prior releases and does not require any user code changes to leverage the full features of DeepSpeed on ROCm-enabled GPUs. DeepSpeed’s python-level code remains unchanged primarily due to the seamless ROCm experience on PyTorch. DeepSpeed’s CUDA-specific kernels are exposed to users through ROCm’s automatic hipification tools embedded in the PyTorch runtime. This automatic hipification allows DeepSpeed users to continue to enjoy a simple install through PyPI and just-in-time (JIT) hipification and compilation at runtime if or when kernels are utilized by end-users. In addition, ROCm has become an integral part of DeepSpeed’s continuous integration (CI) testing, which will elevate ROCm support in DeepSpeed for all future pull requests and new features.

Power efficient distributed training of large models

DeepSpeed enables high training efficiency while running distributed training for large models with billions of parameters across multiple MI100 GPUs and nodes. For example, figure 3 shows that on 8 MI100 nodes/64 GPUs, DeepSpeed trains a wide range of model sizes, from 0.3 billion parameters (such as Bert-Large) to 50 billion parameters, at efficiencies that range from 38TFLOPs/GPU to 44TFLOPs/GPU.

DeepSpeed
Figure 3: DeepSpeed enables efficient training for a wide range of real-world model sizes.

DeepSpeed also empowers MI100 GPUs to obtain good training scalability for large models as the number of GPUs increases. For example, the plot below in figure 4 shows that for training a 10B parameter model on up to 64 MI100 GPUs, DeepSpeed achieves a throughput scaling (weak scaling) that is close to perfect linear speedup.

DeepSpeed
Figure 4: DeepSpeed achieves good training throughput scalability on 64 MI100 GPUs.

Democratizing large model training

An important democratizing feature of DeepSpeed is the ability to reduce the number of GPUs required to fit large models by offloading model states to the central processing unit (CPU) memory and NVMe memory. Offloading makes large models accessible to users with a limited GPU budget by enabling the training (or finetuning) of models with 10s or 100s of billions of parameters on a single node. Below, we briefly provide a flavor of the model scaling that DeepSpeed enables on a single MI100 GPU.

Efficient model scaling on single GPU

The figure below shows that ZeRO-Offload (such as offloading to CPU memory) can train much larger models (such as 12B parameters), on a single MI100 GPU, compared to the baseline PyTorch which runs out of memory (OOM) for models larger than 1.2B parameters. Moreover, ZeRO-Offload sustains higher training throughput (41—51 TFLOPs) than PyTorch (30 TFLOPs) by enabling larger batch sizes. In summary, ZeRO-Offload supports model sizes ten times larger than otherwise possible and at higher performance.

DeepSpeed
Figure 5: DeepSpeed enables 10 times model scaling on a single MI100 GPU with great efficiency. PyTorch runs out of memory for models larger than 1.2B.

Extreme model scaling on single GPU

The figure below shows that ZeRO-Infinity (such as offloading to NVMe memory) enables even more dramatic scaling in model size on a single MI100 GPU. ZeRO-Infinity utilizes the available 3.5TB NVMe memory in the server to train models as large as 120B parameters on one GPU, thereby scaling the model size by two orders of magnitude compared to the baseline.

DeepSpeed
Figure 6: DeepSpeed enables two orders of magnitude of model scaling on a single MI100 GPU.

We also noticed that the training throughput with a single NVMe device (6.2GB/sec reads and 3.2GB/sec writes) was 12 TFLOPs and we can increase the throughput by using more NVMe devices. For example, we observed that by using four NVMe devices we were able to double the throughput to 24 TFLOPs.

Using DeepSpeed on AMD GPUs—Getting started

It is convenient to use DeepSpeed with the latest ROCm software on a range of supported AMD GPUs.

Installation

The simplest way to use DeepSpeed for ROCm is to use the pre-built docker image (rocm/deepspeed:latest) available on Docker Hub.

You can also easily install DeepSpeed for ROCm by just using “pip install deepspeed”. For details and advanced options please refer to the installation section of the DeepSpeed documentation page.

Using DeepSpeed on ROCm with HuggingFace models

The HuggingFace Transformers is compatible with the latest DeepSpeed and ROCm stack.

  • Several language examples on HuggingFace repository can be easily run on AMD GPUs without any code modifications. We have tested several models like BERT, BART, DistilBERT, T5-Large, DeBERTa-V2-XXLarge, GPT2 and RoBERTa-Large with DeepSpeed ZeRO-2 on ROCm.
  • DeepSpeed can be activated in HuggingFace examples using the deepspeed command-line argument, `--deepspeed=deepspeed_config.json`.

We’ve demonstrated how DeepSpeed and AMD GPUs work together to enable efficient large model training for a single GPU and across distributed GPU clusters. We hope you can take these capabilities to quickly transform your ideas into fully trained models in no time on top of AMD GPUs.

We would love to hear feedback and welcome contributions on the DeepSpeed and AMD ROCm GitHub repos.

AMD ROCm is also supported as an execution provider in the ONNX Runtime for Training, another open-source project led by Microsoft. We refer you to a previous blog for more details. DeepSpeed is composable with ONNX Runtime using the open source ORTModule that is part of ONNX Runtime for PyTorch package. This allows the composition of DeepSpeed and ONNX Runtime optimizations. You can find more details in this blog post.

Contributors

This work was made possible with deep collaboration between system researchers and engineers at AMD and Microsoft. The contributors of this work include Jithun Nair, Jeff Daily, Ramya Ramineni, Aswin Mathews, and Peng Sun from AMD; Olatunji Ruwase, Jeff Rasley, Jeffrey Zhu, Yuxiong He, and Gopi Kumar from Microsoft.

This posting is the authors’ own opinion and may not represent AMD’s or Microsoft’s positions, strategies or opinions. Links to third-party sites are provided for convenience and unless explicitly stated, neither AMD nor Microsoft is responsible for the contents of such linked sites and no endorsement is implied.

Doctor visit helper

Prepare before seeing a doctor

A simple rural-patient checklist to help you explain symptoms clearly, ask better questions, and avoid unsafe self-treatment.

Safety note: This is not a prescription or diagnosis. For severe symptoms, pregnancy danger signs, children with serious illness, chest pain, breathing difficulty, stroke-like weakness, or major injury, seek urgent care.

Which doctor may help?

Start with a registered doctor or the nearest qualified health center.

What to tell the doctor

  • Write when the problem started and how it changed.
  • Bring old prescriptions, investigation reports, and current medicines.
  • Write allergies, pregnancy status, diabetes, kidney/liver disease, and major past illnesses.
  • Bring one family member if the patient is weak, elderly, confused, or a child.

Questions to ask

  • What is the most likely cause of my symptoms?
  • Which danger signs mean I should go to hospital quickly?
  • Which tests are necessary now, and which can wait?
  • How should I take medicines safely and what side effects should I watch for?
  • When should I come for follow-up?

Tests to discuss

  • Vital signs: temperature, pulse, blood pressure, oxygen saturation
  • Basic physical examination by a clinician
  • CBC, urine test, blood sugar, or imaging only when clinically needed

Avoid these mistakes

  • Do not use antibiotics, steroid tablets/injections, or strong painkillers without proper medical advice.
  • Do not hide pregnancy, kidney disease, ulcer, allergy, or blood thinner use.
  • Do not delay emergency care when danger signs are present.

Medicine safety and first-aid guide

This section is for patient education only. It does not replace a doctor, pharmacist, or emergency care.

Safe first steps

  • Rest, drink safe water, and observe symptoms carefully.
  • Keep a written note of symptoms, duration, temperature, medicines already taken, and allergy history.
  • Seek medical care quickly if symptoms are severe, worsening, or unusual for the patient.

OTC medicine safety

  • For mild pain or fever, ask a registered pharmacist or doctor before using common over-the-counter pain/fever medicines.
  • Do not combine multiple pain medicines without advice, especially if you have kidney disease, liver disease, stomach ulcer, asthma, pregnancy, or take blood thinners.
  • Do not give adult medicines to children unless a qualified clinician advises it.

Avoid these mistakes

  • Do not start antibiotics without a proper medical decision.
  • Do not use steroid tablets or injections casually for quick relief.
  • Do not delay emergency care because of home remedies.

Get urgent help if

  • Severe symptoms, confusion, fainting, breathing difficulty, chest pain, severe dehydration, or sudden weakness need urgent medical care.
Medicine names, dose, and timing must be decided by a qualified clinician or pharmacist after checking age, pregnancy, allergy, other diseases, and current medicines.

For rural patients and family caregivers

Patient health record and symptom diary

Write your symptoms, medicines already taken, test results, and questions before visiting a doctor. This note stays on your device unless you print or copy it.

Doctor to discuss: Doctor / qualified healthcare provider
Tests to discuss with doctor
  • Basic vital signs: temperature, pulse, blood pressure, oxygen level if needed
  • Relevant blood, urine, imaging, or specialist tests only after clinical assessment
Questions to ask
  • What is the most likely cause of my symptoms?
  • Which warning signs mean I should go to emergency care?
  • Which tests are really needed now?
  • Which medicines are safe for my age, pregnancy status, allergy, kidney/liver/stomach condition, and current medicines?

Emergency warning signs such as chest pain, severe breathing difficulty, sudden weakness, confusion, severe dehydration, major injury, or loss of bladder/bowel control need urgent medical care. Do not wait for online information.

Safe pathway to proper treatment

Care roadmap for: DeepSpeed

Use this simple roadmap to understand the next safe steps. It is educational and does not replace examination by a doctor.

Go to emergency care if you notice:
  • Severe or rapidly worsening symptoms
  • Breathing difficulty, chest pain, fainting, confusion, severe weakness, major injury, or severe dehydration
Doctor / service to discuss: Qualified healthcare provider; specialist depends on symptoms and examination.
  1. Step 1

    Check danger signs first

    If danger signs are present, seek emergency care and do not wait for online information.

  2. Step 2

    Record the symptom story

    Write when symptoms started, severity, medicines already taken, allergies, pregnancy status, and test results.

  3. Step 3

    Visit a qualified clinician

    A doctor, nurse, or qualified healthcare provider can examine you and decide which tests or treatment are needed.

  4. Step 4

    Do only useful tests

    Do tests after clinical assessment. Avoid unnecessary tests, random antibiotics, or repeated medicines without diagnosis.

  5. Step 5

    Follow up and return early if worse

    If symptoms worsen, new warning signs appear, or treatment is not helping, return for review quickly.

Rural patient practical tips
  • Take a written symptom diary and all previous prescriptions/test reports.
  • Do not hide medicines already taken, even herbal or over-the-counter medicines.
  • Ask which warning signs mean urgent referral to hospital.

This roadmap is for education. A real diagnosis and treatment plan requires history, examination, and clinical judgment.

RX Patient Help

Ask a health question safely

Write your symptom story. A health professional or site editor can review it before any answer is prepared. This box is not for emergency care.

Emergency first: Severe chest pain, breathing trouble, unconsciousness, stroke signs, severe injury, heavy bleeding, or rapidly worsening symptoms need urgent local medical care now.

Frequently Asked Questions

Is this article a replacement for a doctor?

No. It is educational content only. Patients should consult a qualified clinician for diagnosis and treatment.

When should I seek urgent care?

Seek urgent care for severe symptoms, rapidly worsening condition, breathing difficulty, severe pain, neurological changes, or any emergency warning sign.

References

Add references, clinical guidelines, textbooks, journal articles, or trusted medical sources here. You can edit this area from the RX Article Professional Blocks panel.