Local-first machine learning

As machine learning spreads across industries, deployment targets are becoming more diverse, with companies choosing to run models locally on the client rather than through cloud-based services for security, performance, and cost reasons. Serving machine learning models on-device is difficult, especially given the limited bandwidth of early-stage startups. This guest post from the team at Pieces shares the problems and solutions they evaluated for their on-device model serving stack and explains how ONNX Runtime serves as the backbone of their success.

Local-first machine learning

Pieces is a code snippet management tool that allows developers to save, search, and reuse their snippets without interrupting their workflow. The magic of Pieces is that it automatically enriches these snippets so that they’re more useful to the developer after being stored in Pieces. A large part of this enrichment is driven by our machine learning models that provide programming language detection, concept tagging, semantic description, snippet clustering, optical character recognition, and much more. To enable full coverage of the developer workflow, we must run these models from the desktop, terminal, integrated development environment, browser, and team communication channels.

Like many businesses, our first instinct was to serve these models as cloud endpoints; however, we realized this wouldn’t suit our needs for a few reasons. First, in order to maintain a seamless developer workflow, our models must have low latency. The round trip to the server is lost time we can’t afford. Second, our users are frequently working with proprietary code, so privacy is a primary concern. Sending this data over the wire would expose it to potential attacks. Finally, hosting models on performant cloud machines can be very expensive and is an unnecessary cost in our opinion. We firmly believe that advances in modern personal hardware can be taken advantage of to rival or even improve upon the performance of models on virtual machines. Therefore, we needed an on-device model serving platform that would provide us with these benefits while still giving our machine learning engineers the flexibility that cloud serving offers. After some trial and error, ONNX Runtime emerged as the clear winner.

Our ideal machine learning runtime

When we set out to find the backbone of our machine learning serving system, we were looking for the following qualities:

  • Easy implementation: It should fit seamlessly into our stack and require minimal custom code to implement and maintain. Our application is built in Flutter, so the runtime would ideally work natively in the Dart language so that our non-machine learning engineers could confidently interact with the API.
  • Balanced: As mentioned above, performance is key to our success, so we need a runtime that can spin up and perform inference very quickly. On the other hand, we also need useful tools to optimize model performance, ease model format conversion, and generally facilitate the machine learning engineering process.
  • Model coverage: It should support the vast majority of machine learning model operators and architectures, especially cutting-edge models, such as those in the transformer family.

TensorFlow Lite

Our initial research revealed three potential options: TensorFlow Lite, TorchServe, and ONNX Runtime. TensorFlow Lite was our top pick because of how easy it would be to implement. We found an open source Dart package which provided Dart bindings to the TensorFlow Lite C API out-of-the-box. This allowed us to simply import the package and immediately have access to machine learning models in our application without worrying about the lower-level details in C and C++.

The tiny runtime offered great performance and worked very well for the initial models we tested in production. However, we quickly ran into a huge blocker: converting other model formats to TensorFlow Lite is a pain. Our first realization of this limitation came when we tried and failed to convert a simple PyTorch LSTM to TensorFlow Lite. This spurred further research into how else we might be limited. We found that many of the models we planned to work on in the future would have to be trained in TensorFlow or Keras because of conversion issues. This was problematic because we’ve found that there’s not a one-size-fits-all machine learning framework. Some are better suited for certain tasks, and our machine learning engineers differ in preference and skill level for each of these frameworks—unfortunately, we tend to favor PyTorch over TensorFlow.

This issue was then compounded by the fact that TensorFlow Lite only supports a subset of the machine learning operators available in TensorFlow and Keras—importantly, it lags in more cutting-edge operators that are required in new, high-performance architectures. This was the final straw for us with TensorFlow Lite. We were looking to implement a fairly standard transformer-based model that we’d trained in TensorFlow and found that the conversion was impossible. To take advantage of the leaps and bounds made in large language models, we needed a more flexible runtime.

TorchServe

Having learned our lesson on locking ourselves into a specific training framework, we opted to skip testing out TorchServe so that we would not run into the same conversion issues.

ONNX Runtime saves the day

Like TensorFlow Lite, ONNX Runtime gave us a lightweight runtime that focused on performance, but where it really stood out was the model coverage. Being built around the ONNX format, which was created to solve interoperability between machine learning tools, it allowed our machine learning engineers to choose the framework that works best for them and the task at hand and have confidence that they would be able to convert their model to ONNX in the end. This flexibility brought more fluidity to our research and development process and reduced the time spent preparing new models for release.

Another large benefit of ONNX Runtime for us is a standardized model optimization pipeline, which makes it the “balanced” tool we were looking for. By serving models in a single format, we’re able to iterate through a fixed set of known optimizations until we find the desired speed, size, and accuracy tradeoff for each model. Specifically, for each of our ONNX models, the last step before production is to apply different levels of ONNX Runtime graph optimizations and linear quantization. The ease of this process is a quick win for us every time.
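To make the term "linear quantization" concrete, here is a minimal pure-Python sketch of the arithmetic involved: floats are mapped to 8-bit integers through a scale and a zero point, then mapped back. The function names and the sample values are illustrative assumptions; ONNX Runtime's quantization tools perform this kind of mapping on model tensors and are what a production pipeline would actually call.

```python
def quantize_linear(values, num_bits=8):
    """Map floats to unsigned integers using a scale and a zero point."""
    qmin, qmax = 0, 2 ** num_bits - 1
    # The representable range must include 0.0 so that zero maps exactly.
    lo = min(min(values), 0.0)
    hi = max(max(values), 0.0)
    scale = (hi - lo) / (qmax - qmin) or 1.0  # avoid a zero scale for all-zero input
    zero_point = round(qmin - lo / scale)
    quantized = [
        max(qmin, min(qmax, round(v / scale) + zero_point))  # round, shift, clamp
        for v in values
    ]
    return quantized, scale, zero_point


def dequantize_linear(quantized, scale, zero_point):
    """Recover approximate floats from the integer representation."""
    return [(q - zero_point) * scale for q in quantized]


weights = [-1.0, 0.0, 0.5, 1.0]  # hypothetical weights
q, scale, zero_point = quantize_linear(weights)
restored = dequantize_linear(q, scale, zero_point)
```

Each restored value differs from the original by at most one quantization step (`scale`), which is the accuracy cost traded for a representation that is roughly four times smaller than 32-bit floats.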

A final reason we chose ONNX Runtime was that its baseline performance was good and it left us many options for improving performance down the road. Due to the way we currently build our app, we have been limited to the vanilla CPU builds of ONNX Runtime. However, an upcoming modification to our infrastructure will allow us to utilize execution providers to serve optimized versions of ONNX Runtime based on a user’s CPU and GPU architecture. We also plan to implement dynamic thread management as well as IOBinding for GPU-enabled devices.
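As a rough illustration of how execution provider selection could work, the sketch below picks providers from a preference list and always keeps the CPU provider as a fallback. The provider strings are real ONNX Runtime identifiers, but the preference order and the helper name `pick_providers` are assumptions made for this example, not a description of the Pieces implementation.

```python
# Assumed preference order, most specialized first. This ordering is hypothetical.
PREFERRED_PROVIDERS = [
    "CUDAExecutionProvider",
    "CoreMLExecutionProvider",
    "DmlExecutionProvider",
    "CPUExecutionProvider",
]


def pick_providers(available, preferred=PREFERRED_PROVIDERS):
    """Return the preferred providers that are available, ending with the CPU fallback."""
    chosen = [name for name in preferred if name in available]
    if "CPUExecutionProvider" not in chosen:
        chosen.append("CPUExecutionProvider")
    return chosen
```

In an application, `available` would typically come from `onnxruntime.get_available_providers()`, and the returned list would be passed as the `providers` argument when creating an `onnxruntime.InferenceSession`.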

Production workflow

Now that we’ve covered our reasoning for choosing ONNX Runtime, we’ll do a brief technical walkthrough of how we utilize ONNX Runtime to facilitate model deployment.

Model conversion

After we’ve finished training a new model, our first step towards deployment is getting that model into an ONNX format. The specific conversion approach depends on the framework used to train the model. We have successfully used the conversion tools supplied by HuggingFace, PyTorch, and TensorFlow.
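For a PyTorch model, the conversion step might look roughly like the sketch below, which uses `torch.onnx.export`. The helper name `export_to_onnx`, the tensor names "input" and "output", and the dynamic batch axis are illustrative assumptions, and the sketch assumes a model with a single input and a single output.

```python
def export_to_onnx(model, example_input, path):
    """Export a single-input PyTorch module to ONNX and return the output path."""
    import torch  # imported lazily so the helper can be defined without PyTorch installed

    model.eval()  # disable dropout and similar training-only behavior before export
    torch.onnx.export(
        model,
        (example_input,),            # example arguments used to record the graph
        path,
        input_names=["input"],       # assumed tensor names, chosen for this sketch
        output_names=["output"],
        dynamic_axes={               # let the batch dimension vary at inference time
            "input": {0: "batch"},
            "output": {0: "batch"},
        },
    )
    return path
```

A call such as `export_to_onnx(my_model, torch.randn(1, 16), "model.onnx")` would write the ONNX file, where `my_model` and the input shape are placeholders for a real model and its expected input.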

Some model formats are not supported by these conversion tools, but luckily ONNX Runtime has its own internal conversion utilities. We recently used these tools to implement a T5 transformer model for code description generation. The HuggingFace model uses a BeamSearch node for text generation that we were only able to convert to ONNX using ONNX Runtime’s convert_generation.py tool, which is included in their transformer utilities.

ONNX model optimization

Our first optimization step is running the ONNX model through all ONNX Runtime optimizations, using GraphOptimizationLevel.ORT_ENABLE_ALL, to reduce model size and startup time. We perform all these optimizations offline so that our ONNX Runtime binary doesn’t have to perform them on startup. We are able to consistently reduce model size and latency very easily with this utility.
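A minimal sketch of this offline optimization step using the ONNX Runtime Python API is shown below. Creating a session with `optimized_model_filepath` set causes the optimized graph to be written to disk. The helper names and the `.opt.onnx` naming convention are assumptions for illustration.

```python
from pathlib import Path


def default_output_path(src_path):
    """Derive an output name such as model.opt.onnx (a hypothetical convention)."""
    return str(Path(src_path).with_suffix(".opt.onnx"))


def optimize_offline(src_path, dst_path=None):
    """Apply all ONNX Runtime graph optimizations and save the result to disk."""
    import onnxruntime as ort  # imported lazily so the helpers load without ONNX Runtime

    dst_path = dst_path or default_output_path(src_path)
    options = ort.SessionOptions()
    options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_ALL
    options.optimized_model_filepath = dst_path  # serialize the optimized graph here
    # Building the session runs the optimizers and writes the optimized model.
    ort.InferenceSession(src_path, sess_options=options, providers=["CPUExecutionProvider"])
    return dst_path
```

Because the highest optimization level can apply transformations that depend on the execution provider and hardware, it is generally safest to run this step with the same provider configuration that the deployed runtime will use.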

Our second optimization step is quantization. Again, ONNX Runtime provides an excellent utility for this. We’ve used both quantize_dynamic() and quantize_static() in production, depending on our desired balance of speed and accuracy for a specific model.
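As a sketch of the dynamic case, the helper below wraps `quantize_dynamic()` from `onnxruntime.quantization` to store weights as 8-bit integers. The helper name and the choice of `QuantType.QInt8` are assumptions for this example; the right settings depend on the model.

```python
def quantize_weights_int8(src_path, dst_path):
    """Write a dynamically quantized copy of an ONNX model and return its path."""
    # Imported lazily so the helper can be defined without ONNX Runtime installed.
    from onnxruntime.quantization import QuantType, quantize_dynamic

    quantize_dynamic(
        model_input=src_path,
        model_output=dst_path,
        weight_type=QuantType.QInt8,  # assumed weight type for this sketch
    )
    return dst_path
```

Dynamic quantization computes activation ranges at inference time, so it needs no extra data. `quantize_static()` additionally requires a calibration data reader that feeds representative inputs, which is more work but can yield faster inference.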

Inference

Once we have an optimized ONNX model, it’s ready to be put into production. We’ve created a thin wrapper around the ONNX Runtime C++ API which allows us to spin up an instance of an inference session given an arbitrary ONNX model. We based this wrapper on the onnxruntime-inference-examples repository. After developing this simple wrapper binary, we were able to quickly get native Dart support using the Dart FFI (Foreign Function Interface) to create Dart bindings for our C++ API. This reduces the friction between teams at Pieces by allowing our Dart software engineers to easily inject our machine learning efforts into all of our services.
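The wrapper described above is written in C++ and exposed to Dart through FFI. The same idea, a thin layer that accepts an arbitrary ONNX model and hides the session details, can be sketched with the ONNX Runtime Python API. The class name `OnnxModel` and its interface are hypothetical and are not the Pieces wrapper.

```python
class OnnxModel:
    """Thin wrapper that runs any ONNX model by positional inputs."""

    def __init__(self, session):
        self._session = session
        # Read tensor names from the model so callers do not need to know them.
        self.input_names = [i.name for i in session.get_inputs()]
        self.output_names = [o.name for o in session.get_outputs()]

    @classmethod
    def from_path(cls, model_path, providers=("CPUExecutionProvider",)):
        import onnxruntime as ort  # imported lazily; only needed for real sessions

        return cls(ort.InferenceSession(model_path, providers=list(providers)))

    def __call__(self, *arrays):
        if len(arrays) != len(self.input_names):
            raise ValueError(
                f"expected {len(self.input_names)} inputs, got {len(arrays)}"
            )
        feeds = dict(zip(self.input_names, arrays))
        outputs = self._session.run(None, feeds)  # None requests every model output
        return dict(zip(self.output_names, outputs))
```

With a real model, `OnnxModel.from_path("model.onnx")` returns a callable object, and calling it with NumPy arrays in input order returns a dictionary keyed by output name.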

Conclusion

On-device machine learning requires a tool that is performant yet allows you to take full advantage of the current state-of-the-art machine learning models. ONNX Runtime gracefully meets both needs, not to mention the incredibly helpful ONNX Runtime engineers on GitHub who are always willing to assist and are constantly pushing ONNX Runtime forward to keep up with the latest trends in machine learning. It’s for these reasons that we at Pieces confidently rest our entire machine learning architecture on its shoulders.
