What is Computer Vision?

The amount of visual data in the world—and on the web—grows exponentially every day. This is thanks in part to the popularity of video, millions of networked IoT sensors, and the number of cameras, which are close to outnumbering people on the planet. That much data is a complex problem, but visual data, in particular, is very difficult for computers to understand.

To help computers get better at seeing, and at extracting high-level information from what they see, scientists have spent the past 50 years working to recreate human vision in computers. The field is called computer vision, and it’s the science behind advancements like self-driving cars and Facebook’s facial and image recognition technologies.

The challenge of helping computers to see

Human vision is complicated enough. That’s mostly because how humans understand what we see depends largely on our experiences and memories. We’ve been training our brains since the day we were born, which puts computers at a disadvantage. Unless every image that a computer processes is annotated, which would require countless hours for humans to do—tagging an image of an apple “apple,” “fruit,” “red,” “food,” etc.—computers must rely on algorithms to understand what they’re seeing.

And that’s where the genius of computer vision comes in. With support from artificial intelligence, neural networks, deep learning, parallel computing, and machine learning, it’s helping to bridge the gap between computers seeing and computers comprehending what they see.

Previously we covered image recognition and compared a few image recognition APIs. Here, we’ll take a step back and briefly look at the broader field of computer vision.

Human hardware and software: How people see and understand what they’re seeing

It’s easy to take for granted the way our eyes and brains work in tandem to instantaneously help us do something like duck when an object is coming at us at a high rate of speed. It’s not just our eyes at work here; there’s a lot going on to make that split-second response possible, and what we’re seeing is only part of it. A mix of hardware (our eyes) and software (our brains) makes it all work.

We understand an apple is an apple, regardless of shadows, light, colors, or size. These comprehensions happen subconsciously, thanks to interactions we’ve had with the world over time.

So how can you recreate this in a computer? Let’s first look at how a human does this, then see what components a computer would need to do the same.

Human vision

Let’s say you see an apple—it could be a piece of fruit, a drawing, or the logo on the back of a laptop. Here’s how a human processes that, step by step.

  1. Our eyes (with their retinas, photoreceptors, and millions of neurons feeding data to our optic nerves) are the lenses that gather information about objects and images, including light, colors, shadows, depth, and movement. Our eyes are the hardware, but they require software to understand what we’re seeing. So here’s the first step: Our eyes gather light bouncing off an apple.
  2. Next, that light is transformed into information for the brain. Neurons in the retina, at the back of our eyes, process that raw visual data before it makes its way to the brain, working fast to turn light, edges, and motion into usable information for the visual cortex.
  3. The visual cortex is the part of the brain that processes what we’re seeing—and it’s so complex and staggeringly fast, scientists understand only some of what it can do. That it’s still largely a mystery makes it difficult to recreate in computers, but algorithms and convolutional neural networks are getting us closer. At this point, the apple is understood to be an apple, whether it’s green, red, or a drawing of an apple.
  4. The visual part of our brain relies on the rest of the brain for context around what we’re seeing. Our brain, including our memory and other powers of deduction we learn from the day we’re born, provides this context. In the apple example, if we noticed the apple looked moldy or bruised, that would allow us to infer it was a rotten apple and, subsequently, not fit to eat.

Computer vision

Now let’s look at how those steps translate to computers.

1. Cameras, lenses, and sensors gather raw visual input from images and objects (in many cases, with more precision and sensitivity than the human eye!). But without the software components, they’re still just sophisticated camera equipment.

2. When we see an apple, we instinctively know what it is, but a computer sees data about that apple—numbers and RGB values that represent different colors and intensities. Carnegie Mellon University’s Field Robotics Center notes, “It takes robot vision programs about 100 computer instructions to derive single edge or motion detections from comparable video images. A hundred million instructions are needed to do a million detections, and 1,000 MIPS to repeat them 10 times per second to match the retina.” This presents one of the first challenges for computer vision: How can we equip computers to mimic human vision without it taking an impractical amount of time and resources?
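To make "numbers and RGB values" concrete, here is a minimal Python sketch. The 2x2 image and its pixel values are made up for illustration, and the grayscale weights are the standard ITU-R BT.601 luma coefficients, which the article itself does not mention. The second half simply reproduces the arithmetic in the quoted estimate.

```python
# A 2x2 "image": each pixel is an (R, G, B) triple of 0-255 intensities.
# These pixel values are invented for illustration only.
image = [
    [(200, 30, 30), (190, 40, 35)],    # two reddish pixels
    [(40, 160, 40), (255, 255, 255)],  # a green pixel and a white pixel
]


def to_gray(pixel):
    """Collapse an RGB triple into one brightness value (BT.601 weights)."""
    r, g, b = pixel
    return 0.299 * r + 0.587 * g + 0.114 * b


# What the computer "sees": a grid of brightness numbers, not an apple.
gray = [[round(to_gray(p)) for p in row] for row in image]

# The arithmetic behind the quoted estimate.
INSTRUCTIONS_PER_DETECTION = 100
DETECTIONS_PER_FRAME = 1_000_000
FRAMES_PER_SECOND = 10

instructions_per_frame = INSTRUCTIONS_PER_DETECTION * DETECTIONS_PER_FRAME
# MIPS = millions of instructions per second.
mips_needed = instructions_per_frame * FRAMES_PER_SECOND // 1_000_000

print(gray)
print(instructions_per_frame, mips_needed)
```

Even this tiny example shows why cost matters: every extra detection per frame multiplies through the frame rate.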

Numerous algorithms have been designed to detect features in an image using kernels: small grids of weights that slide across the image and respond strongly to particular patterns of pixels, such as edges or corners. These algorithms can mimic the behavior of the visual cortex, but they need many layers to do it effectively.
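As a rough illustration of feature detection with a kernel, the sketch below slides a 3x3 Sobel kernel over a small grayscale image in plain Python. The image values and the function name `convolve2d` are assumptions made for this example. Like most deep learning libraries, it does not flip the kernel, so strictly speaking it computes cross-correlation.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding) and return the response map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            total = 0
            # Weighted sum of the patch of pixels under the kernel.
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * kernel[di][dj]
            row.append(total)
        out.append(row)
    return out


# Sobel kernel that responds to left-to-right changes in brightness.
SOBEL_X = [
    [-1, 0, 1],
    [-2, 0, 2],
    [-1, 0, 1],
]

# Illustrative grayscale image: dark on the left, bright on the right.
image = [
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
    [0, 0, 0, 9, 9, 9],
]

edges = convolve2d(image, SOBEL_X)
for row in edges:
    print(row)
# Each row prints as [0, 36, 36, 0]
```

The response is zero over the flat regions and large only where the kernel straddles the dark-to-bright boundary, which is what "detecting an edge" means here.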

3. That’s where giving a computer more context is helpful, but the amount of data required to let computers recognize objects the way the human memory can is immense. The computing power required would be impractical. Neural networks mimic the biological neural networks in our brains, and they help replace all those years of learning humans have. By accessing these networks, computers can teach themselves things we’ve learned over time, removing the need for millions of computer instructions.

A convolutional neural network provides an even smarter way to process the values in an image using banks of artificial neurons and learned kernels that can detect interesting features in an image. Layers and layers of learned kernels with increasing degrees of complexity can process an image in parallel (one layer for edges, one for shapes, one for different facial features, and one for surrounding objects, for example), then run those through a final neuron that puts it all together: an image of “a female smiling on a beach.” This layered approach is deep learning in action.
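The layered idea can be sketched as a toy forward pass in plain Python: a convolution layer, a ReLU nonlinearity, a pooling step, and a final layer that combines the features into class scores. Everything here is hypothetical. The kernel weights, class names, and dense weights are hand-picked for illustration, whereas a real network learns them from data. The pooling step is a common ingredient that the article does not describe.

```python
import math


def conv2d(image, kernel):
    """Valid (no padding) sliding-window weighted sum."""
    kh, kw = len(kernel), len(kernel[0])
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b]
                for a in range(kh) for b in range(kw))
            for j in range(len(image[0]) - kw + 1)
        ]
        for i in range(len(image) - kh + 1)
    ]


def relu(fmap):
    """Keep positive responses, zero out the rest."""
    return [[max(0, v) for v in row] for row in fmap]


def global_max(fmap):
    """Summarize a feature map by its strongest response."""
    return max(max(row) for row in fmap)


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hand-picked kernels (hypothetical; a trained network would learn these).
KERNELS = {
    "vertical_edge": [[-1, 1], [-1, 1]],
    "horizontal_edge": [[-1, -1], [1, 1]],
}
CLASSES = ["vertical stripe", "horizontal stripe"]
# Final layer: one row of weights per class, one weight per feature.
DENSE = [[1.0, -1.0], [-1.0, 1.0]]


def predict(image):
    features = [global_max(relu(conv2d(image, k))) for k in KERNELS.values()]
    scores = [sum(w * f for w, f in zip(ws, features)) for ws in DENSE]
    probs = softmax(scores)
    best = max(range(len(CLASSES)), key=probs.__getitem__)
    return CLASSES[best], probs


vertical = [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
horizontal = [[0, 0, 0], [1, 1, 1], [0, 0, 0]]
print(predict(vertical)[0])
print(predict(horizontal)[0])
```

Real networks stack many such layers, so that later kernels operate on the feature maps produced by earlier ones rather than on raw pixels.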

Likewise, recurrent neural networks can process images in videos, and machine learning and artificial intelligence help them get smarter along the way.
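What makes a network recurrent is that it carries a hidden state from one frame to the next. The sketch below is a deliberately tiny, scalar version of that update. The weights `w_x` and `w_h` and the per-frame feature values are assumptions for illustration; a real model would learn its weights and use vectors, typically fed by a convolutional network.

```python
import math


def rnn_step(x, h_prev, w_x=1.0, w_h=0.5, b=0.0):
    """One recurrent update: mix the current input with the previous state."""
    return math.tanh(w_x * x + w_h * h_prev + b)


def run_rnn(frame_features):
    """Feed one feature per video frame through the recurrent update."""
    h = 0.0  # the state starts empty
    states = []
    for x in frame_features:
        h = rnn_step(x, h)
        states.append(h)
    return states


# One number per frame, e.g. how strongly an imagined detector fired.
frames = [0.0, 0.0, 1.0, 0.0, 0.0]
states = run_rnn(frames)
print([round(s, 3) for s in states])
```

After the frame where the detector fires, the state stays above zero and fades gradually, so later frames are interpreted with some memory of what came before.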

What can computer vision do?

“At Upwork, the data science team uses computer vision to help predict the effect of visuals on the hiring manager’s decision-making process.”

—Thanh Tran, VP, Data Science, Upwork

The data science team at Upwork uses computer vision to learn how images affect hiring manager decisions. Thanh Tran, Head of Data Science at Upwork, says, “Beyond the conventional use cases of detecting objects, we use computer vision techniques to help predict the effect of visuals on the hiring manager’s decision-making process. For instance, a convolutional neural network is incorporated into our predictive models to learn features from job seekers’ profile images that may have a positive impact on the interview and hire rate of job applications.”

Computer vision also powers biometric authentication, such as the face scan that unlocks your smartphone. It can react to what the camera sees in other ways, too: Google’s Pixel 3 smartphone is able to take photos once it detects that everyone in the frame is smiling. Consumer features like these are among the most compelling applications for computer vision.

“Image recognition, and computer vision more broadly, is integral to a number of emerging technologies, from high-profile advances like driverless cars and facial recognition software to more prosaic but no less important developments, like building smart factories that can spot defects and irregularities on the assembly line, or developing software to allow insurance companies to process and categorize photographs of claims automatically.”—Tyler Keenan, “How Image Recognition Works”

As computer vision gets smarter, computers will be more accurate and better able to sift through the millions of images and hours of video flooding the web. Convolutional neural networks will allow computer vision to take on more-complex challenges, with fewer errors.

Whether it’s simple barcode scanners or video content analysis with recurrent neural networks, computer vision isn’t just here to stay—it’s only just beginning.

