Relevant Segment Extraction (RSE)

Article Summary

Relevant segment extraction (RSE) is a method of reconstructing multi-chunk segments of contiguous text out of retrieved chunks. This step occurs after vector search (and optionally reranking), but before presenting the retrieved context to the LLM. This method ensures that nearby chunks are presented to the LLM in the order they appear in the original document. It also adds in chunks that are not marked...


Motivation

When chunking documents for RAG, choosing the right chunk size is an exercise in managing tradeoffs. Large chunks provide better context to the LLM than small chunks, but they also make it harder to precisely retrieve specific pieces of information. Some queries (like simple factoid questions) are best handled by small chunks, while other queries (like higher-level questions) require very large chunks. There are some queries that can be answered with a single sentence from the document, while there are other queries that require entire sections or chapters to properly answer. Most real-world RAG use cases face a combination of these types of queries.

What we really need is a more dynamic system that can retrieve short chunks when that's all that's needed, but can also retrieve very large chunks when required. How do we do that?

Our solution is motivated by one simple insight: relevant chunks tend to be clustered within their original documents.

Key Components

Chunk text key-value store

RSE requires being able to retrieve chunk text from a database quickly, using a doc_id and chunk_index as keys. This is because not all chunks that need to be included in a given segment will have been returned in the initial search results. Therefore some sort of key-value store may need to be used in addition to the vector database.

Method Details

Document chunking

Standard document chunking methods can be used. The only special requirement here is that documents are chunked with no overlap. This allows us to reconstruct sections of the document (i.e., segments) by concatenating chunks.

RSE optimization

After the standard chunk retrieval process is completed, which ideally includes a reranking step, the RSE process can begin. The first step is to combine the absolute relevance value (i.e., the similarity score) and the relevance rank. This provides a more robust starting point than just using the similarity score on its own or just using the rank on its own.
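As a rough illustration of this combining step, the sketch below multiplies each chunk's similarity score by an exponential decay of its rank. The function name `fuse_rank_and_score` and the example scores are hypothetical. The default `decay_rate = 30` mirrors the value used in this article's notebook code and should be treated as a tunable assumption, not a recommended setting.

```python
import math
from typing import List


def fuse_rank_and_score(scores: List[float], decay_rate: float = 30.0) -> List[float]:
    """Combine each chunk's similarity score with its relevance rank.

    `scores` is in document order. The chunk ranked i-th (0 = most relevant)
    has its score multiplied by exp(-i / decay_rate).
    """
    # Chunk indices sorted from most to least relevant
    ranked = sorted(range(len(scores)), key=lambda idx: scores[idx], reverse=True)
    fused = [0.0] * len(scores)
    for rank, idx in enumerate(ranked):
        fused[idx] = math.exp(-rank / decay_rate) * scores[idx]
    return fused


scores = [0.10, 0.90, 0.85, 0.05]  # hypothetical similarity scores, document order
fused = fuse_rank_and_score(scores)
print([round(v, 3) for v in fused])
```

The top-ranked chunk keeps its score unchanged, while lower-ranked chunks are discounted progressively, so two chunks with nearly equal scores are still separated by their rank.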
Then we subtract a constant threshold value (let's say 0.2) from each chunk's value, such that irrelevant chunks have a negative value (as low as -0.2), and relevant chunks have a positive value (as high as 0.8). By calculating chunk values this way we can define segment value as just the sum of the chunk values.

For example, suppose chunks 0-4 in a document have the following chunk values: [-0.2, -0.2, 0.4, 0.8, -0.1]. The segment that includes only chunks 2-3 would have value 0.4+0.8=1.2.

Finding the best segments then becomes a constrained version of the maximum sum subarray problem. We use a brute force search with a few heuristics to make it efficient. This generally takes ~5-10ms.

Setup

First, some setup. You'll need a Cohere API key to run some of these cells, as we use their excellent reranker to calculate relevance scores.

    import os
    from typing import List, Optional, Tuple

    import cohere
    import matplotlib.pyplot as plt
    import numpy as np
    from dotenv import load_dotenv
    from scipy.stats import beta

    # Load environment variables from a .env file
    load_dotenv()

    # The Cohere API key must be available as CO_API_KEY
    if not os.getenv("CO_API_KEY"):
        raise RuntimeError("CO_API_KEY is not set; add it to your environment or .env file")

We define a few helper functions. We'll use the Cohere Rerank API to calculate relevance values for our chunks. Normally, we'd start with a vector and/or keyword search to narrow down the list of candidates, but since we're just dealing with a single document here we can just send all chunks directly to the reranker, keeping things a bit simpler.

    from langchain_text_splitters import RecursiveCharacterTextSplitter

    def split_into_chunks(text: str, chunk_size: int) -> List[str]:
        """
        Split a given text into chunks of specified size using RecursiveCharacterTextSplitter.

        Args:
            text (str): The input text to be split into chunks.
            chunk_size (int): The maximum size of each chunk.

        Returns:
            list[str]: A list of text chunks.

        Example:
            >>> text = "This is a sample text to be split into chunks."
            >>> chunks = split_into_chunks(text, chunk_size=10)
            >>> print(chunks)
            ['This is a', 'sample', 'text to', 'be split', 'into', 'chunks.']
        """
        # chunk_overlap=0 so that segments can be rebuilt by concatenating chunks
        text_splitter = RecursiveCharacterTextSplitter(chunk_size=chunk_size, chunk_overlap=0, length_function=len)
        docs = text_splitter.create_documents([text])
        chunks = [doc.page_content for doc in docs]
        return chunks

    def transform(x: float) -> float:
        """
        Transformation function to map the absolute relevance value to a value that is more uniformly distributed between 0 and 1. The relevance values given by the Cohere reranker tend to be very close to 0 or 1. The beta CDF used here helps to spread out the values more uniformly.

        Args:
            x (float): The absolute relevance value returned by the Cohere reranker

        Returns:
            float: The transformed relevance value
        """
        a, b = 0.4, 0.4  # These can be adjusted to change the distribution shape
        return beta.cdf(x, a, b)

    def rerank_chunks(query: str, chunks: List[str]) -> Tuple[List[float], List[float]]:
        """
        Use Cohere Rerank API to rerank the search results

        Args:
            query (str): The search query
            chunks (list): List of chunks to be reranked

        Returns:
            similarity_scores (list): List of similarity scores for each chunk
            chunk_values (list): List of relevance values (fusion of rank and similarity) for each chunk
        """
        model = "rerank-english-v3.0"
        client = cohere.Client(api_key=os.environ["CO_API_KEY"])
        decay_rate = 30

        reranked_results = client.rerank(model=model, query=query, documents=chunks)
        results = reranked_results.results
        reranked_indices = [result.index for result in results]
        reranked_similarity_scores = [result.relevance_score for result in results]  # in order of reranked_indices

        # convert back to order of original documents and calculate the chunk values
        similarity_scores = [0.0] * len(chunks)
        chunk_values = [0.0] * len(chunks)
        for i, index in enumerate(reranked_indices):
            absolute_relevance_value = transform(reranked_similarity_scores[i])
            similarity_scores[index] = absolute_relevance_value
            # decay the relevance value based on the rank
            chunk_values[index] = np.exp(-i / decay_rate) * absolute_relevance_value

        return similarity_scores, chunk_values

    def plot_relevance_scores(chunk_values: List[float], start_index: Optional[int] = None, end_index: Optional[int] = None) -> None:
        """
        Visualize the relevance scores of each chunk in the document to the search query

        Args:
            chunk_values (list): List of relevance values for each chunk
            start_index (int): Start index of the chunks to be plotted
            end_index (int): End index of the chunks to be plotted

        Returns:
            None

        Plots:
            Scatter plot of the relevance scores of each chunk in the document to the search query
        """
        plt.figure(figsize=(12, 5))
        plt.title("Similarity of each chunk in the document to the search query")
        plt.ylim(0, 1)
        plt.xlabel("Chunk index")
        plt.ylabel("Query-chunk similarity")
        if start_index is None:
            start_index = 0
        if end_index is None:
            end_index = len(chunk_values)
        plt.scatter(range(start_index, end_index), chunk_values[start_index:end_index])

Next, we load the input document and split it into chunks.

    # File path for the input document
    FILE_PATH = "../data/nike_2023_annual_report.txt"

    with open(FILE_PATH, "r", encoding="utf-8") as file:
        text = file.read()

    chunks = split_into_chunks(text, chunk_size=800)

    print(f"Split the document into {len(chunks)} chunks")

Output:

    Split the document into 500 chunks

Visualize chunk relevance distribution across single document

    # Example query that requires a longer result than a single chunk
    query = "Nike consolidated financial statements"

    similarity_scores, chunk_values = rerank_chunks(query, chunks)

    plot_relevance_scores(chunk_values)

How to interpret the chunk relevance plot above

In the plot above, the x-axis represents the chunk index. The first chunk in the document has index 0, the next chunk has index 1, etc. The y-axis represents the relevance of each chunk to the query. Viewing it this way lets us see how relevant chunks tend to be clustered in one or more sections of a document.
Note: the relevance values in this plot are actually a combination of the raw relevance value and the relevance ranks. An exponential decay function is applied to the ranks, and that is then multiplied by the raw relevance value. Using this combination provides a more robust measure of relevance than using just one or the other.

Zooming in

Now let's zoom in on that cluster of relevant chunks for a closer look.

    plot_relevance_scores(chunk_values, 320, 340)

What's interesting to note here is that only 7 of these 20 chunks have been marked as relevant by our reranker. And many of the non-relevant chunks are sandwiched between relevant chunks. Looking at the span of 323-336, exactly half of those chunks are marked as relevant and the other half are marked as not relevant.

Let's see what this part of the document contains:

    def print_document_segment(chunks: List[str], start_index: int, end_index: int):
        """
        Print the text content of a segment of the document

        Args:
            chunks (list): List of text chunks
            start_index (int): Start index of the segment
            end_index (int): End index of the segment (not inclusive)

        Returns:
            None

        Prints:
            The text content of the specified segment of the document
        """
        for i in range(start_index, end_index):
            print(f"\nChunk {i}")
            print(chunks[i])

    print_document_segment(chunks, 320, 340)

We can see that the Consolidated Statement of Income starts in chunk 323, and everything up to chunk 333 contains consolidated financial statements, which is what we're looking for. So every chunk in that range is indeed relevant and necessary for our query, yet only about half of those chunks were marked as relevant by the reranker. So in addition to providing more complete context to the LLM, by combining these clusters of relevant chunks we actually find important chunks that otherwise would have been ignored.

What can we do with these clusters of relevant chunks?

The core idea is that clusters of relevant chunks, in their original contiguous form, provide much better context to the LLM than individual chunks can. Now for the hard part: how do we actually identify these clusters? If we can calculate chunk values in such a way that the value of a segment is just the sum of the values of its constituent chunks, then finding the optimal segment is a version of the maximum subarray problem, for which a…
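To make the maximum-subarray framing concrete, here is a minimal sketch of a brute-force search for the single best segment. It assumes the threshold has already been subtracted from each chunk value. The function name `best_segment` and the `max_length` parameter are hypothetical, and this plain scan is only an illustration of the idea, not necessarily the heuristics used in the article's implementation.

```python
from typing import List, Optional, Tuple


def best_segment(chunk_values: List[float], max_length: int) -> Optional[Tuple[int, int, float]]:
    """Return (start, end, value) of the highest-value contiguous segment.

    `end` is exclusive. Segments are limited to `max_length` chunks, which is
    the constraint that separates this from the plain maximum subarray problem.
    Returns None when no chunk has a positive value.
    """
    best: Optional[Tuple[int, int, float]] = None
    n = len(chunk_values)
    for start in range(n):
        # A best segment never needs to start on a non-positive chunk
        if chunk_values[start] <= 0:
            continue
        running = 0.0
        for end in range(start + 1, min(start + max_length, n) + 1):
            running += chunk_values[end - 1]
            # Likewise, it never needs to end on a non-positive chunk
            if chunk_values[end - 1] <= 0:
                continue
            if best is None or running > best[2]:
                best = (start, end, running)
    return best


# Chunk values from the earlier example (threshold already subtracted)
values = [-0.2, -0.2, 0.4, 0.8, -0.1]
start, end, value = best_segment(values, max_length=4)
print(f"chunks {start}-{end - 1}, value {value:.1f}")  # prints "chunks 2-3, value 1.2"
```

Skipping non-positive start and end points is safe because trimming such a chunk from either edge of a segment never lowers its total value, and it cuts the number of candidates considerably when most chunks are irrelevant.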

What if the answer is contained in a single chunk?

In the case where only a single chunk, or a few isolated chunks, are relevant to the query, we don't want to create large segments out of them. We just want to return those specific chunks. RSE can handle that scenario well too. Since there are no clusters of relevant chunks, it basically reduces to standard top-k retrieval in that case. We'll leave it as an exercise to the reader to see what happens to the chunk relevance plot and…
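The sketch below illustrates this behavior on made-up chunk values with two isolated relevant chunks. It assumes segments are picked greedily, best first, without overlap. The function `best_segments` and its parameters `max_length`, `overall_max_length`, and `minimum_value` are hypothetical names chosen for this example, and the values are not taken from the document used in this article.

```python
from typing import List, Tuple


def best_segments(
    chunk_values: List[float],
    max_length: int,
    overall_max_length: int,
    minimum_value: float,
) -> List[Tuple[int, int]]:
    """Greedily pick non-overlapping segments as (start, end) pairs, end exclusive."""
    chosen: List[Tuple[int, int]] = []
    total_length = 0
    n = len(chunk_values)
    while total_length < overall_max_length:
        best, best_value = None, minimum_value
        for start in range(n):
            for end in range(start + 1, min(start + max_length, n) + 1):
                # Skip candidates that overlap an already chosen segment
                if any(start < e and end > s for s, e in chosen):
                    continue
                # Skip candidates that would exceed the overall length budget
                if total_length + (end - start) > overall_max_length:
                    continue
                value = sum(chunk_values[start:end])
                if value > best_value:
                    best, best_value = (start, end), value
        # Stop when nothing left is worth more than minimum_value
        if best is None:
            break
        chosen.append(best)
        total_length += best[1] - best[0]
    return sorted(chosen)


# Two isolated relevant chunks (indices 2 and 7), nothing relevant around them
isolated = [-0.2, -0.2, 0.6, -0.2, -0.2, -0.2, -0.2, 0.5, -0.2, -0.2]
print(best_segments(isolated, max_length=5, overall_max_length=8, minimum_value=0.1))
# prints [(2, 3), (7, 8)]
```

Because every neighbor of a relevant chunk has a negative value, extending a segment only lowers its total, so each relevant chunk comes back as its own one-chunk segment, which matches ordinary top-k retrieval.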