Internship proposal - 2025
Advancing Handwritten Mathematical Expression Recognition (HMER) with Text Descriptions and Large Language Models (LLMs)
Level: Master 2
Period: spring/summer 2025
Context
Handwritten Mathematical Expression Recognition (HMER) focuses on identifying and interpreting handwritten mathematical content, a task that is crucial for digitizing and analyzing historical and modern mathematical documents.
Traditional HMER systems primarily rely on visual features extracted from handwritten strokes or symbols. However, they often struggle with ambiguous handwriting or incomplete contextual information.
Recent Large Language Models (LLMs) such as GPT and BERT can embed text into high-dimensional vectors that capture contextual meaning. These embeddings can serve as complementary inputs to HMER systems, enriching their understanding of mathematical symbols through natural language.
Aims / Work
The main goals of this internship are as follows:
- Review state-of-the-art techniques in HMER and LLMs.
- Identify and collect mathematical text descriptions from open-source repositories and datasets.
- Explore fine-tuning LLMs for mathematics-specific text understanding.
- Evaluate the selected approach(es) and propose improvements.
The results of this internship could lead to submissions to high-impact conferences (e.g., ICDAR).
Contacts: harold.mouchere@ls2n.fr, yejing.xie@ls2n.fr