
Sample images from the IM2LaTeX-100k dataset showing various mathematical expressions.
A deep learning-based system for converting images of mathematical expressions into LaTeX code, addressing challenges in digital document processing and enhancing accessibility of mathematical content.
The Image to LaTeX (img2latex) project implements a deep learning-based system for converting images of mathematical expressions into LaTeX code. This technology addresses a significant challenge in digital document processing: transforming visual representations of mathematical formulas into their corresponding markup representation, which is essential for editing, searching, and accessibility.
Mathematical expressions are ubiquitous in scientific, engineering, and academic literature, but transferring them between different formats can be cumbersome. Traditional Optical Character Recognition (OCR) systems often struggle with the complex two-dimensional structure of mathematical formulas. The img2latex project provides an end-to-end solution to automatically recognize and transcribe mathematical expressions from images, significantly reducing the manual effort required for digitizing printed mathematical content.
The system employs a sequence-to-sequence architecture, combining convolutional neural networks (CNNs) or residual networks (ResNets) for image encoding with Long Short-Term Memory (LSTM) networks for LaTeX sequence decoding. This approach leverages recent advances in computer vision and natural language processing to achieve state-of-the-art performance in formula recognition.
The img2latex system uses the IM2LaTeX-100k dataset, which contains over 100,000 images of mathematical expressions paired with their corresponding LaTeX code. Our analysis of the dataset revealed:
Based on this analysis, our preprocessing pipeline includes:
The LaTeX formulas undergo tokenization using a custom LaTeXTokenizer
class,
which handles special LaTeX tokens and limits sequence length to a maximum of 141 tokens
(the 95th percentile of the dataset formula lengths).
The img2latex system offers two model variants, each with specific strengths and applications:
The CNN-LSTM model consists of:
The final output is flattened and passed through a dense layer to create the embedding
The ResNet-LSTM model replaces the CNN encoder with a pre-trained ResNet:
The training process implements several key strategies:
During inference, the model offers three decoding strategies:
Beam search (with beam size 3) improved BLEU scores by an average of 7.2% compared to greedy search, making it the preferred decoding strategy for most applications.
Our experiments evaluated the performance of both CNN-LSTM and ResNet-LSTM architectures with various hyperparameter settings. Key configurations included:
We evaluated the model performance using four key metrics:
Our training process spanned 25 epochs, with the following progression in validation metrics for our best-performing model (img2latex_v2):
Epoch | Loss | Accuracy | BLEU | Levenshtein |
---|---|---|---|---|
1 | 2.2778 | 0.4986 | 0.0827 | 0.2311 |
5 | 1.8408 | 0.5760 | 0.1241 | 0.2609 |
10 | 1.6909 | 0.6022 | 0.1377 | 0.2716 |
15 | 1.6338 | 0.6116 | 0.1464 | 0.2781 |
20 | 1.6030 | 0.6180 | 0.1502 | 0.2799 |
25 | 1.5663 | 0.6256 | 0.1539 | 0.2829 |
The comparison between our CNN-LSTM and ResNet-LSTM models showed:
Common error patterns included:
Performance strongly correlated with formula complexity:
Analysis of our experiments revealed several important findings that inform future development directions.
The img2latex project successfully demonstrates the viability of deep learning approaches for converting images of mathematical expressions to LaTeX code. Our implementation of CNN-LSTM and ResNet-LSTM architectures shows promising results, achieving reasonable accuracy on the challenging task of mathematical formula recognition.
Key achievements of the project include:
Despite these successes, several challenges remain. The model still struggles with very complex formulas, particularly those with nested structures or uncommon mathematical symbols. Additionally, the current approach requires significant computational resources for training and could benefit from further optimization.
Future work could focus on:
The img2latex system provides a strong foundation for further research and development in mathematical formula recognition, with potential applications in digital document processing, accessibility tools, and educational technology.
Our Image to LaTeX conversion system demonstrates that deep learning approaches can effectively address the challenging task of mathematical formula recognition. While the current system achieves promising results with token-level accuracy of 62.56% and BLEU score of 0.1539, there remain significant opportunities for improvement in handling complex formulas and computational efficiency. The project provides a solid foundation for future work and real-world applications in document digitization and accessibility.
Sample images from the IM2LaTeX-100k dataset showing various mathematical expressions.
Visualization of multiple performance metrics across training epochs, showing consistent improvement.