1. Introduction & Task Summary
This report describes my evaluation of DeepSeek-OCR, a new Artificial Intelligence (AI) tool. I tested it to find out what it does, how it works, and how well it performs on a real test document. The verdict: DeepSeek-OCR does a solid job of extracting text and making sense of complicated documents, with a few small hiccups but nothing major.
2. What is DeepSeek-OCR?
Think of DeepSeek-OCR as a very smart document scanner that's powered by AI. Regular OCR tools only try to pick out characters from images, while DeepSeek-OCR also recognizes how the page is laid out.
- It reads the words - Pulls text from document images
- It gets the layout - Recognizes headings, paragraphs, lists, and tables
- It keeps things organized - Spits everything out in Markdown so the formatting stays intact
- It's built for speed - Made to run fast on good hardware
3. How I Tested It
To evaluate DeepSeek-OCR, I took the following steps:
- Accessed the Tool: Obtained code and instructions from the official GitHub project page.
- Set Up Environment: Used a cloud-based GPU service (Runpod) with an NVIDIA GPU.
- Installed Software: Installed DeepSeek-OCR and its dependencies (PyTorch, transformers, and CUDA 11.8).
- Prepared Test Document: Chose a complex 19-page PDF report.
- Ran OCR Process: Executed a script to convert the PDF content to Markdown.
- Analyzed Results: Downloaded and reviewed the generated Markdown and extracted images.
4. The Test Document
The tool was tested on a complex 19-page PDF report containing text, headings, lists, multiple tables, charts, and graphics, which makes it a rigorous test case.
5. Results of the OCR Test
DeepSeek-OCR successfully processed all 19 pages of the PDF.
- Output Format: Generated 19 separate output folders (page_1 to page_19), each containing a Markdown (.mmd) file and extracted graphics as image files (.jpg).
- Overall Quality: The quality of the extraction was very high. Structural elements were recognized accurately.
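Because the output is split across one folder per page, reviewing it is easier once the pages are stitched back together. The following is a minimal sketch of one way to do that, not part of DeepSeek-OCR itself. It assumes the folder layout described above (page_1 to page_19, each holding at least one .mmd file); the function name `merge_pages` and the page marker comments are hypothetical choices made for illustration.

```python
import re
from pathlib import Path


def merge_pages(output_root: Path) -> str:
    """Concatenate per-page .mmd files in numeric page order.

    Assumes folders named page_<number> directly under output_root.
    """
    page_dirs = []
    for child in output_root.iterdir():
        match = re.fullmatch(r"page_(\d+)", child.name)
        if child.is_dir() and match:
            page_dirs.append((int(match.group(1)), child))

    parts = []
    # Sort numerically so page_10 follows page_9 rather than page_1.
    for number, folder in sorted(page_dirs, key=lambda item: item[0]):
        for mmd_file in sorted(folder.glob("*.mmd")):
            text = mmd_file.read_text(encoding="utf-8").strip()
            # The HTML comment marks where each page starts in the merged text.
            parts.append(f"<!-- page {number} -->\n{text}")
    return "\n\n".join(parts)
```

For example, `merge_pages(Path("Output_results"))` would return the text of all pages as a single string, which can then be written to one file for review. Sorting on the parsed page number matters because a plain alphabetical sort would place page_10 before page_2.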
Specific Observations:
- Text Extraction: Generally excellent; paragraphs and lists were captured accurately.
- Heading Recognition: Headings and subheadings were correctly identified and formatted in Markdown.
- Table Parsing: A major strength; complex tables were converted into well-structured Markdown, aside from the duplicated data noted under Minor Errors.
- Image/Chart Handling: Correctly identified charts and visual elements as separate images rather than attempting to read internal text.
Minor Errors Noted:
- Typos: Occasional minor typos (e.g., "Fridav" for "Friday", "Supeior" for "Superior").
- Formatting: Minor inconsistencies with bullet points on some pages.
- Data Duplication: Some table cells that should have been empty contained duplicated data.
- Missing Elements: Footer text was missed on the final page.
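Errors like the ones listed above were found by manual review. If a trusted reference text is available for a page (an assumption; for instance, text copied from the original PDF), a word-level comparison can surface them automatically. The sketch below uses Python's standard `difflib` module; the function name `word_mismatches` is hypothetical and this is only one possible way to run such a check.

```python
from difflib import SequenceMatcher


def word_mismatches(reference: str, ocr_output: str) -> list[tuple[str, str]]:
    """Return (reference_words, ocr_words) pairs wherever the two texts differ.

    An empty string on the OCR side means the words were dropped;
    an empty string on the reference side means extra words were inserted.
    """
    ref_words = reference.split()
    ocr_words = ocr_output.split()
    # autojunk=False keeps frequent words from being ignored on long pages.
    matcher = SequenceMatcher(a=ref_words, b=ocr_words, autojunk=False)

    mismatches = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            mismatches.append(
                (" ".join(ref_words[i1:i2]), " ".join(ocr_words[j1:j2]))
            )
    return mismatches
```

Called as `word_mismatches("Due Friday at the Superior office", "Due Fridav at the Supeior office")`, this returns `[("Friday", "Fridav"), ("Superior", "Supeior")]`. Text that is missing from the OCR output, such as a dropped footer, shows up as a pair whose second element is an empty string.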
6. Conclusion & Summary of Findings
DeepSeek-OCR proved to be a powerful and highly effective tool for converting complex PDFs into structured digital formats. Its key strengths lie in its accurate text extraction and its excellent ability to parse document layouts, especially tables. While not entirely flawless, its performance on this challenging document was impressive.
7. Supplementary Files
- DeepSeek-OCR folder: Source code and necessary files.
- Output_results folder: Raw output generated by DeepSeek-OCR, including Markdown files and extracted images for all 19 pages.