Edit Banana: The Complete Guide to Converting Static Charts into Editable Diagrams

Every researcher, analyst, and content creator has faced this frustrating scenario: you find the perfect chart in a PDF, a research paper, or a screenshot — but it's a static image. You can't change the colors, update the data, or adjust the layout. It's a "pixel dead image," locked in format forever. Edit Banana changes this entirely.

Built by BIT-DataLab (Beijing Institute of Technology), Edit Banana is an open-source framework that uses SAM 3 (Segment Anything Model 3) and multimodal large language models to convert static charts, diagrams, and figures into fully editable DrawIO (XML) and PowerPoint (PPTX) files — preserving layout logic, color matching, element hierarchy, and text accuracy.

With 3,800+ GitHub stars, 233 forks, and an active online demo, Edit Banana is at the forefront of the emerging field of chart derendering — the reverse engineering of visual data representations back into their editable components.

What Is Edit Banana?

Edit Banana is a Universal Content Re-Editor — its tagline says it all: "Make the Uneditable, Editable." The tool takes a static image (PNG/JPG) or PDF containing charts, flowcharts, diagrams, or statistical figures, and outputs an editable file where every element can be individually dragged, styled, and modified.

Key Stats

Metric	Value
GitHub Stars	3,800+
Forks	233
Language	Python
License	AGPL-3.0
Created	January 2026
Last Updated	March 2026
Contributors	6
Topics	ai, data, figure, llm, python, nanobanana
Organization	BIT-DataLab (Beijing Institute of Technology)
Online Demo	editbanana.anxin6.cn

What Problem Does It Solve?

In academic research, business reporting, and content creation, charts and diagrams are routinely shared as images embedded in PDFs, slides, or papers. These static formats make it impossible to:

Update data without recreating the entire chart
Change styling (colors, fonts, line thickness)
Extract underlying numerical data
Adapt charts for different presentations or publications
Modify individual elements like arrows, labels, or shapes

Edit Banana solves all of these by reconstructing the diagram from its pixel representation into a structured, editable format.

How It Works: The Architecture Pipeline

Edit Banana processes each input through a five-stage pipeline:

Stage 1: Input Processing

The system accepts PNG/JPG images or PDF files containing charts, flowcharts, statistical figures, or diagrams.

Stage 2: Segmentation with SAM 3

Using a fine-tuned SAM 3 (Segment Anything Model 3) mask decoder, Edit Banana identifies and segments every visual element in the diagram:

Shapes (rectangles, circles, diamonds)
Lines and arrows
Text regions
Background areas
Color fills and strokes

The segmentation preserves spatial relationships — where elements are positioned relative to each other, how they connect, and their hierarchical structure.

Stage 3: Text Extraction (Parallel Processing)

A dual-engine text extraction pipeline runs in parallel:

Local OCR (Tesseract): Detects text bounding boxes and extracts standard text content. Runs entirely offline with no API dependencies.
Pix2Text: Specialized in mathematical formula recognition and LaTeX conversion — turning rendered formulas like ∫f(x)dx back into their LaTeX source code.
Crop-Guided Strategy: Rather than passing the entire image to OCR, the system extracts individual text/formula regions as high-resolution crops, then sends these focused regions to the text engine for maximum accuracy.
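
The crop-guided, parallel strategy described above can be sketched in a few lines of Python. This is an illustrative sketch, not Edit Banana's actual code: `Region`, `padded_crop_box`, and `extract_all` are hypothetical names, and the two engines are stand-in callables that receive a crop box. In a real pipeline they would receive the cropped pixels and call Tesseract or Pix2Text.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (left, top, right, bottom) in pixels


@dataclass
class Region:
    box: Box
    kind: str  # "text" or "formula"


def padded_crop_box(box: Box, image_size: Tuple[int, int], pad: int = 4) -> Box:
    """Grow a box by `pad` pixels on every side, clamped to the image bounds."""
    left, top, right, bottom = box
    width, height = image_size
    return (max(0, left - pad), max(0, top - pad),
            min(width, right + pad), min(height, bottom + pad))


def extract_all(regions: List[Region], image_size: Tuple[int, int],
                text_engine: Callable[[Box], str],
                formula_engine: Callable[[Box], str],
                max_workers: int = 4) -> List[str]:
    """Send each region's crop to the matching engine, running regions concurrently."""
    def run(region: Region) -> str:
        crop = padded_crop_box(region.box, image_size)
        engine = formula_engine if region.kind == "formula" else text_engine
        return engine(crop)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map keeps results in the same order as the input regions
        return list(pool.map(run, regions))


# Stub engines (assumptions) so the sketch runs without any OCR installed
regions = [Region((10, 10, 60, 30), "text"), Region((0, 40, 50, 70), "formula")]
results = extract_all(regions, (100, 80),
                      text_engine=lambda crop: f"text@{crop}",
                      formula_engine=lambda crop: f"latex@{crop}")
```

Passing the engines in as callables keeps the routing logic testable on its own and makes it easy to swap one OCR backend for another.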

Stage 4: Fixed Multi-Round VLM Scanning

A Multimodal LLM (Qwen-VL or GPT-4V) scans the diagram in multiple rounds to understand:

Logical relationships between elements
Flow direction (for flowcharts)
Data relationships (for bar/line/pie charts)
Element hierarchy and grouping
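
One plausible reading of "fixed multi-round" is a loop with a set number of passes, each asking the model a different question about the same image. The sketch below assumes exactly that. The prompts, the `scan_in_rounds` function, and the `ask_vlm` callable are all hypothetical; the article does not describe the real prompts or the client API.

```python
from typing import Callable, Dict, Sequence

# Hypothetical prompts, one per round, mirroring the four goals listed above
ROUND_PROMPTS: Sequence[str] = (
    "List the logical relationships between the elements.",
    "Describe the flow direction of any arrows.",
    "Describe the data relationships in any chart.",
    "Group the elements into a hierarchy.",
)


def scan_in_rounds(image_path: str,
                   ask_vlm: Callable[[str, str], str],
                   prompts: Sequence[str] = ROUND_PROMPTS) -> Dict[str, str]:
    """Make one model call per prompt and collect the answers keyed by prompt."""
    answers: Dict[str, str] = {}
    for prompt in prompts:
        answers[prompt] = ask_vlm(image_path, prompt)
    return answers


# Stub model (assumption) that records which prompts it was asked
asked = []


def fake_vlm(image_path: str, prompt: str) -> str:
    asked.append(prompt)
    return "ok"


answers = scan_in_rounds("input/diagram.png", fake_vlm)
```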

Stage 5: XML/PPTX Generation

The final stage merges spatial data from SAM 3 segmentation with text OCR results to generate:

DrawIO XML: Directly importable into draw.io, a popular free diagramming tool
PPTX: PowerPoint files where each element is a separate, editable shape
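
To make the DrawIO side concrete, here is a minimal sketch of how positioned, labeled shapes could be written out as draw.io XML using only the Python standard library. The `build_drawio` function and the element dictionaries are illustrative assumptions rather than Edit Banana's internal format. The `mxfile` / `mxGraphModel` / `mxCell` / `mxGeometry` nesting is the structure draw.io uses for its files.

```python
import xml.etree.ElementTree as ET


def build_drawio(elements):
    """Build a draw.io document containing one rectangle per element dict."""
    mxfile = ET.Element("mxfile")
    diagram = ET.SubElement(mxfile, "diagram", name="Page-1")
    model = ET.SubElement(diagram, "mxGraphModel")
    root = ET.SubElement(model, "root")
    # draw.io expects these two structural cells before any shapes
    ET.SubElement(root, "mxCell", id="0")
    ET.SubElement(root, "mxCell", id="1", parent="0")
    for index, el in enumerate(elements, start=2):
        style = f"rounded=0;whiteSpace=wrap;html=1;fillColor={el['fill']};"
        cell = ET.SubElement(root, "mxCell", id=str(index), value=el["text"],
                             style=style, vertex="1", parent="1")
        # "as" is a Python keyword, so the attributes are passed as a dict
        ET.SubElement(cell, "mxGeometry", {
            "x": str(el["x"]), "y": str(el["y"]),
            "width": str(el["w"]), "height": str(el["h"]),
            "as": "geometry",
        })
    return ET.tostring(mxfile, encoding="unicode")


# Hypothetical merged output of segmentation (geometry, fill) and OCR (text)
elements = [
    {"text": "Start", "x": 40, "y": 40, "w": 120, "h": 60, "fill": "#dae8fc"},
    {"text": "Process", "x": 40, "y": 140, "w": 120, "h": 60, "fill": "#d5e8d4"},
]
xml_text = build_drawio(elements)
```

Saving `xml_text` to a `.drawio` file should give a diagram in which each rectangle is a separate, movable shape.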

Key Features

🎯 High-Fidelity Reconstruction

The reconstruction preserves:

Layout logic — exact spatial positioning of elements
Color matching — extracted dominant colors applied to fills/strokes
Element hierarchy — grouping, layering, and nesting
Text accuracy — including mathematical formulae via LaTeX
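
As an illustration of what "extracted dominant colors" can mean, the sketch below picks the most frequent color in a region after coarse quantization, so that near-identical shades are counted together. It is a simplified stand-in, not Edit Banana's algorithm; the `dominant_color` function and its `bucket` parameter are assumptions made for this example.

```python
from collections import Counter


def dominant_color(pixels, bucket=32):
    """Return the most common color as #rrggbb after quantizing each channel."""
    def quantize(channel):
        # Snap the channel to the middle of its bucket, capped at 255
        return min(255, (channel // bucket) * bucket + bucket // 2)

    counts = Counter(tuple(quantize(c) for c in pixel) for pixel in pixels)
    (red, green, blue), _ = counts.most_common(1)[0]
    return f"#{red:02x}{green:02x}{blue:02x}"


# Eight slightly different reds outvote two blues
sample = [(250, 10, 10)] * 5 + [(248, 12, 8)] * 3 + [(0, 0, 255)] * 2
fill = dominant_color(sample)  # "#f01010"
```

A larger `bucket` merges more shades into one color; a smaller one is more sensitive to anti-aliasing noise along edges.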

📐 Individual Element Manipulation

Once converted, every element in the output can be:

Dragged to new positions
Restyled (colors, borders, shadows)
Resized
Deleted or duplicated
Connected with new arrows/lines

🧮 Mathematical Formula Support

The Pix2Text integration means rendered equations are converted back to LaTeX source code — critical for academic papers where formula editing is common.

🖥️ Multiple Interfaces

CLI: python main.py -i input/diagram.png
FastAPI Server: server_pa.py for API-based access
Web Demo: editbanana.anxin6.cn

👥 Multi-User Concurrency

The server supports concurrent users with:

Global Lock mechanism for thread-safe GPU access
LRU Cache to persist image embeddings across requests
Credit system (10 free credits for new users) to prevent resource abuse
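
The lock-plus-cache combination can be sketched with the standard library alone. The class below is a hypothetical illustration, not the server's real code: a single lock serializes the expensive computation (standing in for GPU inference), and an `OrderedDict` provides least-recently-used eviction for cached embeddings.

```python
import threading
from collections import OrderedDict


class EmbeddingCache:
    """Thread-safe LRU cache; `compute` stands in for GPU embedding inference."""

    def __init__(self, capacity=8):
        self.capacity = capacity
        self._items = OrderedDict()
        self._lock = threading.Lock()

    def get_or_compute(self, key, compute):
        with self._lock:
            if key in self._items:
                self._items.move_to_end(key)  # mark as most recently used
                return self._items[key]
            value = compute(key)  # runs under the lock: one caller at a time
            self._items[key] = value
            if len(self._items) > self.capacity:
                self._items.popitem(last=False)  # evict least recently used
            return value


# Usage with a stub computation that records when it is actually invoked
computed = []


def fake_embed(key):
    computed.append(key)
    return key.upper()


cache = EmbeddingCache(capacity=2)
for name in ["a", "b", "a", "c", "b"]:
    cache.get_or_compute(name, fake_embed)
# "a" is served from cache once; "b" is evicted by "c" and recomputed
```

Holding the lock during computation is the simplest way to guarantee exclusive GPU access, at the cost of making cache hits wait behind an in-flight inference.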

Getting Started

Prerequisites

Python 3.10+
CUDA-capable GPU (highly recommended)
Tesseract OCR engine

Installation

# Clone
git clone https://github.com/BIT-DataLab/Edit-Banana.git
cd Edit-Banana

# Create directories
mkdir -p input output sam3_output

# Download SAM 3 model weights
# From: https://modelscope.cn/models/facebook/sam3
# Place in: models/sam3.pt

# Install dependencies
pip install -r requirements.txt

# Install Tesseract (Ubuntu)
sudo apt install tesseract-ocr tesseract-ocr-chi-sim

# Configure
cp config/config.yaml.example config/config.yaml

Usage

# Process a single image
python main.py -i input/test_diagram.png

# Output XML saved to output/ directory
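
To process a whole folder, the documented single-image command can be wrapped in a small script. This is a sketch under assumptions: it relies only on the `python main.py -i <path>` form shown above, and `build_commands` and `run_batch` are hypothetical helper names, not part of the project.

```python
import subprocess
import sys
from pathlib import Path

IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def build_commands(paths, script="main.py"):
    """Return one `python main.py -i <image>` command per image path, sorted."""
    images = sorted(p for p in map(Path, paths)
                    if p.suffix.lower() in IMAGE_SUFFIXES)
    return [[sys.executable, script, "-i", str(p)] for p in images]


def run_batch(input_dir="input"):
    """Run the CLI once per image in `input_dir`, stopping on the first failure."""
    for command in build_commands(Path(input_dir).iterdir()):
        subprocess.run(command, check=True)


# Dry run: non-image files are skipped and nothing is executed
commands = build_commands(["input/b.PNG", "input/notes.txt", "input/a.jpg"])
```

Calling `run_batch()` from the repository root would then convert every image in `input/` in turn.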

Configuration (`config.yaml`)

Customize the pipeline:

sam3: Score thresholds, NMS thresholds, max iteration loops
paths: Input/output directories
dominant_color: Color extraction sensitivity
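
For readers unfamiliar with score and NMS (non-maximum suppression) thresholds, the sketch below shows what these two knobs typically control. It is a generic textbook version that works on bounding boxes for simplicity, not Edit Banana's implementation, and the default values are placeholders rather than the project's defaults.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def filter_detections(detections, score_threshold=0.5, nms_threshold=0.5):
    """Drop low-score detections, then greedily suppress heavy overlaps."""
    confident = [d for d in detections if d[1] >= score_threshold]
    kept = []
    for box, score in sorted(confident, key=lambda d: d[1], reverse=True):
        # Keep a box only if it does not overlap a higher-scoring kept box
        if all(iou(box, other) <= nms_threshold for other, _ in kept):
            kept.append((box, score))
    return kept


detections = [
    ((0, 0, 10, 10), 0.9),
    ((1, 1, 11, 11), 0.8),    # overlaps the first box heavily, suppressed
    ((20, 20, 30, 30), 0.7),
    ((40, 40, 50, 50), 0.2),  # below the score threshold, dropped
]
kept = filter_detections(detections)
```

Raising the score threshold trades recall for precision; lowering the NMS threshold removes more near-duplicate detections of the same element.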

Use Cases

📚 Academic Research

Researchers frequently need to recreate or modify charts from published papers for their own publications. Edit Banana removes the need to redraw them from scratch: feed in the original figure and get an editable version.

📊 Business Reporting

Analysts working with quarterly reports, market research decks, or competitor analyses often encounter charts locked in PDFs. Edit Banana extracts them into editable PowerPoint slides.

🎨 Content Creation

Bloggers, educators, and presenters can transform any chart they find into customizable assets — changing colors to match their brand, updating labels, or restructuring layouts.

🔬 Data Recovery

When the original data behind a chart is lost (common in legacy reports), Edit Banana can reconstruct the visual elements, making it possible to approximate and extract the underlying data.
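
Approximating data from reconstructed geometry usually comes down to axis calibration: two known points on an axis define a linear map from pixel position to data value. The helper below is a hypothetical illustration of that arithmetic, not a function Edit Banana provides, and it assumes a linear (not logarithmic) axis.

```python
def pixel_to_value(pixel, pixel_a, value_a, pixel_b, value_b):
    """Map a pixel coordinate to a data value using two calibration points."""
    scale = (value_b - value_a) / (pixel_b - pixel_a)
    return value_a + (pixel - pixel_a) * scale


# Y axis: pixel row 400 is labeled 0 and pixel row 100 is labeled 60.
# A bar whose top edge sits at pixel row 250 is halfway between them.
bar_value = pixel_to_value(250, pixel_a=400, value_a=0, pixel_b=100, value_b=60)
# bar_value is 30.0
```

The result is only as precise as the reconstructed element positions and the tick labels read by OCR, so recovered values should be treated as estimates.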

Edit Banana vs Alternatives

Edit Banana is a chart and diagram derendering tool. Here is how it compares with related tools:

Feature	Edit Banana	Pic2Chart	ChartReader	WebPlotDigitizer	ChartDetective
Stars	3.8K	N/A (SaaS)	200+	N/A (Web)	100+
Input	Image + PDF	Image	Image	Image + PDF	Vector charts
Output	DrawIO + PPTX	PowerPoint SVG	Data tables	Data extraction	Data points
Segmentation	✅ SAM 3	❌	✅ Transformers	❌ Manual	❌ Interactive
LaTeX Support	✅ Pix2Text	❌	❌	❌	❌
VLM Integration	✅ Qwen-VL/GPT-4V	❌	✅	❌	❌
Self-hosted	✅	❌	✅	❌	✅
Editable Diagram	✅ Full elements	✅ Partial	❌ Data only	❌ Data only	❌ Data only
Online Demo	✅	✅	❌	✅	❌

When to Choose Each

Edit Banana: When you need the full diagram reconstructed as an editable file with every element manipulable. Best for flowcharts, architecture diagrams, and complex figures.
Pic2Chart: Quick online conversion of simple diagrams to PowerPoint SVG. No self-hosting option.
ChartReader: When you need to extract numerical data from bar/line charts for analysis. Outputs data tables, not editable diagrams.
WebPlotDigitizer: Industry standard for manual data extraction from XY plots. Best when precision matters and you're willing to do manual calibration.
ChartDetective: Interactive tool for vector chart data extraction. Requires manual interaction.

Edit Banana's Unique Advantage

The key differentiator is full diagram reconstruction into editable formats. Of the tools compared above, most focus on data extraction, that is, getting numbers out of charts, and Pic2Chart offers only partial element editing. Edit Banana goes further by reconstructing the visual representation itself as an editable diagram where every shape, line, text, and color can be individually modified.

Frequently Asked Questions

What types of diagrams does it handle?

Flowcharts, architecture diagrams, statistical charts (bar, line, pie), UML diagrams, mind maps, and any visual diagram with distinct elements.

Do I need a GPU?

A CUDA-capable GPU is highly recommended for SAM 3 inference. CPU-only mode is possible but significantly slower.

How accurate is the text extraction?

Standard text is extracted with Tesseract OCR, and mathematical formulae are converted to LaTeX with Pix2Text. The crop-guided strategy feeds each engine a high-resolution crop of an individual region rather than the whole image, which helps accuracy on small or dense text.

Can I use this commercially?

The code is licensed under AGPL-3.0. If you distribute a modified version, or let users interact with one over a network, you must make its source available under the same license. For commercial use without this requirement, contact the team.

Is there an API?

Yes. The server_pa.py FastAPI server provides HTTP endpoints for programmatic access. The online demo at editbanana.anxin6.cn also accepts uploads.

GitHub vs Web version — which is better?

The web platform at editbanana.anxin6.cn has the most up-to-date features. The GitHub repository trails behind but provides full self-hosting capability.

Conclusion

Edit Banana is a notable step forward in chart derendering. By combining SAM 3 segmentation, multimodal LLMs, and dual-engine OCR into a single pipeline, it converts static chart images into editable diagrams with preserved layout, colors, and text, including mathematical formulae.

With 3,800+ stars and backing from Beijing Institute of Technology's Data Lab, this project fills a gap that researchers, analysts, and content creators have struggled with for years. Instead of recreating charts from scratch, you upload an image and get back an editable DrawIO or PowerPoint file.
