# DOCX to Markdown Converter

This Python script converts DOCX files to Markdown format, preserving formatting such as headings, bold, italic, underline, strikethrough, and highlight. It also extracts images from the DOCX file and saves them in an `images` directory.

## Features

- Converts DOCX to Markdown format
- Preserves text formatting (headings, bold, italic, underline, strikethrough, highlight)
- Extracts images and saves them in an `images` directory
- Processes tables and converts them to Markdown format
- Command-line interface for specifying the input file and output directory

## Requirements

- Python 3.x
- python-docx library

Install the required dependency with:

```bash
pip install python-docx
```

## Usage

```bash
python docx_to_md.py <input_file> [output_directory]
```

The input DOCX file is required. The output directory is optional and defaults to the current directory.

### Examples

```bash
# Convert a DOCX file to Markdown (output to the current directory)
python docx_to_md.py document.docx

# Convert a DOCX file to Markdown with a specific output directory
python docx_to_md.py document.docx /path/to/output/directory
```

The output Markdown file will have the same name as the input DOCX file, but with a `.md` extension.

## How It Works

1. The script reads the DOCX file using the `python-docx` library
2. It extracts all images from the document and saves them in an `images` subdirectory
3. It processes paragraphs, preserving formatting:
   - Headings are converted to Markdown headings (#, ##, ###, etc.)
   - Bold text is wrapped in `**`
   - Italic text is wrapped in `*`
   - Underlined text is wrapped in `*`
   - Strikethrough text is wrapped in `~~`
   - Highlighted text is wrapped in `**`
4. Tables are converted to Markdown table format
5. The output is written to a Markdown file in the output directory

Because Markdown has no native underline or highlight syntax, underlined text renders the same as italic and highlighted text renders the same as bold.

## Output Structure

The script creates the following structure:

```
output.md          # The main Markdown file
images/            # Directory containing extracted images
    image_1.png
    image_2.png
    ...
```

## License

This project is licensed under the MIT License.
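
## Example: Formatting Sketch

The formatting rules listed under How It Works can be illustrated with a minimal sketch. This is not the code in `docx_to_md.py`; the function names (`heading_prefix`, `run_to_markdown`, `paragraph_to_markdown`, `table_to_markdown`) and the helper `fake_run` are hypothetical, and the order in which markers are nested is an assumption. The functions read only the attributes that `python-docx` exposes on runs and paragraphs (`run.text`, `run.bold`, `run.italic`, `run.underline`, `run.font.strike`, `run.font.highlight_color`, `paragraph.style.name`), so the demo uses simple stand-in objects and runs without a DOCX file.

```python
import re
from types import SimpleNamespace


def heading_prefix(style_name):
    """Return '# ', '## ', ... for style names such as 'Heading 2', else ''."""
    match = re.fullmatch(r"Heading (\d+)", style_name or "")
    return "#" * int(match.group(1)) + " " if match else ""


def run_to_markdown(run):
    """Wrap a run's text in the markers this README lists (nesting order assumed)."""
    text = run.text
    if not text.strip():
        return text  # leave whitespace-only runs unwrapped
    if run.font.strike:
        text = f"~~{text}~~"
    if run.italic or run.underline:
        text = f"*{text}*"  # italic and underline both map to *
    if run.bold or run.font.highlight_color is not None:
        text = f"**{text}**"  # bold and highlight both map to **
    return text


def paragraph_to_markdown(paragraph):
    """Join the converted runs and prepend a heading prefix when the style asks for one."""
    body = "".join(run_to_markdown(run) for run in paragraph.runs)
    return heading_prefix(paragraph.style.name) + body


def table_to_markdown(rows):
    """Convert a list of rows (lists of cell strings) to a Markdown table.

    The first row is treated as the header, which is an assumption.
    """
    if not rows:
        return ""

    def fmt(cells):
        # Escape pipes so cell text cannot break the table layout.
        return "| " + " | ".join(c.replace("|", "\\|").strip() for c in cells) + " |"

    header, *body = rows
    lines = [fmt(header), "| " + " | ".join("---" for _ in header) + " |"]
    lines += [fmt(row) for row in body]
    return "\n".join(lines)


def fake_run(text, bold=False, italic=False, underline=False, strike=False, highlight=None):
    """Hypothetical stand-in that mimics the attributes of a python-docx run."""
    font = SimpleNamespace(strike=strike, highlight_color=highlight)
    return SimpleNamespace(text=text, bold=bold, italic=italic, underline=underline, font=font)


paragraph = SimpleNamespace(
    style=SimpleNamespace(name="Heading 2"),
    runs=[fake_run("Release "), fake_run("notes", bold=True)],
)
print(paragraph_to_markdown(paragraph))
print(table_to_markdown([["Name", "Size"], ["a.png", "12 KB"]]))
```

With a real document, the same functions could be fed from `docx.Document(path).paragraphs`, and table rows could be collected as `[[cell.text for cell in row.cells] for row in table.rows]`. The sketch does not handle runs whose text has leading or trailing spaces inside the markers, which some Markdown renderers will not treat as emphasis.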