Command line tool to extract review changes from a docx file as plain text with HTML tags <ins> and <del>.

alanlivio/docxreviews2txt

docxreviews2txt

Command line tool to extract review changes (tracked insertions and deletions) from a docx file as plain text. It is useful when you review a PDF by converting it to docx and need to share your changes as plain text.
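A docx file is a zip archive, and Word stores tracked changes in its `word/document.xml` part: insertions are `<w:ins>` elements wrapping runs with `<w:t>` text, and deletions are `<w:del>` elements wrapping runs with `<w:delText>` text. The following is a minimal sketch, using only the Python standard library, of how such changes can be rendered with `<ins>` and `<del>` tags. It is not docxreviews2txt's actual implementation; the names `render_paragraph` and `extract_reviews` are hypothetical. It assumes revisions are direct children of a paragraph and ignores moves, formatting changes, tabs/breaks, and changes nested inside hyperlinks or fields.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, in ElementTree's {uri}tag notation
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def render_paragraph(p):
    """Render one <w:p> as text, wrapping tracked insertions/deletions in HTML tags."""
    parts = []
    for child in p:
        if child.tag == W + "r":
            # Plain (unchanged) run: concatenate its <w:t> text nodes
            parts.append("".join(t.text or "" for t in child.iter(W + "t")))
        elif child.tag == W + "ins":
            # Inserted runs keep their text in ordinary <w:t> elements
            text = "".join(t.text or "" for t in child.iter(W + "t"))
            parts.append(f"<ins>{text}</ins>")
        elif child.tag == W + "del":
            # Deleted runs store their text in <w:delText> instead of <w:t>
            text = "".join(t.text or "" for t in child.iter(W + "delText"))
            parts.append(f"<del>{text}</del>")
    return "".join(parts)


def extract_reviews(docx_file):
    """Return the rendered text of every paragraph that contains a tracked change."""
    with zipfile.ZipFile(docx_file) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    results = []
    for p in root.iter(W + "p"):
        # find() with a plain tag name only looks at direct children of the paragraph
        if p.find(W + "ins") is not None or p.find(W + "del") is not None:
            results.append(render_paragraph(p))
    return results
```

For example, `for line in extract_reviews("tests/lorem_ipsum.docx"): print("-", line)` prints each changed paragraph in full, whereas the example output below shows shorter context fragments.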

How to install?

pip install docxreviews2txt

How to use it?

usage: docxreviews2txt [-h] [--version] docx

Command line tool to extract review changes from a docx file as plain text using HTML tags <ins> and <del>.

positional arguments:
  docx        input docx

options:
  -h, --help  show this help message and exit
  --version   show version

Example:

$ docxreviews2txt tests/lorem_ipsum.docx
txt reviews at file:///home/alan/src/docxreviews2txt/tests/lorem_ipsum_review.txt
$ cat /home/alan/src/docxreviews2txt/tests/lorem_ipsum_review.txt
Typos suggestions using HTML tags <ins> and <del>:
- dolor sit amet, consectetur <ins>Lorem ipsum</ins><del>adipiscing</del>
- sit amet, consectetur adipiscing<ins>s</ins> elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim <ins>do</ins>
- Ut enim ad minim <ins>Lorem</ins>veniam<ins>ipsum</ins>
- dolor sit amet, consectetur <del>adipiscing</del>
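If you want to post-process the generated review file, the `<ins>`/`<del>` tags are simple enough to parse with a regular expression. The sketch below assumes the format shown in the example above (one `- ` bullet per change, tags neither nested nor spanning lines); the helper names `parse_review_lines` and `resolve` are hypothetical and not part of docxreviews2txt. `resolve` shows what a line reads like with all changes accepted or all rejected.

```python
import re

# Matches <ins>...</ins> or <del>...</del>; group 1 is the tag, group 2 the fragment
TAG_RE = re.compile(r"<(ins|del)>(.*?)</\1>")


def parse_review_lines(text):
    """Return one list of (kind, fragment) tuples per '- ' bullet line."""
    return [TAG_RE.findall(line) for line in text.splitlines() if line.startswith("- ")]


def resolve(line, accept=True):
    """Apply all changes in a line (accept=True) or discard them (accept=False)."""
    keep = "ins" if accept else "del"
    # Keep the text of the chosen tag, drop the other tag's text entirely
    return TAG_RE.sub(lambda m: m.group(2) if m.group(1) == keep else "", line)
```

For instance, accepting the first suggestion of the example turns `consectetur <ins>Lorem ipsum</ins><del>adipiscing</del>` into `consectetur Lorem ipsum`, while rejecting it gives back `consectetur adipiscing`.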

Known issues

The tool fails to capture changes in docx files whose text is organized in tables (e.g., pdf2docx converts columns into tables).
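Before sharing a review, it can help to check whether a docx is affected by this issue. The following standard-library sketch counts tables (`<w:tbl>`) in `word/document.xml` and the tracked-change elements (`<w:ins>`/`<w:del>`) inside them. The function name `tracked_changes_in_tables` is hypothetical; the check only flags the risk and does not inspect docxreviews2txt itself. Note that it also counts revision markers on paragraph marks, which Word stores as `<w:ins>`/`<w:del>` inside run properties.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML main namespace, in ElementTree's {uri}tag notation
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"
CHANGE_TAGS = {W + "ins", W + "del"}


def tracked_changes_in_tables(docx_file):
    """Return (number of tables, number of w:ins/w:del elements inside tables)."""
    with zipfile.ZipFile(docx_file) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    tables = list(root.iter(W + "tbl"))
    # A set de-duplicates elements reached through both an outer and a nested table
    changes = {el for tbl in tables for el in tbl.iter() if el.tag in CHANGE_TAGS}
    return len(tables), len(changes)
```

If the second number is non-zero, expect some of those changes to be missing from the generated review file.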

References

This project takes inspiration from:
