ParanoiDF - A PDF Analysis Suite

ParanoiDF is a PDF analysis suite based on PeePDF. It is the Swiss Army knife of PDF analysis tools.

Requirements:

  • PdfCrack (To crack passwords)
  • Calibre's ebook-convert (To remove DRM)
  • QPDF (To decrypt PDFs)
  • NLTK (Natural Language Toolkit) and Java (To use the redact command)
  • lxml (To support XML output)

Usage: 

paranoiDF.py [options] InputFile
There are two important options when ParanoiDF is executed:
  • -f: Ignores parsing errors. Analysing malicious files often leads to parsing errors, so this option should usually be set.
  • -l: Sets loose mode, in which the parser does not search for the endobj tag, because that tag is not obligatory. Helpful with malformed files.
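
As a rough illustration, the two options can be combined when ParanoiDF is driven from another Python script. The helper name build_command and the file name sample.pdf are hypothetical; only the -f and -l flags and the "paranoiDF.py [options] InputFile" form come from the usage above.

```python
def build_command(pdf_path, force=True, loose=True):
    """Build an argument list of the form: python paranoiDF.py [options] InputFile."""
    cmd = ["python", "paranoiDF.py"]
    if force:
        cmd.append("-f")  # ignore parsing errors
    if loose:
        cmd.append("-l")  # loose mode: do not search for the endobj tag
    cmd.append(pdf_path)
    return cmd


if __name__ == "__main__":
    # Only prints the command, so no ParanoiDF installation is needed to try it.
    print(" ".join(build_command("sample.pdf")))
```

The resulting list can be handed to subprocess.call() from a directory that contains paranoiDF.py.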

Simple execution:

  • Shows statistics for the file after it has been decoded/decrypted and analysed:
python paranoiDF.py [options] pdf_file

Interactive console:

  • Executes the interactive console, giving a wide range of tools to play with.
python paranoiDF.py -i 

Batch execution:

  • A commands file can be used to specify the commands to be executed in batch mode. This type of execution is useful for automating the analysis of multiple files:
python paranoiDF.py [options] -s script_file
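
A minimal sketch of preparing a batch run from Python. It assumes the commands file holds one console command per line, which the text above does not confirm. The helper names and the file name commands.txt are hypothetical, and the two commands written are simply the "help" forms of the interactive console.

```python
import os
import tempfile


def write_commands_file(path, commands):
    """Write one console command per line (assumed file format)."""
    with open(path, "w") as handle:
        for command in commands:
            handle.write(command + "\n")


def batch_command(script_file, options=("-f", "-l")):
    """Build: python paranoiDF.py [options] -s script_file."""
    return ["python", "paranoiDF.py", *options, "-s", script_file]


if __name__ == "__main__":
    script_path = os.path.join(tempfile.mkdtemp(), "commands.txt")
    write_commands_file(script_path, ["help", "help decrypt"])
    # Only prints the command; pass the list to subprocess.call() to run it.
    print(" ".join(batch_command(script_path)))
```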

Tools/functions available when running ParanoiDF:
  • -t Text Display: Using pdf2txt.py from PDFMiner, this option parses and renders all plain text inside a PDF.
  • -u URL: Downloads the PDF from the link and saves it in a new directory named after the website it was obtained from. This option simply uses an OS call to the wget command.
  • crackpw: Executes the PDFCrack tool via an OS call. The command allows the user to supply a custom dictionary, perform a benchmark or continue from a saved state file. If no custom dictionary is given, this command will attempt to brute-force the password using a modifiable charset text file in the "ParanoiDF/pdfcrack" directory.
  • decrypt: Uses an OS call to "QPDF", which decrypts the PDF document and outputs the decrypted file. This requires the user-password.
  • encrypt: Encrypts an input PDF document with any password you specify. Uses 128-bit RC4 encryption.
  • embedf: Creates a blank PDF document with an embedded file. This is for research purposes to show how files can be embedded in PDFs. This command imports the Make-pdf-embedded.py script as a module.
  • embedjs: Similar to "embedf", but embeds a custom JavaScript file inside a new blank PDF document. If no custom JavaScript file is given, a default app.alert message box is embedded.
  • extractJS: Attempts to extract any embedded JavaScript from a PDF document.
  • redact: Generates a list of words that fit inside a redaction box in a PDF document. The words (with a custom sentence) can then be passed through a grammar parser, and a custom number of them can be displayed according to their score.
  • removeDRM: Removes DRM (editing, copying etc.) restrictions from a PDF document and outputs the result to a new file. This does not need the owner-password, and the document may lose some formatting. This command works by calling Calibre's "ebook-convert" tool.
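
Several of the commands above wrap external tools through OS calls. The sketch below shows what such calls could look like from Python; it is not ParanoiDF's actual code. The "qpdf --password=... --decrypt" and "ebook-convert input output" forms are those tools' own standard command lines, while the function names and file names are hypothetical.

```python
import shutil
import subprocess


def qpdf_decrypt_argv(src, dst, user_password):
    """Arguments for decrypting src into dst with QPDF, given the user password."""
    return ["qpdf", "--password=" + user_password, "--decrypt", src, dst]


def ebook_convert_argv(src, dst):
    """Arguments for Calibre's ebook-convert: input file, then output file."""
    return ["ebook-convert", src, dst]


def run_if_available(argv):
    """Run argv only if its executable is on PATH; return the exit code or None."""
    if shutil.which(argv[0]) is None:
        return None
    return subprocess.call(argv)


if __name__ == "__main__":
    # Only prints the commands; call run_if_available() to execute them.
    print(" ".join(qpdf_decrypt_argv("locked.pdf", "unlocked.pdf", "secret")))
    print(" ".join(ebook_convert_argv("restricted.pdf", "output.pdf")))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when a password or file name contains spaces or shell metacharacters.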

Note: Type "help" to get a list of commands. Type "help [command]" to get a description/usage on a specific command.


