add json output to olemeta.py#549

Open
remotephone wants to merge 7 commits into decalage2:master from remotephone:master
@remotephone remotephone commented Mar 30, 2020


name: olemeta.py json output
about: Update olemeta to provide JSON output with the -o flag and to be importable into other tools.


Is your feature request related to a problem? Please describe.

The olemeta tool works well for interactive use, where it prints metadata as a table. I want to import it into other scripts to automate some triage of malicious documents, but the existing tool does not import cleanly and does not produce output I can manipulate easily.

Describe the solution you'd like
Update olemeta to provide JSON output with the -o flag and to be importable into other tools.

Describe alternatives you've considered
There are probably other ways to write the code. This was intended to be a minor change, but I added some comments about handling error output in the logging. I also considered creating dedicated functions to remove duplicate code, as your comments suggested, but figured I'd get something working first.

Additional context
This change would allow the user to pass a -o flag on the command line to generate JSON output. Normal use of the tool would not change, since the default without the -o flag is still a table, but olemeta could now be imported into other scripts with something like:

```python
import olefile
from oletools import olemeta

with open('file.doc', 'rb') as file:
    output = 'json'
    ole = olefile.OleFileIO(file)
    meta = olemeta.process_ole(ole)
    json_metadata = olemeta.process_output(meta, output)
    print(json_metadata)
```

Author

remotephone commented Mar 31, 2020

The latest commit handles cases where bytes objects were being returned undecoded (which breaks JSON serializers) and where datetime.datetime() values were returned. All values now pass through a cleaner function before being added to the dictionary.
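
A minimal sketch of what such a cleaner could look like, assuming the only goal is to make each value JSON-serializable. The name `clean_value`, the UTF-8 decoding with replacement, and the ISO 8601 date strings are illustrative assumptions, not the code in this pull request.

```python
import datetime
import json


def clean_value(value):
    """Return a JSON-serializable version of a metadata value (hypothetical helper)."""
    if isinstance(value, bytes):
        # Assumption: treat bytes as UTF-8 text and replace undecodable bytes.
        return value.decode("utf-8", errors="replace")
    if isinstance(value, (datetime.datetime, datetime.date)):
        # Assumption: represent timestamps as ISO 8601 strings.
        return value.isoformat()
    # Strings, numbers, booleans and None are already serializable.
    return value


# Sample values standing in for what a metadata reader might return.
raw = {
    "author": b"Alice",
    "create_time": datetime.datetime(2020, 3, 30, 12, 0),
    "num_pages": 3,
}
cleaned = {key: clean_value(val) for key, val in raw.items()}
print(json.dumps(cleaned, sort_keys=True))
```

Cleaning each value before it goes into the dictionary keeps the serialization step itself trivial, at the cost of losing the original types.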

@remotephone
Author

I ran into a document with latin-1 encoding that broke the clean_output function. This commit handles latin-1 encoded characters.
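
One way to handle this kind of input, sketched under the assumption that the values arrive as raw bytes of unknown encoding: try UTF-8 first and fall back to latin-1. The helper name `decode_bytes` is hypothetical and this is not necessarily how the commit does it.

```python
def decode_bytes(data, encodings=("utf-8", "latin-1")):
    """Decode bytes by trying each encoding in order (hypothetical helper)."""
    for encoding in encodings:
        try:
            return data.decode(encoding)
        except UnicodeDecodeError:
            continue
    # Only reached if the caller passes encodings that can all fail;
    # latin-1 itself maps every byte value to a character.
    return data.decode("utf-8", errors="replace")


print(decode_bytes("café".encode("utf-8")))
print(decode_bytes(b"caf\xe9"))  # not valid UTF-8, so latin-1 is used
```

The order matters: because latin-1 accepts any byte sequence, it has to come last, and latin-1 bytes that happen to form valid UTF-8 will be decoded as UTF-8.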

@decalage2
Owner

Thanks @remotephone, this looks good.
If you just need to get metadata for a python script/app, then a direct call to olefile get_metadata() would give you a python object with simple attributes: https://olefile.readthedocs.io/en/latest/Howto.html#extract-metadata (olemeta is just a simple wrapper around it)
But if you need integration with non-python tools, then indeed JSON is a good way to do it.
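
For reference, a rough sketch of the direct olefile route mentioned above. `OleFileIO.get_metadata()` and the `SUMMARY_ATTRIBS` / `DOCSUM_ATTRIBS` attribute lists come from olefile; the helper names `metadata_to_dict`, `read_metadata` and `to_json` are hypothetical, and `default=str` is just one simple way to get past non-serializable values.

```python
import json


def metadata_to_dict(meta, attribs):
    """Collect the named attributes of a metadata object into a plain dict."""
    return {name: getattr(meta, name, None) for name in attribs}


def read_metadata(path):
    """Read OLE metadata with olefile (third-party: pip install olefile)."""
    import olefile  # imported lazily so the helpers above have no dependency

    ole = olefile.OleFileIO(path)
    try:
        meta = ole.get_metadata()
        # These two lists name the attributes olefile fills in on the object.
        return metadata_to_dict(meta, meta.SUMMARY_ATTRIBS + meta.DOCSUM_ATTRIBS)
    finally:
        ole.close()


def to_json(info):
    """Serialize the dict; default=str stringifies datetimes and bytes."""
    return json.dumps(info, default=str, sort_keys=True)


# Usage, assuming a file named 'file.doc' exists:
#     print(to_json(read_metadata('file.doc')))
```

Note that `default=str` renders bytes as their Python repr (for example `b'Alice'`), so a real tool would probably decode them explicitly first.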

@decalage2 decalage2 self-requested a review April 1, 2020 19:57
@decalage2 decalage2 self-assigned this Apr 1, 2020
@decalage2 decalage2 added this to the oletools 0.56 milestone Apr 1, 2020