Add sampling to flow heuristic. #215

Open
kristiandueholm wants to merge 1 commit into alufers:master from kristiandueholm:master

Conversation

@kristiandueholm

This pull request fixes mitmproxy dump files (flows) being misdetected as .har files by detect_input_format(). The error I have been seeing is:

TypeError: 'int' object is not subscriptable

The proposed change should resolve many of the issues where passing -f flow makes the program run properly, for example #213, #171, and likely #214.

Root cause

Enabling debug mode by setting the MITMPROXY2SWAGGER_DEBUG environment variable revealed that the heuristic score computed in detect_input_format() was higher for .har even though the file was a flow dump. The main heuristic for detecting flow files is the presence of non-printable (ASCII) characters. The underlying issue is that mitmproxy_dump_file_huristic() assumes these will be present in the first 2048 bytes. In my case, those bytes were filled with certificates, which contain only printable characters, so the heuristic missed.
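To illustrate the failure mode, here is a minimal sketch on synthetic data. It is not the project's actual heuristic code: the helper name nonprintable_ratio and the byte layout are assumptions made for illustration. It shows that a head-only check can report no non-printable bytes even when the file as a whole is mostly binary.

```python
import string

# Byte values Python treats as printable ASCII (letters, digits,
# punctuation, whitespace).
PRINTABLE = set(string.printable.encode("ascii"))


def nonprintable_ratio(chunk: bytes) -> float:
    """Fraction of bytes in chunk that are not printable ASCII.

    Hypothetical helper, not part of mitmproxy2swagger.
    """
    if not chunk:
        return 0.0
    return sum(b not in PRINTABLE for b in chunk) / len(chunk)


# Synthetic stand-in for a dump whose start is certificate-like text.
head = b"-----BEGIN CERTIFICATE-----\n" + b"QUJD" * 1024
body = bytes([0x00, 0x01, 0x9F, 0xFF]) * 1024
data = head + body

head_only = nonprintable_ratio(data[:2048])  # looks only at the start
whole = nonprintable_ratio(data)             # looks at everything
```

With this layout, head_only is zero while whole is far from zero, which is the mismatch described above.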

Proposed solution

Instead of relying on the first 2048 bytes, sample throughout the file for non-printables.
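A rough sketch of the sampling idea, assuming evenly spaced chunks read across the file. The function name sampled_nonprintable_ratio and the samples and chunk_size parameters are hypothetical and do not necessarily match the commit in this PR.

```python
import os
import string

# Byte values Python treats as printable ASCII.
PRINTABLE = set(string.printable.encode("ascii"))


def sampled_nonprintable_ratio(path, samples=16, chunk_size=2048):
    """Estimate the fraction of non-printable bytes in a file.

    Reads `samples` chunks of up to `chunk_size` bytes at evenly spaced
    offsets instead of only the first chunk. Hypothetical helper.
    """
    size = os.path.getsize(path)
    if size == 0:
        return 0.0
    total = 0
    nonprintable = 0
    with open(path, "rb") as f:
        for i in range(samples):
            # Evenly spaced offsets from the start toward the end.
            f.seek((size * i) // samples)
            chunk = f.read(chunk_size)
            total += len(chunk)
            nonprintable += sum(b not in PRINTABLE for b in chunk)
    return nonprintable / total if total else 0.0
```

For small files the sampled chunks overlap, so some bytes are counted more than once. That is acceptable for a heuristic score, but the result should be read as an estimate rather than an exact ratio.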

@frafra

frafra commented Aug 1, 2025

I have the same problem and this PR fixes the issue.

@dmiller423

Why has this not been merged, or any other solution offered? This is kind of a stupid error, with many people hitting it pointlessly.

@kristiandueholm
Author

I sent an email to @alufers but could not get in touch with him. Can you try, maybe via Telegram? Otherwise we will have to fork.
