I just shipped version 2.1.1 of the BigBear Unicode Security Scanner, bringing a major fix for binary file false positives and a bunch of new safety and usability improvements. Thanks to the BigBearCommunity for ongoing feedback and for helping make this tool stronger and more secure!
2.1.1 – 2025-10-30: Binary File False Positive Fix
What’s new:
Automatic Binary File Skipping
The scanner now skips binary files by default, so no more false positive alerts from archives, images, videos, executables, or docs!
- Files now auto-skipped:
Archives (.jar, .war, .ear, .zip, .tar, .gz, etc.)
Images (.jpg, .png, .svg, .webp, etc.)
Video/Audio (.mp4, .avi, .mp3, .flac, etc.)
Executables (.exe, .dll, .so, etc.)
Fonts (.ttf, .woff, .otf, etc.)
Binary docs (.pdf, .docx, .xls, etc.)
New --include-binary Flag
Want to scan binaries anyway? You can opt in to deep scans with this flag.
Better Test Coverage
Added tests to ensure binary files are skipped correctly (unless you enable --include-binary).
Other Fixes:
- No more Unicode false positives in .jar/.zip/.png/.pdf, etc.
- Shell scripts are now accurately detected as non-binary when appropriate.
- Improved binary detection: uses both file extension & MIME type for checks.
Internals:
- Upgraded detection logic: new is_binary_file() for comprehensive checks.
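For anyone curious how an extension-plus-MIME check can fit together, here is a minimal Python sketch. It is not the scanner's actual code: only the name is_binary_file() and the --include-binary behavior come from the notes above, while the extension set, the should_scan() helper, and the include_binary parameter are assumptions made for illustration. Note that Python's mimetypes module guesses the type from the file name only, so a real implementation may inspect file contents instead.

```python
import mimetypes
from pathlib import Path

# Illustrative extension set, taken from the categories listed in this post.
# The scanner's real list may be longer or different.
BINARY_EXTENSIONS = {
    ".jar", ".war", ".ear", ".zip", ".tar", ".gz",   # archives
    ".jpg", ".png", ".svg", ".webp",                 # images
    ".mp4", ".avi", ".mp3", ".flac",                 # video/audio
    ".exe", ".dll", ".so",                           # executables
    ".ttf", ".woff", ".otf",                         # fonts
    ".pdf", ".docx", ".xls",                         # binary docs
}

# MIME major types treated as binary in this sketch (an assumption).
BINARY_MIME_MAJOR_TYPES = {"image", "audio", "video", "font"}


def is_binary_file(path):
    """Return True if the path looks binary by extension or guessed MIME type."""
    suffix = Path(path).suffix.lower()
    if suffix in BINARY_EXTENSIONS:
        return True

    # guess_type() works from the file name; it does not read the file.
    mime, _encoding = mimetypes.guess_type(str(path))
    if mime is None:
        # Unknown type: treat as text so the file still gets scanned.
        return False
    if mime == "application/octet-stream":
        return True
    return mime.split("/", 1)[0] in BINARY_MIME_MAJOR_TYPES


def should_scan(path, include_binary=False):
    """Hypothetical helper: scan text files always, binaries only on opt-in."""
    return include_binary or not is_binary_file(path)
```

In this sketch, unknown file types fall through as text on purpose: skipping a file is the riskier mistake for a security scanner, so only files that clearly look binary are skipped.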
Docs & Help:
- Expanded README and CLI help for binary scanning.
- Test suite expanded from 9 to 11 tests.
Security Notes:
- Text-based Unicode threats and AI attacks are still detected—smart defaults keep you safe!
- Full control: Force binary scanning any time, if needed.
2.1.0 – 2025-10-23: False Positive Fixes for Emoji, Documentation & UI
Context-aware emoji detection & exclusion
Unicode skipping options for docs (smart quotes, dashes, ellipsis, etc.)
New allowlist template for legit Unicode usages (UI, docs, i18n, math, etc.)
Expanded test suite for emoji and typography edge cases
2.0.0 – 2024: Massive AI+ Unicode Security Overhaul
Detects 150+ risky Unicode patterns and AI injection exploits
Homograph attack coverage (Cyrillic, Greek, Armenian, Thai lookalikes)
Enhanced CLI w/ JSON output, severity filters, allowlists, and easy CI/CD integration
Documentation & usage examples for everything
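To give a feel for what a homograph check looks for, here is a small Python sketch that flags words mixing Latin letters with Cyrillic, Greek, Armenian, or Thai letters. It is only an illustration of the general idea, not the scanner's detection logic; the function names and the script list are assumptions, and it relies on Unicode character names as a rough stand-in for proper script data.

```python
import unicodedata

# Scripts named in the release notes as lookalike sources.
SUSPECT_SCRIPTS = ("CYRILLIC", "GREEK", "ARMENIAN", "THAI")


def script_of(ch):
    """Rough script guess based on the Unicode character name prefix."""
    name = unicodedata.name(ch, "")
    if name.startswith("LATIN"):
        return "LATIN"
    for script in SUSPECT_SCRIPTS:
        if name.startswith(script):
            return script
    return None


def find_mixed_script_words(text):
    """Return (word, scripts) pairs for words mixing Latin with a suspect script."""
    findings = []
    for word in text.split():
        scripts = {script_of(ch) for ch in word if ch.isalpha()}
        scripts.discard(None)
        if "LATIN" in scripts and len(scripts) > 1:
            findings.append((word, sorted(scripts)))
    return findings


# "p\u0430ypal" uses CYRILLIC SMALL LETTER A in place of the Latin "a".
print(find_mixed_script_words("login to p\u0430ypal now"))
```

Words written entirely in one script, such as ordinary Russian or Greek text, are not flagged by this sketch; only the mixed case is, which is the usual sign of a lookalike substitution.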
Full changelog and doc updates here:
I appreciate everyone in the BigBearCommunity for helping make this project more secure, accurate, and developer-friendly! If you run into edge cases or have feedback for the next release, please reach out in the forum or GitHub.
If you find this project helpful, please consider donating to support my work: https://ko-fi.com/bigbeartechworld