🚀 BigBear Unicode Scanner 2.x Series Major Updates (incl. 2.1.1 release!)

dragonfire1119 · November 1, 2025, 5:09pm

I just shipped version 2.1.1 of the BigBear Unicode Security Scanner, bringing a major fix for binary file false positives and a bunch of new safety and usability improvements. Thanks to the BigBearCommunity for ongoing feedback and for helping make this tool stronger and more secure!

2.1.1 – 2025-10-30: Binary File False Positive Fix

What’s new:

Automatic Binary File Skipping
The scanner now skips binary files by default, so archives, images, videos, executables, fonts, and binary docs no longer trigger false positive alerts.
- Files now auto-skipped:
  Archives (.jar, .war, .ear, .zip, .tar, .gz, etc.)
  Images (.jpg, .png, .svg, .webp, etc.)
  Video/Audio (.mp4, .avi, .mp3, .flac, etc.)
  Executables (.exe, .dll, .so, etc.)
  Fonts (.ttf, .woff, .otf, etc.)
  Binary docs (.pdf, .docx, .xls, etc.)
New --include-binary Flag
Want to scan binaries anyway? Pass --include-binary to opt in to deep scans.
Better Test Coverage
Added tests to ensure binary files are skipped correctly (unless you enable --include-binary).
Other Fixes:
- No more Unicode false positives in .jar/.zip/.png/.pdf, etc.
- Shell scripts are now accurately detected as non-binary when appropriate.
- Improved binary detection: checks both the file extension and the MIME type.
Internals:
- Upgraded detection logic: a new is_binary_file() function for comprehensive checks.
Docs & Help:
- Expanded README and CLI help for binary scanning.
- Test suite expanded from 9 to 11 tests.
Security Notes:
- Text-based Unicode threats and AI attacks are still detected; only binary files are skipped by default.
- Full control: force binary scanning at any time with --include-binary.
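
To make the extension-plus-MIME check more concrete, here is a minimal Python sketch. It is not the scanner's actual code: the extension set only covers the examples named in this post, the MIME prefixes are an assumption, and `should_scan` with its `include_binary` parameter is a hypothetical stand-in for how the --include-binary flag might be wired in. Note that `mimetypes.guess_type` only looks at the file name, so a real implementation may inspect file contents instead.

```python
import mimetypes
from pathlib import Path

# Subset of the extensions named in the release notes (illustrative, not the full list).
BINARY_EXTENSIONS = {
    ".jar", ".war", ".ear", ".zip", ".tar", ".gz",
    ".jpg", ".png", ".svg", ".webp",
    ".mp4", ".avi", ".mp3", ".flac",
    ".exe", ".dll", ".so",
    ".ttf", ".woff", ".otf",
    ".pdf", ".docx", ".xls",
}

# MIME type prefixes treated as binary in this sketch (an assumption).
BINARY_MIME_PREFIXES = ("image/", "audio/", "video/", "font/")


def is_binary_file(path: str) -> bool:
    """Return True if the extension or the guessed MIME type looks binary."""
    suffix = Path(path).suffix.lower()
    if suffix in BINARY_EXTENSIONS:
        return True
    # Fall back to a name-based MIME guess for extensions not in the set.
    mime, _ = mimetypes.guess_type(path)
    return mime is not None and mime.startswith(BINARY_MIME_PREFIXES)


def should_scan(path: str, include_binary: bool = False) -> bool:
    """Skip binary files unless the caller explicitly opts in."""
    return include_binary or not is_binary_file(path)
```

With this sketch, `should_scan("lib/app.jar")` is false by default and becomes true with `include_binary=True`, while a shell script such as `deploy.sh` is always scanned.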

2.1.0 – 2025-10-23: False Positive Fixes for Emoji, Documentation & UI

Context-aware emoji detection & exclusion
Options to skip common typographic Unicode in docs (smart quotes, dashes, ellipses, etc.)
New allowlist template for legit Unicode usages (UI, docs, i18n, math, etc.)
Expanded test suite for emoji and typography edge cases
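
As a rough illustration of how context-aware skipping and an allowlist could fit together, here is a small Python sketch. Everything in it is an assumption for illustration: the scanner's real rules are not shown in this post, "context" is reduced to the file extension, and the names `TYPOGRAPHY`, `DOC_EXTENSIONS`, `is_emoji`, and `flag_non_ascii` are hypothetical.

```python
# Typographic characters that are common in documentation.
TYPOGRAPHY = {
    "\u2018", "\u2019", "\u201c", "\u201d",  # curly single and double quotes
    "\u2013", "\u2014",                      # en dash, em dash
    "\u2026",                                # horizontal ellipsis
}

# File types treated as documentation in this sketch (an assumption).
DOC_EXTENSIONS = (".md", ".txt", ".rst")


def is_emoji(ch: str) -> bool:
    """Very rough emoji check covering the main pictograph blocks only."""
    cp = ord(ch)
    return 0x1F300 <= cp <= 0x1FAFF or 0x2600 <= cp <= 0x27BF


def flag_non_ascii(path: str, text: str, allowlist=frozenset()):
    """Return (index, char) pairs for non-ASCII characters not excused by context."""
    in_docs = path.lower().endswith(DOC_EXTENSIONS)
    findings = []
    for i, ch in enumerate(text):
        if ch.isascii() or ch in allowlist:
            continue
        # Typography and emoji are tolerated in docs but still flagged in code.
        if in_docs and (ch in TYPOGRAPHY or is_emoji(ch)):
            continue
        findings.append((i, ch))
    return findings
```

In this sketch a README containing an emoji and an ellipsis produces no findings, smart quotes in a source file are still reported, and a zero-width space is reported everywhere unless it is explicitly allowlisted.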

2.0.0 – 2024: Massive AI + Unicode Security Overhaul

Detects 150+ risky Unicode patterns and AI injection exploits
Homograph attack coverage (Cyrillic, Greek, Armenian, Thai lookalikes)
Enhanced CLI with JSON output, severity filters, allowlists, and easy CI/CD integration
Documentation & usage examples for everything
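
For readers new to these attack classes, the following Python sketch shows two of the simplest checks in this family: flagging invisible or bidirectional control characters, and flagging words that mix scripts (a common sign of a homograph). It is only an illustration under my own assumptions and does not reproduce the scanner's 150+ patterns; `SUSPICIOUS`, `SCRIPTS`, `scripts_in`, and `scan_line` are hypothetical names.

```python
import unicodedata

# Invisible and bidirectional control characters often abused to hide text.
SUSPICIOUS = {
    "\u200b", "\u200c", "\u200d", "\u2060", "\ufeff",  # zero-width characters
    "\u202a", "\u202b", "\u202c", "\u202d", "\u202e",  # bidi embeddings and overrides
    "\u2066", "\u2067", "\u2068", "\u2069",            # bidi isolates
}

# Scripts compared in this sketch, matched by Unicode character name prefix.
SCRIPTS = ("LATIN", "CYRILLIC", "GREEK", "ARMENIAN", "THAI")


def scripts_in(word: str) -> set:
    """Collect the scripts of the alphabetic characters in a word."""
    found = set()
    for ch in word:
        if not ch.isalpha():
            continue
        name = unicodedata.name(ch, "")
        for script in SCRIPTS:
            if name.startswith(script):
                found.add(script)
    return found


def scan_line(line: str) -> list:
    """Return findings for hidden control characters and mixed-script words."""
    findings = []
    for i, ch in enumerate(line):
        if ch in SUSPICIOUS:
            findings.append(("invisible-or-bidi", i, f"U+{ord(ch):04X}"))
    for word in line.split():
        scripts = scripts_in(word)
        if len(scripts) > 1:
            findings.append(("mixed-script", word, sorted(scripts)))
    return findings
```

For example, a domain spelled with a Cyrillic "а" in place of the Latin "a" is reported as a mixed-script word, and a right-to-left override (U+202E) embedded in a line is reported with its position.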

Full changelog and doc updates here:

github.com/bigbeartechworld/big-bear-scripts

check-for-unicode/CHANGELOG.md

master

# Changelog

All notable changes to this project will be documented in this file.

The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [Unreleased]

## [2.1.1] - 2025-10-30 Binary File False Positive Fix

### Added
- 🚫 **Automatic Binary File Skipping**: Scanner now automatically skips binary files by default to prevent false positives
  - Skips archives: `.jar`, `.war`, `.ear`, `.zip`, `.tar`, `.gz`, etc.
  - Skips images: `.jpg`, `.png`, `.gif`, `.svg`, `.webp`, etc.
  - Skips videos: `.mp4`, `.avi`, `.mov`, `.mkv`, etc.
  - Skips audio: `.mp3`, `.wav`, `.ogg`, `.flac`, etc.
  - Skips executables: `.exe`, `.dll`, `.so`, `.dylib`, etc.
  - Skips fonts: `.ttf`, `.otf`, `.woff`, `.woff2`, etc.
  - Skips binary documents: `.pdf`, `.doc`, `.docx`, `.xls`, etc.


I appreciate everyone in the BigBearCommunity for helping make this project more secure, accurate, and developer-friendly! If you run into edge cases or have feedback for the next release, please reach out in the forum or GitHub.

If you find this project helpful, please consider donating to support my work: https://ko-fi.com/bigbeartechworld