workflows: Add check for common LLM targeted unicode characters (#1974)

* workflows: Add check for zero width unicode characters

* refactor: exclude .git directory in zero width unicode character check

* Include additional zero width unicode character in check

* refactor: update zero width unicode check to include format characters
Adrian Gallagher
2025-08-01 11:05:32 +10:00
committed by GitHub
parent 2ab4379189
commit 3178366d86
3 changed files with 16 additions and 4 deletions

@@ -51,3 +51,15 @@ jobs:
          echo "::error::Replace !errors.Is(err, target) with testify equivalents"
          exit 1
      - name: Check for LLM targeted invisible Unicode
        run: |
          WHITELIST=''
          if [[ -z "$WHITELIST" ]]; then
            PATTERN='(?!\x20)[\p{Cf}\p{Z}\p{M}]'
          else
            PATTERN="(?![\x20$WHITELIST])[\p{Cf}\p{Z}\p{M}]"
          fi
          grep -r -n -I --color=always --exclude-dir=.git -P "$PATTERN" . || exit 0
          echo "::error::Remove zero-width/format, separator or combining-mark characters"
          exit 1
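
For trying the check locally before pushing, the step above can be approximated with a small script. This is only a sketch: `build_pattern` and `scan_dir` are hypothetical helper names that are not part of this commit, and it assumes GNU grep built with PCRE support (`-P`) and an available `C.UTF-8` locale. Unlike the workflow's `|| exit 0`, this version treats only grep's "no match" status (1) as clean and passes other grep failures through, which is a design choice of the sketch rather than the workflow's behaviour.

```shell
#!/usr/bin/env bash
# Sketch only: assumes GNU grep with PCRE (-P) and a C.UTF-8 locale.

# build_pattern (hypothetical helper) mirrors the workflow's if/else on WHITELIST.
# $1: optional PCRE character-class fragment to allow, e.g. '\x{00A0}'.
build_pattern() {
  local whitelist=${1:-}
  if [ -z "$whitelist" ]; then
    printf '%s' '(?!\x20)[\p{Cf}\p{Z}\p{M}]'
  else
    printf '%s' "(?![\x20$whitelist])[\p{Cf}\p{Z}\p{M}]"
  fi
}

# scan_dir (hypothetical helper): $1 = directory, $2 = optional whitelist.
# Returns 0 if nothing is flagged, 1 if a flagged character is found,
# and grep's own status (2 or higher) if grep itself failed.
scan_dir() {
  local dir=$1 status=0 pattern
  pattern=$(build_pattern "${2:-}")
  LC_ALL=C.UTF-8 grep -r -n -I --exclude-dir=.git -P "$pattern" "$dir" || status=$?
  case "$status" in
    0) return 1 ;;          # grep matched: flagged characters present
    1) return 0 ;;          # grep found nothing: tree is clean
    *) return "$status" ;;  # grep error (bad pattern, unreadable path, ...)
  esac
}

# Small demonstration on a throwaway directory.
demo=$(mktemp -d)
printf 'plain ASCII text\n' > "$demo/clean.txt"
printf 'zero\342\200\213width\n' > "$demo/dirty.txt"   # U+200B ZERO WIDTH SPACE (category Cf)

if scan_dir "$demo"; then
  echo "no flagged characters"
else
  echo "flagged characters found"
fi
rm -rf "$demo"
```

Passing a second argument such as `'\x{00A0}'` to `scan_dir` corresponds to setting `WHITELIST` in the workflow: the character is added to the negative lookahead next to `\x20` and is no longer reported.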