anti-trojan-source

Detect Trojan Source attacks that use Unicode bidirectional (bidi) characters to inject malicious code

About

Detects Trojan Source attacks that use Unicode bidirectional (bidi) characters to inject malicious code, as well as other attacks that rely on confusable characters (such as Glassworm attacks). The tool combines an explicit list of dangerous Unicode characters with category-based detection, which catches invisible characters by their Unicode general category (Format and Control).


If you're using ESLint:

See eslint-plugin-anti-trojan-source for a purpose-built plugin that detects these characters.
That plugin also inspired the detect-bidi-characters rule in eslint-plugin-security. If you already use that security plugin, it is advised to turn that rule on.

Detection Capabilities

anti-trojan-source provides comprehensive protection by detecting:

277 explicit confusable characters including bidirectional Unicode, zero-width characters, variation selectors, and more
All Unicode Format characters (Cf category) - catches invisible formatting characters by category
All Unicode Control characters (Cc category) - except commonly-used whitespace (TAB, LF, CR)
Extended Variation Selectors (U+E0100 to U+E01EF) - 240 additional characters

Because detection is based on categories rather than only on a fixed list, characters that later Unicode versions add to the Format or Control categories can be caught without being listed explicitly.
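
To make the category-based idea concrete, here is a minimal sketch using JavaScript's Unicode property escapes. It is not the package's implementation: the helper name `findInvisibleCharacters` and the returned shape are assumptions made for illustration, and it covers only the Cf and Cc categories with the TAB, LF, and CR exceptions listed above.

```javascript
// Illustrative sketch only; not taken from anti-trojan-source's source code.
// Whitespace controls that are allowed, per the list above.
const ALLOWED_WHITESPACE = new Set(['\t', '\n', '\r'])

// \p{Cf} matches Format characters and \p{Cc} matches Control characters.
// The `u` flag is required for Unicode property escapes.
const FORMAT_OR_CONTROL = /[\p{Cf}\p{Cc}]/gu

// Hypothetical helper: returns the position and code point of each match.
function findInvisibleCharacters(text) {
  const hits = []
  for (const match of text.matchAll(FORMAT_OR_CONTROL)) {
    const char = match[0]
    if (ALLOWED_WHITESPACE.has(char)) continue
    const hex = char.codePointAt(0).toString(16).toUpperCase().padStart(4, '0')
    hits.push({ index: match.index, codePoint: `U+${hex}` })
  }
  return hits
}

// The tab and newline are skipped; the ZERO WIDTH SPACE at index 11 is reported.
console.log(findInvisibleCharacters('const value\u200b = 1\t// ok\n'))
```

Which characters fall into each category depends on the Unicode data shipped with the JavaScript engine, so results for very recent additions may vary between runtimes.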

Invisible Characters Support Matrix

The following table lists the attack types and invisible-character classes that anti-trojan-source can detect:

Attack Type	Supported	Description
Trojan Source	✅	Using bidirectional Unicode characters to create code that appears different from what the compiler executes. More details at trojansource.codes.
Glassworm	✅	Using confusable characters (homoglyphs) to create misleading identifiers or string literals, which can lead to vulnerabilities.
Extended Variation Selectors	✅	240 additional variation selectors (U+E0100-U+E01EF) that can alter character appearance invisibly.
Category-Based Detection	✅	Detects ALL Unicode Format (Cf) and Control (Cc) characters by category, making detection future-proof.
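
The extended variation selectors are a useful illustration of why a range check can sit alongside category-based detection: in the Unicode Character Database they are Nonspacing Marks (Mn), not Format or Control characters, so a Cf/Cc check alone would miss them. The sketch below shows this under that assumption about the reasoning; the constant and function names are hypothetical and not part of the package's API.

```javascript
// Illustrative sketch; names are hypothetical, not anti-trojan-source exports.
const EXTENDED_VS_START = 0xe0100
const EXTENDED_VS_END = 0xe01ef

// Returns true when the first code point of `char` is in U+E0100..U+E01EF.
function isExtendedVariationSelector(char) {
  const codePoint = char.codePointAt(0)
  return codePoint >= EXTENDED_VS_START && codePoint <= EXTENDED_VS_END
}

const selector = String.fromCodePoint(0xe0100)

console.log(EXTENDED_VS_END - EXTENDED_VS_START + 1) // 240 characters in the range
console.log(/[\p{Cf}\p{Cc}]/u.test(selector)) // false: not a Format or Control character
console.log(isExtendedVariationSelector(selector)) // true: caught by the range check
```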

Why is Confusable Unicode Character detection important?

The publication Trojan Source: Invisible Vulnerabilities, on the topic of Unicode character attacks, raised significant concern about supply chain attacks in which adversaries inject malicious code into a project's source code and slip it past code review unseen. This project builds on that research to detect other forms of confusable characters that can be used in similar attacks.

For more information, see the official website, trojansource.codes, and the source code repository that accompanies the publication.

Table of Contents

About
Use as a CLI
Use as an eslint plugin
Use as a library
- Simple boolean check
- Detailed findings
Use as a pre-commit hook
Contributing
Author

Use as a CLI

anti-trojan-source is an npm package that detects files containing confusable Unicode characters, as described in the research.

Detect confusable characters using file globbing

The following command detects all files that match the given glob pattern and contain confusable Unicode characters:

npx anti-trojan-source --files='src/**/*.js'

If nothing is found, the command exits with code 0 and prints to stdout:

[✓] No confusable characters detected

Detect confusable characters using file paths

npx anti-trojan-source '/src/index.js' '/src/helper.js'

If any confusable Unicode characters are found, the command exits with code 1 and prints to stderr:

[x] Detected cases of confusable characters in the following files:
|
 - /src/index.js
 - /src/helper.js

Note: For backward compatibility, `hasTrojanSource({...})` is still exported as an alias to `hasConfusables({...})`. It is deprecated and will be removed in a future major version. Prefer `hasConfusables` going forward.
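
The exit codes described above (0 when nothing is found, 1 when confusable characters are detected) can be consumed from a Node.js script. The following is a rough sketch, not an official integration: `scanWithCli` and its injectable `runner` parameter are hypothetical, and it assumes `npx` is available on the PATH. CommonJS `require` is used so the sketch runs as a standalone script; in an ES module, import `spawnSync` instead.

```javascript
const { spawnSync } = require('node:child_process')

// Hypothetical wrapper around the CLI. `runner` defaults to spawnSync and is
// injectable so the logic can be exercised without invoking the real command.
function scanWithCli(globPattern, runner = spawnSync) {
  const result = runner('npx', ['anti-trojan-source', `--files=${globPattern}`], {
    encoding: 'utf8'
  })
  const clean = result.status === 0
  return {
    clean,
    // Per the sections above: the success message goes to stdout,
    // the list of affected files goes to stderr.
    output: clean ? result.stdout : result.stderr
  }
}

// Example usage (commented out so that loading this file does not run the CLI):
// const { clean, output } = scanWithCli('src/**/*.js')
// if (!clean) { console.error(output); process.exitCode = 1 }
```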

Detect confusable characters by piping input

If you run npx anti-trojan-source without file arguments and pipe a file's contents to it, it checks that input for confusable Unicode characters:

cat /src/index.js | npx anti-trojan-source

Verbose output mode

Use the --verbose (or -v) flag to get detailed information about each detected character, including line and column numbers, character names, and Unicode code points:

npx anti-trojan-source --files='src/**/*.js' --verbose

Example output:

[x] Detected cases of trojan source in the following files:
| 
 - src/utils.js

   Line 12:34 - U+200B ZERO WIDTH SPACE [Cf (Format)]
   Snippet: const value = getUserInput()
   Line 45:10 - U+202E RIGHT-TO-LEFT OVERRIDE [Cf (Format)]
   Snippet: if (isAdmin) { // Check permissions

This mode is particularly useful for:

Code reviews: Quickly identify where invisible characters are located
Debugging: Understand which specific characters are causing issues
Security audits: Get detailed reports of all suspicious characters

JSON output mode

Use the --json (or -j) flag to get machine-readable JSON output, suited to CI/CD integration and automated processing:

npx anti-trojan-source --files='src/**/*.js' --json

Example output:

[
  {
    "file": "src/utils.js",
    "findings": [
      {
        "line": 12,
        "column": 34,
        "codePoint": "U+200B",
        "name": "ZERO WIDTH SPACE",
        "category": "Cf (Format)",
        "snippet": "const value = getUserInput()"
      }
    ]
  }
]

This mode enables:

CI/CD integration: Parse results programmatically in your pipeline
Custom reporting: Build your own reporting tools on top of the detection
Automated workflows: Trigger specific actions based on findings
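
As one way to parse the results programmatically, the sketch below summarizes a report that has the shape shown in the example output above. The helper name `summarizeReport` is hypothetical, the sample data is made up for illustration, and the sketch assumes the input is a JSON array; the shape of the output when nothing is detected is not covered here.

```javascript
// Hypothetical helper: counts findings in a `--json` report string.
function summarizeReport(jsonText) {
  const report = JSON.parse(jsonText)
  const countsByCodePoint = {}
  let total = 0
  for (const { findings } of report) {
    for (const { codePoint } of findings) {
      countsByCodePoint[codePoint] = (countsByCodePoint[codePoint] || 0) + 1
      total += 1
    }
  }
  return { files: report.length, total, countsByCodePoint }
}

// Sample data in the documented shape (illustrative values only).
const sampleReport = JSON.stringify([
  {
    file: 'src/utils.js',
    findings: [
      { line: 12, column: 34, codePoint: 'U+200B', name: 'ZERO WIDTH SPACE', category: 'Cf (Format)', snippet: 'const value = getUserInput()' },
      { line: 45, column: 10, codePoint: 'U+202E', name: 'RIGHT-TO-LEFT OVERRIDE', category: 'Cf (Format)', snippet: 'if (isAdmin) { // Check permissions' }
    ]
  }
])

const summary = summarizeReport(sampleReport)
console.log(summary.files, summary.total) // 1 2
```

A pipeline step could use such a summary to decide whether to fail the build or only post a warning.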

Use as an eslint plugin

See the ESLint plugin for this tool, eslint-plugin-anti-trojan-source. The README in that repository explains how to set it up.

Use as a library

Simple boolean check

To use it as a library, pass the file contents as sourceText (this boolean form is backward compatible):

import { hasConfusables } from 'anti-trojan-source'

const isDangerous = hasConfusables({
  sourceText: 'if (accessLevel != "user‮ ⁦// Check if admin⁩ ⁦") {'
})

console.log(isDangerous) // true or false

hasConfusables returns a boolean when called without the detailed option.

Detailed findings

Get comprehensive information about detected characters including their location, names, and categories:

import { hasConfusables } from 'anti-trojan-source'

const findings = hasConfusables({
  sourceText: 'const value\u200b = 123', // Contains ZERO WIDTH SPACE
  detailed: true
})

console.log(findings)
// [
//   {
//     line: 1,
//     column: 12,
//     codePoint: "U+200B",
//     name: "ZERO WIDTH SPACE",
//     category: "Cf (Format)",
//     snippet: "const value = 123"
//   }
// ]

Each finding includes:

line: Line number where the character was found
column: Column number where the character was found
codePoint: Unicode code point (e.g., "U+200B")
name: Descriptive name of the character
category: Unicode category or classification
snippet: Context from the line (up to 80 characters)
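
The fields above are enough to build a compact, editor-friendly message for each finding. This is a small sketch of one possible format; `formatFinding` is a hypothetical helper, not something the package exports, and the file path is supplied by the caller.

```javascript
// Hypothetical formatter: turns one finding into a "file:line:column" message.
function formatFinding(file, finding) {
  const { line, column, codePoint, name, category } = finding
  return `${file}:${line}:${column} ${codePoint} ${name} [${category}]`
}

// A finding in the shape documented above.
const exampleFinding = {
  line: 1,
  column: 12,
  codePoint: 'U+200B',
  name: 'ZERO WIDTH SPACE',
  category: 'Cf (Format)',
  snippet: 'const value = 123'
}

console.log(formatFinding('src/index.js', exampleFinding))
// src/index.js:1:12 U+200B ZERO WIDTH SPACE [Cf (Format)]
```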

You can also check multiple files at once:

import { hasConfusablesInFiles } from 'anti-trojan-source'

const results = hasConfusablesInFiles({
  filePaths: ['src/index.js', 'src/utils.js'],
  detailed: true // Optional: get detailed findings
})

console.log(results)
// [
//   {
//     file: "src/index.js",
//     findings: [ /* array of findings */ ]
//   }
// ]

Use as a pre-commit hook

To add this tool to your project as a pre-commit hook, try this sample configuration in .pre-commit-config.yaml:

repos:
  - repo: https://github.com/lirantal/anti-trojan-source
    rev: v1.3.3  # choose the release you want
    hooks:
      - id: anti-trojan-source

Contributing

Please consult CONTRIBUTING for guidelines on contributing to this project.
