107 changes: 106 additions & 1 deletion README.md
@@ -70,9 +70,14 @@ project.languages #=> { "Ruby" => 119387 }

### Command line usage

The `github-linguist` executable operates in two distinct modes:

1. **[Git Repository mode](#git-repository)** - Analyzes an entire Git repository (when given a directory path or no path)
2. **[Single file mode](#single-file)** - Analyzes a specific file (when given a file path)

#### Git Repository

A repository's languages stats can also be assessed from the command line using the `github-linguist` executable.
A repository's languages stats can be assessed from the command line using the `github-linguist` executable.
Without any options, `github-linguist` will output the language breakdown by percentage and file size.

```bash
@@ -151,6 +156,51 @@ lib/linguist.rb
```

##### `--strategies`

The `--strategies` or `-s` flag will show the language detection strategy used for each file. This is useful for understanding how Linguist determined the language of specific files. Note that unless the `--json` flag is specified, this flag will set the `--breakdown` flag implicitly.

You can try running `github-linguist` on the root directory in this repository itself with the strategies flag:

```console
$ github-linguist --breakdown --strategies
66.84% 264519 Ruby
24.68% 97685 C
6.57% 25999 Go
1.29% 5098 Lex
0.32% 1257 Shell
0.31% 1212 Dockerfile

Ruby:
Gemfile [Filename]
Rakefile [Filename]
bin/git-linguist [Extension]
bin/github-linguist [Extension]
lib/linguist.rb [Extension]
```

If a file's language is affected by `.gitattributes`, the strategy will show the original detection method along with a note indicating whether the gitattributes setting changed the result or confirmed it.

For instance, if you had the following `.gitattributes` overrides in your repo:

```gitattributes

*.ts linguist-language=JavaScript
*.js linguist-language=JavaScript

```

the output of Linguist would be something like this:

```console
100.00% 217 JavaScript

JavaScript:
demo.ts [Heuristics (overridden by .gitattributes)]
demo.js [Extension (confirmed by .gitattributes)]
```
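
The confirmed/overridden note can be reproduced in a few lines of Ruby. The following is a minimal sketch, not Linguist's actual implementation: `strategy_label` is a hypothetical helper, languages are plain strings rather than `Linguist::Language` objects, and the assumption that `demo.ts` would otherwise be detected as TypeScript is for illustration only.

```ruby
# Hypothetical helper (not part of Linguist's API): builds the label that
# --strategies shows when a .gitattributes override may apply.
def strategy_label(original_strategy, original_language, override_language)
  # No override in .gitattributes: report the detection strategy unchanged.
  return original_strategy if override_language.nil?

  # Same language either way means the override only confirmed the detection.
  note = original_language == override_language ? "confirmed" : "overridden"
  "#{original_strategy} (#{note} by .gitattributes)"
end

puts strategy_label("Heuristics", "TypeScript", "JavaScript")
puts strategy_label("Extension", "JavaScript", "JavaScript")
puts strategy_label("Extension", "Ruby", nil)
```

The first two calls mirror the `demo.ts` and `demo.js` lines above; the third shows that a file without an override keeps its plain strategy name.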

##### `--json`

The `--json` or `-j` flag outputs the data in JSON format.
@@ -168,6 +218,8 @@ $ github-linguist --breakdown --json

```

NB. The `--strategies` flag has no effect when the `--json` flag is present.

#### Single file

Alternatively, you can find stats for a single file using the `github-linguist` executable.
@@ -182,6 +234,59 @@ grammars.yml: 884 lines (884 sloc)
language: YAML
```

#### Additional options

##### `--breakdown`

This flag has no effect in *Single file* mode.

##### `--strategies`

When using the `--strategies` or `-s` flag with a single file, you can see which detection method was used:

```console
$ github-linguist --strategies lib/linguist.rb
lib/linguist.rb: 105 lines (96 sloc)
type: Text
mime type: application/x-ruby
language: Ruby
strategy: Extension
```

If a file's language is affected by `.gitattributes`, the strategy will show whether the gitattributes setting changed the result or confirmed it.

In this fictitious example, the strategy reads "confirmed by .gitattributes" because the detection process (using the Filename strategy) would have produced the same language as the override:

```console
.devcontainer/devcontainer.json: 27 lines (27 sloc)
type: Text
mime type: application/json
language: JSON with Comments
strategy: Filename (confirmed by .gitattributes)
```

In this other fictitious example, it says "overridden by .gitattributes" since the gitattributes setting changes the detected language to something different:

```console
test.rb: 13 lines (11 sloc)
type: Text
mime type: application/x-ruby
language: Java
strategy: Extension (overridden by .gitattributes)
```

Here, the `.rb` file would normally be detected as Ruby by the Extension strategy, but `.gitattributes` overrides it to be detected as Java instead.
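
Scripts that consume this text output may want to separate the base strategy from the `.gitattributes` note. The following is a hedged sketch that assumes the label format shown in the examples above; `STRATEGY_PATTERN` and `parse_strategy` are hypothetical names, not part of Linguist.

```ruby
# Hypothetical pattern: a base strategy name, optionally followed by
# " (confirmed by .gitattributes)" or " (overridden by .gitattributes)".
STRATEGY_PATTERN =
  /\A(?<strategy>.+?)(?: \((?<note>confirmed|overridden) by \.gitattributes\))?\z/

# Splits a strategy label into its base strategy and optional note.
# Returns nil for an empty label.
def parse_strategy(label)
  match = STRATEGY_PATTERN.match(label)
  return nil unless match

  { strategy: match[:strategy], note: match[:note] }
end

p parse_strategy("Extension (overridden by .gitattributes)")
p parse_strategy("Filename (confirmed by .gitattributes)")
p parse_strategy("Extension")
```

For a label without a note, `:note` is `nil`, so callers can treat "no override" and "override present" uniformly.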

##### `--json`

Using the `--json` flag will give you the output for a single file in JSON format:

```console
$ github-linguist --strategies --json lib/linguist.rb
{"lib/linguist.rb":{"lines":105,"sloc":96,"type":"Text","mime_type":"application/x-ruby","language":"Ruby","large":false,"generated":false,"vendored":false}}
```

NB. The `--strategies` flag has no effect when the `--json` flag is present.
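
The JSON output is straightforward to consume from a script. As a small illustration, the sketch below parses the sample line shown above with Ruby's standard `json` library; the `output` variable simply holds that sample and stands in for the real command output.

```ruby
require "json"

# Sample output copied from the example above (stand-in for real CLI output).
output = '{"lib/linguist.rb":{"lines":105,"sloc":96,"type":"Text","mime_type":"application/x-ruby","language":"Ruby","large":false,"generated":false,"vendored":false}}'

# The top-level object maps each file path to its stats.
JSON.parse(output).each do |path, info|
  puts "#{path}: #{info["language"]} (#{info["sloc"]} sloc)"
end
```

Note that the sample contains no strategy field, which is consistent with `--strategies` being ignored in JSON mode.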

#### Docker

If you have Docker installed you can either build or use
20 changes: 19 additions & 1 deletion lib/linguist/lazy_blob.rb
@@ -73,7 +73,25 @@ def language
return @language if defined?(@language)

@language = if lang = git_attributes['linguist-language']
Language.find_by_alias(lang)
detected_language = Language.find_by_alias(lang)

# If strategies are being tracked, get the original strategy that would have been used
if detected_language && Linguist.instrumenter
# Get the original strategy by calling super (which calls Linguist.detect)
original_language = super
original_strategy_info = Linguist.instrumenter.detected_info[self.name]
original_strategy = original_strategy_info ? original_strategy_info[:strategy] : "Unknown"

if original_language == detected_language
strategy_name = "#{original_strategy} (confirmed by .gitattributes)"
else
strategy_name = "#{original_strategy} (overridden by .gitattributes)"
end

strategy = Struct.new(:name).new(strategy_name)
Linguist.instrument("linguist.detected", blob: self, strategy: strategy, language: detected_language)
end
detected_language
else
super
end
21 changes: 21 additions & 0 deletions test/test_basic_instrumenter.rb
@@ -81,4 +81,25 @@ def test_tracks_filename_strategy
assert_equal "Filename", @instrumenter.detected_info[blob.name][:strategy]
assert_equal "Dockerfile", @instrumenter.detected_info[blob.name][:language]
end

def test_tracks_override_strategy
# Simulate a blob with a gitattributes override
blob = Linguist::FileBlob.new("Gemfile", "")
# Simulate detection with gitattributes strategy showing the override
strategy = Struct.new(:name).new("Filename (overridden by .gitattributes)")
language = Struct.new(:name).new("Java")
@instrumenter.instrument("linguist.detected", blob: blob, strategy: strategy, language: language) {}
assert @instrumenter.detected_info.key?(blob.name)
assert_match(/overridden by \.gitattributes/, @instrumenter.detected_info[blob.name][:strategy])
assert_equal "Java", @instrumenter.detected_info[blob.name][:language]
end
end

def test_override_strategy_is_recorded
# This file is overridden by .gitattributes to be detectable and language Markdown
blob = sample_blob("Markdown/tender.md")
Linguist.detect(blob)
assert @instrumenter.detected_info.key?(blob.name)
assert_includes ["GitAttributes"], @instrumenter.detected_info[blob.name][:strategy]
assert_equal "Markdown", @instrumenter.detected_info[blob.name][:language]
end
139 changes: 139 additions & 0 deletions test/test_cli_integration.rb
@@ -0,0 +1,139 @@
require_relative "./helper"
require 'tmpdir'
require 'fileutils'
require 'open3'

class TestCLIIntegration < Minitest::Test
def setup
@temp_dir = Dir.mktmpdir('linguist_cli_test')
@original_dir = Dir.pwd
Dir.chdir(@temp_dir)

# Initialize a git repository
system("git init --quiet")
system("git config user.name 'Test User'")
system("git config user.email 'test@example.com'")
end

def teardown
Dir.chdir(@original_dir)
FileUtils.rm_rf(@temp_dir)
end

def test_strategies_flag_with_gitattributes_override
# Create a .gitattributes file that overrides language detection
File.write('.gitattributes', "*.special linguist-language=Ruby\n")

# Create a test file with a non-Ruby extension but Ruby content
File.write('test.special', "puts 'Hello, World!'\n")

# Stage and commit the files
system("git add .")
system("git commit -m 'Initial commit' --quiet")

# Run github-linguist with --strategies flag from the original directory but pointing to our test file
stdout, stderr, status = Open3.capture3(
"bundle", "exec", "github-linguist", File.join(@temp_dir, "test.special"), "--strategies",
chdir: @original_dir
)

assert status.success?, "CLI command failed: #{stderr}"
assert_match(/language:\s+Ruby/, stdout, "Should detect Ruby language")
assert_match(/strategy:\s+.*\(overridden by \.gitattributes\)/, stdout, "Should show override in strategy")
end

def test_strategies_flag_with_normal_detection
# Create a normal Ruby file
File.write('test.rb', "puts 'Hello, World!'\n")

# Stage and commit the file
system("git add .")
system("git commit -m 'Initial commit' --quiet")

# Run github-linguist with --strategies flag
stdout, stderr, status = Open3.capture3(
"bundle", "exec", "github-linguist", File.join(@temp_dir, "test.rb"), "--strategies",
chdir: @original_dir
)

assert status.success?, "CLI command failed: #{stderr}"
assert_match(/language:\s+Ruby/, stdout, "Should detect Ruby language")
assert_match(/strategy:\s+Extension/, stdout, "Should show Extension strategy")
end

def test_breakdown_with_gitattributes_strategies
# Create multiple files with different detection methods
File.write('.gitattributes', "*.special linguist-language=JavaScript\n")
File.write('override.special', "console.log('overridden');\n")
File.write('normal.js', "console.log('normal');\n")
File.write('Dockerfile', "FROM ubuntu\n")

# Stage and commit the files
system("git add .")
system("git commit -m 'Initial commit' --quiet")

# Run github-linguist with --breakdown --strategies flags on the test repository
stdout, stderr, status = Open3.capture3(
"bundle", "exec", "github-linguist", @temp_dir, "--breakdown", "--strategies",
chdir: @original_dir
)

assert status.success?, "CLI command failed: #{stderr}"

# Check that GitAttributes strategy appears for the overridden file
assert_match(/override\.special \[.* \(overridden by \.gitattributes\)\]/, stdout, "Should show override for overridden file")

# Check that normal detection strategies appear for other files
assert_match(/normal\.js \[Extension\]/, stdout, "Should show Extension strategy for .js file")
assert_match(/Dockerfile \[Filename\]/, stdout, "Should show Filename strategy for Dockerfile")
end

def test_json_output_preserves_functionality
# Create a simple test file
File.write('test.rb', "puts 'Hello, World!'\n")

# Stage and commit the file
system("git add .")
system("git commit -m 'Initial commit' --quiet")

# Run github-linguist with --json flag
stdout, stderr, status = Open3.capture3(
"bundle", "exec", "github-linguist", File.join(@temp_dir, "test.rb"), "--json",
chdir: @original_dir
)

assert status.success?, "CLI command failed: #{stderr}"

# Parse JSON output
require 'json'
result = JSON.parse(stdout)

test_file_key = File.join(@temp_dir, "test.rb")
assert_equal "Ruby", result[test_file_key]["language"], "JSON output should contain correct language"
assert_equal "Text", result[test_file_key]["type"], "JSON output should contain correct type"
end

def test_repository_scan_with_gitattributes
# Create a more complex repository structure
FileUtils.mkdir_p('src')
File.write('.gitattributes', "*.config linguist-language=JavaScript\n")
File.write('src/app.rb', "class App\nend\n")
File.write('config.config', "var x = 1;\n")

# Stage and commit the files
system("git add .")
system("git commit -m 'Initial commit' --quiet")

# Run github-linguist on the test repository
stdout, stderr, status = Open3.capture3(
"bundle", "exec", "github-linguist", @temp_dir, "--breakdown", "--strategies",
chdir: @original_dir
)

assert status.success?, "CLI command failed: #{stderr}"

# Verify that both normal and override detection work in repository scan
assert_match(/src\/app\.rb \[Extension\]/, stdout, "Should show Extension strategy for Ruby file")
assert_match(/config\.config \[.* \(overridden by \.gitattributes\)\]/, stdout, "Should show override for overridden file")
end
end