diff --git a/README.md b/README.md
index 57acfade9a..dc8ac5dc58 100644
--- a/README.md
+++ b/README.md
@@ -70,9 +70,14 @@ project.languages #=> { "Ruby" => 119387 }
 
 ### Command line usage
 
+The `github-linguist` executable operates in two distinct modes:
+
+1. **[Git Repository mode](#git-repository)** - Analyzes an entire Git repository (when given a directory path or no path)
+2. **[Single file mode](#single-file)** - Analyzes a specific file (when given a file path)
+
 #### Git Repository
 
-A repository's languages stats can also be assessed from the command line using the `github-linguist` executable.
+A repository's languages stats can be assessed from the command line using the `github-linguist` executable.
 Without any options, `github-linguist` will output the language breakdown by percentage and file size.
 
 ```bash
@@ -151,6 +156,51 @@ lib/linguist.rb
 …
 ```
 
+##### `--strategies`
+
+The `--strategies` or `-s` flag will show the language detection strategy used for each file. This is useful for understanding how Linguist determined the language of specific files. Note that unless the `--json` flag is specified, this flag will set the `--breakdown` flag implicitly.
+
+You can try running `github-linguist` on the root directory in this repository itself with the strategies flag:
+
+```console
+$ github-linguist --breakdown --strategies
+66.84% 264519 Ruby
+24.68% 97685 C
+6.57% 25999 Go
+1.29% 5098 Lex
+0.32% 1257 Shell
+0.31% 1212 Dockerfile
+
+Ruby:
+  Gemfile [Filename]
+  Rakefile [Filename]
+  bin/git-linguist [Extension]
+  bin/github-linguist [Extension]
+  lib/linguist.rb [Extension]
+  …
+```
+
+If a file's language is affected by `.gitattributes`, the strategy will show the original detection method along with a note indicating whether the gitattributes setting changed the result or confirmed it.
+
+For instance, if you had the following .gitattributes overrides in your repo:
+
+```gitattributes
+
+*.ts linguist-language=JavaScript
+*.js linguist-language=JavaScript
+
+```
+
+the output of Linguist would be something like this:
+
+```console
+100.00% 217 JavaScript
+
+JavaScript:
+  demo.ts [Heuristics (overridden by .gitattributes)]
+  demo.js [Extension (confirmed by .gitattributes)]
+```
+
 ##### `--json`
 
 The `--json` or `-j` flag output the data into JSON format.
@@ -168,6 +218,8 @@ $ github-linguist --breakdown --json
 
 ```
 
+NB. The `--strategies` flag has no effect, when the `--json` flag is present.
+
 #### Single file
 
 Alternatively you can find stats for a single file using the `github-linguist` executable.
@@ -182,6 +234,59 @@ grammars.yml: 884 lines (884 sloc)
   language: YAML
 ```
 
+#### Additional options
+
+##### `--breakdown`
+
+This flag has no effect in *Single file* mode.
+
+##### `--strategies`
+
+When using the `--strategies` or `-s` flag with a single file, you can see which detection method was used:
+
+```console
+$ github-linguist --strategies lib/linguist.rb
+lib/linguist.rb: 105 lines (96 sloc)
+  type: Text
+  mime type: application/x-ruby
+  language: Ruby
+  strategy: Extension
+```
+
+If a file's language is affected by `.gitattributes`, the strategy will show whether the gitattributes setting changed the result or confirmed it:
+
+In this fictitious example, it says "confirmed by .gitattributes" since the detection process (using the Filename strategy) would have given the same output as the override:
+```console
+.devcontainer/devcontainer.json: 27 lines (27 sloc)
+  type: Text
+  mime type: application/json
+  language: JSON with Comments
+  strategy: Filename (confirmed by .gitattributes)
+```
+
+In this other fictitious example, it says "overridden by .gitattributes" since the gitattributes setting changes the detected language to something different:
+
+```console
+test.rb: 13 lines (11 sloc)
+  type: Text
+  mime type: application/x-ruby
+  language: Java
+  strategy: Extension (overridden by .gitattributes)
+```
+
+Here, the `.rb` file would normally be detected as Ruby by the Extension strategy, but `.gitattributes` overrides it to be detected as Java instead.
+
+##### `--json`
+
+Using the `--json` flag will give you the output for a single file in JSON format:
+
+```console
+$ github-linguist --strategies --json lib/linguist.rb
+{"lib/linguist.rb":{"lines":105,"sloc":96,"type":"Text","mime_type":"application/x-ruby","language":"Ruby","large":false,"generated":false,"vendored":false}}
+```
+
+NB. The `--strategies` has no effect, when the `--json` flag is present.
+
 #### Docker
 
 If you have Docker installed you can either build or use
diff --git a/lib/linguist/lazy_blob.rb b/lib/linguist/lazy_blob.rb
index 3d60b60ea6..69c1126742 100644
--- a/lib/linguist/lazy_blob.rb
+++ b/lib/linguist/lazy_blob.rb
@@ -73,7 +73,25 @@ def language
       return @language if defined?(@language)
 
       @language = if lang = git_attributes['linguist-language']
-        Language.find_by_alias(lang)
+        detected_language = Language.find_by_alias(lang)
+
+        # If strategies are being tracked, get the original strategy that would have been used
+        if detected_language && Linguist.instrumenter
+          # Get the original strategy by calling super (which calls Linguist.detect)
+          original_language = super
+          original_strategy_info = Linguist.instrumenter.detected_info[self.name]
+          original_strategy = original_strategy_info ? original_strategy_info[:strategy] : "Unknown"
+
+          if original_language == detected_language
+            strategy_name = "#{original_strategy} (confirmed by .gitattributes)"
+          else
+            strategy_name = "#{original_strategy} (overridden by .gitattributes)"
+          end
+
+          strategy = Struct.new(:name).new(strategy_name)
+          Linguist.instrument("linguist.detected", blob: self, strategy: strategy, language: detected_language)
+        end
+        detected_language
       else
         super
       end
diff --git a/test/test_basic_instrumenter.rb b/test/test_basic_instrumenter.rb
index f85096bbdc..f80f81b3fe 100644
--- a/test/test_basic_instrumenter.rb
+++ b/test/test_basic_instrumenter.rb
@@ -81,4 +81,25 @@ def test_tracks_filename_strategy
     assert_equal "Filename", @instrumenter.detected_info[blob.name][:strategy]
     assert_equal "Dockerfile", @instrumenter.detected_info[blob.name][:language]
   end
+
+  def test_tracks_override_strategy
+    # Simulate a blob with a gitattributes override
+    blob = Linguist::FileBlob.new("Gemfile", "")
+    # Simulate detection with gitattributes strategy showing the override
+    strategy = Struct.new(:name).new("Filename (overridden by .gitattributes)")
+    language = Struct.new(:name).new("Java")
+    @instrumenter.instrument("linguist.detected", blob: blob, strategy: strategy, language: language) {}
+    assert @instrumenter.detected_info.key?(blob.name)
+    assert_match(/overridden by \.gitattributes/, @instrumenter.detected_info[blob.name][:strategy])
+    assert_equal "Java", @instrumenter.detected_info[blob.name][:language]
+  end
+end
+
+def test_override_strategy_is_recorded
+  # This file is overridden by .gitattributes to be detectable and language Markdown
+  blob = sample_blob("Markdown/tender.md")
+  Linguist.detect(blob)
+  assert @instrumenter.detected_info.key?(blob.name)
+  assert_includes ["GitAttributes"], @instrumenter.detected_info[blob.name][:strategy]
+  assert_equal "Markdown", @instrumenter.detected_info[blob.name][:language]
 end
diff --git a/test/test_cli_integration.rb b/test/test_cli_integration.rb
new file mode 100644
index 0000000000..57f5fc5a20
--- /dev/null
+++ b/test/test_cli_integration.rb
@@ -0,0 +1,139 @@
+require_relative "./helper"
+require 'tmpdir'
+require 'fileutils'
+require 'open3'
+
+class TestCLIIntegration < Minitest::Test
+  def setup
+    @temp_dir = Dir.mktmpdir('linguist_cli_test')
+    @original_dir = Dir.pwd
+    Dir.chdir(@temp_dir)
+
+    # Initialize a git repository
+    system("git init --quiet")
+    system("git config user.name 'Test User'")
+    system("git config user.email 'test@example.com'")
+  end
+
+  def teardown
+    Dir.chdir(@original_dir)
+    FileUtils.rm_rf(@temp_dir)
+  end
+
+  def test_strategies_flag_with_gitattributes_override
+    # Create a .gitattributes file that overrides language detection
+    File.write('.gitattributes', "*.special linguist-language=Ruby\n")
+
+    # Create a test file with a non-Ruby extension but Ruby content
+    File.write('test.special', "puts 'Hello, World!'\n")
+
+    # Stage and commit the files
+    system("git add .")
+    system("git commit -m 'Initial commit' --quiet")
+
+    # Run github-linguist with --strategies flag from the original directory but pointing to our test file
+    stdout, stderr, status = Open3.capture3(
+      "bundle", "exec", "github-linguist", File.join(@temp_dir, "test.special"), "--strategies",
+      chdir: @original_dir
+    )
+
+    assert status.success?, "CLI command failed: #{stderr}"
+    assert_match(/language:\s+Ruby/, stdout, "Should detect Ruby language")
+    assert_match(/strategy:\s+.*\(overridden by \.gitattributes\)/, stdout, "Should show override in strategy")
+  end
+
+  def test_strategies_flag_with_normal_detection
+    # Create a normal Ruby file
+    File.write('test.rb', "puts 'Hello, World!'\n")
+
+    # Stage and commit the file
+    system("git add .")
+    system("git commit -m 'Initial commit' --quiet")
+
+    # Run github-linguist with --strategies flag
+    stdout, stderr, status = Open3.capture3(
+      "bundle", "exec", "github-linguist", File.join(@temp_dir, "test.rb"), "--strategies",
+      chdir: @original_dir
+    )
+
+    assert status.success?, "CLI command failed: #{stderr}"
+    assert_match(/language:\s+Ruby/, stdout, "Should detect Ruby language")
+    assert_match(/strategy:\s+Extension/, stdout, "Should show Extension strategy")
+  end
+
+  def test_breakdown_with_gitattributes_strategies
+    # Create multiple files with different detection methods
+    File.write('.gitattributes', "*.special linguist-language=JavaScript\n")
+    File.write('override.special', "console.log('overridden');\n")
+    File.write('normal.js', "console.log('normal');\n")
+    File.write('Dockerfile', "FROM ubuntu\n")
+
+    # Stage and commit the files
+    system("git add .")
+    system("git commit -m 'Initial commit' --quiet")
+
+    # Run github-linguist with --breakdown --strategies flags on the test repository
+    stdout, stderr, status = Open3.capture3(
+      "bundle", "exec", "github-linguist", @temp_dir, "--breakdown", "--strategies",
+      chdir: @original_dir
+    )
+
+    assert status.success?, "CLI command failed: #{stderr}"
+
+    # Check that GitAttributes strategy appears for the overridden file
+    assert_match(/override\.special \[.* \(overridden by \.gitattributes\)\]/, stdout, "Should show override for overridden file")
+
+    # Check that normal detection strategies appear for other files
+    assert_match(/normal\.js \[Extension\]/, stdout, "Should show Extension strategy for .js file")
+    assert_match(/Dockerfile \[Filename\]/, stdout, "Should show Filename strategy for Dockerfile")
+  end
+
+  def test_json_output_preserves_functionality
+    # Create a simple test file
+    File.write('test.rb', "puts 'Hello, World!'\n")
+
+    # Stage and commit the file
+    system("git add .")
+    system("git commit -m 'Initial commit' --quiet")
+
+    # Run github-linguist with --json flag
+    stdout, stderr, status = Open3.capture3(
+      "bundle", "exec", "github-linguist", File.join(@temp_dir, "test.rb"), "--json",
+      chdir: @original_dir
+    )
+
+    assert status.success?, "CLI command failed: #{stderr}"
+
+    # Parse JSON output
+    require 'json'
+    result = JSON.parse(stdout)
+
+    test_file_key = File.join(@temp_dir, "test.rb")
+    assert_equal "Ruby", result[test_file_key]["language"], "JSON output should contain correct language"
+    assert_equal "Text", result[test_file_key]["type"], "JSON output should contain correct type"
+  end
+
+  def test_repository_scan_with_gitattributes
+    # Create a more complex repository structure
+    FileUtils.mkdir_p('src')
+    File.write('.gitattributes', "*.config linguist-language=JavaScript\n")
+    File.write('src/app.rb', "class App\nend\n")
+    File.write('config.config', "var x = 1;\n")
+
+    # Stage and commit the files
+    system("git add .")
+    system("git commit -m 'Initial commit' --quiet")
+
+    # Run github-linguist on the test repository
+    stdout, stderr, status = Open3.capture3(
+      "bundle", "exec", "github-linguist", @temp_dir, "--breakdown", "--strategies",
+      chdir: @original_dir
+    )
+
+    assert status.success?, "CLI command failed: #{stderr}"
+
+    # Verify that both normal and override detection work in repository scan
+    assert_match(/src\/app\.rb \[Extension\]/, stdout, "Should show Extension strategy for Ruby file")
+    assert_match(/config\.config \[.* \(overridden by \.gitattributes\)\]/, stdout, "Should show override for overridden file")
+  end
+end
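
Reviewer note: the labelling rule introduced in `lib/linguist/lazy_blob.rb` can be exercised in isolation. The sketch below is a standalone illustration, not code from this patch. `strategy_label` is a hypothetical helper name, and plain strings stand in for Linguist's language and strategy objects. It assumes the same fallback to "Unknown" when no original strategy was recorded.

```ruby
# Hypothetical helper for illustration only; it is not part of Linguist.
# It mirrors the rule in the lazy_blob.rb change: keep the strategy that
# normal detection would have used, then note whether .gitattributes
# confirmed or overrode the detected language.
def strategy_label(original_strategy, original_language, override_language)
  # Fall back to "Unknown" when no strategy was recorded for the blob.
  base = original_strategy || "Unknown"
  verdict = original_language == override_language ? "confirmed" : "overridden"
  "#{base} (#{verdict} by .gitattributes)"
end

# Plain strings stand in for language objects (an assumption of this sketch).
puts strategy_label("Extension", "Ruby", "Java")
puts strategy_label("Filename", "JSON with Comments", "JSON with Comments")
```

The two calls correspond to the fictitious `test.rb` and `.devcontainer/devcontainer.json` cases documented in the README hunk.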