Skip to content

Check Similarity

AI Assistant Guide for detecting duplicate TypeScript/JavaScript code using AST comparison.

Purpose

Detects duplicate TypeScript/JavaScript code using AST comparison for refactoring.

Markdown

md
# Check Similarity

AI Assistant Guide for detecting duplicate TypeScript/JavaScript code using AST comparison.

## Purpose

Detects duplicate TypeScript/JavaScript code using AST comparison for refactoring.

## AI Prompt

### 日本語
```
`similarity-ts .` でコードの意味的な類似が得られます。あなたはこれを実行し、ソースコードの重複を検知して、リファクタリング計画を立てます。細かいオプションは similarity-ts -h で確認してください。
```

### English
```
Run `similarity-ts .` to detect semantic code similarities. Execute this command, analyze the duplicate code patterns, and create a refactoring plan. Check `similarity-ts -h` for detailed options.
```

## Installation

```bash
cargo install similarity-ts
# check options
similarity-ts --help
```

## Key Options

- `--threshold <0-1>`: Similarity threshold (default: 0.8)
- `--min-tokens <n>`: Skip functions with <n AST nodes (recommended: 20-30)
- `--print`: Show actual code snippets
- `--experimental-types`: Enable type similarity detection (interfaces, type aliases)
- `--filter-function <name>`: Filter results by function name (substring match)
- `--filter-function-body <text>`: Filter results by function body content (substring match)

## AI Refactoring Workflow

### 1. Broad Scan

Find all duplicates in codebase:

```bash
similarity-ts src/ --threshold 0.85 --min-tokens 25
```

### 2. Focused Analysis

Examine specific file pairs:

```bash
similarity-ts file1.ts file2.ts --threshold 0.8 --min-tokens 20 --print
```

### 3. Threshold Tuning

If no results, progressively lower:

```bash
similarity-ts file1.ts file2.ts --threshold 0.75 --min-tokens 20
similarity-ts file1.ts file2.ts --threshold 0.7 --min-tokens 20
```

## Output Format

```
Function: functionName (file.ts:startLine-endLine)
Similar to: otherFunction (other.ts:startLine-endLine)
Similarity: 85%
```

## Effective Thresholds

- `0.95+`: Nearly identical (variable renames only)
- `0.85-0.95`: Same algorithm, minor differences
- `0.75-0.85`: Similar structure, different details
- `0.7-0.75`: Related logic, worth investigating

## Refactoring Strategy

1. **Start with high threshold** (0.9) to find obvious duplicates
2. **Compare specific pairs** when similarity found
3. **Use --print** to see actual code differences
4. **Extract common logic** into shared functions/modules
5. **Re-run after refactoring** to verify no new duplicates

## Common Patterns to Refactor

- **Data processing loops** with different field names
- **API handlers** with similar request/response logic
- **Validation functions** with different rules
- **State management** with repeated patterns

## Best Practices

- Use `--min-tokens` for accurate complexity filtering (20-30 tokens)
- Focus on files with 80%+ similarity first
- Check if similar functions are in same module (easier to refactor)
- Consider function size - larger duplicates have more impact
- Look for patterns across multiple files, not just pairs