LicenseDog

Find licenses for third‑party packages and write the results to OUT.csv (or JSONL). Designed for fast, reliable compliance inventories with robust caching and an optional Textual TUI.
What is this and why was it created?
When you ship software, you need an auditable list of licenses for your dependencies. licensedog reads a simple IN.csv of package identifiers and resolves each package's license by probing canonical repository license files and trusted metadata. It outputs a machine‑readable table you can import into spreadsheets or pipelines, and it caches aggressively so repeated runs are fast and deterministic.
Supported package ecosystems:
- 🐹 Go modules (20+ domain patterns: github.com, gopkg.in, google.golang.org, filippo.io, dario.cat, sigs.k8s.io, and more)
- 📦 npm packages (registry API, tarball inspection, Code tab web scraping)
- 🐍 Python packages (PyPI JSON API, web scraping fallback)
- ⚙️ GitHub Actions (actions/checkout@v4, etc.)
Input Format Example
Here's what a typical `IN.csv` looks like:
github.com/lufia/plan9stats@v0.0.0-20211012122336-39d0f177ccd0
react-transition-group@4.4.2
pyyaml@6.0.2
Such files can be generated by CycloneDX or similar SBOM tools. Important: There is no guarantee that package identifiers taken from an SBOM actually exist, or that their license information reflects the current state of the ever-changing Internet.
Resolution Priority Order
licensedog uses a very simple strategy: the more version-specific source wins. All else being equal, a source that can be linked to wins. As a last resort, it downloads the source or links to a main/master/canary branch in Git, if nothing else is available.
npm packages:
- 🎯 Version-specific Git refs (tags/branches like `v2.2.1`, `2.2.1`)
- 📋 npm registry metadata (API `license` field)
- 🌐 npm Code tab scraping (web UI, unless disabled with `--noweb`)
- 📦 npm tarball inspection (download & extract package)
- 🔄 PyPI fallback (for misclassified packages)
- 🌿 Generic Git branches (master/main - last resort)
Go modules:
- 🎯 Version-specific Git refs (tags/branches)
- 📋 pkg.go.dev metadata/scraping (web check)
- 📦 Go module archive (download from proxy.golang.org)
- 🌿 Generic Git branches (master/main - last resort)
Python packages:
- 🎯 Version-specific Git refs (tags/branches)
- 📋 PyPI JSON API (metadata `license` field and classifiers)
- 🎯 GitHub version-specific refs (extracted from PyPI metadata)
- 🌐 PyPI web scraping (page sidebar, unless disabled with `--noweb`)
- 📦 PyPI sdist inspection (download & extract source distribution)
- 🌿 Generic Git branches (master/main - last resort)
Key principle: Version-specific sources (git tags or package archives) are always preferred over generic `main`/`master` branches, which may contain different code than the released version. Web-based checks (APIs and scraping) are tried before downloading full archives to minimize bandwidth usage.
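The priority order above amounts to walking a fixed list of sources and stopping at the first one that yields a license. A minimal sketch of that control flow follows. The `Resolution` type, the resolver functions, and the package name are hypothetical stand-ins for illustration, not licensedog's actual internals.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Resolution:
    spdx_id: str
    source: str  # name of the strategy that produced the result


# A resolver takes a package id and returns an SPDX id, or None if it found nothing.
Resolver = Callable[[str], Optional[str]]


def resolve_in_order(
    package: str, strategies: list[tuple[str, Resolver]]
) -> Optional[Resolution]:
    """Try each strategy in priority order; the first hit wins."""
    for name, resolver in strategies:
        spdx_id = resolver(package)
        if spdx_id is not None:
            return Resolution(spdx_id=spdx_id, source=name)
    return None


# Hypothetical stand-ins for the real lookups (no network access here).
def from_version_tag(package: str) -> Optional[str]:
    return None  # pretend the tagged ref had no LICENSE file


def from_registry_metadata(package: str) -> Optional[str]:
    return "MIT" if package.startswith("example-pkg@") else None


def from_default_branch(package: str) -> Optional[str]:
    return "Apache-2.0"  # last resort: main/master may differ from the release


NPM_STRATEGIES = [
    ("version-tag", from_version_tag),
    ("registry-metadata", from_registry_metadata),
    ("default-branch", from_default_branch),
]

result = resolve_in_order("example-pkg@1.0.0", NPM_STRATEGIES)
print(result)
```

Because the list is ordered, the generic-branch lookup is only reached when every version-specific source has come back empty.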
Installation
🚀 Standard Installation (Recommended)
For development environments with full features including web scraping support.
What you get:
- ✅ Full support for all package types (Go, npm, Python, GitHub Actions)
- ✅ Web scraping for PyPI pages and npm Code tab (resolves hard-to-find licenses)
- ✅ Fast HTTP with retries and connection pooling (requests)
- ✅ Fast SPDX fuzzy matching (rapidfuzz, 10-100x faster)
- ✅ Rich terminal UI with progress bars (textual)
🔒 High-Security Installation (Zero Dependencies)
For restricted environments where browser automation is prohibited.
What you get:
- ✅ Full support for Go modules and GitHub Actions
- ✅ Basic npm support (registry API only, no Code tab scraping)
- ✅ Basic Python support (PyPI JSON API only, no web scraping)
- ❌ No web scraping (some packages may not resolve)
- ⚠️ Slower HTTP (urllib instead of requests with retries)
- ⚠️ Slower SPDX matching (difflib instead of rapidfuzz, 10-100x slower)
- ⚠️ Plain text output only (no TUI)
Key point: The `--noweb` flag is required to ensure no web scraping attempts are made.
🎛️ Installation Options & Feature Groups
All dependencies are optional with graceful fallbacks. You can choose which features to install:
Using uv (recommended)
Using pip (traditional)
Feature Groups Explained
| Group | Dependencies | Purpose | When to Use |
|---|---|---|---|
| `http` | requests≥2.31.0 | Fast HTTP with retries, session reuse | Always recommended |
| `spdx` | rapidfuzz≥3.5.0 | Fast fuzzy SPDX matching (10-100x faster) | Always recommended |
| `tui` | textual≥0.47.0 | Rich terminal UI with progress bars | Interactive use |
| `web` | playwright≥1.40.0 | Web scraping for PyPI/npm Code tab | When hard-to-resolve packages present |
| | All of the above | Full features | Development, best results |
Fallback behavior:
- No `http`: Uses urllib (slower, no retry logic)
- No `spdx`: Uses difflib (10-100x slower fuzzy matching)
- No `tui`: Plain text progress logs only
- No `web`: API-only mode (some packages may not resolve)
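A graceful fallback of this kind is usually written as a guarded import. The sketch below shows the general pattern for the fuzzy-matching case. The `similarity` helper is a hypothetical name, and the code illustrates the technique rather than licensedog's own implementation.

```python
import difflib

try:
    # Optional fast path; rapidfuzz is used only when it is installed.
    from rapidfuzz import fuzz

    def similarity(a: str, b: str) -> float:
        return float(fuzz.ratio(a, b))  # already on a 0-100 scale

except ImportError:

    def similarity(a: str, b: str) -> float:
        # Standard-library fallback, rescaled from 0-1 to 0-100.
        return difflib.SequenceMatcher(None, a, b).ratio() * 100.0


print(similarity("MIT License", "MIT License"))
```

The two libraries compute their ratios differently, so scores for non-identical strings may not match exactly between the fast and fallback paths.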
General Usage
Quick Start
Simplest way (3 steps):
- Put your `IN.csv` file in the licensedog directory
- Run `./licensedog` (Linux/macOS) or `.\licensedog.ps1` (Windows PowerShell) or `licensedog.bat` (Windows CMD)
- Your results are in OUT.csv
By default, licensedog displays a pseudographical text UI (TUI) in the console with progress bars and real-time updates.
With custom paths:
Wrapper script modes:
UI Mode Control (mutually exclusive):
Direct Python invocation:
IN.csv Format
One package identifier per line (no header). Duplicates are automatically de‑duplicated.
Supported version separators:
- Standard `@` separator: `webpack@5.95.0`
- CycloneDX-style `:` separator: `webpack:5.95.0` (auto-normalized to `@`)
Examples:
github.com/lufia/plan9stats@v0.0.0-20211012122336-39d0f177ccd0
google.golang.org/appengine@v1.6.8
gopkg.in/yaml.v3@v3.0.1
sigs.k8s.io/yaml@v1.4.0
filippo.io/edwards25519@v1.1.0
react-transition-group@4.4.2
@next/swc-linux-x64-gnu@14.2.24
pyyaml@6.0.2
pathspec@0.12.1
actions/checkout@v4
actions/setup-go@v5
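The input rules above (one identifier per line, `:` normalized to `@`, duplicates dropped) can be sketched as follows. The function names are hypothetical, and the handling of scoped npm names is an assumption about how the normalization avoids mistaking a leading `@` for a version separator.

```python
def normalize_identifier(raw: str) -> str:
    """Trim whitespace and rewrite a CycloneDX-style name:version to name@version."""
    ident = raw.strip()
    # Skip the first character so the leading "@" of a scoped npm name
    # (e.g. "@next/swc-linux-x64-gnu") is not treated as a version separator.
    if "@" not in ident[1:] and ":" in ident:
        name, _, version = ident.rpartition(":")
        ident = f"{name}@{version}"
    return ident


def read_identifiers(lines):
    """Return normalized identifiers in first-seen order, without blanks or duplicates."""
    seen = {}
    for line in lines:
        ident = normalize_identifier(line)
        if ident:
            seen.setdefault(ident, None)  # dicts preserve insertion order
    return list(seen)


sample = [
    "webpack:5.95.0",
    "webpack@5.95.0",
    "@next/swc-linux-x64-gnu@14.2.24",
    "gopkg.in/yaml.v3@v3.0.1",
    "",
]
print(read_identifiers(sample))
# → ['webpack@5.95.0', '@next/swc-linux-x64-gnu@14.2.24', 'gopkg.in/yaml.v3@v3.0.1']
```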
OUT.csv Format
Columns written by default:
| Column | Description |
|---|---|
| | Input package id as emitted |
| | Detected SPDX id, empty if unknown |
| | Canonical URL where the license text was found |
| | |
| | Origin of the result |
| | Repository/ref hint used (when applicable) |
| | README URL that mentioned license (when applicable) |
| | Human‑readable steps taken to resolve |
| | Reserved; currently empty |
| `is_copyleft` | Boolean heuristic flag |
| `requires_notice` | Boolean heuristic flag |
| `clash_exists` | `TRUE`/`FALSE` Excel-parseable flag for namespace clashes |
| `clashes` | Namespace clash detection details (npm vs PyPI) |
To emit line‑delimited JSON instead:
How License Resolution Works
licensedog follows a conservative, explainable pipeline for each package:
1. Package Classification
Automatic ecosystem detection using pattern matching:
- Go modules: 20+ domain patterns
  - `github.com/*`, `gitlab.com/*`, `bitbucket.org/*`
  - `google.golang.org/*`, `golang.org/*`
  - `gopkg.in/*`, `go.uber.org/*`, `go.etcd.io/*`
  - `filippo.io/*`, `dario.cat/*`, `sigs.k8s.io/*`
  - And more...
- GitHub Actions: Pattern `org/repo@version` (no dots in org/repo)
  - Examples: `actions/checkout@v4`, `actions/setup-go@v5`
- Python packages: Conservative detection for packages with Python keywords
  - Examples: `pyyaml@6.0.2`, `pathspec@0.12.1`
- npm packages: Default for packages not matching above patterns
  - Examples: `react@18.2.0`, `@babel/core@7.22.0`
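The classification rules above can be approximated with prefix and regex checks applied in order. This is a minimal sketch: the prefix list is only the subset named in this document, and `KNOWN_PYTHON_NAMES` together with the `"python" in name` test is a hypothetical stand-in for the real "Python keywords" heuristic, whose details are not given here.

```python
import re

# Subset of the Go domain patterns listed above.
GO_PREFIXES = (
    "github.com/", "gitlab.com/", "bitbucket.org/",
    "google.golang.org/", "golang.org/", "gopkg.in/",
    "go.uber.org/", "go.etcd.io/", "filippo.io/",
    "dario.cat/", "sigs.k8s.io/",
)

# org/repo@version with no dots in org or repo.
ACTION_RE = re.compile(r"^[A-Za-z0-9_-]+/[A-Za-z0-9_-]+@[^@]+$")

# Hypothetical allow-list standing in for the Python keyword heuristic.
KNOWN_PYTHON_NAMES = {"pyyaml", "pathspec"}


def classify(identifier: str) -> str:
    """Return 'go', 'github-action', 'python', or 'npm' (the default)."""
    if identifier.startswith(GO_PREFIXES):
        return "go"
    if ACTION_RE.match(identifier):
        return "github-action"
    name = identifier.rsplit("@", 1)[0].lower()
    if name in KNOWN_PYTHON_NAMES or "python" in name:
        return "python"
    return "npm"


for ident in ("gopkg.in/yaml.v3@v3.0.1", "actions/checkout@v4",
              "pyyaml@6.0.2", "@babel/core@7.22.0"):
    print(ident, "->", classify(ident))
```

Checking Go prefixes first matters: a `github.com/...` path contains dots and slashes, so it must be claimed before the GitHub Actions pattern is tried.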
2. Repository Inference
- Go modules: Map to GitHub org/repo using hardcoded rules or go-get meta tags
  - Special cases: `gopkg.in/yaml.v2` → `go-yaml/yaml`, `google.golang.org/appengine` → `golang/appengine`
- npm packages: Query npm registry metadata for repository URL
- Python packages: Query PyPI JSON API for package metadata
- GitHub Actions: Direct mapping to github.com/org/repo
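For the Go and GitHub Actions cases, repository inference can be sketched as a lookup table plus a couple of path rules. `SPECIAL_CASES` holds only the two mappings named above, and `infer_github_repo` is a hypothetical helper. Anything it cannot map would, per the list above, need go-get meta tags or registry metadata, which this sketch does not attempt.

```python
from typing import Optional

# The two special cases named above; a real table would be longer.
SPECIAL_CASES = {
    "gopkg.in/yaml.v2": "go-yaml/yaml",
    "google.golang.org/appengine": "golang/appengine",
}


def infer_github_repo(identifier: str) -> Optional[str]:
    """Return "org/repo" for a Go module or GitHub Action id, or None if unknown."""
    path = identifier.rsplit("@", 1)[0]  # strip the version
    if path in SPECIAL_CASES:
        return SPECIAL_CASES[path]
    parts = path.split("/")
    if parts[0] == "github.com" and len(parts) >= 3:
        return "/".join(parts[1:3])  # drop any sub-package path
    if len(parts) == 2 and "." not in path:
        return path  # GitHub Action: org/repo maps directly
    return None  # would need go-get meta tags or registry metadata


print(infer_github_repo("gopkg.in/yaml.v2@v2.4.0"))
print(infer_github_repo("actions/checkout@v4"))
```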
3. Direct License Probing
Try LICENSE files at version-specific refs (tags/branches), then fall back to main/master:
- Files checked: `LICENSE`, `LICENSE.md`, `license`, `license.md`, `COPYING`, `COPYING.md`, etc.
- Uses `raw.githubusercontent.com` for fast direct access
- Heuristics detect common licenses (MIT, Apache-2.0, BSD, ISC, MPL)
- Fuzzy SPDX matching against SPDX corpus (configurable threshold via `--min-spdx-score`)
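The probing order above can be pictured as a list of candidate URLs, version-specific refs first and generic branches last. The sketch below only builds that list and performs no requests. The helper names are hypothetical, and trying both the `v`-prefixed and bare tag forms is an assumption based on the `v2.2.1` / `2.2.1` example earlier in this document.

```python
LICENSE_FILENAMES = [
    "LICENSE", "LICENSE.md", "license", "license.md", "COPYING", "COPYING.md",
]


def candidate_refs(version: str) -> list[str]:
    """Version-specific refs first (with and without a leading "v"), generic branches last."""
    bare = version.removeprefix("v")
    refs = [version]
    for alt in (f"v{bare}", bare):
        if alt not in refs:
            refs.append(alt)
    return refs + ["main", "master"]


def candidate_urls(repo: str, version: str) -> list[str]:
    """Build raw.githubusercontent.com URLs for every ref and license file name."""
    return [
        f"https://raw.githubusercontent.com/{repo}/{ref}/{name}"
        for ref in candidate_refs(version)
        for name in LICENSE_FILENAMES
    ]


urls = candidate_urls("go-yaml/yaml", "v3.0.1")
print(urls[0])
print(len(urls))
```

A caller would fetch these in order and stop at the first successful response, so the `main`/`master` entries are only reached when no tagged ref has a license file.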
4. Policy Flags & Clash Detection
- `is_copyleft`: Inferred from SPDX patterns (`GPL-`, `AGPL-`, `LGPL-`, `MPL`, `CDDL`)
- `requires_notice`: Set for licenses requiring attribution (MIT, BSD, Apache, MPL, GPL family, ISC)
- `clash_exists`: Boolean flag (`TRUE`/`FALSE`) indicating namespace conflicts - Excel will parse this as a boolean type
- `clashes`: Detailed description when a package with the same name@version exists on both npm and PyPI (ambiguous packages default to npm)
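The two license flags are described as prefix heuristics over the SPDX id, which can be sketched in a few lines. Treating AGPL and LGPL as part of the "GPL family" for `requires_notice` is an assumption, and `as_csv_bool` is a hypothetical helper showing the `TRUE`/`FALSE` spelling that spreadsheets parse as booleans.

```python
COPYLEFT_PREFIXES = ("GPL-", "AGPL-", "LGPL-", "MPL", "CDDL")
NOTICE_PREFIXES = ("MIT", "BSD", "Apache", "MPL", "GPL-", "AGPL-", "LGPL-", "ISC")


def is_copyleft(spdx_id: str) -> bool:
    """Heuristic: the SPDX id starts with a known copyleft family prefix."""
    return spdx_id.startswith(COPYLEFT_PREFIXES)


def requires_notice(spdx_id: str) -> bool:
    """Heuristic: the SPDX id belongs to a family that requires attribution."""
    return spdx_id.startswith(NOTICE_PREFIXES)


def as_csv_bool(value: bool) -> str:
    """Spell booleans the way spreadsheets recognize them."""
    return "TRUE" if value else "FALSE"


for spdx in ("MIT", "GPL-3.0-only", "MPL-2.0", ""):
    print(repr(spdx), as_csv_bool(is_copyleft(spdx)), as_csv_bool(requires_notice(spdx)))
```

Prefix matching is deliberately coarse, which is why these are flags for review rather than legal conclusions. An empty SPDX id (license unknown) yields `FALSE` for both.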
Why only npm and PyPI packages can clash:
Go packages are never checked for clashes because their format makes them unambiguous:
- Go packages always include the full import path: `github.com/gorilla/mux@v1.8.0`, `gopkg.in/yaml.v3@v3.0.1`
- npm/PyPI packages use simple names: `webpack@5.95.0`, `requests@2.31.0`
The presence of a domain path makes Go packages instantly recognizable and impossible to confuse with npm or PyPI packages. Only packages with simple names (no domain prefix) are checked for namespace conflicts between the npm and PyPI registries.
5. Caching & Deduplication
- SQLite cache `license_cache.sqlite3` stores HTTP metadata and resolved results
- SPDX license index cached under `spdx/` directory
- Single-flight behavior: each unique package resolved once per run
- Conditional GET using ETag/Last-Modified minimizes network traffic
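Conditional GET works by storing the validators a server returned (`ETag`, `Last-Modified`) and sending them back on the next request. The sketch below shows that bookkeeping with `sqlite3`. The `http_meta` table and all function names are hypothetical, not the actual schema of `license_cache.sqlite3`, and no network calls are made.

```python
import sqlite3
from typing import Optional


def open_cache(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the cache; the schema here is illustrative only."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS http_meta ("
        "url TEXT PRIMARY KEY, etag TEXT, last_modified TEXT)"
    )
    return conn


def remember(conn: sqlite3.Connection, url: str,
             etag: Optional[str], last_modified: Optional[str]) -> None:
    """Store the validators returned with a successful response."""
    conn.execute(
        "INSERT OR REPLACE INTO http_meta (url, etag, last_modified) VALUES (?, ?, ?)",
        (url, etag, last_modified),
    )
    conn.commit()


def conditional_headers(conn: sqlite3.Connection, url: str) -> dict[str, str]:
    """Build request headers that let the server answer 304 Not Modified."""
    row = conn.execute(
        "SELECT etag, last_modified FROM http_meta WHERE url = ?", (url,)
    ).fetchone()
    headers: dict[str, str] = {}
    if row:
        etag, last_modified = row
        if etag:
            headers["If-None-Match"] = etag
        if last_modified:
            headers["If-Modified-Since"] = last_modified
    return headers


cache = open_cache()
remember(cache, "https://example.com/LICENSE", '"abc123"', None)
print(conditional_headers(cache, "https://example.com/LICENSE"))
```

When the server replies with 304, the previously cached body can be reused without downloading it again.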
All resolution steps are recorded in the output for auditability.
Advanced Usage & Parameters
Core Options
Wrapper Script Options
The wrapper scripts (`licensedog`, `licensedog.ps1`, `licensedog.bat`) support special flags:
How --nouv works:
- Without `--nouv`: Wrapper checks for uv and runs `uv run licensedog.py`
- With `--nouv`: Wrapper runs `python3 licensedog.py` directly (no uv required)
- With `--nouv --noweb`: High-security mode with zero dependencies
When to use --nouv:
- uv is not installed and you don't want to install it
- Restricted environments where uv installation is not allowed
- You prefer pip for dependency management
- Testing with specific Python environments
Web Scraping Control
Network & Performance
License Detection
UI & Logging
Caching
Targeted Updates
Examples
Standard Installation Examples
High-Security Examples
Update Single Package
Sample Output
Input:
github.com/lufia/plan9stats@v0.0.0-20211012122336-39d0f177ccd0
pyyaml@6.0.2
actions/checkout@v4
react-transition-group@4.4.2
Output excerpt:
Requirements
- Python 3.10+ (required - uses modern type hints like `str | None`)
- All other dependencies are optional with graceful fallbacks
When to Use Each Installation Method
| Environment | Installation | Command | Features |
|---|---|---|---|
| 🚀 Development | Standard (uv + all extras) | or | Full features, best results |
| 🏢 Corporate CI/CD | Standard without web | or | Fast, no browser |
| 🔒 High-Security | Zero dependencies | or | No external tools |
| 🔧 No uv available | Pure Python with pip | or | Use pip instead of uv |
| 📦 Docker Alpine | Minimal with http/spdx | then | Small image, fast |
| 💻 Local Testing | Standard (uv + all extras) | | Full features |
Troubleshooting
"playwright not found" error
You're trying to use web scraping without Playwright installed. Options:
- Add --noweb flag: python3 licensedog.py --text --noweb
- Install Playwright: pip install playwright && playwright install chromium
Packages not resolving
- Check package identifier format (use `@` separator, not `:`)
- Enable verbose logging: --verbose
- Try with web scraping enabled (install `playwright`)
- For hard-to-resolve packages, check `action_details` column in output
Slow performance
- Install optional dependencies: pip install -e ".[http,spdx]"
- Use more threads: --threads 8
- Use cache from previous run: existing `license_cache.sqlite3` is reused automatically
License
Universal Permissive License. This is the most libertarian license of all. It permanently allows use of the text for any purpose, with minimal additional requirements, and grants patent rights, should any apply. It is more permissive than Apache-2.0 and MIT. Choose this license if your ideology is the opposite of Richard Stallman's and you want to directly allow use for any purpose (including commercial), requiring nothing in return.