licensedog

0

Описание

Языки

  • Python95,2%
  • Shell1,8%
  • PowerShell1,7%
  • Batchfile1,3%
README.md

LicenseDog

LicenseDog Screenshot

Find licenses for third‑party packages and write the results to OUT.csv (or JSONL). Designed for fast, reliable compliance inventories with robust caching and an optional Textual TUI.

What is this and why was it created?

When you ship software, you need an auditable list of licenses for your dependencies. licensedog reads a simple IN.csv of package identifiers and resolves each package's license by probing canonical repository license files and trusted metadata. It outputs a machine‑readable table you can import into spreadsheets or pipelines, and it caches aggressively so repeated runs are fast and deterministic.

Supported package ecosystems:

  • 🐹 Go modules (20+ domain patterns: github.com, gopkg.in, google.golang.org, filippo.io, dario.cat, sigs.k8s.io, and more)
  • 📦 npm packages (registry API, tarball inspection, Code tab web scraping)
  • 🐍 Python packages (PyPI JSON API, web scraping fallback)
  • ⚙️ GitHub Actions (actions/checkout@v4, etc.)

Input Format Example

Here's what a typical

IN.csv
looks like:

github.com/lufia/plan9stats@v0.0.0-20211012122336-39d0f177ccd0 react-transition-group@4.4.2 pyyaml@6.0.2

Such files can be generated by CycloneDX or similar SBOM tools. Important: There's no guarantee that package identifiers from SBOM actually exist or that their license information reflects the current state in the ever-changing Internet.


Resolution Priority Order

licensedog uses a very simple strategy: the more specialized version wins. All else being equal, what can be linked to wins. The last desperate measure is to download the source or link to a main/master/canary branch in Git, if nothing else is available.

npm packages:

  1. 🎯 Version-specific Git refs (tags/branches like
    v2.2.1
    ,
    2.2.1
    )
  2. 📋 npm registry metadata (API
    license
    field)
  3. 🌐 npm Code tab scraping (web UI, unless disabled with
    --noweb
    )
  4. 📦 npm tarball inspection (download & extract package)
  5. 🔄 PyPI fallback (for misclassified packages)
  6. 🌿 Generic Git branches (master/main - last resort)

Go modules:

  1. 🎯 Version-specific Git refs (tags/branches)
  2. 📋 pkg.go.dev metadata/scraping (web check)
  3. 📦 Go module archive (download from proxy.golang.org)
  4. 🌿 Generic Git branches (master/main - last resort)

Python packages:

  1. 🎯 Version-specific Git refs (tags/branches)
  2. 📋 PyPI JSON API (metadata
    license
    field and classifiers)
  3. 🎯 GitHub version-specific refs (extracted from PyPI metadata)
  4. 🌐 PyPI web scraping (page sidebar, unless disabled with
    --noweb
    )
  5. 📦 PyPI sdist inspection (download & extract source distribution)
  6. 🌿 Generic Git branches (master/main - last resort)

Key principle: Version-specific sources (git tags or package archives) are always preferred over generic

master
/
main
branches, which may contain different code than the released version. Web-based checks (APIs and scraping) are tried before downloading full archives to minimize bandwidth usage.


Installation

For development environments with full features including web scraping support.

What you get:

  • ✅ Full support for all package types (Go, npm, Python, GitHub Actions)
  • ✅ Web scraping for PyPI pages and npm Code tab (resolves hard-to-find licenses)
  • ✅ Fast HTTP with retries and connection pooling (requests)
  • ✅ Fast SPDX fuzzy matching (rapidfuzz, 10-100x faster)
  • ✅ Rich terminal UI with progress bars (textual)

🔒 High-Security Installation (Zero Dependencies)

For restricted environments where browser automation is prohibited.

What you get:

  • ✅ Full support for Go modules and GitHub Actions
  • ✅ Basic npm support (registry API only, no Code tab scraping)
  • ✅ Basic Python support (PyPI JSON API only, no web scraping)
  • ❌ No web scraping (some packages may not resolve)
  • ⚠️ Slower HTTP (urllib instead of requests with retries)
  • ⚠️ Slower SPDX matching (difflib instead of rapidfuzz, 10-100x slower)
  • ⚠️ Plain text output only (no TUI)

Key point: The

--noweb
flag is required to ensure no web scraping attempts are made.


🎛️ Installation Options & Feature Groups

All dependencies are optional with graceful fallbacks. You can choose which features to install:

Using pip (traditional)

Feature Groups Explained

GroupDependenciesPurposeWhen to Use
http
requests≥2.31.0Fast HTTP with retries, session reuseAlways recommended
spdx
rapidfuzz≥3.5.0Fast fuzzy SPDX matching (10-100x faster)Always recommended
tui
textual≥0.47.0Rich terminal UI with progress barsInteractive use
web
playwright≥1.40.0Web scraping for PyPI/npm Code tabWhen hard-to-resolve packages present
all
All of the aboveFull featuresDevelopment, best results

Fallback behavior:

  • No
    http
    : Uses urllib (slower, no retry logic)
  • No
    spdx
    : Uses difflib (10-100x slower fuzzy matching)
  • No
    tui
    : Plain text progress logs only
  • No
    web
    : API-only mode (some packages may not resolve)

General Usage

Quick Start

Simplest way (3 steps):

  1. Put your
    IN.csv
    file in the licensedog directory
  2. Run
    ./licensedog
    (Linux/macOS) or
    .\licensedog.ps1
    (Windows PowerShell) or
    licensedog.bat
    (Windows CMD)
  3. Your results are in
    OUT.csv

By default, licensedog displays a pseudographical text UI (TUI) in the console with progress bars and real-time updates.

With custom paths:

Wrapper script modes:

UI Mode Control (mutually exclusive):

Direct Python invocation:

IN.csv Format

One package identifier per line (no header). Duplicates are automatically de‑duplicated.

Supported version separators:

  • Standard
    @
    separator:
    webpack@5.95.0
  • CycloneDX-style
    :
    separator:
    webpack:5.95.0
    (auto-normalized to
    @
    )

Examples (see

examples/IN-short.csv
and
examples/IN-long.csv
):

github.com/lufia/plan9stats@v0.0.0-20211012122336-39d0f177ccd0 google.golang.org/appengine@v1.6.8 gopkg.in/yaml.v3@v3.0.1 sigs.k8s.io/yaml@v1.4.0 filippo.io/edwards25519@v1.1.0 react-transition-group@4.4.2 @next/swc-linux-x64-gnu@14.2.24 pyyaml@6.0.2 pathspec@0.12.1 actions/checkout@v4 actions/setup-go@v5

OUT.csv Format

Columns written by default (

--format csv
):

ColumnDescription
package
Input package id as emitted
license
Detected SPDX id (e.g.,
MIT
,
Apache-2.0
), empty if unknown
license_url
Canonical URL where the license text was found
action
LICENSE_FOUND
or
LICENSE_NOT_FOUND
source_type
Origin:
GITHUB
,
NPMJS
,
NPMJS+CODE_TAB
,
PYPI
,
PYPI+WEB
,
GODEV
, etc.
git_source
Repository/ref hint used (when applicable)
readme_url
README URL that mentioned license (when applicable)
action_details
Human‑readable steps taken to resolve
evidence_snippet
Reserved; currently empty
is_copyleft
true
/
false
heuristic flag
requires_notice
true
/
false
heuristic flag
clash_exists
TRUE
/
FALSE
Excel-parseable flag for namespace clashes
clashes
Namespace clash detection details (npm vs PyPI)

To emit line‑delimited JSON instead:


How License Resolution Works

licensedog follows a conservative, explainable pipeline for each package:

1. Package Classification

Automatic ecosystem detection using pattern matching:

  • Go modules: 20+ domain patterns

    • github.com/*
      ,
      gitlab.com/*
      ,
      bitbucket.org/*
    • google.golang.org/*
      ,
      golang.org/*
    • gopkg.in/*
      ,
      go.uber.org/*
      ,
      go.etcd.io/*
    • filippo.io/*
      ,
      dario.cat/*
      ,
      sigs.k8s.io/*
    • And more...
  • GitHub Actions: Pattern

    org/repo@version
    (no dots in org/repo)

    • Examples:
      actions/checkout@v4
      ,
      actions/setup-go@v5
  • Python packages: Conservative detection for packages with Python keywords

    • Examples:
      pyyaml@6.0.2
      ,
      pathspec@0.12.1
  • npm packages: Default for packages not matching above patterns

    • Examples:
      react@18.2.0
      ,
      @babel/core@7.22.0

2. Repository Inference

  • Go modules: Map to GitHub org/repo using hardcoded rules or go-get meta tags
    • Special cases:
      gopkg.in/yaml.v2
      go-yaml/yaml
      ,
      google.golang.org/appengine
      golang/appengine
  • npm packages: Query npm registry metadata for repository URL
  • Python packages: Query PyPI JSON API for package metadata
  • GitHub Actions: Direct mapping to
    github.com/org/repo

3. Direct License Probing

Try LICENSE files at version-specific refs (tags/branches), then fallback to main/master:

  • Files checked:
    LICENSE
    ,
    LICENSE.md
    ,
    license
    ,
    license.md
    ,
    COPYING
    ,
    COPYING.md
    , etc.
  • Uses
    raw.githubusercontent.com
    for fast direct access
  • Heuristics detect common licenses (MIT, Apache-2.0, BSD, ISC, MPL)
  • Fuzzy SPDX matching against SPDX corpus (configurable threshold via
    --min-spdx-score
    )

4. Policy Flags & Clash Detection

  • is_copyleft
    : Inferred from SPDX patterns (
    GPL-
    ,
    AGPL-
    ,
    LGPL-
    ,
    MPL
    ,
    CDDL
    )
  • requires_notice
    : Set for licenses requiring attribution (MIT, BSD, Apache, MPL, GPL family, ISC)
  • clash_exists
    : Boolean flag (
    TRUE
    /
    FALSE
    ) indicating namespace conflicts - Excel will parse this as a boolean type
  • clashes
    : Detailed description when a package with the same name@version exists on both npm and PyPI (ambiguous packages default to npm)

Why only npm and PyPI packages can clash:

Go packages are never checked for clashes because their format makes them unambiguous:

  • Go packages always include the full import path:
    github.com/gorilla/mux@v1.8.0
    ,
    gopkg.in/yaml.v3@v3.0.1
  • npm/PyPI packages use simple names:
    webpack@5.95.0
    ,
    requests@2.31.0

The presence of domain paths (

github.com/
,
gopkg.in/
, etc.) makes Go packages instantly recognizable and impossible to confuse with npm or PyPI packages. Only packages with simple names (no domain prefix) are checked for namespace conflicts between the npm and PyPI registries.

5. Caching & Deduplication

  • SQLite cache
    license_cache.sqlite3
    stores HTTP metadata and resolved results
  • SPDX license index cached under
    spdx/
    directory
  • Single-flight behavior: each unique package resolved once per run
  • Conditional GET using ETag/Last-Modified minimizes network traffic

All resolution steps are recorded in

action_details
for auditability.


Advanced Usage & Parameters

Core Options

Wrapper Script Options

The wrapper scripts (

licensedog
,
licensedog.ps1
,
licensedog.bat
) support special flags:

How --nouv works:

  • Without --nouv: Wrapper checks for uv and runs
    uv run licensedog.py
  • With --nouv: Wrapper runs
    python3 licensedog.py
    directly (no uv required)
  • With --nouv --noweb: High-security mode with zero dependencies

When to use --nouv:

  • uv is not installed and you don't want to install it
  • Restricted environments where uv installation is not allowed
  • You prefer pip for dependency management
  • Testing with specific Python environments

Web Scraping Control

Network & Performance

License Detection

UI & Logging

Caching

Targeted Updates


Examples

Standard Installation Examples

High-Security Examples

Update Single Package

Sample Output

Input (

IN.csv
):

github.com/lufia/plan9stats@v0.0.0-20211012122336-39d0f177ccd0 pyyaml@6.0.2 actions/checkout@v4 react-transition-group@4.4.2

Output excerpt (

OUT.csv
):


Requirements

  • Python 3.10+ (required - uses modern type hints like
    str | None
    )
  • All other dependencies are optional with graceful fallbacks

When to Use Each Installation Method

EnvironmentInstallationCommandFeatures
🚀 DevelopmentStandard (uv + all extras)
./licensedog --text
or
uv run licensedog.py --text
Full features, best results
🏢 Corporate CI/CDStandard without web
./licensedog --text --noweb
or
uv pip install -e ".[http,spdx]"
Fast, no browser
🔒 High-SecurityZero dependencies
./licensedog --nouv --noweb --text
or
python3 licensedog.py --text --noweb
No external tools
🔧 No uv availablePure Python with pip
./licensedog --nouv --text
or
pip install -e ".[http,spdx]"
Use pip instead of uv
📦 Docker AlpineMinimal with http/spdx
pip install -e ".[http,spdx]"
then
python3 licensedog.py --text --noweb
Small image, fast
💻 Local TestingStandard (uv + all extras)
./licensedog --text
Full features

Troubleshooting

"playwright not found" error

You're trying to use web scraping without Playwright installed. Options:

  1. Add --noweb flag:
    python3 licensedog.py --text --noweb
  2. Install Playwright:
    pip install playwright && playwright install chromium

Packages not resolving

  1. Check package identifier format (use
    @
    separator, not
    :
    )
  2. Enable verbose logging:
    --verbose
  3. Try with web scraping enabled (install
    playwright
    )
  4. For hard-to-resolve packages, check
    action_details
    column in output

Slow performance

  1. Install optional dependencies:
    pip install -e ".[http,spdx]"
  2. Use more threads:
    --threads 8
  3. Use cache from previous run: existing
    license_cache.sqlite3
    is reused automatically

License

Universal Permissive License. This is the most libertarian license of all. It forever and with minimal additional requirements allows using the text for any purposes and transfers patent rights, if they happen to occur there. This is a more permissive license than Apache2 and MIT. This license is needed if you adhere to an ideology opposite to Richard Stallman's, and directly allow using something for any purposes (including commercial), requiring nothing in return.