Skip to contents

A focused, spec-complete implementation of the Public Suffix List (PSL) for R. pslr bundles a reproducible, pinned PSL snapshot and implements the official prevailing-rule algorithm to answer public-suffix (eTLD) and registrable-domain (eTLD+1) queries.

  • Distinguishes the ICANN and PRIVATE rule sections.
  • Accepts Unicode, ASCII, and A-label hostnames via punycoder canonicalization; returns ASCII or Unicode output.
  • Works fully offline from the bundled snapshot; an explicit, validated psl_refresh() is the only network path.
  • Matcher compiled with cpp11; no external system library required.

Installation

Install from GitHub:

# install.packages("pak")
pak::pak("bart-turczynski/pslr")

pslr depends on punycoder, which is installed automatically from CRAN.

Usage

library(pslr)

public_suffix("www.example.co.uk")       #> "co.uk"
registrable_domain("www.example.co.uk")  #> "example.co.uk"

# ICANN vs PRIVATE sections
public_suffix("user.github.io")                    #> "github.io"
public_suffix("user.github.io", section = "icann") #> "io"

# Explicit membership vs the implicit default rule
is_public_suffix("madeuptld")                  #> TRUE  (implicit "*")
is_public_suffix("madeuptld", unknown = "na")  #> NA    (explicit only)

# Split a host, or inspect the prevailing rule
suffix_extract("blog.user.github.io")
public_suffix_rule("a.b.kobe.jp")

See vignette("introduction", package = "pslr") for the full tour: section choice, the unknown-suffix policy, IDN output, terminal dots, refresh and activation, reproducibility, and security notes.

Reproducibility

A result depends on both which list answered and how hosts were normalized. psl_version() reports the active-list provenance plus the runtime normalization identifiers; record it alongside reproducibility-sensitive output.

Development

Install dependencies plus the dev tooling used by the checks:

Rscript -e 'pak::local_install_deps(dependencies = TRUE)'

Run the same verification CI runs (lint + R CMD check --as-cran):

Rscript -e 'lints <- lintr::lint_package(); if (length(lints)) { print(lints); quit(status = 1) }' && Rscript -e 'rcmdcheck::rcmdcheck(args = "--as-cran", error_on = "warning")'

R CMD check runs the testthat and cucumber specs, so the behaviour specs are verified as part of the check. A non-CRAN performance benchmark and its release gate live in bench/benchmark.R; recorded reference results are in docs/benchmarks.md.

Project layout

  • R/ — package source (edit roxygen comments here, not man/ or NAMESPACE).
  • src/ — the cpp11 matcher core.
  • man/ — generated help pages (devtools::document()).
  • tests/testthat/ — testthat tests and cucumber feature specs.
  • vignettes/ — long-form documentation.
  • data-raw/ — the deterministic snapshot regeneration pipeline.
  • docs/ — durable project context (PRD.md, architecture.md, benchmarks.md).

pslr is part of a small ecosystem of R packages by the same author:

  • punycoder — the Punycode and IDNA codec that pslr uses for host canonicalization before PSL matching. Use it directly for raw Unicode ↔︎ ACE round-trips.
  • rurl — full URL parsing, normalization, cleaning, and joining toolkit. Uses pslr as its PSL engine; reach for it when you need more than domain extraction.

License

Package code is MIT licensed. The bundled Public Suffix List data (inst/extdata/) is distributed under the Mozilla Public License 2.0; see inst/NOTICE and inst/extdata/PSL-LICENSE.