
Extracts the path component of a URL.

Usage

get_path(
  url,
  protocol_handling = "keep",
  case_handling = c("lower_host", "keep", "lower", "upper"),
  trailing_slash_handling = c("none", "keep", "strip"),
  index_page_handling = c("keep", "strip"),
  path_normalization = c("none", "collapse_slashes", "dot_segments", "both"),
  path_encoding = c("keep", "encode", "decode")
)

Arguments

url

A character vector of URLs.

protocol_handling

A character string specifying how to handle protocols. Defaults to "keep".

  • "keep": If a scheme exists (http, https, ftp, ftps), it's used. If there is no scheme, "http://" is prepended.

  • "none": If a scheme exists, it's used. If there is no scheme, none is added (scheme component will be NA).

  • "strip": Any existing scheme is removed (scheme component will be NA).

  • "http": The scheme is forced to be "http".

  • "https": The scheme is forced to be "https".
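The five options above can be sketched on plain strings in base R. This is only an illustration under stated assumptions: handle_protocol() is a hypothetical helper, not the package's implementation, and it rewrites the URL string instead of setting a parsed scheme component to NA.

```r
# Hypothetical helper (not part of the package): string-level sketch of the
# documented protocol_handling options.
handle_protocol <- function(url, mode = c("keep", "none", "strip", "http", "https")) {
  mode <- match.arg(mode)
  scheme_re <- "^(https?|ftps?)://"            # the schemes named in the docs
  has_scheme <- grepl(scheme_re, url, ignore.case = TRUE)
  bare <- sub(scheme_re, "", url, ignore.case = TRUE)
  out <- switch(mode,
    keep  = ifelse(has_scheme, url, paste0("http://", url)),
    none  = url,
    strip = bare,
    http  = paste0("http://", bare),
    https = paste0("https://", bare)
  )
  out[is.na(url)] <- NA_character_             # keep missing inputs missing
  out
}

handle_protocol(c("example.com/a", "https://example.com/a"), "keep")
handle_protocol("ftp://example.com/a", "https")
```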

case_handling

How to handle casing of the returned path. Defaults to "lower_host", which preserves the path's original casing (paths are case-sensitive per RFC 3986 §6.2.2.1). Use "lower"/"upper" to force a case.

trailing_slash_handling

A character string specifying how to handle trailing slashes in the path component of the cleaned URL. Defaults to "none".

  • "none": (Default) No specific handling is applied. The path is left as it is after initial parsing.

  • "keep": Ensures a trailing slash. If a path exists and doesn't end with one, it's added. If path is just "/", it's kept.

  • "strip": Removes a trailing slash if present, unless the path is solely "/".
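A minimal base R sketch of the three trailing-slash rules, assuming the path has already been extracted. handle_trailing_slash() is a hypothetical name used for illustration and is not the package's internal code.

```r
# Hypothetical helper (not part of the package): applies the documented
# trailing-slash rules to a character vector of paths.
handle_trailing_slash <- function(path, mode = c("none", "keep", "strip")) {
  mode <- match.arg(mode)
  if (mode == "none") return(path)
  has_path <- !is.na(path) & nzchar(path)      # NA and "" are left alone
  if (mode == "keep") {
    needs_slash <- has_path & !endsWith(path, "/")
    path[needs_slash] <- paste0(path[needs_slash], "/")
  } else {
    # never strip the root path "/"
    strippable <- has_path & path != "/" & endsWith(path, "/")
    path[strippable] <- sub("/$", "", path[strippable])
  }
  path
}

handle_trailing_slash(c("/a/b", "/a/b/", "/"), "keep")
handle_trailing_slash(c("/a/b", "/a/b/", "/"), "strip")
```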

index_page_handling

A character string specifying how to handle index/default pages. Defaults to "keep".

  • "keep": (Default) Leave index/default page segments untouched.

  • "strip": Remove a trailing index.* or default.* segment (case-insensitive).
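The "strip" rule can be approximated with a single case-insensitive regular expression. In this sketch, strip_index_page() is a hypothetical helper, and leaving the directory's trailing slash in place is an assumption, since the text above does not say what remains after the segment is removed.

```r
# Hypothetical helper (not part of the package): drops a final index.* or
# default.* segment. Keeping the preceding "/" is an assumption.
strip_index_page <- function(path) {
  sub("/(index|default)\\.[^/]*$", "/", path, ignore.case = TRUE)
}

strip_index_page(c("/blog/index.html", "/Default.ASPX", "/blog/indexes.html"))
```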

path_normalization

How to normalize path structure. Defaults to "none".

  • "none": (Default) No normalization.

  • "collapse_slashes": Collapse duplicate slashes in the path.

  • "dot_segments": Resolve . and .. segments per RFC 3986.

  • "both": Apply both collapse_slashes and dot_segments.
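To make the two normalizations concrete, here is a base R sketch. The helper names (collapse_slashes(), remove_dot_segments(), normalize_path()) are hypothetical, the dot-segment logic is a simplified stack-based variant of RFC 3986 §5.2.4 that only handles absolute paths, and the order used for "both" (collapse first, then resolve) is an assumption.

```r
# Hypothetical helpers (not part of the package).
collapse_slashes <- function(path) gsub("/{2,}", "/", path)

remove_dot_segments <- function(path) {
  resolve_one <- function(p) {
    # sketch covers absolute paths only; everything else is returned as is
    if (is.na(p) || !startsWith(p, "/")) return(p)
    segs <- strsplit(p, "/", fixed = TRUE)[[1]][-1]  # drop piece before leading "/"
    out <- character(0)
    for (s in segs) {
      if (s == ".") next                       # "." refers to the current segment
      if (s == "..") {
        if (length(out) > 0) out <- out[-length(out)]  # ".." pops one segment
      } else {
        out <- c(out, s)
      }
    }
    last <- if (length(segs) > 0) segs[length(segs)] else ""
    # paths ending in "/", "/." or "/.." resolve to a directory
    trailing <- endsWith(p, "/") || last %in% c(".", "..")
    result <- paste0("/", paste(out, collapse = "/"))
    if (trailing && length(out) > 0) result <- paste0(result, "/")
    result
  }
  vapply(path, resolve_one, character(1), USE.NAMES = FALSE)
}

normalize_path <- function(path,
                           mode = c("none", "collapse_slashes", "dot_segments", "both")) {
  mode <- match.arg(mode)
  switch(mode,
    none = path,
    collapse_slashes = collapse_slashes(path),
    dot_segments = remove_dot_segments(path),
    both = remove_dot_segments(collapse_slashes(path))  # assumed order
  )
}

normalize_path("/a//b/../c", "both")
```

Note that the dot-segment step on its own keeps empty segments, so "/a//b" is unchanged unless slashes are also collapsed.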

path_encoding

How to handle percent-encoding in the path. Defaults to "keep".

  • "keep": Leave the path percent-encoding untouched.

  • "encode": Normalize by decoding first, then percent-encoding each segment (slashes preserved).

  • "decode": Percent-decode UTF-8 sequences in the path.
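The "encode" and "decode" options can be approximated with utils::URLencode() and utils::URLdecode() from base R. The helper names below are hypothetical and this is not the package's implementation. In particular, URLdecode() works on raw bytes, so the sketch does not reproduce any UTF-8-specific handling the package may apply.

```r
# Hypothetical helpers (not part of the package).
encode_one <- function(p) {
  if (is.na(p)) return(NA_character_)
  # split on "/" first so the separators themselves are never encoded
  segs <- strsplit(p, "/", fixed = TRUE)[[1]]
  if (endsWith(p, "/")) segs <- c(segs, "")    # strsplit drops the final empty piece
  segs <- vapply(
    segs,
    # decode first so already-encoded input is not double-encoded
    function(s) utils::URLencode(utils::URLdecode(s), reserved = TRUE, repeated = TRUE),
    character(1),
    USE.NAMES = FALSE
  )
  paste(segs, collapse = "/")
}

decode_one <- function(p) {
  if (is.na(p)) NA_character_ else utils::URLdecode(p)
}

handle_path_encoding <- function(path, mode = c("keep", "encode", "decode")) {
  mode <- match.arg(mode)
  if (mode == "keep") return(path)
  fn <- if (mode == "encode") encode_one else decode_one
  vapply(path, fn, character(1), USE.NAMES = FALSE)
}

handle_path_encoding("/a b/c%20d/", "encode")
handle_path_encoding("/a%20b", "decode")
```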

Value

A character vector of URL paths.

Examples

get_path("http://example.com/some/path?query=1")
#> http://example.com/some/path?query=1 
#>                         "/some/path"