Path Handling Functions of R for Python
Unifying Python’s Path Handling Functions a la R’s Fashion.

Path handling functions in base R are built-in, clean and intuitive, while Python’s path handling functions are scattered, duplicated, and slightly different across three standard library modules (os, pathlib, and shutil).
How about unifying them into one Python package?
Let’s call this package rpath. And let’s build it together while walking through R’s and Python’s path handling commands in this article.
Setting up the package backbone using poetry
Poetry makes package creation and deployment easy in Python, providing a high degree of reproducibility.
We install poetry using its own installation script:
$ curl -sSL https://raw.githubusercontent.com/python-poetry/poetry/master/get-poetry.py | python3
Important: Don’t use $ pip install poetry. The poetry site also warns against installing it via pip.
Check the correct installation by:
$ poetry --version
Now let’s create a poetry project (the package):
$ poetry new rpath
## Created package rpath in rpath
Let’s enter the folder of the package by:
$ cd rpath
And open the pyproject.toml file with an editor of your choice, e.g. gedit (in Ubuntu). The .toml file is the config file for poetry packages. Feel free to choose any editor other than gedit:
$ gedit pyproject.toml
We write/complete the file to:
[tool.poetry]
name = "rpath"
version = "0.1.0"
description = "R's path handling functions for Python"
authors = ["Gwang-Jin Kim <gwang.jin.kim.phd@gmail.com>"]

[tool.poetry.dependencies]
python = "^3.9"

[tool.poetry.dev-dependencies]
pytest = "^5.2"

[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"
And save the changes.
We check the toml file for correctness:
$ poetry check
## All set!
Then, we create and open the main.py file with our favorite editor and write our code there.
$ YOUR_EDITOR rpath/main.py # in my case `gedit`
And we write there:
import os
import pathlib
import shutil
And now we are ready to write the functions for this package.
1. How to get/set the current directory in R and Python
In R, this would be:
getwd()
setwd("/to/dir")
In Python, we have:
# get the current directory:
os.path.abspath(os.getcwd())
# os.path.abspath() helps to deal with Windows path peculiarities
os.path.realpath('.') # eliminates symbolic links
                      # and replaces them by the real path
pathlib.Path.cwd()
pathlib.Path(__file__).resolve() # absolute path of the current script file
                                 # (not the working directory)

# set the current directory:
os.chdir("/to/dir")
# no counterpart in pathlib or shutil
So we write into rpath/main.py:
def getwd():
    return os.path.realpath(os.path.abspath(os.getcwd()))

def setwd(path):
    return os.chdir(path)
Then, we would be able to call in Python:
rpath.getwd()
rpath.setwd(path)
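A quick way to check that the pair behaves as expected is a round trip through a temporary folder. The sketch below is only an illustration: it repeats both functions so that it runs on its own, and it assumes that we want setwd() to hand back the previous directory, as R’s setwd() does (the version above simply returns None).

```python
import os
import tempfile


def getwd():
    """Return the current working directory as an absolute, symlink-free path."""
    return os.path.realpath(os.getcwd())


def setwd(path):
    """Change the working directory and return the previous one (assumption: R-like)."""
    previous = getwd()
    os.chdir(path)
    return previous


# Round trip: switch into a temporary folder and back again.
start = getwd()
with tempfile.TemporaryDirectory() as tmp:
    tmp_real = os.path.realpath(tmp)
    old = setwd(tmp)      # remember where we came from
    inside = getwd()      # now points at the temporary folder
    setwd(old)            # leave before the folder is cleaned up
```

Returning the previous directory makes it easy to restore the original location afterwards, which is the usual idiom in R scripts.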
2. How to dissect and access Path Name Components in R and Python
In R, we have:
# parent directory name
dirname(path)

# file name
basename(path)

# file extension
## note: rio and xfun are not part of base R,
## you have to install the packages
## preceding `::`
## e.g. by `install.packages("rio")`
## (the tools package ships with R and needs no installation)
## e.g. 'csv' for files ending with '.csv'
tools::file_ext(path) # or:
rio::file_ext(path) # or:
xfun::file_ext(path)

# split a path into its components
# how to dissect path folder and file names
# note: the components come back last-to-first,
# "a/b/c.txt" => c("c.txt", "b", "a"); wrap the call in rev() for root-first order
split_path <- function(path) {
  if (dirname(path) %in% c(".", path)) return(basename(path))
  return(c(basename(path), split_path(dirname(path))))
}
# from here
And in Python:
# parent directory:
os.path.dirname(path)

# file name:
os.path.basename(path)

# file extension (including the leading dot, e.g. '.csv'):
filename, file_extension = os.path.splitext(path)
From these, we can build:
def dirname(path):
    return os.path.dirname(path)

def basename(path):
    return os.path.basename(path)

def file_ext(path):
    # unlike R's file_ext(), the result keeps the leading dot, e.g. '.csv'
    return os.path.splitext(path)[1]
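The R listing also splits a path into its components, which has no counterpart in the Python functions so far. A minimal sketch with pathlib could look like the following. It assumes POSIX-style paths (hence PurePosixPath), and both split_path() and the keep_dot parameter are illustrative additions, not part of the package described here.

```python
from pathlib import PurePosixPath


def split_path(path):
    """Split a path into its components, root first."""
    return list(PurePosixPath(path).parts)


def file_ext(path, keep_dot=False):
    """Return the last file extension; without the dot by default, as in R."""
    suffix = PurePosixPath(path).suffix   # e.g. ".csv", or "" if there is none
    return suffix if keep_dot else suffix[1:]


print(split_path("/a/b/c.txt"))       # ['/', 'a', 'b', 'c.txt']
print(file_ext("/a/b/data.csv"))      # csv
print(file_ext("archive.tar.gz"))     # gz
```

Note that an absolute path keeps its root "/" as the first component, so the result can be joined back into the original path.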
3. How to join Path Components in R and Python
In R, we have:
file.path("/a", "b", "file.txt") ## "/a/b/file.txt"
And in Python, we have several options:
os.path.join("/dir", "dir", "file.txt")
pathlib.Path("/dir") / "dir" / "file.txt"
pathlib.PurePath("/dir", "dir", "file.txt")
# convert Path() objects to strings using `str()`
# or their `.as_posix()` method
# they all create "/dir/dir/file.txt"
So to have file.path() in Python, we do:
def file_path(*args):
    return os.path.join(*args)
But I like the / overloading too …
Path = pathlib.Path
So then, we would be able to use:
rpath.file_path("/a", "b", "c.txt") ## "/a/b/c.txt"
But also:
str(rpath.Path("/a") / "b" / "c.txt") ## "/a/b/c.txt"
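One difference is worth knowing before relying on either variant: in Python, a later component that starts with "/" discards everything before it, both in os.path.join() and with the / operator. The sketch below demonstrates this with the POSIX flavours of both (posixpath and PurePosixPath, chosen so that the result does not depend on the operating system). file_path_strict() is a hypothetical helper, not part of the package, that keeps every component, which, as far as I know, is closer to how R’s file.path() pastes its arguments together.

```python
import posixpath
from pathlib import PurePosixPath

# Ordinary case: all variants agree.
plain = posixpath.join("/a", "b", "c.txt")        # "/a/b/c.txt"

# An absolute component resets the path.
reset = posixpath.join("/a", "/b", "c.txt")       # "/b/c.txt"
slash = PurePosixPath("/a") / "/b" / "c.txt"      # PurePosixPath("/b/c.txt")


def file_path_strict(*parts):
    """Hypothetical helper: join parts with "/", never dropping a component."""
    first, *rest = parts
    cleaned = [first.rstrip("/")] + [p.strip("/") for p in rest]
    return "/".join(cleaned)


strict = file_path_strict("/a", "/b", "c.txt")    # "/a/b/c.txt"
```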
4. How to recursively list Files in Folders in R and Python
In R this would be:
# folders
list.dirs(path = ".", full.names = TRUE, recursive = TRUE)

# files
list.files("/to/dir", pattern="\\.csv", full.names=TRUE, recursive=TRUE) ## pattern is regex
And in Python, we have:
pathlib.Path("/dir").glob("**/*.csv")
# recursion through the wildcard `**`!
We can provide the R functions in Python by:
def list_dirs(path, pattern="*", full_names=True, recursive=True):
    pattern = "**" + os.sep + pattern if recursive else pattern
    res = [p for p in pathlib.Path(path).glob(pattern)
           if os.path.isdir(p)]
    if full_names:
        return res
    else:
        return [basename(p) for p in res]

def list_files(path, pattern="*", full_names=True, recursive=True):
    pattern = "**" + os.sep + pattern if recursive else pattern
    res = [p for p in pathlib.Path(path).glob(pattern)
           if os.path.isfile(p)]
    if full_names:
        return res
    else:
        return [basename(p) for p in res]

## `pattern` is here not a regex pattern, but the shell-style file path pattern of `glob`
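Since R’s list.files() takes a regular expression while the functions above take a glob pattern, a regex-based variant may be closer to what R users expect. The following is a sketch under that assumption. list_files_regex() is a hypothetical name, it walks the tree with os.walk() instead of glob(), and with full_names=False it returns bare file names (R would return paths relative to the start folder).

```python
import os
import re
import tempfile


def list_files_regex(path, pattern=None, full_names=True, recursive=True):
    """List files below `path` whose file name matches the regex `pattern`."""
    regex = re.compile(pattern) if pattern else None
    found = []
    for root, _dirs, files in os.walk(path):
        for name in files:
            if regex is None or regex.search(name):
                found.append(os.path.join(root, name) if full_names else name)
        if not recursive:
            break  # os.walk yields the top folder first; stop after it
    return sorted(found)


# Small demonstration in a throw-away folder.
with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "sub"))
    for rel in ("a.csv", "c.txt", os.path.join("sub", "b.csv")):
        open(os.path.join(tmp, rel), "w").close()

    all_csv = list_files_regex(tmp, r"\.csv$", full_names=False)
    top_csv = list_files_regex(tmp, r"\.csv$", full_names=False, recursive=False)

print(all_csv)   # ['a.csv', 'b.csv']
print(top_csv)   # ['a.csv']
```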
5. How to test for Existence of Files and Folders in R and Python
In R, this is solved by:
# folders
dir.exists("/to/dir") # or file.exists("/to/dir")

# files
file.exists("/to/file.txt")
In Python, we use one of these:
# for both:
pathlib.Path("/to/dir/or/file.txt").exists()

# for folders
pathlib.Path("to/dir").is_dir()
os.path.isdir("to/dir")

# for files
pathlib.Path("to/file.txt").is_file()
os.path.isfile("to/file.txt")

# there is also
os.path.islink(path)
So we can summarize to:
def dir_exists(path):
    return os.path.isdir(path)

def file_exists(path):
    return os.path.isfile(path)
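As the R listing shows, file.exists() in R also answers TRUE for folders, while the file_exists() above is stricter and only accepts regular files. If R’s behaviour is what you want, a variant based on os.path.exists() is one option. The sketch below is an alternative under that assumption, not the version used in the package.

```python
import os
import tempfile


def file_exists(path):
    """R-like variant: True for files and for folders."""
    return os.path.exists(path)


def dir_exists(path):
    """True only for folders."""
    return os.path.isdir(path)


with tempfile.TemporaryDirectory() as tmp:
    a_file = os.path.join(tmp, "file.txt")
    open(a_file, "w").close()
    missing = os.path.join(tmp, "nope.txt")

    file_is_file = file_exists(a_file)      # True
    dir_is_file = file_exists(tmp)          # True, as in R
    file_is_dir = dir_exists(a_file)        # False
    missing_found = file_exists(missing)    # False
```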
6. How to create new Folders and Files in R and Python
In R, we use:
# folders
dir.create("/to/new/dir", recursive=TRUE)

# files
file.create("/to/file.txt")
Which corresponds in Python to:
# create a folder (including missing parent folders)
pathlib.Path("/to/new/dir").mkdir(exist_ok=True, parents=True)
os.makedirs("/to/new/dir", exist_ok=True)

# create a file (touch)
pathlib.Path("/to/new/file.txt").touch()
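So we adopt R’s syntax by:
def dir_create(path, recursive=True):
    if recursive:
        os.makedirs(path, exist_ok=True)  # also creates missing parent folders
    else:
        os.mkdir(path)  # the parent folder must already exist
    print(f"Created {path}")

def file_create(path):
    pathlib.Path(path).touch()
    print(f"Created {path}")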
7. How to (recursively) delete Files and Folders in R and Python
In R, we delete with the unlink() function:
# folders
unlink("/to/dir", recursive=TRUE)

# files
unlink("/to/file.txt")
Python has unlink(), too, but it removes only files:
# files (and symbolic links) only
os.unlink("/to/file.txt")
pathlib.Path("/to/file.txt").unlink(missing_ok=True) # missing_ok needs Python 3.8+

# rmdir (however, I never liked rmdir - it deletes only empty folders)
os.rmdir(dir_path)

# folders together with their contents
shutil.rmtree(dir_path)
So, for files, this is easy:
def unlink(path):
    # removes files and symbolic links only, not folders
    os.unlink(path)
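To also cover R’s unlink(path, recursive=TRUE), the function needs to fall back on shutil.rmtree() for folders. The following sketch is one possible extension. The recursive parameter and the boolean return value are my assumptions, chosen to mirror R, where folders are left alone unless recursive=TRUE and a missing path is not an error.

```python
import os
import shutil
import tempfile


def unlink(path, recursive=False):
    """Delete a file, or a whole folder tree when recursive=True.

    Returns True if something was removed, False otherwise.
    """
    if os.path.islink(path) or os.path.isfile(path):
        os.unlink(path)
        return True
    if os.path.isdir(path) and recursive:
        shutil.rmtree(path)
        return True
    return False  # missing path, or a folder without recursive=True


with tempfile.TemporaryDirectory() as tmp:
    tree = os.path.join(tmp, "tree")
    os.makedirs(os.path.join(tree, "sub"))
    open(os.path.join(tree, "sub", "file.txt"), "w").close()

    kept = unlink(tree)                        # False: folder needs recursive=True
    still_there = os.path.isdir(tree)          # True
    removed = unlink(tree, recursive=True)     # True
    gone = not os.path.exists(tree)            # True
    missing = unlink(os.path.join(tmp, "x"))   # False, no exception
```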
8. How to copy Folders with contents and Files in R and Python
In R, we can do:
# copy folder architecture recursively (not keeping original dates)
dir.create('/to/dir', recursive=TRUE) # subsequently followed by:
file.copy("/from/dir/or/file.txt", "/to/dir", recursive=TRUE, copy.date=FALSE)
In Python, we have:
# copy a folder recursively
shutil.copytree(src="/from/dir", dst="/to/dir",
                symlinks=False,
                ignore=None,
                copy_function=shutil.copy2,
                ignore_dangling_symlinks=False,
                dirs_exist_ok=False)

# copy a file
shutil.copyfile("/from/file.txt",
                "/to/file.txt", # must be a complete file path
                follow_symlinks=True)
shutil.copy("/to/file.txt", "/new/dir/file.txt")
# only Python 3.8+
# in Python < 3.8, put `str()` around Path() objects, or use `.as_posix()`!
Implementing R’s file.copy() seems to be tricky.
Let’s implement an easy version covering the most frequent cases.
Either the from path is a directory: then we want to recursively copy all files in the folder to the new directory. For that, we can use shutil.copytree().
Or the from path is a file: then we use shutil.copyfile(). However, we probably want to allow an existing directory as the target to, too. That we can check using dir_exists() written above.
def file_copy(_from, _to):
    if dir_exists(_from):
        shutil.copytree(src=_from, dst=_to,
                        symlinks=False,
                        ignore=None,
                        copy_function=shutil.copy2,
                        ignore_dangling_symlinks=False,
                        dirs_exist_ok=False)
    elif file_exists(_from):
        if dir_exists(_to):
            # copy into the existing folder, keeping the file name
            _to = file_path(_to, basename(_from))
        shutil.copyfile(_from, _to,
                        follow_symlinks=True)
    else:
        raise FileNotFoundError(_from)
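For comparison, here is a more compact sketch of the same idea that runs on its own. It relies on shutil.copy2(), which already accepts an existing folder as destination and then keeps the source file name, so the explicit folder check is not needed. This is an alternative under that assumption, not the version used in the package. It also returns the destination path, which the version above does not.

```python
import os
import shutil
import tempfile


def file_copy(src, dst):
    """Copy a folder tree or a single file; returns the destination path."""
    if os.path.isdir(src):
        return shutil.copytree(src, dst)   # dst must not exist yet
    if os.path.isfile(src):
        return shutil.copy2(src, dst)      # dst may be a file path or a folder
    raise FileNotFoundError(src)


with tempfile.TemporaryDirectory() as tmp:
    src_dir = os.path.join(tmp, "src")
    out_dir = os.path.join(tmp, "out")
    os.mkdir(src_dir)
    os.mkdir(out_dir)
    src_file = os.path.join(src_dir, "data.txt")
    with open(src_file, "w") as fh:
        fh.write("hello")

    # file into an existing folder
    copied_file = file_copy(src_file, out_dir)
    with open(copied_file) as fh:
        copied_text = fh.read()

    # whole folder to a new location
    copied_dir = file_copy(src_dir, os.path.join(tmp, "copy"))
    tree_files = sorted(os.listdir(copied_dir))

    # a missing source raises FileNotFoundError
    try:
        file_copy(os.path.join(tmp, "missing"), out_dir)
        raised = False
    except FileNotFoundError:
        raised = True
```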
9. How to move and rename Folders and Files in R and Python
Admittedly, there is no real mv equivalent in R, except for single files.
# move a folder
file.copy(from = "/from/dir", to = "/to/dir",
          overwrite = TRUE,
          recursive = TRUE,
          copy.mode = TRUE)
unlink("/from/dir", recursive = TRUE)
# so this is not actually a mv in the sense of unix commands,
# but a recursive copy followed by a recursive delete

# move a file
file.rename("from", "to")
# or one could also first copy the file and then unlink() the original one.
On this point, Python has shutil.move(), which is an equivalent of the Unix mv command:
# move a folder recursively
shutil.move(src="/from/dir", dst="/new/dir", copy_function=shutil.copy2)

# move files:
os.rename("/to/file.txt", "/new/dir/file.txt") # needs the complete target path
pathlib.Path("/to/file.txt").rename("/new/dir/file.txt")
shutil.move(src="/to/file.txt", dst="/new/dir") # an existing folder works as target, too
So this might be the only functionality where we prefer Python’s move() over R’s file.rename(). However, we will implement both:
def move(_from, _to, copy_function=shutil.copy2):
    return shutil.move(src=_from, dst=_to,
                       copy_function=copy_function)

def file_rename(_from, _to):
    return shutil.move(src=_from, dst=_to)
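A short, self-contained check shows what shutil.move() does in the two common cases: moving a file into an existing folder and renaming a whole folder. The sketch repeats move() so that it runs on its own. The folder and file names are made up for the demonstration.

```python
import os
import shutil
import tempfile


def move(src, dst, copy_function=shutil.copy2):
    """Move a file or folder; returns the final destination path."""
    return shutil.move(src, dst, copy_function=copy_function)


with tempfile.TemporaryDirectory() as tmp:
    src_file = os.path.join(tmp, "file.txt")
    new_dir = os.path.join(tmp, "new")
    os.mkdir(new_dir)
    with open(src_file, "w") as fh:
        fh.write("hello")

    # moving a file into an existing folder keeps its base name
    moved_file = move(src_file, new_dir)
    moved_ok = os.path.isfile(os.path.join(new_dir, "file.txt"))
    source_gone = not os.path.exists(src_file)

    # moving (renaming) a whole folder takes its contents along
    renamed_dir = move(new_dir, os.path.join(tmp, "renamed"))
    renamed_ok = os.path.isfile(os.path.join(renamed_dir, "file.txt"))
```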
Finally, we have left:
10. How to get the size and creation time of Folders and Files in R and Python
R’s solution to this problem:
obj <- file.info(path)
# watch available variables by
str(obj)

# file size
obj$size

# is path a directory?
obj$isdir

# modification, creation, access time
obj$mtime
obj$ctime
obj$atime
For this, Python has:
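os.path.getsize(path)
os.path.getatime(path)
os.path.getctime(path) # creation time on Windows, last metadata change on Unix
os.path.getmtime(path)
# access and modification times can also be set, with os.utime()
os.path.isdir(path)

# look for methods in
dir(os)
dir(os.path)
dir(pathlib.Path)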
To imitate R’s object, we can write a class for it in our Python package:
class FileInfo:
    def __init__(self, path):
        self.path = path
        # these getters live in os.path, not in os
        self.size = os.path.getsize(path)
        self.atime = os.path.getatime(path)
        self.ctime = os.path.getctime(path)
        self.mtime = os.path.getmtime(path)
        self.isdir = os.path.isdir(path)
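The same information can also be collected with a single os.stat() call instead of five separate lookups. The sketch below shows this as an alternative, using a frozen dataclass so that the result prints nicely and cannot be modified by accident. The file_info() function name is my own choice to mirror R’s file.info() and is not part of the package as described.

```python
import os
import stat
import tempfile
from dataclasses import dataclass


@dataclass(frozen=True)
class FileInfo:
    path: str
    size: int      # size in bytes
    isdir: bool
    mtime: float   # seconds since the epoch
    ctime: float   # creation time on Windows, metadata change time on Unix
    atime: float


def file_info(path):
    """Collect size, type and timestamps of `path` with one os.stat() call."""
    st = os.stat(path)
    return FileInfo(
        path=os.fspath(path),
        size=st.st_size,
        isdir=stat.S_ISDIR(st.st_mode),
        mtime=st.st_mtime,
        ctime=st.st_ctime,
        atime=st.st_atime,
    )


with tempfile.TemporaryDirectory() as tmp:
    a_file = os.path.join(tmp, "file.txt")
    with open(a_file, "w") as fh:
        fh.write("hello")          # five bytes
    info = file_info(a_file)
    dir_info = file_info(tmp)

print(info.size, info.isdir)       # 5 False
```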
Build and Publish using poetry
First of all, we have to export the function names and class names from the package, so that they are accessible from outside the package.
For that, we open the ~/rpath/rpath/__init__.py file, which is in the same folder as the main.py file, and write the import statements there:
__version__ = '0.1.0' # this line poetry wrote
from .main import getwd, setwd, dirname, basename, file_ext
from .main import file_path, Path, list_dirs, list_files
from .main import dir_exists, file_exists, dir_create, file_create
from .main import unlink, file_copy, move, file_rename, FileInfo
Then, we have to build and publish the package:
$ poetry build
$ poetry publish
# asks for our username and password in PyPI
# so you should have an account there or register
Caution: Poetry is very strict with versioning, so every time you want to rebuild and republish (to update the package code), you have to increase the version number in the pyproject.toml file! (You can use a version number like 0.1.0-0, though, and increase only the number after the - if you want to avoid an inflation of rising version numbers.)
Once this is done and all errors are removed, we can run $ pip install rpath in a terminal from anywhere on this planet, open Python (or ipython), and have all the path handling functions available at our fingertips! Congratulations!
(The full code is listed in https://bitbucket.org/freiburgmls/rpath/src/master/ , my bitbucket workspace).