Published in CodeX

Path Handling Functions of R for Python

Unifying Python’s Path Handling Functions a la R’s Fashion.

Photo by Birgit Held from Pexels

Path handling functions in base R are built-in, clean, and intuitive, while Python's path handling functions are scattered, duplicated, and subtly different across three standard-library modules (os, pathlib, and shutil).

How about unifying them in one Python package?

Let's call this package rpath, and let's build it together while walking through R's and Python's path handling commands in this article.

Setting up the package backbone using poetry

Poetry makes creating and deploying Python packages easy and provides a high degree of reproducibility.

We install poetry using the official installer script (the older get-poetry.py script has since been retired):

$ curl -sSL https://install.python-poetry.org | python3 -

Important: Don't use $ pip install poetry. Poetry's site also warns against installing it via pip.

Check the correct installation by:

$ poetry --version

Now let's create a poetry project (the package):

$ poetry new rpath
## Created package rpath in rpath

Let’s enter the folder of the package by:

$ cd rpath

Then open the pyproject.toml file, the config file for poetry packages, with an editor of your choice, e.g. gedit (on Ubuntu):

$ gedit pyproject.toml

We write/complete the file to:

[tool.poetry]
name = "rpath"
version = "0.1.0"
description = "R's path handling functions for Python"
authors = ["Gwang-Jin Kim <gwang.jin.kim.phd@gmail.com>"]
[tool.poetry.dependencies]
python = "^3.9"
[tool.poetry.dev-dependencies]
pytest = "^5.2"
[build-system]
requires = ["poetry-core>=1.0.0"]
build-backend = "poetry.core.masonry.api"

And save the changes.

We check the toml file for correctness:

$ poetry check
## All set!

Then we create and open the main.py file with our favorite editor and write our code there.

$ YOUR_EDITOR rpath/main.py # in my case `gedit`

And we write there:

import os
import pathlib
import shutil

And now we are ready to write the functions for this package.

1. How to get/set the current directory in R and Python

In R, this would be:

getwd()
setwd("/to/dir")

In Python, we have:

# get the current directory:
os.path.abspath(os.getcwd())
# os.path.abspath() helps to deal with Windows path peculiarities
os.path.realpath('.') # eliminates symbolic links
# and replaces them with the real path
pathlib.Path.cwd()
# absolute path of the current script (not the working directory):
pathlib.Path(__file__).resolve()
# set the current directory:
os.chdir("/to/dir")
# no counterpart in pathlib or shutil

So we write into rpath/main.py:

def getwd():
    return os.path.realpath(os.path.abspath(os.getcwd()))

def setwd(path):
    return os.chdir(path)

Then we would be able to call in Python:

rpath.getwd()
rpath.setwd(path)
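As a quick check of the two wrappers, here is a minimal sketch that switches into a temporary directory and back. The temporary directory and the variable names (old_dir, inside) are illustrative assumptions, not part of the package.

```python
import os
import tempfile


def getwd():
    # Absolute current directory with symbolic links resolved
    return os.path.realpath(os.path.abspath(os.getcwd()))


def setwd(path):
    # Thin wrapper around os.chdir(); returns None
    # (R's setwd() invisibly returns the previous directory instead)
    return os.chdir(path)


old_dir = getwd()                  # remember where we started
with tempfile.TemporaryDirectory() as tmp:
    setwd(tmp)
    inside = getwd()               # now points at the temporary directory
    setwd(old_dir)                 # switch back before the directory is removed
```

Switching back before the with block ends matters: the temporary directory is deleted on exit, and a process should not be left sitting in a directory that no longer exists.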

2. How to dissect and access Path Name Components in R and Python

In R, we have:

# parent directory name
dirname(path)
# file name
basename(path)
# file extension
## note: file_ext() is not in the base package;
## it comes from the package preceding `::`.
## `tools` ships with R, while "rio" and "xfun"
## must be installed first, e.g. by `install.packages("rio")`
## returns e.g. 'csv' for files ending with '.csv'
tools::file_ext(path) # or:
rio::file_ext(path) # or:
xfun::file_ext(path)
# split a path into its components
# how to dissect path folder and file names
# "/a/b/c.txt" => "c.txt", "b", "a", then an entry for the root
# (deepest component first; wrap the call in rev() for root-first order)
split_path <- function(path) {
  if (dirname(path) %in% c(".", path)) return(basename(path))
  return(c(basename(path), split_path(dirname(path))))
}
# from here

And in Python:

# parent directory:
os.path.dirname(path)
# file name:
os.path.basename(path)
# file extension:
filename, file_extension = os.path.splitext(path)

So we can make out of it:

def dirname(path):
    return os.path.dirname(path)

def basename(path):
    return os.path.basename(path)

def file_ext(path):
    # unlike R's file_ext(), the result keeps the leading dot, e.g. ".csv"
    return os.path.splitext(path)[1]
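The R listing also splits a path into its components, which the Python listing does not cover. Below is a minimal sketch of one way to do it with pathlib. The names split_path and file_ext_r are hypothetical helpers, not existing rpath functions, and PurePosixPath is used here only so that the example behaves the same on every operating system.

```python
import os
from pathlib import PurePosixPath


def split_path(path):
    # Hypothetical helper: "/a/b/c.txt" -> ["a", "b", "c.txt"]
    # PurePosixPath.parts yields ("/", "a", "b", "c.txt"); drop the root entry
    parts = PurePosixPath(path).parts
    return [p for p in parts if p != "/"]


def file_ext_r(path):
    # Hypothetical variant that drops the leading dot, like R's tools::file_ext()
    return os.path.splitext(path)[1].lstrip(".")


print(split_path("/a/b/c.txt"))   # ['a', 'b', 'c.txt']
print(file_ext_r("/a/b/c.txt"))   # txt
```

For paths of the running system, pathlib.PurePath would be the natural choice instead of PurePosixPath.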

3. How to join Path Components in R and Python

In R, we have:

file.path("/a", "b", "file.txt") ## "/a/b/file.txt"

And in Python, we have these options:

os.path.join("/dir", "dir", "file.txt")                                                
pathlib.Path("/dir") / "dir" / "file.txt"
pathlib.PurePath("/dir", "dir", "file.txt")
# make Path() objects to string using `str()`
# or their `.as_posix()` method
# they all create "/dir/dir/file.txt"

So to have the file.path() in Python, we do:

def file_path(*args):
    return os.path.join(*args)

But I like the / overloading too …

Path = pathlib.Path

So then, we would be able to use:

rpath.file_path("/a", "b", "c.txt") ## "/a/b/c.txt"

But also:

str(rpath.Path("/a") / "b" / "c.txt") ## "/a/b/c.txt"
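One difference is worth knowing: R's file.path() simply pastes its arguments together with "/", whereas os.path.join() and the / operator start over at any later argument that is itself absolute. The following sketch illustrates this. It uses posixpath and PurePosixPath (the POSIX flavours of os.path and pathlib.Path) purely so that the output is identical on every platform.

```python
import posixpath
from pathlib import PurePosixPath

# Relative components are appended as expected
joined = posixpath.join("/a", "b", "c.txt")
print(joined)                                  # /a/b/c.txt

# A later absolute component discards everything before it
restarted = posixpath.join("/a", "/b", "c.txt")
print(restarted)                               # /b/c.txt

# The / operator of pathlib follows the same rule
via_operator = str(PurePosixPath("/a") / "/b" / "c.txt")
print(via_operator)                            # /b/c.txt
```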

4. How to recursively list Files in Folders in R and Python

In R this would be:

# folders
list.dirs(path = ".", full.names = TRUE, recursive = TRUE)
# files
list.files("/to/dir", pattern="\\.csv", full.names=TRUE, recursive=TRUE) ## pattern is regex

And in Python, we have:

pathlib.Path("/dir").glob("**/*.csv") 
# recursivity through wildcards `**`!

We can provide the R functions in Python by:

def list_dirs(path, pattern="*", full_names=True, recursive=True):
    # `pattern` is a glob pattern; the "**/" prefix makes the search recursive
    pattern = "**/" + pattern if recursive else pattern
    res = [p for p in pathlib.Path(path).glob(pattern) if p.is_dir()]
    if full_names:
        return res
    return [basename(p) for p in res]

def list_files(path, pattern="*", full_names=True, recursive=True):
    pattern = "**/" + pattern if recursive else pattern
    res = [p for p in pathlib.Path(path).glob(pattern) if p.is_file()]
    if full_names:
        return res
    return [basename(p) for p in res]
## here `pattern` is not a regex, but a shell-style `glob` pattern
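Since R's list.files() takes a regular expression while the functions above take a glob pattern, a regex-based variant may be useful. The following is only a sketch under that assumption: list_files_regex is a hypothetical name, and it walks the tree with os.walk() instead of glob().

```python
import os
import re
import tempfile


def list_files_regex(path, pattern=".*", full_names=True, recursive=True):
    # Hypothetical helper: `pattern` is a regular expression matched
    # against the file name only, as in R's list.files()
    regex = re.compile(pattern)
    found = []
    for root, dirs, files in os.walk(path):
        for name in files:
            if regex.search(name):
                found.append(os.path.join(root, name) if full_names else name)
        if not recursive:
            break  # os.walk() visits the top directory first, so stop after it
    return sorted(found)


# Small demonstration in a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "sub"))
    for rel in ("a.csv", "b.txt", os.path.join("sub", "c.csv")):
        open(os.path.join(tmp, rel), "w").close()
    csv_all = list_files_regex(tmp, r"\.csv$", full_names=False)
    csv_top = list_files_regex(tmp, r"\.csv$", full_names=False, recursive=False)

print(csv_all)   # ['a.csv', 'c.csv']
print(csv_top)   # ['a.csv']
```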

5. How to test for Existence of Files and Folders in R and Python

In R, this is solved by:

# folders
dir.exists("/to/dir") # or file.exists("/to/dir")
# files
file.exists("/to/file.txt")

In Python, we use one of those:

# for both:
pathlib.Path("/to/dir/or/file.txt").exists()
# for folders
pathlib.Path("to/dir").is_dir()
os.path.isdir("to/dir")
# for files
pathlib.Path("to/file.txt").is_file()
os.path.isfile("to/file.txt")
# there is also
os.path.islink(path)

So we can summarize to:

def dir_exists(path):
    return os.path.isdir(path)

def file_exists(path):
    return os.path.isfile(path)
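Note that R's file.exists() also returns TRUE for folders, while the file_exists() above is limited to regular files. If the R behaviour is wanted, a third helper could cover both cases. In the sketch below, path_exists is a hypothetical name and not part of the package as written.

```python
import os
import tempfile


def dir_exists(path):
    return os.path.isdir(path)


def file_exists(path):
    # Regular files only, as defined in the article
    return os.path.isfile(path)


def path_exists(path):
    # Hypothetical addition: True for files and folders alike,
    # closer to what R's file.exists() reports
    return os.path.exists(path)


with tempfile.TemporaryDirectory() as tmp:
    checks = (dir_exists(tmp), file_exists(tmp), path_exists(tmp))

print(checks)   # (True, False, True)
```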

6. How to create new Folders and Files in R and Python

In R, we use:

# folders
dir.create("/to/new/dir", recursive=TRUE)
# files
file.create("/to/file.txt")

Which corresponds in Python to:

pathlib.Path("/to/new/dir").mkdir(exist_ok=True, parents=True)
os.makedirs("/to/new/dir", exist_ok=True)
# create a file (touch)
pathlib.Path("/to/new/file.txt").touch()

So we adopt R's syntax with:

def dir_create(path, recursive=True):
    if recursive:
        # also creates missing parent folders
        os.makedirs(path, exist_ok=True)
    else:
        pathlib.Path(path).mkdir(exist_ok=True)
    print(f"Created {path}")

def file_create(path):
    pathlib.Path(path).touch()
    print(f"Created {path}")

7. How to (recursively) delete Files and Folders in R and Python

In R, we delete by the unlink() function:

# folders
unlink("/to/dir", recursive=TRUE)
# files
unlink("/to/file.txt")

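Python has unlink(), too, but only for files:

# for files and symbolic links (not for folders)
os.unlink("/to/file.txt")
pathlib.Path("/to/file.txt").unlink(missing_ok=True) # missing_ok needs Python 3.8+
# rmdir (however, I never liked rmdir - it deletes only empty folders)
os.rmdir(dir_path)
# for folders together with their contents
shutil.rmtree("/to/dir")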

So, this is easy:

def unlink(path):
    os.unlink(path)
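os.unlink() removes only files and symbolic links, so the wrapper above does not cover R's unlink(path, recursive=TRUE). A minimal sketch of a recursive variant follows. The name unlink_recursive is hypothetical, and raising an error for a folder without recursive=True is a design choice of this sketch (R silently leaves the folder in place).

```python
import os
import shutil
import tempfile


def unlink_recursive(path, recursive=False):
    # Hypothetical helper, loosely modelled on R's unlink(path, recursive=TRUE)
    if os.path.isdir(path) and not os.path.islink(path):
        if not recursive:
            raise IsADirectoryError(f"{path} is a directory; pass recursive=True")
        shutil.rmtree(path)      # removes the folder and everything below it
    elif os.path.lexists(path):
        os.unlink(path)          # regular file or symbolic link
    # a missing path is silently ignored


# Demonstration in a throwaway directory
base = tempfile.mkdtemp()
nested = os.path.join(base, "a", "b")
os.makedirs(nested)
open(os.path.join(nested, "file.txt"), "w").close()
unlink_recursive(os.path.join(base, "a"), recursive=True)
remaining = os.listdir(base)     # the "a" subtree is gone
unlink_recursive(base, recursive=True)
```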

8. How to copy Folders with contents and Files in R and Python

In R, we can do:

# copy folder architecture recursively (not keeping original dates)
dir.create('/to/dir', recursive=TRUE) # subsequently followed by:
file.copy("/from/dir/or/file.txt", "/to/dir", recursive=TRUE, copy.date=FALSE)

In Python, we have:

# copy a folder recursively
shutil.copytree(src="/from/dir", dst="/to/dir",
                symlinks=False,
                ignore=None,
                copy_function=shutil.copy2,
                ignore_dangling_symlinks=False,
                dirs_exist_ok=False) # `dirs_exist_ok` needs Python 3.8+
# copy a file
shutil.copyfile("/from/file.txt",
                "/to/file.txt", # must be a complete file path, not a folder
                follow_symlinks=True)
shutil.copy("/to/file.txt", "/new/dir/file.txt")
# only Python3.8+
# in <Python3.8 put `str()` around Path() objects! or use `.as_posix()`!

Implementing R's file.copy() seems to be tricky.

Let's implement an easy version covering the most frequent cases.

If the from path is a directory, we want to recursively copy all files in the folder to the new directory. For that, we can use shutil.copytree().

If the from path is a file, we use shutil.copyfile(). However, we probably want to allow an existing directory as the target, too. That we can check using dir_exists() written above.

def file_copy(_from, _to):
    if dir_exists(_from):
        shutil.copytree(src=_from, dst=_to,
                        symlinks=False,
                        ignore=None,
                        copy_function=shutil.copy2,
                        ignore_dangling_symlinks=False,
                        dirs_exist_ok=False)
    elif file_exists(_from):
        if dir_exists(_to):
            # target is an existing folder: keep the original file name
            _to = file_path(_to, basename(_from))
        shutil.copyfile(_from, _to,
                        follow_symlinks=True)
    else:
        raise FileNotFoundError(_from)

9. How to move and rename Folders and Files in R and Python

Admittedly, there is no real mv equivalent in R, except for single files.

# move a folder
file.copy(from = "/from/dir", to = "/to/dir",
          overwrite = TRUE,
          recursive = TRUE,
          copy.mode = TRUE)
unlink("/from/dir", recursive = TRUE)
# so this is not actually a mv in the sense of unix commands,
# but a recursive copy followed by a recursive delete
# move a file
file.rename("from", "to")
# or one could also first copy the file and then unlink() the original one.

Here, Python has shutil.move(), which is the equivalent of the Unix mv command:

# move a folder recursively
shutil.move(src="/from/dir", dst="/new/dir", copy_function=shutil.copy2)
# move files:
os.rename("/to/file.txt", "/new/dir/file.txt") # needs the full target path
pathlib.Path("/to/file.txt").rename("/new/dir/file.txt")
shutil.move(src="/to/file.txt", dst="/new/dir")

So this might be the only functionality where we prefer Python's move() over R's file.rename(). Still, we provide both names:

def move(_from, _to, copy_function=shutil.copy2):
    return shutil.move(src=_from, dst=_to,
                       copy_function=copy_function)

def file_rename(_from, _to):
    return shutil.move(src=_from, dst=_to)

Finally, we have left:

10. How to get the size and creation time of Folders and Files in R and Python

R’s solution to this problem:

obj <- file.info(path)
# watch available variables by
str(obj)
# file size
obj$size
# is path a directory?
obj$isdir
# modification, creation, access time
obj$mtime
obj$ctime
obj$atime

For this, Python has:

os.path.getsize(path)
os.path.getatime(path)
os.path.getctime(path)
os.path.getmtime(path)
# access and modification times can also be set, with os.utime()
os.path.isdir(path)
# look for methods in
dir(os)
dir(os.path)
dir(pathlib.Path)

To imitate R’s object, we can write a class for it in our Python package:

class FileInfo:

    def __init__(self, path):
        self.path = path
        # the getters live in os.path, not in os
        self.size = os.path.getsize(path)
        self.atime = os.path.getatime(path)
        self.ctime = os.path.getctime(path)
        self.mtime = os.path.getmtime(path)
        self.isdir = os.path.isdir(path)
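The same information can also be gathered with a single os.stat() call instead of four separate getters. The following is only a sketch of that alternative: FileInfoStat and file_info are hypothetical names, not part of the package as written.

```python
import os
import tempfile
from dataclasses import dataclass


@dataclass
class FileInfoStat:
    # Hypothetical variant of FileInfo filled from one os.stat() call
    path: str
    size: int
    atime: float
    ctime: float
    mtime: float
    isdir: bool


def file_info(path):
    st = os.stat(path)
    return FileInfoStat(
        path=path,
        size=st.st_size,
        atime=st.st_atime,
        ctime=st.st_ctime,   # metadata change time on Unix, creation time on Windows
        mtime=st.st_mtime,
        isdir=os.path.isdir(path),
    )


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "demo.txt")
    with open(target, "w") as fh:
        fh.write("hello")
    info = file_info(target)

print(info.size, info.isdir)   # 5 False
```

A dataclass also gives a readable repr for free, which comes close to what str(obj) shows in R.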

Build and Publish using poetry

First of all, we have to export the function and class names so that they are accessible from outside the package.

For that, we open the ~/rpath/rpath/__init__.py file, which is in the same folder as the main.py file, and write the import statements there:

__version__ = '0.1.0' # this line poetry wrote
from .main import getwd, setwd, dirname, basename, file_ext
from .main import file_path, Path, list_dirs, list_files
from .main import dir_exists, file_exists, dir_create, file_create
from .main import unlink, file_copy, move, file_rename, FileInfo

Then, we have to build and publish the package:

$ poetry build
$ poetry publish
# asks for our username and password in PyPI
# so you should have an account there or register

Caution: Poetry is very strict with versioning, so every time you want to rebuild and republish (to update the package code), you have to increase the version number in the pyproject.toml file! (You can use the version number 0.1.0-0, though, and increase only the number after the hyphen if you want to avoid inflating the version numbers.)

Once this is done and all errors are removed, we can type $ pip install rpath into a terminal anywhere on this planet, open Python (or ipython), and have all the path handling functions at our fingertips! Congratulations!

(The full code is listed in https://bitbucket.org/freiburgmls/rpath/src/master/ , my bitbucket workspace).
