Photo by Eric Prouzet on Unsplash

How to broadcast a tokenizer and use it with UDFs

In this short post, I will show how to use a Hugging Face tokenizer. Before going into the details of applying a tokenizer to a DataFrame using PySpark, let us first look at some simple tokenizer code.

The above code simply loads the pre-trained tokenizer roberta-base. It prints a dictionary that has two keys. In this post, we will focus only on input_ids, which are the token IDs corresponding to the word “hello”.

The first step in using the tokenizer on a DataFrame is to wrap it in a UDF. In the code below, we create a method tokenize which…


in python i trust — verse 4

list comprehension

is some snazzy s**t

make ma code smaller

it’s called pying it

in python i trust

in python i trust

List comprehensions provide a concise way to create lists. Common applications are to make new lists where each element is the result of some operations applied to each member of another sequence or iterable, or to create a subsequence of those elements that satisfy a certain condition.
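The two uses described in the quote can be sketched as follows. This is a minimal illustration; the names numbers, squares, evens, prices, and labels are made up for this example and do not come from the original post.

```python
numbers = [1, 2, 3, 4, 5, 6]

# New list where each element is the result of an operation on each member.
squares = [n * n for n in numbers]

# Subsequence of the elements that satisfy a condition.
evens = [n for n in numbers if n % 2 == 0]

# Works on any iterable, for example dictionary items.
prices = {"apple": 3, "pear": 5}
labels = [f"{name}={price}" for name, price in prices.items()]

print(squares)  # [1, 4, 9, 16, 25, 36]
print(evens)    # [2, 4, 6]
print(labels)   # ['apple=3', 'pear=5']
```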


As per the above, a list comprehension can be used when you are creating a list from another sequence or iterable (e.g. a list, dictionary items, etc.) …

Photo by Timothy Dykes on Unsplash

in python i trust — verse 1

if you got a list

or gotta dictionary

add them up now

that I call some witchery

in python i trust

in python i trust

This is about simple list and dictionary addition.

The above prints [1, 2, 3, 4, 5], which is the result of merging two lists together.

Starting with Python 3.9, dictionaries can be merged too, using the | operator.
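The snippet that produced this output is not shown here. One way to get the same result, assuming the two lists were [1, 2, 3] and [4, 5] (the split is a guess for illustration), is the + operator:

```python
first = [1, 2, 3]   # assumed contents, not taken from the original post
second = [4, 5]     # assumed contents, not taken from the original post

# The + operator builds a new list; neither input is modified.
merged = first + second

print(merged)  # [1, 2, 3, 4, 5]
```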

>>> d = {'spam': 1, 'eggs': 2, 'cheese': 3}
>>> e = {'cheese': 'cheddar', 'aardvark': 'Ethel'}
>>> d | e
{'spam': 1, 'eggs': 2, 'cheese': 'cheddar', 'aardvark': 'Ethel'}
>>> e | d
{'cheese': 3, 'aardvark': 'Ethel', 'spam': 1, 'eggs': 2}
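On interpreters older than 3.9, a similar merge can be sketched with dictionary unpacking. This is an alternative technique, not the | operator itself; as with d | e, the value from the right-hand dictionary wins when a key appears in both.

```python
d = {'spam': 1, 'eggs': 2, 'cheese': 3}
e = {'cheese': 'cheddar', 'aardvark': 'Ethel'}

# Unpack both dictionaries into a new one; later keys overwrite earlier ones.
merged = {**d, **e}

print(merged)
# {'spam': 1, 'eggs': 2, 'cheese': 'cheddar', 'aardvark': 'Ethel'}
```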



Using a Python decorator to do random sampling

This post is mainly educational: it brings a few concepts together. I was wondering how I could use Python decorators with a specific data science concept, and the idea for this post was born. Let us first look at how random sampling is done.

Random Sampling

Inverse transform sampling is one of the methods for generating random samples from some well-known distributions. It takes a uniform sample u between 0 and 1 and returns the largest number x from the distribution P(X) such that the probability of X falling below x is less than or equal to u.
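As a minimal sketch of the idea, consider the exponential distribution, whose CDF F(x) = 1 - exp(-rate * x) can be inverted by hand to x = -ln(1 - u) / rate. The function name sample_exponential and its parameters are chosen for this illustration and are not from the original post.

```python
import math
import random


def sample_exponential(rate, n, seed=None):
    """Draw n samples from Exponential(rate) by inverse transform sampling."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        u = rng.random()  # uniform sample in [0, 1)
        # Invert the CDF F(x) = 1 - exp(-rate * x) to get x = -ln(1 - u) / rate.
        samples.append(-math.log(1.0 - u) / rate)
    return samples


samples = sample_exponential(rate=2.0, n=100_000, seed=0)
mean = sum(samples) / len(samples)
print(mean)  # should be close to the theoretical mean 1 / rate = 0.5
```

The same recipe works for any distribution whose CDF has a closed-form inverse; only the line that maps u to x changes.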

Why are they not the same?

Two different sets of coffee beans
Photo by Coffee Geek on Unsplash.

“The DecoratorPattern is a pattern described in the DesignPatternsBook. It is a way of apparently modifying an object’s behavior, by enclosing it inside a decorating object with a similar interface.

This is not to be confused with PythonDecorators, which is a language feature for dynamically modifying a function or class.” — Wiki.Python

Python has a feature called decorators, which should not be confused with the Decorator design pattern. In this article, I will go through examples of both the Decorator Pattern and Python decorators to help distinguish the two.
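To make the distinction concrete, here is a small sketch of each. The classes Coffee and WithMilk and the functions shout and greet are hypothetical names invented for this illustration; they are not the examples used in the article.

```python
import functools


class Coffee:
    """Plain component with a cost() interface."""

    def cost(self):
        return 2.0


class WithMilk:
    """Decorator Pattern: wraps an object and exposes the same interface."""

    def __init__(self, coffee):
        self._coffee = coffee

    def cost(self):
        # Delegate to the wrapped object, then add behavior on top.
        return self._coffee.cost() + 0.5


def shout(func):
    """Python decorator: takes a function and returns a modified function."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        return func(*args, **kwargs).upper()

    return wrapper


@shout
def greet(name):
    return f"hello {name}"


print(WithMilk(Coffee()).cost())  # 2.5
print(greet("world"))             # HELLO WORLD
```

The first half modifies behavior at the object level, at run time, by composition. The second half uses the @ syntax, which rebinds the name greet to the wrapper when the function is defined.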

Decorator Pattern

So let’s focus on the definition above: “It is a way…

Photo by Miltiadis Fragkidis on Unsplash

How to modify a function with multiple parametrized decorators?

A decorator object is a way to modify the functionality of a decorated object. There are various ways to implement a decorator in Python. In this post, we will discuss a few of them and show how multiple decorators can be chained together to enhance the functionality of an object. In Python, functions and methods are just objects, so we will look at both classes and methods to implement decorators.
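As a rough sketch of what chaining parametrized decorators can look like, the example below stacks two of them on one function. The decorators multiply_by and offset_by are hypothetical names made up for this illustration, not the ones used in the post.

```python
import functools


def multiply_by(factor):
    """Parametrized decorator: scales the wrapped function's result."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            return func(*args, **kwargs) * factor

        return wrapper

    return decorator


def offset_by(amount):
    """Parametrized decorator: adds a constant to the wrapped function's result."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            return func(*args, **kwargs) + amount

        return wrapper

    return decorator


# Decorators apply bottom-up: offset_by wraps add first, then multiply_by wraps that.
@multiply_by(10)
@offset_by(1)
def add(a, b):
    return a + b


print(add(2, 3))  # (2 + 3 + 1) * 10 = 60
```

Swapping the order of the two decorators would give (2 + 3) * 10 + 1 = 51, so the stacking order is part of the behavior.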

Simple Decorator

Before we jump into nesting, let us look at a simple decorator. I will first start with a simple add method.

The above method just takes two arguments a and…

Photo by Kelly Sikkema on Unsplash

Finding the frequency of occurrence of unique combinations of items

To understand frequent itemsets, one first needs to understand what "frequent" and "itemsets" mean. Let us first look at itemsets. Simply put, itemsets are groups of items that appear together in a transaction or record. The size of a group can range from 1 up to the number of all items within that transaction or record. Even size 0 could be considered, but that would not produce anything meaningful.
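A minimal sketch of enumerating the itemsets of one record, from size 1 up to the full record, using itertools.combinations. The record contents and the helper name itemsets are hypothetical and chosen only for this illustration.

```python
from itertools import combinations


def itemsets(record):
    """Return every non-empty itemset (combination of items) in a record."""
    items = sorted(set(record))  # de-duplicate and fix an order
    result = []
    for size in range(1, len(items) + 1):
        result.extend(combinations(items, size))
    return result


record = ["bread", "milk", "eggs"]  # hypothetical record
for itemset in itemsets(record):
    print(itemset)
```

A record with n distinct items yields 2**n - 1 non-empty itemsets, so three items give seven.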

Itemsets or Powerset

Let us dig deeper with some code on itemsets. Let us start with one example of a record.

This record only has three items apple…

Photo by Siora Photography on Unsplash

confidential, credentialed, and public

Since the publication of OAuth 2.0 in 2012, various RFCs have been published to extend the existing protocol as well as to highlight security issues with grant types. The post "It’s time for OAuth 2.1" sums it up very well.

If you want to implement a secure OAuth solution today, it requires reading: RFC 6749 (OAuth 2.0 Core), RFC 6750 (Bearer Tokens), RFC 6819 (Threat Model and Security Considerations), RFC 8252 (OAuth for Native Apps), RFC 8628 (Device Grant), OAuth for Browser-Based Apps, OAuth 2.0 Security Best Current Practice, RFC 7009 (Token Revocation), RFC 8414 (Authorization Server Metadata), and if you’re also implementing…

Salil Jain
