When you start writing unit tests for your project, you will probably need to understand unittest.mock in Python. Imagine that you are building a library that interacts with Google Spreadsheet and you want to test it. Do you need to connect to Google Spreadsheet for every test? That sounds really time-consuming. And what if your project gets bigger and bigger? The number of HTTP connections is going to be huge! But don’t worry, you don’t need to speed up your Wi-Fi or call Google to say your requests aren’t a DoS attack :)
In this post, I’ll explain unittest.mock, a built-in library for testing in…
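To make the idea concrete, here is a minimal sketch of replacing a network client with a mock. The count_rows function and the fetch_rows method are hypothetical names I made up for illustration; they are not part of any real Google Spreadsheet library.

```python
from unittest import mock


def count_rows(client, sheet_name):
    """Return how many rows a sheet has, using whatever client is passed in."""
    rows = client.fetch_rows(sheet_name)  # hypothetical method; an HTTP call in real life
    return len(rows)


# A Mock stands in for the real client, so no network connection is made.
fake_client = mock.Mock()
fake_client.fetch_rows.return_value = [["a", 1], ["b", 2], ["c", 3]]

row_count = count_rows(fake_client, "Sheet1")

# The mock records how it was called, so the interaction can be verified.
fake_client.fetch_rows.assert_called_once_with("Sheet1")
```

The test never touches the network: the mock returns canned rows and lets you check that the code under test asked for the right sheet.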
For an NLP task, you might need to tokenize text or build a vocabulary during pre-processing. And you have probably experienced pre-processing code that is as messy as your desk. Forgive me if your desk is clean :) I have had that experience too. That’s why I created LineFlow to ease your pain! It will make your “desk” as clean as possible. What does the real code look like? Take a look at the figure below. The pre-processing includes tokenization, building the vocabulary, and indexing.
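For reference, the three pre-processing steps named above can be sketched in plain Python. This is not LineFlow's API; the sample texts, the whitespace tokenizer, and the reserved "<pad>" and "<unk>" entries are all assumptions chosen to keep the example small.

```python
from collections import Counter

texts = ["I love NLP", "I love clean code"]  # toy data for illustration

# 1. Tokenization: a naive lowercase whitespace split (real tokenizers are smarter).
tokenized = [text.lower().split() for text in texts]

# 2. Vocabulary: frequent tokens get smaller ids; ids 0 and 1 are reserved.
counts = Counter(token for tokens in tokenized for token in tokens)
vocab = {"<pad>": 0, "<unk>": 1}
for token, _ in counts.most_common():
    vocab[token] = len(vocab)


# 3. Indexing: map each token to its id, falling back to "<unk>".
def to_ids(tokens):
    return [vocab.get(token, vocab["<unk>"]) for token in tokens]


indexed = [to_ids(tokens) for tokens in tokenized]
```

Even in this tiny form, the bookkeeping is spread over several loops, which is the kind of clutter a dataset library is meant to hide.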
You can write the function below to use multiprocessing with a lambda function:
When you handle tons of text files or images, you might want to use multiprocessing to speed up the processing. An intuitive way to do it in Python looks like this:
import multiprocessing

with multiprocessing.Pool() as p:
    result = p.map(lambda x: x ** 2, range(100))
But unfortunately, this won’t work because you cannot pass a lambda function or a closure to multiprocessing. You can find the reason in the Why? section below.
When you google this problem, you’ll find people suggesting joblib, pathos, or something like that…
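The short version of the reason is that multiprocessing.Pool sends the function to its worker processes with pickle, and pickle stores a function by its importable name. A lambda has no such name. The sketch below only checks picklability and does not start a pool; is_picklable and square are helper names I introduce for illustration.

```python
import pickle


def square(x):
    """A module-level function; pickle can refer to it by name."""
    return x ** 2


def is_picklable(obj):
    """Return True if pickle can serialize obj, False otherwise."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


# A lambda cannot be looked up by name, so pickling it fails.
lambda_ok = is_picklable(lambda x: x ** 2)
```

With the standard library alone, the usual workaround is to pass a named module-level function such as square to p.map instead of the lambda.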
The goal of the shortest paths problem is to find the shortest path from a starting vertex to a goal vertex. Algorithms for this problem are used widely, from competitive programming to Google Maps directions search. Once you understand the key notion, “edge relaxation”, it is much easier to understand the concrete algorithms, such as Dijkstra’s algorithm or the Bellman-Ford algorithm. In other words, it might be difficult to make these algorithms your own without understanding edge relaxation. In this post, I focus on edge relaxation and explain the general structure for solving the shortest paths problem. Also, we’ll…
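As a rough illustration of edge relaxation, here is a minimal sketch. It repeats relaxation over every edge in Bellman-Ford order and assumes the graph has no negative cycles; the function names and the edge-list representation are my own choices for this example.

```python
import math


def relax(u, v, weight, dist, prev):
    """Relax edge (u, v): keep the shorter of the known path and the path via u."""
    if dist[u] + weight < dist[v]:
        dist[v] = dist[u] + weight
        prev[v] = u
        return True
    return False


def shortest_paths(vertices, edges, source):
    """Relax all edges repeatedly until no distance improves."""
    dist = {v: math.inf for v in vertices}
    prev = {v: None for v in vertices}
    dist[source] = 0
    for _ in range(len(vertices) - 1):
        changed = False
        for u, v, w in edges:
            if relax(u, v, w, dist, prev):
                changed = True
        if not changed:  # nothing improved, so the distances are final
            break
    return dist, prev


edges = [("s", "a", 4), ("s", "b", 1), ("b", "a", 2), ("a", "t", 1)]
dist, prev = shortest_paths(["s", "a", "b", "t"], edges, "s")
```

The concrete algorithms differ mainly in the order in which they pick edges to relax; the relax step itself stays the same.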
Do you happen to know the library AllenNLP? If you’re working on Natural Language Processing (NLP), you might have heard the name. However, I guess only a few people actually use it. Others have tried it before but didn’t know where to start because there are lots of functions. For those who aren’t familiar with AllenNLP, I will give a brief overview of the library and explain the advantages of integrating it into your project.
AllenNLP is a deep learning library for NLP. Allen Institute for Artificial Intelligence, which is one of the leading research organizations of Artificial…
Just copy the alias below and paste it into your .bashrc, .zshrc, or some other configuration file:
alias ipy="ipython --no-confirm-exit --no-banner --quick --InteractiveShellApp.extensions=\"['autoreload']\" --InteractiveShellApp.exec_lines=\"['%autoreload 2', 'import os,sys']\""
I often use IPython to develop my library or do some research, because IPython has really great features, as follows:
I think you already know that IPython provides a good Python interpreter, but you also know that you…
There are two fundamental ways to search a graph: breadth-first search (BFS) and depth-first search (DFS). In this post, I’ll explain the depth-first search. Here, I focus on the relation between the depth-first search and a topological sort. A topological sort is closely related to dynamic programming, which you should know when you tackle competitive programming. For the implementation, I use Python. If you’d like to learn about the breadth-first search, check my other post: Understanding the Breadth-First Search with Python.
In the depth-first search, we visit vertices until we reach a dead end in which we cannot find…
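To show how the depth-first search relates to a topological sort, here is a minimal recursive sketch. It assumes the input is a directed acyclic graph given as an adjacency-list dict; the function name and the sample graph are mine, and very deep graphs would hit Python's recursion limit.

```python
def topological_sort(graph):
    """Return the vertices of a DAG in topological order using DFS."""
    visited = set()
    order = []

    def dfs(u):
        visited.add(u)
        for v in graph.get(u, []):
            if v not in visited:
                dfs(v)
        # u is finished only after everything reachable from it is finished.
        order.append(u)

    for u in graph:
        if u not in visited:
            dfs(u)
    # Reversing the finish order puts every vertex before its successors.
    return order[::-1]


graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
order = topological_sort(graph)
```

Each vertex and each edge is handled once, so the running time is O(V + E).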
Today I will explain the heap, which is one of the basic data structures. Famous search algorithms like Dijkstra's algorithm and A* use the heap. A* can appear in the Hidden Markov Model (HMM), which is often applied to time-series pattern recognition. Please note that this post isn’t about search algorithms. I’ll explain how a heap works, its time complexity, and its Python implementation. The MIT OpenCourseWare lecture really helped me understand the heap, so I follow that lecture's way of explaining, but I summarize a little and add some Python implementations…
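As a small preview of what a heap implementation can look like, here is a sketch of an array-based max-heap. Choosing a max-heap, the function names, and the sample data are my assumptions for this example; Python's built-in heapq module provides a min-heap instead.

```python
def max_heapify(a, i, size):
    """Sift a[i] down until the subtree rooted at i is a max-heap. O(log n)."""
    while True:
        left, right = 2 * i + 1, 2 * i + 2
        largest = i
        if left < size and a[left] > a[largest]:
            largest = left
        if right < size and a[right] > a[largest]:
            largest = right
        if largest == i:
            return
        a[i], a[largest] = a[largest], a[i]
        i = largest


def build_max_heap(a):
    """Turn a list into a max-heap in place, bottom-up. O(n)."""
    for i in range(len(a) // 2 - 1, -1, -1):
        max_heapify(a, i, len(a))


def heap_sort(a):
    """Sort a list in place by repeatedly moving the maximum to the end. O(n log n)."""
    build_max_heap(a)
    for end in range(len(a) - 1, 0, -1):
        a[0], a[end] = a[end], a[0]
        max_heapify(a, 0, end)


data = [4, 1, 3, 2, 16, 9, 10, 14, 8, 7]
build_max_heap(data)
```

After build_max_heap, the largest item sits at index 0 and every parent is at least as large as its children.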
There are two basic graph search algorithms: one is the breadth-first search (BFS) and the other is the depth-first search (DFS). Today I focus on the breadth-first search. Breadth-first search is one of the essential search algorithms for competitive programming. In this post, I’ll explain how to implement the breadth-first search and its time complexity. Please note that I don’t explain how to use it in competitive programming, but this material is useful there. I use Python for the implementation. …
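For a quick picture of the implementation, here is a minimal sketch of the breadth-first search with a queue. It assumes the graph is an adjacency-list dict; the function name and the sample graph are made up for this example.

```python
from collections import deque


def bfs(graph, start):
    """Return the number of edges on a shortest path from start to each reachable vertex."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()  # O(1) with deque, unlike list.pop(0)
        for v in graph.get(u, []):
            if v not in dist:  # first visit is along a shortest path
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"], "e": []}
distances = bfs(graph, "a")
```

Every vertex enters the queue at most once and every edge is examined once, which gives O(V + E) time.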
I often use Python, but I haven’t really cared about how Python works internally. So today I focus on the Python list and explain its internal implementation. Python’s list.append and list.pop change the list size dynamically, yet they run fast, in amortized O(1) time. Please note that list.pop takes constant time only for the last item. In this post, I’ll show why this is possible.
We call a data structure like the Python list a dynamic array, and we call a normal array a static array. This post is structured as follows.
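To get a feeling for why appending can be cheap on average, here is a toy growable array that doubles its capacity when it is full. This is a simplified model, not CPython's code: doubling is an assumption for the sake of the example (CPython's real growth pattern is different), and the class and attribute names are mine.

```python
class DynamicArray:
    """A toy growable array that doubles its capacity when full."""

    def __init__(self):
        self._capacity = 1
        self._size = 0
        self._data = [None] * self._capacity  # stands in for a fixed-size block
        self.copies = 0  # total elements copied during resizes

    def __len__(self):
        return self._size

    def append(self, item):
        if self._size == self._capacity:
            self._resize(2 * self._capacity)
        self._data[self._size] = item
        self._size += 1

    def pop(self):
        """Remove and return the last item in O(1)."""
        if self._size == 0:
            raise IndexError("pop from empty array")
        self._size -= 1
        item = self._data[self._size]
        self._data[self._size] = None
        return item

    def _resize(self, capacity):
        new_data = [None] * capacity
        for i in range(self._size):
            new_data[i] = self._data[i]
            self.copies += 1
        self._data = new_data
        self._capacity = capacity


arr = DynamicArray()
for i in range(1000):
    arr.append(i)
```

Over the 1000 appends, the total number of copied elements stays below 2 × 1000, so the occasional expensive resize averages out to constant work per append.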
Software engineer. My interest is in Natural Language Processing.