Python hacks for data-science.

Vinay Kudari
hacking-datascience
3 min read · Jul 21, 2018

In this story I’ll show how some tasks can be made easier and more efficient using a few built-in Python tools. I’m assuming Python 3.6.5 in all my stories from here on.

Get the ipython notebook of this story from here

Generators

Before getting to know generators, let’s first understand iterators. So what is an iterator?

An iterator in Python is simply an object that can be iterated upon.

next(iterator) returns one value at a time. When we reach the end and there is no more data to return, a StopIteration exception is raised. If an object can return an iterator (through iter(object)), that object is known as an iterable. A few examples are lists, dictionaries, tuples, and strings.

example
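Here is a minimal sketch of the idea. The list and the variable names are made up for illustration.

```python
# A list is an iterable: iter() asks it for an iterator.
numbers = [1, 2, 3]
iterator = iter(numbers)

first = next(iterator)   # 1
second = next(iterator)  # 2
third = next(iterator)   # 3

# The iterator is now exhausted, so one more next() raises StopIteration.
try:
    next(iterator)
    exhausted = False
except StopIteration:
    exhausted = True

print(first, second, third, exhausted)  # 1 2 3 True
```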

So now what is a generator?

Generators are iterators whose values are computed lazily: their code runs only when the next value is requested.

When a generator function is called, it returns a generator object without executing the function body. When we apply next( ) to that object, the function runs until it reaches a yield statement, and the yielded value is returned. The following next( ) call resumes from where the function paused.
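A small sketch of that behaviour, using a hypothetical countdown generator (the name and the print message are only for illustration):

```python
def countdown(n):
    """Yield n, n-1, ..., 1, one value per next() call."""
    print("countdown started")  # runs on the first next(), not when countdown() is called
    while n > 0:
        yield n
        n -= 1

gen = countdown(3)   # nothing is printed yet: the body has not started
first = next(gen)    # prints "countdown started", then yields 3
rest = list(gen)     # drains the remaining values

print(first, rest)   # 3 [2, 1]
```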

why?

To save memory space : When we are dealing with very large datasets that cannot be loaded into memory all at once, we can use generators to produce one item at a time.
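As a rough illustration, the sketch below compares a list with a generator expression that produces the same values. The one million items are an arbitrary choice, and sys.getsizeof only measures the container object itself, so treat the numbers as indicative rather than exact.

```python
import sys

# The list stores all one million results at once.
squares_list = [n * n for n in range(1_000_000)]

# The generator expression stores only its current state.
squares_gen = (n * n for n in range(1_000_000))

list_bytes = sys.getsizeof(squares_list)
gen_bytes = sys.getsizeof(squares_gen)

# The generator can still be consumed by anything that iterates, such as sum().
total = sum(squares_gen)

print(list_bytes > gen_bytes)  # True
```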

zip( )

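zip( ) takes iterables as arguments and returns a zip object, an iterator of tuples in which the i-th tuple holds the i-th item from each iterable. It stops as soon as the shortest iterable runs out.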

syntax

zip(iterable1, iterable2, …)

example
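A short sketch with two made-up lists:

```python
names = ["apples", "bananas", "cherries"]
prices = [1.5, 0.25, 4.0]

zipped = zip(names, prices)  # a lazy zip object, nothing is paired yet
pairs = list(zipped)         # consuming it builds the tuples

print(pairs)  # [('apples', 1.5), ('bananas', 0.25), ('cherries', 4.0)]

# A zip object is an iterator, so it can only be consumed once.
leftover = list(zipped)
print(leftover)  # []
```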

To unzip the zipped object we can use zip(*zipped_object), which unpacks the tuples and passes them to zip( ) as separate arguments.

example
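For instance, starting from a made-up list of pairs:

```python
pairs = [("apples", 1.5), ("bananas", 0.25), ("cherries", 4.0)]

# The * operator spreads the three tuples out as three arguments to zip().
names, prices = zip(*pairs)

print(names)   # ('apples', 'bananas', 'cherries')
print(prices)  # (1.5, 0.25, 4.0)
```

Note that the results come back as tuples, not lists.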

List Comprehension

They are used to create new lists quickly from other iterables.

syntax

[expression for item in iterable if condition]

note : the if condition is optional, and the expression itself can be a conditional expression (a if test else b)

example
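A sketch of a comprehension with a filtering condition at the end (the numbers are arbitrary):

```python
numbers = range(10)

# Keep only the even numbers, then square each one.
even_squares = [n * n for n in numbers if n % 2 == 0]

print(even_squares)  # [0, 4, 16, 36, 64]
```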

example
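A sketch of a comprehension where the condition sits in the expression instead, so every item is kept but mapped to a different value:

```python
# The conditional expression picks a label for each number; nothing is filtered out.
labels = ["even" if n % 2 == 0 else "odd" for n in range(5)]

print(labels)  # ['even', 'odd', 'even', 'odd', 'even']
```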

why?

compact and faster : a comprehension fits on one line and usually runs faster than the equivalent for loop that appends to a list 😛

Lambda function

A lambda function doesn't have a name and is sometimes called an anonymous function. It can take arguments like a normal function but can contain only a single expression. When it is called, the expression is evaluated and the result is returned, which can be assigned to any variable.

syntax

lambda input-arguments : expression

example
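A minimal sketch. The records list is invented for illustration.

```python
# A lambda assigned to a name behaves like a small one-line function.
square = lambda x: x * x
print(square(4))  # 16

# Lambdas are most handy as throwaway arguments, for example a sort key.
records = [("bananas", 0.25), ("cherries", 4.0), ("apples", 1.5)]
by_price = sorted(records, key=lambda record: record[1])

print(by_price)  # [('bananas', 0.25), ('apples', 1.5), ('cherries', 4.0)]
```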

why lambda function?

ease of use : when the function isn't complex, we can use a lambda function instead of defining a normal function.

map( )

map( ) takes a function and one or more iterables as arguments. It returns a map object, an iterator that yields the result of calling the function on each item.

A map object can be converted to a list using list(map-object)

syntax

map(function, iterable1, iterable2, …) 

example
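A short sketch with made-up temperature values, followed by a call that takes two iterables:

```python
celsius = [0, 25, 100]

# Apply the conversion to every item; list() consumes the lazy map object.
fahrenheit = list(map(lambda c: c * 9 / 5 + 32, celsius))
print(fahrenheit)  # [32.0, 77.0, 212.0]

# With two iterables, the function receives one item from each per call.
sums = list(map(lambda a, b: a + b, [1, 2, 3], [10, 20, 30]))
print(sums)  # [11, 22, 33]
```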

filter( )

filter( ) takes a function and an iterable, similar to map( ). It calls the function on each element and keeps only the elements for which the function returns a true value. The result is a filter object, which is also an iterator.

syntax

filter(function, iterable)

example
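A small sketch with arbitrary values:

```python
values = [-3, -1, 0, 2, 5]

# Keep only the items for which the lambda returns True.
positives = list(filter(lambda v: v > 0, values))

print(positives)  # [2, 5]
```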

why?

lazy and flexible : map( ) and filter( ) compute results only as they are consumed, so no intermediate list is built. Given multiple iterables, map( ) takes one item from each in step and passes them as distinct arguments to the function.

reduce( )

reduce( ) takes a function of two arguments and a sequence. When reduce( ) is called, the first two items of the sequence are passed to the function. The computed result, along with the next item in the sequence, is then passed to the function again. This repeats until the end of the sequence, and the final result is returned.

note : reduce( ) has to be imported from functools

syntax

functools.reduce(function, iterable)

example
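A minimal sketch that multiplies a made-up list of numbers together:

```python
from functools import reduce

numbers = [1, 2, 3, 4, 5]

# Steps: (1*2)=2, (2*3)=6, (6*4)=24, (24*5)=120
product = reduce(lambda acc, n: acc * n, numbers)

print(product)  # 120
```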

why?

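compact : reduce( ) expresses a running computation in a single call. It is not necessarily faster than an explicit loop, and for common cases such as sums the built-in sum( ) is usually the better choice.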

glob( )

glob( ), from the glob module, returns a list of path names matching the given pattern.

syntax

from glob import glob
files = glob('sales*.csv')

This returns a list of all the matching filenames, like sales_1.csv, sales_2.csv, …

why?

This is very useful when we have data spread across many files: we can loop over the list and read each file in turn.
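Here is a self-contained sketch of that loop. The file names and their contents are invented and written to a temporary folder, so the example runs anywhere. With real data you would point the pattern at your own directory.

```python
import os
import tempfile
from glob import glob

with tempfile.TemporaryDirectory() as folder:
    # Create a few hypothetical files to search through.
    for name in ("sales_1.csv", "sales_2.csv", "notes.txt"):
        with open(os.path.join(folder, name), "w") as handle:
            handle.write("region,amount\n")

    # glob() returns matches in arbitrary order, so sort for a stable result.
    files = sorted(glob(os.path.join(folder, "sales*.csv")))
    basenames = [os.path.basename(path) for path in files]

    # Read each matching file in turn.
    headers = []
    for path in files:
        with open(path) as handle:
            headers.append(handle.readline().strip())

print(basenames)  # ['sales_1.csv', 'sales_2.csv']
```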

See you again. 😃
