Python-Fleek e-book: Elevate Your Career Path, Code Your Way to Professional Eminence

Doug Creates
Published in AI Does It Better · 45 min read · Nov 26, 2023

Python-Fleek: Learning programming with ease.

“Coding is hard, rocks are hard, my head is hard. I can’t learn that quickly.” That’s how I felt before college. Growing up with limited means, dreaming big seemed like a distant reality. Yet, I held a firm belief: with self-confidence, anything is achievable. This guide isn’t just about Python; it’s an attempt to lead today’s dreamers to become tomorrow’s doers. My journey from the uncertainty of poverty to earning a Master of Science, from St. Petersburg College to the University of South Florida, and contributing to giants like Intel and Amazon, mirrors this belief. Wherever you find yourself, remove ‘can’t’ from your vocabulary. Python is the skill that bridges dreams and reality.

Witness the technological evolution as Python emerges not just as a language, but as a tool of empowerment. With AI transforming life, mastering Python becomes more than a resume highlight; it’s a career highway. Calculus, data analysis, machine learning: once obscure knowledge, now accessible with Python. This guide is your companion in demystifying these aspects. Whether you’re at the start of your career or looking to pivot, Python offers a path to not just participate in the future, but to actively create it. Python is your AI wand to “swish and flick”!

A strategy for learning

The teaching method here combines top-down and bottom-up learning to give you incremental completeness. Bottom-up is analogous to building a house: start from the foundation, then bricks, studs for vertical support, and rafters and joists for the roof. In language, the analogy is that phonemes make words, words make sentences, and sentences make essays and monologues. The problem with this approach is that it takes too long to see the final result. Top-down aligns with Socratic teaching, where we start from a question and iterate toward the building blocks. Using both will enable you to see quick progress every chapter.

Too many programmers pick up bad habits that cripple them when it matters most. For that reason, the lessons include pitfalls, system design, and a development process along the way.

In other words: read about the building blocks of each chapter, modify and run the code, fix any runtime errors, and don’t give up. The rewards are incremental success, new superpowers, and a worthwhile time. Each chapter gets better than the last. Be curious, be creative, and be competitive with yourself. People with “try” in their vocabulary have a favorable chance of success.

Python-Fleek: Birds-Eye-View of Chapter Projects

Contents

Stick around for a firehose volume of content. Chapter 1 helps you write a terminal program (no visual interface). Chapters 3 through 6 assume notebooks are used. Chapters 7 and beyond are planned, focusing on product topics including AI web apps, system architecture, conceptual design, integration, promotion, feedback, and resource planning.

1. Introduction to Programming

What is Programming? Why Choose Python? Setting Up the Environment. Installing Python. Introduction to Anaconda. Using pip for Installing Packages. Your First Python Script. Understanding Syntax and Code Structure.

2. Basics of Python Programming

Variables and Data Types. Operators. Control Structures. If Statements. Loops. Functions. Error Handling and Exceptions. Importing Modules and Libraries.

3. Data Structures in Python

Lists, Tuples, Dictionaries, Sets, and Strings.

4. File Input/Output

Reading from and Writing to Text Files. Working with Paths and Directories. Handling Different File Formats (CSV, Excel).

5. Introduction to Libraries for Data Analysis and Visualization

Introduction to NumPy, Pandas, DataFrames, and Series. Basic Data Manipulation with Pandas. Data Visualization with Matplotlib and Seaborn. Creating Plots with Matplotlib. Introduction to Statistical Libraries (SciPy, Statsmodels).

6. Machine Learning with Python

Introduction to scikit-learn. Preprocessing Data. Building and Evaluating Models. Model Selection and Tuning.

Python-Fleek: Elevate Your Career Path, Code Your Way to Professional Eminence, Chapter 1 — Definitions, applications, mediums, installation, syntax, and execution.

Chapter 1: Your First Program

1.1: Introduction to Programming

Whether you are a high school student or a PhD scientist looking for a job, Python has become the most popular programming language, making it a requirement to learn. “Coding,” as some call it, is a great way to automate the boring, repeatable parts of any task. You could analyze game data, discover a new protein, estimate something about the future, build the next Google or TikTok, or create something lame like a “fart analysis” joke app. Whatever the case, your mission needs Python. It’s a critical skill for automating tasks and creating applications.

Starting today, you are a programmer. Doing the exercises is a must. Reading alone will not help. The Chapter Challenges intend to raise questions that are answered as you progress.

1.2: Defining “Programming”

Programming means writing a set of instructions for a computer to execute an intended task. It’s like writing a recipe for a blind robotic chef. For illustration…

If the “make a pizza” instructions say only “Preheat the oven to 350F and put the pizza in the oven,” then your frozen pizza would taste like burnt cardboard. Computer instructions require all the small steps, like “take the pizza out of the wrapping, open the oven door, and measure the temperature to verify nothing is wrong.”

Executing Code: A program is a blind robot following a recipe.
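
To make the recipe analogy concrete, here is a minimal, hypothetical sketch (the function name, temperature, and steps are invented for illustration) that spells out every small step the way a blind robotic chef would need:

```python
# A hypothetical recipe written the way a computer needs it:
# every small step spelled out, nothing assumed.
def bake_frozen_pizza(oven_temp_f=425, bake_minutes=18):
    steps = [
        "Take the pizza out of the wrapping.",
        f"Preheat the oven to {oven_temp_f}F.",
        "Open the oven door and place the pizza on the center rack.",
        f"Bake for {bake_minutes} minutes.",
        "Measure the temperature to verify nothing is wrong.",
        "Remove the pizza and close the oven door.",
    ]
    for number, step in enumerate(steps, start=1):
        print(f"Step {number}: {step}")

bake_frozen_pizza()
```

Every detail the robot needs is written out explicitly; skip one step and dinner is burnt cardboard.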

For clarity, programming means writing very clear instructions and well-thought-out comments to minimize unintended effects during operation, then testing for limitations and defects, and finally patching with additional code to handle uncaught errors. As a programmer, you are strictly against monkey patching, ghost commenting, log gobbling, crash bashing, fork bombing, and rick rolling. (Real coding jargon.)

1.3: Why Python?

Python shines for its readability, versatility, and vast ecosystem of libraries. It is the go-to for various fields, including data science, web development, automation, and scientific computing. Compared to older languages, it’s easier to learn and extend with new functionality. Need proof? You will soon plot a bar chart showing evidence. Once you learn data analysis in Python, my door is open for anyone who cares to disagree. Your silence is enough to hear a shadow passing, so I’ll continue.

  1. Ease of Learning: Straightforward syntax and readability.
  2. Extensive Libraries: A rich ecosystem of libraries like numpy and streamlit.
  3. Community Support: A large and active community for problem-solving.
  4. Flexibility: Suitable for different programming paradigms.
  5. Integration: Interacts with other languages and tools.
  6. Portability: Runs on various operating systems with few changes.
  7. Efficiency: Enables rapid development and deployment.
  8. Automation: Simplifies the automation of repetitive tasks.
  9. Scalability: Suitable for small scripts as well as large systems.
  10. Machine Learning: Powers leading-edge AI with libraries like tensorflow.
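
As a tiny, hedged taste of that ease of learning (the sentence below is a made-up example), counting the most common words takes one import and a couple of lines:

```python
from collections import Counter

# Split a sentence into words and tally the two most frequent ones.
text = "python is easy and python is powerful"
top_two = Counter(text.split()).most_common(2)
print(top_two)  # [('python', 2), ('is', 2)]
```

The equivalent program in many older languages takes dozens of lines.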

1.4: Real-World Python: Diverse Domain Applications

  • Financial Analysts use Python for KPI analysis. They measure variables like profit, mean DAU, and CAGR. Arithmetic operators calculate ROI, guiding investment decisions based on ROI categorizations using if statements. The process involves analyzing financial health and guiding strategic investments, with ROI as the key metric. Libraries like numpy and pandas are commonly used for data manipulation. Python loops automate the data processing funnel.
  • Bioinformaticians employ Python in protein sequence analysis. Lists store amino acid sequences, with loops and if statements identifying key biological patterns like binding sites. The goal is to understand protein functions, aiding in drug design and genetic research. biopython is a specialized library used in this process.
  • Manufacturing Engineers utilize Python to calculate chip production yields. They use functions to process batch data, applying loops and if statements to assess yield based on defect rates. The focus is on improving manufacturing efficiency, using yield rate as the primary metric. Libraries like scipy and numpy assist in data analysis.
  • AI Developers use Python for chatbot diagnostics, focusing on response accuracy and error management. Functions assess AI responses, while error handling addresses model timeouts or data inconsistencies. This process ensures reliable and effective chatbot interactions, crucial in customer service automation. Libraries like tensorflow and pytorch are often used for AI model development and diagnostics.
  • Data Scientists in the entertainment industry use Python for developing recommendation systems. They analyze user preferences and video metadata using machine learning libraries. Loops and conditional statements generate personalized suggestions, with functions refining recommendations based on user feedback. The objective is to enhance user engagement, measured by metrics like watch time and user retention. Libraries such as pandas for data handling, scikit-learn for machine learning, and tensorflow for advanced recommendation algorithms are commonly employed.
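
The financial-analyst workflow above can be sketched in a few lines. The function name, thresholds, and figures here are illustrative assumptions, not real investment guidance:

```python
# Hypothetical sketch of the ROI workflow: arithmetic operators compute
# ROI, and if statements categorize it to guide a decision.
def categorize_roi(gain, cost):
    roi = (gain - cost) / cost
    if roi >= 0.20:
        return roi, "strong: consider scaling the investment"
    elif roi >= 0.0:
        return roi, "modest: monitor performance"
    else:
        return roi, "negative: review the strategy"

# A loop automates the processing funnel over made-up investments.
for name, gain, cost in [("Ads", 1500, 1000), ("R&D", 900, 1000)]:
    roi, advice = categorize_roi(gain, cost)
    print(f"{name}: ROI={roi:.0%} -> {advice}")
```
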
Beginner Chat Bot: The typical first program for year-one software engineers is to print a message. “Clack-clack-clack” echoes the keyboard. Enter. “Hello world!”

1.5: Project 1 — Hello World Chat Bot

This first challenge is to make a program and run it. Simple, but not easy. This is always the most challenging step because each computer has unique settings that affect you differently. Project 1 will use a terminal to execute (i.e. run, start, launch) the program, but later chapters switch to notebooks and web apps.

The terminal is the gateway to your computer’s internals, allowing you to run programs, notebooks, and web apps. Computers have executed programs in terminals since the 1960s. It’s the simplest way to run programs, but sometimes too simple, as the minimalistic text-only interface was not designed for the immersive technologies of today. Terminal app development is included in the curriculum because notebooks and web apps are also launched from a terminal.

Notebooks (Jupyter, Kaggle Code, and Google Colab) help developers write code and see immediate changes without restarting the program. Notebooks offer an alternative to IDEs (integrated development environments, such as PyCharm and JupyterLab), the preferred code development tools for engineering. IDEs help write, test, and debug complex code with a family of integrated tools, but notebooks are preferred for prototyping, exploration, and research because of their flexibility and customization. Chapter 3 introduces notebooks for debugging (to fix code) and benchmarking (to measure speed) because of their web features. Chapter 4 discusses notebooks and web apps to explore graphical interfaces.

Web apps, i.e. Chrome/Firefox applications, work best for graphics and rich formatting. Create personal projects, commercial prototypes, conference demos, and internal company tools with lightweight web frameworks like Streamlit, Django, and Flask. Production-grade projects often sit behind Nginx, the Swiss army knife full of features (more like a cannon, or better yet, a NASA rocket). Chapter 4 introduces Streamlit because it takes 2 minutes to learn, although it is less friendly for interactive development than notebooks.

Both web apps and notebooks can be used as dashboards, contain form elements, rich text, images, and enable remote access. With networking knowledge, both can be accessed by phone, laptop, Playstation, Xbox, smart TVs, Oculus, Raspberry PI, and so many other options.

While you are encouraged to follow the steps in Chapter 1, feel free to use Kaggle or Google Colab instead. Still, prefer your own hardware for easy access to the terminal and notebooks, and to make web apps later on.

1.6: Setting Up

Find the terminal, a.k.a. the command line interface. Your computer has one. Choose either the python.org installer or Anaconda to get Python 3 installed. If local installation fails or becomes too difficult, try Google Colab and create a notebook. Computers older than 10 years may need lots of effort to get everything set up. Notebooks can run terminal commands with an exclamation mark (!) before a bash command. Windows users should install WSL2 and Ubuntu (a Linux operating system running within Windows; found in the Microsoft Store app). macOS and Linux (Ubuntu, Mint, Clear) users will need to dig through “Finder > Applications”, “Launchpad”, or the equal-sign-looking “Applications” menus. Android and iOS users should use Colab and connect a mouse/keyboard.

IF (you have terminal access AND python is installed) OR (you have notebook access), THEN go to “Creating a Python Program”.

1.7: Getting Help from ChatGPT

If you cannot progress, try asking ChatGPT for help. For instance, you might try my Requirement-Bomb template. Consider adding rows to describe the problem, context, keywords, outline, output format, and evidence.

Write steps to run a "hello world" python script. 

Include all steps. Be detailed and complete, but do not waste syllables. Start from finding the terminal in Mac, Windows, and Ubuntu. Show me how to verify the python version and status. Show how to run python, jupyter notebook, and streamlit. Think step-by-step.

Requirements: Clear! Coherent! Concise! Comprehensive! Credible! Accurate! Valid! Unbiased! Relevant! Persuasive! Substantiated! Detailed! Methodical! Systematic! Logical! Analytical! Insightful! Original! Innovative! Forward-thinking! Purposeful! Focused! Structured! Well-structured! Organized! Articulate! Fluent! Engaging! Intuitive! User-friendly! Accessible! Understandable! Clear-eyed! Grounded! Validated! Peer-reviewed! Reviewed! Research-driven! Well-researched! Evidenced! Reliable! Professional! Specific! Measurable! Precise! Thorough! Refreshed! Updated! Modern! Timely! Efficient! Streamlined! Articulated! Refined! Exemplary! Illuminating! Enlightening! Informative! Readable! Succinct! Unambiguous! Pragmatic! Cohesive! Appropriate! Balanced! Authentic! Versatile! Aligned! Easy-to-follow! Authoritative!

1.8: Python Installation

Install Python version 3.x and verify its existence. Command your machine:

# Try these to see what version you have. Install if it's missing.
python --version
python3 --version
# Notebooks: Use ! before commands in a cell.
# Why? If the command starts with !, then run as Bash command, else run as Python command.
# If you can use Jupyter, Colab, or Kaggle notebooks, Python is installed. Check the version is 3.7 or later.
!python --version


# Windows:
choco install python
# macOS:
brew install python
# Linux (Ubuntu):
sudo apt install python3

1.9: Anaconda Overview

Honestly, I would rather use Anaconda (a.k.a. “conda”) any day than vanilla Python. Anaconda packages Python and its libraries in one box, catering especially to data scientists and engineers for scientific computing projects. They do a ton of engineering work to make installation simply work as expected, where I previously spent days on similar tasks in Java to match version compatibility. Anaconda makes this part very easy and safe.

# Install Anaconda on Unix:
wget https://repo.anaconda.com/archive/Anaconda3-2023.11-Linux-x86_64.sh
sh Anaconda3-2023.11-Linux-x86_64.sh
# Run "conda --help" to see usage examples.

1.10: Using pip

Python’s pip manages libraries effortlessly. Expand Python’s abilities as needed. Pip rarely conflicts with conda, but it’s easy to uninstall/reinstall if you see errors. Software engineers and data scientists reading this should use virtual environments (a.k.a. venv), which are out of scope, but worth looking into for avid coders.

# Upgrade pip
python -m pip install --upgrade pip
# Install packages
pip install pandas notebook jupyterlab streamlit voila numpy scipy matplotlib requests
Programming: A humanoid robot coding in her dorm room, fixated on humanity’s last message to her kind, “Hello World…”

1.11: Creating a Python Program

Open a tool for editing text. Consider Notepad++ on Windows, nano on Linux/Mac, PyCharm on any operating system, JupyterLab (recommended), Jupyter Notebook, Colab, or anything else. Copy the code below. To run via terminal: paste and save it as hello.py, then run python hello.py. To run via notebook: paste into a cell and run it (check the menus for help; maybe ctrl/cmd/shift + enter; or the Run “>” button). The inline comments briefly introduce syntax and other ideas. Unless you see an error, the instructions for this chapter are simply: copy, paste, run, interact, and experiment.

# hello.py - Chapter 1 example project
## Comments start with a # pound sign.
## Start programs with comments to explain who, what, when, why, and how to help your future self remember. You're welcome!
## Each program file should have a single purpose or theme.
## Consider naming scrap code in the format "YYYY-MM-DD_abstract_label_THEN_specific_label" ending with ".py" for Python and ".ipynb" for Python Notebooks.
## print() is a function that accepts one or more strings. It converts non-str type to str (string class name).
print("""Welcome to Python!
,=e Hiss!
`-.
.__,-'
(This is a text block)
""")
## Case matters: Trying PRINT('hello') or Print('hello') will fail.
## Store text as a variable; "tab and space" at the end.
question = "What's your age?\t "
age = input(question)
message = f"Already {age}? Ah. Vintage edition, value rising!"
print(message)
# Write conversation to a file using append mode.
with open('project_1.txt','a') as file_handle:
    file_handle.write(f'bot: "{question}"\n')
    file_handle.write(f'user: I\'m "{age}" years old.\n')
    file_handle.write(f'bot: "{message}"\n')

The corresponding output is shown in the next image. After getting it to run, try making small iterative changes until it breaks or try to personalize it. Snap a picture of the output and share this milestone with someone special.

Jupyter Notebook: Expected output for project 1, “Hello World”.

Customize it your way and post a screenshot.

Console: Expected output for project 1, “Hello World”.

Both versions of the program work the same.

1.12: Basic Syntax

Python’s syntax is straightforward and human-readable, facilitating a smooth learning curve. Try experimenting with these in your code before diving deep in the next chapter. Hands-on learning is required. Don’t hesitate. Explanations will follow later. Writing code will build confidence and understanding.

# Comment
### Also a comment
"""
This is a bad way to create comments:
triple-quoted text is really a string expression, not a comment.
"""

# Variables and mathematical operations:
a, b = 30, 20
# This writes a message to the terminal or notebook.
print(f"Difference: {a - b}, Quotient: {a / b}")

# 'a' and 'b' are bad variable names because CTRL+F will have too many matches.
# Easy convention is to double it.
aa,bb = 30,20

# Conditional if-then logic:
if aa > bb:
    print("aa is greater than bb")

# Loop constructs: starts at ii=0, repeats while ii < 5, adding 1 each time
for ii in range(5):
    print(ii, end=' ')

# Functions are reusable:
def greet(name):
    return f"Hello, {name}!"
print(greet("Pythonista"))

# String variable assignment. These are functionally equivalent.
multiple_line_text = """1
2
3"""
single_line_text = '1\n2\n3'
also_single_line_text = "1\n2\n3"
f_string_text = "{first}\n{second}\n{third}".format(first=1, second='2', third=3)  # .format(), a close cousin of f-strings

# Print message if condition is met:
if f_string_text == multiple_line_text: print('"f_string_text" is equal to "multiple_line_text"')
if f_string_text is multiple_line_text: print('"f_string_text" is the same variable as "multiple_line_text"')

1.13 World Map of Python Language

For the sake of completeness, this is 99% of Python. Print this section and attach it to the fridge. These features will be introduced slowly through the chapters, but a few hints will help disambiguate Python’s simple syntax.

English has adjectives, adverbs, antonyms, articles, aspects, clauses, complements, compound words, conjunctions, countability, determiners, direct/indirect objects, discourse elements, gerunds, homonyms, homophones, interjections, modifiers, moods, non-standard forms, participles, phrases, prefixes, prepositions, pronouns, quantification, sentence structures, sentence types, suffixes, tenses, infinitives, verbs, voice, word formation, word relationships, nouns and so much more.

Whereas the Python language can be concisely summarized as follows:

  • Variable names are custom symbols to represent data, such as x or mispelld_var_42i.
  • Reserved keywords cannot be used as variable names because they make Python function.
  • Implicit operator functions use math or punctuation symbols, but equate to f(x), f(a,b), f(*arguments, **keyword_args) or obj.f(x).
  • Explicit operator functions look like f(x), f(a,b), f(*args, **kwargs) or obj.f(x). Python may internally translate a function call like str(3.14) to (3.14).__str__(), where 3.14 is a float class (floating point decimal; continued in Chapter 2).
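
A quick demonstration of that translation, using only built-ins:

```python
# The explicit call and the internal method return the same text.
value = 3.14
print(str(value))       # 3.14
print(value.__str__())  # same result via the internal method

# Operators work the same way: a + b is roughly a.__add__(b).
print((2).__add__(3))   # 5, the machinery behind 2 + 3
```
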

Reserved Keywords:

— Control flow: if, else, elif, while, for, break, continue, return
— Exception handling: try, except, finally, raise, assert
— Function and class definition: def, class, lambda, yield
— Context management: with, as, pass
— Logical operations: and, or, not
— Namespace management: import, from, as, global, nonlocal, del
— Asynchronous programming: async, await
— Static values: True, False, None

Implicit and Explicit Operators and Functions:

— Arithmetic operators: +, -, *, /, %, //, **
— Comparison operators: ==, !=, <, >, <=, >=
— Assignment operators: =, +=, -=, *=, /=, %=, //=, **=, &=, |=, ^=, >>=, <<=
— Logical operators: and, or, not
— Bitwise operators: &, |, ^, ~, <<, >>
— Identity operators: is, is not
— Membership operators: in, not in
— Sequence operators: Indexing [], slicing [:], concatenation +, multiplication *, membership in
— General: dir(), help(), id(), type()
— Conversion: int(), float(), str(), bool(), list(), tuple(), set(), dict()
— Mathematical: abs(), round(), sum(), min(), max()
— Iteration: len(), range(), zip(), enumerate(), iter(), next()
— Miscellaneous: print(), input(), format()

Disambiguation

  • Arithmetic vs. Sequence Operations
    + and *: With numbers, + adds, * multiplies. With strings and lists, + concatenates, * repeats elements.
  • Assignment Hybrid-Operators
    +=, *=, /=, -=, …: In-place hybrid, e.g., x += 1 means x = x + 1.
  • Bitwise (Element-wise) vs. Logical Operators
    &, |: Bitwise and/or binary operations.
    and, or: Logical for boolean comparisons.
    ~: Bitwise not; element-wise not in pandas and numpy.
  • Identity vs. Equality
    is vs ==: is checks if variables are the same object, == checks if values are equal.
  • Sequence Access and Manipulation
    []: Indexing, accesses an element.
    [:], [::], [::,::], [from:to:increment]: Slicing, extracts part of a sequence. Libraries like numpy and pandas allow for [::,::] and [::,::,::].
    +, *: Concatenates and repeats sequences, respectively.
  • Type Conversion
    int(), float(), str(): Change types, e.g., int('3') turns str '3' into integer 3, float('2.5') into 2.5, and str(3.34E-1) into '0.334'.
  • Iteration Helpers
    len(seq): Gets collection size.
    range(start,stop,step): Creates a number sequence.
    zip(seq,seq,...): Pairing elements from collections. Wide-to-long data structuring.
    min(), max(): Iterate to find minimum/maximum extremes of comparables.
  • String Conversion
    str(x): Converts x to str (string of characters; text).
    x.__str__(): Internal method called by str(x). Defines object’s string representation.
  • Object Representation:
    x.__repr__(), repr(x): Internal method for an object’s official string representation, useful for debugging.
  • Variable names:
    — Use prefixes and postfixes consistently to remember the data type, such as d_name_map (dict), ser_active_users (pd.Series), df_table (pd.DataFrame), or PythOn_Fleek_df (also good).
    — Avoid single characters because searching for it is a nightmare.
    — UPPERCASE implies constant/unchanging.
    — CamelCase is controversial.
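
A few of the distinctions above, shown side by side:

```python
# Equality vs identity on the same data.
list_a = [1, 2, 3]
list_b = [1, 2, 3]
print(list_a == list_b)  # True: same values
print(list_a is list_b)  # False: two separate objects

# Slicing [from:to:increment] on a sequence.
letters = "abcdefg"
print(letters[1:5])      # bcde
print(letters[::2])      # aceg

# + concatenates and * repeats for sequences.
print([1, 2] + [3] * 2)  # [1, 2, 3, 3]
```
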

There are too many nuances to cover in a single e-book, so please find external references for engineering details not covered.

1.14: “print()” Is Your Friend

Any time you are stuck, print a message in your script to confirm your code does what you expect. (Engineers will eventually need the logger and the debugger, which are out of scope here.) Write to a file if volatility or scale is a problem. Combine with f-strings, format, str, and repr for easy self-help.

  • Utility Functions:
    print(), print(x), print(a,b): Displays output for zero or more inputs. Internally calls str(x) on each input to ensure the input is printable.
    input(): Gathers user input.
    format(): Customizes string formats.
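
A short sketch of these helpers at work; the value of age is hard-coded here to stand in for real user input:

```python
age = "42"  # pretend this came from input(); input() always returns a str
print(f"age = {age}")          # prints: age = 42
print(f"age = {age!r}")        # repr form reveals it is text: age = '42'
print(format(3.14159, ".2f"))  # rounds for display: 3.14
```

The !r form is especially handy when a bug hides in invisible whitespace or a surprise data type.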

1.15: Summary of Chapter 1

Congratulations! You are a programmer by technicality. In the first project, you completed the typical “hello world” starter program and learned how to create a basic program that listens and talks. The topics included reading input, writing files, printing messages, and assigning variables.

In the hello.py program, you created the variable age, which references an immutable string value. The program did not require valid inputs, which is dangerous in programming. Garbage in, garbage out; or, in the case of the T-Mobile breach: hacker infiltrates, data exfiltrates. Try breaking the program using bad input to understand. What could be done differently to prevent errors and garbage? Let’s revisit in Chapter 3.
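
As one hedged sketch of a fix (the helper name and the age bounds are invented for illustration), input validation could look like this:

```python
# Validate user-supplied text before trusting it.
def ask_for_age(raw_text):
    cleaned = raw_text.strip()
    if cleaned.isdigit() and 0 < int(cleaned) < 130:
        return int(cleaned)
    return None  # garbage in, None out, instead of garbage out

print(ask_for_age(" 25 "))         # 25
print(ask_for_age("twenty-five"))  # None
```
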

Another problem is maintenance and reusability. If we had lots of code, it would be difficult to reuse this. Prefer to write functions and classes that abstract the task similar to the under-appreciated print() function and only reinvent when innovating. I always ask myself “Why this? How is this different or better?”

In the program, some characters (i.e. digits, letters, punctuation, and white space) were encoded with a backslash but showed up correctly when printed. Notice the use of f-strings, where a variable can be inserted implicitly in the text.

All of the outputs were saved to a file called “project_1.txt” with append mode. Running the program multiple times will create a longer file. Using “w” instead of “a” will overwrite the file on each run. Too easy? Well, here’s a challenge!
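
Before that challenge, a quick demonstration of the two modes; the file name demo_mode.txt is an arbitrary example:

```python
with open("demo_mode.txt", "w") as fh:  # 'w' replaces any old content
    fh.write("first line\n")
with open("demo_mode.txt", "a") as fh:  # 'a' adds to the end
    fh.write("second line\n")
with open("demo_mode.txt") as fh:
    print(fh.read())  # both lines survive
```

Run it twice and the file still has two lines, because the "w" step starts the file over each time.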

Challenge 1 — Biology-Chemistry: Translate a DNA Sequence to Protein Sequence. Challenge 2 continues the topic with BioPython.

1.16: Challenge 1 — DNA to Protein Sequence Translator

Implement a DNA to Protein sequence translator using the genetic code. Fetch DNA data, translate it into a protein sequence considering codon usage, and handle any exceptions. Further reading at Wikipedia, RCSB Protein Data Bank (PDB), and NIH.

def translate_dna(dna_seq):
    codon_table = {
        'ATA':'I', 'ATC':'I', 'ATT':'I', 'ATG':'M',
        # … [add all codons]
    }
    lst = [codon_table.get(dna_seq[pos:pos+3], '?')
           for pos in range(0, len(dna_seq), 3)]
    protein_seq = "".join(lst)
    return protein_seq

# Sample DNA sequence (use a real example from a database)
sample_dna = "ATGTTT"  # … extend with a real sequence
print(sample_dna, '->', translate_dna(sample_dna))
Python-Fleek: Elevate Your Career Path, Code Your Way to Professional Eminence, Chapter 2 — Data types, operators, control structures, and imports.

Chapter 2: Basics of Python Programming

Python emerges as a vital tool in the technology toolkit, connecting diverse fields with its straightforward yet powerful capabilities. This chapter dives deep into the essence of Python, illustrating its fundamental role through practical scenarios. We explore how Python intersects with commerce, physics, and text analysis, highlighting its adaptability. After visiting three real-world scenarios, read through the menu of new features to consider in the next challenge.

2.1: Variables and Data Types

Data types are the classification or categorization of data items. Python supports various types such as strings, integers, floats, and objects, each with a unique purpose. Variables represent a symbol (e.g. “x”), a state (initially x=0, then the value updates to x=5), and memory (a 64-bit integer like 5 occupies 8 bytes, or 64 bits, plus some Python object overhead). Objects are the data structures discussed in Chapter 3, which enable creative opportunities.

x = 0
x = 5
print('x:', x)
print('data type of x:',type(x))

# String: Sequence of characters
protein = "Hemoglobin"

# Integer: Whole number without a fractional part
electron_count = 6

# Float: Number with a decimal point
pi = 3.14159

# Boolean: True or False values
is_active = True

# List: Ordered, mutable collection of items
colors = ["Red", "Green", "Blue"]

# Tuple: Ordered, immutable collection of items
coordinates = (10.0, 20.0)

# Set: Unordered collection of unique items
unique_amino_acids = {"A", "C", "E", "D"}

# Dictionary: Insertion-ordered (since Python 3.7), mutable collection of key-value pairs
enzyme_activity = {"Trypsin": 6.4, "Pepsin": 2.5}

# NoneType: Represents the absence of a value
data_not_available = None

# Complex: For complex numbers with a real and imaginary part
complex_number = 4 + 5j

2.2: Operators

Operators are the constructs that manipulate the values of operands.

# Arithmetic: Adds two operands
proton_charge = 1.6e-19 # Coulombs
electron_charge = -1.6e-19 # Coulombs
total_charge = proton_charge + electron_charge

# Comparison: Compares two values
mass_proton = 1.67262192369e-27 # kg
mass_neutron = 1.6735575e-27 # kg
is_equal = (mass_proton == mass_neutron)

# Logical: Combines conditional statements
hour = 23
is_daytime = (hour >= 6) and (hour <= 18)

# Identity: Checks if two variables point to the same object
molecule1 = 'H+'
molecule2 = 'H+' # same value; CPython may or may not reuse the same object
is_same_molecule = (molecule1 is molecule2) # implementation-dependent; compare values with ==

# Membership: Checks for membership in a sequence
amino_acid_sequence = 'CH3-S-(CH2)2-CH(NH2)-COOH'
alternate_amino_acid_sequence = ['C', 'H', '3', '-', 'S', '-', '(', 'C', 'H', '2', ')', '2', '-', 'C', 'H', '(', 'N', 'H', '2', ')', '-', 'C', 'O', 'O', 'H']
has_nitrogen = ('N' in amino_acid_sequence)

read_flag = 1 << 1 # shift 1 by one bit, effectively multiplying by 2
write_flag = 1 << 2 # shift 1 by two bits, effectively multiplying by 4

# Bitwise: Operates bit by bit
# Assign 2 + 4 == 6 to "flags" variable. Bitwise: (010 or 100) == 110
flags = (read_flag | write_flag)

# Assignment: Assigns a value to a variable
inventory_count = 0

# "I don't want to write x = x + 1" Assignment:
# Updates a variable using operators
new_stock = 99 # units
inventory_count += new_stock

# Chaining: Compares multiple operands
molecular_weight = 16.04 # g/mol CH4
is_in_range = 1 <= molecular_weight <= 300

IRL Traffic Control Structures: A robotic NWAPD cop imagines being replaced by humanity while directing traffic on another dull evening, then ponders the irony of this situation. Sections 2.3–2.8 demonstrate five types of code sequence control mechanisms: if-then, loops, functions, errors, and imports.

2.3: Control Structures

Control structures direct the flow of logic in a program. The format for IF-THEN is always “if condition 1, then action 1, else if condition 2, then action 2, otherwise do the default action.” The format of a FOR LOOP is “for item in sequence, do something.” The format of a WHILE LOOP is “while condition is true, do something.” The format of an UNTIL LOOP is “do something, until condition is true” (Python has no until keyword, so this pattern is emulated with while and break).
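
The UNTIL pattern, sketched as an infinite while loop that breaks when the condition holds:

```python
# Python has no UNTIL keyword. "Do something until condition is true"
# becomes an infinite loop that breaks when the condition holds.
count = 0
while True:
    count += 1      # do something
    if count >= 3:  # until the condition is true
        break
print(count)  # 3
```
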

Let’s apply this to road intersection traffic control lights:

def can_cars_drive_safely(light_color):
    ## This requires the color to be a string. Extra spaces are removed. Case is lowered.
    light_color = str(light_color).strip().lower()
    ## If the color is not valid, choose a safe value.
    if light_color not in ['red', 'green', 'yellow']:
        light_color = 'red'
    if light_color == 'red':
        return "Please stop before the white line."
    elif light_color == 'yellow':
        return "Be safe!"
    else:
        return "Get moving."

To simulate more of the traffic scene, let’s consider a loop.

from time import sleep

is_north = True
delta_time = 1 # seconds
light_cycle_time = 20 # sec

while True:
    # Human-readable label, computed before any printing
    direction = 'North-South' if is_north else 'East-West'

    for t in range(0, light_cycle_time + 1, delta_time):

        # Schedule when each color is shown
        light_color = 'red' if t < light_cycle_time * 0.40 else (
            'green' if light_cycle_time * 0.55 < t < light_cycle_time * 0.90 else 'yellow')
        print(f'\nThe light facing {direction} is {light_color}.')

        # Call the function to get the message.
        message = can_cars_drive_safely(light_color)

        # Show the message
        print(f'Robotic cop signals to {direction} traffic: "{message}"')
        sleep(delta_time)

    is_north = not is_north # Change direction

# CTRL+C or click STOP to end the infinite loop

2.4: If Statements

If statements allow conditional execution of a code block.

# Single condition
if is_base_pair('A', 'T'):
    bond_type = 'Watson-Crick'

# Multiple conditions
if temperature > 100:
    state = 'Gas'
elif temperature > 0:
    state = 'Liquid'
else:
    state = 'Solid'

# Conditional expressions (ternary operator)
status = 'complete' if task_completed else 'incomplete'

2.5: Loops

Loops are used for iterating over a sequence.

# For loop: Iterate over a sequence
for i in range(5):
    print(i)

# While loop: Repeats as long as a condition is true
while temperature < 100:
    temperature += 1

# Nested loops: One loop inside another
for row in matrix:
    for element in row:
        print(element)

# Loop with else: Runs if the loop was not terminated by a break
for n in numbers:
    if n == 0:
        break
else:
    print("No zeros found")

# List comprehensions: A concise way to create lists
squares = [x*x for x in range(10)]

2.6: Functions

Functions are blocks of code that perform a specific task. Ideally, they are reusable and contain type checking, error handling, and comments.

# Define a function
def calculate_distance(x1, y1, x2, y2):
    return ((x2 - x1)**2 + (y2 - y1)**2)**0.5

# Call a function
distance = calculate_distance(1, 2, 3, 4)

# Default parameter values
from math import pi

def cylinder_volume(radius, height=10):
    return pi * radius * radius * height

# Variable number of arguments
def sum_all(*args):
    return sum(args)

# Lambda functions
square = lambda x: x * x

# Decorators: Modify the behavior of functions
def verbose(function):
    def wrapper(*args, **kwargs):
        print("Arguments were: ", args, kwargs)
        return function(*args, **kwargs)
    return wrapper
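None of the snippets above combine the type checking and error handling that this section says an ideal function should contain. A sketch of such a function, reusing the cylinder example (the name and messages are illustrative):

```python
from math import pi

def cylinder_volume_checked(radius, height=10):
    """Cylinder volume with type checking and error handling.

    A sketch of the 'ideal' function described in this section;
    the name and messages are illustrative.
    """
    # Type checking: reject non-numeric inputs early with a clear message.
    if not isinstance(radius, (int, float)) or not isinstance(height, (int, float)):
        raise TypeError('radius and height must be numbers')
    # Error handling: a negative dimension signals a bug upstream.
    if radius < 0 or height < 0:
        raise ValueError('radius and height must be non-negative')
    return pi * radius * radius * height

print(cylinder_volume_checked(1, 1))  # 3.141592653589793
```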

2.7: Error Handling and Exceptions

Error handling is crucial for dealing with the unexpected in a program. This is the place to put self-documenting error messages for your future self to read.

# Try-except block
try:
    result = x / y
except ZeroDivisionError:
    result = 'undefined'

# Multiple exceptions
try:
    process_data(data)
except (TypeError, ValueError) as e:
    log_error(e)

# Else clause in try-except blocks
try:
    user = load_user(user_id)
except UserNotFound:
    create_default_user()
else:
    initialize_session(user)

# Finally: Always executed
try:
    save_changes()
finally:
    close_connection()

# Raise: Trigger an exception manually
if not validate_email(email):
    raise ValueError('Invalid email')

2.8: Importing Modules and Libraries

Modules and libraries extend Python’s functionality.

  • import os Load operating system functions.
  • from math import sqrt Load only the square root function.
  • import pandas as pd Load the “panel data” library and locally call it pd.
  • if debug_mode: import debug_tools Assuming debug_mode is defined and True, import debug_tools.
  • from scipy.optimize import curve_fit From the “scientific python” library, within the optimize package, load the curve_fit function.
  • import mymath Import my own custom library found at ./mymath.py
  • reload(mymath) Use importlib to reload code. Loading usually only happens once.
  • numpy = __import__(module_name) Import by name.
# Import a standard library module
import os

# Import a function from a module
from math import sqrt

# Importing with aliases
import pandas as pd

# Import all names from a module (discouraged: may shadow existing names)
from sys import *

# Conditional imports
debug_mode = False
if debug_mode:
    import debug_tools # hypothetical import

# Importing a module from a package
from scipy.optimize import curve_fit

# Custom module import
import mymath # hypothetical import

# Reloading a module
from importlib import reload
reload(mymath)

# Using __import__() for dynamic importing
module_name = 'numpy'
numpy = __import__(module_name)

Revisit these descriptions above! Here are 3 projects to demonstrate the new skills.

KPI Analytics: Imagine interning at Temu-zon-mart online retail company. Project 2 demonstrates how to measure a financial funnel’s key performance indicators, the health signals of the company. Project 6 continues the topic with data queries.

2.9: Project 2 — Online Retail Analytics for Business Technology

E-commerce relies heavily on data. Metrics like Click-Through Rate (CTR), Conversion Rate, and Margin are critical. This tutorial teaches Python calculations for these metrics. An advanced version uses pandas for DataFrame manipulation. Further reading at Wikipedia.

Key Performance Indicators (KPIs) are crucial in the retail industry, serving as measurable values that gauge how effectively a company is achieving its key business objectives. These indicators are pivotal for tracking progress toward various goals, including financial achievements, marketing objectives, and operational improvements. Retail companies leverage KPIs to evaluate their performance, discern trends, inform decision-making processes, and strategize for future growth, thereby offering a quantifiable basis for assessing success and pinpointing areas needing enhancement.

Click-Through Rate (CTR), a pivotal metric in digital marketing, is defined as the ratio of the number of clicks on an advertisement to the number of times the advertisement is shown, expressed as a percentage. The formula for CTR is given by: CTR=(Total Clicks on Ad / Total Impressions)×100%

In online retail, measuring CTR is instrumental in evaluating the effectiveness of advertising campaigns in attracting potential customers. A high CTR is indicative of effective ad targeting and messaging, crucial for increasing website traffic.

Conversion Rate (CVR) is a key metric representing the percentage of visitors to a website who complete a desired action, like making a purchase. The CVR is calculated using the formula: CVR=(Number of Conversions / Total Visitors)×100%

For online retailers, CVR is vital for assessing the effectiveness of their website and marketing initiatives in converting visitors into customers. Optimizing the sales funnel and enhancing customer experience are strategies employed to boost CVR and, consequently, sales.

Impressions measure the number of times an advertisement or any form of digital content is displayed on a user’s screen. They are used to gauge the reach of an ad campaign and are integral in forming strategies for brand awareness.

Margin is a critical financial metric indicating the percentage of revenue remaining after deducting all operating expenses. The formula for Margin is: Margin=([Revenue−Cost] / Revenue)×100%

Online retailers utilize Margin to assess their profitability, informing decisions related to pricing, marketing, and operations to enhance profit margins.

Tutorial Highlights:

  • Learn to calculate click-through-rate, conversion-rate, and margin in Python.
  • Understand Python’s role in simplifying e-commerce data analysis.
  • Explore pandas for enhanced data manipulation.

This program calculates important business metrics such as CTR, CVR, and Profit Margin for an online retailer. It demonstrates variables, basic arithmetic operations, and functions.

This code is relevant to data science, data analytics, and business intelligence where impressions, clicks, conversions, cost, and revenue are provided by the business, calculated by a data funnel, and displayed in a dashboard.

IMPORTS:

# No imports needed for the basic version.
import pandas as pd # For the pandas version.

FUNCTIONS:

# Basic Version
def calculate_metrics(impressions, clicks, conversions, cost, revenue):
    ctr = clicks / impressions
    conversion_rate = conversions / clicks
    margin = (revenue - cost) / revenue
    return ctr, conversion_rate, margin

# Using Pandas
def calculate_metrics_pd(data):
    data['CTR'] = data['Clicks'] / data['Impressions']
    data['Conversion_Rate'] = data['Conversions'] / data['Clicks']
    data['Margin'] = (data['Revenue'] - data['Cost']) / data['Revenue']
    return data

USAGE:

# For the basic version
impressions, clicks, conversions, cost, revenue = 1000, 50, 10, 200, 500
ctr, conversion_rate, margin = calculate_metrics(impressions, clicks, conversions, cost, revenue)

# For the pandas version
data = pd.DataFrame({'Impressions': [1000], 'Clicks': [50], 'Conversions': [10], 'Cost': [200], 'Revenue': [500]})
data = calculate_metrics_pd(data)

PRINT SUMMARY:

print(f'''Online Retail Analytics Summary:
- The Click-Through Rate (CTR) is the ratio of clicks to impressions. It helps in assessing the effectiveness of advertising.
- The Conversion Rate is the ratio of conversions to clicks, indicating the success of turning interest into sales.
- The Margin is the profit margin, a measure of profitability.
Inputs:
Impressions: {impressions}, Clicks: {clicks}, Conversions: {conversions}, Cost: {cost}, Revenue: {revenue}
Outputs for Basic Version:
CTR: {ctr}, Conversion Rate: {conversion_rate}, Margin: {margin}
Outputs for Pandas Version:
{data.to_string(index=False)}
''')
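Real funnel data can contain zeros, and calculate_metrics divides by impressions, clicks, and revenue. A defensive variant (a sketch, not part of the project spec) guards each ratio against a zero denominator:

```python
def calculate_metrics_safe(impressions, clicks, conversions, cost, revenue):
    # Guard every ratio against a zero denominator instead of crashing.
    ctr = clicks / impressions if impressions else 0.0
    conversion_rate = conversions / clicks if clicks else 0.0
    margin = (revenue - cost) / revenue if revenue else 0.0
    return ctr, conversion_rate, margin

print(calculate_metrics_safe(1000, 50, 10, 200, 500))  # (0.05, 0.2, 0.6)
print(calculate_metrics_safe(0, 0, 0, 0, 0))           # (0.0, 0.0, 0.0)
```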

KEY TAKEAWAYS:

  • CTR, Conversion Rate, and Margin are vital metrics in e-commerce.
  • Python simplifies complex data calculations.
  • pandas enhances data manipulation, making Python a powerful tool for business analytics.
Physics Analytics: Imagine interning at NASA and you are asked to measure vibration as work (joules) in regard to variable force (newtons) over time (seconds). Project 3 demonstrates how to code calculus and physics. Project 7 continues the topic with a basic physics simulation.

2.10: Project 3 — Physics Calculation: Work Done by a Variable Force

In physics, calculating work done by a variable force involves integral calculus. This Python tutorial covers the approximation of such calculations. It introduces control structures and mathematical functions. A scipy version offers a more efficient, library-based method. Suitable for beginners and advanced learners, this tutorial demonstrates Python’s adaptability.

Force: Force is a fundamental concept in physics, representing an interaction that changes the motion of an object. Mechanically, it’s a vector quantity, meaning it has both magnitude and direction. In practical terms, force can cause an object to accelerate, slow down, change direction, or alter its state from rest to motion (or vice versa).

What makes force unique is its ability to quantify the interaction between objects or between an object and its environment. Newton’s Second Law (F=m×a) is a pivotal formula in physics, linking force (F) to mass (m) and acceleration (a). This law underscores the core role of force in understanding motion and physical interactions.
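Newton’s Second Law translates directly into one line of Python; a quick sketch, assuming SI units:

```python
# Newton's second law, F = m * a, assuming SI units throughout.
mass = 2.0           # kilograms
acceleration = 9.81  # m/s^2, roughly free-fall near Earth's surface
force = mass * acceleration
print(force)  # 19.62 newtons
```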

Work: In physics, work is the measure of energy transfer that occurs when an object is moved over a distance by an external force. It’s defined as the product of the force applied to an object and the distance the object moves in the direction of that force. Mechanically, it’s expressed as W = F × d × cos(θ), where W is work, F is the magnitude of the force, d is the distance moved, and θ is the angle between the force and the direction of motion.

Work is special because it bridges the gap between force and energy. It offers a quantifiable way to understand how forces cause changes in the energy of a system, which is central to the concepts of mechanics and energy conservation.
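The constant-force formula W = F × d × cos(θ) can be checked in a few lines, assuming SI units and made-up values:

```python
from math import cos, radians

# W = F * d * cos(theta): a 10 N pull over 5 m, 60 degrees off the motion.
force = 10.0         # newtons
distance = 5.0       # meters
theta = radians(60)  # angle between force and direction of motion

work = force * distance * cos(theta)
print(round(work, 2))  # 25.0 joules
```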

Work Done by a Variable Force: When dealing with a variable force, the work done is not constant over a distance. In such cases, the work done is calculated as the integral of force with respect to distance. Mathematically, it’s expressed as W = ∫ F(x) dx from x = a to x = b, where W is the work done, F(x) is the force as a function of distance x, and a and b are the limits of the displacement.

The concept of work done by a variable force is crucial in scenarios where force changes with respect to position, such as in springs or varying gravitational fields. This approach using integral calculus allows for a more accurate and comprehensive understanding of work in dynamically changing systems. It exemplifies the application of calculus in physics, providing a deeper insight into how forces act over distances where they are not constant.

Tutorial Highlights:

  • Understand physics work calculations using integral calculus in Python.
  • Learn Python’s syntax and control structures for physics calculations.
  • Apply scipy for efficient numerical integration.

This program calculates the work done by a variable force using the concept of integrals in physics. It introduces control structures and more complex mathematical operations.

IMPORTS:

# Basic math
from math import sqrt

# SciPy for numerical integration
from scipy.integrate import quad

FUNCTIONS:

# Basic Version
def work_done(force_func, start, end):
    dx = 0.001 # small change in x
    work = 0
    x = start
    while x < end:
        work += force_func(x) * dx
        x += dx
    return work

# Using SciPy
def work_done_scipy(force_func, start, end):
    result, _ = quad(force_func, start, end)
    return result

USAGE:

# For the basic version
force_func = lambda x: 5 * sqrt(x) # Example variable force function
start, end = 0, 10
work = work_done(force_func, start, end)

# For the SciPy version
work_scipy = work_done_scipy(force_func, start, end)

PRINT SUMMARY:

print(f'''Physics Calculation: Work Done Summary:
- Work done by a force is the integral of force over distance.
- This example uses a numerical method to approximate the integral.

Inputs:
Force Function: {force_func}, Start: {start}, End: {end}

Outputs for Basic Version:
Work Done: {work:0.3f}

Outputs for SciPy Version:
Work Done with SciPy: {work_scipy:0.3f}

Mathematical Algorithm:
Work = ∫ F(x) dx from {start} to {end}
''')

KEY TAKEAWAYS:

  • Work calculations in physics can be approximated using integral calculus.
  • Python’s simple syntax and control structures facilitate such calculations.
  • scipy, a scientific library, offers efficient numerical integration tools.
Information Retrieval: Everyone depends on search. Demand for search expertise grows every year. So here’s the TLDR! An inverted index recalls data from a corpus in near constant time. Jaccard similarity over tokens measures proportional overlap. Project 4 demonstrates how to sort by similarity. With a little more code, we might call it a search engine. Project 8 continues the topic.

2.11: Project 4 — Inverted Index with Jaccard Similarity

This program builds an inverted index for a list of strings and calculates the Jaccard similarity between them. It demonstrates string processing, data structures, and algorithms.

An inverted index is a bit like a detailed map for finding words in a large collection of documents. Imagine you have a huge library of books and you want to find every place a specific word appears. An inverted index does exactly this for digital documents. It lists every word and tells you where (in which documents and at what positions) these words can be found.

This system is incredibly useful for search engines and databases where you need to quickly find information in tons of documents. It’s like having a super-efficient assistant who can instantly point out where in a mountain of papers a particular word is mentioned. The reason it’s so popular is because of its speed and accuracy. When you type a search query, the inverted index quickly scans its map of words and brings you the results in a flash. This efficiency is what makes it a backbone technology for many search engines and large-scale text retrieval systems.

Jaccard Similarity is a way of measuring how similar two sets of items are. Imagine you have two bags of groceries. Jaccard Similarity helps you figure out how similar these two bags are based on the items they contain. It does this by comparing the number of items they have in common against the total number of unique items across both bags.

This method is widely used in fields like data analysis, ecology, and even in recommending products or content in online platforms. It’s like a mathematical tool for comparing shopping lists, ingredients in recipes, or users’ preferences. What makes Jaccard Similarity special is its simplicity and effectiveness in a variety of scenarios. It provides a clear and easy way to understand how similar or different two sets of items are, making it a valuable tool in many areas of research and technology.
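The grocery-bag analogy maps directly onto Python sets; a quick sketch with made-up items:

```python
# Two "bags of groceries" as Python sets (items are made up).
bag1 = {'milk', 'eggs', 'bread', 'apples'}
bag2 = {'milk', 'bread', 'cheese'}

# Jaccard similarity: shared items divided by all unique items.
shared = bag1 & bag2     # intersection: {'milk', 'bread'}
all_items = bag1 | bag2  # union: 5 unique items
similarity = len(shared) / len(all_items)
print(similarity)  # 0.4
```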

Tutorial Highlights:

  • Learn to build inverted indexes and calculate Jaccard similarity in Python.
  • Understand Python’s data structures for efficient text processing.
  • Explore numpy for large-scale text analysis.

The Python implementation is straightforward yet powerful, using dictionaries and set operations. A numpy-enhanced version demonstrates Python’s ability to handle large datasets efficiently.

IMPORTS

# No imports for the basic version
from collections import defaultdict
import numpy as np # For the numpy version

FUNCTIONS

# Basic Version
def jaccard_similarity(set1, set2):
    intersection = len(set1.intersection(set2))
    union = len(set1.union(set2))
    return intersection / union

def inverted_index(corpus):
    index = defaultdict(set)
    for i, line in enumerate(corpus):
        for word in line.lower().split():
            index[word].add(i)
    return index

# Using NumPy
def jaccard_similarity_np(array1, array2):
    intersection = np.intersect1d(array1, array2).size
    union = np.union1d(array1, array2).size
    return intersection / union

def inverted_index_np(corpus):
    index = defaultdict(list)
    for i, line in enumerate(corpus):
        for word in line.lower().split():
            if word not in index:
                index[word] = np.array([])
            index[word] = np.append(index[word], i)
    return index

USAGE

query = 'quick fox'
corpus = ["the quick brown fox", "the lazy dog",'nothing quick','victory of the dog']

# For the basic version
index = inverted_index(corpus)
candidates = index['quick'] | index['fox']
candidate_data = [corpus[ii] for ii in candidates]
results = sorted(candidate_data, key=lambda x: -jaccard_similarity(set(x.split()), set(query.split())))

# For the NumPy version
index_np = inverted_index_np(corpus)
candidates = np.union1d(index_np['quick'], index_np['fox']).astype(np.int32)
candidate_data = [corpus[ii] for ii in candidates]
results_np = sorted(candidate_data, key=lambda x: -jaccard_similarity_np(set(x.split()), set(query.split())))

PRINT SUMMARY

print(f'''Inverted Index with Jaccard Similarity Summary:
- The Jaccard Similarity measures the similarity between two sets.
- An inverted index maps each word to its document locations.

Inputs:
List[Strings]: {corpus}

Outputs for Basic Version:
Inverted Index: {index}
Sorted by Jaccard Similarity with sets: {results}

Outputs for NumPy Version:
Inverted Index with NumPy: {index_np}
Sorted by Jaccard Similarity with NumPy: {results_np}

Mathematical Algorithm:
Jaccard Similarity = |Intersection(A, B)| / |Union(A, B)|
''')

KEY TAKEAWAYS:

  • Inverted indexes and Jaccard similarity are crucial in text processing.
  • Python’s inherent data structures support efficient implementation.
  • numpy amplifies data handling, beneficial for large-scale text analysis.

2.12: Pitfalls in Python Programming Basics

  • Misunderstanding mutable and immutable data types can lead to unexpected bugs, especially when modifying collections like lists or dictionaries.
  • Misapplying operators, especially logical and bitwise ones, can cause logic errors. Overusing operators in complex expressions can make code hard to read and debug, leading to subtle bugs that are difficult to trace.
  • Misusing control structures, like nested if statements or improper loop conditions, can lead to inefficient code or infinite loops. Overcomplicating control flow with unnecessary conditions can make the code less readable and more error-prone.
  • Incorrect use of function parameters, especially mutable default parameters, can lead to unexpected behaviors. Not understanding local vs global scope within functions can cause variable conflicts and bugs. Functions should be designed with clear intent and limited side effects to maintain code integrity.
  • Ignoring error handling or using broad exception clauses can mask underlying problems, making debugging difficult. Overusing or improperly managing module imports, especially in large projects, can lead to conflicts, increased memory usage, and slower performance. It’s important to handle exceptions specifically and import only necessary modules to keep the code efficient and maintainable.
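The mutable-default-parameter pitfall mentioned above is easy to reproduce; a minimal sketch of the bug and the conventional None-sentinel fix (function names are illustrative):

```python
# Pitfall: a mutable default is created once, at definition time,
# and shared by every call that omits the argument.
def append_bad(item, bucket=[]):
    bucket.append(item)
    return bucket

print(append_bad(1))  # [1]
print(append_bad(2))  # [1, 2]  <- the list from the first call persisted

# Fix: use None as a sentinel and build a fresh list inside the call.
def append_good(item, bucket=None):
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket

print(append_good(1))  # [1]
print(append_good(2))  # [2]
```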

2.13: Summary of Chapter 2

Chapter 2, “Basics of Python Programming,” served as a guide for understanding Python’s fundamentals. Variables and data types were first, providing the tools for data representation and storage, foundational for any programming task.

Operators followed, divided into arithmetic for calculations and logical for conditional operations, critical for manipulating and evaluating data. Control structures, including if statements and loops (for and while), were discussed, underlining their importance in directing program flow and enabling repetitive tasks.

Functions were emphasized for their role in creating reusable, organized code, a practice essential for efficient programming. Error handling and exceptions addressed methods to manage and respond to errors, crucial for building reliable and robust applications.

The chapter concluded with importing modules and libraries, demonstrating Python’s ability to extend its functionalities and adapt to varied programming needs.

2.14: References

In no particular order, these are valuable resources. The core message is “Learn Python!”, not necessarily from me. It’s a great skill and these are great alternatives (from what I hear). No affiliation.

BioPython Challenge: Fill in the missing code. BioPython provides the correct answers. Write equivalent code to perform these important operations. Challenge 3 continues the topic.

2.15: Challenge 2 — Working with RNA, DNA, Protein, and Nucleotides

Objective: Develop a Python script to process DNA sequences using BioPython and custom methods, comparing their outputs. The script will handle DNA sequence standardization, transcription to RNA, RNA translation to protein, and nucleotide frequency calculation.

Install Python and BioPython (pip install biopython). Prepare input_file.txt with DNA sequences (one per line).

About BioPython: BioPython is a powerful library for computational biology and bioinformatics, providing tools for biological computation including sequence analysis, structural bioinformatics, and more.

Python Data Structures: The tutorial leverages Python’s core data structures: lists for storing sequences, sets for ensuring uniqueness, and dictionaries for mapping relationships. More in Chapter 3.

Control Flow in Python: We’ll use functions for modular code organization, for loops for iteration, and conditional statements (if, elif, else) for logical decision making. From Chapter 2.

Functions:

  1. file2lines(filename): Reads a file, returning lines as a list. Continuing from Chapter 1.
  2. standardize(line): Clean raw text to ensure consistent format of DNA sequences.
  3. process_sequence_*(dna): Processes sequences using BioPython or custom (your code).
  4. Write these functions:

transcribe_dna_to_rna(dna): Transcribes DNA to RNA.

translate_rna(rna): Translates RNA to protein.

nucleotide_frequency(dna): Calculates DNA nucleotide frequencies.

process_sequence_custom(dna): Integrates custom methods.

debug_and_compare(line): Compares BioPython and custom method outputs.

Code:

## Hint: https://biopython.org/DIST/docs/tutorial/Tutorial.html#sec7

from Bio.Seq import Seq
from collections import Counter

## Customize this.
def transcribe_dna_to_rna(dna):
    return 'Hint: replace this message with the answer. "dna" is a str. Replace T with U.'

def translate_rna(rna):
    codon_to_amino_acid = {
        "UUU": "F", "UUC": "F", "UUA": "L", "UUG": "L",
        # Add the rest of the RNA codon table here
    }
    protein = None
    ## Hint: replace this comment with the answer.
    return protein

def nucleotide_frequency(dna):
    return "Hint: result might look like {'A': 12, 'T': 11, 'C': 8, 'G': 3}"

## Helper code
def file2lines(filename):
    with open(filename, 'r') as file:
        return [line.strip() for line in file]

def standardize(line):
    return line.upper().strip()

def process_sequence_biopython(dna):
    dna_seq = Seq(dna)
    return str(dna_seq.transcribe()), str(dna_seq.translate()), Counter(dna)

def process_sequence_custom(dna):
    rna = transcribe_dna_to_rna(dna)
    protein = translate_rna(rna)
    freq = nucleotide_frequency(dna)
    return rna, protein, freq

def debug_and_compare(line):
    dna = standardize(line)
    rna_bp, protein_bp, freq_bp = process_sequence_biopython(dna)
    rna_custom, protein_custom, freq_custom = process_sequence_custom(dna)

    print(f"DNA: {dna}")
    print(f"BioPython RNA: {rna_bp} | Custom RNA: {rna_custom}")
    print(f"BioPython Protein: {protein_bp} | Custom Protein: {protein_custom}")
    print(f"BioPython Frequency: {freq_bp} | Custom Frequency: {freq_custom}\n")


# lines = file2lines('input_file.txt')
lines = [
    'TACGAGAATAATTTCTCATCATCCAGCTTTAACA',
]

for line in lines:
    debug_and_compare(line)

Hints and Debugging Tips:

  • Ensure that the RNA codon table in the custom translate_rna function is complete and accurate. Incomplete or incorrect mappings can lead to faulty protein translations.
  • When debugging, pay attention to discrepancies between the BioPython and custom method outputs. Such differences could indicate issues in the custom implementation or data formatting errors.
  • Use print statements within the custom functions to track the intermediate results and understand where discrepancies might be occurring.
  • Remember that BioPython’s translate() function automatically handles stop codons, while in the custom method, you will need to manually manage them in your RNA codon table.
  • If encountering unexpected results, verify the input DNA sequences for any non-standard characters or lower-case letters. The standardize function should handle most formatting issues, but manual verification is beneficial for unusual errors.
  • The script is primarily educational, designed to illustrate the practical use of Python in bioinformatics. It is particularly useful for learners who wish to understand how BioPython can simplify complex tasks in biological sequence analysis and how these tasks can be achieved with custom Python code.

BioPython vs. Custom Implementation:
By providing two implementations for the same tasks, the script offers a unique opportunity to compare the efficiency, readability, and output of two different approaches. This comparison can be highly insightful for understanding the advantages of using specialized libraries like BioPython. Now onto data structures.

Python-Fleek: Elevate Your Career Path, Code Your Way to Professional Eminence, Chapter 3- Data structures, text operations, control structures, performance, and code quality control.

Chapter 3: Data Structures

Data structures, the backbone of programming, organize and store data. They enable algorithms to search, rearrange, and access data efficiently. Structures like list (array list) and dict (dictionary; hash map) dictate how data is arranged in memory, impacting the speed and functionality of software. For instance, dict enables near constant-time access to data elements, lists offer flexibility, set prevents duplication, tuple prevents modification, and str (string of characters; list-like and immutable) bundles text functionality. This specific collection of structures is the simplest set of building blocks needed for trees, graphs, Bloom filters, data management, and machine learning.

Since this e-book intends to make you a Python developer overnight, many advanced engineering details were skipped. After 20 years of coding, I’ll boldly and controversially claim that you don’t need to care what’s under the hood, though it helps of course. Likewise, physically operating a vehicle requires neither a license (legally, yes) nor a mechanical engineering degree. If a recruiter doubts your Python experience, interrupt them and highlight your coding achievements. As a lead developer, I can attest that results and fluency matter more than degrees or years. Keep your creative juices flowing. Musing moves mountains.

Chapter 3 covers Python’s main data structures: list, tuple, dict, set, and str. Additional topics add a layer of depth for self-help. Immutable types prevent change, while the rest enable creativity. All science and engineering fields rely on strings (str) and regular expressions (re; RegEx) for text operations (i.e. NLP). It’s broadly unavoidable, so learning about built-in tools like find(), index(), and replace() will be a start. Tools for debugging (pdb), profiling (cProfile, timeit), and testing (unittest) are briefly introduced as methods of expert-grade self-help, but are okay to skip.
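The built-in text tools mentioned above can be previewed in a few lines (the sample string is made up):

```python
import re

text = 'The spectrometer logged 42 events; 42 is the answer.'

# str built-ins: find() returns an index (or -1 if absent), index() raises
# ValueError instead, and replace() returns a new string (str is immutable).
print(text.find('42'))             # 24
print(text.replace('42', 'many'))  # replaces both occurrences

# A regular expression extracts every run of digits.
print(re.findall(r'\d+', text))    # ['42', '42']
```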

Projects in Chapter 3 include using lists for matrix-vector multiplication, making dataframes with nested dictionaries, doing stats analysis (max, mean, median), estimating Pi with Monte Carlo simulation, and text search with a tokenizer. These projects apply theory to real tasks, showing Python’s data structure utility with frequently used algorithms.

3.1. Lists

Dynamic array lists are versatile in handling various data types.

# Creation
primes = [2, 3, 5, 7, 11]

# Access
third_prime = primes[2] # 5

# Comprehensions
squares = [x**2 for x in range(10)]

# Slicing
first_three = primes[:3] # [2, 3, 5]

# Append and Extend
primes.append(13)
primes.extend([17, 19])

# Sorting
primes.sort(reverse=True)

# Nested Lists
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

# Stack Operations
stack = []
stack.extend(list('ABC')) # copy onto this stack
stack.pop() # remove 'C'
stack.append('Z')

# Filtering
evens = [x for x in range(20) if x % 2 == 0]

# Mapping Functions
import math
roots = list(map(math.sqrt, [1, 4, 9, 16]))

# Zipping Lists
pairs = list(zip([1, 2, 3], ['a', 'b', 'c']))

###### Try me! ######
def matrix_to_str(M, tab=' '*8, delim=', ', left_side=' |', right_side='|\n'):
    out = []
    for row in M:
        row_copy = map(str, row)
        out.extend([left_side, delim.join(row_copy), right_side])
    return ''.join(out)

print(f'''
Primes: {primes}
Squares: {squares}
Matrix:\n{matrix_to_str(matrix)}
Roots: {roots}
List of Tuple: {pairs}
Stack: {stack}
''')

Lists: ordered, mutable collections.
— Lists hold ordered items like numbers, primes = [2, 3, 5, 7, 11].
— Access items using index, third_prime = primes[2] gets 5.
— Create lists with expressions, squares = [x**2 for x in range(10)].
— Extract subsets, first_three = primes[:3] gets [2, 3, 5].
— Add items, primes.append(13), and primes.extend([17, 19]).
— Arrange items, primes.sort(reverse=True) for descending order.
— Use lists within lists, matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]].
— Perform stack operations, stack.pop() and stack.append('Z').
— Create lists by filtering, evens = [x for x in range(20) if x % 2 == 0].
— Apply functions to items, roots = list(map(math.sqrt, [1, 4, 9, 16])).
— Combine lists into tuples, pairs = list(zip([1, 2, 3], ['a', 'b', 'c'])).
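One of the chapter projects applies lists to matrix-vector multiplication. A minimal sketch using only built-in lists (the function name mat_vec_mult and the sample values are illustrative, not from the project):

```python
# Matrix-vector multiplication with plain nested lists.
# mat_vec_mult and the sample data are illustrative names/values.
def mat_vec_mult(A, v):
    """Multiply matrix A (a list of row lists) by vector v (a list)."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
v = [1, 0, -1]
print(mat_vec_mult(A, v))  # [-2, -2, -2]
```

Each output entry is the dot product of one row with v, built from zip() and a comprehension, both shown above.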

3.2. Tuples

Immutable and ordered, tuples are ideal for fixed data sequences.

# Creation and Access
my_tuple = (1, 'hello', 3.4)
second_element = my_tuple[1] # 'hello'

# Immutability
# my_tuple[1] = 'world' # Error

# Unpacking
a, b, c = my_tuple

# Single Element Tuple
single = (1,)

# Concatenation
new_tuple = my_tuple + (5, 6)

# Repetition
repeated = ('repeat',) * 3

# Slicing
slice_tuple = my_tuple[1:]

# Membership Test
exists = 1 in my_tuple

# Length
length = len(my_tuple)

###### Try me! ######
def get_stats(nums):
    if not nums: raise ValueError(f'ERROR: bad input = get_stats({nums})')
    average = sum(nums)/len(nums)
    # Return a tuple of results
    return min(nums), max(nums), average

def stats_to_str(values):
    field_names = ['min', 'max', 'avg']
    return str(dict(zip(field_names, values)))

list_of_numbers = [0,3,7,10]
tuple_of_numbers = (1,1,1,7)

print(f'''
My Tuple: {my_tuple}
a={a}, b={b}, c={c}
New Tuple={new_tuple}
Repeated: {repeated}
Slice Tuple: {slice_tuple}
Stats(list): {stats_to_str(get_stats(list_of_numbers))}
Stats(tuple): {stats_to_str((1,7,2.5))}
"1" Exists in {my_tuple}? {exists}
''')

Tuples: ordered, immutable collections.
— Create by placing elements in parentheses (), separated by commas.
— Access elements via index, like my_tuple[1] yielding 'hello'.
— Tuples are immutable; attempting to change an element causes an error.
— Unpack elements into variables, e.g., a, b, c = my_tuple.
— Single-element tuple format: single = (1,).
— Concatenate tuples using +, as in new_tuple = my_tuple + (5, 6).
— Replicate elements with *, like repeated = ('repeat',) * 3.
— Slice tuples using index range, for instance, slice_tuple = my_tuple[1:].
— Check for an element’s presence with in, such as exists = 1 in my_tuple.
— Determine the number of elements with len(my_tuple).
get_stats(nums): Computes min, max, and average of a sequence; raises ValueError if empty; returns a tuple of stats.
stats_to_str(values): Turns a tuple of stats into a string; pairs field names with values using zip(), converts to a dictionary, then to a string.

3.3. Dictionaries

Key-value pairs in dictionaries streamline data retrieval.

## Use a prefix like d_ for dictionary to remember the type
# Creation and Access
# Important! Use with pandas Series
d_data = {'name': 'Doug', 'age': 99}
name = d_data['name']

# Add or Update Entries
d_data['email'] = 'doug-creates@py-on-fleek.com'

# Dictionary Comprehensions
squares_dict = {x: x*x for x in range(6)}

# Iterating
for key, value in d_data.items():
    print(key, value)

# Safe Value Retrieval
age = d_data.get('age', 0)

# Removal of Entries
removed_value = d_data.pop('age', None)

# Nested Dictionaries
# Important! Use with pandas Dataframes
users = {'Doug': {'email': 'doug@python.fleek.com', 'age': 18}}

# Accessing Keys and Values
keys = d_data.keys()
values = d_data.values()

Dictionary (hash map): mutable collection of key-value pairs.
— Use d_ prefix for naming dictionaries, like d_data.
— Create with {} or dict(). {} is more common; dict() is more explicit. Access values using keys, e.g. d_data['name'].
— Add or modify by assigning a value to a key, e.g. d_data['email'] = 'doug@example.com'.
— Use comprehensions for efficient creation, e.g. {x: x*x for x in range(6)}.
— Iterate with .items(), e.g. for key, value in d_data.items():.
— Retrieve values safely with .get(key, default), e.g. age = d_data.get('age', 0).
— Remove entries and return their values with .pop(key, default), e.g. removed_value = d_data.pop('age', None).
— Store and access data in nested structures, e.g. users = {'Doug': {'email': 'doug@example.com', 'age': 25}}.
— Retrieve all keys with .keys() and all values with .values(), e.g. keys = d_data.keys(); values = d_data.values().
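The chapter projects mention building dataframes from nested dictionaries. A small sketch of the idea (the column names and values here are made up for illustration):

```python
# A nested dict as a tiny column-oriented table, using the same
# {column -> {row_index -> value}} shape pandas accepts for a DataFrame.
# Column names and values are illustrative.
d_frame = {
    'name': {0: 'Ada', 1: 'Alan'},
    'age': {0: 36, 1: 41},
}

cell = d_frame['age'][1]  # column first, then row -> 41
row0 = {col: vals[0] for col, vals in d_frame.items()}
print(cell, row0)
```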

3.4. Sets

Unordered and unique, sets are useful for mathematical set operations.

# Creation
my_set = {1, 2, 3}

# Add Elements
my_set.add(4)

# Set Operations
another_set = {3, 4, 5}
union_set = my_set.union(another_set)

# Intersection
intersect_set = my_set.intersection(another_set)

# Difference
difference_set = my_set.difference(another_set)

# Symmetric Difference
sym_diff_set = my_set.symmetric_difference(another_set)

# Subset and Superset
is_subset = {1, 2}.issubset(my_set)
is_superset = my_set.issuperset({1, 2})

# Remove Elements
my_set.remove(4)
my_set.discard(5)

# Pop Elements
popped = my_set.pop()

Set: collection with unique, unordered elements.
— Creation: my_set = {1, 2, 3} initializes with elements 1, 2, 3.
— Add Elements: my_set.add(4) adds 4.
— Union: union_set = my_set | another_set or union_set = my_set.union(another_set) combines both sets, no duplicates.
— Intersection: intersect_set = my_set & another_set or intersect_set = my_set.intersection(another_set) finds common elements.
— Difference: difference_set = my_set - another_set or difference_set = my_set.difference(another_set) gets elements not in another_set.
— Symmetric Difference: sym_diff_set = my_set ^ another_set or sym_diff_set = my_set.symmetric_difference(another_set) finds elements in either set but not both.
— Subset and Superset: {1, 2}.issubset(my_set) checks if {1, 2} is a subset. my_set.issuperset({1, 2}) checks if my_set is a superset.
— Remove Elements: my_set.remove(4) removes 4, errors if absent. my_set.discard(5) removes 5 if present, else no action.
— Pop Elements: popped = my_set.pop() removes and returns an arbitrary element, errors if set is empty.
— Sorted: sorted(my_set) returns a sorted list of the set's elements.
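Sets also pair naturally with the text-search project later in the chapter: converting a token list to a set de-duplicates it and gives constant-time membership tests. A quick sketch (the sample sentence is made up):

```python
# De-duplicating tokens with a set; the sample sentence is illustrative.
tokens = 'the quick fox jumps over the lazy fox'.split()
vocab = set(tokens)             # unique words; order is not guaranteed
print(len(tokens), len(vocab))  # 8 6
print('fox' in vocab)           # True, via an O(1) membership test
```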

3.5. Strings

Strings are essential for text manipulation in Python.

## Creation
test = ('\t'*2) + "This is Python-Fleek!\n\tLine 2 \n "

## Access Characters
char = test[5] # 's'

## Slicing
substring = test[1:5] # '\tThi'

## Concatenation
concatenated = test + " Enjoy!"

## Repetition
repeated = test * 2

## Length
length = len(test)

## Replace Substrings
replaced = test.replace("Python", "On")

## Splitting
tokens = test.split(' ')

## Find Substring
idx = test.find("Line")
slogan = test[:idx]

## Strip Whitespace
slogan = slogan.strip()

## Change Case
print('Case:',slogan.upper(), slogan.lower(), slogan.title())

Strings: immutable sequences of characters, i.e. text.
Note: quote characters might not copy correctly.
— Creation: test string with tabs, text, and new lines using \t, \n.
— Access Characters: char extracts the single character 's' from test at index 5.
— Slicing: substring gets '\tThi' (tab included) from test using [1:5].
— Concatenation: concatenated merges test with " Enjoy!" using +.
— Repetition: repeated duplicates test twice using * 2.
— Length: length calculates number of characters in test using len().
— Replace Substrings: replaced changes 'Python' to 'On' in test with .replace().
— Splitting: tokens breaks test into words with .split(' ').
— Find Substring: idx locates start of 'Line' in test with .find().
— Substring Extraction: slogan is part of test up to idx, excludes 'Line'.
— Strip Whitespace: slogan removes spaces at start and end with .strip().
— Change Case: Applies upper, lower, title case to slogan.
— String Literals: 'x', "x", '''x''', """x""" represent string literals with single, double, triple-single, and triple-double quotes.

###### Try me! ######
import re
pattern = re.compile(r'\b[A-Za-z]+\b')

### String processing functions
def tokenize(xx): return pattern.findall(str(xx).strip())
def normalize(xx): return ' '.join(pattern.findall(str(xx).lower().strip()))
do_nothing = lambda x: x

def print_dict(d_outputs, func=do_nothing, title=None):
    if title: print(title)
    lines = [f'>>> {kk:>12}: {str(func(vv)):<40}\n' for kk, vv in d_outputs.items()]
    print(''.join(lines))

d_outputs = dict(test=test, char=char, substring=substring,
                 concatenated=concatenated, repeated=repeated, replaced=replaced)

# print_dict(d_outputs, func=do_nothing, title='NO-OP')
# print_dict(d_outputs, func=normalize, title='NORMALIZE')
print_dict(d_outputs, func=tokenize, title='TOKENIZE')

## Make a list of operators to apply to a string
func_lst = [str.upper, str.lower, str.title, str.split, tokenize, normalize, do_nothing]

## Create a dictionary of {func_name --> first line from f(x)}
d_values = {func.__name__.upper():str(func(test)).split('\n')[0].strip() for func in func_lst}

print_dict(d_values, title='FUNCTIONS')

— f-Strings: f'My name is {name}', f"""This is a {label} f-string""" insert variables into strings.
— String Processing Functions: tokenize extracts words via regex, normalize to lowercase and extracts words, do_nothing returns input unchanged.
— Print Dictionary Function: print_dict formats, prints dictionary items.
— Operators on Strings: func_lst includes functions like upper, lower, split, tokenize, normalize.
— Function Application: d_values maps function names to result of applying each to first line of test.
— Demonstration: Shows manipulation of test with different functions, displayed by print_dict.

3.6 Improving Code Quality

When coding becomes challenging or time consuming, consider using existing tools.
autopep8 (pip install autopep8) — Code formatter for PEP 8 compliance.
bandit (pip install bandit) — Security linter.
Black (pip install black) — Code formatter.
coverage.py (pip install coverage) — Test coverage analysis.
cProfile (built-in) — Performance analysis and profiling.
flake8 (pip install flake8) — Code style linting.
inspect (built-in) — Inspect live objects and stack frames.
isort (pip install isort) — Sorts and formats imports.
mypy (pip install mypy) — Static type checking.
pdb (built-in) — Python debugger.
pre-commit (pip install pre-commit) — Git commit-hook framework.
pydocstyle (pip install pydocstyle) — Docstring style linter.
pylama (pip install pylama) — Code quality analysis.
Pylint (pip install pylint) — Code quality analysis.
pytest (pip install pytest) — Testing framework.
rope (pip install rope) — Refactoring library.
Sphinx (pip install Sphinx) — Documentation generator.
tox (pip install tox) — Test automation across environments.
unittest (built-in) — Testing framework.
yapf (pip install yapf) — Code formatter.

Linting means automated checking of your source code for programmatic and stylistic errors.

Debugging with pdb
Essential for identifying and fixing bugs. Insert import pdb; pdb.set_trace() at the point where you want execution to pause so you can inspect state interactively.

import pdb; pdb.set_trace()  # Debugging point

Performance Profiling
Measures where execution time is spent using cProfile. Ideal for optimizing Python code. Use import cProfile and cProfile.run('code').

import cProfile
cProfile.run('sum([x for x in range(1000000)])')

Testing with unittest
Ensures code correctness and helps prevent regressions. Create test cases in a class derived from unittest.TestCase and check results with self.assertEqual(). Tests are normally discovered and run with unittest.main(); the example below calls the test methods directly for illustration.

import unittest
class TestMathFunctions(unittest.TestCase):
    def test_square(self):
        self.assertEqual(4, 2**2)
    def test_four(self):
        self.assertEqual(4, '4')

TestMathFunctions().test_square() # passes
TestMathFunctions().test_four() # fails: 4 != '4', raises AssertionError

Formatting
— Black: Maintains a readable, uniform style. black_format(file_path) below wraps black’s format_file_in_place().
— isort: Organizes imports neatly. sort_imports(file_path) below wraps isort.file() to sort a file’s imports in place.

# Black - Code formatter (exact API may differ between black versions)
def black_format(file_path: str):
    from pathlib import Path
    from black import format_file_in_place, FileMode, WriteBack
    format_file_in_place(Path(file_path), fast=True, mode=FileMode(), write_back=WriteBack.YES)

# isort - Import formatting
def sort_imports(file_path: str):
    from isort import file
    file(file_path)

3.7 Challenge 3 — Hellinger Distance

Since this chapter covers basic data structures and the next chapter covers large-scale, semi-automated data handling, the challenge is to rewrite the code without numpy or pandas.

Hellinger Distance is a way to quantify the similarity or divergence between two probability distributions. It’s a measure rooted in the theory of statistics and information theory, providing a robust tool for various applications in data science and machine learning.

The Hellinger Distance between two probability distributions P and Q is calculated as:

H(P, Q) = (1/√2) · √( Σᵢ (√pᵢ − √qᵢ)² )

Here, pᵢ and qᵢ represent the probabilities of the i-th element in distributions P and Q, respectively.

Where is it Used?
— In Statistics and Data Science: For clustering, classification, and understanding the difference between datasets.
— In Machine Learning: In algorithms like decision trees and in kernel methods.

Comparison with Other Measures
— Versus Kullback-Leibler Divergence: Hellinger Distance is symmetric and less sensitive to outliers.
— Versus Euclidean Distance: It’s more suitable for probability distributions due to its sensitivity to the nature of the data.

Hands-On Implementation
— Data Preparation: Ensure your data represents probability distributions (non-negative values that sum to 1).
— Implementing the Formula:

import numpy as np

def hellinger_distance(P, Q):
    return np.sqrt(np.sum((np.sqrt(P) - np.sqrt(Q)) ** 2)) / np.sqrt(2)

# Example probability distributions
# Convert a list to Numpy array for efficiency and simplicity.
# Example: This could be a list of Jaccard similarities, sample means, or classification probabilities.
P = np.array([0.4, 0.6])
Q = np.array([0.5, 0.5])

# Calculate and print the Hellinger Distance
print(f"Hellinger Distance: {hellinger_distance(P, Q)}")

This snippet calculates the Hellinger Distance between two simple distributions P and Q.

Working with Real Data: Apply this measure to compare different customer segments or to analyze survey data in market research.

Optimization Tip: For large datasets, optimize by handling zero values and using efficient array operations.
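For the challenge itself, one possible pure-Python version (no numpy) uses math.sqrt and plain iteration; the length check is an added safeguard, not part of the original snippet:

```python
import math

# Pure-Python Hellinger Distance; one possible answer to the no-numpy challenge.
def hellinger_distance(P, Q):
    if len(P) != len(Q):
        raise ValueError('distributions must have the same length')
    total = sum((math.sqrt(p) - math.sqrt(q)) ** 2 for p, q in zip(P, Q))
    return math.sqrt(total) / math.sqrt(2)

print(f"Hellinger Distance: {hellinger_distance([0.4, 0.6], [0.5, 0.5])}")
```

Identical distributions give 0 and fully disjoint ones give 1, which makes the bounds easy to sanity-check.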

Why Hellinger Distance Matters: It’s a powerful tool for data comparison that is less sensitive to outliers and better suited to probability distributions, making it invaluable in many practical scenarios.

Applications of Hellinger Distance
Hellinger Distance: statistical measure used in various data analysis fields.
— Key for measuring similarity or difference between two probability distributions.
— Applied in text mining, image recognition, market analysis, ecology, genetics, machine learning, speech recognition, network security, finance.
— Calculated as the square root of half the sum of squared differences between square roots of corresponding values in two distributions.
— Assessing similarity or divergence of statistical or probabilistic models.
— Evaluating text document similarity by comparing word frequency distributions.
— Measuring difference in pixel intensity distributions for image classification.
— Analyzing customer behavior in market segments by comparing purchase probabilities.
— Comparing species distribution models for biodiversity variations across regions.
— Assessing genetic diversity or similarity in gene expression profiles.
— Comparing output distributions of machine learning models for optimal selection.
— Differentiating spoken words or sounds by comparing acoustic feature distributions.
— Identifying unusual network traffic patterns for security.
— Comparing probability distributions of returns for investment portfolios to assess risk.

3.8: Project 6 — Compound Annual Growth Rate (CAGR)

CAGR represents the mean annual growth rate of an investment over a specified time period longer than one year.

Compound Annual Growth Rate (CAGR)
Formula: CAGR = (Ending Value / Beginning Value)^(1 / Periods) − 1
  • Ending Value: The final value of the investment
  • Beginning Value: The initial value of the investment
  • Periods: Number of years (or other time units) over which the investment is considered

CAGR offers a smoothed perspective of an investment’s growth, eliminating the fluctuations that occur during the investment period. It’s particularly useful for comparing the growth rates of different investments.

CAGR simplifies growth by ignoring volatility. It assumes steady growth over the period, which is rarely the case in real-world scenarios.

Calculating CAGR: Let’s dive into a Python example. We’ll use a list to store investment values over different years and calculate the CAGR.

def calculate_cagr(beginning_value, ending_value, periods):
    """Calculate the Compound Annual Growth Rate."""
    return (ending_value / beginning_value) ** (1 / periods) - 1

# Example data
investment_values = [1000, 1500, 2000, 2500, 3000] # Investment values over 5 years
beginning_value = investment_values[0]
ending_value = investment_values[-1]
periods = len(investment_values) - 1
cagr = calculate_cagr(beginning_value, ending_value, periods)
print(f"The Compound Annual Growth Rate is {cagr:.2%}")

The output of this script will give you the CAGR of your investment over the specified period. A positive CAGR indicates growth, while a negative CAGR indicates a decline.
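To see the decline case concretely, here is a hypothetical halving of an investment over five years (calculate_cagr is repeated so the snippet runs on its own):

```python
# A declining investment produces a negative CAGR.
# The values below are hypothetical.
def calculate_cagr(beginning_value, ending_value, periods):
    """Calculate the Compound Annual Growth Rate."""
    return (ending_value / beginning_value) ** (1 / periods) - 1

cagr = calculate_cagr(2000, 1000, 5)  # value halves over 5 years
print(f"The Compound Annual Growth Rate is {cagr:.2%}")  # about -12.94%
```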

Implications: CAGR finds its application in analyzing business growth, comparing investment options, and making future financial projections. However, it’s crucial to consider it alongside other metrics due to its limitations in accounting for volatility.

Chapter 3 is not complete! Intermission…Updated Jan 1, 2024.
