Stories by Khushboo Alvi on Medium

What is Scope in Python?

Khushboo Alvi — Sat, 04 May 2024 13:32:20 GMT

The scope of a variable is the region of a program where that variable can be found and accessed.

Global Scope:

Variables defined outside of any function or class have global scope. They can be read from anywhere in the module. To assign to a global variable inside a function, declare it with the global keyword.

Enclosing or Non-local Scope:

Enclosing scope refers to the scope of the outer function when functions are nested. Inner functions can read variables from outer functions but cannot rebind them by default. To assign to such a variable, use the nonlocal keyword.

Local Scope:

Variables defined within a function have local scope. These can only be accessed from within that function.
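
As a minimal sketch of the three scopes described above (the function and variable names are illustrative assumptions):

```python
counter = 0  # global scope: defined outside any function


def increment_global():
    global counter  # without this, the assignment would create a new local
    counter += 1


def make_accumulator():
    total = 0  # lives in the enclosing scope of add()

    def add(amount):
        nonlocal total  # rebind the variable owned by make_accumulator()
        total += amount
        return total

    return add


def local_example():
    message = "only visible here"  # local scope
    return message


increment_global()
add = make_accumulator()
add(5)
result = add(10)
print(counter, result, local_example())
```

Here `message` exists only while `local_example()` runs, so referring to it at module level would raise a NameError.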

Understanding scope helps you write code that is easier to maintain, debug and reason about. It also enables you to avoid unintended side effects by controlling where variables can be accessed and modified.

Keep Learning 😊 !!

Different Operations Using SQL

Khushboo Alvi — Sat, 27 Apr 2024 09:29:13 GMT

Here, I am sharing a brief description of different operations in SQL.

SELECT: Retrieves specific columns from a table.
GROUP BY: Groups rows that share the same values in the specified columns or expressions, usually so that aggregate functions such as COUNT or SUM can be computed per group.
WHERE: Filters rows based on a specified condition.
ORDER BY: Sorts the result set based on one or more columns.
JOIN: Combines rows from two or more tables based on a related column between them.
INNER JOIN: Returns only the rows with matching values in both tables.
LEFT JOIN: Returns all rows from the left table and the matching rows from the right table; where there is no match, the right-side columns are NULL.
RIGHT JOIN: Returns all rows from the right table and the matching rows from the left table; where there is no match, the left-side columns are NULL.
FULL JOIN: Returns all rows from both tables, matched where possible, with NULLs filling the side that has no match.
UNION: Combines the result sets of two or more SELECT statements into a single result set, removing duplicate rows.
UNION ALL: Combines the result sets of two or more SELECT statements into a single result set, keeping duplicates.
INSERT INTO: Inserts new rows into a table.
UPDATE: Modifies existing rows in a table based on a specified condition.
DELETE FROM: Deletes existing rows from a table based on a specified condition.
CREATE TABLE: Creates a new table in the database.
ALTER TABLE: Modifies an existing table structure.
DROP TABLE: Deletes a table from the database.
CREATE DATABASE: Creates a new database.
DROP DATABASE: Drops an existing database.
CREATE INDEX: Creates an index on a table to speed up data retrieval.
DROP INDEX: Deletes an index from a table.
TRUNCATE TABLE: Deletes all rows from a table but retains the table structure.
ALTER INDEX: Modifies an existing index.
ALTER VIEW: Modifies an existing view.
CREATE VIEW: Creates a virtual table based on the result set of a SELECT statement.
DROP VIEW: Deletes an existing view.
COMMIT: Saves all changes made in the current transaction.
ROLLBACK: Undoes the changes made in the current transaction.
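
A few of these statements can be tried out with Python's built-in sqlite3 module. The following is a minimal sketch, assuming an in-memory SQLite database; the customers and orders tables and their rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
cur = conn.cursor()

# CREATE TABLE and INSERT INTO
cur.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
cur.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
cur.executemany(
    "INSERT INTO customers (id, name) VALUES (?, ?)",
    [(1, "Asha"), (2, "Ben"), (3, "Chen")],
)
cur.executemany(
    "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
    [(1, 50.0), (1, 25.0), (2, 40.0)],
)
conn.commit()  # COMMIT: make the inserts permanent

# SELECT with LEFT JOIN, GROUP BY and ORDER BY: total spent per customer.
# Chen has no orders, so SUM() is NULL and COALESCE turns it into 0.
totals = cur.execute(
    """
    SELECT c.name, COALESCE(SUM(o.amount), 0) AS total
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.id, c.name
    ORDER BY total DESC
    """
).fetchall()

# UNION removes duplicate rows, UNION ALL keeps them.
union_rows = cur.execute(
    "SELECT customer_id FROM orders UNION SELECT customer_id FROM orders"
).fetchall()
union_all_rows = cur.execute(
    "SELECT customer_id FROM orders UNION ALL SELECT customer_id FROM orders"
).fetchall()

# UPDATE followed by ROLLBACK: the change is undone.
cur.execute("UPDATE orders SET amount = 0 WHERE customer_id = 2")
conn.rollback()
ben_amount = cur.execute(
    "SELECT amount FROM orders WHERE customer_id = 2"
).fetchone()[0]

print(totals, len(union_rows), len(union_all_rows), ben_amount)
conn.close()
```

Syntax and feature support vary between database engines, so treat this as an illustration of the ideas, not as portable SQL.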

Keep Learning 😊!!

Pandas Interview Questions

Khushboo Alvi — Tue, 23 Apr 2024 19:44:02 GMT

Here, I am sharing important Pandas interview questions crucial for interviews:

1. What is pandas in Python?
2. What is the difference between Series and DataFrame?
3. What is an index in pandas?
4. What is Multi indexing in pandas?
5. Explain pandas Reindexing.
6. What is the difference between loc and iloc?
7. Tell different ways to create a pandas DataFrame.
8. What is the difference between Join and Merge methods in pandas?
9. What is Timedelta?
10. How do you convert an Excel file to CSV using pandas?
11. How do you sort a DataFrame based on columns?
12. How can you create a new column derived from existing columns?
13. How do you handle null or missing values in pandas?
14. What is Resampling?
15. What is the pandas method to get the statistical summary of all the columns in a DataFrame?
16. How do you access the first few rows of a DataFrame?
17. Difference between pivot_table() and groupby().
18. How to convert a String to Datetime in Pandas?

Keep Learning !!

Common Plots using Matplotlib for data visualization.

Khushboo Alvi — Tue, 23 Apr 2024 19:12:27 GMT

I am sharing a brief description of different types of plots using Matplotlib.

plt.plot(x, y): Line Plot represents data points connected by straight lines, ideal for showing trends over time.

plt.scatter(x, y): Scatter Plot displays individual data points as markers, useful for exploring relationships between two variables.

plt.bar(categories, values): Bar Plot uses rectangular bars to represent categorical data, facilitating comparisons between different groups.

plt.hist(data, bins): Histogram visualizes the distribution of a continuous variable by dividing it into bins and displaying the frequency of each bin.

plt.pie(values, labels=labels): Pie Chart illustrates the proportion of different categories in a dataset using slices of a circle.

plt.boxplot(data): Box Plot summarizes the distribution of a continuous variable through quartiles, outliers and median.

plt.violinplot(data): Violin Plot combines the features of a box plot and a kernel density plot to provide insights into data distribution and density.

plt.imshow(data, cmap='viridis'): Heatmap represents data values in a matrix format using colors, useful for identifying patterns and correlations.

plt.stackplot(x, y1, y2, labels=['Variable 1', 'Variable 2']): Area Plot displays the cumulative contribution of different variables over time or another dimension.

plt.errorbar(x, y, yerr=error, fmt='o'): Error Bar Plot represents variability or uncertainty in data by displaying error bars around data points or bars.

plt.contour(X, Y, Z) : Contour Plot shows the 3D surface of a function in a 2D plane using contour lines to represent different levels of the function.

plt.polar(theta, r): Polar Plot draws data on polar axes. It is the usual basis for a Radar Chart, which displays multivariate data as a two-dimensional chart with three or more quantitative variables represented on axes starting from the same point.

plt.hexbin(x, y, gridsize=20): Hexbin Plot represents the counts of observations falling within hexagonal bins, useful for visualizing the distribution of large datasets.

plt.quiver(x, y, u, v) : Quiver Plot visualizes vector fields with arrows where each arrow represents the direction and magnitude of a vector at a specific point.

plt.streamplot(x, y, u, v) : Streamplot represents flow fields using streamlines, which are curves that are tangent to the velocity vector of the flow.

fig = plt.figure()

ax = fig.add_subplot(111, projection='3d')

ax.scatter(x, y, z)

3D Plotting creates three-dimensional visualizations of data points, surfaces or volumes.

plt.contourf(X, Y, Z) : Contourf Plot fills the area between contours with colors to represent different levels.

ax.plot_surface(X, Y, Z) : Surface Plot represents a three-dimensional surface by plotting a grid of points in three-dimensional space and connecting them with a continuous surface.

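wordcloud = WordCloud().generate(text)

plt.imshow(wordcloud, interpolation='bilinear')

Word Cloud represents text data by displaying words in different sizes based on their frequency or importance. The WordCloud class comes from the separate wordcloud package, not from Matplotlib itself; Matplotlib is only used to display the generated image.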

These plots offer sophisticated ways to visualize complex data and relationships. Depending on your data and analysis goals, you can choose the appropriate plot type to effectively communicate insights.

List comprehension in Python.

Khushboo Alvi — Thu, 04 Apr 2024 17:54:46 GMT

List comprehension is a concise way to create a new list based on the values of an existing iterable.

In other words, it is generally a single line of code enclosed in square brackets that filters, formats, modifies or performs other small tasks on existing iterables such as strings, tuples and lists.

Syntax : [expression for item in iterable if condition]

Where:

expression: The operation you want to perform on every item in the iterable.

item: Each value taken from the iterable in turn.

iterable: The sequence of elements you want to iterate through.

condition: An optional filter that decides whether or not an element should be added to the new list.

Here, I have shared some examples of different types of list comprehensions.
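
A minimal sketch of a few common forms follows. The variable names and data are illustrative assumptions, not taken from the original examples.

```python
numbers = [1, 2, 3, 4, 5, 6]

# expression applied to every item
squares = [n * n for n in numbers]

# condition used as a filter
evens = [n for n in numbers if n % 2 == 0]

# conditional expression: every item is kept, but the output value changes
labels = ["even" if n % 2 == 0 else "odd" for n in numbers]

# any iterable works, for example a string
upper_letters = [ch.upper() for ch in "abc"]

# two for clauses produce every combination
pairs = [(x, y) for x in (1, 2) for y in ("a", "b")]

print(squares, evens, labels, upper_letters, pairs)
```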

Keep Learning !!!

Bag-of-words model in NLP

Khushboo Alvi — Tue, 02 Apr 2024 19:01:06 GMT

NLP algorithms work on numbers, so we cannot feed raw text into them directly. Bag of Words is a feature extraction method that converts text data into numerical vectors. Each number is the count of a word (token) in a document.

The model can be visualized as a table that lists each word alongside its count.

Applying the Bag of Words model

Step 1: Data preprocessing.

Convert the text to lower case and remove all non-word characters and punctuation.

Step 2: Obtain the most frequent words.

Declare a dictionary to hold the bag of words and tokenize each sentence into words. For each word in a sentence, check whether the word already exists in the dictionary. If it does, increment its count by 1. If it doesn’t, add it to the dictionary and set its count to 1.

Step 3: Building the Bag of Words model.

In this step, create a binary vector for each sentence, where each element indicates the presence (1) or absence (0) of a frequent word from the bag of words. These binary vectors serve as the numerical representation of the textual data.

This process results in a table-like structure where each row corresponds to a document and each column corresponds to a word. In the binary variant described here, each value marks whether the word occurs in the document; in the count variant, it holds the frequency of the word in that document.
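
The three steps can be sketched with the standard library alone. This is a minimal illustration under assumptions: the two sample sentences, the vocabulary size of 4 and the alphabetical tie-break between equally frequent words are all choices made for this example.

```python
import re
from collections import Counter

documents = ["The cat sat on the mat.", "The dog sat on the log!"]


def tokenize(text):
    # Step 1: lowercase, then keep only runs of letters and digits
    return re.findall(r"[a-z0-9]+", text.lower())


tokenized = [tokenize(doc) for doc in documents]

# Step 2: count every word across all documents
word_counts = Counter(word for tokens in tokenized for word in tokens)

# Keep the 4 most frequent words; ties are broken alphabetically
vocabulary = sorted(word_counts, key=lambda w: (-word_counts[w], w))[:4]

# Step 3: one vector per document, one position per vocabulary word
binary_vectors = [
    [1 if word in tokens else 0 for word in vocabulary] for tokens in tokenized
]
count_vectors = [
    [tokens.count(word) for word in vocabulary] for tokens in tokenized
]

print(vocabulary)
print(binary_vectors)
print(count_vectors)
```

The binary vectors record only presence or absence, while the count vectors keep how often each word occurs in the document.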

The Bag of Words model is used for tasks like sentiment analysis, document classification etc.

Limitations of Bag of Words Model

1. It ignores the order of words in a document, leading to a loss of valuable sequential information.

2. BoW matrices can be very sparse, especially when dealing with large vocabularies or documents. This leads to high-dimensional data, which may require significant memory and computational resources.

3. It treats each word as independent, disregarding the context in which words appear. This can result in a loss of meaning and context in the analysis.

4. It assigns equal importance to all words regardless of their relevance or importance in a specific task.

5. It struggles with out-of-vocabulary words: it cannot represent words that were not seen during the training phase, which may limit its effectiveness in handling new terms.

Keep Learning 😊 !!

Named Entity Recognition (NER) and its applications.

Khushboo Alvi — Mon, 01 Apr 2024 01:46:52 GMT

Describe the process of Named Entity Recognition (NER) and its applications.

Named Entity Recognition (NER) is a technique in natural language processing (NLP) that locates and classifies named entities mentioned in unstructured text into predefined categories such as the names of persons, organizations and locations, time expressions, quantities, monetary values and percentages.

Process of NER:

1. Preprocessing:

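It involves cleaning the text, removing unnecessary elements (like HTML tags, if any) and sometimes converting the entire text into a uniform case (usually lowercase) to standardize the input data. Lowercasing should be applied with care, because capitalization is often a useful cue for spotting names.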

2. Tokenization:

The text is split into sentences and then into words or tokens.

3. Part-of-Speech Tagging:

Each word or token is labeled with a part of speech (noun, verb, adjective, etc.), based on its definition and its context in the sentence.

4. Entity Detection:

It can be approached in several ways, including rule-based methods, machine learning models or a combination of both. Rule-based methods use predefined patterns and dictionaries, while machine learning models, especially those based on deep learning, learn from annotated training data.

5. Entity Classification:

Once potential entities are detected, these are classified into predefined categories.

6. Post-processing:

It might involve normalization, which standardizes entity representations (e.g., converting dates into a common format), and disambiguation, which ensures that each entity is linked to the correct real-world counterpart.
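
The detection and classification steps can be illustrated with a small rule-based sketch. It covers only the dictionary-and-pattern approach mentioned in step 4 and skips tokenization and part-of-speech tagging. The gazetteer entries, the regular expressions and the sample sentence are all hypothetical, chosen for illustration only.

```python
import re

# Hypothetical dictionary (gazetteer) of known names per category
GAZETTEER = {
    "PERSON": ["Asha Rao"],
    "ORG": ["Acme Corp"],
    "LOCATION": ["Mumbai"],
}

# Hypothetical patterns for entities that follow a regular shape
PATTERNS = {
    "DATE": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
    "MONEY": re.compile(r"\$\d+(?:,\d{3})*(?:\.\d+)?"),
    "PERCENT": re.compile(r"\d+(?:\.\d+)?%"),
}


def find_entities(text):
    found = []
    # Dictionary lookup: detect the span and classify it in one go
    for label, names in GAZETTEER.items():
        for name in names:
            for match in re.finditer(re.escape(name), text):
                found.append((match.start(), match.group(), label))
    # Pattern matching for dates, money and percentages
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((match.start(), match.group(), label))
    # Return the entities in the order they appear in the text
    return [(entity, label) for _, entity, label in sorted(found)]


text = "Asha Rao joined Acme Corp in Mumbai on 2024-04-01 for $1,200, a 15% raise."
entities = find_entities(text)
print(entities)
```

A dictionary like this cannot recognize names it has never seen, which is one reason the machine learning approaches mentioned above are widely used.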

Applications of NER:

1. Information Retrieval:

NER can improve the efficiency of information retrieval systems by enabling searches that are focused on specific entities.

2. Content Recommendation:

By identifying entities in content, systems can recommend articles, products or services based on the user’s past interactions with similar entities.

3. Sentiment Analysis:

NER can be used to identify entities in product reviews or social media posts, which can then be the focus of sentiment analysis.

4. Knowledge Graphs:

NER is used to extract entities and their relationships from text, which are then used to construct knowledge graphs that represent real-world facts and how they’re interconnected.

5. Automated Customer Support:

NER can help in identifying specific products, services or issues mentioned in customer queries, enabling more efficient and automated customer service responses.

Keep Learning 😊!!

lambda functions in Python

Khushboo Alvi — Mon, 01 Apr 2024 01:44:15 GMT

What are lambda functions in Python, and how are they used?

A lambda function is a small anonymous function defined using the lambda keyword. It can take any number of arguments, but its body must be a single expression. Lambdas are commonly used when you need a short function for a short period of time, such as a key for sorting or a predicate for filtering.
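
A minimal sketch of typical uses follows; the sample data and variable names are illustrative assumptions.

```python
people = [("Asha", 31), ("Ben", 25), ("Chen", 28)]

# Sorting: the lambda picks the age as the sort key
by_age = sorted(people, key=lambda person: person[1])

# Filtering: the lambda decides which items are kept
over_26 = list(filter(lambda person: person[1] > 26, people))

# Mapping: the lambda transforms each item
doubled = list(map(lambda x: x * 2, [1, 2, 3]))

# A lambda can take several arguments but holds a single expression
total = (lambda a, b: a + b)(2, 3)

print(by_age, over_26, doubled, total)
```

For anything longer than one expression, a regular def function is usually the clearer choice.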

Keep Learning!!!

What are n-grams and how are they useful in NLP tasks?

Khushboo Alvi — Sat, 30 Mar 2024 16:25:35 GMT

n-grams are contiguous sequences of n words, symbols or tokens in a document. They are typically collected from a large text or speech corpus.

Here, n is a positive integer. n-grams are named according to the value of n:

n    Term
1    Unigram
2    Bigram
3    Trigram
n    n-gram
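
As a minimal sketch, n-grams can be extracted from a token list with a sliding window. The helper name ngrams and the sample sentence are assumptions made for this example.

```python
def ngrams(tokens, n):
    # Slide a window of size n over the tokens, one position at a time
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = "keep learning natural language processing".split()

unigrams = ngrams(tokens, 1)
bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)

print(bigrams)
print(trigrams)
```

A sequence of k tokens yields k - n + 1 n-grams, so the five tokens here give four bigrams and three trigrams.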

n-grams are useful in several NLP tasks:

a). Language Modeling:

n-grams are used to build statistical language models that estimate the probability of a word, or a sequence of words, given the preceding context. n-gram language models are used for text generation and speech recognition.

b). Text Prediction and Autocomplete:

By estimating the probability of a sequence of words appearing together, an n-gram model can predict what is most likely to come next in a sentence, which makes it useful for autocomplete systems in search engines, messaging apps and word processors.

c). Spelling Correction:

By analyzing the frequency of occurrence of n-grams, spelling correction systems can suggest corrections for misspelled words based on the likelihood of certain n-grams occurring in the language.

d). Machine Translation:

n-grams are used in machine translation systems to break down sentences into smaller units for translation, helping to capture the local context and improve translation accuracy.

e). Sentiment Analysis:

n-grams can be used as features in sentiment analysis tasks where the presence or absence of specific word sequences can provide valuable clues about the sentiment expressed in a text.

f). Text Classification:

n-grams can be used as features for building classifiers, capturing patterns and characteristics of different classes of documents in tasks such as spam detection or document categorization.
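
To make the language modeling and text prediction uses concrete, here is a minimal bigram sketch. The tiny corpus and the helper names predict_next and bigram_probability are assumptions for illustration; real systems use far larger corpora and smoothing for unseen word pairs.

```python
from collections import Counter, defaultdict

corpus = ["i like tea", "i like coffee", "i like tea with milk", "you like tea"]

# For every word, count which words follow it (bigram counts)
following = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for current, nxt in zip(words, words[1:]):
        following[current][nxt] += 1


def predict_next(word):
    # Suggest the most frequent follower, or None if the word has none
    candidates = following.get(word)
    if not candidates:
        return None
    return candidates.most_common(1)[0][0]


def bigram_probability(current, nxt):
    # Estimate P(nxt | current) = count(current, nxt) / count(current, *)
    total = sum(following[current].values())
    return following[current][nxt] / total if total else 0.0


print(predict_next("like"), bigram_probability("like", "tea"))
```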

Keep Learning!!

Difference between tokenization, stemming, and lemmatization

Khushboo Alvi — Sat, 30 Mar 2024 16:22:41 GMT

Let’s start with a question…

Explain the difference between tokenization, stemming, and lemmatization.

Tokenization, stemming, and lemmatization are techniques used in natural language processing (NLP) and text mining to preprocess textual data.

1. Tokenization:

Tokenization is the process of breaking down a text into smaller units, typically splitting paragraphs into sentences and sentences into words. Tokens are the basic building blocks used for further analysis in NLP tasks.

2. Stemming:

Stemming is a technique used to reduce words to their root or base form, called the stem. It involves removing suffixes or prefixes from words to achieve this normalization. For example, the words ‘play’, ‘plays’, ‘played’, and ‘playing’ are reduced to ‘play’ which is a meaningful word. However, this is not always the case.

Stemming might not always result in semantically meaningful base words. The stemmer reduces the word ‘communication’ to a base word ‘commun’ which is meaningless in itself.

Stemming is faster and computationally less expensive than Lemmatization.

Stemming is often used in information retrieval systems and search engines for faster indexing and retrieval.

3. Lemmatization:

Lemmatization is similar to stemming but aims to reduce words to their canonical dictionary form, called the lemma. Unlike stemming, lemmatization considers the context and meaning of the word. For example, ‘play’, ‘plays’, ‘played’, and ‘playing’ are all reduced to ‘play’, and the word ‘communication’ stays ‘communication’, which is a valid word in its own right.

Lemmatization is slower and computationally more expensive than Stemming.

Lemmatization is preferred in tasks where semantic meaning is important such as sentiment analysis or language translation.
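
The contrast between the three techniques can be shown with a toy sketch. The suffix list and the lemma lookup table below are deliberately tiny assumptions made for illustration; this is not the Porter stemmer or a real lemmatizer, and libraries such as NLTK or spaCy provide full implementations.

```python
import re


def tokenize(text):
    # Tokenization: split the text into lowercase word tokens
    return re.findall(r"[a-z]+", text.lower())


def naive_stem(word):
    # Stemming: blindly strip a few common suffixes, no dictionary involved
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Lemmatization: look the word up; this hypothetical table stands in
# for the vocabulary and grammar rules a real lemmatizer relies on
LEMMAS = {
    "plays": "play",
    "played": "play",
    "playing": "play",
    "mice": "mouse",
}


def lemmatize(word):
    return LEMMAS.get(word, word)


tokens = tokenize("She plays, played and is playing with mice.")
stems = [naive_stem(token) for token in tokens]
lemmas = [lemmatize(token) for token in tokens]

print(tokens)
print(stems)
print(lemmas)
```

The stemmer leaves ‘mice’ unchanged because no suffix rule applies, while the lookup maps it to ‘mouse’. That is the kind of case where lemmatization pays for its extra cost.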

Keep Learning!!