Teaching Data Science

An open-source repository of teaching material, free and open to all

Yogesh Haribhau Kulkarni (PhD)
Technology Hits
4 min read · Mar 4, 2024

Screenshot of the GitHub repository

Are you passionate about data science? Do you dream of unraveling hidden patterns in vast datasets, creating predictive models, and contributing to cutting-edge research? If so, you’re in the right place!

Welcome, data enthusiasts, to a treasure trove of knowledge: the “Teaching Data Science” repository by Yogesh H Kulkarni. This GitHub repository is a rich collection of LaTeX course notes covering a spectrum of topics including Python, Machine Learning, Deep Learning, Natural Language Processing, and more. In this post, we’ll explore its purpose, how to use it, how to contribute, and other essential aspects of this resource.

Purpose: Spreading the Light of Data Science

The Teaching Data Science repository serves a noble purpose: to spread the gospel of data science far and wide. Our mission is simple yet powerful — to make data science accessible to everyone. Whether you’re a student, an industry professional, or an enthusiast, we believe that knowledge should flow freely. By sharing our insights, code, and expertise, we hope to empower individuals to harness the power of data. The values driving this endeavor are rooted in giving back to the community and paying knowledge forward. The ultimate goal is to propel the industry from automation to autonomy.

How to Use: Navigating the Maze of Knowledge

The core content is presented as Beamer slides, a LaTeX presentation format that combines visuals, text, and equations. These slides cover a wide range of topics, including:

  • Python fundamentals
  • Machine learning algorithms
  • Deep learning architectures
  • Natural language processing techniques

But that’s not all! We’ve also transformed these slides into two-column course notes PDFs. Whether you’re preparing for a seminar, workshop, or a semester-long course, you’ll find valuable material here. The structure of this repository is well thought out, divided into three main directories: LaTeX, Code, and References.

LaTeX Directory

In the LaTeX directory, you’ll discover TeX sources alongside essential images. Here’s how it’s organized:

  • Naming Convention: Each TeX file follows a consistent naming convention, such as maths_linearalgebra_matrices.tex. Clear, concise names make navigation a breeze.
  • Driver Files: Driver files such as Main_Seminar_Presentation.tex, Main_Workshop_CheatSheet.tex, and Main_Course_Notes.tex target events of different lengths. Each one pulls in the relevant source files.
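
As a rough illustration of the driver-file idea, here is a minimal sketch of what a seminar driver might look like. The actual contents of Main_Seminar_Presentation.tex are not shown in this post, so the title, the author placeholder, and the `images/` path are assumptions; only the content file name comes from the naming example above.

```latex
% Sketch of a Beamer driver file (contents assumed, not copied from the repository).
\documentclass{beamer}

% Assumption: figures are stored in an images/ folder next to this file.
\graphicspath{{images/}}

\title{Linear Algebra: Matrices}   % placeholder title
\author{Your Name}                 % placeholder author

\begin{document}

\frame{\titlepage}

% The driver holds no slides of its own; it only pulls in content files.
\input{maths_linearalgebra_matrices}

\end{document}
```

Keeping the slides in separate content files means the same material can be reused by several drivers without being copied.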

Code Directory

Data science isn’t complete without hands-on practice. Our code directory houses Python and IPython notebook files. Here’s what you’ll find:

  • Naming Consistency: Each code file corresponds to a specific LaTeX topic. We believe in connecting theory with practice.
  • Library-Based TeX Files: For instance, sklearn_intro.ipynb accompanies the sklearn_intro.tex lecture. Dive into real-world examples and experiment with libraries.

References Directory

The References directory is a treasure chest of papers, code, and presentations used as base material for preparing the content. It acknowledges the importance of building on existing knowledge and resources.

Requirements: Setting Up

To use this repository, make sure you have LaTeX installed (tested with MiKTeX 2.9 on 64-bit Windows 7). You may need to install additional LaTeX packages, as indicated by the compiler's warnings and suggestions. The recommended editor is TeXworks.

How to Run LaTeX

To embark on your data science journey with Kulkarni’s notes, you’ll need a few essentials:

  • LaTeX: The notes were tested with MiKTeX 2.9 on Windows 7 (64-bit).
  • LaTeX packages: Install them as prompted to ensure smooth compilation.
  • TeXworks: This user-friendly editor streamlines your LaTeX workflow.

Running the LaTeX files is straightforward. Driver files are named intuitively, and you can even compile individual files using your preferred LaTeX system. Alternatively, feel free to create your own main files and include the content files for a customized learning experience.
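
If you want to build your own main file for printable notes, one common approach (not necessarily the one this repository uses) is to compile Beamer content in an article class with the `beamerarticle` package. The sketch below assumes the content file contains ordinary Beamer frames and that figures live in an `images/` folder; the file name of the main file itself is up to you.

```latex
% Sketch of a custom main file for two-column notes (an assumption, not the repository's driver).
\documentclass[twocolumn]{article}

\usepackage{beamerarticle}   % lets Beamer frames compile as running text
\usepackage{graphicx}
\graphicspath{{images/}}     % assumed location of the figures

\begin{document}

% Reuse the same content file that a presentation driver would include.
\input{maths_linearalgebra_matrices}

\end{document}
```

In this mode each frame is typeset as ordinary text rather than as a slide, which is what makes the two-column layout practical.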

Steps to Contribute

The beauty of open source lies in collaboration, and this repository welcomes contributions with open arms.

1. Navigate to the ‘LaTeX’ folder.
2. Copy your images into the ‘images’ folder and your source code into the ‘src’ folder.
3. Sample files are provided for copying and modification: `Main_Sample_Presentation.tex` and `Main_Sample_CheatSheet.tex`, both of which call `sample_content.tex`.
4. Write your material directly in the content file, or split it across multiple files and `\input` them from the content file.
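
The last step can be sketched as follows. The two topic file names and the figure name are hypothetical, and the example assumes the driver file already tells LaTeX where the images folder is.

```latex
% --- sample_content.tex (sketch): only pulls in the topic files ---
\input{my_topic_intro}      % hypothetical file name
\input{my_topic_examples}   % hypothetical file name

% --- my_topic_intro.tex (sketch): one Beamer frame per idea ---
\begin{frame}{What is My Topic?}
  \begin{itemize}
    \item A one-line definition
    \item Why it matters
  \end{itemize}
  % Hypothetical figure, copied into the images folder beforehand.
  \includegraphics[width=0.6\linewidth]{my_figure}
\end{frame}
```

Splitting the material this way keeps each topic file small and lets the presentation and cheat-sheet drivers share the same content.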

Disclaimer

As with any valuable resource, the Teaching Data Science repository comes with a disclaimer:

- No guarantee of the correctness of the content.
- Notes are built using publicly available material.
- Citing original sources is a priority, but some may be missing.
- Continuous improvements are underway, and feedback is encouraged.
- Suggestions, comments, corrections, and pull requests are not just welcome; they are actively sought.

In conclusion, the Teaching Data Science repository is a beacon of knowledge in the vast ocean of data science. It empowers you to learn, contribute, and grow in this dynamic field. As we collectively move “From Automation to Autonomy,” let’s use this resource to illuminate our path in the world of data science. Dive in, explore, and let the journey to autonomy begin!

To learn more about the author, visit the LinkedIn profile below.

PhD in Geometric Modeling | Google Developer Expert (Machine Learning) | Top Writer 3x (Medium) | More at https://www.linkedin.com/in/yogeshkulkarni/