Writing My First Julia Package ProjectFlow.jl: Demo and Experience

MrDataPsycho
Published in The Startup
6 min read · Aug 20, 2020

Imagine you work as a data analyst in a company. You get different kinds of ad-hoc analysis requests every day. To resolve them you use several programming tools (DataFrames, Distributions, etc.) and query databases. At the end of a project you also commit your code to git so that you can reuse it when needed, share it with colleagues, or put the project through QA (via git) before it is sent to clients.

  • But every project has a start, so where do you start? You might have a folder JuliaProjects containing a folder adhocs, with one folder per project such as 2020-07-18_My_Fancy_Project_xyz, and inside it a file code.jl where you write all your code.
  • What about the data you are going to download? You can have another folder called datalake inside adhocs where you store the data for each project separately, under the same folder name as the project (2020-07-18_My_Fancy_Project_xyz), since you do not want to commit the data to git.
  • What about the visuals and the final report? You can have a folder called insights inside adhocs and store the visuals and result data there, in the same fashion as in datalake.

Let's see the tree structure of my own adhocs folder inside the JuliaProjects folder:

├── datalake
│   └── 2020-07-18_My_Fancy_Project_xyz
├── insights
│   └── 2020-07-18_My_Fancy_Project_xyz
│       ├── datafiles
│       └── vizfiles
└── projects
    └── 2020-07-18_My_Fancy_Project_xyz
        └── code.jl
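To make the layout concrete, here is a minimal sketch of how such a structure could be created with Base Julia alone. The helper name create_layout and the returned field names are hypothetical and are not part of ProjectFlow; a temporary directory stands in for the adhocs folder.

```julia
# Hypothetical helper (not ProjectFlow's API): builds the folder layout shown above.
function create_layout(root::AbstractString, project::AbstractString)
    datalake    = joinpath(root, "datalake", project)
    insights    = joinpath(root, "insights", project)
    idata       = joinpath(insights, "datafiles")
    iviz        = joinpath(insights, "vizfiles")
    project_dir = joinpath(root, "projects", project)

    # mkpath creates missing parent folders and does nothing if the folder exists
    foreach(mkpath, (datalake, idata, iviz, project_dir))

    # Create an empty code.jl only if it is not there yet
    code_file = joinpath(project_dir, "code.jl")
    isfile(code_file) || touch(code_file)

    return (datalake = datalake, idata = idata, iviz = iviz, code_file = code_file)
end

root = mktempdir()  # stand-in for JuliaProjects/adhocs
paths = create_layout(root, "2020-07-18_My_Fancy_Project_xyz")
println(paths.datalake)
```

Because mkpath is safe to call on existing folders, running the helper again on an existing project simply returns the same paths.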

Now, will you create those folders and files manually every time? How would you keep track of the directories every time you close and reopen the same project? That is why I came up with the idea of writing a project manager for ad-hoc analysis. I have the same package for Python, which I never published but use almost every day. I thought it would be a great learning experience to re-create the same package in Julia, and here is my package. If you like it, giving it a star will be appreciated.

First, let's look at the package from a user's point of view. To get started with the package we

  • First need to add the package with ] add ProjectFlow or Pkg.add("ProjectFlow")
  • Then have to create a profile which ProjectFlow will use:
    touch /home/username/.projectflow/profile and add the following configs
[default]
project_root=/home/datapsycho/JuliaProjects/adhocs
projects_dir=projects
data_dir=datalake
insights_dir=insights
insights_viz_dir=vizfiles
insights_data_dir=datafiles

The profile tells ProjectFlow where my project root is and where the code files, data files, reports and visuals will be saved. You can add more profiles if you want. Now let's use the package. (The current version does not support Windows. I do not have a Windows machine, so I could not test it there; anyone interested can fork the package and test it on Windows.)
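For readers curious how a profile file in this format could be read, below is a small sketch of a parser written with Base Julia only. The function read_profiles is hypothetical and the package's own loader may work differently; the sample file is written to a temporary folder instead of ~/.projectflow.

```julia
# Hypothetical parser for the profile format above; not ProjectFlow's actual loader.
function read_profiles(path::AbstractString)
    profiles = Dict{String,Dict{String,String}}()
    current = ""
    for raw in eachline(path)
        line = strip(raw)
        # Skip blank lines and comment lines
        if isempty(line) || startswith(line, "#")
            continue
        end
        if startswith(line, "[") && endswith(line, "]")
            # Section header such as [default]
            current = String(chop(line, head = 1, tail = 1))
            profiles[current] = Dict{String,String}()
        elseif occursin("=", line) && !isempty(current)
            # key=value pair belonging to the current section
            key, value = split(line, "=", limit = 2)
            profiles[current][String(strip(key))] = String(strip(value))
        end
    end
    return profiles
end

profile_path = joinpath(mktempdir(), "profile")
write(profile_path, """
[default]
project_root=/home/datapsycho/JuliaProjects/adhocs
projects_dir=projects
data_dir=datalake
insights_dir=insights
insights_viz_dir=vizfiles
insights_data_dir=datafiles
""")

cfg = read_profiles(profile_path)["default"]
println(joinpath(cfg["project_root"], cfg["data_dir"]))
```

Each section name becomes a key in the outer dictionary, so adding a second profile to the file only requires another [name] header followed by the same six settings.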

To use the profile we can keep a common Julia environment inside the adhocs folder, plus a code file, say initializer.jl, with the following code.

using ProjectFlow

p = Project(
    id="xyz",
    name="My Fancy? *Project1 2 ",
    template="jl",
    profile="default"
)

datalake, iviz, idata = initiate(p)

After we activate the environment, we fill in the fields of a value of type Project (id: unique id of the project, name: full name of the project, template: the file extension used to create the code.jl template, and profile: the profile to use from the config). Executing initiate(p) then creates the whole folder structure shown in the first tree view, plus a code.jl file inside the new project folder with the following code. In the future I plan to add a notebook template for users who would rather start from a notebook containing the same template code:

# Project Name:
# Regular Imports
using ProjectFlow

p = Project(
    id="xyz",
    name="My Fancy? *Project1 2 ",
    template="jl",
    profile="default"
)

datalake, idata, iviz = initiate(p)
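As an aside on the keyword-style constructor used in this code, a type that accepts named fields like this can be sketched with Base.@kwdef. The DemoProject type below is a hypothetical stand-in, and the default values for template and profile are assumptions made only for illustration; the real Project type in the package may be defined differently.

```julia
# Hypothetical stand-in for the package's Project type; field names follow the call above.
Base.@kwdef struct DemoProject
    id::String
    name::String
    template::String = "jl"       # assumed default, for illustration only
    profile::String = "default"   # assumed default, for illustration only
end

# Fields without defaults must be given; the others can be omitted
p_demo = DemoProject(id = "xyz", name = "My Fancy? *Project1 2 ")
println(p_demo.template, " ", p_demo.profile)
```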

Every time we close Julia, reopen the project and run the code in code.jl, it loads all the file paths needed to save the raw data, the result data and the visuals. So all three questions I asked at the beginning are now automated. If you want to save some data to disk you can use joinpath(datalake, "myfile.csv"), because datalake already holds the path /home/datapsycho/JuliaProjects/adhocs/datalake/2020-07-18_My_Fancy_Project_xyz/ in your current environment. You can read more details in the package docs.
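A small sketch of that last step, using Base Julia only: the datalake variable here is a temporary folder standing in for the path returned by initiate(p), and the CSV rows are made-up demo values.

```julia
# Stand-in for the datalake path that initiate(p) would return
datalake = mktempdir()

csv_path = joinpath(datalake, "myfile.csv")

# Write a tiny CSV by hand to avoid extra dependencies (demo values only)
open(csv_path, "w") do io
    println(io, "city,sales")
    println(io, "Berlin,120")
    println(io, "Dhaka,95")
end

# Read it back to confirm the file landed in the datalake folder
rows = readlines(csv_path)
println(length(rows))
```

With a DataFrame in hand, the same csv_path could be passed to whichever CSV writer you normally use.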

Now let's talk about the experience of writing the package. I started learning Julia at the beginning of this year, so I was very new to it. I must say the Julia community is fantastic: all my questions were answered within a very short time, and I got guidance from the community along the way.

Before writing the package I looked through several existing packages and also read the package-writing parts of the books Julia Programming Projects and Hands-On Design Patterns and Best Practices with Julia. Later I found PkgTemplates.jl, which made my work much easier.

Julia packages are a little different from Python packages. In Python you create a directory with an __init__.py file, which is then treated as a package; you can use setuptools to build a wheel (.whl) file from it and install that wheel with pip. Your dependencies can be listed in setup.py or pinned in a lock file such as Pipfile.lock.

The Julia package system is different, as we can see in this tree view of my ProjectFlow folder:

.
├── LICENSE
├── Manifest.toml
├── Project.toml
├── README.md
├── src
│   ├── loader.jl
│   ├── logger.jl
│   ├── manager.jl
│   ├── ProjectFlow.jl
│   ├── project.jl
│   └── template.jl
├── temp
│   ├── codecove.txt
│   └── projects.log
└── test
    ├── exporttests.jl
    └── runtests.jl

All of your main code lives inside the src folder (I am new, so I still do not know whether that is a standard). The toml files are for dependency management; I am not going to discuss in detail which toml file serves which purpose. Now to src. As you can see, ProjectFlow.jl is the main file the user imports the library through, and all other files are just split by functionality. Julia has a special keyword for modules, module, which you will find in the ProjectFlow.jl file. The community suggests you should only introduce modules when you really need them. You can see more details in the answers to my questions on Discourse. There are no wheel files to build: after adding your package to GitHub and the Julia Hub repository, you can install it with Pkg directly from GitHub or the Julia Hub repository.

To give an example of how I separate the code: every time we create a project, the project info is logged in the /home/datapsycho/.projectflow/projects.log file. That functionality lives in the logger.jl file, which is later pulled into ProjectFlow.jl with include("logger.jl") and used there.
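The pattern can be sketched in miniature as follows. Everything here is hypothetical: the module name MiniFlow, the function log_project and the one-line log format are assumptions for illustration, not the actual contents of ProjectFlow.jl or logger.jl. The two source files are written to a temporary folder so the example is self-contained.

```julia
# Hypothetical miniature of the layout: a main module file that includes a helper file.
pkg_src = mktempdir()

# Helper file, playing the role of logger.jl
write(joinpath(pkg_src, "logger.jl"), """
# Append one line per created project to a log file (assumed format: id,name)
function log_project(logfile, id, name)
    open(logfile, "a") do io
        println(io, id, ",", name)
    end
end
""")

# Main file, playing the role of ProjectFlow.jl
write(joinpath(pkg_src, "MiniFlow.jl"), """
module MiniFlow
include("logger.jl")   # path is resolved relative to this file
export log_project
end
""")

include(joinpath(pkg_src, "MiniFlow.jl"))
using .MiniFlow

logfile = joinpath(pkg_src, "projects.log")
log_project(logfile, "xyz", "My Fancy Project")
print(read(logfile, String))
```

Because include inserts the helper's code into the enclosing module, log_project ends up as a member of MiniFlow even though it is defined in a separate file.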

The first problem I encountered when creating the package was setting up CI/CD and generating the documentation. Looking at how other packages do it resolved that too; here is a slightly dated video by Chris from MIT on creating packages (he is awesome!). The next problem was getting a tag and creating an issue to add the package to Julia Hub. Mosè Giordano from the community jumped in and saved my day; I am really thankful to him.

Overall it was a really good experience. Every time I ran into an issue, the Julia community jumped in and helped me. Now I feel it is my duty to help others too. Thanks to the Julia community.

I must say that package development in Julia is different from development in Python. But it has all the tools ready (PkgTemplates, Test, etc.) to get you started, and there is always someone in the Julia community who will help along the way. I never felt alone on that journey.

MrDataPsycho: Data Science | Dev | Author @ Learnpub, Educative: Pandas to Pyspark