How to solve Big Data problems using Analytics

GowthamLabs
Sep 21, 2019

DATABASES THAT EXIST IN THE “IT” WORLD.

Many people have asked how I came into Analytics. After explaining it to a few of them, I thought it would be easier to share my career highlights, which in turn set the context for this article.

Starting from:

  1. J2EE programmer to WebLogic/WebSphere admin (Dev to AppOps)
  2. AppOps admin to Solaris system admin (System Admin)
  3. System Admin to Oracle DBA (Database Admin)
  4. Unix/VM to the Linux/OpenStack/Container/Serverless worlds (Ops)
  5. IT Architect and Astrologer for a decade.

Skill diversification helped me see the bigger picture of technology. A few years back, I understood the “whys” of positioning Big Data & Analytics. Then I took a step forward and set up an Analytic stack that concentrates on Vedic Sciences, Astrology & Astronomy.

After years of astrology practice, tons of data already flow to me in a crude format. Seeing the results of Analytics & Big Data, I felt that the existing method of referring to that crude data was too cumbersome for arriving at any fact-based predictions.

In a world where every piece of data is crucial, the real need is for meaningful content that brings relevance.

A few guidelines for building an Analytic stack:

  1. List the existing data problems
  2. List the requirements you want to achieve
  3. Use cases for your needs (UML)
  4. Future data models and their growth/shrink patterns
  5. End-user patterns (specific to a domain, diversified, miscellaneous etc.)
  6. Who will access the data, and how (through app roles, directly from database interfaces, via visual reports on the web, final documents etc.).

The above steps are ‘critical’ before exploring technology options.

Once the details are reviewed, validated & documented:

  1. Select the right technology/tools, ones that give you confidence they can solve your big data problems.
  2. Management must sponsor and empower teams with business domain knowledge.
  3. Your existing people will need to scale up their knowledge to keep pace with technology changes. Onboarding new people is fine, but if you can’t change the existing system, the problem will be inherited.
  4. Know your End Users: There are 3 types of users:
  • Novice Users: They know statistics and Big Data as buzzwords and want to learn them for their career/life. Curiosity plays the main role.
  • Business Users: Users who look for more efficient ways to analyze data from their existing Operations, Marketing, Finance etc. systems. They are functional thinkers in an industry/domain and may be using Office tools and graphs to visualize that data today. They want to leverage Big Data & Analytics to make better use of their data.
  • Technology Users: New-age startups that operate entire businesses on the web and through IT pros, e.g. online shopping, online cab booking etc. They build and manage Big Data systems, provision data streams and allow the rest of the organization to make better use of that data.
  • Let me know if you find more user types :-)

The rest of this article is based on the Technology User mindset. The main user of this full-stack analytic system needs to be an IT professional who knows IT and can retrieve the industry domain data. I am well aware of this situation, and that’s why my choices are based on Linux with programmatic visualization patterns; otherwise I would have selected tools and technologies suited to Business Users.

To mention this again: I did all of this for a project that concentrates on Indian Vedic Sciences, Astrology & Astronomy.

My Analytic stack runs:

Importantly, we had to break siloed “role-based” mindsets like Developer, System, Database, Application Admin etc. to succeed with a “full-stack environment”.

Otherwise this can happen…

Hope this is self explanatory….

INSTEAD:

APPS/DB/INFRA TOGETHER
TEAMWORK (Apps, DB, Infra)

Once the Analytics environment is set up, the next step is playing with the data in the world of Linux tools and displaying insights as visuals & graphs.

It’s all about communicating the meaning of data in an effective format that helps users understand and consume it better.

5-level layered visual chart created for astrology content (generated from SQL/Apache Superset)

The above visual chart shows 5 layers of astrological data:

  1. 12 Zodiac signs
  2. Element of each Zodiac sign (Air, Fire, Water, Earth)
  3. Lord of each Zodiac sign (planets)
  4. 27 stars mapped to 12 Zodiac signs
  5. Lord of each star
  6. 27 stars split into 108 parts (padas)
  7. Degrees of the 108 star parts (padas)
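
The arithmetic behind the star and pada layers can be sketched in a few lines of Python. This is only an illustrative sketch, not the code behind the chart above. It assumes the commonly used equal division of the 360° zodiac, so each of the 108 padas spans 360/108 = 3°20' and each sign holds 9 padas. The English sign names, the element order starting from Aries, and the names `build_padas` and `fmt_deg` are assumptions made for illustration; the lords of signs and stars are left out.

```python
# Assumption: English sign names, in order starting from Aries.
SIGNS = [
    "Aries", "Taurus", "Gemini", "Cancer", "Leo", "Virgo",
    "Libra", "Scorpio", "Sagittarius", "Capricorn", "Aquarius", "Pisces",
]
# Assumption: elements repeat in this order starting from Aries.
ELEMENTS = ["Fire", "Earth", "Air", "Water"]

CIRCLE_ARCMIN = 360 * 60             # whole zodiac in arc-minutes
PADA_ARCMIN = CIRCLE_ARCMIN // 108   # 200 arc-minutes = 3 deg 20 min per pada
SIGN_ARCMIN = CIRCLE_ARCMIN // 12    # 1800 arc-minutes = 30 deg per sign


def fmt_deg(arcmin):
    """Format arc-minutes as degrees and minutes, e.g. 200 -> 3°20'."""
    degrees, minutes = divmod(arcmin, 60)
    return f"{degrees}°{minutes:02d}'"


def build_padas():
    """Return one row per pada with its star, sign, element and degree span."""
    rows = []
    for i in range(108):
        start = i * PADA_ARCMIN
        sign_index = start // SIGN_ARCMIN
        rows.append({
            "pada_index": i + 1,        # 1..108 across the zodiac
            "star": i // 4 + 1,         # 1..27, four padas per star
            "pada": i % 4 + 1,          # 1..4 within the star
            "sign": SIGNS[sign_index],
            "element": ELEMENTS[sign_index % 4],
            "start_arcmin": start,
            "end_arcmin": start + PADA_ARCMIN,
        })
    return rows


padas = build_padas()

# Print the first few rows to inspect the structure.
for row in padas[:3]:
    print(row["sign"], row["element"], "star", row["star"], "pada", row["pada"],
          fmt_deg(row["start_arcmin"]), "to", fmt_deg(row["end_arcmin"]))
```

Working in whole arc-minutes keeps every boundary exact, which avoids the rounding drift that floating-point degrees would introduce at sign boundaries.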

Basically, the chart visual goes “against” the model of introducing data slowly to end users. The chart merges several column-based tabular data sets that exist in Vedic books. The graph helps to view the relationships connecting them and promotes better discussion and analysis.
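
As a rough illustration of merging such column-based tables, the sketch below joins two small tables with SQL and produces one flat row per (element, sign, star) combination, which is the kind of hierarchical result a layered chart could be drawn from. It uses Python's built-in `sqlite3` module with an in-memory database. The table names, column names, English sign names and the equal 9-padas-per-sign split are assumptions for illustration, not the schema of the actual project.

```python
import sqlite3

# Assumption: English sign names and an element order starting from Aries.
SIGNS = [
    "Aries", "Taurus", "Gemini", "Cancer", "Leo", "Virgo",
    "Libra", "Scorpio", "Sagittarius", "Capricorn", "Aquarius", "Pisces",
]
ELEMENTS = ["Fire", "Earth", "Air", "Water"]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE signs (sign_no INTEGER PRIMARY KEY, sign TEXT, element TEXT);
    CREATE TABLE padas (pada_no INTEGER PRIMARY KEY, star_no INTEGER, sign_no INTEGER);
""")

# One row per sign, with its element.
conn.executemany(
    "INSERT INTO signs VALUES (?, ?, ?)",
    [(i + 1, name, ELEMENTS[i % 4]) for i, name in enumerate(SIGNS)],
)
# One row per pada: 4 padas per star, 9 padas per sign (hypothetical layout).
conn.executemany(
    "INSERT INTO padas VALUES (?, ?, ?)",
    [(p + 1, p // 4 + 1, p // 9 + 1) for p in range(108)],
)

# Join the two tables into one hierarchy: element > sign > star,
# with the number of padas of that star falling in that sign.
rows = conn.execute("""
    SELECT s.element, s.sign, p.star_no, COUNT(*) AS pada_count
    FROM padas AS p
    JOIN signs AS s ON s.sign_no = p.sign_no
    GROUP BY s.sign_no, s.element, s.sign, p.star_no
    ORDER BY s.sign_no, p.star_no
""").fetchall()

for element, sign, star_no, pada_count in rows[:4]:
    print(element, sign, "star", star_no, "padas", pada_count)
```

A star that straddles two signs shows up as two rows, one under each sign, so the relationship between the layers stays visible in the flat result.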

Lessons Learnt:

  1. Plan your data structure well before starting any technical activities.
  2. At the time of writing this article, I suggest not starting with Ubuntu 17/18, which have too many bugs (but that will change with time).
  3. Don’t put your cleaned data into any free repository; you may burn your fingers.
  4. It’s possible to build this stack without Kubernetes & containers. True, but I use this system for many other tasks, and I browse the internet on it most of the time. By containerizing workloads, we can scale up/down based on needs.
  5. I spent a long time on “which” K8s distro to use and found the one that suits me best. Canonical K8s was good, but I moved to something else. When you select a K8s distro, plan your upgrade strategy. K8s had a 3-month release cycle at the time of writing, with major enhancements in every version. K8s is an important layer on which you build and manage your applications. If you’re unaware of upgrades and their impact, your production data can be lost during an upgrade.
CNCF Kubernetes Distribution Release & EOS
  6. Data cleaning is an important effort. Spending time on evaluating tools is a better investment than anything else. It saved me dozens of hours and improved processing time.

N Layers of Information Technology (Explorable graph in IT Domain)

In this article, you have seen the infra, apps, databases and tools used to work with the data of an industry vertical. As mentioned before, this stack was built mainly with Vedic Sciences/Astrology in mind.

Regards, N Gowthaman

Technology (Twitter)

Astrology (Twitter)

Here is an Astrology Analysis on Indian Economic Growth
