AIAI Berkeley Extension 2020

AIAI Berkeley Acceptance Email

What is a true data scientist/analyst?
Someone who is a leader, can analyze data, and tell a story. “A data scientist is a data analyst with big pay.”

Why is learning how to analyze data important?
You gain confidence, convince others, and solve problems through analysis. For example, in an analysis of the coronavirus, mortality was about 0.5% for younger people versus 2% for older people.

How do you deal with fake data?
If the information does not add up or compare well with other data, it is important to note that. “The world of data may feed us with information they want,” but few people actually understand it. “To make data more ethical, relevant, and unbiased, we need everyone on the planet to be data scientists or data aware.”

What is digital darwinism?
Digital Darwinism, a term popularized by Brian Solis, is the idea that companies need to adopt digital technology in order to survive. Think of Blockbuster and how it is no longer around.

“To survive in this world, you need to innovate…be agile” but the problem is these things “cannot be dictated.” Instead, “they can be fostered.”

“Mastering data, regardless of your level, will give you two things: insight and time.”
1) Insight: you have the confidence to act upon something solid. Data will give you a sense of assurance or pragmatism.
2) Time: interacting is important. You need to have control of the data and be efficient.

“Data is the new oil.” — Gauthier Vasseur, the executive director of the Fisher Center for Business Analytics

Important Rules
#1 Keep your executives out of jail (if you don’t keep your data in order, you could be in trouble).

What is Black Swan Theory?
“The black swan theory or theory of black swan events is a metaphor that describes an event that comes as a surprise, has a major effect, and is often inappropriately rationalized after the fact with the benefit of hindsight.”

It is an event that has a low probability of occurring but a very high magnitude of impact. An example is 9/11.

How do you manage a Black Swan?
If you have your data, you could adjust and have the agility to figure out what to do next. Side comment: This is true. We may not be able to control something but with more data, we can figure out what to do next with a higher probability. This definitely could be applied to healthcare!

Where do you start?
1. Start with the right question:
What’s your business challenge/question? Narrow down what you are using your data for. “You want to predict _____…but for what?” There has to be a *why*. What are you trying to achieve?
Do not start with concepts such as “Let’s do a Big Data Project.” You are here to solve problems. Start with the problem.

2. Know your technology

Why do we need systems?
Automation, survival, improved efficiency.

Two families of systems:
Decisions
Transactions

Transaction Systems run operations
Focus on execution
Continuous processing
Flawless
Efficient
Compliant with IRS, etc.
Highest level of granularity

Middleware
Software between the transaction systems and the rest of the stack. It performs a series of tasks. 80% of your work is done here (data preparation, capture, and alignment).

Three acronyms for middleware:
1) Extract, Transform, Load (ETL, used as both a verb and a noun)

2) Master Data Management (MDM)
The master data that structures the rest of your data (e.g., names, zip codes). This is where you set up the structures of your data.

3) Data Quality Management (DQM)
Missing fields
Errors
Wrong format
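
A minimal sketch of how extract, transform, load and the data quality checks could fit together, using only the Python standard library. The sample rows, the column names (customer_id, name, zip_code), and the five-digit zip rule are assumptions made up for illustration; they are not from the course.

```python
import csv
import io
import re
import sqlite3

# Hypothetical raw export from a transaction system.
RAW_CSV = """customer_id,name,zip_code
C001,Ada Lovelace,94720
C002,,94105
C003,Alan Turing,9470A
C004,Grace Hopper,10001
"""

ZIP_PATTERN = re.compile(r"^\d{5}$")  # assumed format rule: exactly five digits


def extract(text):
    """Extract: read the delimited text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def check_quality(row):
    """Data quality: return a list of problems found in one row."""
    problems = []
    if not row["name"].strip():
        problems.append("missing field: name")
    if not ZIP_PATTERN.match(row["zip_code"]):
        problems.append("wrong format: zip_code")
    return problems


def transform(rows):
    """Transform: split rows into clean ones and rejected ones."""
    clean, rejected = [], []
    for row in rows:
        problems = check_quality(row)
        if problems:
            rejected.append((row["customer_id"], problems))
        else:
            clean.append(row)
    return clean, rejected


def load(rows, conn):
    """Load: write the clean rows into a database table."""
    conn.execute(
        "CREATE TABLE customers "
        "(customer_id TEXT PRIMARY KEY, name TEXT, zip_code TEXT)"
    )
    conn.executemany(
        "INSERT INTO customers VALUES (:customer_id, :name, :zip_code)", rows
    )
    conn.commit()


clean_rows, rejected_rows = transform(extract(RAW_CSV))
connection = sqlite3.connect(":memory:")
load(clean_rows, connection)
loaded_count = connection.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
print("loaded:", loaded_count)
print("rejected:", rejected_rows)
```

Rejected rows are kept with their reasons instead of being silently dropped, so someone can go back and fix them at the source.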

Database
Where you store your data. “Where IT meets business.”

Request Database for Business
Speed, processing power, storage capacity, ease of maintenance and agile evolutions, sandboxes

The Analytics Solutions
Where you will analyze and view your data. Remember that 80% of the work happens before analysis.

The Reporting Solutions
Where you convey your results to the right stakeholder.
3 Goals:
1) Trigger an action
2) Drive a decision
3) Call for additional support

Important Stack
Reporting (e.g., canary)
Analytics Solutions (blood stream)
Databases
Middleware (breathing, transforming, metabolism, lungs)
Transaction (place that has the most data)

The simplest form of database is a csv file.

— — — — — — — — —PART 2 — — — — — — — — — — — — — — — — — — -
“People will adopt one technology over another because they will look good. Their LinkedIn profile will get beefed up.”

“The impact that you have could go beyond the walls of your workspace. Even in technology, be humane, be responsible, and have a good cause.”

Secondary Cardinal Rule
Prepare first, then analyze: 80% of the work is data prep. The simplest form of database is a .csv file.

Txt Delimited Files
Good: handles volume, versatile, easy to create/extract
Not-So-Good: join management, formal constraints, maintenance, protection. Not ACID.

ACID = atomicity, consistency, isolation, durability. These are the guarantees that determine whether your data keeps its integrity or not.
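
To make the "A" in ACID concrete, here is a small sketch with Python's built-in sqlite3 module. The accounts table, the names, and the amounts are hypothetical. The point is that a failed transfer leaves no half-finished change behind, which a plain text file cannot promise.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
)
conn.commit()


def transfer(conn, sender, receiver, amount):
    """Move money in one transaction; undo every step if any step fails."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, receiver),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, sender),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative balance.
        return False


ok = transfer(conn, "alice", "bob", 30)
failed = transfer(conn, "alice", "bob", 500)  # alice cannot afford this
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
print(ok, failed, balances)
```

In the second transfer the credit to bob is applied first and then undone when the debit fails, so the balances stay as they were after the first transfer.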

DataBase Darwinism
Txt
SQL: Relational DB
OLAP: Multidimensional DB
Columnar DB
NoSQL

OLAP Database
Good: pivot rows and columns, calculate fast
Bad: doesn’t scale much in volume, MDX, core

NoSQL:
hodgepodge of unaltered data

“You’re going to make 1984 look like a fairytale if you don’t stop it.”

The 3 Aspects of Data:
1) Structure: mostly a table (CSV)
2) Role: measures or metrics, attributes or dimensions, keys or unique ID, calculations or transformations
3) Format: number, decimal, etc.

Attribute: information you get for your measures

Attributes:
1) Nature: bring information about the measures, add ways to group, sort, or organize data
2) Format — numeric, alphanumeric, boolean
3) Quality can suffer

When you first get a data set, make sure you have unique keys. To make the key unique, add a letter to the left and right side.
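
A quick way to check a candidate key before trusting it, sketched in plain Python. The rows and the cereal_id column are hypothetical examples, not the course data set.

```python
from collections import Counter

# Hypothetical rows; "cereal_id" is the column we hope to use as a key.
rows = [
    {"cereal_id": "101", "name": "Corn Flakes"},
    {"cereal_id": "102", "name": "Bran Flakes"},
    {"cereal_id": "101", "name": "Corn Flakes (family size)"},
    {"cereal_id": "", "name": "Unknown"},
]


def find_key_problems(rows, key):
    """Return (sorted duplicated key values, number of rows with an empty key)."""
    values = [row[key].strip() for row in rows]
    counts = Counter(value for value in values if value)
    duplicates = sorted(value for value, n in counts.items() if n > 1)
    missing = sum(1 for value in values if not value)
    return duplicates, missing


duplicates, missing = find_key_problems(rows, "cereal_id")
print("duplicated keys:", duplicates)
print("rows with no key:", missing)
```

If either result is non-empty or non-zero, the column is not yet safe to use as a key for joins.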

Types of Data Relationships:
1) One to one:
2) One to many:
3) Many to one:
4) Many to many:
You can sum on the “many” side but not on the “one” side.
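
A small sketch of why summing works on the "many" side but not on the "one" side of a one-to-many relationship. The manufacturers, cereals, and numbers are invented for illustration.

```python
# One-to-many: one manufacturer row, many cereal rows (hypothetical values).
manufacturers = {
    "K": {"mfr_name": "Maker K", "employees": 1000},
    "G": {"mfr_name": "Maker G", "employees": 400},
}
cereals = [
    {"cereal": "Cereal 1", "mfr": "K", "units_sold": 10},
    {"cereal": "Cereal 2", "mfr": "K", "units_sold": 20},
    {"cereal": "Cereal 3", "mfr": "K", "units_sold": 30},
    {"cereal": "Cereal 4", "mfr": "G", "units_sold": 40},
]

# Join: every cereal row receives a copy of its manufacturer's columns.
joined = [{**row, **manufacturers[row["mfr"]]} for row in cereals]

# Many side: each units_sold value appears once, so the sum is correct.
units_total = sum(row["units_sold"] for row in joined)

# One side: "employees" is repeated on every cereal row, so the sum is inflated.
employees_naive = sum(row["employees"] for row in joined)
employees_true = sum(m["employees"] for m in manufacturers.values())

print(units_total, employees_naive, employees_true)
```

Maker K's 1000 employees are counted three times in the joined table, once per cereal, which is why the naive total is too high.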

— — — — — — — — —DAY 2 PART1— — — — — — — — — — — — — — — — -

Sugar vs Rating with Fat content in mind

To find the correlation between two things, it is good to use a scatter chart.
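
Alongside the scatter chart, a single number can summarize how strongly two columns move together. Below is a sketch of the Pearson correlation coefficient in plain Python; the sugar and rating values are made up and are not the class data.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when a column is constant")
    return covariance / math.sqrt(spread_x * spread_y)


# Hypothetical values: grams of sugar per serving and a rating score.
sugar = [1, 3, 6, 9, 12]
rating = [60, 55, 45, 35, 30]

r = pearson(sugar, rating)
print(f"correlation between sugar and rating: {r:.2f}")
```

A value near -1 means the points fall close to a downward-sloping line, near +1 an upward-sloping line, and near 0 no linear pattern. The scatter chart is still worth looking at, because a single number can hide outliers and curved relationships.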

Left join from manufacturer to cereal2

Joining has a tremendous impact on how the data will work. If the left table holds data we don’t want to lose (here, the cereals), we protect it with a left join. However, no keys are missing on either side right now, so any join would have worked for the cereal dataset.
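
The class did this join in a point-and-click tool. As a rough equivalent, here is a sketch in SQL run through Python's sqlite3 module. The table contents are invented, and one cereal deliberately has a manufacturer code ('X') with no match so the difference between join types shows up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE cereals (name TEXT, mfr TEXT);
    CREATE TABLE manufacturers (mfr TEXT PRIMARY KEY, mfr_name TEXT);
    INSERT INTO cereals VALUES ('Cereal 1', 'K');
    INSERT INTO cereals VALUES ('Cereal 2', 'G');
    INSERT INTO cereals VALUES ('Cereal 3', 'X');
    INSERT INTO manufacturers VALUES ('K', 'Maker K');
    INSERT INTO manufacturers VALUES ('G', 'Maker G');
    """
)

# Left join: every cereal is kept, even when no manufacturer matches.
left_rows = conn.execute(
    "SELECT c.name, m.mfr_name FROM cereals AS c "
    "LEFT JOIN manufacturers AS m ON c.mfr = m.mfr "
    "ORDER BY c.name"
).fetchall()

# Inner join: cereals without a matching manufacturer are dropped.
inner_rows = conn.execute(
    "SELECT c.name, m.mfr_name FROM cereals AS c "
    "JOIN manufacturers AS m ON c.mfr = m.mfr "
    "ORDER BY c.name"
).fetchall()

print("left join: ", left_rows)
print("inner join:", inner_rows)
```

With the left join, 'Cereal 3' survives with an empty manufacturer name. With the inner join it disappears. When no keys are missing on either side, both queries return the same rows.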

You click bidirectional so that they (the two data tables) could interact with each other.

Housing Prices (y-values) vs Rooms (x-values) vs Bedrooms (colors)

— — — — — — — — — DAY 2 PART2— — — — — — — — — — — — — — — — -

Stretch the Roles of Data

Profile your Data
Leverage your analytics to identify potential errors:
- Mathematics: average, max, min
- Analytic Reviews: trends
- Outliers
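
A minimal profiling sketch using Python's statistics module. The sugar values are invented, with one suspicious entry, and the 1.5 x IQR rule used to flag outliers is a common convention chosen here as an assumption; the course did not prescribe a specific rule.

```python
import statistics

# Hypothetical grams of sugar per serving; 150 looks like a data-entry error.
sugars = [3, 5, 6, 6, 7, 8, 9, 10, 12, 150]


def profile(values):
    """Basic profile of a numeric column: average, min, and max."""
    return {
        "average": statistics.mean(values),
        "min": min(values),
        "max": max(values),
    }


def iqr_outliers(values, factor=1.5):
    """Flag values further than factor * IQR outside the middle 50% of the data."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - factor * spread, q3 + factor * spread
    return [value for value in values if value < low or value > high]


summary = profile(sugars)
outliers = iqr_outliers(sugars)
print(summary)
print("possible outliers:", outliers)
```

Here the average is pulled far above most of the values by a single entry, which is the kind of signal that should send you back to check the source data.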

Determining outliers in sugar vs. rating (Rating as x-value, Sugar as y-value)

Manage your Keys
The key will work if it is unique but it will be better if it is universal.

Don’t match and merge eveerinces?
-Biology/Healthcare background
- Compliance

  • Ethical and legal risk.

— — — — — — — — — DAY 2 PART3— — — — — — — — — — — — — — — — -

Read: articles
1. http://gauthiervasseur.com/2016/10/06/becoming-data-driven-begins-human-qualities/

2. http://gauthiervasseur.com/2016/08/31/waking-everyday-data-ready/

3. http://gauthiervasseur.com/2017/02/15/rise-digital-leaders-new-deal-gender-equality/

4. https://www.theregister.co.uk/2013/08/30/google_f1_deepdive/

Homework: in groups, prepare a 5-minute presentation of your story with the data, e.g., what you saw when you first looked at the data, how you cleaned it, and how you are modeling it.

You should be able to show your question and explain that you answered your own question.

“And we even failed while doing…” Failure is okay if you are able to articulate why. Each person should talk in this journey toward explaining their thought process.

4. Get the Real Big Data Value
Big data used to be defined by three Vs: volume, velocity, and variety. Big Data is like Google data, Amazon data, or Facebook data. It is far more than 200k pieces of information. Astronomers generate more than 35,000 DVDs’ worth of data per second for astrophysics. https://sciencebusiness.net/news/79927/Square-Kilometre-Array-prepares-for-the-ultimate-big-data-challenge

Real-time operational analytics is good for transactions (e.g., online operating purposes). Velocity is not the problem.

YBG-IGB

You begin, I begin.

The holy grail is variety.

70–80% of data science projects will fail. You have 1 month until the end of the course to deal with data.

We Unleash Your 4 Secret SuperPowers
1. The power of “I can do”
2. The power of “I don’t know”
3. The power of why
4. The power of Fail Fast

Streamline your Creeper Processes
Creeper Processes are processes that plague every department. They generate hundreds of wasted hours, degrade transparency and compliance, and build up inefficiencies across the entire organization.

Analytics Data Supply Chain (7 steps always)
1. Question
2. Identify the data you need to work with.
3. Capture the data.
4. Prepare it.
5. Store it.
6. Analyze it.
7. Report.
8. Act
Decision

Swarm Intelligence
The collective behavior of decentralized, self-organized systems, natural or artificial. Anyone at any level is empowered to use and understand data.

Interactive Tools Used:

https://www.menti.com/
(Poll Taking — amazing way to keep things interactive)

PowerPoint with live subtitles while speaking

https://csved.sjfrancke.nl/index.html
CSVED CSV editor

Analytics
https://partners.pyramidanalytics.com/

Additional AI Learning Websites
https://sites.google.com/view/foothill-college-cogsci-aiclub/student-resources/software-training

AI4All Open Learning Program
