The Search for Assumptions
Earlier this week, I posted a problem on my Twitter feed for aspiring data scientists to answer. I have to admit that I have a different idea of what data science is about. For me, it is not about running computational analyses of data in R or Python for machine learning, and it is definitely not about running regressions between two variables or doing statistical tests.
Data science, for me, is about figuring out what is going on in our lives and what mechanisms are at work in that process, predicting the implications, and comparing the expected results with hard empirical data. That data can be obtained through various methods, including but not limited to surveys, data mining and machine learning.
The word “science” is in data science for a reason. Statistical analysis of observed data is one part of science, but there’s also the hard part: figuring out which assumptions can help us understand the problem, and then putting them into action through modelling.
In my limited experience, the first hard problem in doing science is formulating the problem. The second is figuring out which assumptions are most useful for solving the formulated problem, given the observable data. In the absence of hard empirical data, the model also needs to be logically consistent so that it can hold up against rigorous scrutiny. The results can be wildly wrong, but that’s not a problem, as long as the assumptions are logically consistent.
Based on this experience and these expectations, I posted this problem:

It’s an open problem, and you can arrive at very different answers depending on the assumptions you use. I would say that every answer is correct, provided its logic holds up against scrutiny. Yet over the week, I found that most of the responses I got were not about how long it would take for the humans on the ship to be decimated. Instead, most of them asked about the assumptions I used when I formulated the problem. The questions ranged from whether the humans on the spaceship are in hibernation to the probability of the two monsters killing the people.
It seems to me that the audience I’m addressing lacks the courage to make assumptions and to predict what follows from them.
In life, most of what happens around us happens in a context of incomplete information, shaped by the timeframe of our observation and the perspectives we use. Complete information is hard to find and potentially very expensive to obtain. It is in such a context that the scientific mind needs to think very hard to find the best possible solution to an observed problem.
Perhaps a little faith in your chosen set of assumptions would be really useful. With so little information in an open problem, how would you know whether one model is less wrong than another, without any data to guide you? You don’t know which model or which set of assumptions will be useful, but you need to work with them anyway, believing that they can help you with the problem.
It is this kind of mindset that I would love to work with, to solve the real problems we face in the real world, all while we’re sequestered in our little dungeon, away from civilisation, stuck masturbating over our next less wrong answers to ancient, unsolved, and still recurring problems.
There are probably some really good articles out there about how people can use the full extent of their minds to solve problems, written by logicians, philosophers, poets, writers, scientists, self-help gurus, and even marketing people. I’ll leave it to you to decide how best to find an answer to a problem under incomplete information.
As far as I’m concerned, intuition and reasoning are where humans still excel, now that machines can do many things better than we can, regardless of what the people working on artificial intelligence would say about the matter.