Crafting an Artificial Intelligence I

Drawing parallels between the craft process and the development of machine learning through engagement with data as material.

Yatharth
Future Craft
7 min read · Apr 8, 2019


Extracting Geometry data from Images

For the past few months I have been increasingly working with artificial intelligence, or more specifically machine learning. I have trained a system which can classify gestures, an object recognition system, a few generative algorithms and algorithms to visualise datasets. With more experience I have started to see machine learning less as an objective way of computing and more as a very personalised and subjective way of dealing with algorithms. Almost always you have to feel your way around the datasets and parameters, and even the best approach doesn't give a result which is objectively correct. It's more organic than computational: the same trained model can give different results every time it's run. I will be breaking down the process for three experiments I have worked on and try to draw parallels to the exploratory nature of craft.

Section A : Gesture Recognition

We had to train a gesture recognition system to detect the gestures of playing traditional Indian instruments like the sitar and tabla. Standard gesture recognition systems are highly stable with simple gestures like waving hands and clapping. We were dealing with a system which had to differentiate between very similar gestures: playing tabla, playing harmonium and sitting still.

The Gesture Recognition Installation

Sourcing The Data

To collect data, we would sit in turns and record all the gestures for different people in varying poses, then train a classifier on them. The challenge was that after initial success, we started realising that if people were of a different height than we had trained for, the algorithm would not work. So we started modifying the dataset to make it height agnostic: we took the length of a person's spine and divided the lengths of all the limbs by it. The data stopped encoding absolute limb lengths and instead captured just the proportions and angles between them, which remain more or less equal across most age groups and cultures. Our experimentation also led us to realise that we cannot build a robust system by just feeding a lot of data to the algorithm, since every person interprets an action or gesture differently; more data would actually confuse the system. For example, some people play tabla too close to their sitting position, which can confuse the system between the two actions.

Data point generated by the sensor
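A minimal sketch of that normalisation, assuming the sensor reports 2-D joint positions in a dictionary keyed by joint name; the joint names and limb pairs here are illustrative, not our sensor's actual schema:

```python
import math

# Illustrative joint layout; a real pose sensor has its own schema.
SPINE = ("spine_base", "neck")
LIMBS = [("shoulder_l", "elbow_l"), ("elbow_l", "wrist_l"),
         ("shoulder_r", "elbow_r"), ("elbow_r", "wrist_r"),
         ("hip_l", "knee_l"), ("hip_r", "knee_r")]

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def height_agnostic_features(joints):
    """Divide every limb length by the spine length so the features
    encode body proportions rather than absolute size."""
    spine_len = dist(joints[SPINE[0]], joints[SPINE[1]])
    return [dist(joints[a], joints[b]) / spine_len for a, b in LIMBS]
```

Two people of different heights making the same pose now produce nearly identical feature vectors, which is exactly what the classifier needs.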

We needed better data. We determined the easiest way to get it would be to direct people to make certain gestures rather than letting them interpret the actions. We used icons to tell people what could be recognised and UI feedback to let them know what action the machine thought they were doing. This gave us more control over the kind of data we were getting from people.

We sourced the material for our needs and filtered it down to what was useful for us.

Processing the Data

Processing model which emerged for the final system

Just having the data run through is not enough; you need a way to make sense of it too. A classification system can only choose from the categories it knows. It's easy to detect true positives, i.e. that a particular gesture out of five was enacted, but it's far more difficult to detect the case where none of the gestures were enacted. For the classifier, the action of not doing anything is also a gesture, and you need to give examples of all the gestures. It's easy to show this is how one plays tabla, but how do you train a system on all the other actions you could be doing when not playing tabla? One approach could be to try all permutations and combinations, but that can quickly lead the program to start classifying all gestures as not playing tabla, including the gesture of playing tabla. Too much data can result in noise.

The problem needed a way of cross-checking the validity of our detection. After experimenting with different ways of processing the data, we settled on using the motion data, i.e. the speed of each limb, to identify the gestures; the position data, which identified the arrangement of limbs; and a mixed data stream, which through machine learning magic would account for the relation between the two. We ran these through three different classifiers and then cross-checked their validity, only letting a detection filter through if two or more classifiers agreed.
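A minimal sketch of that voting scheme, assuming three scikit-learn-style classifiers (anything with a predict method) already trained on the motion, position and mixed features; the "none" label is a placeholder for the no-gesture case:

```python
from collections import Counter

def detect_gesture(classifiers, feature_vectors, none_label="none"):
    """Run each (classifier, features) pair and only let a detection
    through if two or more of the three classifiers agree."""
    votes = [clf.predict([x])[0]
             for clf, x in zip(classifiers, feature_vectors)]
    label, count = Counter(votes).most_common(1)[0]
    return label if count >= 2 else none_label
```

If the motion, position and mixed classifiers disagree, the frame is treated as no gesture rather than a shaky guess.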

The approach was not the most accurate but worked for our use cases. The solution came out of playing with the system and figuring things out as we went, instead of elaborate scientific planning.

Section B : Visualising Datasets

Recently I have been trying to visualise a dataset collected from 36DaysOfType on Instagram. The goal was to group 'visually similar' type together, using the t-SNE dimensionality reduction algorithm, to understand what kind of work people are producing.

Mapping of 36DaysOfType images by geometry

Sourcing the Data

Collecting the data was fairly simple, through a script which downloaded all the posts under an Instagram hashtag. The difficult part was to understand what 'visually similar' actually means. A simple run over the images led to them being grouped by their colours and 'style', but since we are dealing with typography here, the colours of an image don't tell much about the typefaces used. The challenge was to extract font information from the images and then map it. With my knowledge and timeframe I could not train another system altogether to identify the fonts in each image, so I tried to strip the images of all colour and preserve only the geometric information, i.e. the outlines of the characters. Running a visual similarity algorithm on a purely black-and-white image dataset led to similar typefaces and treatments being mapped closer to each other.
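A simple sketch of this colour-stripping step, using Pillow; a hard threshold like this keeps only the shapes (an edge filter would be another way to get outlines), and the threshold value is a knob you feel out by eye, like everything else here:

```python
from PIL import Image

def to_geometry(path, threshold=128):
    """Drop all colour and keep only the black-and-white geometry
    of the characters via a hard threshold."""
    img = Image.open(path).convert("L")  # greyscale: colour is gone
    return img.point(lambda p: 255 if p > threshold else 0)
```

Feature extraction and similarity then run on these stripped images, so the mapping is driven by shape rather than palette.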

Intuition

The important point about working with t-SNE is that the mapping of data points it generates is not necessarily accurate in a statistical sense. The mapping can vary greatly between runs, and the results depend heavily on a few parameters, called hyperparameters, like perplexity and learning rate. There is no general rule about whether a higher or lower value gives a better result, and there is no way to evaluate whether the results are good if you don't know what the result should actually look like. The results are subjective both at the algorithm's level and at your level.

Interpreting the Mapping done through tSNE

The only way to deal with t-SNE parameters is to 'feel your way through', trying all sorts of values and using your 'intuition' to judge what result you consider good enough for your purpose, for there is no absolutely accurate result.
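In practice that means something like the sweep below: run t-SNE at several perplexities and judge each mapping by eye. This sketch uses scikit-learn with random stand-in features, since the real inputs would be feature vectors extracted from the images:

```python
import numpy as np
from sklearn.manifold import TSNE

features = np.random.rand(500, 64)  # stand-in for real image feature vectors

for perplexity in (5, 30, 50, 100):
    coords = TSNE(n_components=2, perplexity=perplexity,
                  learning_rate=200.0).fit_transform(features)
    # Plot or save each 2-D mapping and judge it by eye; there is no
    # objective score that says which perplexity is "correct".
    print(perplexity, coords.shape)
```

The same call with the same data can still land points in different places on the next run, which is part of what makes the process feel like material exploration rather than computation.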

Algorithmic Craft

The above two experiences, and other experiments I have been doing, have made me rely more and more on my 'intuition' and 'gut feeling' about how to process and evaluate a given dataset. It has always been a process of playing around with the algorithm and data till I am satisfied with the results, which echoes the craft approach of 'engaging playfully with the materiality' rather than the pure mathematical reasoning and logic the word algorithm invokes.

Playing around with parameters gives different results

Machine learning algorithms and data are more like a lump of clay and a potter's tools, which can take form through engaging with them as you go and developing a relationship with the material.

Over time one starts developing a feel for what may or may not work, and, very essentially, starts accepting the roughness and artifacts left by the process. The mistakes and roughness become a character of the work you are producing rather than a defect, and can perhaps provide value in the future.

This brings me to the danger of calling machine learning the 4th Industrial Revolution: machine learning lacks the 'finish' and 'objectivity' of industrial processes. We risk ignoring the subjectivity and error-prone results machine learning can lead to. Perhaps, as illustrated by the various ethical voices rising in the AI community, a scientific-industrial trust in machine learning can result in pseudo-scientific results full of bias and subjectivity under the blanket term of 'algorithmic objectivity'.

The sketch-to-render AI I am currently working on

In the coming days I will be exploring more ways to develop a relationship with algorithms, and seeing where a partnership with algorithms that acknowledges their limitations and faults can lead.
