The role of data plotting and graphics in accurate model development
The assigned reading mentioned the importance of plotting and graphing data for discovering problems with a model that cannot be detected from the numerical summary statistics alone. Because the reading did not cover this topic in depth, I think it is worth discussing the role of plots and graphical representation in developing an accurate model of the collected data.
We can see the usefulness of data plotting in Figure 3.5 of the reading, which shows a three-dimensional plot of sales against radio and TV advertising. Some data points lie above the regression plane and some below it, meaning the model underestimates sales for the former and overestimates it for the latter. This pattern of over- and underestimation might not be clear from the numerical statistics alone, so a researcher could choose the wrong model for sales and overlook the interaction between the variables.
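To make the point about the regression plane concrete, here is a minimal sketch in Python. It does not use the data from the reading: the advertising budgets, the coefficients, and the noise level are all invented for illustration, and the helper name fit_plane is hypothetical. It fits a plane without an interaction term to data that do contain one, then checks whether the leftover residuals follow the interaction.

```python
import numpy as np

def fit_plane(tv, radio, sales):
    """Least-squares fit of sales = b0 + b1*tv + b2*radio.

    Returns the coefficients and the residuals (actual minus fitted).
    """
    X = np.column_stack([np.ones_like(tv), tv, radio])
    coefs, *_ = np.linalg.lstsq(X, sales, rcond=None)
    residuals = sales - X @ coefs
    return coefs, residuals

# Synthetic data (assumption): sales depend on TV, radio, AND their product.
rng = np.random.default_rng(0)
tv = rng.uniform(0, 300, 200)
radio = rng.uniform(0, 50, 200)
sales = 3 + 0.02 * tv + 0.03 * radio + 0.001 * tv * radio + rng.normal(0, 0.5, 200)

coefs, residuals = fit_plane(tv, radio, sales)

# Points above the plane have positive residuals (underestimated sales),
# points below have negative residuals (overestimated sales).
n_above = int((residuals > 0).sum())
n_below = int((residuals < 0).sum())

# If the residuals track the centered TV x radio product, the plane is
# missing an interaction term.
interaction = (tv - tv.mean()) * (radio - radio.mean())
corr = float(np.corrcoef(residuals, interaction)[0, 1])

print("points above plane:", n_above, "below plane:", n_below)
print("correlation of residuals with TV x radio:", round(corr, 2))
```

The average residual is essentially zero by construction, so a numerical summary of that kind hides the problem. The systematic pattern only shows up when the residuals are plotted or, as here, compared against the product of the two predictors.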
In our reading about EDA, graphics were described as the main tool of exploratory data analysts. Tukey (1977) expressed this in his statement about the role of graphical representation in data analysis: “The greatest value of a picture is when it forces us to notice what we never expected to see.” Garner (1974) explained that visual aids or images activate the visual-spatial memory systems, allowing better pattern recognition.
Graphical representation and model improvement are cornerstones of my area of interest: the use of computer vision tools and machine learning to reduce driver distraction and improve road safety. Advances in computer vision have allowed the introduction of many in-vehicle safety systems (e.g., stop sign detection and lane departure warning). The main idea in vision toolboxes is to use sample data to train classifiers for feature detection, feature extraction, stereo vision, and camera calibration. A cascade object detector can be trained on sample data, such as stop sign images, to detect those objects in a video. Such a detector could then be used to apply the brakes if the driver fails to do so.
Example: A sample of 200 stop sign images was used to develop a cascade object detector that recognizes stop signs on the road. The detector was trained on positive images (stop signs) and negative images (anything but a stop sign). In each positive image, a bounding box is drawn around the region of interest (the stop sign). The detector was a five-stage classifier: in each stage, the cascade object detector uses the positive images from the previous stage to train the next one, so a better model is developed at every stage. The main idea is to minimize both false positives and false negatives. But one should be careful not to overtrain (overfit) the model, as this increases the rejection of true positives. In the provided video, we see some false positives. This is due to the small sample used to train the detector; if a larger sample of stop sign images were used, a more accurate model would likely be developed.
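The trade-off between false positives and rejected true positives in a multi-stage cascade can be sketched with a little arithmetic. The snippet below is only an illustration under stated assumptions, not the detector from the example: it assumes every stage has the same detection rate and false positive rate, that the stages act independently, and the per-stage values (0.995 and 0.5) are invented. The function name cascade_rates is hypothetical. Under those assumptions, an image is reported as a stop sign only if it passes every stage, so the overall rates are the per-stage rates raised to the number of stages.

```python
def cascade_rates(stage_tpr, stage_fpr, num_stages):
    """Overall (detection rate, false positive rate) of a cascade.

    Assumes identical, independent stages: a window is accepted only if
    every stage accepts it, so the per-stage rates multiply.
    """
    return stage_tpr ** num_stages, stage_fpr ** num_stages

STAGE_TPR = 0.995  # assumed per-stage detection (true positive) rate
STAGE_FPR = 0.5    # assumed per-stage false positive rate

for n in (1, 3, 5, 10):
    tpr, fpr = cascade_rates(STAGE_TPR, STAGE_FPR, n)
    print(f"{n:2d} stages: detection rate {tpr:.4f}, false positive rate {fpr:.5f}")

# The five-stage case from the example, under the assumed per-stage rates.
five_tpr, five_fpr = cascade_rates(STAGE_TPR, STAGE_FPR, 5)
```

Adding stages drives the false positive rate down quickly, but the detection rate also shrinks with every stage. That is one way to see why pushing the model too far starts to reject true stop signs.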
Researchers have recognized the importance of interactive graphing software in data analysis. Cleveland (1985) has had a great impact on the field of statistics and on the use of graphics in data analysis and model development. Kosslyn (1994) discussed many rules for graphing and presenting data from a psychological perspective.
Garner, W. R. (1974). The processing of information and structure. Potomac, MD: Erlbaum.
Cleveland, W. S. (1985). The elements of graphing data. Monterey, CA: Wadsworth.
Kosslyn, S. (1994). Elements of graph design. New York: W. H. Freeman.