File Formats

Project 2

Earlier thoughts and ideas…

(Not part of the proposal)

Standards need to exist in this world. In society, in our laws and in technology. We are going to focus on standards in technology, specifically file formats for images.

A file format is a standard way of organizing data that other pieces of software and hardware can share and recognize. Image formats are separated into three groups: uncompressed, compressed and vector.

The first two, uncompressed and compressed are rasterized. This means that the pixels that make up the image are arranged in a generally rectangular grid.

Vector based images do not store data for each individual pixel; instead, they store the calculations of polygons.

Raster based image formats

  • Jpeg
  • Jpeg 2000
  • Exif
  • Tiff
  • Gif
  • Bmp
  • Png

Vector based image formats

  • Svg
  • Ai
  • 3ds

Types of Compression

There are two types of image compression: lossless and lossy. These will be referred to often when discussing each image format.

Lossless compression algorithms are able to reduce the file size without diminishing the quality of the image. The original image is not degraded when it is created or updated. The downside to this is that file sizes tend to be larger than those of lossy compression formats.
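To make "without diminishing the quality" concrete, here is a minimal sketch using Python's built-in zlib module (DEFLATE, the same family of compression that PNG uses). The 20 by 20 pixel "image" and its colors are made up for illustration; the point is only that the bytes that come back out are identical to the bytes that went in.

```python
import zlib

# A tiny made-up "image": 20 x 20 pixels, 3 bytes (R, G, B) per pixel,
# mostly green with one red row, so there is plenty of repetition.
GREEN = bytes([0, 160, 0])
RED = bytes([200, 0, 0])
WIDTH, HEIGHT = 20, 20

rows = [GREEN * WIDTH for _ in range(HEIGHT)]
rows[5] = RED * WIDTH
raw_pixels = b"".join(rows)

# Compress, then decompress again.
compressed = zlib.compress(raw_pixels)
restored = zlib.decompress(compressed)

print(len(raw_pixels), "bytes uncompressed")
print(len(compressed), "bytes compressed")
print("identical after round trip:", restored == raw_pixels)
```

An image with this much repetition compresses very well; a noisy photograph would shrink far less under the same lossless method.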

Jpeg, one of the most popular compression algorithms, is in the category of lossy compression. This essentially means that the original image is not fully maintained. It’s degraded slightly, and repeated re-compression will only further degrade that. But then why is the jpeg format just so popular? Well, it’s the trade off between vastly reduced (or compressed) file sizes for a minimal amount of loss of clarity. In fact, most often it’s not even noticeable.

Conclusion Ideas

Think about which file types you use on a regular basis. Does this analysis encourage you to make an alternate choice? While one file type across the board may not be the best answer, there are generally ‘more appropriate’ file types to use for certain situations and specifications.

The Wikipedia article on file types has a lot of very useful information on the topic: https://en.wikipedia.org/wiki/Image_file_formats#CGM


Second Revision

File Formats

A multitude of software programs exist in our world. They run on a diverse array of computers and operating systems. How do they all store information in a way that is recognizable to themselves and other programs? Standards had to be created in order to facilitate this. These standards are what are known as file formats.

Focusing in on the file formats of images, there are two main categories, raster based and vector based. Raster based formats store a grid of pixels in a rectangular form. Conversely, vector based formats store only the mathematical lines and shapes of polygons and no individual pixels.

As raster based images tend to have larger file sizes compared to vector based images, varying compression algorithms have been developed. But only two fundamental kinds of compression exist: lossless and lossy. Simply put, lossless compression compresses the image without degrading the quality of the image. All of the data is retained. Lossy compression, on the other hand, loses quality, but these losses may not be noticeable and the file sizes may be dramatically smaller, which can be helpful in its own right.

Third Revision

A multitude of software programs exist in our world. They run on a diverse array of computers and operating systems. How is data stored in a way that is universally recognized by these programs? Standards known as file formats exist in order to facilitate this.

When looking at image formats, there are two main categories, raster based and vector based. Raster based formats store a grid of pixels in a rectangular form. A pixel is one of the individual dots that make up your screen. When thousands of them are put together, an image appears. Conversely, vector based formats store only the mathematical calculations of lines and shapes of polygons.

Raster based images tend to have larger file sizes compared to vector based images. Because of this, various compression algorithms have been developed to decrease the file size of raster images. There are two categories of compression: lossless and lossy. Simply put, lossless compression compresses the image without degrading the quality of the image. All of the data is retained. Lossy compression, on the other hand, loses quality, but these losses may not be noticeable and the file sizes may be dramatically smaller. This can be very useful when smaller file sizes are needed.

Sketching Actors & Relationships

Switching over to the 2nd project, we were assigned to sketch out actors and relationships according to Don Moyer’s ideas.

Here’s my take on it. My project is on file types. I have designs for various file types. Large for uncompressed, small for compressed. Neat grid lines for lossless (no data loss) and missing grid lines for lossy (some data loss). Vector files don’t have a grid, but two lines and a triangle.

I’m not entirely sure of other actors yet, but computers & file systems are one — they are why these standards exist in the first place. The disk represents these standards.

I’ve also started to think about the process of compression. What is that like? I have imagined it as a factory or a conveyor belt almost. It takes the large uncompressed file and turns it into smaller compressed files and then delivers it to disk so computers can understand it.

Early Sketches

10–11–16

I feel there are some main aspects to this project that need to come together in order to enable the creation of a successful video.

  1. The Script. I’ve been thinking about how I would say some of the things I want to get across to my audience. I need to refine the script.
  2. The Storyboards. I have a few independent ideas sketched out, but I’m not yet sure how I want to connect them.
  3. The Content. I am still in the process of figuring out just where to start and stop this project. How much set up and introduction should there be? On the other end of the spectrum, how detailed should I get with regard to the various file formats? Or should I keep ‘zooming in’ and focus on teaching about the compression algorithm used for just one or two specific file types? I’ve been doing some additional research on this aspect of my project. I’m not a great math or computer science person and some of the terms are pretty high level stuff (for me at least), and I want to be able to understand as much of it as I can in order to break as much of it down as I can for my video.

Ideally, I’d like to build on all of these aspects as a group and come up with a concrete plan as I go, but I’m not entirely sure if that’s feasible. I’d like to be able to connect some of the ideas and sketches that I’ve done so far.

The Script (Thus far)

How do you share a digital photo with another person on another computer? When you send someone a jpeg, or download a gif animation, you are utilizing specific types of file formats that computers can universally recognize and decode.

There are a plethora of ways in which computers encode and store visual data. Each type of file and each type of compression algorithm works in subtly different, yet important ways.

There are two main categories of image formats, raster based and vector based. What’s the difference? Let’s look first at vector-based images.

Vector-based images as the name implies, store the mathematical calculations of shapes, lines and polygons. The common file types this category includes are ai, svg. If we think of this in terms of a file containing Carnegie Mellon University’s word mark, the only information stored in that file is the outline of the text and any shading or coloring inside the words. [show the CMU word mark and highlight the outline shapes]

Why is this useful? Well, for one, you can zoom in or make the words very large and they will be completely clear. [show the CMU word mark being zoomed in to just a part of a letter but shown to be very clear]

Vector vs Raster

Raster-based images on the other hand, store a rectangular grid of pixels within the file. [show a grid forming of colored pixels] Common types of these would be jpg, psd, tif, gif, png. When we look at our example of the CMU word mark, individual colored dots or ‘pixels’ make up the words. [show the CMU word mark being zoomed in to just a part of a letter and show the pixelation]

Because of the sheer number of pixels involved, the file sizes of raster-based images can become very large compared to those of vector-based images. In order to shrink the file sizes down to more manageable levels, other file types have been created that are able to compress or shrink down the size of the files.

There are two basic categories of compression: lossless and lossy. [show the words, big] Lossless compression reduces file size without a reduction of the quality of the image. Common types of these files are gif and png. Lossy compression reduces file size with a reduction of the quality of the image. This includes the most widely used type of file, the jpeg.

Visual reference for Huffman encoding jpeg compression. Source: http://www.print-driver.com/stories/huffman-coding-jpeg

[go into jpeg compression above in very simple steps]

But just why do we need our files to be smaller? Well, there are several reasons.

  1. It reduces sending and loading times. The smaller the file, the faster it will send. A large file will take more time and more bandwidth to send.
  2. Viewing and manipulating too many large files will slow down your computer system.
  3. Many mobile data plans have cap limits today. If you have a cap of 2GB of data per month, how many 8MB files can you download? About 250. [show the math calculations in big letters] When the file is 500KB instead, that allows for about 4,000 downloads.
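The data-cap arithmetic in the last point can be checked with a few lines of Python. This is a small sketch that assumes decimal units (1 GB = 1000 MB, 1 MB = 1000 KB); the function name is made up for illustration.

```python
# Assumption: decimal units, so a 2 GB cap is 2000 MB.
CAP_MB = 2 * 1000

def downloads_within_cap(file_size_mb, cap_mb=CAP_MB):
    """How many whole files of the given size fit inside the monthly cap."""
    return int(cap_mb // file_size_mb)

print(downloads_within_cap(8))    # 8 MB files
print(downloads_within_cap(0.5))  # 500 KB files
```

With binary units (1 GB = 1024 MB) the counts come out slightly higher, but the ratio between the two file sizes stays the same: the 500KB file allows sixteen times as many downloads.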

But this all depends on the use of the image. Will this file be printed out as a large poster? If so, a larger, uncompressed file will be useful. Is this going to be shared on social media? A smaller compressed file would be best. This file doesn’t need to be perfect quality, it needs to be small so it can load quickly and not slow down the busy social media platform.

Which file type is most useful for a given situation? It’s important to recognize when a file is too big for its intended use. This will save time, storage and processing power.

The Script (Revision 2)

Introduction
You’re hiking and you take a great photo of the mountains. And if you’re like me, you want to share it with your friends. But did you ever stop to think about how every computer and phone can recognize a photo that you took with your camera?
This is all made possible because of standards we know as file formats. A file format is a way that information can be stored that is universally recognizable on all devices. When we look at the types of files that are used for images, we find a wide variety of formats, each with their own advantages and disadvantages.
Let’s take a closer look at how images are stored. There are two categories of file types: vector based and raster based.
Vector-Based
Vector based files store the mathematical calculations of polygons, or a series of lines and shapes. Here’s the Carnegie Mellon University word-mark. Look closely at the edges of the words. [zoom in] Notice how clean the curves of the letters look. That’s because the curve itself is what’s stored in the file. As you might imagine, vector-based images are very useful if you are working with a logo or shape that would need to be viewed or printed very large. Some common types of vector files are: .AI and .EPS.
Raster-Based
Raster-based files on the other hand, store a rectangular grid of pixels that make up the image. Look at the CMU word-mark again. When we zoom in, it becomes apparent that the letters are made up of many individual pixels. [zoom in]. All photos that we take or view fall under this category. That’s because all photographs contain pixels. Some common file types you might be familiar with are: .GIF .BMP .PNG .JPG.
File Sizes
One important thing to keep in mind when working with images is their file size. Most often, larger files will take longer to download, modify and store. The advancements of broadband networks, processing power and computer storage tend to offset this, but best practice encourages us to use and create images only as large as we need them to be.
Some important things to know:
For vector-based images, the dimensions of the image have no relevance to the size of the file; that’s based solely on the number of polygons within the file.
Some of the raster-based images with the highest file sizes are often the uncompressed RAW data that a camera uses to create an image. There are only a few reasons to keep this data in its entirety. If your photo is not going to be subject to some advanced manipulation techniques or printed at a large size, then it may be best to compress the image.
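Here is a rough sketch of those two points in Python. It compares the uncompressed size of a raster image (assuming 3 bytes per pixel) with the size of a hypothetical one-triangle SVG file as the output dimensions grow. The triangle and the helper names are invented for the example.

```python
def raster_bytes(width, height, bytes_per_pixel=3):
    """Uncompressed size of a raster image: one color value per pixel."""
    return width * height * bytes_per_pixel

def svg_triangle(width, height):
    """A hypothetical one-triangle SVG that scales to any output size."""
    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}" '
        'viewBox="0 0 100 100">'
        '<polygon points="50,5 95,95 5,95" fill="green"/></svg>'
    )

# Print: side length, raster size in bytes, SVG size in bytes.
for side in (100, 1000, 10000):
    svg_size = len(svg_triangle(side, side).encode("utf-8"))
    print(side, raster_bytes(side, side), svg_size)
```

The raster size grows with the number of pixels, while the SVG text only gains a few characters for the extra digits in its width and height.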
Compression
So just what is compression? In simple terms, compression removes the irrelevant and redundant data so that the image can be stored and transmitted in its most efficient form.
So how does this work, exactly? Let’s think about a grid of 20 by 20 pixels. Uncompressed, the information of each of those colors is stored individually. But as an image is compressed, the algorithm counts up how many times each color appears and places them into a model that puts the most frequent colors first. Why is this important? Well, it takes far less data for the computer to point to the first color than it would to store that color in memory. Which is easier, saying what green is or pointing to a green square on the wall?
That is how lossless compression works. Lossless compression is reversible and can compress the image without removing any of the quality. Alternatively, lossy compression removes some of the image quality. But does this actually matter? No. In fact, in most cases, the information that gets removed is indistinguishable to the human eye. And the reward for losing the data that we wouldn’t get to see anyway: a vastly smaller file size, which is helpful for real-time video transmission and images on websites, among other things.
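One well-known way to give the most frequent colors the shortest "pointers" is Huffman coding, the technique named in the visual reference earlier. The sketch below is only an illustration under made-up assumptions: a 20 by 20 grid (400 pixels) with four invented color names and counts. It is not a description of how any particular image format arranges its data.

```python
import heapq
from collections import Counter

def huffman_codes(symbols):
    """Return {symbol: bit string}, giving frequent symbols shorter codes."""
    counts = Counter(symbols)
    if len(counts) == 1:
        return {next(iter(counts)): "0"}
    # Heap entries: (frequency, tie_breaker, {symbol: code_so_far}).
    heap = [(freq, i, {sym: ""}) for i, (sym, freq) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        # Merge the two least frequent groups, prefixing their codes with 0 and 1.
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return heap[0][2]

def decode(bits, codes):
    """Walk the bit string and turn each complete code back into its symbol."""
    lookup = {code: sym for sym, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in lookup:
            out.append(lookup[current])
            current = ""
    return out

# Hypothetical 20 x 20 image: 400 pixels in four made-up colors.
pixels = ["green"] * 300 + ["brown"] * 70 + ["white"] * 25 + ["red"] * 5
codes = huffman_codes(pixels)
bits = "".join(codes[p] for p in pixels)

fixed_bits = 2 * len(pixels)  # four colors would need 2 bits each at a fixed width
print(codes)
print(len(bits), "bits with Huffman codes vs", fixed_bits, "bits at fixed width")
print("reversible:", decode(bits, codes) == pixels)
```

Green, the most common color, ends up with a one-bit code, and decoding returns exactly the original pixels, which is what makes this approach lossless.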
File Types
Let’s look at a few of the most popular and supported types of images.
JPEG. This is by far the most common format for images. Most digital cameras use this format and it has universal support. It also offers small file sizes, due to its lossy compression.
TIFF. These are larger lossless files that are widely accepted as the standard for photographers and printing presses. However, there is minimal support by web browsers.
GIF. It’s limited to only 256 colors. Because of this and lossless compression, it’s best suited for graphics with few colors, such as diagrams, shapes or logos. Because of its simplicity and acceptability, it’s widely used to produce short animations. Transparency is also supported.
PNG. Originally created as an open source alternative to GIF, it is very versatile and advanced. It supports lossless compression and an alpha channel (for transparency). File sizes tend to be larger than that of a JPEG, but smaller than a TIFF. Widely supported by web browsers.
Summation
Whew, that was a lot! We’ve covered raster and vector-based images, compression algorithms and some of the major file types.
Now the rest is up to you! The next time you save, share or download files, keep this information in mind and ask yourself what the most effective file format for the task at hand is. Cheers to saving time, storage and processing power through good decisions made when choosing a file format!

10–18–16

Today was a good working day that started off with getting some answers to my questions relating to the script above and working on some new ideas on how to better form and visualize ideas for this project.

Visualizations in progress

I was pleased with how these ideas had been developing, but as I’m learning with Design, the expression “One step forward and two steps back” (or is it the other way around?) tends to become more and more familiar.

It has been suggested to me that I’m ‘changing scenes’ too drastically. Instead of having a narrative with a nice flow, I’m creating 3 narratives and comparisons within the concept of this project. (Ex: taking a photo, comparing raster vs vector, and explaining file types.) I’m breaking this information down as I would for a PowerPoint presentation. Group things together, move on to the next idea for the next slide. But I think that I’m now realizing just how much more I need to be thinking about the transitions. This shouldn’t be PowerPoint in video form. I need to alter my thinking.

As the end of class approached, I came to acknowledge that there are some ideas I need to refine. For instance, instead of talking about a photo of mountains and then switching to a word-mark, maybe I could be talking about a photo that has some lettering in it? Then later I could pull out those letters to demonstrate the vector-based form.

The idea of just ditching the vector comparison altogether was brought up. On the one hand, it would make my piece simpler, but on the other hand, vector graphics are a big part of image file formats and I like that I’m following this hierarchical structure down through these formats.

Development of the Script Continues…

I’m finally in a place that’s close to finalized with the script. The first take of the audio is recorded. Recording it actually allowed me to consider the way I had part of it written from a new perspective. I’ve read it aloud before, but when I really am recording it “for real” I tend to pay more attention to how some of the words read.

You’re out at your favorite coffee shop and snap a picture of it to show your friends. It’s so easy to share. But did you stop to think about how this is possible?
Images are stored in standardized file formats that are universally recognizable to all devices. These files are raster based, meaning that the file stores a rectangular grid of pixels. As we zoom into the image, the individual pixels become visible.
The other category of image files is vector based. These files do not contain any pixels; instead they store only the calculations of lines and shapes. This makes them well suited for text and simple graphics such as logos. Here’s a close-up of the coffee shop’s logo as a vector based image. Notice how the curves of the letters remain completely clear.
When working with images it’s important to assess their file sizes. Images with large file sizes will take longer to download and eat up more storage space relative to smaller files.
Fortunately we are able to compress large images to take up less digital space. Compression removes irrelevant and redundant data from an image so that it can be stored and transmitted in its most efficient form.
Image compression typically counts up the pixels of similar colors and stores the results in a key. Instead of storing the data used to determine each color for each pixel, the computer stores the data once for each color and references it from the key. This is known as lossless compression because the compression is fully reversible.
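The "key" idea can be sketched as a color palette plus a list of small index numbers. This is a minimal Python illustration with two invented colors; the function names are made up, and real formats layer further compression on top of a step like this.

```python
def to_indexed(pixels):
    """Split a list of RGB tuples into a palette (the "key") and per-pixel indices."""
    palette = []
    position = {}
    indices = []
    for color in pixels:
        if color not in position:
            # First time this color appears: store it once in the key.
            position[color] = len(palette)
            palette.append(color)
        indices.append(position[color])
    return palette, indices

def from_indexed(palette, indices):
    """Rebuild the original pixels by looking each index up in the key."""
    return [palette[i] for i in indices]

# Hypothetical colors for a strip of coffee shop wall and sign.
BRICK = (150, 60, 40)
CREAM = (240, 230, 210)
pixels = [BRICK] * 12 + [CREAM] * 4

palette, indices = to_indexed(pixels)
print(palette)
print(indices)
print("fully reversible:", from_indexed(palette, indices) == pixels)
```

Each color is stored once, every pixel becomes a short reference to it, and the original can be rebuilt exactly.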
The other form of compression looks for pixels of similar colors and averages those colors together. This will reduce the number of colors stored, which reduces the file size of the image. It’s important to note that this lossy form of compression removes details from the image. However it’s usually in ways that are not so noticeable to the human eye.
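The averaging idea can be sketched in Python as well. This is a deliberately simplified illustration: colors whose channels land in the same 32-wide bucket are treated as "similar" and replaced by their average. The bucket size and sample colors are assumptions made for the example, and real lossy formats such as JPEG use more elaborate methods than this.

```python
from collections import defaultdict

def bucket(color, step):
    """Coarse grouping key: colors in the same bucket count as 'similar'."""
    return tuple(channel // step for channel in color)

def quantize(pixels, step=32):
    """Replace every pixel with the average color of its group of similar colors."""
    groups = defaultdict(list)
    for color in pixels:
        groups[bucket(color, step)].append(color)
    averages = {
        key: tuple(round(sum(c[i] for c in members) / len(members)) for i in range(3))
        for key, members in groups.items()
    }
    return [averages[bucket(color, step)] for color in pixels]

# Hypothetical pixels: three slightly different browns and two slightly different creams.
pixels = [(150, 60, 40), (152, 62, 41), (149, 59, 39), (240, 230, 210), (242, 232, 212)]
result = quantize(pixels)

print(len(set(pixels)), "distinct colors before")
print(len(set(result)), "distinct colors after")
print("original recoverable:", result == pixels)
```

Five distinct colors collapse to two, so there is less to store, but the small differences between the original pixels are gone for good. That is the "lossy" part.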
Let’s look at where some of the common formats fall under these categories. [file size vs visual quality graph, jpg, gif, png, tif, eps]
Now let’s look at the types of files (raster/vector) and types of compression (lossless/lossy)
Reference this information the next time you save, share or download files, and ask yourself what is the most effective file format for the task at hand. Cheers to saving time, storage and processing power through good decisions made when choosing a file format!

10–25–16

With the script (mostly) finalized, subject to some minor changes to go along with edits to the movie, I’ve been able to focus on some visuals.

Visuals v.2

Big changes begin with the image itself. Instead of a vector based image representing a mountain scene, I’ve chosen an image of a coffee shop. Chosen because it’s a raster based image that has text in it. It’s also something that would conceivably be featured in someone’s photograph. And instead of using the CMU word-mark to show what a vector based image is, the logo of the coffee shop can nicely demonstrate the difference between how text looks in either type of format.

Areas where I’m still working are

  • to show what the differences of file size mean on downloading
  • to introduce the concept of compression (as mechanization?)
  • to show what some file comparisons are

Here’s an updated version of the visuals. All examples of pixelizations are done with the same coffee shop image. However, because there’s a lot of “ugly brown pixels” that may not show enough visual distinction on the screen, I need to look around for a better image of a coffee shop to use. I’d also like to find one that has the sign overlaying the bricks to further my design.

Visuals v.3

The last row (the file type comparisons) is something I need to do a lot of work on. Stacie suggested using my coffee shop photo but representing it in varying degrees of compression and quality, while possibly displaying its file size. Stay tuned for future updates on the subject!
