Library of Congress Subject Headings are often assigned alternate labels when they are created. These tracing terms provide alternate access points to the official terminology. For example, “Hydrothermal vents” has a number of alt labels:
“Black smokers (Oceanography)”
“Hydrothermal deep-sea vents”
“Oceanic hot springs”
“Deep-sea vents, Hydrothermal”
The idea is that in your library discovery system, when someone searches for “Black smokers” they end up with results for materials assigned “Hydrothermal vents”.
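In code, this kind of query expansion can be as simple as a lookup from alt label to authorized heading. A minimal sketch, using the LCSH example above (the mapping structure itself is hypothetical, not how any particular discovery system stores it):

```python
# Map alternate labels to the authorized heading. The labels come from
# the "Hydrothermal vents" example; a real system would load these from
# the LCSH authority records.
ALT_LABELS = {
    "Black smokers (Oceanography)": "Hydrothermal vents",
    "Hydrothermal deep-sea vents": "Hydrothermal vents",
    "Oceanic hot springs": "Hydrothermal vents",
    "Deep-sea vents, Hydrothermal": "Hydrothermal vents",
}

def authorized_heading(query: str) -> str:
    """Return the authorized heading for a known alt label, else the query itself."""
    return ALT_LABELS.get(query, query)

print(authorized_heading("Oceanic hot springs"))  # Hydrothermal vents
```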
Wikipedia has a similar system called redirects. When someone goes to https://en.wikipedia.org/wiki/Black_smoker they get redirected to a part of https://en.wikipedia.org/wiki/Hydrothermal_vent. These redirects were created ad hoc, as needed. The result is an invisible folksonomy, built up over the history of the article being edited, merged with other articles, and Wikipedians adding additional access points based on how people expect to locate information on Wikipedia. …
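You can see this folksonomy for any article by asking the MediaWiki API for a page's redirects (`action=query`, `prop=redirects`). A sketch of what that looks like; the request URL is only built here, not sent, and the sample response is trimmed to the fields the API returns, with a made-up pageid:

```python
import json
from urllib.parse import urlencode

# Build the MediaWiki API request that lists pages redirecting to
# "Hydrothermal vent".
params = {
    "action": "query",
    "prop": "redirects",
    "titles": "Hydrothermal vent",
    "rdlimit": "500",
    "format": "json",
}
url = "https://en.wikipedia.org/w/api.php?" + urlencode(params)

# Abbreviated sample of the response shape (pageid is invented).
sample_response = json.loads(
    '{"query": {"pages": {"12345": {"title": "Hydrothermal vent",'
    ' "redirects": [{"title": "Black smoker"},'
    ' {"title": "Hydrothermal deep-sea vents"}]}}}}'
)

# Pull out just the redirect titles -- the "alt labels" of the article.
redirect_titles = [
    r["title"]
    for page in sample_response["query"]["pages"].values()
    for r in page.get("redirects", [])
]
print(redirect_titles)
```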
The new sample dataset from the web archives of the Library of Congress consists of 1000 audio files archived from various government websites. Hearings, announcements, podcasts from members of Congress, all sorts of audio. I was thinking about how they could relate together and was curious about repetition of phrases. In political rhetoric the device called anaphora repeats the same phrase, usually at the start of successive clauses, for impact. I also wanted to try out AWS Transcribe, a service that takes audio files and produces an automated transcript. …
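Once you have transcripts, spotting repeated phrases comes down to counting n-grams. A minimal sketch of that step; the transcript string here is a made-up example, and the real input would be the Transcribe output for each file:

```python
import re
from collections import Counter

def top_phrases(text: str, n: int = 3, min_count: int = 2):
    """Return the n-word phrases that occur at least min_count times, most common first."""
    words = re.findall(r"[a-z']+", text.lower())
    grams = Counter(
        " ".join(words[i:i + n]) for i in range(len(words) - n + 1)
    )
    return [(g, c) for g, c in grams.most_common() if c >= min_count]

# Made-up example of anaphora in a short transcript.
transcript = "We will rebuild. We will recover. We will come back stronger."
print(top_phrases(transcript, n=2))  # [('we will', 3)]
```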
I was excited to see the digitization and posting of some of the 130,000 Warhol exposures given to the Cantor Arts Center. Many of the photos are repeats, Warhol trying to get the right pose for a portrait or the right angle of a still life. But repetition doesn’t mean duplication. Each one has a slight variation, a head tilt or an eye movement that puts it someplace in between repetition and elaboration.
The Tool: https://tb.semlab.io/
The fundamentals of Linked Data are pretty straightforward at a conceptual level. The difficulty, as with most things, comes in the details. What is the URI of this predicate? Can I say this literal is in another language? What does this triple look like serialized? This is where the learning curve usually is: how do I actually make an RDF statement, and what does it look like?
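Those three questions map directly onto the three parts of a statement. A sketch of assembling one triple as an N-Triples line: a URI subject, a URI predicate (`skos:altLabel` is a real SKOS predicate), and a language-tagged literal. The subject URI is a made-up example, not a real LCSH identifier:

```python
# The three pieces of one RDF statement.
subject = "http://example.org/heading/hydrothermal-vents"  # made-up example URI
predicate = "http://www.w3.org/2004/02/skos/core#altLabel"  # real SKOS predicate
literal = "Oceanic hot springs"
lang = "en"  # answers "can I say this literal is in another language?"

# Serialize it as a single N-Triples line.
triple = f'<{subject}> <{predicate}> "{literal}"@{lang} .'
print(triple)
```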
When I started teaching metadata and linked data I made little tools for my students to use that reduced the complexity of building RDF. …
I came across quite a few screenshots included in patent applications for early(ish) websites/technologies. I love being able to see the old web environment: reskinned IE, stacks of toolbars, random applications open in the Windows taskbar. I also very much like the xerox aesthetics and general bizarreness. Here are a few, in no particular order:
This Twitter bot was the most media-heavy one I’ve created. I wanted to document the process of creating something like it from start to finish and share the source code. The goal of the bot is to collage together 16,000 audio effects into 60-second videos. Repo here.
We need a listing of all the sound effect files in order to download them. …
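One way to get that listing is to scrape the file links out of the site's HTML index pages. A minimal sketch using the standard-library parser; the sample HTML and file paths are made up, and a real run would fetch the listing page first (e.g. with `urllib.request`):

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect hrefs that point at .mp3 files."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value and value.endswith(".mp3"):
                    self.links.append(value)

# Made-up stand-in for a real listing page.
sample_html = """
<ul>
  <li><a href="/sounds/door-creak.mp3">door creak</a></li>
  <li><a href="/sounds/thunder.mp3">thunder</a></li>
  <li><a href="/about.html">about</a></li>
</ul>
"""

parser = LinkCollector()
parser.feed(sample_html)
print(parser.links)  # ['/sounds/door-creak.mp3', '/sounds/thunder.mp3']
```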
Growing up visiting the public library, my siblings and I would always head over to the CD drawers looking for anything good. They were a weird slanted wood-laminate shelving system full of jewel cases.
There would never be any new releases; they had already been snapped up by someone else. But there were always rows and rows of sound effect CDs. Once in a while we would check one out, for the novelty of it. We would play it in the living room five-disc player and marvel at a CD with 99 tracks. It was extremely boring. …
Building on my previous look at book (ISBN) citations used in English Wikipedia using the recently released data, I turn to the other prominent citation type in the dataset: DOIs. These DOIs mostly represent journal articles referenced on Wikipedia.
As with the book citation analysis, I’m only looking at the English (en) citations released.
3.79 million citations
1,211,807 DOI citations
835,517 unique/resolvable DOI citations analyzed
To gather the data I politely used the CrossRef API to look up what they had for each DOI. This API returns a lot of data that can be aggregated. I was curious about many of the same data features found for the ISBNs, but since the majority of these references are to journal articles it also introduces the notion of publisher and access (not saying that is untrue for monographs, but it is more of an issue for journal…
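The lookup itself is one request per DOI against the CrossRef REST API's works endpoint. A sketch of the URL pattern and response handling; the request isn't sent here, and the sample response below is abbreviated to a few of the fields the API returns:

```python
import json

def crossref_url(doi: str) -> str:
    """Build the CrossRef works lookup URL for a DOI."""
    return f"https://api.crossref.org/works/{doi}"

# Abbreviated sample of a CrossRef works response.
sample = json.loads(
    '{"message": {"publisher": "Springer Nature",'
    ' "container-title": ["Nature"],'
    ' "issued": {"date-parts": [[2007, 3]]}}}'
)

record = sample["message"]
publisher = record["publisher"]
journal = record["container-title"][0]
year = record["issued"]["date-parts"][0][0]
print(publisher, journal, year)  # Springer Nature Nature 2007
```

Being "polite" in practice means rate-limiting requests and identifying yourself (CrossRef asks for a contact email in the `mailto` parameter or User-Agent).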
Citation data used on Wikipedia was recently released, connecting the identifiers of source materials to the Wikipedia articles using them as references:
I was curious about the types of books being used in the Wikipedia ecosystem. When were they published, what authors are prevalent/influential, what subjects are most common, etc. Using the citation data released and OCLC metadata APIs, I gathered some stats about books cited in English Wikipedia articles.
Citations were released for each Wikipedia language site. To keep things scoped I just looked at the English (en) Wikipedia articles. …
If you explore this large list of books from the US Library of Congress I made last year, you will notice interesting repetitions in titles. Since it is sorted alphabetically, you will get runs of titles that start with the same phrase:
"1001 things to do with your kids",
"1001 things to do with your Commodore 128",
"1001 things to do with your personal computer",
"1001 things to do with your ATARI ST",
"1001 things to do with your IBM PS/2",
"1001 things to do with your TRS-80",
"1001 things to do with your Apple",
"1001 things to do with your Amiga",
"1001 things to do with your IBM PC",
"1001 things to do with your Macintosh",
"1001 things to do with your Commodore 64",
"1001 things to do with your Apple…
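These runs can be surfaced programmatically by grouping the sorted title list on its opening words. A minimal sketch using three titles from the run above; the six-word prefix length is an arbitrary choice:

```python
from itertools import groupby

# Three titles copied from the run above.
titles = [
    "1001 things to do with your kids",
    "1001 things to do with your Commodore 128",
    "1001 things to do with your personal computer",
]

def prefix(title: str, words: int = 6) -> str:
    """The first `words` words of a title, used as the grouping key."""
    return " ".join(title.split()[:words])

# groupby needs its input sorted, matching the alphabetical list.
runs = {
    key: len(list(group))
    for key, group in groupby(sorted(titles), key=prefix)
}
print(runs)  # {'1001 things to do with your': 3}
```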