Similarities Between Ocean Liner Menus and More

Published in

INST414: Data Science Techniques

8 min read · Mar 9, 2024

Author’s Note

My previous post, “Preserved Menus Network Exploration,” is referenced often in this post, and some of its methods are reused in the current analysis.

The Question

In the era before the plane, ocean liners were the only way to cross oceans. These massive ships carried immigrants, tourists, and the rich on multi-day voyages to destinations on every continent. Much like modern-day cruise ships, ocean liners had to serve several meals a day to passengers and crew. Each ship typically had multiple restaurants and/or canteens, each serving one of several classes (ex: 1st, 2nd, 3rd). Many have studied historical menus from ocean liners. However, little research has been done on how similar these menus are to food served elsewhere.

How similar was the food served on ocean liners to food served elsewhere?

This question will be answered from a data science point of view. The answer may be helpful for historians looking for information on menus, historical associations or museums trying to confirm theories about similarities between companies, and it may also be helpful for the individuals who own the data the analysis will be performed on.

An example of an ocean liner, Cunard’s Queen Elizabeth. From the personal collection of Simon Miller.

The Data, Collection, & Cleaning

Ideally, the data needed to answer this question would be a subset of historical menus, including those from maritime companies that operated ocean liners and those from regular restaurants. This data does exist! Much like in my previous post, this analysis uses data from the New York Public Library’s “Whats on the menu?” initiative. This program digitizes and stores data from the library’s collection of menus from the 1800s to the 2000s. Retrieved via an e-mail found on the “Whats on the menu?” website, this data is normalized and stored in several CSV files: Dishes.csv, Menu.csv, MenuItem.csv, and MenuPage.csv. For this analysis, only the sponsor (this can be a company, a ship, a hotel, or a restaurant) in the Menu.csv file and the dish_id in the MenuItem.csv file are used. Because the dataset is normalized, ids (sometimes called keys) must be used to navigate through the files and determine which sponsors served which dishes. To achieve this, the following block of Python is executed.

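import re #needed for the regex search below
import pandas as pd

#dfMenu, dfMenuPage, and dfMenuItem are assumed to be DataFrames already
#loaded from Menu.csv, MenuPage.csv, and MenuItem.csv (e.g. with pd.read_csv)

menuDict = {} #create empty dict

for index, row in dfMenu.iterrows(): #iterate through the rows
    menuId = row.iloc[0] #save the Menu ID (first column)
    sponsor = row.iloc[2] #save the sponsor name (third column)
    
    #break if the length of the dictionary is 250
    if len(menuDict) == 250:
        break
    
    if re.search(r'\bDINNER\b', str(row.iloc[3])): #use regex to find when dinner is stated in the event column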
        dishList = [] #create empty list to store dishes
        #find the ids(which is menu_page_id in MenuItem.csv) at the menuID for this individual menu
        seriesPageID = dfMenuPage.query(f'menu_id == {menuId}')['id']

        #for each menupage ID...
        for menuPageID in seriesPageID:
            seriesDishID = dfMenuItem.query(f'menu_page_id == {menuPageID}')['dish_id'].dropna() #...find the item IDs for each menuPageID and drop NaN values
            
            #find each dishID
            for dishID in seriesDishID:
                
                dishList.append(int(dishID)) #append the dish to the dish list

        #add the dish to the dictionary, making sure to append if it already exists
        if sponsor in menuDict:
            menuDict[sponsor] += dishList
        else:   
            menuDict[sponsor] = dishList

This block returns a dictionary that contains sponsor names and a list of dish_ids that the sponsor has served. In my previous post, this block of code only returned a dictionary with a length of 100. Because the code will be lighter in this analysis, this number is increased to 250.

In the event column of Menu.csv, a regular expression is used to find mentions of the word “DINNER.” This means only menus that were used for dinner service are considered. In the previous post, this filter was used to avoid communities forming based on time of day, which is not applicable to this analysis. Originally, this line of code was removed. However, this increased the number of dish_ids for each sponsor, slowed the program down, and created errors. Considering time limits, the line of code was added back in, resulting in a more focused subset of data.
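
The same id-following logic can also be expressed as table joins. The block below is a minimal sketch, not the code used for this analysis: the three small DataFrames are made-up stand-ins for Menu.csv, MenuPage.csv, and MenuItem.csv, the column names id, sponsor, event, menu_id, menu_page_id, and dish_id are assumed to match the real files, and the function name sponsor_dishes is hypothetical. It also leaves out the 250-sponsor cap.

```python
import pandas as pd

# Toy stand-ins for Menu.csv, MenuPage.csv and MenuItem.csv (values are made up).
menu = pd.DataFrame({
    "id": [1, 2, 3],
    "sponsor": ["CUNARD LINE", "HOTEL A", "CUNARD LINE"],
    "event": ["DINNER", "LUNCH", "FAREWELL DINNER"],
})
menu_page = pd.DataFrame({"id": [10, 11, 20, 30], "menu_id": [1, 1, 2, 3]})
menu_item = pd.DataFrame({
    "menu_page_id": [10, 10, 11, 20, 30],
    "dish_id": [100.0, 101.0, None, 102.0, 100.0],
})


def sponsor_dishes(menu, menu_page, menu_item):
    """Return {sponsor: [dish_id, ...]} for menus whose event mentions DINNER."""
    # Keep only menus whose event contains DINNER as a whole word.
    is_dinner = menu["event"].astype(str).str.contains(r"\bDINNER\b", regex=True)
    dinner = menu[is_dinner].rename(columns={"id": "menu_id"})

    # Rename the page key so both joins use an unambiguous column name.
    pages = menu_page.rename(columns={"id": "menu_page_id"})

    # Menu -> MenuPage -> MenuItem, then drop items with no dish_id.
    joined = dinner.merge(pages, on="menu_id").merge(menu_item, on="menu_page_id")
    joined = joined.dropna(subset=["dish_id"])

    # Collect the dish ids served by each sponsor (duplicates are kept).
    grouped = joined.groupby("sponsor")["dish_id"].apply(lambda s: [int(d) for d in s])
    return grouped.to_dict()


result = sponsor_dishes(menu, menu_page, menu_item)
print(result)
```

On real data, joins like these are usually much faster than nested loops with query calls, which may matter if the DINNER filter is ever dropped.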

One of the menus from the New York Public Library. This menu is from one of the Matson Line ocean liners. Credit: https://menus.nypl.org/menu_pages/63920

Similarity

To begin the process of finding the similarity, the dictionary was placed into a Pandas DataFrame. Each row represents a sponsor, and each column represents a unique dish_id. Each cell contains the number of times that sponsor served that dish. The two blocks of code used to create this DataFrame are shown below.

uniqueList = [] #create an empty list

#iterate through each list in the menu dictionary
for ilist in menuDict.values():
    
    #iterate through each dish in the list
    for dish in ilist:
        
        #if the dish is not in the unique list, add it
        if dish not in uniqueList:
            uniqueList.append(dish)
            
print(len(uniqueList))
print(uniqueList)

#create a dataframe with zeros, columns as the unique list and the index as the keys from the dictionary
df = pd.DataFrame(0, columns=uniqueList, index=list(menuDict.keys()))

#iterate through each key and value pair in the dictionary
for sponsor, dishes in menuDict.items():
    
    #iterate through each dish in the list
    for dish in dishes:
        df.loc[sponsor, dish] += 1 #add 1 to that sponsor/dish cell in the DataFrame

df.head(10)
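
For larger dictionaries, the same sponsor-by-dish count matrix can be built without the cell-by-cell loop. The block below is a rough sketch under that assumption, using collections.Counter; the function name build_count_matrix and the toy dictionary are hypothetical and are not part of the original analysis.

```python
from collections import Counter

import pandas as pd


def build_count_matrix(menu_dict):
    """Build a sponsor x dish_id DataFrame of counts from {sponsor: [dish_id, ...]}."""
    # Count how many times each sponsor served each dish.
    counts = {sponsor: Counter(dishes) for sponsor, dishes in menu_dict.items()}

    # Rows are sponsors, columns are dish ids; missing pairs become NaN.
    matrix = pd.DataFrame.from_dict(counts, orient="index")

    # Restore sponsors with no dishes, then turn NaN into 0 counts.
    return matrix.reindex(list(menu_dict)).fillna(0).astype(int)


# Toy input (made-up sponsors and dish ids).
toy = {"A": [1, 2, 2], "B": [2, 3], "C": []}
toy_df = build_count_matrix(toy)
print(toy_df)
```

Note that a sponsor with no dishes ends up as an all-zero row, and cosine distance is undefined for an all-zero vector, so such rows may need to be dropped before computing similarity.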

The previous post largely failed because its method was sensitive to magnitude: the more dishes a sponsor served, the more important that sponsor became in the network. Cosine similarity was chosen for this DataFrame because it is based on the angle between two vectors and does not take magnitude into account. It is therefore well suited to this analysis, where some sponsors have fewer preserved menus, and therefore fewer unique dishes, than others. The function provided below performs the cosine calculations on the DataFrame. Calling the function with the name of a sponsor prints the 10 most similar sponsors.

import scipy.spatial.distance #needed for cdist

def cos_sim(row):
    """
    This function takes in the name of a row (sponsor) and prints the 10 most similar sponsors, ranked by cosine distance.
    Heavily inspired by Professor Cody Buntain's code as seen below:
    https://github.com/cbuntain/umd.inst414/blob/main/Module03/02-Similarity.ActorsGenre.Normed.ipynb
    
    Inputs:
    row(string): The name of the sponsor (a row label in df)
    """

    #Gathering the dish counts for that sponsor
    target_sponsor = df.loc[row]

    #Generating distances from that sponsor to all the others
    distances = scipy.spatial.distance.cdist(df, [target_sponsor], metric="cosine")[:,0]

    query_distances = list(zip(df.index, distances))

    #Printing the ten closest sponsors (smallest cosine distance first), with each sponsor's total dish count
    for i, (similar_sponsor, similar_dish_score) in enumerate(sorted(query_distances, key=lambda x: x[1])[:10], start=1):
        print(f"{i}.", similar_sponsor, similar_dish_score, df.loc[similar_sponsor].sum())
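
To illustrate why cosine distance ignores magnitude, the short sketch below compares it with Euclidean distance on made-up dish-count vectors. The vectors and variable names are illustrative assumptions, not data from the menus. Note that SciPy's "cosine" metric returns a distance, which is 1 minus the cosine similarity, so 0 means the two vectors point in the same direction.

```python
import numpy as np
from scipy.spatial.distance import cdist

# Made-up dish-count vectors: `large` serves the same mix as `small`, ten times over.
small = np.array([[1, 2, 0, 1]])
large = small * 10
other = np.array([[0, 0, 3, 1]])

# Same direction, different magnitude: cosine distance is (numerically) zero...
cos_same = cdist(small, large, metric="cosine")[0, 0]
# ...while Euclidean distance grows with the difference in magnitude.
euc_same = cdist(small, large, metric="euclidean")[0, 0]

# A sponsor with a mostly different mix of dishes is far away in cosine terms.
cos_other = cdist(small, other, metric="cosine")[0, 0]

print("cosine distance, same mix:", cos_same)
print("euclidean distance, same mix:", euc_same)
print("cosine distance, different mix:", cos_other)
print("cosine similarity, different mix:", 1 - cos_other)
```

This is the property the analysis relies on: a sponsor with many preserved menus and a sponsor with few can still be judged similar if the proportions of their dishes match.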

Top 10s

Three maritime companies that operated ocean liners, each from a different country, were picked to find which sponsors were the most similar to them: Cunard Line from Southampton in the United Kingdom (service largely to New York), Norddeutscher Lloyd Bremen from Bremen in Germany (service largely to New York), and the Occidental & Oriental Steamship Company from San Francisco (service largely to Hong Kong). The results follow.

It is important to note that the first sponsor in each list will always be the sponsor the query is based on. Each line contains the sponsor, the cosine distance score (0 means identical, and lower values mean more similar), and the number of dishes. All maritime companies are highlighted. Some companies have a hyperlink that leads to more information.

CUNARD LINE 0.0 769
USMS 0.7818385713319268 303
USMS ST LOUIS 0.7955010172158054 86
U.S.M.S. 0.8041222783256909 196
HEADQUARTERS 47TH INFANTRY U.S. VOLUNTEERS 0.8090861996713197 18
U.S.M.S 0.8179078150346861 99
RED STAR LINE 0.8440014021961351 267
OCEAN STEAMSHIP CO. 0.8514036597666055 60
D&H DINING CAR SERVICE 0.8528092357366177 69
BATTERY PARK HOTEL 0.8621147685588016 260

NORDDEUTSCHER LLOYD BREMEN 0.0 146
NORDDEUTSCHER LLOYDS BREMEN 0.6656330724547883 35
BREMEN NORDDEUTSCHER LLOYD 0.7911961834246675 34
HAMBURG-AMERIKA LINIE 0.850466564116232 28
ALPHA KAPPA PHI 0.8560690702733798 17
PENNSYLVANIA RAILROAD 0.8594196107211167 22
XIII CLUB 0.8637989550786003 15
WHITE STAR LINE 0.8637989550786003 15
MASONIC TEMPLE RESTAURANT 0.8638549703163885 19
NORDDEUTSCHER LLOYD -BREMEN 0.8643006180904298 34

OCCIDENTAL & ORIENTAL STEAMSHIP COMPANY 0.0 78
OCCIDENTAL & ORIENTAL STEAMSHIP CO. 0.6070212135217423 113
DEL CORONADO HOTEL 0.7257586221349277 50
MAXWELL HOUSE 0.7278344730240913 61
U.S. ARMY TRANSPORT 0.7334991045554868 31
PACIFIC MAIL STEAMSHIP COMPANY 0.7390439259906789 103
HOTEL ORMOND 0.7524631142558313 123
TAMPA BAY HOTEL 0.7549509852950983 98
TOYO KISEN KAISHA — HONG KONG MARU 0.758477054230176 35
HOTEL DEL CORONADO 0.7667152625920782 49

Discussion & Bias

Before diving into answering the question, one thing needs to be made clear: there are a plethora of repeat sponsors with slightly different names. For this discussion, these will be ignored.

How similar was the food served on ocean liners to food served elsewhere?

One of the clearest conclusions is that maritime company menus are very similar to other maritime company menus. Across all three queries, 7 other unique companies that operated ocean liners were returned. The menus were also similar to those of hotels and railroad companies. These are notable because maritime companies, hotels, and railroads all provide food and sleeping accommodations. When food and sleeping accommodations are provided together, the menus are similar.

There are a few specific similarities that are of interest. The significant similarity between Norddeutscher Lloyd and Hamburg-Amerika Line makes sense since they are both German maritime companies. It is likely that both companies served German dishes. Similarly, the Occidental & Oriental Steamship Company and Toyo Kisen Kaisha, a closely related maritime company, are found to be similar. It is likely that Occidental & Oriental and Toyo Kisen Kaisha both served Chinese dishes. As discussed in the previous post, the presence of Maxwell House in this list also suggests a shared dish of coffee!

It is possible that higher-class and Western menus are more likely to have been preserved. For example, a 1st class menu from White Star Line’s Olympic is more likely to be preserved than a 3rd class menu from Toyo Kisen Kaisha’s Taiyo Maru. Cosine similarity helps counter this bias because, unlike Euclidean distance for example, it largely ignores how many dishes or menus were preserved for each sponsor.

Conclusion & Furthermore

To summarize, the maritime companies that operated ocean liners all served similar dishes during dinner. These similarities also extend to other companies providing overnight accommodations such as hotels and railroads. When food and sleeping accommodations are provided together, the menus are similar.

This analysis could be improved in many ways. Data cleaning could be improved, for example by combining sponsors with slightly different names (ex: USMS vs U.S.M.S.). This could be done in Python or with a data-wrangling program such as OpenRefine. Finding which dishes are shared between companies would also be extremely beneficial, as it could confirm the dish similarities hypothesized in the discussion. Finally, performing this analysis on all menus, not just those tagged as dinner menus, could help to broaden the conclusions drawn from this analysis.
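
As one possible starting point for the sponsor-name cleanup in Python, the sketch below merges names that differ only in punctuation, case, or spacing. The helper names normalize_sponsor and merge_sponsors are hypothetical, and this simple rule is an assumption rather than a complete solution: it would not merge variants that differ in wording or word order, such as "CO." versus "COMPANY".

```python
import re


def normalize_sponsor(name):
    """Uppercase the name, drop punctuation, and collapse repeated whitespace."""
    cleaned = re.sub(r"[^A-Z0-9 ]", "", name.upper())
    return re.sub(r"\s+", " ", cleaned).strip()


def merge_sponsors(menu_dict):
    """Combine dish lists of sponsors whose normalized names are identical."""
    merged = {}
    for sponsor, dishes in menu_dict.items():
        merged.setdefault(normalize_sponsor(sponsor), []).extend(dishes)
    return merged


# Toy input: sponsor names taken from the results above, dish ids made up.
toy_menus = {
    "USMS": [1],
    "U.S.M.S.": [2, 3],
    "U.S.M.S": [4],
    "NORDDEUTSCHER LLOYD -BREMEN": [5],
    "NORDDEUTSCHER LLOYD BREMEN": [6],
}
merged_menus = merge_sponsors(toy_menus)
print(merged_menus)
```

Running the merged dictionary through the rest of the pipeline would give each company a single row, so repeat sponsors would no longer crowd the top 10 lists.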

I realize this analysis is extremely generalized and gives an overarching view of the food culture across companies. Real people were passengers on these ships and each individual had a story to tell. To remind the reader of this, I will conclude with a quote from my personal collection of ocean liner postcards. This postcard is from 1957 aboard Holland America’s S.S. Nieuw Amsterdam.

…this ship is no place to be on a diet — food & drinks are fabulous & never ending!

Holland America’s S.S. Nieuw Amsterdam. From the personal collection of Simon Miller.

The reverse side of the postcard containing the quote. From the personal collection of Simon Miller.

Resources

New York Public Library “Whats on the menu?”: https://menus.nypl.org/

GitHub Repository: https://github.com/smiller1551/MenuAnalysis2

This Medium post was created by Simon Miller at the University of Maryland — College Park for INST414: Data Science Techniques under Professor Cody Buntain.

Written by Simon Miller