Check Your Ego at the Door
After two weeks of work, I can confidently say that the GNIP JSON files (which I have decided to name Jason) are easier to make sense of, though every time we meet we seem to discover something new about the raw data that may or may not be helpful for our project. The original data we received included files with hashtags that merely contained 1reasonwhy within a longer string, so “21reasonwhy” or “1reasonIhateFacebook” among some other…
After getting over the initial shock of seeing the clustered schema in the JSON files, our team finally made sense of the data by comparing it to the Twitter API and to actual tweets.

GNIP packages the tweets above as objects with coded field names. Using my tweet above as an example, “MikeD 3.0” would be under a “screenname” field, “@michaelslore” would appear under “preferred_username,” “ProjectJason” would be under a hashtag field, and “This is totally a tweet! #ProjectJason” would be located under both “summary” and “body.”
As a group, we decided that we needed to rename these fields when we cleaned the data to pull the information relevant to our project. Everybody participated in selecting the language that was more intuitive to use based on Twitter culture. For example:
preferred_username ———→ handle
body ———→ text
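As a rough illustration of the renaming step, here is a minimal Python sketch. It assumes a flat record, which the real GNIP objects are not. Only the two renames listed above come from our discussion. The names FIELD_RENAMES and rename_fields are made up for this example and are not taken from our actual code.

```python
# Hypothetical rename table: only these two pairs are from the post.
FIELD_RENAMES = {
    "preferred_username": "handle",
    "body": "text",
}


def rename_fields(record, renames=FIELD_RENAMES):
    """Return a copy of a flat dict with selected keys renamed.

    Keys that are not in the rename table are kept unchanged.
    """
    return {renames.get(key, key): value for key, value in record.items()}


# Illustrative record, flattened for simplicity (an assumption).
raw = {
    "preferred_username": "@michaelslore",
    "body": "This is totally a tweet! #ProjectJason",
}
cleaned = rename_fields(raw)
print(cleaned)
```

Keeping the renames in one table means the group only has to agree on the vocabulary in a single place.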
For our first coding session, Xi came prepared with code that sifted through all the tweets in our folder and set aside only the tweets that actually had #1reasonwhy.
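The filtering idea can be sketched roughly as follows. This is not Xi's code. It is one reading of the step, assuming each tweet has already been loaded into a dict with a "hashtags" list of plain strings and that matching should ignore letter case. The names has_exact_hashtag and filter_tweets are hypothetical.

```python
TARGET = "1reasonwhy"


def has_exact_hashtag(hashtags, target=TARGET):
    """True if any hashtag equals the target exactly.

    Comparison ignores case and a leading '#', so "21reasonwhy"
    does not match even though it contains the target string.
    """
    wanted = target.lstrip("#").lower()
    return any(tag.lstrip("#").lower() == wanted for tag in hashtags)


def filter_tweets(tweets, target=TARGET):
    """Keep only the tweets that carry the exact target hashtag."""
    return [t for t in tweets if has_exact_hashtag(t.get("hashtags", []), target)]


# Made-up sample data for illustration only.
sample = [
    {"text": "a", "hashtags": ["1reasonwhy"]},
    {"text": "b", "hashtags": ["21reasonwhy"]},
    {"text": "c", "hashtags": ["1reasonIhateFacebook", "1ReasonWhy"]},
]
kept = filter_tweets(sample)
print([t["text"] for t in kept])
```

Comparing whole hashtags rather than searching for a substring is what keeps lookalikes such as “21reasonwhy” out of the result.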
We all began working on code to clean our data. Bing and Jodi set up the variables in our Python code to restructure the JSON and added the syntax for creating our own schema. Xi and I collaborated on the fetchData() function as well as a function that sorts through the hashtags in the original schema, creates a list from them, and puts them into a similar structure in our new schema. This proved harder than I had initially anticipated, but once we got it working, it made the next step, creating a similar function for user_mentions, easier. Jodi wrote the basic code for this based on our hashtag code. After much deliberation, we decided we needed a dictionary for user_mentions instead of a list because we needed two fields within our new schema that incorporated user_mentions and userIds.
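To make the list-versus-dictionary decision concrete, here is a hedged sketch of the two helpers. It is not our actual code. It assumes the raw entities look like Twitter-style entities, with a "text" key on each hashtag and "screen_name" and "id" keys on each mention. Representing the mentions as a dictionary with two parallel lists is only one possible reading of the decision. The function names are hypothetical.

```python
def extract_hashtags(entities):
    """Collect hashtag strings into a plain list."""
    return [tag["text"] for tag in entities.get("hashtags", [])]


def extract_mentions(entities):
    """Collect mentions into a dict with two fields.

    A list could hold only one value per mention; the dict keeps the
    mentioned names and their user ids side by side.
    """
    mentions = {"user_mentions": [], "userIds": []}
    for mention in entities.get("user_mentions", []):
        mentions["user_mentions"].append(mention["screen_name"])
        mentions["userIds"].append(mention["id"])
    return mentions


# Made-up entities for illustration; the id is not a real account id.
entities = {
    "hashtags": [{"text": "ProjectJason"}],
    "user_mentions": [{"screen_name": "michaelslore", "id": 12345}],
}
new_record = {
    "hashtags": extract_hashtags(entities),
    **extract_mentions(entities),
}
print(new_record)
```

Both helpers fall back to an empty result when a tweet has no hashtags or mentions, so every record in the new schema ends up with the same fields.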
For the most part, our group seems to be solving problems together and moving forward, with everyone contributing to the large decisions that influence the direction of the project. The differences in our levels of coding experience are becoming very apparent. Xi is the most knowledgeable when it comes to coding, especially in Python. I have some experience with object-oriented coding and can offer suggestions, review code, and make sense of some of what is written, but when it comes to Python, I know only what I learned from the Codecademy tutorials, which seems to be true of the rest of the team as well. Because of this, I try to push for pseudocode so we get an idea of what we need to do to accomplish our tasks before implementing actual Python code. Jodi has communication and facilitation skills that prove useful during team deliberations. Bing has useful insight into technical issues that we have overlooked or not considered, and draws our attention to weaknesses that we need to address in the framing of our project.
During this week, team members have met in pairs. Jodi and Bing met to review the project. Jodi and I also met separately because she wanted to go through the existing code to test her understanding of Python in general and of how the code applies to our project specifically. I agreed to meet with her because actively working through and explaining code helps me make sense of the processes as well. I am pleased to see our group members taking the time to meet outside of our team meetings to help each other.