TAMU Datathon 2019 Letter to Stakeholders

TAMU Datathon Organizers
Published in TAMU Datathon
Nov 16, 2019 (8 min read)

Written by: Josiah Coad

What is TAMU Datathon?

TAMU Datathon is a data science hackathon, an annual weekend event where students come from around the world (in 2019 we had attendees from 5 countries) to Texas A&M University to learn about data science (DS) and machine learning (ML) and compete in solving challenges. With our inaugural event in 2019, we became the first Major League Hacking (MLH) datathon in the world!

TAMU Datathon exists to connect the top talent in DS/ML with top companies, lower the barrier to entry into DS/ML and encourage collaboration across disciplines.

Our event is student run and free to all students. In 2019, this was made possible by our 15 company sponsors, such as Facebook, ConocoPhillips, Walmart and Goldman Sachs, and our 5 college/department sponsors at Texas A&M.

How TAMU Datathon started

TAMU Datathon started in May 2019 as an experiment to do something that had never been done before: bring the hackathon (MLH) structure to data science. The inspiration came after my two-week NSF trip to Chile, where 30 students worked together to solve challenges facing astronomy through data science. I was put on a team with a physicist from Italy, an astronomer from Columbia, and a statistician from Mississippi. This diverse expertise led to our success in classifying exoplanets, beating the state of the art in just one week.

TAMU Datathon was officially formed in May 2019 under the leadership of 8 students: Allyson King, Chinmay Phulse, Malia (Yun) Phelps, Amir (Sorhan) Karimloo, Abdullah Kader, Andrew Casillas, Marina Romanyuk and me (Josiah Coad). Megan Meyer joined the team later.

What an event like TAMU Datathon brings to Texas A&M

Texas A&M, as the host university of the first and only MLH data science hackathon, is in a unique position to gain an international spotlight as the university leading the way in data science. In 2019, we had applicants from 90+ majors, ranging from Wildlife Sciences to Psychology to Business to Toxicology. Data science thrives under the union of such diverse disciplines, and every discipline, in some facet, uses data science. Thus, an event like this connects students across a large school on common ground. The more we can tie in various departments and programs at Texas A&M, the more we can show its strength, not only in one department or college, but across the university.

Why Texas A&M is the right place for TAMU Datathon

Texas A&M has been supportive of TAMU Datathon from the start. The College of Science, Texas A&M Institute of Data Science, Department of Math, Department of Statistics and Department of Computer Science have all supplied direct financial support, publicity, and advice.

We know we are only beginning to tap into the resources here at Texas A&M. This upcoming year we plan to involve the College of Engineering as well as other departments such as Industrial Systems Engineering, Electrical Engineering, Mechanical Engineering, and Management Information Systems. Furthermore, we expect to involve nearly 30 clubs on campus that relate to data science and technology.

We find Texas A&M to be a strategic place to host this event. The need for data science is being realized not only on the ground level but all the way up to the top administration. President Young said in his 2019 State of the University Address: “In order to remain competitive and continue our pursuit of excellence, we must invest in emerging areas of critical research…data science, artificial intelligence, cybersecurity, among others. We want to bring to bear the full spectrum of research in these areas, much of which is already taking place on our campus.” Our success is in no small part due to the vision that Young has cast and the realization of it throughout the entire system.

Why now is the time

“There were 5 exabytes of information created between the dawn of civilization through 2003, but that much information is now created every 2 days.” — Eric Schmidt, Google, 2010

“Information is the oil of the 21st century, and analytics is the combustion engine.” — Peter Sondergaard, Gartner Research, 2011

Tools for data science are the least mature they will ever be, clarity around the subject is foggy at best, and demand for data science is at the dawn of its climb. In short, data science is in its infancy. Students don’t yet quite know how best to prepare for a data analytics role, and companies don’t quite seem to know how best to recruit for it. In fact, 82% of organizations have or plan to have positions that require data-analysis skills, and yet 78% report problems recruiting for data-analysis positions (from the Society for Human Resource Management survey in 2016). That’s why we believe we are in the right place at the right time to help students prepare and help companies find the best talent.

The ubiquity of data science

A data scientist/analyst, as mentioned here, doesn’t refer exclusively to a person clearly and solely labeled as a “data scientist”. Indeed, many data scientists hold entirely different titles and yet would describe their skills and responsibilities as those of a data scientist. In fact, research shows that only 46% of data analysts work in the technology industry, and only 35% actually have a degree in computer science, math or statistics (according to Villanova University).

Data science is, at its core, about asking meaningful questions, navigating the path to a data-driven answer to those questions and then providing meaning to that result. Data science is inherently one of the most human things we can do: to provide meaning to the information around us. The skill to be a data scientist lies primarily in the ability to be analytical and purpose driven. That’s why many are able to be a data scientist, and nearly all fields need data scientists.

Our culture as the leadership team of TAMU Datathon

Our team was intentionally kept small. Many other hackathons of our size have leadership teams ranging from 20 to 40 people. We had eight. We are a small team of learners, self-starters and innovators. We each believe in our ability to shape the future and leave a legacy, not only at Texas A&M, but worldwide. We embrace working under pressure and set relentlessly high standards for ourselves and our work.

Furthermore, we think big, and we think differently. We are constantly driven by what would make the experience better for the participants and for the sponsors. We are driven by the vision to make TAMU Datathon a leading world-class event, and we will put forth the hard work toward that vision. We are designers, coders and logistical planners. We value a bias for action, preferring to take risks instead of staying stagnant, because speed matters. Those that we bring on cannot be “hired” in the normal sense of the word; indeed, they are not being paid at all. What we provide is a concert hall in which artists of their trade wish to perform. Innovation comes from each person on the team through distributed decision-making, which allows us to consistently generate new ideas.

What it’s like to sponsor TAMU Datathon

To our sponsors interested in reaching the analytical population, we offer a unique opportunity as the only MLH data science hackathon and one of the largest hackathons in the world. Our participants are qualified, motivated and diverse. Our sponsors enjoy international branding, recruiting and engagement.

As to branding, we are the most followed MLH hackathon on Instagram. We had 16,000+ visits to our website in the past 3 months. Furthermore, in 2019 we had 2,000+ applicants from 100+ universities in 5+ countries, representing 90+ majors, with near equal representation from freshmen to PhD students; 30% were female and 30% were first-generation students. We live-streamed the event, and 500+ people tuned in.

As to engagement, our participants submitted 100+ projects to company challenges. Our workshops saw a total attendance of 576 participants. Our event also included a learner track that introduced 350+ students to DS/ML through lectures and hands-on training. This is evidence of our mission to focus on long-term growth. By equipping these students with a crash course and yearlong support through our open-sourced data science learning software program, Cosmos, they are eligible to contend next year as competitors.

As to recruitment, one of our sponsors, Goldman Sachs, said, “We met many candidates who were motivated and interested in the career options available, and all were obviously qualified for the roles we have available.” How else can you target such a diverse and yet specific crowd of students who are intent on working at a company like yours? That’s why we offer data that can help in a sponsor’s recruitment. Our applicant data includes crucial factors that a company would not get on an ordinary application, e.g. whether a participant attended your workshop, whether they submitted a project, the link to their project and whether they won.

Focus areas moving forward

In our first year, TAMU Datathon broke several records. For example, we are now the most followed MLH hackathon on Instagram. We are also growing very quickly. In 2019, we were larger than 90% of the 214 MLH hackathons around the world, limited in size only by our space capacity. Our goal for TAMU Datathon 2020 is to double everything. We aim to double our sponsors and double our number of attendees. Texas A&M is the 2nd largest university in the U.S., so we have the network and resources to do it, and we will move boldly forward, fearless on every front.

We will continue to focus on how to create the most value for our sponsors and participants, continually finding new ways to aid engagement, improve recruitment and lower the barrier to entry and education in DS/ML. We are internally driven to improve our services, adding benefits and features before we have to. Two examples of how we’re inventing to stay ahead of our sponsors’ and participants’ needs are our unique programs, Cosmos and Nebula.

Cosmos (cosmos.tamudatathon.com) is software we developed as a free DS/ML learning environment. It is a web-based, interactive, gamified platform where users can browse a variety of lessons, read material, watch instructors and run code, all in a coherent environment directly in their browser. No download or installation is required. We already have 550+ users. We aim to expand and offer new content to support continued usage, as well as eventually make it possible for anyone to upload content and spread their knowledge of data science.

Nebula is our attempt at creating a unified data science profile to track not only student performance at our event (who came, stayed, submitted and won) but also various other data science accomplishments and engagements during the year. Our sponsors interested in Nebula will have a view of candidates and their activity/impact in data analytics, a valuable resource.

We endeavour to know accurately what our sponsors and participants need. So we investigate, invent and navigate continuously towards the goal of being an indispensable resource for those interested in learning about and recruiting in DS/ML.

Metrics we will measure along this goal for TAMU Datathon 2020:

  • Number of Cosmos sign-ups and monthly usage
  • Number of lessons added to Cosmos
  • Nebula integrations with clubs and organizations on campus
  • Number of (and quality of) our sponsors
  • Number of applications
  • Scholarships for qualified applicants to attend TD 2020 from around the world
  • Representation of historically underrepresented demographics
  • Number of attendees
  • Number of project submissions

In Closing

2019 was indeed an incredible year for TAMU Datathon. We’d like to thank all our sponsors for financially supporting us and believing in our experiment. We thank our participants for applying, attending and working hard throughout the event to submit 100+ projects. We further thank Texas A&M for the top-tier faculty and staff who have believed in and supported us from the beginning. And we thank one another, with respect for each member of our founding TAMU Datathon team, whose individual skills and dedicated teamwork our success has been built on.

Josiah D. Coad

President

TAMU Datathon

November 3, 2019
