The Co-op Close-up: Statistics Canada

Kathrin Knorr
SFU Professional Computer Science
Nov 20, 2019

SFU’s professional master’s program in computer science trains computational specialists in big data and visual computing. All students complete a paid co-op work placement as part of their degree. In this feature, we examine the co-op experiences of some of our big data students.

Mihir Gajjar completed a Bachelor of Technology in information and communication technology at Ahmedabad University, India. Before joining SFU’s professional master’s program, he worked as a machine learning intern at Episodic Labs, a sports-based start-up.

Can you tell us about Statistics Canada? What is it like working there?

Statistics Canada is Canada’s central statistical agency. It provides statistics that help Canadians better understand their country: its population, resources, economy, society and culture. In addition to conducting a census every five years, the agency runs about 350 active surveys on virtually all aspects of Canadian life.

I have been working in the amazing Data Science Accelerator (DSA) hub. In the DSA hub, data scientists work with subject-matter analysts, methodologists, and IT specialists to develop big data processing, machine learning, and AI strategies.

For me, the work culture at Statistics Canada has several highlights. In the daily scrum meetings with the supervisor and team members, we prioritize the day’s work and discuss other important issues. I also like the DSA’s agile development approach: most projects have a lifespan of four months, after which they are ready for deployment. We also have weekly machine learning technical seminars where we learn about advancements in the field and discuss relevant research papers.

Can you tell us a bit about the project(s) you are working on in your co-op position?

At Statistics Canada, analysts spend a lot of time searching for information about enterprises. With the amount of news growing exponentially, it becomes difficult to manually track all the published information. The project I am working on seeks to automate the tasks of detecting events of interest from news articles and extracting their attributes. For example, events of interest that are related to enterprises might include mergers and acquisitions, equity markets, and branch openings, whereas event attributes are things like dates and locations of said events. Ultimately, my work allows economic analysts to spend less time on information search and devote more time to analysis. This multi-disciplinary work is a collaboration between teams, including portfolio and accounts managers, methodologists, and other data scientists.

The main technical tasks include finding similarities between articles for ranking, removing duplicates, and text summarization. The goal is to provide subject matter experts with a dashboard to support the detection and tracking of desired events over a specified time span.
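As an illustration of the duplicate-removal task, here is a minimal sketch that drops near-duplicate articles by comparing sets of three-word shingles with Jaccard similarity. The interview does not say which method the team used, so the technique, the function names, the 0.8 threshold, and the sample headlines are all assumptions made for this example.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word shingles in a lowercased text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def drop_near_duplicates(articles, threshold=0.8):
    """Keep the first article of each near-duplicate group, in input order."""
    kept, kept_shingles = [], []
    for text in articles:
        s = shingles(text)
        # Keep the article only if it is not too similar to any kept one.
        if all(jaccard(s, other) < threshold for other in kept_shingles):
            kept.append(text)
            kept_shingles.append(s)
    return kept


# Made-up sample texts; the threshold is an illustrative assumption.
articles = [
    "Acme Corp announced on Monday that it will acquire Widget Inc for 2 billion dollars.",
    "Acme Corp announced on Monday that it will acquire Widget Inc for 2 billion dollars in cash.",
    "A new bank branch opened in downtown Ottawa this week.",
]
unique = drop_near_duplicates(articles)
print(len(unique))
```

This pairwise comparison is quadratic in the number of articles, so a collection of the size described here would likely need an approximate scheme such as MinHash instead.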

The data for our models consists of 1.5 million news articles from the Dow Jones DNA API and NewsDesk, a shared governmental system. Following exploratory data analysis and basic text pre-processing, TF-IDF features and word embeddings were constructed and used to train various machine learning models such as SVMs, CNNs, BERT models, and hierarchical attention networks.
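To make the TF-IDF step concrete, the following is a minimal sketch that builds TF-IDF vectors with only the Python standard library and ranks articles by cosine similarity to a query article. It is not the team’s pipeline: the tokenizer, the smoothed IDF formula, the helper names, and the sample headlines are assumptions for illustration, and the model-training step is left out.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase a text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF (an assumed variant) so common terms keep a small weight.
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        vectors.append({t: (count / len(tokens)) * idf[t] for t, count in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Made-up sample headlines; the first one acts as the query article.
docs = [
    "Company A completes merger with Company B",
    "Merger between Company A and Company B is completed",
    "New branch opening announced in Vancouver",
]
vecs = tfidf_vectors(docs)
# Rank the remaining articles by similarity to the query, most similar first.
ranked = sorted(range(1, len(docs)), key=lambda i: cosine(vecs[0], vecs[i]), reverse=True)
print(ranked)
```

The same vectors could also serve as input features for a classifier such as an SVM; in practice a library implementation would replace these hand-written helpers.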

The software stack is deployed on the Azure cloud, in part via Docker containers, and includes a data lake as well as Elasticsearch. A data-as-a-service platform connects the data sources to JupyterLab, where data science and machine learning models are built and monitored via MLflow. Dash by Plotly is used to develop the dashboard provided to the subject matter experts.

How did the big data program prepare you for your co-op position?

SFU’s big data program provided theoretical as well as practical hands-on experience through lectures and a project-based learning approach. Courses such as machine learning, taught by Dr. Greg Mori, helped me develop a solid theoretical base, while the practical assignments and group projects throughout the program allowed me to implement the concepts and try out new tools and technologies. The computing resources for big data students, such as the CPU and GPU clusters, allowed us to perform fast, limitless processing during projects and experiment with resource-intensive machine learning models.

Along with sound technical knowledge, the program equipped me with essential skills, such as working in a team, communicating and sharing ideas with other people, giving presentations, critical thinking, technical writing, and time management. In addition, the co-op team assists students with designing applications for co-op positions and preparing for interviews.

What are your most valuable takeaways from this co-op experience?

Through the project I have been working on, I learned a lot about the practical aspects of working as a data scientist. Part of the project was to extract data using an external company’s API, which involved weekly meetings with its development team. This helped me learn how to think analytically and design questions that help in understanding the quality and depth of the data the company is providing. I also learned about the importance of fully understanding the user’s needs in order to develop a product that meets those requirements.

Working at Statistics Canada gave me exposure to real-world data science projects and taught me how to create and execute a technical plan to achieve the desired goals. This is my first time working as a data scientist and this experience has improved my skills and made me feel confident about working in this role moving forward in my career.

What do employers say about our students?

“Mihir is a great fit for this work environment, as he is always open to learning new approaches in technology, and works well independently or in a team setting. Partnered with a senior data scientist, Mihir continues to grow in his time here at Statistics Canada, moving from Natural Language Processing projects to image processing projects with enthusiasm and focus. He is a valued member of the team.”

— Meredith Thomas, Chief, Data Science Division, Statistics Canada
