The role of Data Scientists — between Reality and Aspirations

Nov 3, 2024

The latest episode of MetaDAMA aims to dive deeper into the reality of Data Science in Norway. From working with Norwegian language data to adapting AI to Norway’s unique cultural landscape, Victor Undli’s journey sheds light on the vital and often unseen work behind data science in both the public and private sectors.

Starting Out as a Data Scientist: High Expectations, Real-World Realities

Victor’s journey began with a fascination for cutting-edge technology and the endless pursuit of model precision — a drive we think many data scientists share. However, he quickly realized that the day-to-day realities were more intricate. Instead of solely refining models, Victor found much of his time was spent on data engineering tasks, which included curating datasets, data cleaning, and data enrichment. “A lot of tasks in data science are actually data engineering tasks,” he explains, which underscores the need to manage data quality carefully to maximize its potential.

Time management, too, is a challenge. “With the time given, I often create benchmark models first, with plans to iterate on improvements later,” Victor says. While academic studies may focus on perfecting model accuracy, in the real world, balancing time and quality is essential. Collaborating with a team is also crucial, as data science is not a solo endeavor; it thrives in a collaborative environment where ideas are exchanged and iterated upon.

Data Challenges in the Public Sector: Norwegian Text and Data Curation

One of the more unique aspects of Victor’s work involves dealing with Norwegian language data. This presents distinct challenges in natural language processing, as fewer high-quality Norwegian datasets are available compared to those in widely spoken languages like English. Moreover, using English-based language models means there’s a significant amount of translation and adaptation involved. Victor notes, “There’s always the challenge of working with Norwegian text, and it’s essential to curate our data carefully because, if not managed correctly, it loses potential over time.”

Data quality is central, especially when the data will be used to train AI models. Ensuring high data quality requires curation, often involving enrichment to fill in any gaps. Given that data can “become washed out over the years,” data curation is an ongoing process of revisiting and enhancing datasets to retain their value.

Public vs. Private Sector

Victor also points out that there are significant differences between the public and private sectors. While private companies may offer more flexibility and autonomy when working with data, the public sector tends to be more cautious about the steps it takes, including how it uses data. At the same time, the public sector in Norway offers structure and benefits from substantial investments in digitalization.

Stakeholder management also requires skillful balancing. In the public sector, expectations from stakeholders need to be carefully managed, as priorities may not always align with the data team’s capabilities. Consistent communication is essential, and often, it involves negotiating between stakeholder expectations and the practical constraints the team faces.

AI and the Call for a Norwegian Large Language Model (LLM)

AI is a key focus in Victor’s work, particularly the need for a Norwegian-language large language model (LLM). Language encapsulates culture, and you need to embrace language to understand culture. Developing a Norwegian LLM, especially through open-source initiatives, would provide a competitive and culturally relevant option for the public sector. This would not only enhance AI applications but also help preserve Norway’s unique cultural context within the AI landscape.

Victor sees AI as a field beginning to realize its potential, particularly in enhancing productivity through automation. He reflects, “With a very simple idea, you can effectively free up time for others by automating tedious tasks.” There’s an exciting correlation between the quality of data and the impact an AI model can have. “Data is so accessible, and you can achieve so much with seemingly small-performing models,” he adds, a reminder that groundbreaking AI does not always require massive datasets or resources — it often comes down to the quality of the data.

Motivation in a Public-Facing Role

One of the most fulfilling aspects of Victor’s role is the societal value his work brings. At Ung.no, the mission is deeply community-driven, creating resources that empower young people in Norway. Victor’s work as a data scientist plays a part in this mission by enhancing the platform’s effectiveness and reliability, which directly impacts users. “AI can make our work more effective, more reliable,” he notes, a testament to the motivational power of purpose-driven work in data science.

The Road Ahead for Data Science in Norway

For Victor, data science in the public sector is about balancing cutting-edge technology with ethical considerations and societal impact. He firmly believes that Norway and the Nordic region can set an example in AI, bringing a strong ethical focus to the field.

Working in data science at Ung.no has given Victor insights into the unique challenges and rewards that come with this field in the public sector. Through careful data curation, adapting language models to Norwegian, and managing complex stakeholder dynamics, Victor’s journey highlights the critical role data science plays — building not just advanced models, but a more informed and engaged society.

Winfried Adalbert Etzel

Written by Winfried Adalbert Etzel

Everything is data, but data is not everything! Here I share my own thoughts on the world of data. https://www.linkedin.com/in/winfried-adalbert-etzel/