My first time working in a cross-functional team as a Data Scientist

Minh Nguyen
Published in CodeX · 9 min read · Aug 26, 2021
Photo by Headway on Unsplash

The Start

In my final month at Lambda School, a coding boot camp, I had the chance to join a data science team that worked with a full-stack team to build a project for a great cause. Our stakeholder was Human Rights First, a nonpartisan, nonprofit international human rights organization. One of Human Rights First's most important services is helping people who seek asylum. People from all over the world come to the U.S. seeking protection because they are at risk of harm in their home countries due to their religion, political opinion, sexual orientation, or ethnicity. The process for seeking asylum in the U.S. is complicated, and asylum seekers are often unable to afford high-quality legal representation. Human Rights First helps match good lawyers with asylum seekers.

Our stakeholder wanted to create a product that assists these asylum attorneys by surfacing insights into how immigration judges rule, favorably or unfavorably, on asylum cases based on applicants' religion, origin, and membership in social groups. HRF-Asylum is a product where lawyers can log on and view visualizations and trends of judges' decisions across these categories. The application is built with mutual interest in mind: Human Rights First has a store of asylum cases, but there are still not enough to give insights into every single immigration judge, and such cases aren't publicly available to scrape. Users who benefit from the knowledge the website provides will, in return, upload cases they have previously worked on to enrich the database. The more cases the application possesses, the better and more accurate the insights into judges it can provide. This product is a promising tool that could help thousands of asylum seekers around the world be granted asylum status in the United States.

Concern

Our data science team was the seventh to contribute to the product. Our minimum viable product (MVP) goals were to create visualizations of insights into immigration judges and to build a scraper that extracts important information from asylum cases, both from the stakeholder's database and from uploaded documents. Inheriting a repository from previous teams, we were overwhelmed by the sheer amount of code, the many directories, and the scattered Jupyter notebooks. Moreover, we had to work with a full-stack team that didn't speak the same language: Python vs. JavaScript. Another concern coming into this project was the domain knowledge needed to find the exact information our stakeholder wanted us to scrape from asylum cases. I had my own concern, too. Besides being a data scientist, I was also the technical project manager for my data science team. I am a quiet person, but I stepped out of my comfort zone to become a manager. As the manager of my team, I had to arrange collaboration with the full-stack team and assign tasks to my teammates based on our MVPs. However, I was shy about contacting the full-stack team, and my underdeveloped communication skills slowed our progress. Thanks to my talented teammates, though, the process became easier. I will tell you more below.

The hassle

I will start with a story about our data science team. Our team's MVPs were producing visualizations for judges and building a scraper for asylum cases. We used Plotly to create plots and sent them as JSON to the back-end, which processed them along with the judges' data. The front-end used that data and the JSON to render visualizations on the website. For the scraper, we converted PDF files to images and used optical character recognition (OCR) to return all the text inside those images. We then passed the text into our natural language processing model to pick out the most important information from asylum cases. It took us a long time to understand how all of this worked, and even more time to get our contributions in, because of our lackluster communication.
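One convenient property of this pipeline is that Plotly figures serialize to plain JSON, so the back-end never needs to know anything about Plotly itself. The sketch below illustrates that handoff with an ordinary dict shaped like a Plotly figure; the function name, inputs, and counts are illustrative assumptions, not the project's actual code.

```python
import json

def judge_decision_figure(judge_name, granted, denied):
    """Build a minimal Plotly-style figure spec as a plain dict and
    serialize it to JSON. The "data"/"layout" keys mirror Plotly's
    figure JSON; everything else here is a hypothetical example."""
    figure = {
        "data": [{
            "type": "bar",
            "x": ["Granted", "Denied"],
            "y": [granted, denied],
        }],
        "layout": {"title": {"text": f"Asylum decisions by Judge {judge_name}"}},
    }
    # The back-end can store/forward this string as-is; the front-end
    # parses it and hands it to Plotly.js to render.
    return json.dumps(figure)

payload = judge_decision_figure("Smith", 42, 17)
```

Because the payload is just JSON, the back-end's only job is to route it to the right judge's page, which is what made the DS → back-end → front-end flow workable across languages.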

Our team member J made a pull request with his findings on a new data set related to asylum cases, but it needed some changes before we could approve it. After committing the changes, however, J forgot to tell us about his commits, and we completely forgot about the pull request until he brought it up. I had the same problem: I waited and wondered why nobody was reviewing my code, and only after posting in the group chat did I finally get my team to review my pull request. My takeaway: always report back to your team on what you are working on and whether you need a code review, so that everybody stays on the same page and the project moves along coherently.

Nerdy part

Let's dig a little deeper into how we scraped information from asylum cases.

When our scraper looks for information, it finds patterns in the text that will yield the answer. Previous teams had created a function that found the hearing date of an asylum case, which was not what our stakeholder wanted. I had to modify the function to find the decision date of asylum cases instead.

The image above shows part of an asylum case. In our code, we look for the exact pattern "date of this notice" and return 4/30/2013.
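A simplified stand-in for this lookup can be written with the standard library's regex module. The project's actual code used a more general pattern matcher, so the helper name, sample text, and regex below are illustrative assumptions.

```python
import re

# A fragment resembling the notice shown above (illustrative text).
SAMPLE = """
United States Department of Justice
Executive Office for Immigration Review
Date of this notice: 4/30/2013
"""

def get_decision_date(text):
    """Find the date that follows the phrase 'date of this notice'.

    Case-insensitive search; returns the raw M/D/YYYY string, or
    None when the phrase isn't present. A regex stand-in for the
    project's more general pattern-matching helper."""
    match = re.search(
        r"date of this notice:?\s*(\d{1,2}/\d{1,2}/\d{4})",
        text,
        re.IGNORECASE,
    )
    return match.group(1) if match else None

date = get_decision_date(SAMPLE)  # "4/30/2013"
```

Anchoring on a fixed phrase like this works because court notices follow a rigid template; the trade-off is that any wording change in the template silently breaks the field.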

The code above is an example of how we implemented the decision-date function. We had a main helper called `find_similar`: we fed a pattern into it, and it would return any sentence with a similar pattern. We then just needed to pick the correct sentence and return the date in the correct format. The final formatting step converted the date from MM/DD/YYYY into YYYY-MM-DD format. This mismatch had been a problem when we tried to connect to the back-end database; I found it and informed the whole team during our first cross-functional team meeting.
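That conversion step can be sketched with the standard library's `datetime`; the function name is an assumption for illustration.

```python
from datetime import datetime

def to_iso_date(raw):
    """Convert a scraped M/D/YYYY date into the YYYY-MM-DD format
    the back-end database expects. strptime accepts non-zero-padded
    months/days, and a ValueError on bad input lets the caller flag
    unparseable cases for manual review."""
    return datetime.strptime(raw, "%m/%d/%Y").strftime("%Y-%m-%d")

iso = to_iso_date("4/30/2013")  # "2013-04-30"
```

Round-tripping through `datetime` rather than string slicing also validates that the scraped text really is a date, catching OCR noise early.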

We, the data science team, had been working alone and ironing out our own problems. Some of us dealt with the scraper above; `get_date()` is just one of the thirteen fields we needed to scrape from asylum cases. Others on our team were in charge of the visualizations. The problem arose because we did not communicate well with the full-stack team: the front-end waited for data from the back-end, the back-end waited for our data science endpoints, and we waited for the visualizations to show up. A, an initiator on our team, met with the back-end team and suggested pair programming sessions with everyone across all the teams. These meetings dramatically boosted the development speed of every team, and I was glad to have a teammate like A.

In the first meeting, when the back-end engineers could not connect to our data science API endpoints, we realized that the endpoint URLs in the back-end code base were incorrect. I surfaced the date-format bug I mentioned above and suggested a uniform format for the data. Moreover, we had an unnecessarily complicated application architecture, decided by a previous team, that we could not change. The data science database in this system stored our stakeholder's asylum cases along with all the uploaded cases. Our scraper extracted data from these cases and loaded it into a ds_case table inside the back-end database, while the back-end and front-end used a different table called case in the same database. The format problem was a bug that prevented data from moving from the ds_case table to the case table. Another piece of dysfunctional code was the endpoints: the endpoint URLs in the front-end were all placeholders, which explained why there had been no interaction between the front-end and the back-end. With just one cross-functional meeting, we found a bug that somehow nobody had seen before, along with a solution for it. Communication was king here.

We constantly discussed things with the other teams and were no longer bound to our data science team alone. We explained to the back-end engineers how our code worked and why the columns in our table were named the way they were. We not only made the back-end's work easier but also relieved ourselves of the stress of not knowing why the website was not receiving our API responses.
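The two-table handoff described above can be sketched with an in-memory SQLite database. The table and column names below are assumptions based on the description (the real project ran on PostgreSQL, and its schema had many more fields), but the shape of the problem is the same: the transfer only works when both tables agree on one data format.

```python
import sqlite3

# Sketch of the handoff: the scraper loads into ds_case, and a
# transfer step copies rows into the "case" table that the back-end
# and front-end actually read. Names here are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ds_case (id INTEGER PRIMARY KEY, decision_date TEXT)")
# "case" is an SQL keyword, so the table name must be quoted.
conn.execute('CREATE TABLE "case" (id INTEGER PRIMARY KEY, decision_date TEXT)')

# Scraper output lands in ds_case.
conn.execute("INSERT INTO ds_case (decision_date) VALUES ('2013-04-30')")

# The copy step: if the two sides had disagreed on the date format,
# this is exactly where the data would have arrived broken -- the bug
# the cross-functional meeting exposed.
conn.execute('INSERT INTO "case" (id, decision_date) '
             "SELECT id, decision_date FROM ds_case")

rows = conn.execute('SELECT decision_date FROM "case"').fetchall()
```

Having a single shared table (as suggested later in this post) would remove this copy step, and with it the whole class of format-drift bugs.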

Progress and future features

Our data science team fixed the functions that get the decision date, the city and state where the case was originally heard, and the credibility of applicants. We also successfully created a chart based on our stakeholder's wonderful drawing.

Most importantly, the application now displays the charts. This was the result of hard work by the front-end, back-end, and data science teams together. By working with each other, we could debug much more quickly.

Regarding the scraper part of the project, the application has an upload button that attorneys can use to upload their own asylum cases.

After cases are uploaded, users have a new feature for reviewing them: when the review button is clicked, the form above pops up.

The form has pre-populated fields extracted by our scraper, saving users the time and hassle of filling out the form when they upload cases. We contributed a lot to this project with new features and improvements to the application. However, there is still plenty left for future teams to do. The accuracy of the scraper needs further improvement, especially on protected ground, the most important field to scrape from asylum cases. There is also a big problem with how data moves between the front-end, back-end, and data science: we should have one case table instead of the current two. Otherwise, things get complicated whenever a future data science team decides to change the names of fields or even the format of the data, and the back-end team has to build a mechanism to accommodate those changes.

Takeaways

Becoming a technical project manager was a great learning experience. I learned that taking the initiative and speaking up helps teammates clearly understand my ideas and pushes the project forward faster and more smoothly. Most importantly, I learned to work in a cross-functional team setting. I struggled and started slowly at first; our data science team did not progress much until we talked and worked with the full-stack team. By showcasing our work and asking for clarification, we moved along our development journey in harmony, and we all spoke the same language after those meetings. I am so grateful for this experience, and I appreciate Lambda School and Human Rights First for giving me the chance to work on a meaningful project that will help thousands of asylum seekers. I am also thankful that I had the chance to put to use the knowledge and skills I acquired throughout my Lambda journey. I used FastAPI to create endpoints efficiently. I gained more knowledge of how to build a pattern matcher with spaCy for our natural language processing model. I was able to understand old queries and write new ones to grab files from the data science database on Amazon Web Services and load data into the PostgreSQL database on Heroku. Last but not least, I learned to read and contribute to an existing repository.
