Writing your 1st Research Paper

My experience from the 4-month-long journey

Mehul Gupta
Data Science in your pocket
Apr 27, 2021

https://www.enago.com/academy/writing-first-scientific-research-paper/

I have been in the AI industry for ~2 years now as a Data Scientist. Ever since I joined the corporate world a couple of years back, I have wished to pen down a paper. It was one of the tasks on my bucket list.

Why? Because this is what ‘Scientists’ do!!

Right?

At least this is what I used to think. Still, it took me nearly 1.5 years to start on one. This one revolved around information extraction from prescription images. I won’t be elaborating on the paper itself, but on the whole journey that started somewhere in October’20.

Before this, I had written nearly 50 blogs around Data Science & AI and co-authored a paper (hence not a novice in writing!!), but I never thought of penning down a paper as the corresponding author until my Team Lead pushed me into writing one.

So without wasting any time, let's jump on to the

Rough paper structure

As we know, any paper follows a certain format (basically a sequence of sections). The most common sections are:

Abstract

Introduction

Methodology

Results

Discussion

References

We will talk about the essence of each one of them one by one:

Abstract

A sort of short summary covering:

What is the problem you’re solving & its relevance in the current world i.e. Background & Objective

Your methodology to counter this problem & what makes it unique from existing solutions (if any)

Results of your research (along with the base metric on which you evaluated them)

Being the 1st section, it is the ‘make it’ or ‘break it’ section that decides whether the reader reads any further. It's also a no-nonsense section where we must refrain from writing long stories. Keep it crisp & clear.

Introduction

This is the section that leans heavily on your storytelling skills. Basically, this is the place where you set up the background for your problem. A few key points to cover:

  1. Problem background

Like, if I am presenting a paper on some algorithms to digitize prescriptions, it should have answers to the following questions (at least)

What is a prescription? This may include the history, the geography & the biology of prescriptions i.e. everything about prescriptions

Why do we need to digitize it? Why is solving this problem necessary?

The scale of this problem in the real world.

And other roundabout stories to give the reader a wholesome idea about the A-Z of the problem statement

  2. Literature Review

Ideally, this should be done before working on your idea. In practice, it generally gets done after everything else 😅, as it is time-consuming & at times boring.

Here, you need to read lots & lots of research papers dealing with

the same problem as yours or

related problems like yours.

The crux of this segment includes:

What has been done so far? More of a summarized version of various ways this problem has been tackled in the past.

A few flaws/gaps if any (like accuracy achieved was not up to the mark, latency was high) in existing solutions

The gaps you are filling with your research in this field

For this, you need to read a number of papers (an overview of each is usually enough) & compare your work with them collectively. This section took me nearly 2–3 weeks to wrap up (I read nearly 2 dozen papers).

Methodology

Time to present the core of the paper.

This should include whatever you used for your solution, be it the approach, some assumptions, external library/API, etc. As my paper presented algorithms, I will be talking about presenting them. The hierarchy I followed was

Overview of the external libraries used (the ones that play a major role)

Assumptions/Observations that explain why the algorithm is designed the way it is presented.

The algorithm

Now, the algorithm can be presented in the following ways

  • Theoretical explanations for all the steps one by one (a must I guess)
  • Pseudocode
  • Diagrams/flow charts are always a useful add-on
  • Code (as a git repo link) can be a big boost
  • Examples may do wonders !!

Just a note: try defining every little detail of the algorithm, however irrelevant it may appear.

Results

Again, a very critical section where you need to justify how good your research/work is. Results are something on which I spent ~2 months of the entire 4-month span.

It is that important.

A few key pieces of advice I got on this section were:

Always start off with a few side metrics. A ‘side-metric’ is a number that doesn’t really represent your solution's potential but is related to the problem.

Mention the hardware details used while testing & training.

Detailed information about the dataset used for training/testing is essential. Do remember to avoid making biased data choices while testing just to make your results look good.

Try presenting your results from multiple perspectives. This may include output quality (like accuracy, F1-Score), latency/time consumed per sample, model size (if applicable), ease of deployment, etc.

Your results may not be up to the mark, but a justifiable explanation of why this happened can compensate for that. It should look like you have analyzed your results from every perspective possible.
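To make the ‘multiple perspectives’ advice concrete, here is a minimal sketch (not the code from my paper) that reports output quality & latency together for a binary classifier. The function name `evaluate`, the `predict` callable & the toy data are all assumptions made up for illustration.

```python
import time


def evaluate(predict, samples, labels):
    """Report quality and speed side by side for a binary classifier.

    `predict` is any callable mapping one sample to True/False
    (a hypothetical stand-in for your own model).
    """
    if not samples or len(samples) != len(labels):
        raise ValueError("samples and labels must be non-empty and equal in length")

    # Time only the prediction step so latency reflects the model itself.
    start = time.perf_counter()
    predictions = [bool(predict(s)) for s in samples]
    elapsed = time.perf_counter() - start

    # Confusion-matrix counts.
    tp = sum(p and t for p, t in zip(predictions, labels))
    fp = sum(p and not t for p, t in zip(predictions, labels))
    fn = sum(not p and t for p, t in zip(predictions, labels))
    tn = sum(not p and not t for p, t in zip(predictions, labels))

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)

    return {
        "accuracy": (tp + tn) / len(labels),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "latency_ms_per_sample": 1000 * elapsed / len(samples),
    }


if __name__ == "__main__":
    # Toy rule-based "model": predicts True for positive numbers.
    report = evaluate(lambda x: x > 0,
                      [1, 2, -1, -2, 3],
                      [True, False, False, True, True])
    for metric, value in report.items():
        print(f"{metric}: {value:.3f}")
```

Reporting all of these in one table lets the reader see at a glance where the solution shines (say, latency) & where it lags (say, recall).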

Discussion

This section can give a lift to your paper as it usually talks about the scope of the solution. This may include

  • Applications: In which fields your solution can be used? Go wild with your imagination (but no fantasy tales). It can be the case that your solution, if generalized, can be helpful in other domain problems.
  • Pros & Cons: It's time to split your research into two halves, the problems/restrictions one may face & the benefits of your solution.

For example: if you develop a rule-based system for something, you don't need labeled data. That’s a big +ve when it comes to real-world problems. Or, your results are good but the solution is really slow, which is a con.

  • Improvements/Future plans: This is the place where you can discuss what improvements could have been brought in to make the solution stronger (maybe even experimental thoughts) & future extensions in your research you’re planning.

This is also one of those sections that may consume a lot of time, but it is a great investment as you may discover a few loopholes in your work & can improve it. Also, this exercise may help in garnering a few citations (I will come to this later in the post) when someone picks up a point from here & pens another paper. This is amongst the most elaborate sections of the paper.

References

By this point, you must have used a number of facts & read a few research papers for the literature review. References are used to cite (mention) the documents you read to drive your research, at whatever stage you read them. The ‘citations’ for all such documents have to be listed in References.

But what is a citation?

My 1st assumption was that it is the ‘URL’ of the research paper/report on the internet.

It turns out to be more than that. A citation is a structured reference to a paper (authors, title, venue, year, etc.) that identifies it unambiguously & helps keep a record of how many times that paper has been cited/mentioned in other research. Citation data comes in different formats like BibTeX, RIS, etc., which I will leave for now.

The total citation count of a paper gives an idea of its ‘importance/quality’. The greater the count, the more important the paper is generally considered.

Note: Do avoid picking reference facts & figures from non-authoritative sources like Medium blogs, newspaper articles, etc. That does bring down the quality of your research.

Are we done? No way !!
What to do after writing a paper down?

You need to find a conference or journal that is apt for your research field (not just any random one) & submit your paper to it.

But, what’s the difference between a conference & a journal?

Basically, they are types of ‘venues’ where your research can get published. A Conference is more of a bigger version of a roundtable discussion where researchers discuss their ideas with fellow researchers. On the other hand, a Journal is more like an academic ‘magazine’ on a certain domain (like computers, healthcare, etc.), published at some frequency (quarterly/yearly, etc.) & carrying original research by scholars.

Though it is completely up to the researcher where to publish, I can definitely help ease the choice.

When to choose Conference over Journals?

  1. You wish for quick results (either a yes or a no). Conferences are comparatively faster as a fixed window for review is set. Journals may very easily take months
  2. Your work hasn’t reached full depth yet (fine details are missing from the paper)
  3. You wish to find some more collaborators to work on your idea.

When to choose Journals over Conferences?

  1. Your work has depth & every fine detail
  2. You wish to have multiple chances. You get your work reviewed & after working on those reviews, can resubmit. This is usually not the case with conferences
  3. Your research paper is long (conferences usually have page limits while journals are far more flexible; my current paper was 28 pages long)

Once you have decided between the two venues, it's time to choose the particular conference/journal. As I chose to publish in a journal, the rest of the post revolves around submitting to a journal.

How to choose a particular journal to submit to?

It depends on a few factors

  • How relevant the journal is to your work. Don’t submit a ‘biomedicine’ based work to a ‘Finance’ journal, you will easily get rejected. As I am writing this post, my paper has been rejected by ~10 journals, each within an hour of submission, citing ‘out of scope’ for the journal (though it is under review in the 11th 😁😁)
  • To choose a Journal, scores like the Impact Factor & CiteScore can be considered. The higher, the better. Conferences generally don’t carry such scores
  • Acceptance Rate (how often the journal accepts a paper submitted to it) & Estimated Time for Publishing (estimated time between submission & publication of your work, if accepted).

Do remember to maintain a trade-off between all the above points. For instance, a journal with a very high Impact Factor may have a very low acceptance rate & your work may not match its standards. Hence, keeping a priority list of multiple journals is always a good option: once you get rejected, you can submit your work to the next one without wasting time. Journals take a lot of time on their end (it can be very frustrating waiting for as long as 8 months to hear from them). So, a paper that took you 4 months to pen down may take 8 months to publish 😧😧
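One way to build such a priority list is to score each journal on the factors above & sort. The sketch below is only an illustration: the journal names, the numbers & the weights are all made up, and the scoring formula is my own assumption, not any standard ranking method.

```python
from dataclasses import dataclass


@dataclass
class Journal:
    name: str
    impact_factor: float      # higher is better
    acceptance_rate: float    # fraction between 0 and 1, higher is easier
    months_to_publish: float  # lower is better


def priority_list(journals, w_impact=0.4, w_accept=0.4, w_speed=0.2):
    """Sort journals by a weighted score; the weights are a personal choice."""
    max_if = max(j.impact_factor for j in journals)
    max_months = max(j.months_to_publish for j in journals)

    def score(j):
        return (w_impact * j.impact_factor / max_if          # normalized impact
                + w_accept * j.acceptance_rate               # already in 0..1
                + w_speed * (1 - j.months_to_publish / max_months))  # faster = higher

    return sorted(journals, key=score, reverse=True)


if __name__ == "__main__":
    # Hypothetical journals, purely for illustration.
    candidates = [
        Journal("Journal A", impact_factor=8.0, acceptance_rate=0.10, months_to_publish=8),
        Journal("Journal B", impact_factor=4.0, acceptance_rate=0.30, months_to_publish=4),
        Journal("Journal C", impact_factor=2.0, acceptance_rate=0.50, months_to_publish=6),
    ]
    for rank, journal in enumerate(priority_list(candidates), start=1):
        print(rank, journal.name)
```

Changing the weights changes the order: if getting accepted matters more to you than prestige, raise `w_accept` & lower `w_impact`.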

Once your list of journals/conferences is finalized, pick your top priority journal & do read the:

Guide For Authors

It is basically a set of guidelines the journal releases for authors to follow before any submission. Some of the major points to note down are:

  • Structure of research paper required for submission

This covers the structure/sections the journal requires in the paper. If you have missed any section, include it if possible

  • Extra documents required during submission

This usually includes

Cover Letter to Editor: A formal letter convincing the Editor of the journal why your work is a perfect fit for the journal

Conflict Of Interests: This one is a hard nut to explain but easy to prepare. Do follow this post to have an overview. It is essentially a declaration that no author has any sort of conflict of interest regarding the manuscript being submitted

Highlights: Something very similar to Abstract but bulleted & even more summarized.

…etc.

Once done with all the documents, go to the official page of the journal on the web. Submission instructions would be present somewhere there. From there on, it’s nothing more than a half-hour task.

Once you are done submitting, the waiting game begins. In some days/weeks/months, you should receive a mail conveying the status of your paper, which can be 1) rejected, 2) accepted or 3) requires improvements. In the 3rd case, get those improvements done accordingly & resubmit.

And that’s done !!

Before concluding, we must keep in mind that Research Papers are like movies: they aren’t binary (valid or invalid) but good or bad. You must have read a few papers that have major segments missing, cut short sections like the Discussion or Literature Review, follow a different hierarchy of sections, cite Medium blogs as references, etc. & still got published, though in a lesser-rated journal/conference. Hence, the quality of a paper lies in the hands of the authors. The better the quality, the better the chances of getting it published in a coveted journal/conference.
