Photo by Olav Ahrens Røtne on Unsplash

INDEX : Step by step approach

SECTION 1 : Understanding the problem & data.
1. Detailed overview
2. The business problem
3. About the dataset
4. Exploratory data analysis and pre-processing
SECTION 2 : The action plan.
5. Evaluation metric
6. Loss function
7. Baseline model
7.1. K-Fold cross validation
7.2. Post-processing : binning
7.3. Error Analysis
7.3.1. Why are these features not performing well?
7.3.2. Possible workarounds
7.3.3. Limitations with current LSTM model
8. Model with SOTA pretrained embeddings
8.1. BERT
8.2. USE
8.3. XLNet
8.4. RoBERTa
SECTION 3 : Inferences and analysis.
9. Final results
9.1. Difference between baseline model and final_model.
Photo by cottonbro from Pexels

Let’s take a quick look at Stack Overflow before we dive into the project itself. Stack Overflow is one of the largest Q&A platforms for computer programmers. People post questions on a wide range of topics (mostly related to computer programming), and fellow users try to resolve them as helpfully as possible.

SECTION 1 : Brief overview
1. Business problem : Need of search engine.
2. 2.1. Dataset
2.2. The process flow
2.3. High level Overview
3. Exploratory data analysis and Data pre-processing
SECTION 2 : The attack plan

4. Modelling : The tag predictor
4.1. A TAG Predictor model
4.3. Time based splitting Modelling
4.4. GRU based Encoder-decoder seq2seq model
4.5. Model embedding
4.6. Word2Vec embedding
4.7. Multi-label target problem
5. LDA (Latent Dirichlet Allocation) : Topic Modelling
6. Okapi BM25 Score : simplest searching technique
7. Sentence embedding : BERT
8. Sentence embedding : Universal sentence encoder

SECTION 3 : Productionizing the solution
9. Entire pipeline deployment on a remote server
9.1. A Cloud platform
9.2. Web App using Flask
SECTION 4 : Results and conclusion

10. Results and conclusion
10.1. Final Results : BERT
10.2. Final Results : USE
