How to Build a Text Analysis Website with Python Flask and the OpenAI API (2): Data Validity Check Functions

Henry Wu
Published in preprintblog
5 min read · Jan 7, 2024

Previously

Data Validity Check

Data validity check functions play a crucial role in ensuring the integrity and reliability of your Python project, because they stop empty, malformed, or oversized input before it reaches the rest of the application.

As for when to implement these checks, it’s generally best to integrate data validity checks as you develop your project, not just at the end. This approach aligns with the principles of agile and test-driven development, where you ensure each part of your application is robust and reliable as you build it. Integrating checks early also helps in identifying potential issues in the data flow and logic, making it easier to address them in the development phase rather than after the project is near completion or launched.

Data Validity Check Functions

Our website analyzes uploaded text or PDF files, so we need to limit the size of what users submit. For example, the text should contain more than 50 and fewer than 1,000 words, and the file should be smaller than 5 MB.

The current code is:

import io

import boto3
import PyPDF2
from flask import Flask, redirect, render_template, request, session, url_for
from flask_session import Session

app = Flask(__name__)

# Configure server-side session
app.config["SESSION_TYPE"] = "filesystem"
Session(app)

# Create a Systems Manager client and fetch the Flask secret key
ssm = boto3.client('ssm', region_name='us-west-2')
parameter = ssm.get_parameter(Name='FLASK_SECRET_KEY', WithDecryption=True)
secret_key = parameter['Parameter']['Value']
app.secret_key = secret_key

@app.route('/index', methods=['GET', 'POST'])
def index_post():
    if request.method == 'POST':
        # Handling JD text or file submission
        jd_text = request.form.get('jd_text')
        jd_file = request.files.get('jd_file')
        if jd_text:
            session['jd_text'] = jd_text
        elif jd_file and allowed_file(jd_file.filename):
            session['jd_text'] = extract_text(jd_file)
            return "JD Saved"

        # Handling resume text or file submission
        resume_text = request.form.get('resume_text')
        resume_file = request.files.get('resume_file')
        if resume_text:
            session['resume_text'] = resume_text
        elif resume_file and allowed_file(resume_file.filename):
            session['resume_text'] = extract_text(resume_file)
            return "Resume Saved"

        # Redirect to the result page if both JD and resume are present
        if 'jd_text' in session and 'resume_text' in session:
            return redirect(url_for('result_resume_assessment'))

    # Render the initial form
    return render_template('index.html')

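def allowed_file(filename):
    # Accept only files whose extension is .pdf (case-insensitive)
    return '.' in filename and filename.rsplit('.', 1)[1].lower() == 'pdf'

def extract_text(pdf_file):
    # Read the uploaded file into memory and concatenate the text of every page
    pdf = PyPDF2.PdfReader(io.BytesIO(pdf_file.read()))
    text = ""
    for page in pdf.pages:
        # Fall back to an empty string when a page yields no text
        text += page.extract_text() or ''
    return text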

Word Count Control

We are going to add data validity checks to these two branches of the handler:

        elif jd_file and allowed_file(jd_file.filename):
            session['jd_text'] = extract_text(jd_file)
            return "JD Saved"

        elif resume_file and allowed_file(resume_file.filename):
            session['resume_text'] = extract_text(resume_file)
            return "Resume Saved"

The updated code:

        elif jd_file and allowed_file(jd_file.filename):
            file_content = jd_file.read()  # Read the upload once and reuse the bytes below

            print("File content size:", len(file_content))  # Debug: check file content size
            if len(file_content) == 0:
                return "Uploaded file is empty", 400

            pdf = PyPDF2.PdfReader(io.BytesIO(file_content))
            if len(pdf.pages) > 5:
                return "The JD PDF is too long to handle, upload a shorter one", 400

            # extract_text() calls .read(), so wrap the bytes in a file-like object
            extracted_text = extract_text(io.BytesIO(file_content))
            word_count = len(extracted_text.split())
            if word_count < 2:
                return "Job Description is too short, please upload a longer one", 400
            if word_count > 2000:
                return "Job Description is too long, please upload a shorter one", 400

            session['jd_text'] = extracted_text
            return "JD Saved"
        elif resume_file and allowed_file(resume_file.filename):
            file_content = resume_file.read()  # Read the content of the uploaded file

            pdf = PyPDF2.PdfReader(io.BytesIO(file_content))
            if len(pdf.pages) > 2:
                return "Under no circumstance should a resume be longer than 2 pages", 400

            # extract_text() calls .read(), so wrap the bytes in a file-like object
            extracted_text = extract_text(io.BytesIO(file_content))
            word_count = len(extracted_text.split())
            if word_count < 2:
                return "Resume is too short, please upload again", 400
            if word_count > 2000:
                return "Resume has too many words, please upload again", 400

            session['resume_text'] = extracted_text
            return "Resume Saved"
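The JD and resume branches repeat the same checks with different limits. One possible way to keep them in sync is a small rule table and a single checker, sketched below. This is an illustration rather than the site's actual code: `UploadRules`, `RULES`, and `check_upload` are hypothetical names, the limits are copied from the two branches above, and the messages are simplified. The function takes a page count and the extracted text, so it can be tried without a real PDF.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class UploadRules:
    """Limits for one kind of document (hypothetical helper, not in the original app)."""
    label: str
    max_pages: int
    min_words: int
    max_words: int


# Limits copied from the JD and resume branches above.
RULES = {
    "jd": UploadRules(label="Job Description", max_pages=5, min_words=2, max_words=2000),
    "resume": UploadRules(label="Resume", max_pages=2, min_words=2, max_words=2000),
}


def check_upload(kind: str, page_count: int, text: str) -> Optional[Tuple[str, int]]:
    """Return (message, 400) for the first failed check, or None if all checks pass."""
    rules = RULES[kind]
    if page_count > rules.max_pages:
        return f"{rules.label} PDF has more than {rules.max_pages} pages", 400
    word_count = len(text.split())
    if word_count < rules.min_words:
        return f"{rules.label} is too short, please upload a longer one", 400
    if word_count > rules.max_words:
        return f"{rules.label} is too long, please upload a shorter one", 400
    return None
```

Inside a branch this could be used as `error = check_upload("resume", len(pdf.pages), extracted_text)` followed by `if error: return error`, since the view already returns `(message, status)` pairs.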

PDF File Size Control

Add a validity check function:

def is_valid_file_size(file_content):
    # Check if the file size is within the allowed limit
    return len(file_content) <= 5 * 1024 * 1024  # 5MB limit for file upload

Then call it right after reading the file:

        elif jd_file and allowed_file(jd_file.filename):
            file_content = jd_file.read()

            if not is_valid_file_size(file_content):
                return "Uploaded file is too large (max 5 MB)", 400
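Note that `is_valid_file_size` only runs after `jd_file.read()` has already loaded the whole upload into memory. A possible refinement, sketched here on the assumption that the upload behaves like an ordinary binary stream supporting `read(size)`, is to read at most one byte past the limit. `read_limited` and `MAX_UPLOAD_BYTES` are hypothetical names introduced for this example.

```python
import io
from typing import BinaryIO, Optional

MAX_UPLOAD_BYTES = 5 * 1024 * 1024  # same 5 MB limit as is_valid_file_size


def read_limited(stream: BinaryIO, limit: int = MAX_UPLOAD_BYTES) -> Optional[bytes]:
    """Read from the stream, requesting at most limit + 1 bytes.

    Returns the content if it fits within the limit, or None if it is larger.
    """
    data = stream.read(limit + 1)  # one extra byte reveals an oversized upload
    if len(data) > limit:
        return None
    return data


# Usage with in-memory streams standing in for uploaded files
small = read_limited(io.BytesIO(b"%PDF-1.4 tiny"), limit=1024)
too_big = read_limited(io.BytesIO(b"x" * 2048), limit=1024)
```

Here `small` holds the original bytes and `too_big` is `None`. Flask's `MAX_CONTENT_LENGTH` configuration value is another option worth considering, since it rejects oversized request bodies before the view function runs.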

HTML files

The corresponding JavaScript in the HTML files displays the server's error messages:

function submitJDFile() {
    var jdFormText = document.getElementById('jdFormText');
    var formData = new FormData(jdFormText);

    fetch('/index', {
        method: 'POST',
        body: formData
    })
    .then(response => {
        if (!response.ok) {
            // If the response is not OK, get the text of the response
            return response.text().then(text => { throw new Error(text || 'Unknown error occurred') });
        }
        return response.text();
    })
    .then(data => {
        if (data === 'JD Saved') {
            document.getElementById('jdFormTextContainer').style.display = 'none';
            document.getElementById('resumeFormTextContainer').style.display = 'block';
        } else {
            // Display the specific error message if it's not 'JD Saved'
            alert(data);
        }
    })
    .catch(error => {
        console.error('Error:', error);
        // Display the specific error message from the catch block
        alert(error.message);
    });
}

function submitResumePDF() {
    var resumeFormText = document.getElementById('resumeFormText');
    var formData = new FormData(resumeFormText);

    fetch('/index', {
        method: 'POST',
        body: formData
    })
    .then(response => {
        if (!response.ok) {
            // If the response is not OK, get the text of the response
            return response.text().then(text => { throw new Error(text || 'Unknown error occurred') });
        }
        return response.text();
    })
    .then(data => {
        if (data === 'Resume Saved') {
            window.location.href = '/result_resume_assessment';
        } else {
            // Display the specific error message if it's not 'Resume Saved'
            alert(data);
        }
    })
    .catch(error => {
        console.error('Error:', error);
        // Display the specific error message from the catch block
        alert(error.message);
    });
}

If the user pastes text instead of uploading a file, we can run the validity check on the client side:

function countWords(text) {
    return text.trim().split(/\s+/).length;
}

function moveToresumeFormText() {
    var jdFormText = document.getElementById('jdFormText');
    var formData = new FormData(jdFormText);
    var jdText = formData.get('jd_text'); // Retrieve the job description text

    // Check if the text is empty
    if (!jdText.trim()) {
        alert('The job description cannot be empty.');
        return;
    }

    // Use the countWords function to get the word count
    var wordCount = countWords(jdText);

    // Check if the word count exceeds 1000
    if (wordCount > 1000) {
        alert('The job description cannot exceed 1000 words.');
        return;
    }

    // Continue with the original fetch request if validation passes
    fetch('/index', {
        method: 'POST',
        body: formData
    })
    .then(response => {
        if (response.ok) {
            // Hide the JD form and show the resume form
            document.getElementById('jdFormTextContainer').style.display = 'none';
            document.getElementById('resumeFormTextContainer').style.display = 'block';
            initializeresumeFormText();
        } else {
            throw new Error('Failed to save job description');
        }
    })
    .catch(error => {
        console.error('Error:', error);
        alert('An error occurred while saving the Job Description.');
    });
}

function initializeresumeFormText() {
    var resumeFormTextContainer = document.getElementById('resumeFormTextContainer');

    // Set the HTML for the resume form
    var resumeFormTextHtml = `
        <h2 class="center-text">Upload Resume</h2>
        <form id="resumeFormText" method="post" action="/index" style="text-align: center;">
            <textarea name="resume_text" placeholder="Input Resume"></textarea><br>
            <div style="margin-bottom: 10px;">
                <input type="submit" value="Continue">
            </div>
            <div>
                <a href="#" onclick="uploadResumePDF()">OR Upload Resume PDF.</a><br>
            </div>
        </form>`;

    resumeFormTextContainer.innerHTML = resumeFormTextHtml;

    // Add event listener for the resume form submission
    document.getElementById('resumeFormText').addEventListener('submit', function(event) {
        var resumeText = this.resume_text.value; // Retrieve the resume text

        // Check if the text is empty
        if (!resumeText.trim()) {
            alert('The resume cannot be empty.');
            event.preventDefault(); // Prevent form submission
            return;
        }

        // Use the countWords function to get the word count
        var wordCount = countWords(resumeText);

        // Check if the word count exceeds 1000
        if (wordCount > 1000) {
            alert('The resume cannot exceed 1000 words.');
            event.preventDefault(); // Prevent form submission
            return;
        }
    });
}
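Client-side checks are easy to bypass, because a request can be sent straight to `/index` without going through the form. It may therefore be worth repeating the same rules on the server for pasted text. The following is a minimal sketch, not the site's code: `count_words` and `check_text_input` are hypothetical helpers, and the 1000-word cap is taken from the JavaScript above.

```python
from typing import Optional


def count_words(text: str) -> int:
    """Count whitespace-separated words; an empty or blank string counts as 0."""
    return len(text.split())


def check_text_input(text: Optional[str], label: str, max_words: int = 1000) -> Optional[str]:
    """Return an error message for empty or overlong text, or None if it is acceptable."""
    if text is None or not text.strip():
        return f"The {label} cannot be empty."
    if count_words(text) > max_words:
        return f"The {label} cannot exceed {max_words} words."
    return None
```

In the view, a call such as `check_text_input(request.form.get('jd_text'), 'job description')` could run before the text is stored in the session. One difference from the browser code: Python's `str.split()` with no arguments returns an empty list for a blank string, whereas the JavaScript `countWords` returns 1 for an empty string, which is why the browser code tests for emptiness first.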
