Effectively Using AI as a Buddy for Software Development: A Case Study
A practical approach to leveraging AI collaboratively for complex SQL dependency tasks
Artificial Intelligence (AI) has become an indispensable tool in the software development lifecycle. However, its utility depends not only on its capabilities but also on how developers collaborate with it. In this article, I’ll share a practical use case and illustrate the best practices for using AI as a collaborative buddy rather than a solution to offload complex tasks entirely. The example involves sorting 200 complex AWS Redshift view statements by their dependencies, a problem that benefits significantly from human-AI collaboration.
The Challenge
Imagine you’re dealing with 200 nested AWS Redshift SQL view statements. These views have intricate dependencies on one another, and you need to determine the correct order to process them. This requires a topological sort of the views based on their dependencies, which are embedded within the SQL statements. Here’s a simplified example of one such statement:
CREATE OR REPLACE VIEW claims_preprocessing.generate_encounter_id AS
SELECT nd.patient_data_source_id, nd.start_date, nd.end_date, nd.claim_id,
ne.encounter_id AS old_encounter_id
FROM (
SELECT fd.patient_data_source_id, fd.start_date, fd.end_date, fd.claim_id,
fd.previous_max_end_date, fd.new_group_flag,
SUM(fd.new_group_flag) OVER(
PARTITION BY fd.patient_data_source_id
ORDER BY fd.start_date, fd.claim_id
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS encounter_group
FROM (
SELECT gd.patient_data_source_id, gd.start_date, gd.end_date, gd.claim_id,
gd.previous_max_end_date,
CASE WHEN (gd.start_date > COALESCE(gd.previous_max_end_date, '1900-01-01'::date))
THEN 1 ELSE 0 END AS new_group_flag
FROM (
SELECT bd.patient_data_source_id, bd.start_date, bd.end_date, bd.claim_id,
MAX(bd.end_date) OVER(
PARTITION BY bd.patient_data_source_id
ORDER BY bd.start_date, bd.claim_id
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS previous_max_end_date
FROM (
SELECT DISTINCT m.patient_data_source_id, m.start_date, m.end_date, m.claim_id
FROM claims_preprocessing.encounters__stg_medical_claim m
JOIN claims_preprocessing.asc__anchor_events u
ON (m.claim_id = u.claim_id)
) bd
) gd
) fd
) nd
JOIN (
SELECT unique_encounters.patient_data_source_id, unique_encounters.encounter_group,
ROW_NUMBER() OVER(
ORDER BY unique_encounters.patient_data_source_id,
unique_encounters.encounter_start_date) AS encounter_id
FROM (
SELECT numbered_data.patient_data_source_id, numbered_data.encounter_group,
MIN(numbered_data.start_date) AS encounter_start_date
FROM (
SELECT fd.patient_data_source_id, fd.start_date, fd.end_date, fd.claim_id,
fd.previous_max_end_date, fd.new_group_flag,
SUM(fd.new_group_flag) OVER(
PARTITION BY fd.patient_data_source_id
ORDER BY fd.start_date, fd.claim_id
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS encounter_group
FROM (
SELECT gd.patient_data_source_id, gd.start_date, gd.end_date, gd.claim_id,
gd.previous_max_end_date,
CASE WHEN (gd.start_date > COALESCE(gd.previous_max_end_date, '1900-01-01'::date))
THEN 1 ELSE 0 END AS new_group_flag
FROM (
SELECT bd.patient_data_source_id, bd.start_date, bd.end_date, bd.claim_id,
MAX(bd.end_date) OVER(
PARTITION BY bd.patient_data_source_id
ORDER BY bd.start_date, bd.claim_id
ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS previous_max_end_date
FROM (
SELECT DISTINCT m.patient_data_source_id, m.start_date, m.end_date, m.claim_id
FROM claims_preprocessing.encounters__stg_medical_claim m
JOIN claims_preprocessing.asc__anchor_events u
ON (m.claim_id = u.claim_id)
) bd
) gd
) fd
) numbered_data
GROUP BY numbered_data.patient_data_source_id, numbered_data.encounter_group
) unique_encounters
) ne
ON (nd.patient_data_source_id = ne.patient_data_source_id
AND nd.encounter_group = ne.encounter_group);
The Pitfalls of Relying Solely on AI
Your first inclination might be to have the AI solve the entire problem directly — to generate a dependency-resolved order of the views. However, this approach is inherently problematic. AI models often:
- Produce incorrect or incomplete results, especially if the dependency parsing logic is flawed.
- Fail to enable iteration, making it difficult for you to guide corrections.
For example, the AI initially attempted to construct a regular expression to parse the SQL statements for dependencies. However, the complexity of the nested queries caused it to fail repeatedly. Specific mistakes included:
- Misinterpreting nested queries and failing to capture all dependencies accurately.
- Overcomplicating the regex logic without considering alternative approaches.
- Fixating on regex patterns instead of adapting to a simpler and more effective solution.
Additionally, the AI struggled to handle specific SQL constructs effectively, such as:
- Separately managing different types of
JOIN
Clauses with their specific syntax, including:
- Standard
JOIN ... ON
LEFT/RIGHT/INNER/CROSS JOIN
FROM
clauses
2. Not handling:
- Parenthesized
FROM
/JOIN
expressions - Deeply nested subqueries
- Multiple join conditions
The AI focused on perfecting the regex — a path that would have been overly complicated and time-consuming. By recognizing this, I could course-correct, advising the AI to abandon regex altogether. Instead, I directed it to work collaboratively to produce programs that could solve the problem iteratively. For example, I instructed the AI to look for patterns, such as x.y
where x.y
matched a view name from the list of known views. This adjustment simplified the parsing process significantly.
The takeaway? Blindly trusting AI for complex and nuanced tasks often leads to suboptimal outcomes. Instead, work with the AI to produce tools and solutions collaboratively, iterating on them as needed.
Solution: A Collaborative Approach
To solve this problem, you can work with AI in a structured way. Here’s how:
Step 1: Break the Problem into Manageable Pieces
Instead of offloading the entire task to AI, divide it into smaller, straightforward tasks:
- Identify all view names.
- Extract dependencies for each view.
- Perform the topological sort.
- Match the sorted view order with the SQL statements.
Step 2: Guide the AI
Start with specific instructions for each step. For example:
- Task: Write a Python script to extract dependencies.
- Expectation: The script should identify references in the form
schema.view
and match them against the list of view names.
Step 3: Validate and Iterate
AI-generated scripts will often require adjustments. The parsing challenges, such as failures to capture nested dependencies or overcomplicated regex logic, are discussed in the “Pitfalls” section. You may need to refine or adapt the parsing logic to account for these issues and align with the specific needs of your task.
Step 4: Assemble the Solution
Once dependencies are accurately extracted and validated, you can use another AI-assisted script to perform the topological sort and verify the output.
Practical Implementation: A Three-Program Workflow
Here’s how the process might look:
- Program 1: List all view names from the SQL scripts.
- Program 2: Extract dependencies for each view and output them in a human-readable format for validation.
- Program 3: Perform the topological sort and match the sorted order with the SQL statements.
Each program serves a distinct purpose, and validating outputs at every step is crucial to catching errors, such as incorrect dependency extraction or sorting mistakes, early in the process.
The Takeaway: Collaborate, Don’t Offload
The key lesson is to collaborate with AI, not delegate tasks entirely. You can harness its power effectively by breaking problems into smaller parts, steering AI when it makes mistakes, and validating outputs.
In this case, working iteratively with AI allowed us to tackle a complex dependency resolution problem. The AI expedited the solution while enabling human oversight, ensuring accuracy and flexibility.
This collaborative approach to AI in software development ensures that you remain in control, leveraging AI as a powerful assistant rather than a fragile automaton.