See you in the neighborhood: using Gemini to support housing assistance programs

Willis Zhang
Google Cloud - Community
5 min readMar 18, 2024

This demo was originally created by G. Hussain Chinoy and reworked to fit a housing use case.

Local governments are responsible for providing assistance to those in need but often lack the resources to do so at scale. As a result, many people who are eligible for assistance programs do not receive them quickly enough. One way to address this problem is to use data analytics to identify those who are most likely to need assistance and then target them with outreach efforts.

Using housing as an example, we will demonstrate step-by-step how to use BigQuery to understand our applicant population, then use Generative AI in Vertex AI to cater assistance programs to the right people.

Segment our population into cohorts

First, we will use ML and public datasets in BigQuery to segment different cohorts of underserved populations. The following SQL query creates a k-means model that will group county populations in the United States into 3 general clusters. You can copy and paste the query directly into the BigQuery UI.

CREATE OR REPLACE MODEL `<dataset-name>.housing_pool`
OPTIONS
( MODEL_TYPE='KMEANS',
NUM_CLUSTERS=3 ) AS

WITH
acs_2017 AS (
SELECT
geo_id,
rent_over_50_percent + rent_40_to_50_percent + rent_35_to_40_percent AS rent_over_35_percent,
housing_units_renter_occupied AS rental_units,
commuters_by_public_transportation/total_pop AS public_transit,
vacant_housing_units/total_pop AS vacant_units,
families_with_young_children/total_pop AS young_families,
FROM
`bigquery-public-data.census_bureau_acs.county_2017_5yr`
WHERE housing_units_renter_occupied <> 0),


LIHTC_Units AS (
SELECT
SUM(n_units) AS total_LIHTC_units,
SUBSTR (fips2010,0,5) AS shortFIPS
FROM
`bigquery-public-data.sdoh_hud_housing.2017_lihtc_database_hud`
WHERE n_units <> 0
GROUP BY shortFIPS ),
join_tbl AS (
SELECT
ML.MIN_MAX_SCALER(ROUND((rent_over_35_percent/rental_units)/(total_LIHTC_units/rental_units),3)) OVER() AS rent_burden,
ML.MIN_MAX_SCALER(ROUND(public_transit/(total_LIHTC_units/rental_units),3)) OVER() AS public_transit,
ML.MIN_MAX_SCALER(ROUND(vacant_units/(total_LIHTC_units/rental_units),3)) OVER() AS vacant_units,
ML.MIN_MAX_SCALER(ROUND(young_families/(total_LIHTC_units/rental_units),3)) OVER() AS young_families
FROM acs_2017
JOIN LIHTC_Units ON acs_2017.geo_id = LIHTC_Units.shortFIPS )


SELECT * FROM join_tbl

Here are some details about the query:

  • It joins data from the Census Bureau’s American Community Survey (ACS) and from the Department of Housing and Urban Development (HUD) Low-Income Housing Tax Credit (LIHTC) program
  • We look at the population who pay over 35% of their income towards rent
  • Other inputs include the population that commutes using public transportation, vacant housing units, and families with young children
  • The inputs are normalized based on the total population
  • We further factor total_LIHTC_units into our features in order to heavily weigh populations that have less LIHTC coverage
  • Finally, we use MIN_MAX_SCALER as a helpful ML preprocessing function to scale our units between 0 and 1

After a minute of running, the query to create the model should be done. Click Go To Model, and then click the Evaluation tab to see numeric breakdowns of our clusters.

You can now use this model to see which cluster your county belongs to (here is a helpful reference from the Federal Communications Commission).

Understand our results and get recommendations

When looking at the blend of socioeconomics for each cohort, it may be difficult to find a concise way to describe each of them. This only gets harder as we increase the number of features. Generative AI can assist in synthesizing the main themes of our data and presenting it back in a way that can “prompt” follow up action.

Save the image above of the evaluation breakdown. We’ll use it in our prompt to the Gemini Multimodal model in Vertex AI: inside your project, click “Insert Media” to insert the image, then paste the following prompt:

{Mission}
You help government officials market housing
assistance programs. You customize assistance
programs to the needs of specific populations.

{Instructions}
Given the following population clusters and their
socioeconomic makeup shown in the picture, explain
if the cluster is suitable for housing assistance.
Then for each of the clusters suitable for
assistance, come up with a brand persona and
campaign title based on their distinct attributes,
and explain what would be the appropriate next
step for each of those clusters.

The parameters used were:

   Temperature: 0.1,
Token limit: 1389,
Top-K: 33,
Top-P: 0.77

After pressing “Submit,” the result may be similar to this:

Cluster 1: Young Families with Low Rent Burden
- Suitable for assistance: No
- The rent burden is very low, indicating that housing
costs are not a significant challenge for this cluster.
- The public transit usage is also low, suggesting that
they may have access to other means of transportation.
- The vacancy rate is relatively high, which means there
may be sufficient housing options available without
the need for assistance.

Cluster 2: Young Families with Moderate Rent Burden
- Suitable for assistance: Yes
- The rent burden is higher than Cluster 1, indicating
that housing costs may be a challenge for some
families in this cluster.
- The public transit usage is slightly higher,
suggesting that they may rely on public transportation
for commuting.
- The vacancy rate is higher than Cluster 1, but still
relatively low, indicating that there may be some
competition for affordable housing.

Brand persona: Struggling Young Family
Title label: "Helping Young Families Thrive: Affordable
Housing Solutions"
Next step: Provide information about affordable housing
options, rental assistance programs, and financial
counseling services.

Cluster 3: Young Families with Moderate Rent Burden and
High Public Transit Usage
- Suitable for assistance: Yes
- The rent burden is similar to Cluster 2, but the
public transit usage is significantly higher,
indicating that they heavily rely on public
transportation for commuting.
- The vacancy rate is lower than Cluster 2, suggesting
that there may be a shortage of affordable housing
options in this area.

Brand persona: Commuting Young Family
Title label: "Affordable Housing for Commuters: Making
Transit-Oriented Living a Reality"
Next step: Provide information about affordable housing
options near public transit hubs, transportation
subsidies, and car-sharing programs.

This response helps us to de-prioritize cohorts that align with cluster 1 while shoring up the specific needs of clusters 2 and 3. Feel free to dial the temperature and other parameters as you see fit and provide more agency-specific guidelines on scoring applicant need.

Next steps

This introductory level use case provides a blueprint for building your own models using these queries and prompts as an initial template. Where you can go from here:

  • The ACS dataset can get more granular, up to the block group and census tract level, but providing your agency’s historical applicant data would have the greatest impact on a model’s utility
  • BigQuery and Vertex AI provide many more prediction services, such as creating a recommendation engine that could assist applicants with the best resources or analyzing sentiment on tickets or survey responses to track emerging areas of concern

--

--

Willis Zhang
Google Cloud - Community

Google Cloud engineer. Current obsessions: Kubernetes and data analytics for local governments. Posts and opinions are my own.