See you in the neighborhood: using Gemini to support housing assistance programs

Published in

Google Cloud - Community

5 min readMar 18, 2024

This demo was originally created by G. Hussain Chinoy and reworked to fit a housing use case.

Local governments are responsible for providing assistance to those in need but often lack the resources to do so at scale. As a result, many people who are eligible for assistance programs do not receive them quickly enough. One way to address this problem is to use data analytics to identify those who are most likely to need assistance and then target them with outreach efforts.

Using housing as an example, we will demonstrate step-by-step how to use BigQuery to understand our applicant population, then use Generative AI in Vertex AI to cater assistance programs to the right people.

Segment our population into cohorts

First, we will use ML and public datasets in BigQuery to segment different cohorts of underserved populations. The following SQL query creates a k-means model that will group county populations in the United States into 3 general clusters. You can copy and paste the query directly into the BigQuery UI.

CREATE OR REPLACE MODEL `<dataset-name>.housing_pool`
OPTIONS
 ( MODEL_TYPE='KMEANS',
   NUM_CLUSTERS=3 ) AS
  
WITH
 acs_2017 AS (
 SELECT
   geo_id,
   rent_over_50_percent + rent_40_to_50_percent + rent_35_to_40_percent AS rent_over_35_percent,
   housing_units_renter_occupied AS rental_units,
   commuters_by_public_transportation/total_pop AS public_transit,
   vacant_housing_units/total_pop AS vacant_units,
   families_with_young_children/total_pop AS young_families,
 FROM
   `bigquery-public-data.census_bureau_acs.county_2017_5yr`
   WHERE housing_units_renter_occupied <> 0),


LIHTC_Units AS (
 SELECT
   SUM(n_units) AS total_LIHTC_units,
   SUBSTR (fips2010,0,5) AS shortFIPS
 FROM
   `bigquery-public-data.sdoh_hud_housing.2017_lihtc_database_hud`
 WHERE n_units <> 0
 GROUP BY shortFIPS ),
 join_tbl AS (
 SELECT
   ML.MIN_MAX_SCALER(ROUND((rent_over_35_percent/rental_units)/(total_LIHTC_units/rental_units),3)) OVER() AS rent_burden,
   ML.MIN_MAX_SCALER(ROUND(public_transit/(total_LIHTC_units/rental_units),3)) OVER() AS public_transit,
   ML.MIN_MAX_SCALER(ROUND(vacant_units/(total_LIHTC_units/rental_units),3)) OVER() AS vacant_units,
   ML.MIN_MAX_SCALER(ROUND(young_families/(total_LIHTC_units/rental_units),3)) OVER() AS young_families
 FROM acs_2017
 JOIN LIHTC_Units ON acs_2017.geo_id = LIHTC_Units.shortFIPS )


SELECT * FROM join_tbl

Here are some details about the query:

It joins data from the Census Bureau’s American Community Survey (ACS) and from the Department of Housing and Urban Development (HUD) Low-Income Housing Tax Credit (LIHTC) program
We look at the population who pay over 35% of their income towards rent
Other inputs include the population that commutes using public transportation, vacant housing units, and families with young children
The inputs are normalized based on the total population
We further factor total_LIHTC_units into our features in order to heavily weigh populations that have less LIHTC coverage
Finally, we use MIN_MAX_SCALER as a helpful ML preprocessing function to scale our units between 0 and 1

After a minute of running, the query to create the model should be done. Click Go To Model, and then click the Evaluation tab to see numeric breakdowns of our clusters.

You can now use this model to see which cluster your county belongs to (here is a helpful reference from the Federal Communications Commission).

Understand our results and get recommendations

When looking at the blend of socioeconomics for each cohort, it may be difficult to find a concise way to describe each of them. This only gets harder as we increase the number of features. Generative AI can assist in synthesizing the main themes of our data and presenting it back in a way that can “prompt” follow up action.

Save the image above of the evaluation breakdown. We’ll use it in our prompt to the Gemini Multimodal model in Vertex AI: inside your project, click “Insert Media” to insert the image, then paste the following prompt:

{Mission}
You help government officials market housing 
assistance programs. You customize assistance 
programs to the needs of specific populations.

{Instructions}
Given the following population clusters and their 
socioeconomic makeup shown in the picture, explain 
if the cluster is suitable for housing assistance. 
Then for each of the clusters suitable for 
assistance, come up with a brand persona and 
campaign title based on their distinct attributes, 
and explain what would be the appropriate next 
step for each of those clusters.

The parameters used were:

   Temperature: 0.1,
   Token limit: 1389,
   Top-K: 33,
   Top-P: 0.77

After pressing “Submit,” the result may be similar to this:

Cluster 1: Young Families with Low Rent Burden
  - Suitable for assistance: No
  - The rent burden is very low, indicating that housing 
    costs are not a significant challenge for this cluster. 
  - The public transit usage is also low, suggesting that 
    they may have access to other means of transportation. 
  - The vacancy rate is relatively high, which means there 
    may be sufficient housing options available without 
    the need for assistance.

Cluster 2: Young Families with Moderate Rent Burden
  - Suitable for assistance: Yes
  - The rent burden is higher than Cluster 1, indicating 
    that housing costs may be a challenge for some 
    families in this cluster.
  - The public transit usage is slightly higher, 
    suggesting that they may rely on public transportation 
    for commuting.
  - The vacancy rate is higher than Cluster 1, but still 
    relatively low, indicating that there may be some 
    competition for affordable housing.

Brand persona: Struggling Young Family 
Title label: "Helping Young Families Thrive: Affordable 
    Housing Solutions" 
Next step: Provide information about affordable housing 
    options, rental assistance programs, and financial 
    counseling services.

Cluster 3: Young Families with Moderate Rent Burden and 
    High Public Transit Usage
  - Suitable for assistance: Yes
  - The rent burden is similar to Cluster 2, but the 
    public transit usage is significantly higher, 
    indicating that they heavily rely on public 
    transportation for commuting.
  - The vacancy rate is lower than Cluster 2, suggesting 
    that there may be a shortage of affordable housing 
    options in this area.

Brand persona: Commuting Young Family 
Title label: "Affordable Housing for Commuters: Making 
    Transit-Oriented Living a Reality" 
Next step: Provide information about affordable housing 
    options near public transit hubs, transportation 
    subsidies, and car-sharing programs.

This response helps us to de-prioritize cohorts that align with cluster 1 while shoring up the specific needs of clusters 2 and 3. Feel free to dial the temperature and other parameters as you see fit and provide more agency-specific guidelines on scoring applicant need.

Next steps

This introductory level use case provides a blueprint for building your own models using these queries and prompts as an initial template. Where you can go from here:

The ACS dataset can get more granular, up to the block group and census tract level, but providing your agency’s historical applicant data would have the greatest impact on a model’s utility
BigQuery and Vertex AI provide many more prediction services, such as creating a recommendation engine that could assist applicants with the best resources or analyzing sentiment on tickets or survey responses to track emerging areas of concern

See you in the neighborhood: using Gemini to support housing assistance programs

Segment our population into cohorts

Understand our results and get recommendations

Next steps

Written by Willis Zhang