Courtney Whalen
Aug 12 · 7 min read

Driven by climate change concerns, the renewable energy industry has grown significantly worldwide. Part of the reason is that environmentally driven policy is making photovoltaic (PV) technology more affordable than ever. Consequently, solar panels are being erected faster than they can be reliably recorded, especially in developing countries where much of the data is self-reported. At Astraea, we are building a scalable geospatial platform to bring the power of machine learning to Earth-observing data. In our solar farm detection study, we used Earth-observing data and computer vision to locate utility-scale solar farms. Using deep learning and free, publicly available data, we developed a cost-effective approach to identify solar farms not yet recorded in any official solar database, track the expansion of solar farms over time, and gain insight into solar policy across different regions by visualizing a broad expanse of solar farms.

Figure 1. Source:


The free, publicly available imagery we used in this study was from the Sentinel-2 mission. Although Sentinel-2 has only moderate spatial resolution (10-m), utility-scale solar farms have distinguishing features, such as a large footprint and recognizable pattern, which make them great candidates for detection by computer vision algorithms. Also, since Sentinel-2 revisits the same spot on Earth every 5 days and has global coverage, it provides a cost-effective means of tracking the growth of solar farms in every country in the world at any given time.

Figure 2: Comparison of moderate (Sentinel-2) and high resolution (Google Imagery) images of a solar farm.


We developed an iterative process to train and improve our solar farm detection model, which is shown in Figure 3:

  1. We manually drew outlines around 100 solar farms to create our own training data.
  2. We used these geo-located outlines of known solar farms to crop the Sentinel imagery into image chips with positive and negative examples of solar farms.
  3. Then we used the image chips to train a convolutional neural network (CNN) model to distinguish between what is, and what is not, a solar farm.
  4. Finally, we scored the model on imagery across an entire country, then verified and added correct predictions to our training data.
  5. We repeated the steps above several times to improve model performance.

Figure 3: Data science pipeline. *The U.S. Energy Information Administration (EIA) produces an open dataset containing geographic coordinates of known U.S. solar plants. *AOI/TOI refers to the area of interest and time of interest of the imagery we are scoring.
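
The chipping idea in step 2 can be sketched in a few lines. This is only an illustration under stated assumptions, not the code used in the study: it assumes the hand-drawn outlines have already been rasterized into a binary mask aligned with the imagery, and the function name `label_chips`, the chip size, and the 5% overlap threshold are all hypothetical choices.

```python
def label_chips(mask, chip_size=4, min_positive_fraction=0.05):
    """Tile a binary mask into square chips and label each one.

    mask: list of rows, with 1 where a pixel falls inside a solar farm outline.
    Returns (row, col, label) tuples, one per full chip, where (row, col) is
    the chip's top-left pixel and label is 1 (solar farm) or 0 (background).
    """
    n_rows, n_cols = len(mask), len(mask[0])
    chips = []
    for r in range(0, n_rows - chip_size + 1, chip_size):
        for c in range(0, n_cols - chip_size + 1, chip_size):
            inside = sum(
                mask[i][j]
                for i in range(r, r + chip_size)
                for j in range(c, c + chip_size)
            )
            fraction = inside / (chip_size * chip_size)
            # Assumed rule: a chip is positive if enough of it overlaps a farm.
            chips.append((r, c, 1 if fraction >= min_positive_fraction else 0))
    return chips


# Toy 8x8 mask with a made-up "solar farm" in the top-left corner.
mask = [[1 if (i < 3 and j < 3) else 0 for j in range(8)] for i in range(8)]
chips = label_chips(mask)
```

The same window coordinates would then be used to cut the matching pixels out of each spectral band, so every chip carries both its imagery and its label.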

Our model performance on a held-out test set and the effects of different efforts to improve model performance are shown in Figure 4. Our final model was correct in 92% of its solar farm predictions (its precision), and it captured 85% of all the solar farms in the data we fed it (its recall). An interesting finding was that the most important factor in model improvement was getting more training data, not hyperparameter tuning or testing out different CNN architectures.
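
The two percentages above are the standard precision and recall metrics. As a minimal sketch of how they are computed: the counts below are hypothetical, chosen only so that the arithmetic reproduces 92% and 85%, and are not the study's actual test-set counts.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall) from raw detection counts."""
    # Precision: of everything predicted as a solar farm, how much was right.
    precision = true_positives / (true_positives + false_positives)
    # Recall: of all real solar farms, how many the model found.
    recall = true_positives / (true_positives + false_negatives)
    return precision, recall


# Hypothetical counts, for illustration only.
precision, recall = precision_recall(
    true_positives=1564, false_positives=136, false_negatives=276
)
print(f"precision={precision:.2f}, recall={recall:.2f}")
```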

Figure 4: Performance on the test set across different modeling iterations.

Once we had looped through the data science pipeline several times and had a model that was performing well on the United States, we scored the model on Earth-observing imagery of China. China is a leader in the solar energy industry, but data on its current solar stock is nearly impossible to find. Our model correctly detected 944 solar arrays in China in August 2018, which gave us rare insight into China’s solar power capabilities.

Figure 5: Heatmap of predicted solar farms in China.

Interesting Findings

As mentioned earlier, our scalable modeling process gave us a cost-effective way to expand our training dataset. Through this process, we discovered unrecorded solar farms, tracked the expansion of solar farms over time, and visualized heatmaps of solar farms across entire countries.

Using our model to detect new solar farms, as opposed to finding and drawing outlines around thousands of solar farms with a manual, broad-area search, was a much more cost-effective approach, and one that allowed us to scale our training set quickly. As the area of interest increases, the cost-benefit of using a deep learning model to label solar farms becomes very apparent.

Our model was also able to find several solar farms that weren’t yet recorded in any official solar databases like Wiki-Solar, and that don’t currently appear in various popular high-resolution web maps that are refreshed less often than Sentinel-2 imagery, such as the solar farm in Figure 6. Our model detected this solar farm in August 2018. The Google Maps imagery is from May 2013, before the solar farm was built. Having near real-time imagery is critical for generating accurate and timely information.

Figure 6: Predicted solar farm north of Milford, Utah that exists in near real-time imagery (taken August 2018), but not in Google Maps (taken May 2013).

We scored the model over the United States across several years, which allowed us to track the growth of individual solar farms over time. This is valuable information because an expanding farm can surpass its recorded solar power capacity if its records aren’t kept up to date. In particular, we observed the expansion of several farms in California, where local policy encourages the use of solar power, such as the farm in Figure 7.

Figure 7: Solar farm in California expanding between 2016 and 2018.
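
One simple way to quantify this kind of expansion is to convert the number of pixels predicted as solar farm into a footprint area for each year. The sketch below assumes 10 m pixels, matching the Sentinel-2 resolution cited earlier. The pixel counts are made up for illustration, and `footprint_hectares` is a hypothetical helper rather than part of our pipeline.

```python
PIXEL_SIZE_M = 10  # Sentinel-2 spatial resolution cited in this article


def footprint_hectares(pixel_count, pixel_size_m=PIXEL_SIZE_M):
    """Convert a count of predicted solar-farm pixels to hectares."""
    square_metres = pixel_count * pixel_size_m ** 2
    return square_metres / 10_000  # 1 hectare = 10,000 square metres


# Hypothetical predicted-pixel counts for a single farm, by year.
pixels_by_year = {2016: 12_000, 2017: 15_500, 2018: 21_000}
area_by_year = {year: footprint_hectares(n) for year, n in pixels_by_year.items()}
growth = area_by_year[2018] - area_by_year[2016]
```

Comparing `area_by_year` against the footprint implied by a farm's recorded capacity would flag records that may be out of date.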

When looking at a heatmap of our model predictions across the United States, we can see the widespread effects of state policy on solar installations. California and North Carolina are brightly lit in Figure 8 because state policy is causing the solar industry to grow quickly there, whereas other sunny states may not be taking full advantage of their solar resources.¹

Figure 8: Heatmap of U.S. model predictions with California and North Carolina lit up.
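
A heatmap like this one boils down to counting detections per grid cell. Here is a minimal sketch, assuming each prediction has been reduced to a (longitude, latitude) point. The coordinates are invented for illustration, and the one-degree cell size and the name `bin_detections` are arbitrary choices.

```python
import math
from collections import Counter


def bin_detections(points, cell_deg=1.0):
    """Count detections per lon/lat grid cell.

    points: iterable of (longitude, latitude) pairs in degrees.
    Returns a Counter keyed by (lon_index, lat_index) of each cell.
    """
    counts = Counter()
    for lon, lat in points:
        cell = (math.floor(lon / cell_deg), math.floor(lat / cell_deg))
        counts[cell] += 1
    return counts


# Made-up detection centroids: three in California, one in North Carolina.
detections = [(-119.2, 35.1), (-119.8, 35.9), (-118.4, 34.7), (-78.6, 35.8)]
density = bin_detections(detections)
```

Cells with higher counts are drawn brighter, which is why regions with many detected farms light up.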


Challenges

- Extrapolation: When we extrapolated our model to China, it produced several false positives and performed poorly in forested areas, where solar panels tended to blend in with their surroundings. The model performed much better in desert regions, where there was a high contrast between the panels and the surrounding land. Extrapolation is a common challenge in computer vision, and it can be reduced by taking the time to plan and curate a representative training set.

- Rare Target: Since we were initially creating our own training data manually by drawing polygons around known solar farms, we did not have many training samples in our dataset. And though the number of utility-scale solar farms is growing globally, there are still not many of these in existence, which makes these farms a very rare target. This requires careful thinking about how to extract as much value as possible out of the samples you do have, while at the same time feeding your model a somewhat realistic class balance and a diverse set of examples. For instance, we found that certain crop fields and tennis courts look similar to solar farms, so we ended up adding these as negative examples in subsequent iterations of our model.

- Data Quality: When using Earth-observing data, special attention should be paid to image quality. The Sentinel-2 imagery that we used in our study was not atmospherically corrected, since that correction takes a considerable amount of time. When energy from the sun is reflected off an object, it has to travel through a broad expanse of atmosphere to reach a satellite sensor, and the interference encountered along the way can degrade the image. In coastal areas, for example, heavy water vapor in the atmosphere results in blurry imagery, so our model had trouble picking out solar farms along coastlines. We also had problems with clouds covering or casting shadows over solar farms, so we had to filter cloudy images out of our training set.

- Scalability: We fed the model imagery over two of the world’s largest countries, the United States and China, which are 3,796,742 and 3,705,407 square miles, respectively. Furthermore, we trained the model using 9 spectral bands in the visible, near infrared, and shortwave infrared spectral range, so the sheer volume of data made scalability a constant design consideration throughout this study.² For the U.S., we also included imagery across several years to look at solar farm growth over time. In total, we processed 13 terabytes of imagery.
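
To get a feel for the scale involved, here is a back-of-envelope estimate of the raw data volume for a single full coverage of each country. It rests on simplifying assumptions that are not details of our pipeline: every one of the 9 bands is treated as 10 m resolution, each sample takes 2 bytes, and tile overlap and compression are ignored.

```python
SQ_KM_PER_SQ_MILE = 2.589988
PIXELS_PER_SQ_KM = (1000 // 10) ** 2  # 10 m pixels: 100 x 100 per square km


def raw_volume_terabytes(area_sq_miles, bands=9, bytes_per_sample=2):
    """Rough uncompressed size of one full coverage, in decimal terabytes.

    Assumes all bands share a 10 m grid and 2-byte samples (both assumptions).
    """
    pixels = area_sq_miles * SQ_KM_PER_SQ_MILE * PIXELS_PER_SQ_KM
    return pixels * bands * bytes_per_sample / 1e12


us_tb = raw_volume_terabytes(3_796_742)
china_tb = raw_volume_terabytes(3_705_407)
print(f"U.S.: {us_tb:.2f} TB, China: {china_tb:.2f} TB")
```

Under these assumptions, a single coverage of either country comes to a little under 2 TB, so several years of U.S. imagery plus China plausibly adds up to a total in the range quoted above. The real figure depends on band resolution, tile overlap, and compression.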

We were able to address these challenges by creating a scalable processing workflow and by utilizing the Earth-observing data science expertise of our staff. Visualization is a key component when working with Earth-observing imagery, and through the visualization of our model predictions, we were able to discover new solar farms that aren’t yet recorded in any pre-existing solar database, to track solar farms as they expanded over time, and to gain interesting insights related to solar policy when looking at a broad view of our predictions.

And as an added bonus, one of the finest achievements of our model was finding an adorable panda bear-shaped solar farm in China!

Figure 9: Adorable pandas in Datong, Shanxi, China. Model prediction is shown in yellow.

For questions or comments, please visit our website:

Many thanks to the other data scientists who worked on this project:
Dr. Kimberly Scott, Co-Founder & VP of Data Science
Jason Brown, Senior Data Scientist
Eric Culbertson, Data Scientist

Written by Courtney Whalen
Data Scientist


See the Earth as it could be. Astraea’s EarthAI platform provides a suite of products focused on removing the complexities of discovering, processing, and analyzing EO data at scale.
