“Data is the next water”

Katerina Sedova
Published in Spyglass
Mar 19, 2017

This week we began to look toward the horizon and glimpse the future of Spyglass beyond the H4D class. We built on last week’s work exploring buy-in and the famously complex DOD procurement process. A key part of our beneficiary discovery this week was understanding how the AWG acquires technology. Our second goal was to home in on a single use case where we can add value with a meaningful solution while differentiating ourselves in a crowded social media data analytics space.

As one of our interviewees pointed out, “Data is the next water,” and there are already 3,800 companies working in this space. Throughout the H4D journey, our beneficiary and tech discovery has surfaced a number of aspects that are critical to situational awareness in semi-permissive environments, from text analysis of directly threatening posts to atmospherics and general sentiment analysis. This week was no exception. Our interviews further reinforced what we have been learning: nuanced text analysis is hard, a lot of brilliant people are working on it, the space is already crowded, and there are companies, like Babel Street, that do it exceptionally well in 250 languages. As we touched on in last week’s post, we continue to look for points of integration to augment our current text analysis capability. We are also making progress on threat classification: thanks to our new friends at Babel Street, this week we received information that we can incorporate into our engine. We also got a lead on a list of threatening code words in English that we can translate and use to train the Spyglass algorithm. We are still hunting for an expert-tagged training dataset of threatening and non-threatening tweets, and tracking one down continues to be a challenge.

Moreover, we have made advances in image recognition, and we believe this analysis is one way Spyglass can differentiate itself. If we can identify pictures of US personnel posted on social media, analyze the associated text for sentiment or threat terms, evaluate the source of the tweet, and flag an alert, we could add value in a specific scenario: alerting US personnel that they may be targets of crowd-sourced attacks. For example, we could give AWG the capability to train our algorithm on pictures of specific individuals who are about to deploy or are already operating in semi-permissive environments, along with a means to manage that set as people enter and leave the service.
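The alerting flow described above can be sketched as a simple decision rule. This is only an illustration under stated assumptions, not the Spyglass engine: the `Post` fields, the `THREAT_TERMS` list, and the rule that all three signals must agree are hypothetical placeholders for the face matcher, text analysis, and source evaluation we are still working out.

```python
from dataclasses import dataclass

# Hypothetical threat vocabulary; a real list would come from subject-matter experts.
THREAT_TERMS = {"attack", "target", "kill"}


@dataclass
class Post:
    text: str
    face_match: bool      # assumed output of an upstream face matcher
    source_trusted: bool  # assumed output of whatever source evaluation is used


def threat_term_hits(text: str) -> int:
    """Count distinct threat terms that appear as whole words in the text."""
    words = {w.strip(".,!?:;\"'").lower() for w in text.split()}
    return len(words & THREAT_TERMS)


def should_alert(post: Post) -> bool:
    """Flag a post only if it shows a tracked person, contains threat
    language, and comes from a source that is not already trusted."""
    return (
        post.face_match
        and threat_term_hits(post.text) > 0
        and not post.source_trusted
    )
```

Requiring all three signals keeps the alert volume low at the cost of missing posts where only one signal fires; a weighted score would be the obvious alternative.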

OpenFace neural network training on Steve Blank

To explore this hypothesis, this week Jose started experimenting with an open source technology called OpenFace, which provides face recognition with deep neural networks. With any luck, we will have something relevant to demo in class on Monday, so tune in!
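Face recognition systems of this kind typically turn each face into a fixed-length embedding vector and treat two faces as the same person when their vectors are close. The sketch below shows only that matching step on toy three-dimensional vectors. It does not call OpenFace; the function names, the gallery layout, and the `threshold` value are assumptions made for illustration.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_match(probe, gallery, threshold=1.0):
    """Return the name of the closest enrolled embedding, or None if no
    enrolled embedding is closer than the threshold."""
    name, dist = None, float("inf")
    for candidate, embedding in gallery.items():
        d = squared_l2(probe, embedding)
        if d < dist:
            name, dist = candidate, d
    return name if dist < threshold else None


# Toy gallery: in practice each vector would come from the face network.
gallery = {
    "person_a": [1.0, 0.0, 0.0],
    "person_b": [0.0, 1.0, 0.0],
}
print(best_match([0.9, 0.1, 0.0], gallery))
print(best_match([0.0, 0.0, 1.0], gallery))
```

Managing the gallery as people enter and leave the service then amounts to adding and removing entries in that dictionary.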

Finally, much of our learning this week came from getting out of the building to visit Haystax Technology, which specializes in monitoring security threats for large events, municipalities, and companies. Jose and Katya joined the Haystax team at their regular “Lunch and Learn” meeting, where the whole company takes a few hours over lunch to meet budding entrepreneurs, learn about the problems they are solving, and find ways to help. To make the most of the visit, we gave a presentation to the whole Haystax team, our first such presentation outside of class. One of the key points of feedback echoed what we have been learning: image recognition could be a point of differentiation for us, if Spyglass can do it well.

With some of the executive staff at Haystax Technology

Bryan Ware and his executive team were also quite generous in letting us try their product interface, Constellation, and the Haystax threat streams application, which overlays a stream of tweets on a map of a particular location. Over the next week, we will investigate whether we can use the Constellation system without going through the Haystax algorithm. If this is possible, we can have a UI solution for the MVP and let Jordan do what he does best: work on the machine learning algorithm.

Constellation map and threat streams apps

In addition, we received a good summary of the Twitter terms of use and contacts at Twitter to follow up with on GNIP data licensing. Another key takeaway from our visit to Haystax came up organically as we touched on possible dual use of Spyglass. Based on our interviews at Haystax, we believe the market opportunity for a pure social media solution for corporate security customers may be shaky, as companies are typically unwilling to pay more than $10K per license annually. This confirms our direction so far: not to pursue a dual-use scenario as a primary goal, and to focus on delivering a great solution to AWG.

The summaries below detail the additional takeaways from our interviews this week.

A., Seafront Analytics

  • Making technology stand out: There’s a ton of technology out there, but it takes too long to go from “zero to hero.” A lightweight tool with a 30–60 minute learning curve would provide value.
  • Operational focus: Folks, especially on the conventional side, don’t have a lot of money to spend, especially on non-operational things. They would rather get more value out of the tools they already have than buy new ones.
  • Uplifting analysts: Most clients want tools that will empower their analysts, not provide answers for them. (How do we distill 200,000 posts to 200 and let the analyst provide the analytical framework for the remaining data?)
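One way to read the “200,000 posts to 200” question is as a ranking problem: score every post, keep the top few hundred, and hand those to the analyst. The sketch below assumes a deliberately naive keyword count as the score; `keyword_score` and its keyword list are hypothetical stand-ins for whatever relevance model is actually used.

```python
import heapq


def keyword_score(post, keywords=("convoy", "checkpoint")):
    """Naive relevance score: total occurrences of the (hypothetical) keywords."""
    text = post.lower()
    return sum(text.count(kw) for kw in keywords)


def top_posts(posts, score=keyword_score, k=200):
    """Keep the k highest-scoring posts, highest first, without sorting everything."""
    return heapq.nlargest(k, posts, key=score)


posts = [
    "Convoy near the checkpoint",
    "nice weather",
    "checkpoint checkpoint convoy",
]
print(top_posts(posts, k=2))
```

The tool narrows the pile; the judgment about what the remaining posts mean stays with the analyst.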

K., The Penrose Corporation

  • Cost problems: In spite of the multitude of technology products in the OSINT space, no one has introduced a cost structure that suits the DOD’s preference for enterprise solutions.
  • Managing client expectations: Focus on what is doable given the constraints (time, money, technology) and communicate that to stakeholders. Articulate what we can achieve now and where we could get to given X amount of time. Get a commander comfortable with our interim solution and get to the point where they tell us “we would be a customer if.”
  • Veracity of information: If, in the conduct of normal duties, you are able to push social media information to soldiers who are in a position to corroborate it, thus lending veracity to an account, that is a value add. It is important that we can explain the risks and flaws of our model to a commander; the math that powers it will lend credibility to our efforts.

J., Researcher at the National Institute of Standards and Technology

  • Expert on data mining of tweets and on multilingual and geographic information. Analysts won’t be able to provide real tweets, but she can create 25 fictitious threatening tweets, which could be used for language analysis.
  • Threats might use coded language, so more important than plain language analysis is combined threat and coded-language analysis: identifying the contexts in which “eggplant” means a vegetable and those in which it means a bomb.
  • Methods used for geo-inferencing include social network analysis (who is talking to whom); locations mentioned in tweets; unique locations or monuments found in only one place (such as pilgrimage sites); events that people are talking about; and dialects spoken in different places, which can help identify the country.
  • If we go the route of image recognition, we could use whether a picture shows night, day, dawn, or dusk to help infer where an outdoor picture was taken.
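The “eggplant” point above can be illustrated with a toy context check: look at the words surrounding a possible code word and see which way they lean. Everything here is a labeled assumption, including the two context word lists, the function names, and the simple majority rule; a real system would learn these associations from tagged data rather than hand-written sets.

```python
# Hypothetical context vocabularies, invented purely for illustration.
BENIGN_CONTEXT = {"recipe", "grill", "dinner", "market", "cook"}
THREAT_CONTEXT = {"deliver", "package", "convoy", "tonight", "gate"}


def tokenize(text):
    """Lowercase whitespace tokens with surrounding punctuation stripped."""
    return [w.strip(".,!?:;\"'").lower() for w in text.split()]


def classify_usage(text, code_word="eggplant"):
    """Guess whether a possible code word is used literally or as code,
    based on which context vocabulary co-occurs with it more often."""
    tokens = tokenize(text)
    if code_word not in tokens:
        return "absent"
    benign = sum(t in BENIGN_CONTEXT for t in tokens)
    threat = sum(t in THREAT_CONTEXT for t in tokens)
    if threat > benign:
        return "possible code"
    if benign > threat:
        return "literal"
    return "unclear"


print(classify_usage("Great eggplant recipe for dinner"))
print(classify_usage("Deliver the eggplant to the gate tonight."))
```

The “unclear” outcome matters as much as the other two: those are the posts an analyst should look at first.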

J., Sr. Operation Specialist, and K., Sr. Military Adviser, AWG

  • The Technology Readiness Level (TRL) is a scale the DOD uses to categorize new technology that is not yet implemented; it helps define whether money goes to R&D or to purchases.
  • At levels 1 to 3, the DOD usually does not invest money but expects the work to be done as part of Industry Research and Development. The DOD will consider investing after a prototype is presented through a solicitation process.
  • There are two options for purchasing a new technology: a procurement process for wide distribution and a fielded purchase for limited distribution. Procurement means that you can not only buy the technology but also field it for worldwide use, with safety confirmations so that soldiers can use it in the field. Procurement is a longer and more complicated process.
  • If we provide a solution by the end of the class and AWG likes it, they will have to do further due diligence to decide how to proceed. If the AWG commander accepts the solution, AWG could write the requirements and either sole-source it or open it for bidding. If it is to be accepted for Army-wide use, AWG would have to present it and seek further guidance from the chain of command. Step 1 is presenting the final product to the leadership.

Haystax Technology Site Visit

B., co-founder and CEO; J., VP of Product Management; C., Director of Product Marketing; R., Director of Product Engineering; P., Solutions Architect; R., CTO; C., Executive VP of Product

  • Haystax’s use case: “I am responsible for the security of a very large place, a city or corporation. I need to monitor anything that creates a problem.” Clients include major sporting events, concerts, and large municipalities. This includes algorithms for insider threats, which draw on human resources, social media, and network security data and connect risk into the model.
  • The main platform, the Constellation system, is a multi-tenant cloud-based platform. It contains a series of apps: Assets (facilities, buildings of interest); Incidents (an incident log; users can subscribe to email alerts when new incidents are created); Map (shows all of the data in the system on a map, including assets, incidents, and threat streams [Twitter, RSS]); and Threat Streams (a three-column view of real-time streaming data feed analysis from Twitter and RSS). Through the demo license, we can configure our own column views and add our own RSS feeds. We can also post items to the API to populate our own feed. A Spyglass tenant has been created for us, including a sample asset and a sample incident, which we can explore. A “how to” deep dive into the system is available upon request.
  • Training data: Look for examples of social media threats and examples of what that awareness looks like for your beneficiaries; this might uncover untold needs. Try designing 30 threatening tweets and using them as a training data set. Haystax can give us a list of threatening code words in English, which we can potentially translate.
  • The use case for global corporate security is there, but limited. One of their customers during the Arab Spring was not getting a full picture from social media, and by the time the customer started its evacuation, it was under fire. Spyglass could have helped them make a decision sooner. However, the commercial market for our project might be limited to big corporations, which will pay at most around $10,000 per year for the service.
  • There are many restrictions on the use of Twitter for law enforcement or security. Use by law enforcement is not permitted except to serve public safety use cases as outlined below. Use of the product to detect potential violent threats against the public is allowable if law enforcement is acting at the behest of the event organizer to provide security. Other law enforcement uses to detect potential violent threats against the public during large public events are allowable ONLY IF the event has broad public access and impact (e.g. large sporting events, city marathons, concerts, academic or corporate conferences) and the event is not political in nature (e.g. a protest, rally, community organizing meeting). For law enforcement, government, and public safety customers, named entities should not be people, and sources (@handles) should not be reported for tweets.
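The last restriction above, that @handles should not be reported, is the one piece of these terms that maps directly onto a small preprocessing step. The sketch below is a minimal illustration, assuming handles consist of letters, digits, and underscores after an `@`; the function name and placeholder text are hypothetical, and it does nothing about the separate rule that named entities should not be people.

```python
import re

# An "@" not preceded by a word character (so email addresses are left alone),
# followed by one or more letters, digits, or underscores.
HANDLE = re.compile(r"(?<![A-Za-z0-9_])@[A-Za-z0-9_]+")


def redact_handles(text, placeholder="@[redacted]"):
    """Replace every @handle in the text with a fixed placeholder."""
    return HANDLE.sub(placeholder, text)


print(redact_handles("Seen by @some_user near the gate"))
```

Whether redaction alone satisfies the terms of use is a question for the Twitter contacts, not something this snippet settles.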

Overall, it has been a great week of learning. Homing in on a use case is an important milestone for us. We are doubling down on image recognition as a way to add value toward our problem sponsor’s goals and as a point of differentiation for Spyglass. Next week we will test the hypothesis that this use case is the right direction.

For now, Spyglass is a drop in the ocean of data and social media data analytics solutions. Hopefully, one day it can be a formidable tide in this ocean.
