The million-dollar question: What do we want Spyglass to be?
We began this week considering how to narrow the scope of our product and focus on a clear value proposition for a single beneficiary. These steps would help us prove our concepts and prototype more clearly. When it comes to open source social media, there are many options for analyzing the data collected: sentiment analysis on text in foreign languages, object recognition in images, network analysis of users and identification of trending topics, to name a few.
All of these contribute to our goal of enhancing force protection and situational awareness, albeit in different ways. Our current challenge is to identify the minimum capabilities required to provide value to a single target beneficiary and obtain feedback from those users. We need to decide who our first beta testers should be and which capability is most pressing for them. In a way, we have been trying to answer this question the entire semester, and we had hoped that our beneficiary and tech discovery would naturally lead us to an obvious conclusion. Now that we have concluded Week 7 of Hacking for Defense (H4D) without an obvious answer, it seems that we as a team might have to make some choices.
Our first hypothesis for this week postulated that there are datasets of tweets labeled by threat level that we could use to train a machine learning (ML) algorithm. We were unable to obtain a threat-based dataset due to clearance and procedural limitations, but we have a number of leads on alternative datasets. We looked into sentiment analysis as a proxy for threat analysis to prove the classification capability of our algorithm. Jordan built a prototype sentiment classification feature for Arabic.
For our starting point, we used a dataset of sentiment-tagged Arabic tweets from UC Irvine (UCI). The model was trained using a support vector machine (SVM). The current accuracy level is ~76%, though it is somewhat biased towards negative sentiment. According to their paper, the UCI researchers improved the accuracy of their model to 84% by incorporating a stemming algorithm that de-conjugates the Arabic text. While doable, this improvement would require extensive Arabic knowledge and manual labor. Rather than deepen our Arabic analysis capabilities, we are considering incorporating Russian as an additional language.
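The difference between overall accuracy and a bias towards one class is easier to see when recall is broken out per label. The sketch below is illustrative only: the labels and predictions are made-up placeholders, not our dataset or our results, and `per_class_recall` is a hypothetical helper name.

```python
from collections import Counter


def per_class_recall(y_true, y_pred):
    """Return (overall accuracy, {label: recall}) for two equal-length label lists."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and the same length")

    total = Counter()    # how many examples truly carry each label
    correct = Counter()  # how many of those the model got right
    for truth, guess in zip(y_true, y_pred):
        total[truth] += 1
        if truth == guess:
            correct[truth] += 1

    accuracy = sum(correct.values()) / len(y_true)
    recall = {label: correct[label] / total[label] for label in total}
    return accuracy, recall


# Placeholder data: a classifier that leans towards "neg".
y_true = ["neg"] * 5 + ["pos"] * 5
y_pred = ["neg"] * 5 + ["neg", "neg", "pos", "pos", "pos"]

accuracy, recall = per_class_recall(y_true, y_pred)
print(f"accuracy={accuracy:.2f}", recall)
```

In this toy case the headline accuracy looks reasonable, but the positive class is recovered noticeably less often than the negative one, which is the kind of skew a single accuracy figure hides.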
Our second hypothesis stated that the primary feature of Spyglass should be a real-time alert system that flags unusual activity (including threats) for forces on the ground. We found it difficult to validate this hypothesis with current soldiers in the conventional military who have deployed recently. Beneficiaries we spoke to either lacked experience with a combat deployment in a semi-permissive environment or had not deployed within the past 3 years. Understandably, it’s extremely difficult to get in touch with folks who are currently deployed abroad.
As a result, we are considering using AWG Operational Advisors as initial beta testers for our tool with real-time alerting capability. Even though AWG OAs are unlikely to be our target beneficiary, they may be able to provide valuable insights for Spyglass as beta testers. Furthermore, our latest H4D course lecture demonstrated the importance of winning buy-in from the organization. That could start with getting beta testers who are also in a position to demonstrate buy-in and support for Spyglass.
For our tech discovery, we visited one of Babel Street’s offices in Virginia and Palantir’s DC office in Georgetown. Babel Street is doing very detailed work on understanding language differences and local ontologies. Their lexicon translation database has been trained by linguists over several years, not only on meaning but also on sentiment analysis. Palantir’s Gotham platform is an advanced secure data aggregation and analysis tool. Palantir’s strategy focuses on solving their customers’ specific problems rather than providing generic software licenses. Our key takeaways from these visits are featured in our interview summaries below.
Landon Bell. Deployment Strategist, Palantir. Formerly Navy Special Forces.
- Getting users engaged: “The biggest problem we have with users is when they are disenfranchised from the outcome.” If we want users to buy in, they need to feel connected to the outcomes we are trying to achieve.
- Improving, not replacing, humans: Their product is designed to make humans more powerful, not automate their jobs away. The higher the stakes of your mission, the more you want to optimize away from the machine making a decision. Spyglass adheres to the same philosophy.
- Advice moving forward: Focus on capabilities, not features. The problem is almost impossible to narrow down too much, so try to create as specific a use case as possible to demonstrate the value of our product. Get it into those people’s hands and start testing ASAP.
- Buying in: Who needs us the most? Who is feeling the pain because they lack the capabilities we offer? Who is the decision maker who can allocate this money? We need to answer these questions if we realistically want to deploy our product.
A., CPT. Army Cybercom, Planner for CEMA Support to Corps and Below.
- Unlikely NTC can incorporate Spyglass in upcoming BCT experiments in April. We’re running cyber/social media experiments at the National Training Center (NTC) through a closed Insurgent Communications Network (ICN). The ICN runs on a closed fiber ring that applications over the open web can’t access.
- During the tests, our Information Warfare cell will replicate the opposition force posting things on social media (Facebook, Twitter) and the open web (eBay, etc.). We also have neutral people replicating the population posting things on those sites.
- OSINT teams scrape all of this information sent over the ICN, provide tippers to the brigade, and tell the BCTs what to be on the lookout for.
G., CPT. Regimental Irregular Warfare Operations Officer, National Training Center (NTC).
- The ICN infrastructure and hardware at NTC is clunky and not that accessible. “You have to be physically located at NTC to be in the ICN.”
- NTC is upgrading to a new closed network called ION over the next 6–8 months. With the NIPR backbone, ION will be remotely accessible, including to people outside the training centers. ION will be maintained by Army Cybercom, whereas ICN is maintained by NTC’s in-house communications shop.
- NTC replicates the environment of countries all over the world to provide a framework for its roleplaying simulations. These show how conflicts would evolve based on factors that are introduced. BCT rotations are 10 days long, so sometimes developments that might take months or years in the real world are shortened.
Michael Lumpkin, former Special Envoy and Coordinator of the Global Engagement Center (GEC) at the U.S. Department of State.
- GEC’s strategy is focused on partner-driven messaging and data analytics for effective counter-messaging: GEC is an interagency entity, housed at the State Department and charged with coordinating U.S. counterterrorism messaging to foreign audiences. Primary mission is to counter messaging of violent extremism from terrorist organizations including ISIL.
- GEC has a chief privacy officer to carefully ensure compliance: Through the National Defense Authorization Act of 2015–2016, GEC was given a “privacy carve-out” from a strict interpretation of the 1974 Privacy Act, which regulates data collection. GEC also established a social media fusion cell with representatives from partner nations in order to abide by the privacy laws of those nations.
- Obstacles to effective social media counter-messaging are organizational resistance to innovation and to the adoption of new tools. DOD suffers from the “not invented here” bias: if it’s not created in a given organization, it’s not good enough. One way to change this is through high-level buy-in from the decision makers: “breaking the rules, but having a plan.” How does Spyglass fit into the plan?
- Keys to effective buy-in and tool adoption: modularity, adaptability, and ease of integration with existing systems to lower the barrier to adoption. There are 3,800 different tools focusing on social media analytics, so know your unique value add and ensure that there is funding for longer-term adoption.
Brendan Huff, Andy Pessaud, Jeff Chapman of Babel Street.
- Babel Street is a powerful text analytics platform that enables search of open source intelligence, supports 55,000 news sources and blogs, 30 social media sources/channels, and the Dark Web. It breaks the language barrier by enabling search of terms in English and traversing 250 languages.
- Babel Street’s ontology uses an entity — not a word — as the base unit. It understands entities across languages, vernaculars, slang, and even misspellings. This level of nuance has been attained through 10 years of human translation and curation of machine learning algorithm results by linguists.
- Babel Street has many advanced features: granular sentiment detection, conversation volume tracking, and event mapping, to name a few. There is threat detection capability and premium alerts that can be set up for customers. Babel Street is working on building a mobile app and integration with mobile platforms.
- Compliance with privacy regulations and Terms of Service (ToS) contracts is paramount for tools that analyze data. Babel Street has a privacy officer to ensure full compliance with the laws and ToSs of social networks and data providers, such as Twitter. Twitter is not shy about terminating access to its data for entities that violate its ToS, a fate that has affected some of Babel Street’s competitors. A GNIP license must be purchased by anyone commercializing social media data analysis. Does Spyglass need this?
- “Data is the next water.” Everybody wants to get into the data analytics game. Few actors “own” data and provide it by licensing its use to data analytics companies. Opportunities exist for Spyglass to “plug in” to Babel Street to use their language capability, but doing so runs into potential data ownership problems.
Sgt. Jon Gillis, Marine Corps Warfighting Lab.
- The biggest challenge to adoption of Spyglass will happen at handoff: can the DoD/military/etc. handle deployment of our prototype and productionizing on their own systems? As it stands today, probably not.
- ATAK is not suitable for large data transfer; Nett Warrior is 3–5 years away.
- One way to work around the DOD’s acquisition problems is to bundle cheap and effective solutions along with other acquisitions.
B., SPC. U.S. Army Cyber Operations Specialist.
- Cultural impediments: Cyber is sometimes viewed as “the other army.” There is a leadership culture that doesn’t always understand the mission. Developing understanding of how our product affects conventional operations is critical to getting buy-in.
- Not a key beneficiary: Most guys in his area will not be deployed and are assigned to support different central commands as needed. Many work in a highly classified environment and will not likely benefit directly from our product.
- Geoinferencing workaround: Even if only 2% of social media is geotagged, that’s still a lot. Over time we could perform trend analysis and use those observations to infer the location of non-geotagged tweets that fit the pattern.
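One very rough way to picture the geoinferencing idea above: learn which terms show up in geotagged tweets from each place, then let those terms vote on the likely place of a tweet with no geotag. This is only a sketch under simplifying assumptions, not something we have built. The place names, sample tweets, function names, and the `min_votes` threshold are all hypothetical, and a real system would need stopword removal, language handling, and far more data.

```python
from collections import Counter, defaultdict


def build_term_locations(geotagged):
    """Map each term to a Counter of places it was seen in.

    `geotagged` is an iterable of (text, place) pairs from tweets that
    carry a geotag.
    """
    term_locations = defaultdict(Counter)
    for text, place in geotagged:
        # Count each term once per tweet so repeated words do not dominate.
        for term in set(text.lower().split()):
            term_locations[term][place] += 1
    return term_locations


def infer_place(text, term_locations, min_votes=2):
    """Guess a place for a non-geotagged tweet, or None if evidence is weak."""
    votes = Counter()
    for term in set(text.lower().split()):
        votes.update(term_locations.get(term, {}))
    if not votes:
        return None
    place, count = votes.most_common(1)[0]
    return place if count >= min_votes else None


# Made-up examples for illustration only.
geotagged = [
    ("traffic jam near the old bridge", "city_a"),
    ("market by the old bridge is crowded", "city_a"),
    ("power outage at the harbor again", "city_b"),
    ("fishing boats leaving the harbor", "city_b"),
]
term_locations = build_term_locations(geotagged)

guess_bridge = infer_place("protest forming at the old bridge", term_locations)
guess_harbor = infer_place("harbor crane collapsed", term_locations)
guess_unknown = infer_place("hello world", term_locations)
print(guess_bridge, guess_harbor, guess_unknown)
```

The `min_votes` cutoff is there so that a single shared word does not produce a confident-looking guess; tweets with too little overlap are left unlabeled.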
G., Special Agent, Internal Security Division, Department of Homeland Security.
- Buy-in at DHS would be at the agency level. The Department of Homeland Security does not integrate all of its agencies’ decision-making procedures. In order to get buy-in from Customs and Border Protection, the Secret Service, or the Coast Guard, it will be necessary to talk to every agency independently.
- Currently the Internal Security Division would be interested in a system that protects its facilities and key DHS personnel at public events. Currently these targets are considered low risk. There might be interesting opportunities concerning Customs and Border Protection.