FAQ

Christopher W. Beitel, Ph.D.
Project Clarify
Dec 4, 2019 · 11 min read

This is where we collect questions people frequently ask us about the project.

State Understanding. Why should we seek to understand the mental state (cognitive or emotional) of our training experience participants? Why does state matter in regard to mental training? Why does state matter in professional contexts or in relationships?

It’s clear to most people that one’s task performance depends significantly on one’s mental state, such as one’s level of focus vs. distraction, relaxation vs. agitation, or happiness vs. sadness. Because of this, task performance also depends on one’s ability to perceive and regulate that state. When the task is a group task, the interpersonal variants of perception and regulation become relevant as well. The success of organizations working to solve important world problems turns on the performance of their members, and thus in turn on these ways in which individuals and groups relate to state.

Benefit to Mental Training. How is Project Clarify useful to the task of building mental training experiences that really help make people more effective?

Real-time state understanding will benefit mental training by enabling us to (1) improve people’s state awareness and regulatory ability beforehand through dedicated training, (2) provide feedback that helps keep people in their best state for training, and (3) adapt training experiences based on a user’s state. State understanding will further benefit studies of training experiences by (1) reducing the number of samples needed, since it lets us control for the variable of participant state, and (2) improving our ability to interpret results, since state annotations are passively collected both acutely (pre-, during, and post-training) and over the duration of the study. Even more transformatively, our state understanding methods will enable an advance in the state of the art in understanding cortical sensors: (1) by providing a means of distilling representational capability from a modality where data is abundant (e.g. visual) to one where it is not (e.g. cortical), (2) by providing partial supervision to otherwise self-supervised learning, such as learning from the correspondence of modalities, and (3) by providing an extensible foundation of representational capability that can subsequently be tailored to specific application contexts (see Transfer Learning, below).

Novelty. How is this novel? Are you just reproducing some existing work or is this actually something new?

Learning abstract representations of expression state is itself a very new research area in machine learning. To our knowledge, learning from human annotations of abstract expression similarity (in contrast to semantic labels of emotion or other state) has not yet been done in the context of video, audio, or the various neural sensing modalities. Also to our knowledge, our effort to use these methods as a partial supervisory signal for otherwise self-supervised learning of state representations (e.g. from the correspondence of the audio and video tracks of spoken video) is brand new. Lastly, the application of representations learned in any of these ways to real-time state feedback (whether for boosting meta-cognitive awareness and regulatory skills, supporting the maintenance of a target state, or otherwise) is completely new.
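To make the similarity-annotation idea concrete, here is a minimal sketch (an assumption about how such a loss could look, not the project's actual implementation) of a triplet loss over embedding vectors, where a human rater has judged which of two expressions is more similar to an anchor:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge-style triplet loss on embedding vectors.

    `positive` is the example a human rater judged more similar to
    `anchor`; `negative` is the one judged less similar. Minimizing this
    pulls the similar pair together and pushes the dissimilar pair apart
    by at least `margin`, without ever naming a semantic emotion label.
    """
    d_pos = np.sum((anchor - positive) ** 2, axis=-1)  # anchor-to-similar distance
    d_neg = np.sum((anchor - negative) ** 2, axis=-1)  # anchor-to-dissimilar distance
    return np.maximum(d_pos - d_neg + margin, 0.0)
```

When the embeddings already respect the rater’s judgment the loss is zero; otherwise it is positive, and its gradient reorders the points.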

Commercial Alternatives. Can’t we just pay some company with like 50 engineers who are already doing this kind of thing to do this for us? Is there really no off-the-shelf option?

Here it’s important to specify precisely what it is we are considering a commercial alternative to. There are indeed existing commercial options, of varying levels of effectiveness, for predicting semantic emotion labels. That is not a goal of this project. Our primary objective is to improve mental training, and studies of mental training interventions, by developing means of representing state that are as nuanced and informative as possible. The precision with which our representations distinguish closely related states limits how much of that ability to distinguish we can pass on to users through training; indeed, it limits all of the benefits of such representations outlined above (Benefit to Mental Training). Towards this end we are pursuing a progression of research that will let us improve these representations well beyond what is, to our knowledge, available commercially: first with semi-self-supervised learning from massive audio and video data sets, and next with that in conjunction with emerging methods for understanding cortical sensors.

Another issue is that the question of whether a commercial alternative exists often comes with the implied assumption that reproducing or exceeding the representational capabilities of commercial options, and putting these into production ourselves, is hard. It is not; see Immediate Availability and Labor Feasibility below.

Last, advancing the capabilities of the academic research community in the realm of human state understanding is important in its own right. As academics, it is our responsibility to consider whether our approach to conducting research is in the best interest of the community. Not only (1) do our current methods exceed the capabilities of all commercial alternatives we know of, and (2) does having our own capabilities for visual state representation play an important role in our strategy for representing cortical modalities, but (3) enabling our peers to build upon this work is itself an important service to the community.

Immediate Availability. What is available now? What has been accomplished already and is to the point that we can use it in support of our projects now?

A state understanding model that works well with either single frames or real-time frame sequences is available now. You can play with this model yourself (right now) by launching the alpha demo notebook on Colab, here. If you’re one of our internal members, you can easily be granted credentials for making requests to a GPU-backed deployment of this model served with Kubeflow’s TFServing component, enabling near real-time inference over a standard-quality internet connection. Internal members can also read about our qualitative and quantitative results in our project working doc (sharing this more broadly is a high priority; only time constraints have delayed it).

Governance. Is Project Clarify a community project governed by its members or does it reside within the governance of UCSF? How can our organization trust that our needs for the platform and capabilities will continue to be met as the project changes over time?

Project Clarify is an Apache 2.0 licensed open source project and we make decisions as a community. There are three primary venues for participating in this decision-making. One is the monthly session in which each of our weekly SIG meetings (i.e. SIG-ML, SIG-Platform, SIG-FX (feedback experiences), and SIG-PM) adopts an intentionally higher-level strategic perspective. Another is our regular gatherings, i.e. beginning-of-quarter hackathons and end-of-quarter demo days. The third is the classic open source model of “propose, discuss, and build”: anyone is welcome to file or comment on an issue (here), including regarding preferences of prioritization.

Diversity. Some people are just not that expressive. And some people are just very different perhaps on account of their neurology or significant life experiences. Will these methods work with someone with more or less monotonic facial expressions?

We need to characterize, study, and if necessary improve this over time. Our initial focus is on further developing these capabilities (both for visual and neural modalities) using data from expression-typical individuals. It is plausible that non-typical individuals will be able to use EQ training experiences enabled by our core visual state understanding capabilities to learn to express a greater range of emotion, if they so desire. It is further plausible that the capability to represent the states of typical individuals from visual and cortical modalities will enable research on the scientific basis of non-typical patterns of expressiveness (e.g. identifying and further studying non-typical cortical events against an established baseline of representations of typical ones).

Cortical Modalities / Neurotech. I’m most interested in EEG, fNIRS, ECoG, or generally something other than image, video, or audio understanding. How has the program been designed around these modalities? Is our strategy novel and highly promising in these regards?

A substantial challenge in understanding data from cortical and other brain sensors is modeling (1) the sources of noise, (2) the dynamics by which activity propagates to sensors, and (3) the features, and complexes of features, relevant to these modalities that human perceptual systems have not evolved to perceive. Learning from a mixture of human annotations and unlabeled properties of data (i.e. self-supervised learning) on observable modalities is promising in regard to all three. For the first, where a key source of noise is itself being learned and represented (e.g. movements of facial muscles), these representations put us in a very favorable position to learn noise-independent representations of cortical sensor modalities when some form of visual/cortical correspondence loss is included. For the second and third, deep learning has the potential to learn these from data, and the supervision from visual modalities just described positions us to do so with much less neural sensor data. This matters because cortical modalities pose a double challenge: (1) very high dimensionality, and (2) compared to visual modalities, very high difficulty of obtaining large data sets.
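To illustrate what a visual/cortical correspondence loss might look like, here is a hypothetical sketch in the style of batch-contrastive losses (such as InfoNCE); the function name and shapes are illustrative assumptions, not our production code:

```python
import numpy as np

def correspondence_loss(visual, cortical, temperature=0.1):
    """Contrastive correspondence loss over a batch of paired embeddings.

    Row i of `visual` and row i of `cortical` come from the same moment
    in time; the loss asks each visual embedding to identify its cortical
    partner among the batch, pushing the two modalities into a shared space.
    """
    # L2-normalize each modality's embeddings
    v = visual / np.linalg.norm(visual, axis=1, keepdims=True)
    c = cortical / np.linalg.norm(cortical, axis=1, keepdims=True)
    logits = v @ c.T / temperature                 # pairwise cosine similarities
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))            # matching pairs are on the diagonal
```

Aligned pairs yield a near-zero loss; misaligned pairs yield a large one, which is the gradient signal that lets abundant visual data supervise scarce cortical data.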

Transfer Learning. Someone said something about this being useful for transfer learning, can you explain?

Transfer learning is a classic method in deep learning: a network is trained to perform one task, then trained further to perform another, often related but more specialized, task. For example, one might develop a general visual system before specifically learning to perceive the depth and trajectory of a ping pong ball during a game of table tennis. Our representations are not only useful for feedback, adaptation, and direct characterization (as described above in Benefit to Mental Training); they also provide a foundation for transfer learning, including both (1) transfer to learning similar abstract representations in specialized contexts, such as representations personalized for a single individual or tuned to a narrow context like the expressions engineers tend to make when working in offices, and (2) transfer to learning to predict labels of interest, such as those assigned by expert clinicians.
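The pattern can be sketched in miniature (purely illustrative: the frozen random projection stands in for a network pretrained on an abundant-data task, and the labels are a toy stand-in for a specialized task):

```python
import numpy as np

rng = np.random.default_rng(0)

# "Pretrained" feature extractor: frozen weights standing in for a
# network trained beforehand on an abundant-data task.
W_base = rng.normal(size=(4, 8))

def features(x):
    # Frozen base: used for inference only, never updated during transfer.
    return np.tanh(x @ W_base)

# Small specialized dataset for the new task (toy labels).
X = rng.normal(size=(64, 4))
y = (X[:, 0] > 0).astype(float)

# Train only a new head (logistic regression) on the frozen features.
w, b = np.zeros(8), 0.0
F = features(X)
for _ in range(500):
    p = 1 / (1 + np.exp(-(F @ w + b)))   # sigmoid predictions
    grad = p - y
    w -= 0.1 * F.T @ grad / len(y)       # gradient step on head weights only
    b -= 0.1 * grad.mean()

accuracy = ((F @ w + b > 0) == (y > 0.5)).mean()
```

Only the small head is fit to the specialized data; the representational capability lives in the frozen base, which is exactly the role our state representations are intended to play.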

They go high, we go local. The application I have in mind needs very high inference speeds (e.g. because the style of feedback or adaptivity depends on it) and needs to work with high-resolution video (e.g. in the course of characterizing the state of a group of interacting study participants) or other high-content neural sensor data types. Can that be supported?

This problem is much easier to solve than one might imagine: simply run the models on a local machine with a modern graphics accelerator.

Interpretability. I’ve heard that deep learning is not interpretable. Is what you’re doing interpretable?

Human states are themselves hard to interpret, and our methods provide a solution to that interpretability problem (and, in doing so, a means to help people become more effective and healthier). More directly in regard to model interpretability: the representations learned by our models can easily be interpreted by mapping labeled examples into the representation spaces learned from unlabeled examples. Our models can also be extensively characterized in terms of how well they predict annotations assigned by human annotators, allowing us to interpret their performance (including any failure modes) relative to that of human annotators. Lastly, related to the discussion above in Transfer Learning, our models can be used as the basis for co- or transfer-learning the capability to parameterize an existing simplified mechanistic model that one might consider more intuitive.
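Mapping labeled examples into a learned space can be as simple as a nearest-neighbor lookup. A hypothetical sketch (function and label names are made up for illustration):

```python
import numpy as np

def interpret(query, labeled_points, labels, k=3):
    """Interpret an embedded query by majority vote among the k nearest
    labeled examples in the learned representation space."""
    dists = np.linalg.norm(labeled_points - query, axis=1)  # distances to labeled set
    nearest = np.argsort(dists)[:k]                          # indices of k closest
    votes = [labels[i] for i in nearest]
    return max(set(votes), key=votes.count)                  # majority label
```

A small pool of labeled examples thus renders an otherwise unlabeled embedding space legible, without the labels having been used in training.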

Labor feasibility. Do you have the people necessary to succeed with a project like this? The Commercial Alternatives question above mentioned companies with like 50 engineers; is a team that size really necessary?

A single researcher-developer went soup to nuts to the alpha demo in about four weeks, so no, it is not accurate that a large team (10+) is necessary for a project like this. The tasks of the project are organized as a core trunk and auxiliary branches. The trunk is narrow enough to be carried by 1–3 talented researcher-developers while still providing great benefit. Everything else is non-essential but potentially very beneficial, and is intended to be completed by our network of contributors according to their level of interest. See also.

Platform Novelty. What’s different about your vision for a combined mobile randomized clinical trial (RCT) and experience development platform? There already are commercial mobile RCT platforms.

The front-end application we have designed (see ai4hi.org/interface-design) isn’t that hard to implement (most of it is already built) and will provide the means to rapidly pilot ML-first, state-aware training experiences. We are designing the platform to support engagement and habit formation, an essential problem in studies with digital therapeutics, including via a Chrome extension that will provide opt-out browser persistence for drivers of engagement (but with repeated opt-in for anything private, like webcam access).

Expense. Isn’t this kind of research really expensive? Can we afford this? What is the trade-off between doing this on the cloud vs. on premises?

This sort of research is actually not as expensive as you might imagine. Our cloud budget sits comfortably under $60k with both research and production expenses included.

Usability. Sometimes academics write software packages that people can’t use because they’re not well designed or flat out don’t work. Or sometimes they work under narrow circumstances and those circumstances aren’t communicated. Or sometimes they work fine but they’re not properly documented and too complicated to figure out without that. Will Project Clarify be usable by people other than its core developers? Will the project be able to continue if one of its core developers leaves to do something else?

There are three important points here: (1) our non-core developers and internal users, most notably new ones, are a very effective usability test group, (2) we know what we’re doing, and (3) there is a clear mechanism, GitHub issues, for anyone who hits a usability problem to tell us. On the second point: we develop with industry-standard best practices for testing, containerization, and scalable deployment (Kubernetes). Usability is important to us, so if you have any specific usability issue, please file a GitHub issue so we can be aware of your perspective.

Origin Story. Where did this project come from?

Christopher founded the project while taking time off from his career to work on open source and topics of interest (mindfulness/meditation, EQ, human effectiveness, computer vision, neuroscience, and generally self-supervised learning). Adam then hired him as Director of ML Research at Neuroscape to bring the project to bear on Neuroscape’s goals and gave him the time and independence to succeed. One of the FAANG companies has provided us a really valuable grant of computational resources and it would have been pretty hard to succeed without that.

Open Source. Most academic research projects are not conducted with this level of openness. People generally like to get their papers and grants before sharing. Why do we need an open community effort around this topic?

It seems unethical not to, given the potential value of these methods for improving people’s lives, and given (1) the scope of work that could be performed and (2) the benefit peers can derive from sharing core infrastructure and capabilities while still benefiting just as much individually (read: papers and grants) by simply all working at a higher level of abstraction and capability.

Christopher W. Beitel, Ph.D.

Director of ML Research @ UCSF Neuroscape; Founder, Project Clarify; DL, CV, Neurosci+tech, Metacognition, Meditation, Math, Music, Running, Open Source, Infra