Vision for All Seasons

#CVPR2022 “Unofficial” Workshop Minutes

Nicholas Teague
From the Diaries of John Henry
Jun 23, 2022


Opening Remarks

The following represents an unofficial set of meeting minutes recording the discussions presented at the IEEE Computer Vision and Pattern Recognition — Vision for All Seasons Workshop. I recorded these notes as an attendee and, given how good the workshop turned out, wanted to share them with the community as a form of contribution to online attendees. My primary interest in attending was that the workshop represented a deep dive into applications surrounding covariate shift in the context of a high value application (that of self-driving cars). If any of the speakers desire that I withdraw their associated content please feel free to contact me as you see fit. I did speak with the workshop organizer and he ok’d the project with inclusion of a link to the workshop homepage as a form of citation, which is provided shortly and again at the conclusion. Please note that a majority of the talking points presented herein are based on contributions of the noted speakers, or otherwise on prior work that they may have built on — given the nature of this aggregation I can’t consider this writeup a formal academic contribution, as citations for the underlying prior work will require referral to the corresponding papers by the noted speakers. In a few cases the included talking points were recorded in a near verbatim manner, in other cases vastly abbreviated. All errors in grammar and content are by this author; all material contributions are by the original workshop presenters or prior work. As is a custom that I have adopted in this blog, I am also including here a “soundtrack” via various YouTube links for your listening pleasure, inspired by a trip to the Louisiana Music Factory — worth a visit next time you’re in town. The preferred reading experience is in the Medium app on an iPad.

Yeah so fine print complete, presented here are the unofficial meeting minutes. Enjoy.

4th Vision for All Seasons Workshop

Presented at the Computer Vision and Pattern Recognition conference held in New Orleans, LA on June 19, 2022

Workshop Home Page

Workshop Organizers: Dengxin Dai, Christos Sakaridis, Martin Hahner, Robby T. Tan, Wim Abbeloos, Daniel Olmeda Reino, Jiri Matas, Bernt Schiele, Luc Van Gool

Agenda:

  • Opening Remarks
  • Invited Talks: Vishal Patel, Mario Fritz, Patrick Pérez
  • ACDC Challenge 2022 — Winner Presentations: Matej Grcić, Ting Sun
  • Workshop Paper Presentations
  • Invited Paper Presentations
  • Invited Talks: Peter Kontschieder, Werner Ritter, Raoul de Charette
  • Invited Talks: Kate Saenko, Sen Wang
  • Closing remarks
Bloodstains and Teardrops — Big Chief Monk Boudreaux

Invited Talks: Vishal Patel, Mario Fritz, Patrick Pérez

Invited Talk 1

Domain adaptive object detection in adverse conditions

Based on presentation by Vishal Patel

In modern computer vision applications, mainstream object detection has evolved through algorithms like [RCNN -> Fast RCNN -> Faster RCNN -> YOLO -> Detectron]; however, each of these approaches has had a fundamental limitation associated with the properties of the training set used to derive a model. As a few examples:

  • Training data (e.g. images of driving) may have been recorded in one city, and there is a desire to operate a vehicle utilizing that model in another city.
  • Images may have been recorded in good weather, and there is a desire to operate in bad weather (rain, fog, snow, nighttime, etc).
  • Images may have been derived from “synthetic data” (like video graphics engines), and there is a desire to deploy in the real world.

Each of these examples represents a form of “domain shift” between the training data distribution and the environment surrounding deployment. The goal of the speaker’s work is to use annotated samples from a source domain and adapt the resulting model to a shifted target domain. He notes several types of approaches available:

  • Unsupervised adaptation (labeled source data, unlabeled target data)
  • Source free adaptation (no access to source data, adapt a trained model to the target domain)
  • Fully test time adaptation (no access to source data, adapt the trained model to any target distribution shifts in real time during deployment)

Unsupervised adaptation

In this case during deployment we have access to labeled source data. Examples include:

  • Gradient reversal layer: the top branch is the traditional detection pipeline, while the feature extraction layer branches to a fully connected domain classifier that identifies the presence of source or target domain properties, e.g. DA-Faster (CVPR 2018), SWDA (CVPR 2019) — see the sketch after this list
  • Methods using a domain prior model, e.g. a clean image times a translation map to a target domain, where the mapping can be either multiplicative (haze map) or additive (rain map)
  • Methods using degradations following the physics of image formation
  • The speaker asked what happens if we replace the translation with an estimation network, as in “prior adversarial training for detection”, and proceeded to demonstrate that this is quantitatively competitive
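As a minimal sketch of the gradient reversal trick referenced above (this blogger’s generic PyTorch illustration, not the speakers’ code): the layer acts as an identity in the forward pass and flips the sign of the gradient in the backward pass, so the feature extractor is pushed to produce features that confuse the domain classifier.

```python
import torch

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; negated, scaled gradient in the backward pass."""
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Flip the gradient so the feature extractor learns domain-confusing features.
        return -ctx.lam * grad_output, None

def grad_reverse(x, lam=1.0):
    # Insert between the shared feature extractor and the domain classifier branch.
    return GradReverse.apply(x, lam)
```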

Source free domain adaptation

Here we assume during deployment we have a trained model without access to the labeled samples it was trained on, and we are presented with unlabeled samples from the target domain.

  • Usually approaches are based on knowledge distillation in a student-teacher framing — see the sketch after this list
  • The RPN uses multiple contrastive views of an object instance, but contrastive learning needs to identify, e.g., whether an object is a bus or a truck. The speaker proposes a graph network for instance relations integrated into a student-teacher model framing, and demonstrated some qualitative results.
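As a rough illustration of the student-teacher framing (a generic sketch this blogger assumes, not the speaker’s exact method), the teacher is commonly maintained as an exponential moving average of the student’s weights and used to produce pseudo labels on unlabeled target data:

```python
import torch

@torch.no_grad()
def update_teacher(teacher, student, momentum=0.999):
    # Exponential moving average: the teacher slowly tracks the student's weights.
    for t_param, s_param in zip(teacher.parameters(), student.parameters()):
        t_param.mul_(momentum).add_(s_param, alpha=1.0 - momentum)
    for t_buf, s_buf in zip(teacher.buffers(), student.buffers()):
        t_buf.copy_(s_buf)
```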

Fully test time adaptation

  • See the speaker’s paper Patel, WACV 2022

Invited Talk 2

Understanding and Fixing Failure Cases by Adversarial Manipulations

Based on presentation by Mario Fritz

Robustness and generalization are desirable. Existing methods already do well on i.i.d. samples, but real world distributions will often not have been covered by a training set. One will often find rare cases in the tails or outside the training distribution in the real world, or expected domain shifts may occur, like from severe weather conditions. This issue can either be addressed during training (by increasing coverage of the training data) or by methods that provide support at test time.

The speaker discussed an approach involving overcoming the issue of fixed training data sets by a form of targeted synthesis at test time.

Robustness is often studied from the standpoint of “adversarial robustness” — which represents desiring protection against “worst case” test time samples. Such robustness may be more challenging for complex models and complex conditions. One approach to address this is known as adversarial training, in which added training data is produced by generating samples in a manner similar to what an adversary would generate. (The speaker noted a few distinctions between empirical risk minimization, adversarial training, adversarial image manipulations, and adversarial weather training.)
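For readers unfamiliar with adversarial training, here is a minimal single-step (FGSM-style) sketch — a generic illustration by this blogger rather than the speaker’s setup. The perturbation direction is the sign of the loss gradient with respect to the input, and the model is then trained on the perturbed sample.

```python
import torch
import torch.nn.functional as F

def adversarial_training_step(model, x, y, optimizer, epsilon=8 / 255):
    # 1) Craft a worst-case perturbation of the input within an epsilon ball.
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    grad = torch.autograd.grad(loss, x_adv)[0]
    x_adv = (x_adv + epsilon * grad.sign()).clamp(0.0, 1.0).detach()

    # 2) Train the model on the adversarial sample.
    optimizer.zero_grad()
    F.cross_entropy(model(x_adv), y).backward()
    optimizer.step()
```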

This presentation considered a line of work by speaker and students considering adversarial editing, training, and testing.

1) Adversarial image manipulation by object removal:

  • How to automate the process? Speaker had a NeurIPS 18 paper using in-painting after masking generated by classification. Demonstrated how this increased robustness.
  • Data augmentation also alleviates to an extent.

2) Automating adversarial testing and training:

  • We desire to automate identification of edge cases, find failure modes of a model
  • e.g. for detecting cows, stay on a cow manifold and increase loss with respect to cow
  • Can even synthesize hard examples, or use occlusion

3) Synthesizing adversarial examples from weather behavior

  • Can be accomplished using a simulator
  • Configuration can use a scene generator with the Carla simulator, with weather parameters varied against a loss function (see the sketch after the note below)

(Carla is a mainstream automated driving training / benchmarking simulator built on a high resolution graphics engine and a physics engine.)
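As a hedged illustration of this kind of weather parameter sweep (assuming the standard Carla Python API; the speaker’s actual search was adversarial, driven by a loss rather than a fixed grid):

```python
import carla

client = carla.Client("localhost", 2000)
world = client.get_world()

# Grid over fog density and precipitation; an adversarial search would instead
# adjust these parameters in the direction that increases the perception loss.
for fog in (0.0, 30.0, 60.0, 90.0):
    for rain in (0.0, 50.0, 100.0):
        weather = carla.WeatherParameters(
            cloudiness=80.0,
            precipitation=rain,
            fog_density=fog,
            sun_altitude_angle=15.0,
        )
        world.set_weather(weather)
        # ... render frames, run the detector / segmenter, and record its loss ...
```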

  • Demonstrated failure cases in low light conditions, e.g. water on ground led to image segmentation errors, misclassification
  • Demonstrated an example performance drop from the conjunction of a new town not seen in training with adverse weather conditions. The two domain shifts compound their impact on performance; this improved with adversarial training
  • Sometimes even trivial adversarial conditions can impact performance when found in conjunction (e.g. night + rain)

What data do we need?

  • Rare weather conditions
  • Data current model can’t handle
  • Sufficient target info
  • Tractable to both a generalist and a domain specialist

Speaker offered an outlook slide in closing with diverse considerations for future work. (I didn’t follow along with everything.)

Invited Talk 3

Data, Sensors, and Adaptation for adverse conditions

Based on presentation by Patrick Pérez (Valeo)

Some challenges for achieving levels 3–5 self driving:

  • Driving types
  • Geographic regions
  • Weather and lighting conditions

What would help?

  • More raw data
  • More annotation
  • More sensors
  • More transfer

What is available to us for real world annotated data? There is a decade of driving data sets, from KITTI to ONCE and ACDC. Only some of these data sets have well represented night driving conditions (e.g. ACDC has 25% night driving coverage). Some others that have good night coverage include BDD100k, Mapillary Vistas, A*3D, ONCE, and CADCD.

Sensors that assist in bad weather:

  • Speaker demonstrated the Valeo external sensor suite, including ultrasonic, camera, radar, and LiDAR.
  • LiDAR has the best coverage across all conditions.
  • Cameras struggle in distance accuracy, particularly in low light operation.

Radar:

  • Radar outputs a doppler range and angle
  • One can use radar to annotate video or vice versa (the speaker discussed the latter)
  • For multiple views, radar can help with semantic segmentation.

Annotation efficient learning refers to settings with lots of unlabeled training data and a constrained annotation budget. The speaker listed methods to compensate by transferring knowledge across modalities, domains, tasks, etc.

When faced with a train / test domain gap, there is a need to translate between a source and target domain, e.g. synthetic vs. real, US vs. Europe or Asia, etc.

The speaker noted methods for adversarial domain adaptation:

  • by self training (Vu & Cord, CVPR 2019)
  • Encouraging low entropy, e.g. AdvEnt (a minimal sketch of entropy minimization follows this list)
  • Against a multi target baseline, e.g. MTKT
  • Cross-modal unsupervised domain adaption, e.g. xMUDA at night
  • Allowing pseudo segmentation for learning without annotation, aka “drive and segment”, e.g. using LiDAR-camera SSL
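For the entropy minimization item above, here is a minimal sketch of the core loss — this blogger’s generic illustration of the idea behind AdvEnt-style methods, not the authors’ code:

```python
import torch.nn.functional as F

def entropy_loss(logits):
    # Shannon entropy of the per-pixel softmax prediction, averaged over the batch.
    # Minimizing this on unlabeled target images pushes predictions to be confident.
    probs = F.softmax(logits, dim=1)
    log_probs = F.log_softmax(logits, dim=1)
    return -(probs * log_probs).sum(dim=1).mean()
```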

Takeaways:

Nighttime is one of the key challenges for vision based learning. Multiple sensors are useful, as are domain adaptation, simulation, and day to night transfer. Driver monitoring needs attention, especially at night; one could perhaps share data collection and onboard compute for this purpose.

X. Adjuah (I Own the Night) — Christian Scott aTunde Adjuah

ACDC Challenge

ACDC refers to a benchmark dataset designed to oversample adverse weather conditions. It includes extensive coverage of fog, nighttime, rain, and snow conditions, and was collected near Zurich, Switzerland. It includes 4006 images of adverse conditions in total.

This workshop’s ACDC challenge had four tracks:

  • Track 1 unsupervised normal to adverse domain adaptation
  • Track 2 supervised semantic segmentation in adverse conditions
  • Track 3 separately considered each adverse condition
  • Track 4 uncertainty aware semantic segmentation

The challenge received public submissions from 176 teams. All four tracks included algorithms that performed substantially better than prior state of the art (in ranges of 5–22% improvements against benchmarks, including 10% at night). The winning team for tracks 2–3 hailed from the University of Zagreb and for tracks 1 and 4 from Tencent’s YouTu Lab. The benchmark remains open for submissions.

The following are notes for the presentations from the two winning teams:

Winning team presentation, tracks 2–3

Large scale semantic segmentation through multi-resolution processing and selective pseudo labeling

Based on a presentation by Matej Grcić (University of Zagreb)

A SwiftNet for the 2020’s:

The team applied a convolutional (ConvNeXt) backbone pre-trained on ImageNet utilizing multiple (6) resolutions in a pyramidal fusion, with a lean ladder style upsampling to recover semantics.

To support training, they identified several data sets with taxonomies similar to ACDC, in the process identifying an additional pool of 25k driving images, some with adverse condition representations. They leveraged this annotated data and further improved with a type of selective pseudo labeling (see the sketch below). They noted that performance was improved by a confidence calibration of the image segmentations.
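As a rough sketch of what selective pseudo labeling can look like in practice (a generic confidence-thresholded variant assumed by this blogger, not necessarily the team’s exact scheme):

```python
import torch.nn.functional as F

def selective_pseudo_labels(logits, threshold=0.9, ignore_index=255):
    # Keep only high-confidence pixel predictions as pseudo ground truth;
    # low-confidence pixels are marked ignore so they don't contribute to the loss.
    probs = F.softmax(logits, dim=1)
    confidence, labels = probs.max(dim=1)
    labels[confidence < threshold] = ignore_index
    return labels
```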

They noted that convolutional networks perform better on high resolution images, as it is easier to refine boundaries and recognize semantics. They noted that unrecognizable regions for humans are different than those for robots: the human eye can only distinguish around 90 shades of grey, while common cameras can record 256.

Winning team presentation, tracks 1 and 4

Exploring High Quality Pseudo Label in Normal to Adverse Domain Adaptation

Based on presentation by Ting Sun (Tencent YouTu Lab)

Track 1 challenges included:

  • hard visual target domains
  • Large domain gap between UDA and Oracle
  • Large margins between normal and target domains

The team used DAFormer as a baseline with a transformer segmentation method, which outperformed an oracle supervised result. They did ablation studies on the training target domains separately and together. They applied a Swin-L transformer backbone. Their training included multi-scale pseudo labeling, in which multiple image scales were passed through a teacher network (a rough sketch follows). They regularized the semantic structure by adversarial learning, or by handcrafting structure representations. The nighttime performance used a pre-trained CycleGAN model with an improved network backbone.
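A loose sketch of multi-scale pseudo labeling with a teacher network (this blogger’s approximation of the idea; the scale choices and averaging are assumptions):

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def multiscale_pseudo_labels(teacher, image, scales=(0.5, 1.0, 1.5)):
    # Run the teacher at several image scales, resize the logits back to full
    # resolution, average them, and take the argmax as the pseudo label map.
    h, w = image.shape[-2:]
    averaged = 0.0
    for s in scales:
        scaled = F.interpolate(image, scale_factor=s, mode="bilinear",
                               align_corners=False)
        logits = teacher(scaled)
        averaged = averaged + F.interpolate(logits, size=(h, w), mode="bilinear",
                                            align_corners=False)
    return (averaged / len(scales)).argmax(dim=1)
```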

The team’s track 3 model used Swin-L with UperNet and Mask2Former, and performed data boosting with cross pseudo labeling for reference images.

Without Notice — Extended

Accepted Papers:

(The blogger omitted a few papers for which he considered his notes insufficient.)

DooDLeNet: double deeplab enhanced feature fusion for thermal-color semantic segmentation

Based on presentation by Oriel Frigo (AnotherBrain, Paris)

Issues of modality alignment make segmentation challenging. One modality is often more informative than the other, and which one may differ between e.g. day and night. Differences in sensors, fields of view, etc. mean that alignment between modalities has offsets.

This paper used confidence weights for the modalities, together with a correlation matrix. Confidence weighting and correlation weighting improve performance in both day and night (a minimal sketch of confidence-weighted fusion follows).
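Here is a minimal sketch of confidence-weighted fusion between the two modality branches — this blogger’s simplified reading; the paper’s full scheme also includes correlation weighting:

```python
import torch

def confidence_weighted_fusion(feat_rgb, feat_thermal, conf_rgb, conf_thermal):
    # Per-pixel confidence maps are normalized against each other and used to
    # blend the color and thermal feature maps before the segmentation head.
    weights = torch.softmax(torch.stack([conf_rgb, conf_thermal], dim=0), dim=0)
    return weights[0] * feat_rgb + weights[1] * feat_thermal
```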

Conclusion: two feature fusion strategies, confidence weighting and correlation weighting, improved the state of the art on thermal / color semantic segmentation.

Efficient domain incremental learning approach to drive in all weather conditions

Based on presentation by Muhammad Jehanzeb Mirza (TU Graz)

Current methods perform well in clear weather. To deploy in autonomous vehicles we need to perform in adverse weather, e.g. the dark and rain. Networks need to adapt in a dynamic fashion to domain scenarios without forgetting what was learned in training.

This paper offered DISC (“domain incremental through statistical correction”).

The method includes, during training, freezing the pre-trained model except for the batch normalization statistics. For purposes of domain adaptation, deployment only adapts the batch norm statistics to the distinct weather conditions, thus only needing to store the batch norm adaptation statistics in memory. During inference one can load the frozen weights and apply plug and play statistics based on the weather condition (e.g. clear, rain, fog, snow) — a minimal sketch follows.
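A minimal sketch of the plug-and-play batch norm idea (this blogger’s paraphrase of the mechanism, not the authors’ code): store one set of normalization statistics per weather condition and swap them in at inference.

```python
import torch
import torch.nn as nn

def collect_bn_stats(model):
    # Snapshot the running mean / variance of every BatchNorm layer
    # after adapting to one weather condition.
    return {name: (m.running_mean.clone(), m.running_var.clone())
            for name, m in model.named_modules() if isinstance(m, nn.BatchNorm2d)}

@torch.no_grad()
def load_bn_stats(model, stats):
    # Plug the stored statistics back in for the detected weather condition;
    # all other weights stay frozen at their pre-trained values.
    for name, m in model.named_modules():
        if isinstance(m, nn.BatchNorm2d) and name in stats:
            mean, var = stats[name]
            m.running_mean.copy_(mean)
            m.running_var.copy_(var)
```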

The speaker found that the method, by only updating batch norm parameters, resulted in domain adaptation without model forgetting.

Further, the speaker noted that for offline learning the batch norm statistics could be trained until convergence, while for online adaptation they could be trained for only a single epoch, an approach that appeared to perform favorably against other online learning methods.

Conclusion: Weather conditions can be characterized as a distribution shift addressed through the batch norm statistics. The statistical correction achieved strong gains. DISC was efficient, with low requirements for parameter storage. It could be adapted to online learning without forgetting.

Physics Based Image Deshadowing Using Local Linear Model

Based on presentation by Tamir Einy (Applied Materials, joint with Tel Aviv University)

In the context of computer vision applications, mainstream practice may show reduced performance in the presence of shadows. Shadows may be hard to interpret when they:

  • Obstruct non-uniform objects
  • Are cast on a non-uniform surface
  • Result in saturated colors

This paper focused on shadow removal. The term umbra refers to a fully shadowed area; penumbra refers to a partially shadowed area. Previous work has sought to estimate a shadow gain, and includes two stages for shadow detection and removal.

This work’s framework involves:

  • Predicting a mask
  • Outputting two maps
  • Applying to a linear equation
  • Getting a shadow free predicted image

The method used an L2 loss function between the ground truth and the predicted shadow free image (a sketch of the linear correction follows). The shadow removal network was applied with a small number of parameters, on the order of 1,000 times fewer than other works. It demonstrated qualitative results with fewer artifacts. They tested on two data sets and also demonstrated quantitative results.
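A minimal sketch of applying a per-pixel linear correction with the two predicted maps — this blogger’s reading of the local linear model; the mask blending is an assumption:

```python
def apply_local_linear_model(shadow_img, gain_map, offset_map, shadow_mask):
    # All inputs are tensors of matching spatial size. Per-pixel linear relighting:
    # shadow_free = gain * shadowed + offset, applied inside the predicted shadow
    # mask and blended with the original image outside of it.
    corrected = gain_map * shadow_img + offset_map
    return shadow_mask * corrected + (1.0 - shadow_mask) * shadow_img
```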

Riverfront — Leo Nocentelli

Invited Papers

Invited Paper 1

Continual Test-Time Domain Adaptation

Based on presentation by Qin Wang (Computer Vision Lab, ETH Zurich)

For continual test-time domain adaptation, prior work uses pseudo labels and entropy measures and assumes a static target domain. Real world distributions are non-stationary, which may result in error accumulation and catastrophic forgetting. How can we alleviate error accumulation outside of stable environments?

This paper suggests weight-averaged pseudo-labels, in a framing the blogger would loosely characterize as [pre-trained source model -> student model -> moving average to teacher model]. Partly inspired by dropout, the method randomly restores a small fraction of the weights to the source values (a minimal sketch follows).
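A minimal sketch of the stochastic restoration step — this blogger’s paraphrase; the restore probability is an arbitrary assumption:

```python
import torch

@torch.no_grad()
def stochastic_restore(model, source_state, restore_prob=0.01):
    # Randomly reset a small fraction of weights back to the pre-trained source
    # model, limiting error accumulation and catastrophic forgetting over time.
    for name, param in model.named_parameters():
        mask = (torch.rand_like(param) < restore_prob).float()
        param.copy_(mask * source_state[name] + (1.0 - mask) * param)
```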

The speaker demonstrated benchmarks and found that the method alleviated forgetting and error accumulation.

Invited Paper 2

Exploiting Temporal Relations on Radar Perception for Autonomous Driving

Based on presentation by Pu (Perry) Wang (Mitsubishi Electric Research Laboratories)

Why use radar in autonomous driving? It is robust to adverse weather and cost effective — both to procure and to maintain. One drawback is low resolution. Another drawback is reduced consistency in consecutive frames (e.g. height, width, orientation), which is challenging considering that mainstream ML often treats sensor consistency as a form of inductive bias.

This work offered a method to enhance features to compensate for the low quality of radar. The speaker demonstrated an architecture including cross attention between successive input frames. The presentation proceeded to demonstrate quantitative and qualitative performance.

Invited Paper 3

Both Style and Fog Matter: Cumulative Domain Adaptation for Semantic Foggy Scene Understanding

Based on presentation by Xianzheng Ma (Wuhan University)

With regards to transferring knowledge between clear and foggy domains, state of the art mainly seeks to align domains by adversarial training with synthetic fog. However this type of defogging may introduce artifacts, as learning with synthetic fog does not perfectly match the real world domain.

This work sought to investigate the domain gap, and assumed it is composed of both a “style gap” and a “fog gap”. By adding an intermediate domain, they sought to disentangle the style and fog factors. They proposed the disentanglement network FDN, and demonstrated performance comparisons, achieving state of the art on foggy benchmarks.

Invited Paper 4

Ithaca 365: Dataset and Driving Perception under Repeated and Challenging Weather Conditions

Based on presentation by Carlos Andres Diaz-Ruiz (Cornell University)

This work offers a new data set:

  • 40 traversals collected over a 15 km route under diverse conditions
  • 4 cams, LiDAR, etc
  • 7000 annotated frames
  • 680k unlabeled frames

A unique aspect was that the object segmentation annotation aimed to segment both the visible and occluded parts of objects, as a form of amodal road segmentation. They captured branches and occlusions on the road with a mask CNN. They expect this work will enable new research directions for coarsely aligned image pairs.

Invited Paper 5

Gated2Gated: Self-Supervised Depth Estimation from Gated Images

Based on presentation by Amanpreet Walia (Mercedes)

LiDAR is sparse, and fails in snow and fog due to backscattering. CMOS arrays produce dense images, but miss on depth accuracy. Hybrid approaches may combine a CMOS array with coarse photon gating, use LiDAR in clear conditions, etc.

The speaker proposed gated to gated self-supervision as a solution, and discussed a training procedure. Demonstrated qualitative results (e.g. at night).

Invited Paper 6

FIFO: Learning Fog Invariant Features for Foggy Scene Segmentation

Based on presentation by Sohyun Lee

Existing models have substantial performance degradation in foggy conditions. Datasets can be segmented into domains like clear weather, synthetic fog, and real fog.

For virtual fog, existing methods use synthetic gradations ranging from light to dense fog. However such methods may result in performance degradation in clear weather. Also, driving through fog in the real world may be affected by factors other than just fog density.

This work offers FIFO, which uses a fog-pass filtering module with a segmentation network trained in two steps. The method learns fog invariant features, as the speaker demonstrated via average Hausdorff distance. The method was demonstrated to largely outperform prior art.

Be Like Water — PJ Morton

Invited Talks: Peter Kontschieder, Werner Ritter, Raoul de Charette

Invited Talk 4

Ingredients for Mapping the Metaverse

Based on presentation by Peter Kontschieder (Meta)

This talk was on the subject of developing 3D semantic maps from 2D images for use in the “metaverse”, using next generation computer vision algorithms.

(The blogger’s notes were somewhat sparse in this talk, these items broadly highlight a few of the speaking points. There were several model acronyms noted by the speaker that were not recorded.)

For aggregating multi modality human annotated data sets, recent work demonstrated reconstruction results from NeRF (Mildenhall, ECCV ’20).

Panoptic segmentation seeks to distinguish between “things” and “stuff”. The speaker demonstrated an architecture and resulting qualitative improvements. He noted a transformer backbone based on Liu et al. (ICCV 2021). AutoRF seeks to learn a 3D object radiance field from a single view.

Limitations of prior work:

  • Requires multiple views
  • Controlled setting
  • Strong priors
  • Accurate annotations

Speaker noted AutoRF training methods and algorithms.

Invited Talk 5

European Research Project AI-SEE: AI enhancing vehicle vision in low visibility conditions — overview and first results

Based on presentation by Dr Werner Ritter (Mercedes-Benz AG)

How can we ensure that automated vehicles drive reliably even in adverse weather conditions? The goal of AI-SEE is a novel sensing and AI stack. The next generations of self driving vehicles will be level 3+; should the system carry any liability? When will we allow blind people to drive? Level 3 currently assumes a minimum of 10 seconds for handover from system to user, so apparently not at level 3.

Some California permitting for testing driverless fleets excludes rights for deployment in rain or fog. In some areas of Europe there is rain or snowfall for >100 days a year.

Prior work in a sensor package of camera, LiDAR, and radar identified gaps like:

  • Adaptation to weather
  • Sensor costs
  • Other types of sensors need improvement for consideration

“Gated cameras” may deliver better results in adverse weather versus traditional cameras. Gated imaging illuminates and records various depth slices, and an aggregated image results from adding up the slices. Interference from outside a depth range (e.g. reflections) is not recorded, so it does not overlay and obstruct the image information of interest. Gated cameras may someday output a high resolution depth map with a relative accuracy of +/-5%.

AI-SEE seeks to adapt gating learners to current environmental conditions. This will need a combination of depth cues like stereo and temporal. Following this they will seek extensions to other modalities.

(Q&A indicated that the gated mechanism is hardware based, and this blogger speculates part of the enabling mechanism may be associated with speed of sensor acquisition.)

Invited Talk 6

Physics aware learning for adverse conditions

Based on presentation by Raoul de Charette (Inria)

Why does physics matter? The world is long tailed. Mainstream models have poor robustness to interpolation and extrapolation. A new field of theory guided data science is emerging. Physics can be incorporated in either preprocessing or postprocessing.

Take physics based rendering for rain: one can render using a physics engine. Fog is the simplest of weather conditions, while rain is a dynamic condition: you can’t approximate it with a binary effect, you need a particle simulator, and it needs to account for the motion of the car. Trying to do this with a GAN lacks physical interpretability.

The speaker has developed a physics based rendering engine for rain. They have used this to augment data sets, and currently have a library of 400k images.

In the speaker’s prior work, Physics Informed Guided Disentanglement, they tried to disentangle physical traits from other traits (the blogger is reminded of the fog presentation above). The speaker demonstrated an adversarial model architecture. One can train a GAN on conditions ranging from clear to rain, and use a discriminator to source images by adjusting a differentiable parameter. (A lot of architectural considerations were further discussed, some of which the blogger did not follow along with.)

The current work has extended the progression from just clear->rain to also include progressions associated with day->night. The speaker used the FID score to compare the distance of generated images to real ones and found it suitable.

A Thousand Miles — Erika Lewis

Invited Talks: Kate Saenko, Sen Wang

Invited Talk 7

Learning Generalizable Adaptable Visual Representations

Based on presentation by Kate Saenko (Boston University / MIT-IBM Watson AI)

Dataset bias is another way to think about what is taking place with domain shift, arising from distinctions between the data used to train versus the data used to evaluate, for example from differences in geographic locations or traffic conditions. Such dataset bias reduces accuracy — the speaker noted a case with on the order of a 14% drop even between similar data collections.

Deep networks learn representations, with features biased to the training data. One solution is to label more data, though this is expensive. Another is to adapt between domains. Or one can try to achieve domain generalization across domains.

Part of the challenge is that i.i.d. testing might have a tendency to overstate generalization.

For domain adaptation with distribution alignment, what role does pre-training play? Unsupervised pre-training has become popular; how can we do that with multiple domains in the training data, and try to adapt between domains as a part of pre-training? Regarding modern large scale pre-training, consider that last year there was a domain adaptation challenge where the winner won by a large margin using just a transformer trained on the data instead of applying any form of domain adaptation.

Does pre-training with a particular backbone matter? (Common practice uses a ResNet backbone.)

The speaker surveyed different pretraining considerations. High level takeaways:

  • Bigger models are better
  • Including text in pretraining (like image caption pairs) did not always help.

Do we still need domain adaptation or can we just use pretrained backbones? Yes, domain adaptation still improves results, particularly older domain adaptation methods (newer methods are possibly overfit to the ResNet backbone).

Can pretraining data be customized? Different types of synthetic data in pretraining work better for different downstream tasks (e.g. for the difference between satellite images, street view applications, etc.). The speaker introduced the Task2Sim architecture, and compared 5-NN, fine tuning, and linear probing.

In summary: pretraining is important.

Invited Talk 8

All weather robot perception and autonomy: a radar approach

Based on presentation by Dr Sen Wang (Heriot-Watt University)

The speaker’s work is interested in all kinds of moving robot applications, not just cars. Adverse weather is relevant to all of them.

LiDAR and camera operate close to the visible light spectrum. Radar uses a much lower frequency, so it can pass through more media (like snowfall and fog). The talk demonstrated the longer range of radar versus cameras and LiDAR. Cameras have low resolution at long distance. This is relevant to all weather conditions.

Radar captures 360 degrees as a continuous waveform, using a spinning sensor with a polar representation. The returned scan, with range on the y axis versus azimuth in degrees on the x axis, can be translated to a Cartesian representation (a rough sketch follows). One can expect speckle noise and multi-path reflections.
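A rough sketch of the polar-to-Cartesian resampling — this blogger’s generic nearest-neighbour version; the grid size and maximum range are arbitrary assumptions:

```python
import numpy as np

def polar_to_cartesian(polar_scan, ranges_m, azimuths_rad, grid_size=512, max_range=100.0):
    # polar_scan: (num_ranges, num_azimuths) intensity returns from a spinning radar,
    # with ranges_m and azimuths_rad sorted ascending. Build a bird's-eye-view grid
    # and look up each cell's nearest (range, azimuth) bin.
    xs = np.linspace(-max_range, max_range, grid_size)
    xx, yy = np.meshgrid(xs, xs)
    rr = np.sqrt(xx ** 2 + yy ** 2)
    aa = np.arctan2(yy, xx)  # in [-pi, pi]
    r_idx = np.clip(np.searchsorted(ranges_m, rr), 0, len(ranges_m) - 1)
    a_idx = np.clip(np.searchsorted(azimuths_rad, aa), 0, len(azimuths_rad) - 1)
    cartesian = polar_scan[r_idx, a_idx]
    cartesian[rr > max_range] = 0.0  # blank out cells beyond the sensed range
    return cartesian
```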

The speaker noted 3 main questions for radar applications:

1) Where am I?

  • The speaker has prior work, “RadarSLAM”, which converts to Cartesian and updates location via motion estimation, including pose estimation. He noted a real world example where the camera was covered by snow and half of the radar was obstructed, and the system still achieved good SLAM results. One way to benchmark SLAM is to travel in a loop and measure the drift upon return; apparently they did ok.
  • Degeneration of mainstream sensor modalities: LiDAR and camera have reduced detection in snow, fog. Motion blur.

2) What is around me?

  • Noted the Heriot-Watt RADIATE data set for evaluation, a multi-modality radar dataset in adverse weather with object annotation. Over 500 users to date.

3) What should I do next?

  • This question involves trajectory prediction.
  • Future motion depends on surrounding road users (and their types).
  • Demonstrated an encoder decoder architecture based on a sequence of predicted velocities and poses (“trajectory”).
  • For challenging lighting conditions, one can build a map in daytime and perform place recognition at night.
Hammer on the Stone — Eric Johanson

In closing, please note this represents the fourth workshop held on this subject of computer vision in adverse weather conditions. For additional material we suggest reviewing the previously held workshops, and provide here again a link to the workshop homepage both as a resource and a form of citation.

Vision for All Seasons — Workshop Home Page

References

Dai, D., Sakaridis, C., Hahner, M., Tan, R. T., Abbeloos, W., Reino, D. O., Matas, J., Schiele, B., and Gool, L. V. Vision for all seasons: Adverse weather and lighting conditions, 2022. URL https://vision4allseason.net/.

Diaz-Ruiz, C. A., Xia, Y., You, Y., Nino, J., Chen, J., Monica, J., Chen, X., Luo, K., Wang, Y., Emond, M., Chao, W.-L., Hariharan, B., Weinberger, K. Q., and Campbell, M. Ithaca365: Dataset and driving perception under repeated and challenging weather conditions, 2022. URL https://openaccess.thecvf.com/content/CVPR2022/papers/Diaz-Ruiz_Ithaca365_Dataset_and_Driving_Perception_Under_Repeated_and_Challenging_Weather_CVPR_2022_paper.pdf.

Frigo, O., Martin-Gaffé, L., and Wacongne, C. Doodlenet: Double deeplab enhanced feature fusion for thermal-color semantic segmentation, 2022. URL https://arxiv.org/abs/2204.10266.

Le, H. and Samaras, D. Physics-based shadow image decomposition for shadow removal. IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 1–1, 2021. doi: 10.1109/tpami.2021.3124934. URL https://doi.org/10.1109%2Ftpami.2021.3124934.

Lee, S., Son, T., and Kwak, S. Fifo: Learning fog-invariant features for foggy scene segmentation, 2022. URL https://arxiv.org/abs/2204.01587.

Li, P., Wang, P., Berntorp, K., and Liu, H. Exploiting temporal relations on radar perception for autonomous driving, 2022. URL https://arxiv.org/abs/2204.01184.

Ma, X., Wang, Z., Zhan, Y., Zheng, Y., Wang, Z., Dai, D., and Lin, C.-W. Both style and fog matter: Cumulative domain adaptation for semantic foggy scene understanding, 2021. URL https://arxiv.org/abs/2112.00484.

Mirza, M. J., Masana, M., Possegger, H., and Bischof, H. An efficient domain-incremental learning approach to drive in all weather conditions, 2022. URL https://arxiv.org/abs/2204.08817.

Walia, A., Walz, S., Bijelic, M., Mannan, F., Julca-Aguilar, F., Langer, M., Ritter, W., and Heide, F. Gated2gated: Self-supervised depth estimation from gated images, 2021. URL https://arxiv.org/abs/2112.02416.

Wang, Q., Fink, O., Van Gool, L., and Dai, D. Continual test-time domain adaptation, 2022. URL https://arxiv.org/abs/2203.13591.

For further readings please check out the Table of Contents, Book Recommendations, and Music Recommendations. For more on Automunge: automunge.com
