Identity Management & Governance with AI/ML: signatures, facial recognition, authorship

Alexandra Petrus
Jun 29, 2018

Notes from the Bucharest AI Meetup #9

AI is showing up more and more in the workflows used to automate identity provisioning. As we found it increasingly opportune to dive into learning:

  • how we can provide automated and repeatable ways to govern the identity life-cycle;
  • how to be compliant with identity and privacy regulations;
  • what Identity as a Service (IDaaS) solutions are out there;

we dedicated a full meetup edition to touching base with some of the solutions and use cases out there.

You asked, we answered: our attempt at defining the main concepts correctly.

Definition of AI: Intelligence demonstrated by machines, whether through Machine Learning, Natural Language Processing or other AI sub-fields. AI was founded as a discipline in 1956 and divided into sub-fields that often fail to communicate with each other. These sub-fields are based on technical considerations, e.g. robotics, machine learning, natural language processing, computer vision, knowledge reasoning, etc. [Read more on the history, major goals, approaches, philosophy, technology and glossary on Wikipedia.]

Definition of ML: A subset of AI that gives computers the ability to learn from data. A great resource for understanding, in layman’s terms, why the next 2 decades will be more important for AI’s progress than the last 200 years.
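To make the “learning from data” part concrete, here is a tiny illustrative sketch (not from the talks; the data and model choice are just assumptions for the example): instead of hard-coding the rule y = 2x + 1, we let a model estimate it from examples.

```python
# Minimal illustration of "learning from data": the rule y = 2x + 1 is never
# written anywhere; the model recovers it from example pairs.
from sklearn.linear_model import LinearRegression

X = [[0], [1], [2], [3], [4]]        # inputs
y = [1, 3, 5, 7, 9]                  # outputs produced by y = 2x + 1

model = LinearRegression()
model.fit(X, y)                      # the "learning" step: parameters come from data

print(model.coef_, model.intercept_) # ~[2.0] and ~1.0, estimated from the examples
print(model.predict([[10]]))         # ~[21.0] for an input it has never seen
```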

We definitely had a very busy Tuesday: we co-organised the Industry Track of Recent Advances in Artificial Intelligence (RAAI 2018) with the University of Bucharest, bridging Industry and Academia in AI, and held our last meetup edition before the Summer break.

Here’s a live stream of the RAAI 2018 Industry Track Panel where AI professionals shared their thoughts. Check out our Bucharest AI Facebook page for more video streams.

The agenda for the last meetup edition before the Summer break looked like this:

🙌🏻 The Networking space was proudly sponsored by Sparktech, which we greatly thank for the support and contribution.

👌🏻 The venue was offered by Commons, great hosts as usual, thank you kindly. A special discount of 25% awaits any Bucharest AI-er wanting to use their co-working space, meeting rooms and more. Use the discount code “BUCHAREST.AI25” at checkout if interested.

Looking to tune in to the meetup’s video stream? Check it out.

Keynote: Signature Detection and Matching

Bucharest AI asked, Andrei answered:

a. Doing Machine Learning is no easy task. Is there anyone who motivated/influenced your work through the years?

I can’t choose one person in particular, but a few people do come to mind. One of them is Emil Slusanschi, who first sparked my interest in Machine Learning while I was taking part in a summer school he coordinated during my 3rd year of study at the “Politehnica” University of Bucharest. Thomas Gaertner and Niels Goerke, two of the professors from my Master’s Program at the University of Bonn, Germany, made me understand the importance of having a thorough mathematical foundation for the concepts we use. Of course, I also follow the greatest minds of the field, my personal favorites being Andrew Ng, Ian Goodfellow and Christopher Manning.

b. Do you have any recommendations for people just starting out in machine learning?

I think the most important thing is to keep a balance between intuitive understanding of ML methods and having a strong mathematical foundation for them. Focusing only on one makes it hard to reach the desired results. Also, I would recommend not disregarding the so-called “traditional” ML techniques. Deep neural networks are great, but Machine Learning is (still) much more than Deep Learning.

c. Is there a personal project you are really proud of? Can you tell us a bit about it?

We had a project for a client from the financial audit business which needed a method of extracting information from thousands of very different-looking, mostly scanned documents. It was a considerable effort and it involved many clever techniques, both ML and non-ML, but the end result was a huge success. If you want a “personal personal” project: my rock band Secret Society. :)

👆 Click for the slide deck from Andrei.

Keynote: Deep Vision Interoperability Specification Standard (DeepVISS)

VisageCloud is an end-to-end face recognition and classification solution. It can work on photos, selfies, ID cards and video streams. It is fully API-enabled, so you can easily integrate it into other apps or business systems.

A few of the use cases it covers: checking in guests at your event or hotel; knowing when a VIP customer walks in and being notified when a person of interest shows up on one of your camera feeds; making it easy to prove that you are you, just by showing up and being yourself. It works either as a cloud service or on-premise.
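To give a flavour of what “fully API-enabled” means in practice, here is a hedged sketch of calling a face-recognition REST service from Python. The base URL, endpoint, parameters and response fields below are illustrative placeholders, not VisageCloud’s documented API; check their docs for the real contract.

```python
# Hypothetical client for a face-recognition REST API (illustrative only:
# the endpoint, parameters and response shape are assumptions, not
# VisageCloud's actual interface).
import requests

API_BASE = "https://api.example-face-recognition.com/v1"   # placeholder URL
API_KEY = "YOUR_API_KEY"

def identify_face(image_path: str, collection_id: str) -> dict:
    """Upload a photo and ask the service which enrolled person it matches."""
    with open(image_path, "rb") as f:
        response = requests.post(
            f"{API_BASE}/recognize",
            headers={"Authorization": f"Bearer {API_KEY}"},
            params={"collection": collection_id},
            files={"image": f},
            timeout=30,
        )
    response.raise_for_status()
    # assumed shape: {"matches": [{"person_id": "...", "confidence": 0.97}]}
    return response.json()

# Usage: flag a VIP customer walking in, based on the assumed response shape.
result = identify_face("lobby-camera-frame.jpg", collection_id="vip-customers")
for match in result.get("matches", []):
    if match["confidence"] > 0.9:
        print("VIP detected:", match["person_id"])
```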

Bogdan talked to us about DeepVISS — Deep Vision Interoperability Specification Standard, their open-source initiative for streamlining the integration of computer vision solutions.

Bucharest AI asked, Bogdan answered:
a. Working independently on a computer vision standard is pretty brave. In what ways do you see this thing failing?

Standards are driven by community and adoption, so it’s more about merging concepts and feedback from several players in the market, both demand partners (customers, integrators, resellers) and supply partners (researchers, technology providers). While we had the initiative behind DeepVISS and we have spearheaded some of the solution design effort, at the moment we have 7 more partners contributing to this standard, so I wouldn’t exactly say we’re developing it independently.

Even so, there is a lot of room for failure, mostly because DeepVISS may not gain enough adoption in the computer vision community. Another risk associated with standards is that individual players, when faced with a design decision or change, may choose to branch out their own version of the format instead of contributing to the standard. All these risks are to be expected. Our mitigation is to focus on communication with partners and to demonstrate both the short-term and the long-term advantages of a standardised approach. Personally, I think that actually making the computer vision community realise the need for standardisation is already winning the battle. Ultimately, our objective is to have people converge on a computer vision integration best practice, even if it’s not DeepVISS. Only time will tell.

b. I guess DeepVISS is the response to an internal need of VisageCloud. Can you describe what challenges you have faced building/scaling VisageCloud?

I would rather say that DeepVISS is a need that the computer vision community will become aware of in 2019–2020. Every field and industry on the cusp of innovation and disruption goes through this process: at the peak of market expectation, everyone is scrambling to deliver something, anything, as fast as possible. However, as the hype fades and the industry transitions towards the plateau of productivity, the cold reality sets in: solutions need to operate under the exigencies of the enterprise, components need to be rigorously tested and reused as much as possible, security needs to be enforced, scalability becomes a must-have, operational readiness is a requirement. And that is when standards become needed, to give predictability and efficiency to the market and to allow teams to focus on their core business (computer vision algorithms) rather than non-core assets (such as integrating APIs).

The rise of the Internet in the 80s and 90s drove a need for standards in packet switching (TCP/IP) and transfer protocols (HTTP, FTP, SMTP). The 1999 dot-com hype was followed by years of developing web standards (HTTP, JavaScript), security standards (SSL, TLS) and payment processing standards (PCI DSS). The mobile hype of the late 2000s eventually drove the need for platforms, frameworks and development standards. One particular topic I am familiar with is the rise of the advertising standards maintained by the IAB (VAST, OpenRTB, DAAST), as I was responsible for writing the first draft of DAAST (Digital Audio Ad Serving Template) back in 2014–2015.

We predict that the same need will manifest itself in the field of computer vision. DeepVISS is just the preparation for addressing that market-wide need. Therefore, I would hardly characterise DeepVISS as answering a need specific to VisageCloud. Rather, it is just a matter of our anticipating this market need ahead of time and taking preemptive action in addressing it.

c. Who is DeepVISS addressed to? In what ways can people benefit from it?

At DeepVISS we have two types of partners:

  • Demand partners: solution integrators and resellers, who aim to deliver features and value to customers fast, risk-free and without too many technology obstacles. For them, DeepVISS is an easier, more straightforward way of assembling puzzles of components into market-ready solutions.
  • Supply partners: computer vision developers, technology providers and researchers, who aim to bring their innovation to production without having to deal with the hassles of scaling databases, designing APIs, building UIs, or thinking about security, privacy provisions and GDPR compliance. For them, DeepVISS is a way of focusing on the innovation within their black box, while letting someone else deal with the non-core technology aspects.

We are still discovering new avenues through which DeepVISS can facilitate the connection between commercial customers with real-world problems to solve and researchers & developers who would rather focus on improving the state of the art in computer vision than on designing databases. DeepVISS is the bridge between those two worlds.
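Purely to illustrate the kind of interoperability DeepVISS aims for, here is a simplified, hypothetical detection payload sketched in Python. This is not the actual DeepVISS schema (see the project’s repository for that); it only shows why a shared format lets integrators swap vision providers without rewriting downstream logic.

```python
# Hypothetical, simplified detection payload in the spirit of a computer-vision
# interoperability standard. NOT the actual DeepVISS schema; the field names
# below are assumptions made for illustration.
from dataclasses import dataclass, asdict, field
import json

@dataclass
class BoundingBox:
    x: float          # all coordinates normalised to [0, 1]
    y: float
    width: float
    height: float

@dataclass
class Detection:
    label: str                      # e.g. "face", "person"
    confidence: float               # model score in [0, 1]
    box: BoundingBox
    attributes: dict = field(default_factory=dict)   # provider-specific extras

@dataclass
class FrameResult:
    source_id: str                  # camera / stream identifier
    timestamp: str                  # ISO-8601
    provider: str                   # which vision engine produced the result
    detections: list = field(default_factory=list)

frame = FrameResult(
    source_id="lobby-cam-01",
    timestamp="2018-06-26T19:30:00Z",
    provider="vendor-a-face-engine",
    detections=[
        Detection("face", 0.97, BoundingBox(0.41, 0.22, 0.08, 0.15),
                  {"person_id": "vip-0042"}),
    ],
)

# Any consumer of this shared format can switch vision providers without
# touching the downstream business logic.
print(json.dumps(asdict(frame), indent=2))
```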

👆 Click for the slide deck from Bogdan.

Keynote: Authorship verification / profiling / style breach detection

Bucharest AI asked, Stefan answered:

a. Can you expand on the differences between working in academia vs private sector? Pros / Cons for each one?

I’d like to narrow this answer a bit and talk only about research in academia vs research in the private sector. Having experienced both, I must say that each has its appeal. For example, working in academia you tend to focus more on abstract subjects, like obtaining state-of-the-art performance on a very specific and niche sub-problem while ignoring everything else. In the private sector, things actually need to work in real-life, dynamically changing environments. This means that you need to focus not only on the algorithm, but divert some of your time to deploying it somewhere, handling edge cases, dealing with noisy data and all sorts of bugs along the way. And with people too. Though from a certain point of view this might seem boring, it offers a certain kind of satisfaction: you create something that people actually use. It’s sort of like working only on a car engine with the sole purpose of obtaining maximum horsepower, versus working on most of the car, also trying to get as much horsepower as possible, but while keeping it economical.

So, let’s give a few pros/cons for academia:

  • you usually work on a very abstract and specific subject (pro, if you’re into going all-in);
  • if you’re lucky, you get to choose specifically what you like/want to study (pro);
  • hard deadlines are usually only for writing articles and for conferences (pro);
  • however, this more “relaxed”/“bohemian” lifestyle is addictive and difficult to change (con);
  • also, you need good self-management and prioritisation skills, sort of what you would need for entrepreneurship (I’d consider this a tough task and thus a con), otherwise you slowly fall behind.

And for industry:

  • you get to work on real-world problems and make a (non-abstract) difference (pro);
  • you need to learn other technologies and interact with different teams (pro); and, last but not least,
  • depending on the degree of autonomy you have, you are/aren’t able to steer the project your way (pro/con).

b. Which industry do you think authorship verification will have the most impact on in the near future?

Tough one. Authorship verification touches many areas, usually in hidden ways, and it’s a double-edged sword. If we talk about surveillance (centralised or not), being able to identify anonymous writers, blog posters, social network activists, etc. will directly lead to decreased anonymity and free speech, and thus strengthen an authoritarian government. However, being able to identify online stalkers/harassers is definitely a welcome tool in the police’s inventory. Maybe the most well-known usage today of a type of authorship verification is in tools that perform large-scale plagiarism detection, meaning less “copy-paste” in scientific publications, degrees, books, etc., and more original content. Other implementations also exist and are working well for very specific tasks in the private sector, like guarding against fake/malicious online reviews. I think in the near future we will see (very) personal assistants that learn the way we write and will be able to intelligently correct us or help us with reformulations or suggestions, if we ask for help. Overall, different flavours of authorship verification will most likely be naturally integrated and we will start to take them for granted, much in the same way we rely now on autocorrect on our phones or on our PCs. Like I’m doing right now as I’m writing this text.

c. What is a day in the life of a scientific researcher like? Can you expand on the challenges you face every day and what motivates you to overcome them?

This one is easy. You don’t usually know what will happen. You have a general idea, like: tomorrow I’ll train this network that I’ve modified a bit today and get some test results, which I *know* will be better than yesterday’s. And then you spend one week debugging and retraining it, only to suddenly realise what you’ve been doing wrong while taking a shower before going to bed. And then you have trouble falling asleep because “how could I have missed that”, and then you’re excited to retrain the network yet again. Then you fail again because “dammit, I was sure that was it”. Some days you spend reading and trying to understand a couple of articles; some days you’re focused on writing data-caching code, because there’s never enough RAM on the machine you’re working on. This, I think, is the best thing in research. You’re never really sure what will happen tomorrow, but it’s usually something new, which, for me, is the main reason for doing what I’m doing. I consider myself lucky to get paid for my hobby.

👆 Click for the slide deck from Stefan.

Bonus: Looking for an NLP library with Romanian language support?

Meet NLP Cube — the NLP library supporting 70 languages (incl. Romanian), developed by Tiberiu Boros and Stefan Dumitrescu. Watch the streamed video to learn more about this open-source project, or check out their progress on GitHub. Contact them for any further instructions or questions.

To access the Romanian model, contact them for a link, and make sure to also download the RO embeddings from here.
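If you want a quick feel for how the library is used once the model is in place, here is a minimal sketch based on the Cube API described in the project’s README around that time (installable via pip install nlpcube); the exact calls and the way models are loaded may have changed since, so treat this as an assumption and check the GitHub repo for current usage.

```python
# Minimal NLP-Cube sketch (API as described in the project's README circa 2018;
# verify against the current GitHub documentation before relying on it).
from cube.api import Cube

cube = Cube(verbose=True)
cube.load("ro")                      # load the Romanian model

sentences = cube("Maria merge la școală cu autobuzul.")
for sentence in sentences:           # sentences are lists of token entries
    for token in sentence:
        # tokenisation, lemmatisation, POS tagging and dependency parsing output
        print(token.word, token.lemma, token.upos, token.head, token.label)
```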

For anyone getting into NLP, here’s the video bibliography from Andrew Ng’s course on RNNs. And for a faster development cycle, check out the four sequence-encoding blocks that can replace RNN/LSTM layers, perform competitively with recurrent alternatives and save you a lot of computation time (courtesy of Han Xiao, Senior Research Scientist III at Tencent AI).

What’s next @BucharestAI

🎩 Chapeau to you all, for all your support and contributions thus far. A community is always the power of its members. It’s been our honour to have had the chance to meet so many of you, and we’re looking forward to enabling more of you to meaningfully and humbly practice and build your AI skills, knowledge, collaboration and awareness. And don’t forget:

Powerful things happen when like-minded people connect.

☀️ Happy Summer, fellow BucharestAI-ers (practitioners and enthusiasts alike)!

Bucharest AI

Cluster for Romanian AI practitioners and enthusiasts, startups, academia, curated global AI use cases, challenges and lessons learned. Join us to understand, ignite and refine transformative ideas, products and services in the applied Artificial Intelligence sphere.

Written by Alexandra Petrus

New Tech Product Strategist & ENFJ-T | @BucharestAI | @Women_in_AI | ex-VP Products @reincubate | ❤ #products #innovation #emergingtech #AI
