SATIS 2018 — Day 1
I’m at the 1st ACM SIGOPS Summer School on Advanced Topics in Systems held in Sommarøy, Norway. Why the first and why this place? Robbert van Renesse (one of the conference’s hosts) explained that most of the speakers know each other from 30 years ago, when they were young and innocent students at the Arctic School on Distributed Systems held in 1988 in Tromsø, Norway.
Then, he explained why is “good” to become a SIGOPS member…
… and passed the mic to the first speaker: Dag Johansen.
Johansen’s talk sarted with a brief introduction to distributed systems, in order to put the audience in the context, and showing the similar characteristics of all distributed systems, that is, the 3-layer division of: User Interface (UI), Processing, and Data Storage.
Why soccer analytics? it is a very good scenario to test the distributed systems paradox: a system that must be simple (UI) but complex (behind the scenes). Quoting Johansen:
“When people turn on a switch they don’t care how the electricity got there”
But why a distributed system? what is the problem with centralized? we distribute to gain responsiveness. In this use case: football (soccer):
- 90+ minutes indeterministic flow of game
- Individual athletes, collective team
- Non-predictable outcome (few foals: 1–0 & 1–1 final scores [anderson 13])
Problem: How to accurately measure all the training modalities imposed on the athlete.
Related work: Manually counting (steps, distance, reps, etc), GPS and accelerometers, Video motion analysis
Approach: Develop scalable, non-invasive, highly available, precise sports system, monitoring external load, enabling visual feedback and personalized intervention in real time
Match and Training Toolkit:
- Capture positional data
- Capture video: analytics (semi-automatic feature extraction), visual feedback (complete field; zoom), precision introduced delay
- Non-functional properties: scalable, available, non-invasive, precise (capture correct details), real-time
Proof of concept: 23 people recording a match on video, OpenCV analysis got < 1 fps, and you need 33 fps for real-time. Then, distributed systems came to the rescue.
After Dag’s talk, Leslie Lamport gave two talks, in the first entitled “Algorithms are not programs” he defended the idea that algorithms shuld be described and written with math rather than programming languages or pseudo-languages. This applies to many more algorithms than the ones taught in algorithm courses. Citing Lamport:
“You don’t produce 10x less code by better coding. You do it with a better algorithm, which come from mathematical thinking”.
The second talk was Lamport’s Turing Award talk entitled: “The early days of concurrency: a memoir”. In the talk, Lamport revisited some of the seminal works on distributed systems:
- The Mutual Exclusion Problem: E. W. Dijkstra. Solution of a problem in concurrent programming control. Communications of the ACM, 8(9):569, September 1965.
- Producer-Consumer Synchronization: identified as concurrency problem in: E.W Dijkstra, Cooperating Sequential Processes. EWD (1965). Dijkstra implicit model is based on the representation of a execution as a sequence of states, it is called the standard model.
- Proving Invariance Properties in the Standard Model. Edward Ashcroft, Proving Assertions in Parallel Models (1975).
Then, he did talk about a well known article:
Finalizing with the Liveness & Fairness problem. And saying that the best way to reason formally about liveness was the article of Amir Pnueli, The Temporal Logic of Programs. FOCS (1977).
What happened in 1978?
R: Fault-Tolerance was introduced.
(Introduced by Dijkstra in 1974, but no one noticed)
Afternoon session’s speaker was Lorenzo Alvise, talking about consistency vs. performance. DISCLAIMER: I took less pictures because the light was not good, so I copy-pasted from the original articles in order to explain better the talk.
Lorenzo started asking: what consistency means? and explained that there was (is) a semantic gap between Database people and for systems people. The key was to study about transaction’s serializability.
Then his talk was about explaining the following works:
A Critique of ANSI SQL Isolation Levels. Berenson et al, SIGMOD’95. ANSI isolation levels should be intended to proscribe phenomena, not anomalies. Thus, ANSI SQL phenomena are weaker than their locking counterparts.
Phenomenons are expressed to single object histories, but consistency often involve muliple objects. So, same guaranties are needed for running and commited transactions, but optimistic approache tire on the difference. The solution was Snapshot Isolation: a guarantee that all reads made in a transaction will see a consistent snapshot of the database (in practice it reads the last committed values that existed at the time it started), and the transaction itself will successfully commit only if no updates it has made conflict with any concurrent updates made since that snapshot.
Semantic gap between how isolation guarantees are defined and how are experienced by application programmers. Definition (in terms of histories and operations, invisible to applications) is not the same as Perception (Properties of observable states). This inspire more scalable implementations by not tying definitions down to system-specific assumptions. For instance:
A state-based definition Crooks et al, 2017: http://www.cs.cornell.edu/lorenzo/papers/Crooks17Seeing.pdf
Salt: Combining ACID and BASE in a Distributed Database: https://www.usenix.org/system/files/conference/osdi14/osdi14-paper-xie.pdf. The committed state of an alkaline subtransaction is observable by other BASE or alkaline subtransactions. By leveraging this finer granularity of isolation, BASE transactions can achieve levels of performance and availability that elude ACID transactions.
Salt achieves most of the ease of programming of ACID and most of performance of BASE by rewriting performance-critical transactions incrementally. However, the extra programming effort involved is still non-trivial and can easily introduce bugs. Callas aims to move beyond the ACID/BASE dilemma. Rather than trying to draw performance from weakening the abstraction offered to the programmers, Callas unequivocally adopts the familiar abstraction offered by the ACID paradigm and set its sight on finding a more efficient way to implement that abstraction. Source: https://repositories.lib.utexas.edu/bitstream/handle/2152/42014/XIE-DISSERTATION-2016.pdf
Callas then decouples the concerns of abstraction and implementation: it offers ACID guarantees uniformly to all transactions, but uses a novel technique, modular concurrency control (MCC), to customize the mechanism through which these guarantees are provided.
But Callas is not perfect:
Solution? Transaction Chopping (Shasha et al. ’95) in the form of Runtime Pipelining (RP), a static and automatic analysis which produce the best slicing: slices that are serializable and also with the best performance. RP first statically constructs a directed graph of tables, with edges representing transactional data / control-flow dependencies, and topologically sorts each strongly connected set of tables. Transactions are correspondingly reordered and split into steps.
Problem? Self conflict, the only way to avoid chopping conflicts is not chopping at all. Solution? TEBALDI hierarchical and modular CC.
End of day 1: hiking and then go watch the Atlantic Ocean.
Second day report: https://medium.com/@madestro/satis-2018-day-2-9222dac84495