DevOpsDays London 2018 Notes
Neurodiversity and the Essence of DevOps @jeffsussna
- Dark Hero of Information Age
- Realizing Empathy
- DevOps is about empathy
- Autism isn't defined by the absence of empathy
- World tends to view autistic people by what they lack
- Anxiety levels are higher, it’s like running 6 miles to get to work
- Accommodations help everyone
- Be aware of assumptions
- Talking about things is talking about people, be careful about wording
- DevOps teams are wrong
- “We hire the top 1%”: what about the other 99%? Seek the wisdom in everyone
- Curiosity is mandatory
- Listening is easy, but so is collecting feedback and then doing nothing
- Respect Struggle
- DevOps -> EmpathyOps
How to leverage AWS features to secure and centrally monitor your accounts @KateAWhalen
- 4K AWS accounts managed by teams
- If you get ownership of your infra, you get ownership of securing it, so we need to share security expertise
- Security is a shared responsibility between AWS and you
Trusted Advisor
- It’s free
- Will flag open S3 buckets
- In S3 bucket permissions, an explicit Deny overrides any Allow
- For security groups will flag overly permissive ones
- In SGs the most permissive rule wins
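A rough sketch of the two evaluation rules noted above, to make the contrast concrete. It is a simplification and not the AWS API: real policy evaluation also considers principals, resources and conditions, and the names `Statement`, `IngressRule`, `s3_decision` and `sg_allows` are made up for illustration.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Statement:
    effect: str  # "Allow" or "Deny"
    action: str  # e.g. "s3:GetObject"


def s3_decision(statements, action):
    """An explicit Deny beats any Allow; no matching statement is an implicit deny."""
    matching = [s for s in statements if s.action == action]
    if any(s.effect == "Deny" for s in matching):
        return "Deny"
    if any(s.effect == "Allow" for s in matching):
        return "Allow"
    return "Deny"


@dataclass(frozen=True)
class IngressRule:
    cidr: str  # source range, e.g. "0.0.0.0/0"
    port: int


def sg_allows(rules, source_ip, port):
    """Security group rules are allow-only, so traffic passes if any one rule matches."""
    address = ipaddress.ip_address(source_ip)
    return any(
        rule.port == port and address in ipaddress.ip_network(rule.cidr)
        for rule in rules
    )
```

With these rules, adding a narrow security group rule next to a `0.0.0.0/0` rule does not restrict anything, whereas adding a Deny statement next to an Allow does.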
Amazon Inspector
- Automated security inspector
- Costs 30c / server / scan
- Install it on your instances
- Be aware that it reports traffic to the AWS metadata service as unencrypted and a potential issue
StackSets
- Apply changes across multiple AWS accounts
- One account can now affect many others
Death By Dashboards
- Provide specific, timely, actionable feedback
Actionable
- Best efforts are better than no efforts
- Anyone seeing an issue should feel confident to fix it
- Your creative vision matters less than the experiences of people using your tools
Timely
- Compliance != security
Alerts
- Remove noise, don't overwhelm users
- When everything is urgent nothing is
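One way to cut alert noise is to suppress repeats of the same alert for a cooldown period. The sketch below is only an illustration of that idea, not something shown in the talk; `AlertThrottle`, the alert keys and the 300 second window are all assumptions.

```python
class AlertThrottle:
    """Forward an alert only if the same key has not fired within the cooldown."""

    def __init__(self, cooldown_seconds):
        self.cooldown_seconds = cooldown_seconds
        self._last_sent = {}  # alert key -> timestamp of the last forwarded alert

    def should_notify(self, key, now):
        last = self._last_sent.get(key)
        if last is not None and now - last < self.cooldown_seconds:
            return False  # repeat inside the cooldown window: treat as noise
        self._last_sent[key] = now
        return True


throttle = AlertThrottle(cooldown_seconds=300)
first = throttle.should_notify("open-s3-bucket", now=0)     # forwarded
repeat = throttle.should_notify("open-s3-bucket", now=100)  # suppressed
```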
Visible
- Security status is transparent
- Leave it better than you found it
- We succeed and fail together
Don’t Panic! @efinlay24
- You’ve been placed on call!
- Everyone feels the same at the start
- What if I can't fix it?
Ghosts of Incidents Future
- Incident handling is a skill that can be taught
- Start off dealing with alerts during working hours
- Have a plan
- Practice regularly
- Break things and see what happens
Ghosts of Incidents Present
- Take a deep breath
- Don't dive straight in
- Gather info
- What's already been tried?
- What's the minimum viable solution?
- Call for help
- Communication is key
- Put someone in charge
- Incident channels let you construct a timeline easily afterwards
- Tired people don’t think good
Ghosts of Incidents Past
- Postmortems
- Incident reports
- Prioritise follow up actions
Bright Screens, Blue Days: Developing Self-Care Tech @niceotherwise
Mental Health
- 1 in 6 people in England experience mental health problems
- Healthcare system is overloaded
- Private care is expensive
- Stigmatised, socially taboo
Solution?
- Tech? A Smartphone App
- You can argue with your therapist but not your phone
CBT
- Depressed people have an altered view of reality
Designing for mental health is complex
- Talk to distressed people
- Follow best practices
- Share resources, give access
Support After 5pm Open Space
- Script how a person would fix a problem
- Have the scripts autorun based on alerts
- RunDeck
- Rotate people through teams to increase people's exposure to different services
- Lots of people had best-effort on-call systems staffed by volunteers who are paid to join
- Offer training for the core systems and provide run books
- Swedish law limits how much you can work: people who are on call at the weekend have to take 2 days off in the week
- Having people in different timezones helps a lot
- Pipe alerts into team specific Slack rooms, improves fix rate
- “Done” for a project requires monitoring and alerts
- Discussion of how not having to run / be on call for the software you push to prod lowers the quality. Also affected by demands to release quickly and develop more features
- Differences between prod and staging mean issues aren’t highlighted before release
- Some companies group services into tiers
- Everyone had bad experiences with this
- Lets you excuse poor quality in lower tier code
- High tier services depend on lower tier services
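The first idea from this open space, scripting how a person would fix a problem and running those scripts automatically from alerts, could look roughly like the sketch below. It is not RunDeck's API: the alert types, the `RUNBOOKS` mapping and the handler functions are hypothetical placeholders.

```python
def restart_service(alert):
    # Placeholder for the steps a person would otherwise run by hand.
    return f"restarted {alert['service']}"


def clear_temp_files(alert):
    return f"cleared temp files on {alert['host']}"


# Map each known alert type to the script that fixes it.
RUNBOOKS = {
    "service_down": restart_service,
    "disk_full": clear_temp_files,
}


def handle_alert(alert):
    """Run the matching runbook, or hand over to a human when none exists."""
    runbook = RUNBOOKS.get(alert["type"])
    if runbook is None:
        return "escalate to on-call"
    return runbook(alert)
```

Anything without a scripted fix still reaches a person, so the scripts reduce out-of-hours pages without hiding unknown failures.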
What To Expect From Your Manager Open Space
- Feedback on what you’re doing
- Champions the team's needs to the rest of the company
- Sets the team's expectations
- Make sure the rest of the org knows what your team does
- Feeds praise back to the team
- Don't mislead your team
- People can tell if you’re lying and lose trust in you
- Don’t tell them what to do, get stuff out of their way so they can do things
- Build emotional safety so teams can admit they don’t know
- Team needs to know they can trust their manager
- Culture defines what the right thing to do is, don’t reward staying late
- Don’t put your manager in a situation where they can’t defend you
- Schedule 1:1s weekly so people know they have your time even if things are busy, but make them skippable if no one has anything
- “The Adventures of Johnny Bunko”, sequel to “Drive” written from the employee perspective
- Servant leadership
Good Monoliths Open Space
- What business value do you get by splitting your monolith into microservices?
- Faster to develop an initial idea as a monolith
- Good monoliths have well defined interfaces between modules
- Decouple parts of your monolith with queues that enable parts to communicate
- Can enforce business rules with security groups in a Microservices world
- Shared libraries couple monoliths, what’s your alternative? Copy paste code?
- Monoliths don't have API dependencies, can use gRPC to address this in microservices
- With large teams and a monolith anyone can start reaching across module boundaries and create a mess
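The point above about decoupling parts of a monolith with queues can be sketched with an in-process queue. This is a minimal illustration under assumed names (`place_order`, `run_billing`, the event shape); a real system might use a durable broker instead of `queue.Queue`.

```python
import queue

# Shared in-process queue: the only thing the two modules both know about.
events = queue.Queue()


def place_order(order_id):
    """Orders module: publishes an event instead of calling billing directly."""
    events.put({"type": "order_placed", "order_id": order_id})


def run_billing():
    """Billing module: drains pending events and returns the invoices it produced."""
    invoices = []
    while True:
        try:
            event = events.get_nowait()
        except queue.Empty:
            break
        if event["type"] == "order_placed":
            invoices.append(f"invoice-for-{event['order_id']}")
    return invoices
```

The orders code never imports billing, so the event format becomes the well-defined interface between the two modules.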
Who Broke Prod? Growing a Culture of Blameless Failure @growerofawesome
- Why does it feel bad when it goes down?
Negative Feedback
- Have to practice responding to negative feedback
Seeking Blame
- Self defense, “it's someone else's fault”
Incident
- Brutal Transparency
- People who are worried about blame stop talking
- Collaboration
- Work through the problem with someone else
- Use “we” rather than “They/I/You/Them” to build trust
Postmortem
- “Learning Review” rather than postmortem
- “Beyond Blame”
Visibility
- Make failure visible
- Don't just visualise application metrics
- Show sales pipeline / customer feedback
- Everyone should be able to see the failures
Reward
- Never punish people for trying
- Words can make people feel blamed / criticised, which impacts their creativity
- Reward positive behaviour
Managing people and other horror stories @InformatiQ
- Management is a role in the team
Role as a manager
- Plan, work on and deliver projects
- Maintain a level of quality in services delivered
- A manager is expected to deliver what a whole team delivers
- Accountable for the results of the team
- Managers exist for people and results
- IC’s rely on themselves to deliver
- Managers rely on the team, there is no self
- Help developers do what they do best
- Management is not a promotion, it's a different role
- People are your responsibility
1. Thou shalt manage a team
- Work together, have trust
2. Thou shalt give them a reason to exist
- Why are we here?
- Common purpose
3. Serve your team
- When your team is awesome, you’re awesome
Why are Distributed Systems so hard? A network partition survival guide @deniseyu21
- There used to be one monolithic database in the basement
- Now more people, such as business analysts, want to query it
Scale Vertically
- Add more compute power
- Eventually hit limits
Scale Horizontally
- Scalability
- Availability
- Latency
Shared Nothing
- All modern clouds are based on shared nothing
- No shared resources
8 Fallacies of Distributed Systems
- First fallacy: "the network is reliable" (in practice the network is unreliable)
- Byzantine Generals Problem
CAP
C = Linearisability
A = Availability
- How do you know if it's slow or dead?
- Timeouts, what to set them to?
- Need to monitor systems to work out what's normal and set timeouts
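One possible way to turn "monitor what's normal" into a timeout value is to take a high percentile of observed latency and add headroom. The sketch below assumes that approach; `suggest_timeout`, the 99th percentile and the 2x headroom factor are illustrative choices, not recommendations from the talk.

```python
import statistics


def suggest_timeout(latencies_ms, percentile=99, headroom=2.0):
    """Return the chosen latency percentile multiplied by a headroom factor."""
    if len(latencies_ms) < 2:
        raise ValueError("need at least two latency samples")
    # quantiles(n=100) returns 99 cut points; index p - 1 is the p-th percentile.
    cut_points = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    return cut_points[percentile - 1] * headroom
```

A request that exceeds this value is far outside what monitoring has seen, which makes "dead" a more reasonable guess than "slow", although a timeout can never tell the two apart with certainty.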
P = Partition Tolerance
- Connectivity between 2 nodes fails
- Can't know the state of the other side
- Partitions are inevitable
- Hardware fails
- Software behaves weirdly
Tickets and Silos Ruin Everything @damonedwards
- Forces that undermine operations
Silos
- Team A throws requests over the wall to Team B
- Interferes with feedback loops
Ticket Queues
- Are expensive
- Disintegrate and obscure value streams
- Snowflake makers
Toil
- Excessive toil prevents you from improving the business
- Can't spend time reducing toil
Low Trust
- If you have to escalate up to make decisions, the people who make them have less context
Operations as a service
- Give people self service options
Startups Dos and Don’ts Open Space
- Whoever's in charge needs to have the most shares, because at some point someone will have to make the crappy decisions
Keep everyone close
- Easier to make changes faster
- Take no shit, it's your vision
- Easier to influence culture
- Share everything
- Whatever your MVP is make it smaller
- Go to many meetups, make contacts
Outsourcing
- Own the repos
- Shit will go wrong
- Make sure you don't lose the work
- They will hide things
- Define quality upfront
- They’re motivated by price not long term maintainability
- Takes lots of time to manage the relationship
- Same problem finding quality as you have when hiring people
- You’ll need to become an accountant
- Don't underestimate how much other work happens outside of engineering
- Sack people as soon as you consider doing it
- Don't have time to fix people
- Bad people affect the whole team
- Keep it fun, it’s a startup, you’re a part of this thing that could be the next Google
- If you have “rockstars” make time for them to teach other people
- You then have many great people
- Remove a single point of failure
Builders vs Operators
- Watch for people who just want to do CV development
- Be careful what happens without an ops person
- You don’t have time to build a beautiful ops setup
- You need to do what you need to do, not what you want to do
- Get an AWS account manager, they can be very generous with free credits
You’ll lie a lot
- Make sure it's to the right people
- Sell people on the fantasy
- Never pay retail for anything, call and ask for a business account