How to Find a “Silent” Bug

Michal Valoušek
code.kiwi.com
Published in
6 min readOct 25, 2016

Now and again, some very weird bugs occur.

Photo by David Clode on Unsplash

In the case of Django, and out internal Kiwibase project, I spent half a day fighting with a particularly weird bug.

What Happened

  • A colleague had prepared a branch for our shiny new feature
  • Tests were passed (okay, I’m lying slightly; there was one minor bug in the HTML content which was not related to this issue)
  • I was able to successfully run it on my computer

So I tried to deploy it on the server and it didn’t work. The only screen I could see was this:

Houston, we have a problem

There wasn’t any traceback or error message in the log. I tried to run the application behind uwsgi, with the server directly through runserver. Still the same behaviour.

So I decided to switch the project to DEBUG=True mode to see if there was some traceback there.

Boom!

When project was run with DEBUG=True, everything was okay. When I switched it to DEBUG=False, only that screen above appeared and no other traces were available (and no, I didn't forget to add ALLOWED_HOSTS).

How I Found the Problem

I found that the code in feature/assets-rebase branch was bad, but in develop healthy. So I prepared the script bisect.sh:

#!/bin/bash
TIME=5
sleep $TIME
docker-compose build
docker-compose kill && docker-compose rm -f
docker-compose up&
sleep 45
cat ~/snippet.py >> ./avian_resources/settings.py
sleep $TIME
curl -s http://localhost:8000/accounts/login/ | grep -q "500 - Internal server error"
RET=$((1-$?))
git checkout avian_resources/settings.py
exit $RET

Then I ran these commands:

$ git bisect start
$ git bisect bad feature/assets-rebase
$ git bisect good develop
$ git bisect run ./bisect.sh

What does this do?

  • git will go through the codebase history (commits between develop and feature/assets-rebase) and check some versions of our application
  • Then script bisect.sh will be invoked
  • New Docker containers will be built (according to requirements.txt)
  • The application will be launched
  • settings.py will be modified (we add the content of snippet.py, to it, which just overrides the development settings with DEBUG=False; see below)
  • curl access login page; if we are in a bad git revision, the page with message "500 - Internal server error" will be displayed and the exit code of grep will be set to 0 (for anything else, eg a standard login page, the exit code will be 1)
  • Code RET=$((1-$?)) inverts the exit code, so "500 - Internal server error" page will be represented as 1, and everything else as 0
  • Then we revert to temporary changes in the settings.py file and return our modified exit code

This way, git found the first failing commit. Unfortunately, not the one which caused the original issue. This commit failed because of some missing packages in the requirements.* files.

To remember my colleague’s note:

Bisection Prerequisites1. One commit == one logical change
2. No broken commits on master

Yes. We broke the “master” and bisect has found the wrong commit. This is the most important message here: never put a broken commit into git!

No matter. git bisect helped me anyway (a significantly lower number of commits). I continued manually, and soon found 6217fb280a05ca1995aaddcc4dc06791bb62c2a5, "Auditlog for assets".

I verified that this commit was having a problem with DEBUG=False (and the preceding commit didn't). When I removed every reference to the auditlog 3rd party package, I could successfully run the project even with DEBUG=False.

Okay, so now I know that auditlog is probably the cause of all my problems. But what exactly is wrong? There was no traceback in the log, so I decided to make my very own (and then go through stacktrace back to the point in the source code which caused this error).

I noticed that the error page in the browser is nicely rendered. Django uses our custom 500.html template, so it means that the code server_error function is invoked. I went there, and put, before the try statement, assert:

@requires_csrf_token
def server_error(request, template_name='500.html'):
assert False
try:
template = loader.get_template(template_name)
except TemplateDoesNotExist:
return http.HttpResponseServerError(
'<h1>Server Error (500)</h1>',
content_type='text/html')
return http.HttpResponseServerError(template.render())

Now, when I accessed the login page in the browser, there wasn’t a nice 500.html page. Just the simple text: A server error occurred. Please contact the administrator..

But most importantly, we finally got some traceback in the console! Hurray!

[02/Sep/2016 07:29:38] "GET /accounts/login/?next=/ HTTP/1.1" 500 59
Traceback (most recent call last):
File "/usr/local/lib/python3.5/site-packages/django/core/handlers/base.py", line 149, in get_response
response = self.process_exception_by_middleware(e, request)
File "/usr/local/lib/python3.5/site-packages/django/core/handlers/base.py", line 147, in get_response
response = wrapped_callback(request, *callback_args, **callback_kwargs)
File "/usr/local/lib/python3.5/site-packages/django/views/generic/base.py", line 68, in view
return self.dispatch(request, *args, **kwargs)
File "/usr/local/lib/python3.5/site-packages/django/utils/decorators.py", line 67, in _wrapper
return bound_func(*args, **kwargs)
File "/usr/local/lib/python3.5/site-packages/django/views/decorators/debug.py", line 76, in sensitive_post_parameters_wrapper
return view(request, *args, **kwargs)
File "/usr/local/lib/python3.5/site-packages/django/utils/decorators.py", line 63, in bound_func
return func.__get__(self, type(self))(*args2, **kwargs2)
File "/usr/local/lib/python3.5/site-packages/allauth/account/views.py", line 98, in dispatch
return super(LoginView, self).dispatch(request, *args, **kwargs)
File "/usr/local/lib/python3.5/site-packages/allauth/account/views.py", line 66, in dispatch
**kwargs)
File "/usr/local/lib/python3.5/site-packages/django/views/generic/base.py", line 88, in dispatch
return handler(request, *args, **kwargs)
File "/usr/local/lib/python3.5/site-packages/django/views/generic/edit.py", line 213, in get
return self.render_to_response(self.get_context_data())
File "/usr/local/lib/python3.5/site-packages/allauth/account/views.py", line 129, in get_context_data
site = get_current_site(self.request)
File "/usr/local/lib/python3.5/site-packages/allauth/utils.py", line 155, in get_current_site
site = Site.objects.get_current(request=request)
File "/usr/local/lib/python3.5/site-packages/django/contrib/sites/models.py", line 65, in get_current
return self._get_site_by_id(site_id)
File "/usr/local/lib/python3.5/site-packages/django/contrib/sites/models.py", line 35, in _get_site_by_id
site = self.get(pk=site_id)
File "/usr/local/lib/python3.5/site-packages/django/db/models/manager.py", line 122, in manager_method
return getattr(self.get_queryset(), name)(*args, **kwargs)
File "/usr/local/lib/python3.5/site-packages/django/db/models/query.py", line 381, in get
num = len(clone)
File "/usr/local/lib/python3.5/site-packages/django/db/models/query.py", line 240, in __len__
self._fetch_all()
File "/usr/local/lib/python3.5/site-packages/django/db/models/query.py", line 1074, in _fetch_all
self._result_cache = list(self.iterator())
File "/usr/local/lib/python3.5/site-packages/django/db/models/query.py", line 52, in __iter__
results = compiler.execute_sql()
File "/usr/local/lib/python3.5/site-packages/django/db/models/sql/compiler.py", line 846, in execute_sql
cursor = self.connection.cursor()
File "/usr/local/lib/python3.5/site-packages/django/db/backends/base/base.py", line 233, in cursor
cursor = self.make_cursor(self._cursor())
File "/usr/local/lib/python3.5/site-packages/django/db/backends/base/base.py", line 204, in _cursor
self.ensure_connection()
File "/usr/local/lib/python3.5/site-packages/django/db/backends/base/base.py", line 199, in ensure_connection
self.connect()
File "/usr/local/lib/python3.5/site-packages/django/db/backends/base/base.py", line 174, in connect
connection_created.send(sender=self.__class__, connection=self)
File "/usr/local/lib/python3.5/site-packages/django/dispatch/dispatcher.py", line 192, in send
response = receiver(signal=self, sender=sender, **named)
File "/usr/local/lib/python3.5/site-packages/jsonfield/fields.py", line 166, in configure_database_connection
loads=lambda x: x)
File "/usr/local/lib/python3.5/site-packages/psycopg2/_json.py", line 168, in register_default_jsonb
loads=loads, oid=JSONB_OID, array_oid=JSONBARRAY_OID, name='jsonb')
File "/usr/local/lib/python3.5/site-packages/psycopg2/_json.py", line 139, in register_json
register_type(JSON, not globally and conn_or_curs or None)
TypeError: argument 2 must be a connection, cursor or None
During handling of the above exception, another exception occurred:Traceback (most recent call last):
File "/usr/local/lib/python3.5/wsgiref/handlers.py", line 137, in run
self.result = application(self.environ, self.start_response)
File "/usr/local/lib/python3.5/site-packages/django/core/handlers/wsgi.py", line 177, in __call__
response = self.get_response(request)
File "/usr/local/lib/python3.5/site-packages/django/core/handlers/base.py", line 230, in get_response
response = self.handle_uncaught_exception(request, resolver, sys.exc_info())
File "/usr/local/lib/python3.5/site-packages/django/core/handlers/base.py", line 296, in handle_uncaught_exception
return callback(request, **param_dict)
File "/usr/local/lib/python3.5/site-packages/django/utils/decorators.py", line 149, in _wrapped_view
response = view_func(request, *args, **kwargs)
File "/usr/local/lib/python3.5/site-packages/django/views/defaults.py", line 59, in server_error
assert False
AssertionError

The lower part, behind During handling of the above exception, another exception occurred isn't interesting (it's our assert). But take a look at the upper part. There are some suspicious messages about jsonfield! Yes jsonfield, not auditlog. But if you look into the django-auditlog requirements, you find a reference to django-jsonfield>=1.0.0 package.

We’re on the home straight. The package django-jsonfield hasn't been maintained for some time. There is version 1.0.1 on pypi, but 1.0.3 on GitHub. When I upgraded it, the problem with DEBUG=False disappeared.

Hurray!

Lessons Learned

  • Each commit (not even to master branch) must not break the application functionality
  • If you encounter this “silent” type of bug, try injecting some asserts into the source code to get an idea where the code is failing
  • Inspect not only direct 3rd party packages used in the project, but also their dependencies
  • Sleep well

Extras

Content of ~/snippet.py file:

class Dev(Common):
DEBUG = False
ALLOWED_HOSTS = ['*']
EMAIL_BACKEND = 'django.core.mail.backends.filebased.EmailBackend'
EMAIL_FILE_PATH = '/tmp/app-emails'
CUSTOM_BACKEND_URL = 'http://0.0.0.0:12345/custom/api'

I would like to thank Bence Nagy for the solution around Git Bisect and his presentation at our internal Kiwi.com development meetup covering this topic.

--

--