Kevin Martin
1 min read · Mar 16, 2017


For my tasks that subclass AthenaQuery, I simply set the “results_path_template” attribute, which tells the AthenaTask where to place the output in S3. I typically set it to something like “cdn-results/%s”. The %s gets interpolated with the Luigi date parameter we provide and is combined with the “results_path_base”, so the results end up at s3://500px-emr/athena-results/cdn-results/2017-03-15.
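To make the interpolation concrete, here is a minimal sketch of how such a path could be assembled. The function name build_results_path and the module-level constants are hypothetical and only mirror the values mentioned above; the actual AthenaQuery code may combine them differently.

```python
import datetime

# Assumed values, taken from the example path in the text
RESULTS_PATH_BASE = "s3://500px-emr/athena-results"
RESULTS_PATH_TEMPLATE = "cdn-results/%s"


def build_results_path(base, template, date):
    """Interpolate the date into the template and join it onto the base path."""
    # date.isoformat() gives YYYY-MM-DD, the same form Luigi uses for date parameters
    return base.rstrip("/") + "/" + template % date.isoformat()


path = build_results_path(
    RESULTS_PATH_BASE, RESULTS_PATH_TEMPLATE, datetime.date(2017, 3, 15)
)
print(path)
```

With the values above, this yields the same location as the example in the text.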

I don’t use query_retrieve() in production. It was just there as an exploratory mechanism during development. That said, here’s an example of its use:

class Ex(AthenaQuery):
    def run(self):
        with open("/tmp/test", "w") as out:
            for row in self.query_retrieve("select * from activity limit 10;"):
                out.write(row.getString(1))

Note that query_retrieve() returns a generator, and each row it yields is only valid until the next iteration. This means that storing the row objects and only afterwards accessing their data will fail. Instead, you must read the data you need from the current row before retrieving the next value from the generator.
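The pitfall can be illustrated with a small self-contained sketch. FakeRow and fake_query_retrieve are hypothetical stand-ins, not the real Athena classes; they only imitate a generator that hands back the same cursor-backed row object on every iteration. The real query_retrieve() may fail differently, for example by raising an error instead of returning stale data.

```python
class FakeRow:
    """Hypothetical stand-in for a cursor-backed row object."""

    def __init__(self):
        self._values = None

    def getString(self, index):
        # 1-based column index, matching the usage in the example above
        return self._values[index - 1]


def fake_query_retrieve(rows):
    """Yield one shared row object whose contents change on each iteration."""
    row = FakeRow()
    for values in rows:
        row._values = values
        yield row


data = [("a",), ("b",), ("c",)]

# Wrong: keep the row objects and read them after the generator is exhausted.
stored = list(fake_query_retrieve(data))
wrong = [r.getString(1) for r in stored]

# Right: read the data from each row before advancing the generator.
right = [r.getString(1) for r in fake_query_retrieve(data)]

print(wrong)
print(right)
```

In this sketch the stored rows all end up pointing at the last result, while reading each row during iteration gives the expected values.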
