Week 10

Baptiste Higgs
Published in Design Computing
May 12, 2017

I was having a bit of trouble figuring out how to download and extract the data. My data comes from the BOM (the Australian Bureau of Meteorology), which serves it over FTP (File Transfer Protocol) instead of HTTP or similar, and that made things a bit difficult.

Link to BOM data: ftp://ftp.bom.gov.au/anon/gen/clim_data/IDCKWCDEA0/tables/

Here is the function that I ended up using:

  • The function basically just reconstructs the wget Linux terminal command and runs it using os.system. The os.system function hands a command string to the system shell, so it can run any Linux terminal command from inside Python.

(Note: if you want to do something similar, you’ll need to import os. I imported mine at the top of my file.)
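As a quick illustration of what os.system does, here is a minimal sketch that runs a harmless command instead of wget. The echo command is just a stand-in chosen for this example; it is not part of my download code.

```python
import os

# os.system passes the string to the shell and returns the command's
# exit status. By convention, 0 means the command succeeded.
# "echo" is only a stand-in command for this illustration.
status = os.system("echo hello from the shell")

if status == 0:
    print("command succeeded")
else:
    print("command failed with status", status)
```

Checking the returned status is a cheap way to notice when a command (such as a download) did not finish cleanly.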

  • The structure of the wget command is similar to any other terminal command: wget [FLAGS] [URL]
  • The --no-verbose flag makes the command spam the terminal less, turning a horrifyingly large amount of spam into a much shorter log (which still looks like a lot, but believe me it’s much less).
  • The --no-parent flag means that wget won’t ‘leave’ the original intended directory by climbing up to a parent folder. For example, if you want to download everything in the /home/docs/pics/cat_pics/ folder, it won’t touch the folder /home/docs/videos/ or the folder /home/system/, or any of the files inside those folders.
  • The --recursive flag tells it to download everything inside the folders inside the folder you’re downloading. For example, if we were to run it on the /code1161base/OpenDataProject/ folder, which holds aboutMe.yml, RetrieveData.py and an ftp.bom.gov.au folder, without the recursive flag it would only download aboutMe.yml and RetrieveData.py, but with the recursive flag it would also download everything in the ftp.bom.gov.au folder and below.
  • The --level={} flag relates to this last point about the --recursive flag. It sets the maximum recursion depth. For example, if you wanted to download all of the /fun/games/ folder, but the sub-folders went down to /fun/games/solo/card_games/solitaire/game_saves/, you could set --level=1 to only download files in /fun/games/solo/, or --level=3 to get down to files in /fun/games/solo/card_games/solitaire/ but not in /fun/games/solo/card_games/solitaire/game_saves/. I set --level=5 (which is also wget’s default depth). That is much deeper than any folder I’m downloading should go, so I download everything, but it still stops in case something goes unexpectedly deep.
  • After all the flags, I put my standard base URL, which all requests go through, and then add the location, which should be passed into the function. The location parameter should be either ‘’, for downloading everywhere’s data, ‘[state]/’, for all the data in a certain state, or ‘[state]/[station]/’, for downloading a certain weather station’s data.
os.system("wget --no-verbose --no-parent --recursive --level=5 ftp://ftp.bom.gov.au/anon/gen/clim_data/IDCKWCDEA0/tables/nsw/bega/")
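Putting the points above together, a function along these lines would build and run that command. This is a rough sketch based on the description, not necessarily the exact code I used: the names BASE_URL, build_wget_command and download_data are made up for this example, and only the flags and the base URL come from the post.

```python
import os

# Base URL from the post; every request goes through it.
# (The constant name is hypothetical.)
BASE_URL = "ftp://ftp.bom.gov.au/anon/gen/clim_data/IDCKWCDEA0/tables/"


def build_wget_command(location="", level=5):
    """Assemble the wget command string (hypothetical helper).

    location is '' for everything, '[state]/' for one state,
    or '[state]/[station]/' for a single weather station.
    """
    flags = "--no-verbose --no-parent --recursive --level={}".format(level)
    return "wget {} {}{}".format(flags, BASE_URL, location)


def download_data(location="", level=5):
    """Run the assembled command in a shell and return its exit status."""
    return os.system(build_wget_command(location, level))


# Building the command does not touch the network, so it is safe to print.
print(build_wget_command("nsw/bega/"))
```

Calling download_data("nsw/bega/") would then fetch the Bega station's tables, assuming wget is installed and the FTP server is reachable. Keeping the string-building step separate from the os.system call makes it easy to check the command before actually running it.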

Time to try and reorganise the data!
