AWS Redshift 之 UDF 撰寫

莉森羊
莉森羊
Aug 22, 2017 · 11 min read

需求

使用自訂的UDF去對指定欄位進行一個ETL動作

實作時間

2016.09

參考資源

AWS官方文件

實做細節

所謂UDF呢,全名是User-Defined Functions,是自定義函數的意思。在這個案例中,因為需要把很醜的uri過濾出有用的資訊,因此一般的SQL指令是不夠的,需要自己編寫UDF來滿足案例需求。

首先,先創一個迷你的table來測試UDF:

create table udf_test ("uri" VARCHAR(10000));

插入五筆data (其中含兩筆是不會匹配出結果的數據)

insert into udf_test values('http://smart.789.kakamobi.cn/api/open/v2/advert-sdk/confirm.htm?spaceid=80&advertid=2934&resourceid=2089&action=view&redirectUrl=&trackSessionId=998e4a8479a941138c1f984b11615372&_platform=android&_srv=t&_appName=jiakaobaodian&_product=%E9%A9%BE%E8%80%83%E5%AE%9D%E5%85%B8&_vendor=oppo&_renyuan=xyx&_version=6.3.1&_system=LMY47D&_manufacturer=QiKU&_systemVersion=5.1&_device=8681-M02&_imei=867886023818648&_productCategory=jiakaobaodian&_operator=M&_androidId=b61eb4985a5e1e26&_mac=ec%3A5a%3A86%3Afb%3A45%3Aaf&_appUser=391304d4e019441cb362c85485dfcf3a&_pkgName=com.handsgo.jiakao.android&_screenDpi=3.0&_screenWidth=1080&_screenHeight=1920&_network=g2&_launch=70&_webviewVersion=4.7&_firstTime=2015-10-21%2007%3A07%3A55&_userCity=410100&_p=&_gpsType=baidu&_cityName=%E9%83%91%E5%B7%9E%E5%B8%82&_cityCode=410100&_gpsCity=410100&_longitude=113.849043&_latitude=34.554371&_ipCity=441900&interface_version=2.0&bitauto_version=5.2&_j=1.0&schoolName=%E9%95%92%E9%80%9A%E9%A9%BE%E6%A0%A1&schoolCode=410100382&_r=d93e6d3853e4477d844ac0495abf281c&sign=5f5a717996a34ee2e89cb88539a94d83'), ('http://weatherapi.market.xiaomi.com/wtr-v2/city/positioning?longitude=F4G5p61TWPGTDw4mIawVZRqBzFD4xR-fzxJfF7xETm_A3C01xdEdgm4AOCkldtCzRKQ5o4SoFj6L%0AZ8CaddyH7W2f-F-i5Xg92nWpxitnx8KDilB0KGLCkHuAfKVp_18TyBzRy5Za4rmutnGBJJ5rsjcA%0AvO3Pwnup6ei2cjQDQB0%3D%0A&latitude=Uy-Sz-z_gznqqhDcQBESOj7aPJYF7AxSWfZ6fyW5aK3nBgWk-ZuaI8WmIYfBGVYmr2VPF3SBIvDP%0ANy_q1DDDFQUtUZ9HyXNWUtFouvBzvYnGnr8HXoLSbZpoWZ4MHGfSUHOnPHV63vEu4IH0jthHofrQ%0A29L6_FthYaVTgAce3lk%3D%0A&device=virgo&miuiVersion=V7.0.6.0.KXECNCI&modDevice=&source=miuiWeatherApp&encoded=latitude,longitude'),('http:///api/open/v2/advert-sdk/confirm.htm?spaceid=102&advertid=3274&resourceid=2394&action=view&redirectUrl=&trackSessionId=3abdb57155e540179df9087fc6acd91f&_platform=android&_srv=t&_appName=jiakaobaodian&_product=%E9%A9%BE%E8%80%83%E5%AE%9D%E5%85%B8&_vendor=oppo&_renyuan=xyx&_version=6.3.1&_system=LMY47D&_manufacturer=QiKU&_systemVersion=5.1&_device=8681-M02&_imei=867886023818648&_productCategory=jiakaobaodian&_operator=M&_androidId=b61eb4985a5e1e26&_mac=ec%3A5a%3A86%3Afb%3A45%3Aaf&_appUser=391304d4e019441cb362c85485dfcf3a&_pkgName=com.handsgo.jiakao.android&_screenDpi=3.0&_screenWidth=1080&_screenHeight=1920&_network=g2&_launch=70&_webviewVersion=4.7&_firstTime=2015-10-21%2007%3A07%3A55&_userCity=410100&_p=&_gpsType=baidu&_cityName=%E9%83%91%E5%B7%9E%E5%B8%82&_cityCode=410100&_gpsCity=410100&_longitude=113.849043&_latitude=34.554371&_ipCity=441900&interface_version=2.0&bitauto_version=5.2&_j=1.0&schoolName=%E9%95%92%E9%80%9A%E9%A9%BE%E6%A0%A1&schoolCode=410100382&_r=57eeee17c70e478184afaf78f40c790d&sign=e2e7ff58c0f40bed554a216a7229a8d4'),('http://weatherapi.market.xiaomi.com/wtr-v2/city/datasource?cityNames=%E9%83%B8%E5%9F%8E%3B%E5%91%A8%E5%8F%A3%2C%E6%B2%B3%E5%8D%97%2C%E4%B8%AD%E5%9B%BD%3Bzh_cn%3Bdancheng%3B%7CDancheng%3Bzhoukou%2Chenan%2CChina%3Ben_us%3B%3B%7C%E9%84%B2%E5%9F%8E%3B%E5%91%A8%E5%8F%A3%2C%E6%B2%B3%E5%8D%97%2C%E4%B8%AD%E5%9C%8B%3Bzh_tw%3B%3B&longitude=DI8G4KpaeI8nfs6gK2XicLy23qNxwBKCg8UcTM_s5QWUnzeob5ZTtJGTHZ8Yuub19T004LnrHAfW%0A2ynAwHzIWEPPTFeOils8oqh-Np8ttgfb6jOk0qUGSixDUOG64M1HwsVZYjnxQOCKeffY4RU4u58p%0AzuTChMOjp74A74dghuE%3D%0A&latitude=W8Yt1JXwOqycxTI721FtWS6iDW6c6X9dGbDYQuFsgMqA8sUyr74JGY3uO-lTL__5TMf49dKzVdj0%0AqrbgNoIw6VPrh-0btXohUXrQjQjl5O2z7p8N9_XeL2sjSU9RPD72rUqOU4Payf46J34vuXwIVHey%0AEz7FjRI39BogFfh0LWQ%3D%0A&phoneCode=0394&areaCode=477100&device=dior&miuiVersion=V6.5.4.0.KHICNCD&modDevice=&source=miuiWeatherApp&encoded=latitude,longitude'),('http://ic.snssdk.com/2/article/v27/stream/?category=news_entertainment&count=20&max_behot_time=1450510141&last_refresh_sub_entrance_interval=1450519149&bd_city=%E6%B4%9B%E9%98%B3%E5%B8%82&bd_latitude=34.669721&bd_longitude=112.372382&bd_loc_time=1450518517&loc_mode=0&lac=10645&cid=60700930&iid=3282230650&device_id=8187339662&ac=4g&channel=wap_apponly&aid=13&app_name=news_article&version_code=481&version_name=4.8.1&device_platform=android&ssmix=a&device_type=M623C&os_api=19&os_version=4.4.4&uuid=867332022159601&openudid=649579ed9d8ed7d3&manifest_version_code=481');

Ummm… 你沒看錯,原始資料真的這麼醜

好的,接下來進入正題,來撰寫UDF。

  1. 擷取經度資訊:
create function f_extracturi_lng(uri VARCHAR(10000))returns VARCHAR(10000)IMMUTABLEas $$import reRG_URI_long = re.compile(r'([1][0-9][0-9])\.+([0-9]{0,16})')if not uri:return Nonetry:long_found = RG_URI_long.search(uri).group()return long_foundexcept:return None$$ language plpythonu;

2. 擷取緯度資訊的做法雷同:

create function f_extracturi_lat(uri VARCHAR(10000))returns VARCHAR(10000)IMMUTABLEas $$import reRG_URI_long = re.compile(r'([3][0-9])\.+([0-9]{0,16})')if not uri:return Nonetry:long_found = RG_URI_long.search(uri).group()return long_foundexcept:return None$$ language plpythonu;
select uri, f_extracturi_lng(uri), f_extracturi_lat(uri)from udf_test;

)
Welcome to a place where words matter. On Medium, smart voices and original ideas take center stage - with no ads in sight. Watch
Follow all the topics you care about, and we’ll deliver the best stories for you to your homepage and inbox. Explore
Get unlimited access to the best stories on Medium — and support writers while you’re at it. Just $5/month. Upgrade