AWS Redshift 之 UDF 撰寫
Aug 22, 2017 · 11 min read
需求
使用自訂的UDF去對指定欄位進行一個ETL動作

實作時間
2016.09
參考資源
實做細節
所謂UDF呢,全名是User-Defined Functions,是自定義函數的意思。在這個案例中,因為需要把很醜的uri過濾出有用的資訊,因此一般的SQL指令是不夠的,需要自己編寫UDF來滿足案例需求。
首先,先創一個迷你的table來測試UDF:
create table udf_test ("uri" VARCHAR(10000));
插入五筆data (其中含兩筆是不會匹配出結果的數據)
insert into udf_test values('http://smart.789.kakamobi.cn/api/open/v2/advert-sdk/confirm.htm?spaceid=80&advertid=2934&resourceid=2089&action=view&redirectUrl=&trackSessionId=998e4a8479a941138c1f984b11615372&_platform=android&_srv=t&_appName=jiakaobaodian&_product=%E9%A9%BE%E8%80%83%E5%AE%9D%E5%85%B8&_vendor=oppo&_renyuan=xyx&_version=6.3.1&_system=LMY47D&_manufacturer=QiKU&_systemVersion=5.1&_device=8681-M02&_imei=867886023818648&_productCategory=jiakaobaodian&_operator=M&_androidId=b61eb4985a5e1e26&_mac=ec%3A5a%3A86%3Afb%3A45%3Aaf&_appUser=391304d4e019441cb362c85485dfcf3a&_pkgName=com.handsgo.jiakao.android&_screenDpi=3.0&_screenWidth=1080&_screenHeight=1920&_network=g2&_launch=70&_webviewVersion=4.7&_firstTime=2015-10-21%2007%3A07%3A55&_userCity=410100&_p=&_gpsType=baidu&_cityName=%E9%83%91%E5%B7%9E%E5%B8%82&_cityCode=410100&_gpsCity=410100&_longitude=113.849043&_latitude=34.554371&_ipCity=441900&interface_version=2.0&bitauto_version=5.2&_j=1.0&schoolName=%E9%95%92%E9%80%9A%E9%A9%BE%E6%A0%A1&schoolCode=410100382&_r=d93e6d3853e4477d844ac0495abf281c&sign=5f5a717996a34ee2e89cb88539a94d83'), ('http://weatherapi.market.xiaomi.com/wtr-v2/city/positioning?longitude=F4G5p61TWPGTDw4mIawVZRqBzFD4xR-fzxJfF7xETm_A3C01xdEdgm4AOCkldtCzRKQ5o4SoFj6L%0AZ8CaddyH7W2f-F-i5Xg92nWpxitnx8KDilB0KGLCkHuAfKVp_18TyBzRy5Za4rmutnGBJJ5rsjcA%0AvO3Pwnup6ei2cjQDQB0%3D%0A&latitude=Uy-Sz-z_gznqqhDcQBESOj7aPJYF7AxSWfZ6fyW5aK3nBgWk-ZuaI8WmIYfBGVYmr2VPF3SBIvDP%0ANy_q1DDDFQUtUZ9HyXNWUtFouvBzvYnGnr8HXoLSbZpoWZ4MHGfSUHOnPHV63vEu4IH0jthHofrQ%0A29L6_FthYaVTgAce3lk%3D%0A&device=virgo&miuiVersion=V7.0.6.0.KXECNCI&modDevice=&source=miuiWeatherApp&encoded=latitude,longitude'),('http:///api/open/v2/advert-sdk/confirm.htm?spaceid=102&advertid=3274&resourceid=2394&action=view&redirectUrl=&trackSessionId=3abdb57155e540179df9087fc6acd91f&_platform=android&_srv=t&_appName=jiakaobaodian&_product=%E9%A9%BE%E8%80%83%E5%AE%9D%E5%85%B8&_vendor=oppo&_renyuan=xyx&_version=6.3.1&_system=LMY47D&_manufacturer=QiKU&_systemVersion=5.1&_device=8681-M02&_imei=867886023818648&_productCategory=jiakaobaodian&_operator=M&_androidId=b61eb4985a5e1e26&_mac=ec%3A5a%3A86%3Afb%3A45%3Aaf&_appUser=391304d4e019441cb362c85485dfcf3a&_pkgName=com.handsgo.jiakao.android&_screenDpi=3.0&_screenWidth=1080&_screenHeight=1920&_network=g2&_launch=70&_webviewVersion=4.7&_firstTime=2015-10-21%2007%3A07%3A55&_userCity=410100&_p=&_gpsType=baidu&_cityName=%E9%83%91%E5%B7%9E%E5%B8%82&_cityCode=410100&_gpsCity=410100&_longitude=113.849043&_latitude=34.554371&_ipCity=441900&interface_version=2.0&bitauto_version=5.2&_j=1.0&schoolName=%E9%95%92%E9%80%9A%E9%A9%BE%E6%A0%A1&schoolCode=410100382&_r=57eeee17c70e478184afaf78f40c790d&sign=e2e7ff58c0f40bed554a216a7229a8d4'),('http://weatherapi.market.xiaomi.com/wtr-v2/city/datasource?cityNames=%E9%83%B8%E5%9F%8E%3B%E5%91%A8%E5%8F%A3%2C%E6%B2%B3%E5%8D%97%2C%E4%B8%AD%E5%9B%BD%3Bzh_cn%3Bdancheng%3B%7CDancheng%3Bzhoukou%2Chenan%2CChina%3Ben_us%3B%3B%7C%E9%84%B2%E5%9F%8E%3B%E5%91%A8%E5%8F%A3%2C%E6%B2%B3%E5%8D%97%2C%E4%B8%AD%E5%9C%8B%3Bzh_tw%3B%3B&longitude=DI8G4KpaeI8nfs6gK2XicLy23qNxwBKCg8UcTM_s5QWUnzeob5ZTtJGTHZ8Yuub19T004LnrHAfW%0A2ynAwHzIWEPPTFeOils8oqh-Np8ttgfb6jOk0qUGSixDUOG64M1HwsVZYjnxQOCKeffY4RU4u58p%0AzuTChMOjp74A74dghuE%3D%0A&latitude=W8Yt1JXwOqycxTI721FtWS6iDW6c6X9dGbDYQuFsgMqA8sUyr74JGY3uO-lTL__5TMf49dKzVdj0%0AqrbgNoIw6VPrh-0btXohUXrQjQjl5O2z7p8N9_XeL2sjSU9RPD72rUqOU4Payf46J34vuXwIVHey%0AEz7FjRI39BogFfh0LWQ%3D%0A&phoneCode=0394&areaCode=477100&device=dior&miuiVersion=V6.5.4.0.KHICNCD&modDevice=&source=miuiWeatherApp&encoded=latitude,longitude'),('http://ic.snssdk.com/2/article/v27/stream/?category=news_entertainment&count=20&max_behot_time=1450510141&last_refresh_sub_entrance_interval=1450519149&bd_city=%E6%B4%9B%E9%98%B3%E5%B8%82&bd_latitude=34.669721&bd_longitude=112.372382&bd_loc_time=1450518517&loc_mode=0&lac=10645&cid=60700930&iid=3282230650&device_id=8187339662&ac=4g&channel=wap_apponly&aid=13&app_name=news_article&version_code=481&version_name=4.8.1&device_platform=android&ssmix=a&device_type=M623C&os_api=19&os_version=4.4.4&uuid=867332022159601&openudid=649579ed9d8ed7d3&manifest_version_code=481');
Ummm… 你沒看錯,原始資料真的這麼醜
好的,接下來進入正題,來撰寫UDF。
本次UDF目標是從很醜的uri欄位中,擷取出經緯度的資訊,我以正則來實踐:
- 擷取經度資訊:
create function f_extracturi_lng(uri VARCHAR(10000))returns VARCHAR(10000)IMMUTABLEas $$import reRG_URI_long = re.compile(r'([1][0-9][0-9])\.+([0-9]{0,16})')if not uri:return Nonetry:long_found = RG_URI_long.search(uri).group()return long_foundexcept:return None$$ language plpythonu;
2. 擷取緯度資訊的做法雷同:
create function f_extracturi_lat(uri VARCHAR(10000))returns VARCHAR(10000)IMMUTABLEas $$import reRG_URI_long = re.compile(r'([3][0-9])\.+([0-9]{0,16})')if not uri:return Nonetry:long_found = RG_URI_long.search(uri).group()return long_foundexcept:return None$$ language plpythonu;
編寫完後,使用上面建立的迷你測試table來檢驗udf:
select uri, f_extracturi_lng(uri), f_extracturi_lat(uri)from udf_test;
