# R Technical Post: District Council Constituency Boundary Changes Over the Years and Population Maps

May 4 · 11 min read

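(The complete, ready-to-run source code sits at the end of each section, already merged with the relevant code from earlier parts, so there is no need to copy and paste from several places.)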

## B. Map of District Council Constituency Population Deviation from the Population Quota

B1. Processing the PDF file and cleaning the data

```
# READ THE PDF FILE AND EXTRACT THE TABLES
# https://www.eac.hk/pdf/distco/2019dc/final/ch/Appendix_VI(Chi).pdf
library(tabulizer)  # PROVIDES extract_tables()
out_mat = extract_tables("Appendix_VI(Chi).pdf", method="stream", encoding='UTF-8')
```

```
library(stringr)  # PROVIDES str_match()
library(xlsx)     # PROVIDES write.xlsx() AND read.xlsx()

## CREATE AN EMPTY DATA FRAME
out_df = data.frame()

## READ EACH OF THE 18 TABLES
for (k in 1:18){
  ## EXTRACT THE RELEVANT COLUMNS (NOTE: NUMBER OF COLUMNS VARIES)
  out_temp = out_mat[[k]][,c(1,2,ncol(out_mat[[k]])-1,ncol(out_mat[[k]]))]

  ## USE REGULAR EXPRESSION TO EXTRACT THE RELEVANT ROWS (WITH VALID DCCA ID)
  out_temp = out_temp[grep("\\D\\d\\d", out_temp[,1], perl=TRUE),]

  ## SPLIT THE NAME INTO 2 COLUMNS (CHINESE NAME AND ENGLISH NAME)
  out_temp = cbind(out_temp, str_match(as.character(out_temp[,2]),"([^\\s]+)\\s+(.+)")[,c(2:3)])

  ## CONVERT TO A DATAFRAME
  out_temp = as.data.frame(out_temp, stringsAsFactors=FALSE)

  ## CONVERT THE POPULATION TO INTEGER (REMOVE NON-NUMERIC CHAR)
  out_temp[,3] = as.integer(gsub("[^0-9]","",out_temp[,3]))

  ## CONVERT THE PERCENTAGE TO NUMERIC TYPE
  out_temp[,4] = as.numeric(sub("%","",out_temp[,4]))

  ## COMBINE THE DATAFRAME WITH THE CONSOLIDATED ONE
  out_df = rbind(out_df, out_temp)
}

## EXPORT THE CONSOLIDATED DATAFRAME TO AN EXCEL FILE
## THEN CLEAN THE FILE BY HAND IN EXCEL:
## EDIT DCCAs E17, F25 (>1 ROW IN THE NAME COLUMN)
write.xlsx(out_df, "out.xlsx", row.names=FALSE)

## IMPORT THE CLEANED CONSOLIDATED EXCEL FILE
imp_df = read.xlsx("out.xlsx", sheetIndex = 1, encoding="UTF-8")

## RENAME THE COLUMNS
colnames(imp_df) = c("CACODE","NAME","Pop","Deviation","CNAME","ENAME")

## DROP UNNECESSARY COLUMNS
imp_df['NAME'] <- NULL
imp_df['ENAME'] <- NULL
```
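
To make the row filter and the type conversions above concrete, here is a minimal sketch on a hand-made matrix. The codes, names and figures are invented for illustration and are not taken from the EAC appendix. It uses base R only, so `sub()` stands in for `str_match()`.

```r
# Toy matrix imitating one extracted table (all values are made up)
toy = rbind(
  c("Code", "Name", "Population", "Deviation"),
  c("Z01", "甲區 Area One", "13,867", "-16.46%"),
  c("", "(continued)", "", ""),
  c("Z02", "乙區 Area Two", "19,844", "+19.55%")
)

# Keep only rows whose first column looks like a DCCA ID (non-digit + two digits)
toy = toy[grep("\\D\\d\\d", toy[,1], perl=TRUE), , drop=FALSE]

# Split the name at the first run of whitespace: Chinese part, then English part
cname = sub("^(\\S+)\\s+(.+)$", "\\1", toy[,2], perl=TRUE)
ename = sub("^(\\S+)\\s+(.+)$", "\\2", toy[,2], perl=TRUE)

# Strip thousands separators and the percent sign before converting
pop = as.integer(gsub("[^0-9]", "", toy[,3]))
dev = as.numeric(sub("%", "", toy[,4]))

toy_df = data.frame(CACODE=toy[,1], CNAME=cname, ENAME=ename,
                    Pop=pop, Deviation=dev, stringsAsFactors=FALSE)
print(toy_df)
```

The header row and the continuation row both fail the ID pattern, which is why the filter runs before any conversion. A name that wraps onto a second row in the PDF would be dropped in the same way, which is presumably why E17 and F25 need fixing by hand.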

B2. Loading the shapefile and merging in the population data

```
####################################################################
library(rgdal)   # PROVIDES readOGR()
library(raster)  # PROVIDES projection() AND crs()

## IMPORT THE SHAPEFILE
DC2019 = readOGR(dsn='DCCA_2019.shp')

## SET THE CRS (COORDINATE REF SYSTEM) TO HK1980 GRID SYSTEM (EPSG:2326)
## https://spatialreference.org/ref/epsg/2326/
projection(DC2019) = crs("+init=epsg:2326")

## TRANSFORMATION INTO WGS 84 SYSTEM (EPSG:4326)
## https://spatialreference.org/ref/epsg/4326/
DC2019 = spTransform(DC2019,"+init=epsg:4326")

## REMOVE "CNAME", THE SAME AS THE "ENAME" IN THE 2019 DCCA FILE
DC2019@data$CNAME <- NULL
```

```
library(dplyr)    # PROVIDES left_join()
library(leaflet)  # PROVIDES colorNumeric()

## ADD THE ADDITIONAL DATA TO THE DC2019 SpatialPolygonsDF USING LEFT_JOIN
DC2019@data = left_join(DC2019@data, imp_df,by=c("CACODE"="CACODE"))

## CREATE A PALETTE FUNCTION
pal <- colorNumeric(
  palette = "PiYG",
  domain = DC2019@data$Deviation, reverse = TRUE)
```
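
Because the attribute table must stay in the same row order as the polygons, it can be worth checking the join before mapping. The sketch below is an illustration on toy data rather than part of the original workflow. It uses base R's `match()` in place of `left_join()` to show the same order-preserving lookup, and lists any codes that found no population row. The names `shape_df`, `pop_df` and `unmatched` are hypothetical.

```r
# Attribute table as it might come from the shapefile (toy values)
shape_df = data.frame(CACODE=c("A01","A02","A03"), stringsAsFactors=FALSE)

# Cleaned population table; "A03" is deliberately missing and the order differs
pop_df = data.frame(CACODE=c("A02","A01"),
                    Deviation=c(19.55,-16.46), stringsAsFactors=FALSE)

# Look up each polygon's row in pop_df without reordering shape_df
idx = match(shape_df$CACODE, pop_df$CACODE)
shape_df$Deviation = pop_df$Deviation[idx]

# Codes without a partner end up as NA and would get the palette's NA colour
unmatched = shape_df$CACODE[is.na(shape_df$Deviation)]
print(unmatched)
```

An empty `unmatched` vector suggests that every constituency in the shapefile picked up a deviation value.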

B3. Drawing the map with leaflet

i) Basemap options

ii) Filling each constituency by the size of its deviation from the population quota, then adding labels and so on
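
As a rough sketch of how steps i) and ii) might fit together, the example below draws two toy squares in place of the real constituencies. The coordinates, codes and deviation values are invented, the choice of `providers$CartoDB.Positron` as basemap is an assumption, and the fixed symmetric domain of -25 to 25 is an illustrative choice rather than the data-driven domain used in B2.

```r
library(leaflet)

# Two toy squares standing in for constituencies (arbitrary coordinates and values)
squares = list(
  list(lng=c(114.15,114.16,114.16,114.15), lat=c(22.28,22.28,22.29,22.29),
       code="X01", dev=-12.5),
  list(lng=c(114.16,114.17,114.17,114.16), lat=c(22.28,22.28,22.29,22.29),
       code="X02", dev=18.0)
)

# Diverging palette, reversed as in B2; symmetric domain keeps 0% at the midpoint
pal = colorNumeric(palette="PiYG", domain=c(-25,25), reverse=TRUE)

# i) Basemap: a provider tile layer
m = leaflet()
m = addProviderTiles(m, providers$CartoDB.Positron)

# ii) One polygon per constituency, filled by its deviation and labelled on hover
for (s in squares) {
  m = addPolygons(m, lng=s$lng, lat=s$lat,
                  fillColor=pal(s$dev), fillOpacity=0.7,
                  color="grey", weight=1,
                  label=paste0(s$code, ": ", s$dev, "%"))
}

# Legend tied to the same palette
m = addLegend(m, position="bottomright", pal=pal, values=c(-25,25),
              title="Deviation (%)")
m
```

With the real data, the loop would normally give way to a single `addPolygons()` call on the `DC2019` object, using formulas such as `fillColor = ~pal(Deviation)`.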
