Summary: Estimating Frequency of Change

Benjamin S. Kim
Epopcon Data Science & Engineering
2 min read · Sep 20, 2018

Topics

Identify the problem of estimating the frequency of change
Propose several estimators that measure the frequency of change
Show how precise the proposed estimators are

Challenges

  1. Incomplete change history: we can detect whether a page has changed between two accesses, but not how many times it changed. In practice, we cannot observe every change.
    - Previous work has mainly focused on estimation given the complete change history, not an incomplete one.

  2. Irregular access interval: the times at which we crawl a page are random. If a page changed twice in 10 days, we cannot simply take the average interval (10/2 = 5 days).

  3. Difference in available information: some information, such as the last date of change, may or may not be provided.
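The first challenge can be illustrated with a small simulation. This is a minimal sketch, assuming changes follow a Poisson process with rate `lam` (the model introduced under Preliminaries) and that each access only reveals whether at least one change happened since the previous access. The function name and parameters are illustrative and do not come from the paper.

```python
import random


def naive_rate_from_simulation(lam, interval, n_accesses, rng):
    """Simulate n_accesses regular accesses and return detected changes / total time.

    lam      : true change rate (changes per unit time), assumed Poisson
    interval : time between two consecutive accesses
    rng      : a random.Random instance, passed in for reproducibility
    """
    detected = 0
    for _ in range(n_accesses):
        # Time until the first change after the previous access is exponential.
        # If it falls inside the interval, this access sees "changed",
        # no matter how many further changes happened in the same interval.
        first_change = rng.expovariate(lam)
        if first_change <= interval:
            detected += 1
    total_time = n_accesses * interval
    return detected / total_time


if __name__ == "__main__":
    rng = random.Random(0)
    estimate = naive_rate_from_simulation(lam=2.0, interval=1.0, n_accesses=10_000, rng=rng)
    print(f"true rate = 2.0, naive estimate = {estimate:.3f}")
```

With a true rate of 2 changes per access interval, the naive estimate cannot exceed 1 change per interval, because several changes inside one interval are counted once. It should settle near 1 - e^(-2), well below the true rate.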

Solutions to these challenges

  1. We assume that we repeatedly access an element, either actively or passively.
    - Active monitoring (vs. passive monitoring)
  2. We may have different levels of information regarding the changes of an element.
    - Regular interval (vs. random interval)
  3. We may use the frequency of change for different purposes.
    - Existence of change vs. last date of change (vs. complete history of change)

Preliminaries

  • Element: a web page
  • Change: any modification to the page
  • Estimation criteria: unbiasedness, consistency, efficiency
  • Poisson process: often used to model a sequence of random events that occur independently at a fixed rate λ over time. With accesses at a regular interval, each access either detects a change or not, so the number of detected changes X follows a binomial distribution.
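To make the Poisson assumption concrete, the standard textbook relations can be written out as follows. This is a sketch rather than a quotation from the paper: λ denotes the change rate, I the access interval, n the number of accesses, and r = λI, using the notation defined below.

```latex
\begin{align*}
\Pr[\text{$k$ changes in an interval of length } t] &= \frac{(\lambda t)^k e^{-\lambda t}}{k!} \\
\Pr[X_i = 1] &= 1 - \Pr[\text{0 changes in } I] = 1 - e^{-\lambda I} = 1 - e^{-r} \\
X = \sum_{i=1}^{n} X_i &\sim \mathrm{Binomial}\left(n,\ 1 - e^{-r}\right)
\end{align*}
```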

Total time elapsed during our n accesses: T
Total number of changes detected: X
Whether the element had changed at the i-th access (yes/no = 1/0): Xi
Interval between accesses (the unit time period): I
Access frequency (accesses per unit time): f = 1/I

T = nI = n/f (number of accesses n × access interval = number of accesses divided by the access frequency)
r = λI = λ/f (change rate λ × access interval = the ratio of the change frequency to the access frequency)
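A quick worked example of these relations, with made-up numbers: a page accessed every half day, 10 times, whose true change rate is 3 changes per day. The helper name `access_quantities` is hypothetical and only serves the illustration.

```python
def access_quantities(n, interval, lam):
    """Return (f, T, r) for n accesses spaced `interval` apart and change rate `lam`."""
    f = 1.0 / interval   # access frequency: accesses per unit time
    T = n * interval     # total elapsed time, equal to n / f
    r = lam * interval   # frequency ratio, equal to lam / f
    return f, T, r


if __name__ == "__main__":
    f, T, r = access_quantities(n=10, interval=0.5, lam=3.0)
    print(f"f = {f} accesses/day, T = {T} days, r = {r}")
```

Here r = 1.5 means the page changes, on average, one and a half times between two consecutive accesses, which is exactly the situation in which counting detected changes undercounts the real ones.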

There are two ways to estimate λ, depending on the information available: existence of change vs. last date of change. This summary does not go into detail on the latter, because our team did not apply it to our model.
If the goal of the model is different, for example only assigning pages to frequency classes, categorization can solve the problem more easily than estimating λ itself.
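For the existence-of-change case, the two estimators can be sketched as below. The naive one divides detected changes by elapsed time, X/T. The second is the bias-reduced form as I understand it from the cited paper, r = -log((n - X + 0.5) / (n + 0.5)) with λ = f·r. This summary does not state the formula, so treat it as an assumption and check it against the paper before relying on it. Function names are illustrative.

```python
import math


def naive_rate(n, x, f):
    """Naive estimate of the change rate: X / T, where T = n / f."""
    return f * x / n


def bias_reduced_rate(n, x, f):
    """Bias-reduced estimate (assumed form, see lead-in).

    n - x is the number of accesses at which no change was detected.
    The +0.5 terms keep the logarithm finite even when x == n.
    """
    r_hat = -math.log((n - x + 0.5) / (n + 0.5))
    return f * r_hat


if __name__ == "__main__":
    n, x, f = 10, 6, 1.0  # 10 daily accesses, change detected at 6 of them
    print(f"naive        : {naive_rate(n, x, f):.4f} changes/day")
    print(f"bias-reduced : {bias_reduced_rate(n, x, f):.4f} changes/day")
```

The logarithm compensates for changes hidden inside a single access interval, so the second estimate is higher than the naive one whenever changes were detected at a sizeable fraction of accesses.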

Source :

  1. Junghoo Cho and Hector Garcia-Molina (Stanford University), "Estimating Frequency of Change"
  2. Image files are in my personal drive.
