Information Retrieval 1

Lu Shengliang
Programmers Don’t Read Books

--

Overview

1. Users are overwhelmed by the amount of available information

2. IR is the process of finding material (usually documents) that satisfies an information need

3. An IR system does not answer a specific question directly; it retrieves the documents that could be used to answer that question

4. IR is concerned with:

a. retrieving documents relevant to a query

b. doing so efficiently over a large collection

5. Earlier literature treats information seeking as a form of problem solving

6. An information need is a pyramid; users make only its peak visible (for example, in the query they type)

7. semantic aspect (meaning of words); syntactic aspect (order of words); interaction and feedback; authority and authenticity

8. History

9. IR vs DR (data retrieval)

10. information storage and retrieval: aboutness (what a document is about)

11. traditional IR system:

  • get a set of keywords
  • stopwords are eliminated; words are stemmed
  • user interface
  • enhanced retrieval

Enhanced retrieval includes:

  • auto categorization
  • spam filtering
  • info routing: sending incoming documents to the users whose standing profiles match them
  • auto clustering
  • recommending
  • info extraction
  • info integration
  • question answering
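The keyword extraction steps listed under item 11 (getting keywords, eliminating stopwords, stemming) can be sketched in Python as below. This is a minimal illustration under stated assumptions: the tiny stopword list and the `crude_stem` suffix-stripping rule are hypothetical stand-ins, not a real stopword list or a real stemmer.

```python
import re

# Hypothetical, deliberately tiny stopword list; real systems use much larger ones.
STOPWORDS = {"a", "an", "the", "of", "to", "in", "and", "or", "is", "are", "for"}


def crude_stem(word):
    """Strip a few common English suffixes (hypothetical, naive rule).

    A real system would use a proper stemmer such as Porter's algorithm.
    """
    for suffix in ("ing", "ed", "s"):
        # Only strip when a stem of at least 3 characters remains.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def extract_keywords(text):
    """Lowercase, tokenize on letters/digits, drop stopwords, stem the rest."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOPWORDS]


print(extract_keywords("The retrieval of documents is indexed in searching engines"))
# ['retrieval', 'document', 'index', 'search', 'engine']
```

The same function would be applied to both documents (at indexing time) and queries, so that both sides end up in the same keyword space.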

12. types of IR systems

13. keyword search

problems:

  • language ambiguities
  • spamdexing: web pages stuffed with keywords to game the ranking
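A toy illustration of why spamdexing works against naive keyword search: if documents are ranked by raw counts of query terms, a page that simply repeats the keywords outranks a genuinely useful one. The two documents and the scoring function `raw_term_count_score` are made up for this sketch.

```python
from collections import Counter


def raw_term_count_score(query_terms, doc_text):
    """Naive score: total number of occurrences of any query term in the document."""
    counts = Counter(doc_text.lower().split())
    return sum(counts[t] for t in query_terms)


# Hypothetical documents: one useful page, one keyword-stuffed page.
docs = {
    "honest": "a short guide to cheap flights and how to compare airline fares",
    "spam": "cheap flights cheap flights cheap flights cheap flights buy now",
}
query = ["cheap", "flights"]

ranking = sorted(docs, key=lambda d: raw_term_count_score(query, docs[d]), reverse=True)
print(ranking)  # ['spam', 'honest']
```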

14. boolean search

  • AND / OR are blunt: AND can return too few documents, OR too many
  • usually very precise
  • results are not ranked: a document either matches the query or it does not
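Boolean search is usually implemented over an inverted index that maps each term to the set of documents containing it; AND becomes set intersection and OR becomes set union. The sketch below assumes whitespace tokenization, and the helper names (`build_inverted_index`, `boolean_and`, `boolean_or`) are hypothetical. Note that each result is an unordered set, which is exactly the "not ranked" limitation.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def boolean_and(index, *terms):
    """Documents containing every term (set intersection)."""
    postings = [index.get(t, set()) for t in terms]
    return set.intersection(*postings) if postings else set()


def boolean_or(index, *terms):
    """Documents containing at least one term (set union)."""
    return set().union(*(index.get(t, set()) for t in terms))


docs = {1: "information retrieval systems", 2: "database systems", 3: "information theory"}
index = build_inverted_index(docs)
print(sorted(boolean_and(index, "information", "systems")))  # [1]
print(sorted(boolean_or(index, "retrieval", "theory")))      # [1, 3]
```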

15. vector space model

  • document indexing — keywords
  • term weighting
  • similarity ranking of documents: cosine coefficient
  • latent semantic analysis
  • probabilistic models — subjective; independent
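A minimal sketch of term weighting plus cosine ranking, assuming the common tf × idf weighting with idf(t) = log(N / df(t)); other weighting variants exist, and this does not cover latent semantic analysis or probabilistic models. The documents and the query are made up.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Return one sparse {term: weight} vector per document (weight = tf * idf).

    idf(t) = log(N / df(t)) is one common variant, chosen here for illustration.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {t: math.log(n / df_t) for t, df_t in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(tokens).items()} for tokens in tokenized]
    return vectors, idf


def cosine(u, v):
    """Cosine coefficient between two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


docs = [
    "information retrieval ranks documents",
    "database systems store records",
    "ranking documents by relevance in retrieval",
]
doc_vecs, idf = tf_idf_vectors(docs)

# Weight the query with the collection's idf values; unknown terms get weight 0.
query = Counter("retrieval documents".split())
query_vec = {t: tf * idf.get(t, 0.0) for t, tf in query.items()}

scores = [cosine(query_vec, d) for d in doc_vecs]
ranking = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
print(ranking)  # [0, 2, 1]
```

Document 0 beats document 2 because both match the same two query terms, but document 2 is longer and its extra terms dilute the cosine.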

16. language models

  • NLP
  • lexical ambiguity
  • inefficiency
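The notes do not name a specific model, so as an assumption, here is one standard language-modeling approach to IR, the query likelihood model: each document is treated as a unigram language model, and documents are ranked by log P(query | document), with Jelinek-Mercer smoothing so that a missing term does not zero out the score. The mixing weight `lam = 0.5` and the example documents are arbitrary. The example also touches on lexical ambiguity: both documents contain "bank", and a unigram model cannot tell the two senses apart; only the extra query term "interest" separates them.

```python
import math
from collections import Counter


def query_likelihood(query, doc, collection, lam=0.5):
    """log P(query | doc) under a unigram model with Jelinek-Mercer smoothing.

    P(t | doc) = lam * tf(t, doc) / |doc| + (1 - lam) * cf(t) / |collection|
    """
    doc_counts = Counter(doc)
    coll_counts = Counter(collection)
    score = 0.0
    for t in query:
        p = lam * doc_counts[t] / len(doc) + (1 - lam) * coll_counts[t] / len(collection)
        if p == 0.0:
            return float("-inf")  # term never occurs anywhere in the collection
        score += math.log(p)
    return score


docs = [
    "the bank raised interest rates".split(),
    "the river bank flooded after rain".split(),
]
collection = [t for d in docs for t in d]
query = "bank interest".split()

scores = [query_likelihood(query, d, collection) for d in docs]
print(scores[0] > scores[1])  # True
```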

17. properties of IR systems

18. various hypotheses: relevance hypothesis; association hypothesis; cluster hypothesis

19. word stemming
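Stemming reduces inflected words to a common stem, so that, for example, "cats" and "cat" map to the same index term. As a concrete example, here is only step 1a of Porter's stemming algorithm (plural handling); the full algorithm has several more steps with conditions on the stem, which this sketch omits.

```python
def porter_step_1a(word):
    """Step 1a of the Porter stemmer: plural suffixes.

    Rules are tried in order, so the longest matching suffix wins:
      SSES -> SS   (caresses -> caress)
      IES  -> I    (ponies   -> poni)
      SS   -> SS   (caress   -> caress)
      S    ->      (cats     -> cat)
    """
    if word.endswith("sses"):
        return word[:-2]
    if word.endswith("ies"):
        return word[:-2]
    if word.endswith("ss"):
        return word
    if word.endswith("s"):
        return word[:-1]
    return word


print([porter_step_1a(w) for w in ["caresses", "ponies", "caress", "cats"]])
# ['caress', 'poni', 'caress', 'cat']
```

"poni" shows that a stem need not be a real word; it only has to be the same for all variants of the word.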

20. ranked retrieval (note: needs a sharper summary)
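Ranked retrieval returns documents ordered by a relevance score instead of an unordered match set. Assuming the scores have already been computed (the values below are hypothetical), picking the top k can be done with a heap, using the standard library's `heapq.nlargest`:

```python
import heapq


def top_k(scores, k):
    """Return the k highest-scoring (doc_id, score) pairs, best first.

    heapq.nlargest avoids fully sorting the collection when k is small.
    """
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])


# Hypothetical precomputed relevance scores.
scores = {"d1": 0.12, "d2": 0.87, "d3": 0.45, "d4": 0.05}
print(top_k(scores, 2))  # [('d2', 0.87), ('d3', 0.45)]
```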

21. Challenges for IR systems

  • language and information need complexities
  • relevance complexities
  • word mismatch complexities
  • modern spelling complexities
  • paraphrasing complexities
  • anaphora complexities: pronouns such as "it" and "them"

22. challenges associated with individual differences among users in IR (a possible topic for my paper)

  • training both the user and the system

23. use semi-structured data instead of a database: databases are better suited to structured data

24. concept-based IR

  • keyword-based IR model: commonly used in search engines
  • concept-based IR model: I once tried finding descriptions of a keyword, or keywords that frequently co-occur with it, and treating these as the concept's word bag
  • semantic linguistic network of concepts: the approach of today's Google
  • thesaurus: manual work; synonymy; antonymy
  • predictive model: neural networks
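The co-occurrence idea in the concept-based item above (treating a keyword's most frequently co-occurring keywords as the concept's word bag) might be sketched like this. It is a hypothetical reconstruction, not the exact method that was tried: co-occurrence is counted per document, with no stopword removal or statistical weighting such as PMI, and the documents are made up.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(docs):
    """Count how many documents each pair of distinct terms appears in together."""
    pairs = Counter()
    for doc in docs:
        terms = sorted(set(doc.lower().split()))
        for a, b in combinations(terms, 2):  # pairs are stored in sorted order
            pairs[(a, b)] += 1
    return pairs


def concept_bag(keyword, pairs, n=3):
    """The keyword plus its n most frequently co-occurring terms."""
    neighbours = Counter()
    for (a, b), count in pairs.items():
        if a == keyword:
            neighbours[b] += count
        elif b == keyword:
            neighbours[a] += count
    return [keyword] + [t for t, _ in neighbours.most_common(n)]


docs = ["jaguar cat habitat", "jaguar cat speed", "jaguar car engine"]
pairs = cooccurrence_counts(docs)
print(concept_bag("jaguar", pairs, n=1))  # ['jaguar', 'cat']
```

A query for "jaguar" could then be expanded with the bag's terms, which is one way to attack the word mismatch problem listed under item 21.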

25. features of conceptual structures

  • type of conceptual structure
  • form of representation: tree; semantic network; context vectors
  • relationships supported by a conceptual structure: subsumption; a-kind-of; a-part-of; association; AND / XOR
  • creation of conceptual structure — manual creation; automatic learning; NLP
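Subsumption (the "a kind of" relation) in a tree-shaped conceptual structure can be checked by walking up from the specific concept toward the root. The toy hierarchy `IS_A` is hypothetical, and the loop assumes the structure really is a tree (a cycle would make it loop forever). In IR, such a check could let a query on "vehicle" match documents indexed under "sedan".

```python
# Hypothetical toy concept hierarchy: child -> parent ("a kind of").
IS_A = {
    "sedan": "car",
    "car": "vehicle",
    "truck": "vehicle",
    "vehicle": "artifact",
}


def subsumes(general, specific):
    """True if `general` is `specific` itself or one of its ancestors in the tree."""
    node = specific
    while node is not None:
        if node == general:
            return True
        node = IS_A.get(node)  # None once we pass the root
    return False


print(subsumes("vehicle", "sedan"))  # True
print(subsumes("car", "truck"))      # False
```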

26. IR and related areas

  • focus on the user aspect
  • AI: first-order predicate logic and Bayesian networks
  • NLP
  • Machine learning

--