Information Retrieval 1
Overview
1. Users are overwhelmed by the amount of available information
2. IR is the process of finding material that satisfies an information need
3. It does not answer a specific question directly; it retrieves the documents that could be used to answer that question
4. Goals:
a. retrieving documents relevant to a query
b. doing so efficiently over a large collection
5. In earlier literature, information seeking was treated as a form of problem solving
6. the information need is a pyramid; only its peak is made visible by users
7. semantic aspect (meaning of words); syntactic aspect (word order); interaction and feedback; authority and authenticity
8. History
9. IR vs DR (data retrieval)
10. information storage and retrieval — aboutness
11. traditional IR system:
- represent each document and query by a set of keywords
- eliminate stopwords and stem the remaining words
- user interface
- enhanced retrieval:
  - automatic categorization
  - spam filtering
  - information routing (directing incoming documents to users according to standing interest profiles)
  - automatic clustering
  - recommendation
  - information extraction
  - information integration
  - question answering
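A minimal sketch of the first two steps of a traditional IR system (turning text into a keyword set and dropping stopwords), assuming tokenization on alphanumeric runs and a tiny hypothetical stopword list. `STOPWORDS`, `extract_keywords`, and `keyword_set` are illustrative names, not taken from the notes; stemming is left out to keep it short.

```python
import re

# Hypothetical toy stopword list; real systems typically use much longer lists.
STOPWORDS = {"a", "an", "and", "are", "for", "in", "is", "of", "on", "the", "to", "with"}


def extract_keywords(text: str) -> list[str]:
    """Lowercase the text, split it into alphanumeric tokens, and drop stopwords."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def keyword_set(text: str) -> set[str]:
    """Represent a document (or a query) as an unordered set of keywords."""
    return set(extract_keywords(text))
```

For example, `extract_keywords("The retrieval of relevant documents is the goal of IR.")` returns `['retrieval', 'relevant', 'documents', 'goal', 'ir']`.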
12. types of IR systems
13. keyword search
problems:
- language ambiguities
- spamdexing: web pages stuffed with keywords to manipulate their ranking
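A minimal sketch of keyword search over an inverted index, assuming whitespace tokenization and scoring a document by how many distinct query keywords it contains. `build_index` and `keyword_search` are hypothetical names chosen for illustration.

```python
from collections import defaultdict


def build_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Map each lowercased whitespace token to the ids of documents containing it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return dict(index)


def keyword_search(index: dict[str, set[int]], query: str) -> list[int]:
    """Return ids of documents matching any query keyword, most matched keywords first."""
    hits: dict[int, int] = defaultdict(int)
    for term in set(query.lower().split()):
        for doc_id in index.get(term, set()):
            hits[doc_id] += 1
    # Sort by number of matched keywords (descending), then by id for a stable order.
    return sorted(hits, key=lambda d: (-hits[d], d))
```

With `docs = {1: "apple pie recipe", 2: "apple stock price", 3: "pie chart tutorial"}`, `keyword_search(build_index(docs), "apple pie")` returns `[1, 2, 3]`. Document 2, about the company, matches "apple" exactly as a fruit document would, which is the language-ambiguity problem listed above.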
14. boolean search
- AND / OR are blunt: AND tends to return too few documents, OR too many
- usually very precise
- results cannot be ranked
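A minimal sketch of Boolean retrieval as set operations on postings, assuming a small hypothetical index (`INDEX`, `ALL_DOCS`, and the `and_`/`or_`/`not_` helpers are illustrative, not from the notes). It shows why results cannot be ranked: each query yields a set in which a document either matches or does not.

```python
# Hypothetical toy postings: term -> set of ids of documents containing it.
INDEX: dict[str, set[int]] = {
    "information": {1, 2, 3},
    "retrieval": {1, 3},
    "database": {2, 4},
}
ALL_DOCS: set[int] = {1, 2, 3, 4}


def postings(term: str) -> set[int]:
    """Documents containing the term (empty set if the term is unknown)."""
    return INDEX.get(term, set())


def and_(a: set[int], b: set[int]) -> set[int]:
    return a & b  # documents in both result sets


def or_(a: set[int], b: set[int]) -> set[int]:
    return a | b  # documents in either result set


def not_(a: set[int]) -> set[int]:
    return ALL_DOCS - a  # documents outside the result set
```

For instance, `and_(postings("information"), postings("retrieval"))` gives `{1, 3}`, while `and_(postings("information"), not_(postings("retrieval")))` gives `{2}`.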
15. vector space model
- document indexing — keywords
- term weighting
- similarity ranking of doc — cosine coefficient
- latent semantic analysis
- probabilistic models: relevance is subjective; terms are assumed independent
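A minimal sketch of term weighting and cosine-coefficient ranking in the vector space model, assuming pre-tokenized documents and the weight tf × idf with idf(t) = log(N / df(t)). This is one common weighting variant among several; LSA and probabilistic models are not shown, and all function names are hypothetical.

```python
import math
from collections import Counter


def idf_table(docs: list[list[str]]) -> dict[str, float]:
    """idf(t) = log(N / df(t)), where df(t) counts documents containing t."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n / count) for term, count in df.items()}


def tfidf(tokens: list[str], idf: dict[str, float]) -> dict[str, float]:
    """Raw term frequency times idf; terms unseen in the collection are dropped."""
    tf = Counter(tokens)
    return {t: c * idf[t] for t, c in tf.items() if t in idf}


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine coefficient between two sparse vectors (0.0 if either is all zeros)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank(query: list[str], docs: list[list[str]]) -> list[int]:
    """Return document indices ordered by decreasing cosine similarity to the query."""
    idf = idf_table(docs)
    q = tfidf(query, idf)
    scores = [cosine(q, tfidf(doc, idf)) for doc in docs]
    return sorted(range(len(docs)), key=lambda i: (-scores[i], i))
```

With `docs = [["information", "retrieval", "system"], ["database", "system"], ["information", "retrieval", "retrieval", "model"]]`, `rank(["retrieval", "model"], docs)` returns `[2, 0, 1]`; document 1 shares no query term and scores 0.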
16. language models
- NLP
- lexical ambiguity
- inefficiency
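The notes do not say which kind of language model is meant; one common instance in IR is query likelihood, sketched here under the assumption of a unigram model with Jelinek-Mercer smoothing (the weight `lam=0.5` is an arbitrary choice, and `query_likelihood` is a hypothetical name). Documents are ranked by log P(query | document).

```python
import math
from collections import Counter


def query_likelihood(query: list[str], doc: list[str],
                     collection: list[list[str]], lam: float = 0.5) -> float:
    """log P(query | doc) under a unigram model smoothed with the collection model.

    P(t | doc) = lam * tf(t, doc) / |doc| + (1 - lam) * cf(t) / |collection|
    """
    doc_tf = Counter(doc)
    coll_tf = Counter(t for d in collection for t in d)
    coll_len = sum(coll_tf.values())
    score = 0.0
    for t in query:
        p_doc = doc_tf[t] / len(doc) if doc else 0.0
        p_coll = coll_tf[t] / coll_len if coll_len else 0.0
        p = lam * p_doc + (1 - lam) * p_coll
        if p == 0.0:
            return float("-inf")  # term unseen anywhere: zero probability
        score += math.log(p)
    return score
```

Smoothing with collection statistics keeps a document that lacks one query term from scoring zero probability outright, which is why the collection model is mixed in.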
17. properties of IR system
18. various hypotheses: relevance hypothesis; association hypothesis; cluster hypothesis
19. word stemming
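As an illustration of suffix-stripping stemming, here is only step 1a of the Porter stemmer, which handles plural suffixes; the full algorithm has several further steps with conditions on the stem's structure, omitted in this sketch.

```python
def porter_step_1a(word: str) -> str:
    """Porter stemmer step 1a: SSES -> SS, IES -> I, SS -> SS, S -> (removed)."""
    if word.endswith("sses"):
        return word[:-2]  # caresses -> caress
    if word.endswith("ies"):
        return word[:-2]  # ponies -> poni
    if word.endswith("ss"):
        return word  # caress -> caress
    if word.endswith("s"):
        return word[:-1]  # cats -> cat
    return word
```

Note that a stem need not be a real word ("poni"): it only has to map related word forms to the same index term.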
20. ranked retrieval (needs a concise summary)
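Unlike Boolean search, ranked retrieval returns an ordered list, and users usually inspect only the top results. A minimal sketch, assuming scores have already been computed per document (`top_k` is a hypothetical helper): `heapq.nlargest` selects the best k without sorting the whole list.

```python
import heapq


def top_k(scores: dict[str, float], k: int) -> list[tuple[str, float]]:
    """Return the k highest-scoring (doc_id, score) pairs, best first."""
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])
```

For example, `top_k({"d1": 0.2, "d2": 0.9, "d3": 0.5}, 2)` returns `[("d2", 0.9), ("d3", 0.5)]`.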
21. Challenges for IR systems
- language and information need complexities
- relevance complexities
- word mismatch complexities
- modern spelling complexities
- paraphrasing complexities
- anaphora complexities — it; them…
22. challenges associated with individual differences in IR — could be my paper’s topic
- training both user and the system
23. use semi-structured data rather than a database; databases are better suited to structured data
24. concept-based IR
- keyword-based IR model — commonly used in search engines
- concept-based IR model: I once tried collecting descriptions of a keyword, or the keywords that frequently co-occur with it, and treating these as the concept's word bag
- semantic linguistic network of concepts: e.g., today's Google
- Thesaurus — manual work; synonymy; antonymy
- predictive model — neural networks;
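A minimal sketch of the co-occurrence idea from the concept-based IR model above, assuming document-level co-occurrence counts and a hypothetical `concept_bag` helper: a keyword's concept is approximated by the keyword plus the terms that most often appear in the same documents. Ties are broken alphabetically so the output is deterministic.

```python
from collections import Counter


def concept_bag(keyword: str, docs: list[list[str]], top_n: int = 3) -> list[str]:
    """Return the keyword plus its top_n most frequent document-level co-occurring terms."""
    partners: Counter[str] = Counter()
    for doc in docs:
        terms = set(doc)
        if keyword in terms:
            partners.update(terms - {keyword})  # count each partner once per document
    ranked = sorted(partners.items(), key=lambda kv: (-kv[1], kv[0]))
    return [keyword] + [term for term, _ in ranked[:top_n]]
```

With `docs = [["ir", "retrieval", "query"], ["ir", "retrieval", "index"], ["ir", "query", "ranking"], ["database", "sql"]]`, `concept_bag("ir", docs, top_n=2)` returns `["ir", "query", "retrieval"]`; such a bag could then be used to expand a query that mentions only "ir".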
25. features of conceptual structures
- type of conceptual structure
- form of representation — tree; semantic network; context vectors;
- relationship supported by a conceptual structure — subsumption; a kind-of; a part of; association; and / x-or
- creation of conceptual structure — manual creation; automatic learning; NLP
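A minimal sketch of a hand-built conceptual structure with typed relations, assuming a tiny hypothetical semantic network (`EDGES`, `subsumes`, `narrower` are illustrative names). Subsumption follows "kind-of" links transitively, while "part-of" links are stored but not treated as subsumption.

```python
# Hypothetical toy semantic network: (concept, relation) -> related concepts.
EDGES: dict[tuple[str, str], list[str]] = {
    ("sedan", "kind-of"): ["car"],
    ("car", "kind-of"): ["vehicle"],
    ("truck", "kind-of"): ["vehicle"],
    ("wheel", "part-of"): ["car"],
}


def subsumes(general: str, specific: str) -> bool:
    """True if `specific` is `general` or transitively a kind-of `general`."""
    stack, seen = [specific], set()
    while stack:
        concept = stack.pop()
        if concept == general:
            return True
        if concept in seen:
            continue
        seen.add(concept)
        stack.extend(EDGES.get((concept, "kind-of"), []))
    return False


def narrower(general: str) -> list[str]:
    """All other concepts in the network subsumed by `general`, sorted by name."""
    concepts = {c for (c, _rel) in EDGES} | {t for targets in EDGES.values() for t in targets}
    return sorted(c for c in concepts if c != general and subsumes(general, c))
```

Here `narrower("vehicle")` returns `["car", "sedan", "truck"]`, and "wheel" is excluded because part-of is a different relation from kind-of.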
26. IR and related areas
- focus on user aspect
- AI: first-order predicate logic and Bayesian networks
- NLP
- Machine learning