Ryan A. Rossi
Adobe Research
Introduction to Search Engine Theory
An undergraduate course I developed (with help from Jean-Louis Lassez) and taught in 2007-2008. The course was designed to cover only the basics, at an undergraduate level.
[Word Document]
Lectures 1 & 2: Introduction, Ergodic Theorem, Perron-Frobenius Theorem, Power Method and Foundations of PageRank
Lecture 3: Hyperlink-Induced Topic Search (HITS)
Lecture 4: PageRank & SALSA
Lecture 5: Latent Semantic Analysis
Lecture 6: Ranking Links: Search and Surf Engines
Lecture 7: Detecting Spam Sites
Lecture 8: Spectral Clustering and Graph Partitioning
Lecture 9: K-means, Hierarchical and Zoomed Clustering, Hidden Markov Models
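Lectures 1 & 2 build PageRank on the power method. The following is a minimal illustrative sketch, not taken from the course materials. It runs power iteration on a small link graph and rests on a few assumptions. The graph is a Python dict that maps each page to its out-links. The damping factor is 0.85. Pages with no out-links (dangling pages) spread their score evenly over all pages. The function name `pagerank` and its parameters are hypothetical.

```python
# Minimal PageRank via the power method (pure Python, illustrative sketch).
# Assumptions: graph is a dict page -> list of out-links; damping = 0.85;
# dangling pages (no out-links) distribute their score uniformly.

def pagerank(links, damping=0.85, tol=1e-10, max_iter=1000):
    """Return a dict of PageRank scores that sum to 1."""
    # Collect every page that appears as a source or a target.
    nodes = sorted(set(links) | {v for outs in links.values() for v in outs})
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}  # start from the uniform vector
    for _ in range(max_iter):
        # Total score currently held by dangling pages.
        dangling = sum(rank[u] for u in nodes if not links.get(u))
        # Teleport term plus the uniformly redistributed dangling mass.
        new = {u: (1.0 - damping) / n + damping * dangling / n for u in nodes}
        # Each page splits its damped score equally among its out-links.
        for u in nodes:
            outs = links.get(u, [])
            for v in outs:
                new[v] += damping * rank[u] / len(outs)
        # Stop once the L1 change between successive iterates is tiny.
        delta = sum(abs(new[u] - rank[u]) for u in nodes)
        rank = new
        if delta < tol:
            break
    return rank
```

For example, on `{"A": ["B", "C"], "B": ["C"], "C": ["A"]}` page C receives the highest score, since both A and B link to it. The teleport term makes the underlying Markov chain irreducible and aperiodic. The Perron-Frobenius theorem then guarantees that the iteration converges to a unique stationary vector.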
Homework 1 - Ergodic and Perron-Frobenius Theorems
Homework 2 - Hubs and Authorities (HITS)
Homework 2.1 - Sets of Hubs and Authorities
Homework 3 - PageRank
Homework 4 - Latent Semantic Analysis
Homework 5 - Ranking Links
Homework 6 - K-means and Hierarchical Clustering
Homework 7 - Spectral Clustering
Homework 8 - Building A Search Engine
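Lecture 3 and Homework 2 cover hubs and authorities (HITS). The following is a minimal sketch of the basic HITS iteration. It uses an assumed dict-of-out-links graph representation, and the function name `hits` is hypothetical. Each round does three things:

- It sets a page's authority score to the sum of the hub scores of the pages linking to it.
- It sets the page's hub score to the sum of the authority scores of the pages it links to.
- It rescales both vectors to unit Euclidean length.

The sketch runs on whatever graph it is given. It omits the step of building a focused subgraph from a root set of query results.

```python
import math

# Basic HITS (hubs and authorities) iteration; illustrative sketch only.
# Assumption: graph is a dict page -> list of out-links.

def hits(links, max_iter=100, tol=1e-10):
    """Return (hubs, authorities), each normalized to unit L2 length."""
    nodes = sorted(set(links) | {v for outs in links.values() for v in outs})
    hub = {u: 1.0 for u in nodes}
    auth = {u: 1.0 for u in nodes}
    for _ in range(max_iter):
        # Authority update: a(v) = sum of h(u) over pages u that link to v.
        new_auth = {u: 0.0 for u in nodes}
        for u, outs in links.items():
            for v in outs:
                new_auth[v] += hub[u]
        # Hub update: h(u) = sum of a(v) over pages v that u links to.
        new_hub = {u: sum(new_auth[v] for v in links.get(u, [])) for u in nodes}
        # Rescale each vector so its squared entries sum to 1.
        for vec in (new_auth, new_hub):
            norm = math.sqrt(sum(x * x for x in vec.values())) or 1.0
            for u in vec:
                vec[u] /= norm
        # Stop when neither vector changes appreciably.
        delta = sum(abs(new_hub[u] - hub[u]) + abs(new_auth[u] - auth[u])
                    for u in nodes)
        hub, auth = new_hub, new_auth
        if delta < tol:
            break
    return hub, auth
```

Consider a graph where h1 links to a1 and a2, while h2 and h3 link only to a1. On that graph, a1 emerges as the top authority and h1 as the top hub. The iteration is the power method applied to the matrices A^T A (authorities) and A A^T (hubs), where A is the adjacency matrix.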
A reference textbook is Introduction to Information Retrieval, by Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze, Cambridge University Press, 2008. You may view the textbook online or print your own copy.
Core Papers
Authoritative Sources in a Hyperlinked Environment
The PageRank Citation Ranking: Bringing Order to the Web
The stochastic approach for link-structure analysis (SALSA) and the TKC effect
Introduction to Latent Semantic Analysis
Automatic Cross-Language Information Retrieval using Latent Semantic Indexing
Ranking Links on the Web: Search and Surf Engines
Spam Detection Papers
Combating Web Spam with TrustRank
Measuring Similarity to Detect Qualified Links
Topical TrustRank: Using Topicality to Combat Web Spam
Improving Web Spam Classifiers Using Link Structure
A Large-Scale Study of Link Spam Detection by Graph Algorithms
Advanced Reading
The ATHENS System for Novel Information Discovery
Detecting Anomalies in Graphs
Searching and Ranking Web Pages
Self-Organization and Identification of Web Communities
Organizing WWW Images Based On The Analysis of Page Layout and Web Link Structure
Indexing by Latent Semantic Analysis
Signature Based Intrusion Detection using Latent Semantic Analysis
Symbolic Stochastic Systems
As an alternative to Matlab, there is a free and very similar software package called Scilab. You can download this software from
. There is also online help at
and a guide:
An Introduction to Scilab
Data sets
Abortion Refined
Computational Geometry Refined
Death Penalty Refined
Gun Control Refined
Movies Refined
Net Censorship Refined
These data sets were made public by Panayiotis Tsaparas.
Last updated 05/10/07