| |
- IRank(graph: structures.graph.Graph, hard_result_set: dict, broad_positive_res_set: dict, ordered_list: list, positive_query_len: int)
- IR is normalized.
IR = IR1 * IR2
For further details on the algorithm, see the IRank1 and IRank2 functions in this module.
- IRank1(hard_result_set: dict, ordered_list: list, positive_query_len: int, word_count: list = None)
- IR1 is normalized.
IR1 = appearance penalty factor * appearance dividend
Appearance penalty factor - takes values from 0 to 1. Sites that do not match the user's query get 0; this
propagates to RANK and ensures that only sites matching the given criteria appear in the results. Otherwise,
the factor is:
number of words from the positive query appearing on the site / number of words in the positive query
so the maximum value is 1 (all words are present).
Appearance dividend - the count of positive query words on the site, divided by the maximum such count across all
sites.
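
The IR1 description above can be sketched in a few lines. This is only an illustration under stated assumptions, not the module's IRank1: the function name `irank1_sketch`, the `counts_per_site` layout (one list of per-word counts per site) and the `matched` flags standing in for membership in the hard result set are all hypothetical.

```python
def irank1_sketch(counts_per_site, matched):
    """Illustrative IR1 = penalty factor * dividend.

    counts_per_site[i][j]: occurrences of positive-query word j on site i
                           (hypothetical layout, not the module's data structure).
    matched[i]:            True if site i is in the hard result set (assumption).
    """
    totals = [sum(counts) for counts in counts_per_site]
    max_total = max(totals, default=0)
    scores = []
    for counts, total, is_match in zip(counts_per_site, totals, matched):
        if not is_match or not counts or max_total == 0:
            # Sites that do not match the query get 0, which later zeroes RANK.
            scores.append(0.0)
            continue
        # Penalty factor: share of positive-query words present on the site.
        penalty = sum(1 for c in counts if c > 0) / len(counts)
        # Dividend: this site's total count relative to the best site.
        dividend = total / max_total
        scores.append(penalty * dividend)
    return scores


print(irank1_sketch([[2, 1], [4, 0], [0, 0]], [True, True, False]))
```

Because both factors lie between 0 and 1, the product in this sketch also stays in that range; whether the module applies a further normalization step is not shown here.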
- IRank2(graph: structures.graph.Graph, hard_result_set: dict, broad_positive_res_set: dict, ordered_list: list, positive_query_len: int, word_count: list = None)
- Calculates the IR2 factor by the formula:
IR2 = 1 + appearance word count / appearance file count
Appearance word count - the sum, over every site that links to the matched site, of the number of occurrences of
positive query words on that linking site
Appearance file count - the number of linking sites on which words from the positive query appear
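
A minimal sketch of the IR2 formula for a single matched site follows. The name `irank2_sketch` and the input (one total positive-word count per linking site) are hypothetical, and returning 1 when no linking site contains a query word is an assumption made to avoid division by zero; the module may handle that case differently.

```python
def irank2_sketch(linking_site_counts):
    """Illustrative IR2 = 1 + word count / file count for one matched site.

    linking_site_counts: for each site linking to the matched site, the total
    number of positive-query word occurrences on it (hypothetical input).
    """
    # Appearance word count: all occurrences across linking sites.
    word_count = sum(linking_site_counts)
    # Appearance file count: linking sites with at least one occurrence.
    file_count = sum(1 for c in linking_site_counts if c > 0)
    if file_count == 0:
        # Assumption: no relevant linking sites leaves IR2 at its base value.
        return 1.0
    return 1.0 + word_count / file_count


print(irank2_sketch([3, 0, 5]))
```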
- get_ranks(pagerank: numpy.ndarray, graph: structures.graph.Graph, hard_result_set: dict, broad_positive_res_set: dict, ordered_list: list, positive_query_len: int)
- Rank calculation algorithm:
The formula is influenced by:
number of appearances of the searched words on the site - IR1 (included in IR)
number of sites linking to the site - PR (PageRank - US6285999B1)
number of searched words on the sites linking to the site - IR2 (included in IR)
A normalized version of PageRank (values 0-1) is used as PR.
IR is also normalized.
RANK = PR * IR
RANK is normalized.
For details on the algorithm, see the IRank function in this module and the pagerank.py module.
:param pagerank: PR
:param graph: PopulateStructures attribute
:param hard_result_set: result set of the search query, see execute_query method in query.py module or
eval_query method in advanced.eval_query.py module
:param broad_positive_res_set: result set of broad set of sites influencing ranking algorithm, see execute_query
method in query.py module or eval_query method in advanced.eval_query.py module
:param ordered_list: order of sites from the PS object
:param positive_query_len: number of parameters influencing ranking process (all 'positive' words)
:return: rank matrix (with additional details)
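
The way the factors combine (IR = IR1 * IR2, RANK = PR * IR) can be illustrated with plain lists. This is a sketch only: `combine_ranks_sketch` is a hypothetical helper, the inputs are assumed to be already computed per site in the same order, and the normalization steps mentioned above are deliberately left out because their exact form is not given here.

```python
def combine_ranks_sketch(pr, ir1, ir2):
    """Illustrative RANK = PR * IR with IR = IR1 * IR2 (no normalization).

    pr, ir1, ir2: per-site values in the same site order (hypothetical inputs).
    Returns (ranks, order) where order lists site indices, best first.
    """
    ir = [a * b for a, b in zip(ir1, ir2)]
    ranks = [p * i for p, i in zip(pr, ir)]
    # Stable sort: ties keep their original site order.
    order = sorted(range(len(ranks)), key=lambda idx: ranks[idx], reverse=True)
    return ranks, order


ranks, order = combine_ranks_sketch(
    pr=[0.5, 1.0, 0.25],
    ir1=[0.75, 0.5, 0.0],
    ir2=[2.0, 1.0, 3.0],
)
print(ranks, order)
```

Note how the third site ends with rank 0 despite a high IR2: its IR1 of 0 (no match) propagates through the product, as described for the penalty factor.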
- normalize(vec)
- Performs first-order mathematical normalization of a given n-dimensional vector.
:param vec: vector to be normalized
:return: normalized vector
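
The term "first order" is not defined further here. One possible reading is L1 normalization, where each component is divided by the sum of absolute values; the sketch below shows that reading only and may not match what normalize(vec) actually does. The name `normalize_l1_sketch` is hypothetical.

```python
def normalize_l1_sketch(vec):
    """Illustrative L1 normalization (an assumed reading of 'first order')."""
    total = sum(abs(x) for x in vec)
    if total == 0:
        # Assumption: an all-zero (or empty) vector is returned unchanged.
        return list(vec)
    return [x / total for x in vec]


print(normalize_l1_sketch([1, 3]))
```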
|