Python: module irank

irank

c:\users\gige\pycharmprojects\oisisi_python\search\irank.py

Modules

numpy

Functions


IRank(graph: structures.graph.Graph, hard_result_set: dict, broad_positive_res_set: dict, ordered_list: list, positive_query_len: int)
IR is normalized.
    IR = IR1 * IR2
For further details on the algorithm, see the IRank1 and IRank2 functions in this module.

IRank1(hard_result_set: dict, ordered_list: list, positive_query_len: int, word_count: list = None)
IR1 is normalized.
    IR1 = appearance penalty factor * appearance dividend
Appearance penalty factor - takes values from 0 to 1. Sites that do not match the user's query get 0; this
    propagates to RANK and ensures that only sites matching the given criteria appear in the results.
    Otherwise the factor is:
        number of words from the positive query appearing on the site / number of words in the positive query
    so the maximum value is 1 (all words are present).
Appearance dividend - the count of positive search query words on the site, divided by the maximum such count
    across all sites.
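The IR1 description above can be sketched as follows. This is only an illustration of the documented formula, not the module's implementation: the function name `irank1_sketch`, the `word_counts` mapping (site name to per-word counts for the positive query) and the `matching_sites` set are hypothetical stand-ins for `hard_result_set`, `ordered_list` and `word_count`, and the final normalization step is left out.

```python
def irank1_sketch(word_counts, matching_sites):
    """Illustrative IR1: penalty factor * appearance dividend (not the module's code)."""
    # Total number of positive-query word appearances per site.
    totals = {site: sum(counts) for site, counts in word_counts.items()}
    # Maximum of that total across all sites (denominator of the dividend).
    max_total = max(totals.values(), default=0)

    scores = {}
    for site, counts in word_counts.items():
        # Sites outside the result set, or degenerate inputs, score 0.
        if site not in matching_sites or max_total == 0 or not counts:
            scores[site] = 0.0
            continue
        # Fraction of positive-query words that appear at least once on the site.
        penalty = sum(1 for c in counts if c > 0) / len(counts)
        # Site's total count relative to the best-scoring site.
        dividend = totals[site] / max_total
        scores[site] = penalty * dividend
    return scores


# Hypothetical data: counts of two positive-query words on three sites.
counts = {"a.html": [3, 1], "b.html": [2, 0], "c.html": [0, 0]}
scores = irank1_sketch(counts, matching_sites={"a.html", "b.html"})
print(scores)
```

In this example `b.html` contains only one of the two query words, so both the penalty factor and the dividend are 0.5 and the product is 0.25.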

IRank2(graph: structures.graph.Graph, hard_result_set: dict, broad_positive_res_set: dict, ordered_list: list, positive_query_len: int, word_count: list = None)
Calculates the IR2 factor by the formula:
    IR2 = 1 + appearance word count / appearance file count
Appearance word count - the sum, over every site that links to the matched site, of the number of appearances
    of words from the positive query.
Appearance file count - the number of linking sites in which words from the positive search query appear.
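A minimal sketch of the IR2 formula, under stated assumptions: `incoming_links` (site name to the list of sites linking to it) and `word_totals` (site name to the total count of positive-query words on it) are hypothetical plain dictionaries standing in for the `Graph` object and the broad positive result set. Treating a site with no relevant linking sites as IR2 = 1 is also an assumption, since the documentation does not say how a zero file count is handled.

```python
def irank2_sketch(incoming_links, word_totals):
    """Illustrative IR2 = 1 + word count / file count over linking sites."""
    scores = {}
    for site, linkers in incoming_links.items():
        # Positive-query word counts on each linking site; keep only sites
        # where at least one query word appears.
        hits = [word_totals.get(linker, 0) for linker in linkers]
        hits = [h for h in hits if h > 0]
        # Assumption: with no relevant linking sites the ratio is taken as 0.
        ratio = sum(hits) / len(hits) if hits else 0.0
        scores[site] = 1.0 + ratio
    return scores


# Hypothetical data: a.html is linked from three sites, b.html from none.
incoming = {"a.html": ["b.html", "c.html", "d.html"], "b.html": []}
totals = {"b.html": 2, "c.html": 4, "d.html": 0}
ir2 = irank2_sketch(incoming, totals)
print(ir2)
```

Here `d.html` links to `a.html` but contains none of the query words, so it counts toward neither the word count nor the file count.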

get_ranks(pagerank: numpy.ndarray, graph: structures.graph.Graph, hard_result_set: dict, broad_positive_res_set: dict, ordered_list: list, positive_query_len: int)
Rank calculation algorithm.
The formula is influenced by:
    number of appearances of the searched words on the site - IR1 (included in IR)
    number of sites linking to the site - PR (PageRank - US6285999B1)
    number of searched words on sites linking to the site - IR2 (included in IR)
A normalized version of PageRank is used (values 0-1) - PR
IR is also normalized.
    RANK = PR * IR
RANK is normalized.
For details on the algorithm, see the IRank function in this module and the pagerank.py module.
:param pagerank: PR
:param graph: PopulateStructures attribute
:param hard_result_set: result set of the search query, see the execute_query method in the query.py module or
                        the eval_query method in the advanced.eval_query.py module
:param broad_positive_res_set: result set of the broad set of sites influencing the ranking algorithm, see the
                               execute_query method in the query.py module or the eval_query method in the
                               advanced.eval_query.py module
:param ordered_list: order of sites from the PS object
:param positive_query_len: number of parameters influencing the ranking process (all 'positive' words)
:return: rank matrix (with additional details)
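The way the three factors combine can be illustrated with element-wise NumPy products. The arrays below are made-up values, one entry per site, and the normalization steps are omitted on the assumption that each one only divides by a positive constant and therefore does not change the resulting order.

```python
import numpy as np

# Hypothetical per-site inputs, all in the same site order.
pr = np.array([0.5, 0.3, 0.2])    # PageRank values
ir1 = np.array([1.0, 0.25, 0.0])  # IR1: 0 for a site that does not match the query
ir2 = np.array([4.0, 1.0, 1.0])   # IR2: always at least 1

ir = ir1 * ir2   # IR = IR1 * IR2
rank = pr * ir   # RANK = PR * IR

# Site indices sorted from highest to lowest rank.
order = np.argsort(-rank)
print(rank, order)
```

Because IR1 is 0 for the third site, its RANK is 0 regardless of its PageRank, which is how non-matching sites are kept out of the results.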

normalize(vec)
Performs mathematical normalization of the first order for a given n-dimensional vector.
:param vec: vector to be normalized
:return: normalized vector
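The documentation does not spell out what "normalization of the first order" means. One possible reading, shown here purely as an assumption, is division by the L1 norm (the sum of absolute values). The name `normalize_l1` is hypothetical and this is not necessarily what `normalize` does.

```python
import numpy as np


def normalize_l1(vec):
    """Assumed reading: scale a vector so its absolute values sum to 1."""
    arr = np.asarray(vec, dtype=float)
    total = np.abs(arr).sum()
    # An all-zero vector has no direction to preserve; return it unscaled.
    if total == 0:
        return arr
    return arr / total


normalized = normalize_l1([2, 1, 1])
print(normalized)
```

With non-negative inputs, as in the ranking factors above, every component of the result falls between 0 and 1.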