.. _algorithms-page: Algorithms ========== The algorithms implemented in this library are: - Majority voting (baseline algorithm) - Sums [1]_ - Average.Log [1]_ - Investment [1]_ - PooledInvestment [1]_ - TruthFinder [2]_ Each algorithm is represented by a class. Usage is roughly as follows. :: alg = AlgorithmClass(param1=val1, ...) results = alg.run(mydata) The available parameters depend on the algorithm, and are listed in detail below. See :ref:`output-page` for details on the format of the results as returned by the :meth:`~truthdiscovery.algorithm.base.BaseAlgorithm.run` method. .. _majority-voting: Majority voting --------------- Majority voting is a simple baseline algorithm where all sources are assigned a trust score of 1, and the belief score for a variable taking a given value is simply the number of sources who make that assertion, divided by the maximum number of sources for a claim (to ensure all belief scores are between 0 and 1). There are no optional parameters. :: from truthdiscovery import MajorityVoting voting = MajorityVoting() results = voting.run(data) Iterative algorithms -------------------- Apart from majority voting, all algorithms are *iterative*, in that they initialise the trust and belief scores to initial values, and iteratively update them until some stopping criterion is satisfied. Optional parameters common to all iterative algorithms are: - ``iterator``: this controls the mode of iteration and the stopping criterion. It should be an :any:`Iterator` instance. There are two types of iterator available: :any:`FixedIterator`, where a fixed number of iterations are performed, and :any:`ConvergenceIterator`, where iteration continues until the distance between successive trust scores becomes lower than a given threshold. Unless otherwise stated, the default ``iterator`` is a :any:`FixedIterator` for 20 iterations. - ``priors``: this determines the 'prior belief' scores, i.e. initial belief score for each claim. See :any:`PriorBelief` for the available choices. Unless otherwise stated, the default for ``priors`` is :any:`PriorBelief.FIXED`. As well as returning final results with ``alg.run(mydata)``, iterative algorithms support returning an iterable of partial results as the algorithm iterates with :any:`run_iter` : :: from truthdiscovery import Sums alg = Sums() for results in alg.run_iter(mydata): print("Trust at iteration {}".format(results.iterations)) print(results.trust) For each of the algorithms below, please refer to the cited paper for details on how the algorithm operates and the meaning of any additional optional parameters. Sums and Average.Log ~~~~~~~~~~~~~~~~~~~~ Both algorithms were introduced by Pasternack and Roth [1]_, and do not accept any additional parameters. Investment and PooledInvestment ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Also introduced by Pasternack and Roth [1]_, both accept an optional parameter ``g``, which defaults to 1.2 for *Investment* and 1.4 for *PooledInvestment*. ``priors`` defaults to :any:`PriorBelief.VOTED` for *Investment* and :any:`PriorBelief.UNIFORM` for *PooledInvestment*. TruthFinder ~~~~~~~~~~~ Introduced in by Yin et. al. [2]_, *TruthFinder* has the following optional parameters: - ``influence_param`` (:math:`\rho` in the paper, default: 0.5) - ``dampening_factor`` (:math:`\gamma` in the paper, default: 0.3) - ``initial_trust`` (default: 0.9) The default mode of iteration is until convergence in :any:`DistanceMeasures.COSINE` with threshold 0.001. Unlike the other algorithms, *TruthFinder* always initialises the trust vector to fixed values (see ``initial_trust``) instead of the belief vector, so ``priors`` is not applicable. Examples ~~~~~~~~ :: from truthdiscovery import ( AverageLog, ConvergenceIterator, FixedIterator, Investment, MajorityVoting, PooledInvestment, Sums, TruthFinder ) # Perform 35 iterations, VOTED priors alg1 = Sums(iterator=FixedIterator(35), priors=PriorBelief.VOTED) # Iterate until L1 distance is less than 0.01 alg2 = AverageLog(iterator=ConvergenceIterator(DistanceMeasures.L1, 0.01)) # Iterate until convergence, but no more than 100 iterations # A ConvergenceError exception is raised if convergence within # 0.01 is not achieved within 100 iterations myit = ConvergenceIterator(DistanceMeasures.L_INF, 0.01, limit=100) alg3 = Investment(iterator=myit, g=1.15) References ---------- .. [1] Pasternack, Jeff and Roth, Dan, `Knowing What to Believe (When You Already Know Something) `_. .. [2] X. Yin and J. Han and P. S. Yu, `Truth Discovery with Multiple Conflicting Information Providers on the Web `_.