Algorithms

The algorithms implemented in this library are:

  • Majority voting (baseline algorithm)
  • Sums [1]
  • Average.Log [1]
  • Investment [1]
  • PooledInvestment [1]
  • TruthFinder [2]

Each algorithm is represented by a class. Usage is roughly as follows.

alg = AlgorithmClass(param1=val1, ...)
results = alg.run(mydata)

The available parameters depend on the algorithm, and are listed in detail below. See Output for details on the format of the results as returned by the run() method.

Majority voting

Majority voting is a simple baseline algorithm where all sources are assigned a trust score of 1, and the belief score for a variable taking a given value is simply the number of sources who make that assertion, divided by the maximum number of sources for a claim (to ensure all belief scores are between 0 and 1). There are no optional parameters.

from truthdiscovery import MajorityVoting
voting = MajorityVoting()
results = voting.run(data)

Iterative algorithms

Apart from majority voting, all algorithms are iterative, in that they initialise the trust and belief scores to initial values, and iteratively update them until some stopping criterion is satisfied.

Optional parameters common to all iterative algorithms are:

  • iterator: this controls the mode of iteration and the stopping criterion. It should be an Iterator instance. There are two types of iterator available: FixedIterator, where a fixed number of iterations are performed, and ConvergenceIterator, where iteration continues until the distance between successive trust scores becomes lower than a given threshold.

    Unless otherwise stated, the default iterator is a FixedIterator for 20 iterations.

  • priors: this determines the ‘prior belief’ scores, i.e. initial belief score for each claim. See PriorBelief for the available choices.

    Unless otherwise stated, the default for priors is PriorBelief.FIXED.

As well as returning final results with alg.run(mydata), iterative algorithms support returning an iterable of partial results as the algorithm iterates with run_iter :

from truthdiscovery import Sums
alg = Sums()
for results in alg.run_iter(mydata):
    print("Trust at iteration {}".format(results.iterations))
    print(results.trust)

For each of the algorithms below, please refer to the cited paper for details on how the algorithm operates and the meaning of any additional optional parameters.

Sums and Average.Log

Both algorithms were introduced by Pasternack and Roth [1], and do not accept any additional parameters.

Investment and PooledInvestment

Also introduced by Pasternack and Roth [1], both accept an optional parameter g, which defaults to 1.2 for Investment and 1.4 for PooledInvestment.

priors defaults to PriorBelief.VOTED for Investment and PriorBelief.UNIFORM for PooledInvestment.

TruthFinder

Introduced in by Yin et. al. [2], TruthFinder has the following optional parameters:

  • influence_param (\(\rho\) in the paper, default: 0.5)
  • dampening_factor (\(\gamma\) in the paper, default: 0.3)
  • initial_trust (default: 0.9)

The default mode of iteration is until convergence in DistanceMeasures.COSINE with threshold 0.001.

Unlike the other algorithms, TruthFinder always initialises the trust vector to fixed values (see initial_trust) instead of the belief vector, so priors is not applicable.

Examples

from truthdiscovery import (
    AverageLog,
    ConvergenceIterator,
    FixedIterator,
    Investment,
    MajorityVoting,
    PooledInvestment,
    Sums,
    TruthFinder
)

# Perform 35 iterations, VOTED priors
alg1 = Sums(iterator=FixedIterator(35), priors=PriorBelief.VOTED)

# Iterate until L1 distance is less than 0.01
alg2 = AverageLog(iterator=ConvergenceIterator(DistanceMeasures.L1, 0.01))

# Iterate until convergence, but no more than 100 iterations
# A ConvergenceError exception is raised if convergence within
# 0.01 is not achieved within 100 iterations
myit = ConvergenceIterator(DistanceMeasures.L_INF, 0.01, limit=100)
alg3 = Investment(iterator=myit, g=1.15)

References

[1](1, 2, 3, 4, 5, 6) Pasternack, Jeff and Roth, Dan, Knowing What to Believe (When You Already Know Something).
[2](1, 2) X. Yin and J. Han and P. S. Yu, Truth Discovery with Multiple Conflicting Information Providers on the Web.