Data

Our raw, labelled, and engineered data

To request access to our datasets, please use the contact field and indicate which dataset you would like to receive.

Project Step 3: Data Collection

  • Dataset 1: 170K Records from Telegram Channel of Sputnik Arabic (AR)
  • Dataset 2: 270K Records from Telegram Channel of Russia Today Arabic (AR)

Project Step 4: Data selection and preparation

  • Dataset 3: 4000 unique records from Sputnik and RT, selected as the most viewed and most forwarded (AR)
  • Dataset 4: 4000 unique records from Sputnik and RT, selected as the most viewed and most forwarded, translated into English (ENG)
  • Dataset 5: 500 posts with the highest engagement, prepared for human labelling (combined Sputnik and RT dataset)
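
The selection behind Datasets 3 and 5 could be sketched roughly as follows. This is only an illustration, not the project's actual procedure: the field names `text`, `views`, and `forwards` are hypothetical, and ranking by the sum of views and forwards is an assumption, since the listing does not say how the two counts were combined.

```python
def select_top_records(records, n):
    """Return the n unique records with the highest engagement.

    Assumptions (not from the project): each record is a dict with the
    hypothetical keys "text", "views", and "forwards", and engagement is
    taken to be views + forwards.
    """
    best = {}  # maps post text -> (score, record)
    for rec in records:
        key = rec["text"].strip()
        score = rec["views"] + rec["forwards"]
        # Keep only the highest-scoring copy of each distinct text.
        if key not in best or score > best[key][0]:
            best[key] = (score, rec)
    # Sort unique records by score, highest first, and keep the top n.
    ranked = sorted(best.values(), key=lambda pair: pair[0], reverse=True)
    return [rec for _, rec in ranked[:n]]
```

Under these assumptions, calling `select_top_records(all_posts, 4000)` on the combined Sputnik and RT records would give a set shaped like Dataset 3, and a value of 500 one shaped like Dataset 5.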

Project Step 6: Human labelling and research findings

  • Dataset 6: 275 human-labelled posts, each connected to narratives

Project Step 8: Machine Learning Experiment

  • Dataset 7: Machine labelling of the 275 posts from the human-labelled dataset, with calculated evaluation metrics for comparison
  • Dataset 8: Machine labelling of ~1500 additional posts for human evaluation, with categorical labels ranging from (partially) correct to (partially) incorrect and an explanation of each choice
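
For readers who want to reproduce a comparison like the one in Dataset 7, the sketch below computes accuracy and per-label precision, recall, and F1 between human and machine labels. The listing does not state which evaluation metrics the project calculated, so the choice of metrics here is an assumption, and the function name and inputs are hypothetical.

```python
def evaluate_labels(human, machine):
    """Compare machine labels against human labels for the same posts.

    Assumption: `human` and `machine` are equal-length lists of label
    strings, aligned by post. Returns (accuracy, per_label_metrics).
    """
    if not human or len(human) != len(machine):
        raise ValueError("label lists must be non-empty and of equal length")

    pairs = list(zip(human, machine))
    accuracy = sum(h == m for h, m in pairs) / len(pairs)

    per_label = {}
    for label in sorted(set(human) | set(machine)):
        tp = sum(h == label and m == label for h, m in pairs)
        fp = sum(h != label and m == label for h, m in pairs)
        fn = sum(h == label and m != label for h, m in pairs)
        # Report 0.0 when a ratio is undefined (no predictions or no instances).
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        per_label[label] = {"precision": precision, "recall": recall, "f1": f1}
    return accuracy, per_label
```

Treating the human labels as the reference is a design choice of this sketch; an agreement measure that treats both annotators symmetrically may be more appropriate where the human labels are themselves uncertain.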