Cleaning Data


Clean data is critical to a successful social media monitoring exercise. We at SinoTech have seen many other systems full of inaccurate, irrelevant data, which ends up costing users hours and hours of manual cleaning.

To avoid this we built intelligent automated systems which address the sources of irrelevant data, in particular:

  • Adverts and navigation text: we developed a specific algorithm which recognises which parts of a web page are real content, as opposed to navigation text and adverts. By focusing on the real content we ensure that you do not have to sift through adverts to find genuine mentions of your brands
  • Spam: several layers of smart pattern matching algorithms automatically recognise whether a piece of text (or a whole website!) is genuine; spam text is discarded immediately
  • Dates: our Crawler applies several types of logic to try to identify the date when each piece of content was posted; this means that you can accurately filter your brand’s mentions by date range, and do not see year-old mentions mixed in with current ones!
  • Duplicates: our own fingerprinting technology ensures that the same mention is not picked up twice, giving a true measure of the size of your coverage, not inflated numbers
  • Loose query definition: a brand’s name is often too generic to provide relevant results on its own (think Apple or GAP). To address this, our query definition engine supports an advanced query syntax, including some special fields which allow for far more accurate query setups
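
To make the duplicates point concrete, here is a minimal Python sketch of the general idea behind fingerprint-based de-duplication. It is an illustration only and not a description of our own fingerprinting technology: the function names fingerprint and deduplicate, the normalisation rules and the choice of SHA-1 are all assumptions made for this example.

```python
import hashlib
import re


def fingerprint(text: str) -> str:
    """Hash a normalised form of the text (hypothetical normalisation rules)."""
    # Lowercase and strip punctuation so trivial differences do not matter.
    normalised = re.sub(r"[^\w\s]", "", text.lower())
    # Collapse runs of whitespace into single spaces.
    normalised = " ".join(normalised.split())
    return hashlib.sha1(normalised.encode("utf-8")).hexdigest()


def deduplicate(mentions):
    """Keep the first copy of each mention, preserving the original order."""
    seen = set()
    unique = []
    for mention in mentions:
        key = fingerprint(mention)
        if key not in seen:
            seen.add(key)
            unique.append(mention)
    return unique


# Sample data invented for this example.
mentions = [
    "Loving my new GAP jeans!",
    "loving my  new gap jeans",
    "Apple announces a new phone.",
]
unique_mentions = deduplicate(mentions)
```

In this sketch the first two mentions differ only in case, spacing and punctuation, so they share a fingerprint and are counted once. An exact hash like this would miss near-duplicates with reworded text; catching those needs a fuzzier technique.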

Together, these features make SIP:Enterprise a best-in-class system for data quality, something we are extremely proud of, and which saves our customers large amounts of time and money.



Copyright © 2007-2011. SinoTech Group (China) Limited. All Rights Reserved. All other trademarks and third party references property of their respective owners. 京ICP证070353号