Document categorization based on technicality

  • Studied the significance of readability formulas, borrowed from linguistics, for document classification.
  • Developed a new input feature set extracted from a graph-of-word representation of text.
  • Analysed and reported the impact of document length and of different content-word sets on classification effectiveness.
  • Contributed two large datasets for machine-learning-based text analysis, publicly available on GitHub.
  • The method can serve as a filtering attribute in advanced search tools and in web user profiles that capture a user’s interests and browsing behaviour.