Document Classification and Management of Unstructured Data: Leveraging Microsoft Purview Sensitive Information Types (SITs)
In this blog post, our Associate Consultant Robin Groh outlines a systematic approach to setting up SITs in the MS Purview Compliance portal, focusing on building a logical and comprehensive classification system.
In today’s data-driven business environment, effectively managing and classifying documents is crucial for operational efficiency and compliance. Microsoft Purview offers a suite of tools that help organizations classify documents stored across different Microsoft applications using SITs. The strategy consists of three steps, which the following sections cover in turn.
1. Developing a Categorization Framework
The first step in setting up SITs in Purview is to establish a categorization framework that aligns with your organization’s structure and operational domains. In this blog post, we build an example using three hierarchical levels: Business Domains, Functional Areas, and Document Categories.
Business Domains: Identify the different domains within your company, such as Human Resources, Finance, Legal & Procurement, Management, IT, Research and Development, Marketing, and Sales. These domains represent the broadest classification level and serve as the foundation for further categorization.
Functional Areas: Within each business domain, identify specific functional areas that further describe the domain’s activities. For example, within the Finance domain, functional areas might be Budgeting, Accounting, and Financial Planning, among others.
Document Categories: At the most granular level, classify documents based on their content and purpose within each functional area. For instance, within the Accounting area of the Finance domain, categories might include Invoices, Financial Statements, and Tax Documents.
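To make the three levels concrete, the framework can be written down as a nested structure before anything is configured in Purview. The following is only a sketch: the `FRAMEWORK` name and the `category_paths` helper are hypothetical, and only the Finance entries mentioned above are filled in. It does not call any Purview API.

```python
# Hypothetical three-level framework:
# business domain -> functional area -> document categories.
# Only the Finance examples from this post are filled in; extend as needed.
FRAMEWORK = {
    "Finance": {
        "Accounting": ["Invoices", "Financial Statements", "Tax Documents"],
        "Budgeting": [],           # categories not yet defined
        "Financial Planning": [],  # categories not yet defined
    },
}


def category_paths(framework):
    """Return every document category as a 'Domain / Area / Category' label."""
    paths = []
    for domain, areas in framework.items():
        for area, categories in areas.items():
            for category in categories:
                paths.append(f"{domain} / {area} / {category}")
    return paths


for path in category_paths(FRAMEWORK):
    print(path)
```

A flat list of full paths like this can double as a naming scheme for the SITs, so that each SIT name shows where it sits in the hierarchy.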
2. Keyword Identification and Classification
After establishing a categorization framework, the next step is to identify and classify keywords that will be used to tag and classify documents within Purview. Start with a set of documents you already know should be categorized, such as financial statements, and conduct a thorough review to identify relevant keywords. It is also advisable to research similar documents online to gather a broader set of examples. Make sure to have enough documents for adequate analysis and testing.
Classify identified keywords into four categories:
Must Have: Keywords that are essential for a document to be classified within a specific category.
Should Have: Keywords that are commonly found in the document type but are not essential.
Could Have: Keywords that might appear in the document but are less common.
Must Not Have: Keywords that should not appear in the document, helping to reduce false positives.
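The four keyword categories can be prototyped outside Purview to check them against your sample documents. The sketch below rests on assumptions: the keywords for a "Financial Statements" category are invented for illustration, matching is naive case-insensitive substring search, and the way the tiers combine into "high", "medium", and "low" is one possible rule, not something prescribed by Purview.

```python
from dataclasses import dataclass, field


@dataclass
class KeywordTiers:
    """The four keyword categories for one document category."""
    must_have: list
    should_have: list = field(default_factory=list)
    could_have: list = field(default_factory=list)
    must_not_have: list = field(default_factory=list)


def classify(text, tiers):
    """Return 'no match', 'low', 'medium', or 'high' for one document text.

    Assumed rule: any Must Not Have hit or any missing Must Have keyword
    means no match; Should Have hits raise the result to 'high',
    Could Have hits alone raise it to 'medium'.
    """
    lowered = text.lower()

    def present(keyword):
        return keyword.lower() in lowered

    if any(present(k) for k in tiers.must_not_have):
        return "no match"
    if not all(present(k) for k in tiers.must_have):
        return "no match"
    if any(present(k) for k in tiers.should_have):
        return "high"
    if any(present(k) for k in tiers.could_have):
        return "medium"
    return "low"


# Illustrative keywords only; derive real ones from your own document review.
financial_statements = KeywordTiers(
    must_have=["balance sheet"],
    should_have=["total assets", "liabilities"],
    could_have=["auditor"],
    must_not_have=["invoice number"],
)

print(classify("Balance Sheet: total assets and liabilities", financial_statements))
```

Running such a prototype over the collected sample documents is a quick way to see which keywords belong in which category before the logic is rebuilt in Purview.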
3. Logic Construction in Purview
With your keywords categorized, the final step is to build your classification logic within Purview. When configuring SITs, prioritize quality over quantity: focus on the most relevant keywords to ensure accurate classification. Aim first to minimize false negatives (relevant documents that are incorrectly excluded) rather than false positives (irrelevant documents that are incorrectly included). Use “Must Not Have” keywords judiciously: they prevent documents from being wrongly classified, but an overly broad exclusion list can also shut out relevant documents.
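To track false negatives and false positives while tuning the logic, it helps to count both on a labeled set of test documents. The following is a minimal sketch under stated assumptions: the `error_counts` helper and the sample `results` are hypothetical, and the predicted values would come from your own test runs rather than from this code.

```python
def error_counts(results):
    """Count classification errors on a labeled test set.

    results: iterable of (expected, predicted) booleans, one pair per
    test document, where True means "belongs to the category".
    Returns (false_negatives, false_positives).
    """
    results = list(results)
    false_negatives = sum(1 for expected, predicted in results
                          if expected and not predicted)
    false_positives = sum(1 for expected, predicted in results
                          if predicted and not expected)
    return false_negatives, false_positives


# Made-up outcomes for five test documents.
results = [
    (True, True),    # relevant, correctly classified
    (True, False),   # relevant, missed: false negative
    (False, False),  # irrelevant, correctly ignored
    (False, True),   # irrelevant, wrongly classified: false positive
    (True, True),    # relevant, correctly classified
]

false_negatives, false_positives = error_counts(results)
print(f"false negatives: {false_negatives}, false positives: {false_positives}")
```

Comparing these two counts before and after each keyword change shows whether an adjustment, such as a new “Must Not Have” keyword, removed false positives at the cost of new false negatives.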
Conclusion
Setting up SITs in Microsoft Purview requires a systematic approach that starts with a solid categorization framework and careful keyword identification. By aligning your classification logic with your organization’s operational structure and focusing on precision, you can improve the management and security of documents stored across different Microsoft applications. This process not only aids compliance and data protection but also improves operational efficiency by ensuring that documents are accurately classified and easily retrievable.
Best regards,
Robin Groh