Executive Summary
While supporting Research & Analytics initiatives within Citigroup’s Manager’s Control Assessment (MCA) governance environment, I identified an opportunity to automate a highly manual regulatory traceability and data classification exercise involving large-scale operational risk and governance data sets aggregated from dozens of independent MCA tracking systems across the enterprise.
The organization’s initial strategy relied on a distributed “divide and conquer” manpower approach in which multiple teams manually reviewed service request ticket data in an attempt to identify and classify regulator-related primary key identifiers associated with:
- Controls,
- Activities,
- Assessments,
- Risks,
- and Processes.
After evaluating both the data structures and the operational approach, I recognized that the scale and ambiguity of the data created a significant probability of human classification error. Rather than relying solely on brute-force manual review, I designed and developed an NLP-inspired contextual classification solution leveraging n-gram logic and confidence scoring techniques to automate identifier recognition and improve classification accuracy.
The resulting process achieved more accurate results than the distributed manual review effort while dramatically accelerating analysis across the full enterprise data set.
Business Context
Following the global financial crisis and subsequent regulatory scrutiny, Citigroup significantly expanded enterprise governance, operational risk management, and regulatory traceability programs through the implementation of the Manager’s Control Assessment (MCA) framework.
MCA became a foundational governance structure supporting:
- operational risk management,
- regulatory reporting,
- audit readiness,
- control monitoring,
- issue management,
- and enterprise compliance activities.
The framework relied heavily on interconnected governance entities including:
- Business Activities,
- Risks,
- Controls,
- Assessments,
- Processes,
- Corrective Action Plans (CAPs),
- and related operational governance artifacts.
As part of a broader analytics and governance initiative, Citigroup Research & Analytics leadership assembled a large enterprise data set containing service request ticket information sourced from dozens of independent MCA tracking systems across the organization.
The objective was to identify and classify regulator-related governance identifiers embedded throughout the data in order to support enterprise governance analysis, regulatory traceability, and operational risk visibility.
Operational Challenge
The primary challenge involved accurately identifying and classifying governance-related primary key identifiers embedded within highly fragmented and inconsistent ticketing data.
The organization initially approached the effort using a traditional manpower-scaling strategy:
- dividing the data across multiple teams,
- assigning ticket populations to individuals,
- and manually reviewing records to locate relevant identifiers.
However, after analyzing both the data structures and the governance taxonomy, I recognized several major operational problems:
- extremely large data volumes,
- inconsistent formatting across source systems,
- contextual ambiguity,
- duplicate identifier values,
- and high probability of human error during manual review.
A particularly significant challenge involved overlapping primary key values across multiple governance domains.
The same identifier values could potentially correspond to:
- Controls,
- Activities,
- Risks,
- Assessments,
- or Processes
depending entirely upon the surrounding textual context within the ticket data.
As a result, simple keyword matching or direct identifier searches would not produce reliable classification results.
The operational challenge therefore became one of contextual interpretation and confidence-based classification rather than simple data extraction.
Transformation Opportunity
Rather than scaling the problem linearly through additional manual review effort, I identified an opportunity to apply lightweight NLP-style contextual analysis techniques to automate identifier recognition and improve classification accuracy.
The opportunity involved:
- reducing manual review effort,
- improving governance data quality,
- increasing regulatory traceability accuracy,
- and accelerating enterprise-scale data analysis.
By incorporating contextual language analysis around each identified primary key occurrence, it became possible to infer the most probable governance classification associated with ambiguous identifiers.
This transformed the initiative from a brute-force search exercise into a structured operational intelligence and contextual classification problem.
Solution Development
After analyzing the governance identifier structures and associated ticket data, I designed and developed an automated classification process leveraging:
- contextual n-gram logic,
- surrounding lexical analysis,
- identifier pattern recognition,
- and confidence-based classification techniques.
The solution evaluated the words and phrases immediately surrounding identified primary key values within the service request ticket population in order to determine the most likely governance category associated with each identifier occurrence.
By analyzing contextual language patterns adjacent to each identifier, the process was able to:
- distinguish overlapping identifier values,
- improve classification confidence,
- reduce ambiguity,
- and automate large portions of the manual governance analysis process.
The automated workflow effectively simulated lightweight NLP-style contextual reasoning without requiring large-scale machine learning infrastructure.
Once developed and tested, the process was executed across the full enterprise data set and consistently produced results that were both:
- more accurate,
- and substantially faster
than the distributed manual review effort being performed across multiple teams.
In many cases, the automated analysis completed before individual review teams had finalized their own assigned ticket populations.
Leadership & Execution
Although the broader initiative initially relied upon distributed manual analysis methods, I independently recognized both the operational limitations and scalability risks associated with the approach.
My role evolved from participant within the review effort to designing and operationalizing a more scalable intelligence-driven analysis framework capable of improving both execution quality and analytical confidence.
The effort required:
- data structure analysis,
- governance taxonomy interpretation,
- contextual classification design,
- automation logic development,
- and operational validation.
Most importantly, the initiative demonstrated the ability to apply systems thinking and operational intelligence concepts within highly regulated enterprise governance environments.
Operational Outcomes
The NLP-inspired governance classification process delivered several significant operational benefits, including:
- improved primary key classification accuracy,
- reduced manual review effort,
- accelerated enterprise-scale governance analysis,
- improved operational consistency,
- enhanced regulatory traceability visibility,
- and reduced human classification error.
The process also demonstrated the viability of applying contextual analysis and lightweight NLP reasoning techniques to large-scale enterprise governance and operational risk management workflows.
By replacing brute-force manual processing with intelligent contextual classification, the initiative improved both execution speed and governance data quality across the broader analytics effort.
Strategic Insight
This initiative reinforced the principle that operational governance challenges often evolve into contextual intelligence problems at enterprise scale.
As governance ecosystems become increasingly interconnected, traditional manual review approaches become operationally unsustainable due to:
- data volume,
- ambiguity,
- fragmentation,
- and contextual complexity.
The project demonstrated how lightweight NLP-inspired techniques, contextual reasoning, and confidence-based classification can significantly improve operational governance analysis without requiring large-scale AI infrastructure.
Long before enterprise AI governance tooling became mainstream, this initiative represented an early example of combining:
- operational intelligence,
- contextual NLP reasoning,
- governance analytics,
- regulatory traceability,
- and workflow automation
to improve enterprise-scale regulatory and operational risk analysis.