Open Role — Architect: Data Manufacturing
I’ve slowly come to realize how central the manufacturing of data is to Bloomberg’s mission of providing decision makers with a critical edge. When I started here in 2014, I vaguely understood that data — like fundamental company data or breaking news — had some impact on the company and our clients. I was also dimly aware that there was some process by which we found sources of relevant data and made them accessible to our clients. But it took a long time before I fully appreciated the complexity and scope of this process. From where I now sit, the impact of this manufacturing process, which converts raw incoming materials into something more easily digestible, is enormous.
One aspect of this manufacturing process is the conversion of data from an unstructured or semi-structured form into a structured form. As I described in my EMNLP 2018 keynote, the extraction of data from text and tables (which I love dearly) is a complex scientific challenge in its own right. However, deploying machine learning systems requires per-task configuration, and because we operate across a broad range of domains, this configuration cost is non-trivial. Beyond that, in many cases, existing state-of-the-art methods do not meet the quality bar required in the financial domain. These challenging domains require human intervention to remediate errors and to perform quality assurance.
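To make the per-task configuration and human-in-the-loop quality bar concrete, here is a minimal sketch of how an extraction pipeline might route model outputs either to automatic acceptance or to a human reviewer. The class name, field names, and threshold are illustrative assumptions, not an actual Bloomberg system.

```python
from dataclasses import dataclass

# Hypothetical per-task configuration for an extraction pipeline.
# Each extraction task (a domain, a document type) gets its own instance,
# which is where the "configuration cost per task" comes from.
@dataclass
class ExtractionTaskConfig:
    task_name: str
    source_format: str                   # e.g. "text" or "table"
    target_fields: list                  # fields the model should extract
    confidence_threshold: float = 0.95   # below this, route to human review

def route_extraction(record: dict, config: ExtractionTaskConfig) -> str:
    """Decide whether a model extraction is accepted automatically
    or sent to a human for remediation and quality assurance."""
    if record.get("confidence", 0.0) >= config.confidence_threshold:
        return "auto_accept"
    return "human_review"

config = ExtractionTaskConfig(
    task_name="earnings_table_extraction",
    source_format="table",
    target_fields=["revenue", "net_income"],
)
print(route_extraction({"confidence": 0.98}, config))  # auto_accept
print(route_extraction({"confidence": 0.80}, config))  # human_review
```

The design point is that the threshold is a per-task dial: domains where state-of-the-art models fall short of the required quality bar simply route a larger share of records to human reviewers.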
However, even when the incoming data is structured, there can still be significant challenges. For example, incoming structures must be mapped to internal data schemas. Incoming structured data can also be noisy and may occasionally change in format, requiring ongoing monitoring and remediation. Normalization of incoming structured data also sometimes requires the application of complex accounting rules (e.g., recalculating reported income with respect to GAAP standards). Human intervention is often needed as well, whether to reconfigure past streams and migrate them to a new data schema, or to clean historical data streams that haven’t been addressed sufficiently.
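The schema-mapping step above can be sketched in a few lines: known source fields are renamed to the internal schema, and anything unrecognized (a sign of format drift at the source) is flagged for human review rather than silently dropped. The field names and mapping here are illustrative assumptions, not a real internal schema.

```python
# Hypothetical mapping from a vendor's field names to an internal schema.
FIELD_MAP = {"rev": "revenue_usd", "ni": "net_income_usd"}

def map_to_internal(record: dict) -> dict:
    """Rename known source fields; flag unknown ones for remediation.

    Unmapped fields usually mean the source changed its format, which is
    exactly the kind of drift that needs monitoring and human follow-up.
    """
    mapped, unmapped = {}, []
    for key, value in record.items():
        if key in FIELD_MAP:
            mapped[FIELD_MAP[key]] = value
        else:
            unmapped.append(key)  # schema drift: route to a human
    return {"data": mapped, "needs_review": unmapped}

# A vendor record with one field the mapping has never seen.
result = map_to_internal({"rev": 1_200_000, "ni": 300_000, "ebitda": 450_000})
print(result)
```

In practice the normalization layer (e.g., applying accounting rules to the mapped values) would sit downstream of this step, but the flag-don't-drop pattern for unknown fields is the part that keeps humans in the loop.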
This process is more than a mere collection of data pipelines; data manufacturing is a different sort of work. Unlike a data pipeline, data manufacturing requires a repeated human-and-machine process of configuring and maintaining data ingestion flows. This human quality control is the central aspect that distinguishes data manufacturing from other automated data processing pipelines. As such, Bloomberg’s Global Data team operates an enormous data manufacturing operation.
Therefore, it is with great pleasure that I announce that my Data Science team in the Office of the CTO at Bloomberg has an opening for a very senior data science architect to work directly in this area. My hope is that, together, we will push the state of the art in understanding what a data manufacturing operation should look like and bring increased transparency to the global capital markets by delivering cleaner data — even faster — to our clients.
If you’re interested, please reach out to Jen Carberry jcarberry7@bloomberg.net.
