Diving into the Data Lake at NYU Langone

NYU Langone Health Tech Hub
Jun 12, 2019

In a recent survey (1), healthcare executives listed digital transformation, advanced analytics and artificial intelligence (AI) as their top priorities for 2019.

This comes as no surprise. Information and technology (together and separately) are now foundational to healthcare delivery, and the importance of data is second to none. However, significant challenges remain. A report (2) by the International Data Corporation (IDC) projects that healthcare data will grow faster than in manufacturing, financial services or media, yet the report scores the healthcare industry below average in “data-readiness,” a measure of data management, utilization and monetization. Common barriers to successful data analytics include difficulty in accessing and integrating multiple, disparate data sources; excessive time spent in preparing data sets for analysis; and poor data management/governance (3).

Our Approach

At NYU Langone Health, we are overcoming these barriers by propagating data democracy and empowering our data citizens.

This is no easy task. In an academic medical center such as ours, data citizens are very diverse, ranging from clinical analysts, informaticists and data scientists to researchers and bright students with fresh ideas. Their data requirements vary widely, from internal clinical and non-clinical data to large and small external data sets, and they work with a range of tools such as SQL, SAS, Python, R, Tableau and even Microsoft Excel. Much of the data is sensitive in nature, making security, scalability and governance key considerations.

After considering myriad options from conventional databases and data warehousing to data virtualization and NoSQL, we took a big step toward facile and secure data access by establishing an innovative data lake built on the Hadoop data management platform. We now have a secure, scalable platform that allows us to bring diverse internal and external data sets together in one place without prematurely binding users to predefined, static data models.
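The idea of not binding users to a predefined data model is often called schema-on-read: raw records are stored as they arrive, and each consumer applies its own structure when reading. The short Python sketch below illustrates that idea only. It does not use Hadoop and is not the platform's actual implementation; the sample records, field names and the `read_with_schema` helper are hypothetical and made up for illustration.

```python
import json

# Hypothetical raw records, stored as-is (one JSON document per line).
# All values are synthetic and exist only for this illustration.
RAW_LINES = [
    '{"patient_id": "p1", "type": "lab", "name": "hba1c", "value": 7.2}',
    '{"patient_id": "p1", "type": "visit", "clinic": "brooklyn"}',
    '{"patient_id": "p2", "type": "lab", "name": "hba1c", "value": 5.6}',
]


def read_with_schema(lines, record_type, fields):
    """Apply a consumer-chosen schema at read time.

    Keeps only records of the requested type and projects each one
    onto the requested fields; fields absent from a record become None.
    """
    rows = []
    for line in lines:
        record = json.loads(line)
        if record.get("type") != record_type:
            continue
        rows.append({field: record.get(field) for field in fields})
    return rows


# Two consumers read the same raw data with different schemas.
lab_rows = read_with_schema(RAW_LINES, "lab", ["patient_id", "name", "value"])
visit_rows = read_with_schema(RAW_LINES, "visit", ["patient_id", "clinic"])
```

Here the lab consumer and the visit consumer each get a table shaped for their own analysis, and neither had to agree on a shared model before the data was loaded.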

Benefits

Although the data lake is in its early stages, we are already seeing a number of benefits of this approach. The data lake has reduced the operational burden on our conventional analytic servers. It has freed up our “data citizens” to develop their own data models without constraints, and has also saved on the time and expense of data transformation (extract-transform-load, or ETL). Last but not least, it is allowing us to take a step forward in our already mature enterprise data governance strategy through metadata integration. Here is a sampling of our use cases.

· The data lake now supports our Clinical Quality and Effectiveness analysis team in using advanced analytics to compute and report a variety of internal and external quality and safety measures using SAS, Python and other tools.

· It enables our predictive analytics unit to get the data they need for training predictive models using Python.

· It is providing a foundation to establish an integrated cardiovascular data repository.

· And it is spawning agile innovation. To give one example, an Internal Medicine resident skilled in analytics has leveraged the data lake to create a panel management tool for our internal medicine residents in Brooklyn. The tool enables them to monitor relevant outcomes and identify diabetic and hypertensive patients who would benefit from additional care such as home visits, telephone calls and additional clinic appointments. By providing opportunities for targeted interventions, this potent tool facilitates the achievement of quality metrics for the care of these conditions.

The data lake is just one component of our enterprise analytic strategy, but it opens new opportunities for operational efficiency and innovation. And, as Silicon Valley Congresswoman Anna Eshoo so aptly put it (4), innovation is the calling card of the future.

  • Rajan Chandras, MS, Director, Data Architecture and Strategy

[1] https://www.hcinnovationgroup.com/population-health-management/news/13031072/survey-digital-ai-top-priorities-in-2019-but-ehrs-will-dominate-it-spend

[2] https://healthitanalytics.com/news/big-data-to-see-explosive-growth-challenging-healthcare-organizations

[3] https://healthitanalytics.com/resources/white-papers/democratizing-data-for-healthcare-success

[4] https://techcrunch.com/2012/04/23/keen-on-congresswoman-anna-eshoo-what-washington-dc-can-learn-from-silicon-valley-tctv/
