Managing large data

Direction of solution

Abhijit Deshpande
Technical Series
Mar 14, 2014

In the previous post, I presented the problem side. Thanks for the responses. They all shared one problem: each ventured into implementing a new system that had negatives along with a few positives. The major negative was the question of who will manage the new system. I feel the basic assumption was that support systems in corporates are geared to manage a variety of IT solutions (ignore support in the IT sector itself). If this assumption (that the support framework can manage newer IT systems) is invalidated, then my team owns the management. Is this what I desire?

I need a SIMPLE solution. How many “degrees of freedom” are needed to manage a simple system? We cannot forget the basics.

By the way, did anyone use the explicit hint provided in the previous post?

Let's sum up:

  • Need a SIMPLE, EFFECTIVE and COMPLETE solution
  • Minimal overhead in managing the system (happy support teams).
  • Give me the ability to store any document / file (image, exe, dll, pdf, doc, xls, ppt, zip… you name it). Why not video (1 GB)?
  • Easy retrieval (with controlled access)
  • Easy backup, restore, server migration etc. (without headaches, nightmares or war rooms, please; had enough!!!)

As mentioned, we do not need to invent, just discover (maybe, at this point in time, innovation is “also” not expected).

Since 2007, there has been a parallel school of thought around database vs. file system. Is the data structured or unstructured? How do we manage unstructured data? By CLOB or BLOB mechanisms (i.e. by storing the unstructured data inside the structured one)? The problems were obvious:

  • transactional consistency
  • data manageability and recoverability
  • security
  • performance

Enter FILESTREAM

SQL Server (since the 2008 version; pretty long, huh!) introduced FILESTREAM with the intent of providing a “best of both worlds” solution, offering the advantages below.

  • Excellent NTFS (it’s Microsoft) data streaming performance — makes reading and writing the BLOB data much faster.
  • Faster client access methods: client applications can request access to the FILESTREAM data files directly, streaming them from the file system without burdening regular database operations
  • Reduced overhead on SQL Server
  • Transactional consistency: if a transaction fails, both the record and the file roll back. Imagine a record inserted but the file copy failed due to a space constraint (or vice versa).
  • Significantly easier backup management with “point-in-time” restores (everything was fine until yesterday morning at 11:23 am? Restore to that).
  • Improved security management: permissions are granted or denied on a FILESTREAM column just as for any other SQL Server column.
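
To see why the transactional point matters, here is a minimal sketch of what the application has to do by hand when the record sits in a database and the file sits separately on disk. This is not FILESTREAM code: Python's built-in sqlite3 only stands in for the database, and the documents table, store_document, demo and the fail_copy switch are all made-up names for illustration.

```python
import os
import sqlite3
import tempfile


def store_document(conn, folder, name, data, fail_copy=False):
    """Insert a metadata row and write the file; undo the row if the write fails."""
    path = os.path.join(folder, name)
    try:
        conn.execute(
            "INSERT INTO documents (name, path) VALUES (?, ?)", (name, path)
        )
        if fail_copy:
            # Simulated failure, e.g. the disk running out of space.
            raise OSError("no space left on device")
        with open(path, "wb") as fh:
            fh.write(data)
        conn.commit()
        return True
    except OSError:
        # Manual compensation: discard the pending row and any partial file.
        conn.rollback()
        if os.path.exists(path):
            os.remove(path)
        return False


def demo():
    """Store one document successfully, then simulate a failed file copy."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE documents (name TEXT, path TEXT)")
    conn.commit()
    with tempfile.TemporaryDirectory() as folder:
        ok = store_document(conn, folder, "report.pdf", b"%PDF-")
        failed = store_document(conn, folder, "video.mp4", b"", fail_copy=True)
        rows = conn.execute("SELECT name FROM documents").fetchall()
        files = sorted(os.listdir(folder))
    conn.close()
    return ok, failed, rows, files
```

Calling demo() leaves exactly one row and one file behind: the failed copy takes its record down with it. Every application that keeps files outside the database has to write and test this kind of clean-up itself, which is the bookkeeping a single database transaction over both record and file is meant to take away.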

Negatives, we need to list them too.

  • A bit of a learning curve
  • Reliability of implementation (after all, they are also programmers, coders and testers like us)
  • Enterprise guarantee (“pigs will fly” syndrome: will they shit on my head?)
  • Scalability
  • Any more? They are welcome from you. Yes, you! Just leave a note beside this →

Let's trim these negatives

  • Learning curve: C++, C, C#, VB
    One book (approx. 350 pages with examples and code) and you are an expert. Still afraid of this?
  • Reliability of implementation → It has been 5+ years now.
  • Enterprise guarantee → Again, 5+ years and not much noise around. If you have an enterprise license, Microsoft will support you ON SITE.
  • Scalability → It's coming to Azure.

This must be a dream?

If you ask me as a programmer, architect and entrepreneur: yes, this is a dream, a dream come true.

It has been three days of effort, and this will go live tomorrow.

☺☺☺☺☺☺☺☺☺☺☺☺

Co-Founder, Revamp Consulting. Investor, Board of Directors, Eki Communications