Fix and convert file encoding in Node.js (JavaScript)

The moral of the day is… don’t become a developer / admin / engineer if you are too ashamed to admit you’re capable of producing absolutely no reasonable output after working on a bug for 5 hours.

I’ve been trying to read log files, written by a Windows application on a Windows server, into an object and then pick out specific data from each log entry in a file. Usually, I’ll have code like this:

// using the npm package fs-jetpack
var jetpack = require('fs-jetpack');
var logFile = 'logs.log';
var logDir = jetpack.cwd('./logs');
var data = logDir.read(logFile);

That seems clear enough. `data` should be a `String`.

So I proceed. I call `data.split('\n')` to get an array with each line of the log as an element.

var splitByNewLine = data.split('\n');

My goal is, say:

if (splitByNewLine[i].trim().indexOf("needle") > -1) {
  console.log('Needle up, please dont sit');
}

This is where the sun goes black. I kept on getting `-1`, meaning the `needle` is never found in `splitByNewLine[i]`.

I tried all kinds of conversions and comparisons, e.g.

splitByNewLine[i].replace(/[^\x00-\x7F]/g, "").trim().indexOf("needle");

I blamed myself for hours, then pointed fingers at JavaScript, the OS, Node.js versions, native code compilation... what didn’t I try or say (I’ll leave the list in a comment). After googling,

windos log files illegal characters linux encoding indexof write stream as plain text

I got this lucky break:

http://superuser.com/questions/411214/what-could-cause-the-file-command-in-linux-to-report-a-text-file-as-data

Check out the 3rd answer.

file -D filename

I modified mine to

❯ file --mime-encoding --mime-type QQEJ120703.log
QQEJ120703.log: text/plain; charset=utf-16le

Tried it out with other files,

❯ file --mime-encoding --mime-type end.json
end.json: application/octet-stream; charset=binary
❯ file --mime-encoding --mime-type ../watcher.js
../watcher.js: text/plain; charset=us-ascii
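For a rough idea of how such a check can work from inside Node.js, here is a minimal sketch that looks only at a leading byte order mark (BOM). The helper name `sniffBomEncoding` is made up for illustration, and the real `file` command uses more heuristics than this: a file without a BOM simply comes back as `null` here, and a UTF-32LE BOM would be mistaken for UTF-16LE.

```javascript
// Hypothetical helper: guess an encoding from a leading byte order mark (BOM).
// The returned labels are plain strings; 'utf16be' is NOT a Node Buffer encoding.
function sniffBomEncoding(buf) {
  if (buf.length >= 3 && buf[0] === 0xEF && buf[1] === 0xBB && buf[2] === 0xBF) return 'utf8';
  if (buf.length >= 2 && buf[0] === 0xFF && buf[1] === 0xFE) return 'utf16le';
  if (buf.length >= 2 && buf[0] === 0xFE && buf[1] === 0xFF) return 'utf16be';
  return null; // no BOM found: this check alone cannot tell
}

// FF FE followed by "h" in UTF-16LE
console.log(sniffBomEncoding(Buffer.from([0xFF, 0xFE, 0x68, 0x00])));
// Plain ASCII text has no BOM
console.log(sniffBomEncoding(Buffer.from('plain ascii')));
```

In practice you would pass it the first few bytes of the file read as a Buffer.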

Then had to go read this

And figured it out with this post

The file I wanted to read, `QQEJ120703.log`, had `charset=utf-16le`. After a bit of digging around and a few workarounds, I found the npm package `iconv-lite`.
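That charset likely explains the `-1`. Here is a small sketch of what I think was going on, assuming the file was being decoded as UTF-8 (which, as far as I know, is what fs-jetpack's `read` does by default). The sample text is made up; only Node's built-in `Buffer` is used.

```javascript
// Simulated log line, encoded the way the Windows application wrote it.
const bytes = Buffer.from('find the needle here', 'utf16le');

// Decoding those bytes as UTF-8 leaves a NUL character after every letter.
const wrong = bytes.toString('utf8');
// \x00 falls inside the \x00-\x7F range, so this replace keeps the NULs,
// and trim() does not treat NUL as whitespace either.
const stripped = wrong.replace(/[^\x00-\x7F]/g, '').trim();
// Decoding with the matching encoding gives the expected text.
const right = bytes.toString('utf16le');

console.log(wrong.length, right.length); // 40 20
console.log(wrong.indexOf('needle'), stripped.indexOf('needle'), right.indexOf('needle')); // -1 -1 9
```

So "needle" was in the line all along, just stored as `n\0e\0e\0d\0l\0e\0`, which never matches the plain six-character string.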

var iconv = require('iconv-lite');
// `line` is a variable holding a Buffer of raw bytes.
var str = iconv.decode(line, 'utf16le');
// `str` is now an ordinary string you can manipulate or search using str.indexOf().

I have to go finish the TL;DR on encoding and mime-types. It’s a very peculiar problem. I hope it helps that peculiar person who shouldn’t have to spend hours on something so trivial.

I’ll try out the Node.js fs module using its encoding option, as that seems to be built in already. I just wanted something a bit more cross-platform.
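For reference, a minimal sketch of that built-in route, which I haven't run against the real log files. It writes a throwaway sample file (the path and contents are made up) so it can run on its own, then reads it back with the `'utf16le'` encoding that Node's `fs` accepts.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical sample file standing in for a Windows-written log (BOM + CRLF).
const samplePath = path.join(os.tmpdir(), 'utf16le-sample.log');
fs.writeFileSync(samplePath, '\uFEFFfirst line\r\nneedle in line two\r\n', 'utf16le');

// Passing an encoding makes readFileSync return a decoded string, not a Buffer.
let text = fs.readFileSync(samplePath, 'utf16le');

// Node keeps the byte order mark as U+FEFF, so drop it if present.
if (text.charCodeAt(0) === 0xFEFF) text = text.slice(1);

// Split on either Windows (CRLF) or Unix (LF) line endings.
const lines = text.split(/\r?\n/);
const hits = lines.filter((line) => line.indexOf('needle') > -1);
console.log(hits);
```

Splitting on `/\r?\n/` instead of `'\n'` avoids a stray `\r` at the end of each line, which Windows log files tend to have.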

If anyone can summarize encoding and MIME types, especially explaining what happened behind the scenes, that would be awesome.