Fix and convert file encoding in Node.js, JavaScript
The moral of the day is… don’t become a developer / admin / engineer if you are too ashamed to admit you’re capable of producing absolutely no reasonable output after working on a bug for 5 hours.
I’ve been trying to read log files written by a Windows application on a Windows server into an object, and then pick specific data out of each log entry in a file. Usually, I’ll have code like
//using npm fs-jetpack
var jetpack = require('fs-jetpack');
var logFile = 'logs.log';
var logDir = jetpack.cwd('./logs');
var data = logDir.read(logFile);
That seems clear enough. `data` should be a `String`.
So I proceed. I `data.split('\n')` to get an array with each line of the log file as an element.
var splitByNewLine = data.split('\n');
My goal is, say,
if (splitByNewLine[i].trim().indexOf("needle") > -1) {
  console.log('Needle up, please dont sit');
}
This is where the sun goes black. I kept on getting `-1`, meaning the `needle` is never found in `splitByNewLine[i]`.
I tried all forms of conversions and comparisons, e.g.
splitByNewLine[i].replace(/[^\x00-\x7F]/g, "").trim().indexOf("needle");
I blamed myself for hours, then pointed fingers at JavaScript, the OS, Node.js versions, native code compiling… blah, what didn’t I try or say (I’ll leave the list in a comment). After googling,
windos log files illegal characters linux encoding indexof write stream as plain text (www.google.com.ng)
I got this lucky break;
Check out the 3rd answer.
file -D filename
I modified mine to
❯ file --mime-encoding --mime-type QQEJ120703.log
QQEJ120703.log: text/plain; charset=utf-16le
Tried it out with other files,
❯ file --mime-encoding --mime-type end.json
end.json: application/octet-stream; charset=binary
❯ file --mime-encoding --mime-type ../watcher.js
../watcher.js: text/plain; charset=us-ascii
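A rough version of this check can also be done from Node itself by looking at the first bytes of a file for a byte order mark (BOM). This is only a sketch: `sniffBom` is a made-up helper, the sample buffers are built in memory rather than read from the real logs, and it assumes the file starts with a BOM at all. The `file` command uses more heuristics than this, and a UTF-32LE BOM would be misreported here as UTF-16LE.

```javascript
// Hypothetical helper: guess an encoding from a Buffer's byte order mark (BOM).
// It only inspects the first bytes; data without a BOM comes back as 'unknown'.
function sniffBom(buf) {
  if (buf.length >= 3 && buf[0] === 0xef && buf[1] === 0xbb && buf[2] === 0xbf) {
    return 'utf-8';
  }
  if (buf.length >= 2 && buf[0] === 0xff && buf[1] === 0xfe) {
    return 'utf-16le';
  }
  if (buf.length >= 2 && buf[0] === 0xfe && buf[1] === 0xff) {
    return 'utf-16be';
  }
  return 'unknown';
}

// Sample data built in memory (illustrative, not the real log files).
const utf16Sample = Buffer.concat([
  Buffer.from([0xff, 0xfe]),            // UTF-16LE BOM
  Buffer.from('needle', 'utf16le'),
]);
const asciiSample = Buffer.from('needle', 'ascii');

console.log(sniffBom(utf16Sample));
console.log(sniffBom(asciiSample));
```

With a real file you would pass in the result of `fs.readFileSync(path)` called without an encoding, which returns a Buffer.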
Then had to go read this
Q: Is Unicode a 16-bit encoding? A: No. The first version of Unicode was a 16-bit encoding, from 1991 to 1995, but… (unicode.org)
And figured it out with this post
This forum is for the discussion of Solaris and OpenSolaris. Solaris / OpenSolaris General Sun, SunOS and Sparc related… (www.linuxquestions.org)
The file I wanted to read, `QQEJ120703.log`, had `charset=utf-16le`. After a bit of digging around and a few work-arounds, I found the npm package `iconv-lite`.
var iconv = require('iconv-lite');
// `line` is a variable holding a Buffer (raw bytes, not yet decoded).
var str = iconv.decode(line, 'utf16le');
// str is now a plain string you can manipulate or search using str.indexOf();
I have to go finish the TL;DR on encoding and mime-types. It’s a very peculiar problem. I hope it helps that peculiar person who shouldn’t have to spend hours on something so trivial.
I’ll try out the Node.js fs module with its encoding option, as that seems to be built in already. I just wanted something a bit more cross-platform.
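A minimal sketch of that built-in route, using only Node's `fs`, `os` and `path` modules. The temp file, its name and its contents are made up so the snippet runs anywhere; with a real log you would pass its path instead. It assumes the file really is UTF-16LE, since Node decodes with whatever encoding it is told and does not detect one.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Create a throwaway UTF-16LE file (with a BOM) purely for illustration.
const tmpDir = fs.mkdtempSync(path.join(os.tmpdir(), 'utf16-demo-'));
const logPath = path.join(tmpDir, 'sample.log');
fs.writeFileSync(logPath, '\ufefffirst line\r\nfound the needle here\r\n', 'utf16le');

// Passing the encoding makes readFileSync return a decoded string, not a Buffer.
let text = fs.readFileSync(logPath, 'utf16le');

// Node keeps the BOM as the first character, so drop it if present.
if (text.charCodeAt(0) === 0xfeff) {
  text = text.slice(1);
}

// Windows logs usually end lines with \r\n, so split on both forms.
const lines = text.split(/\r?\n/);
const matches = lines.filter((line) => line.indexOf('needle') > -1);
console.log(matches);

// Clean up the temp directory.
fs.rmSync(tmpDir, { recursive: true, force: true });
```

Splitting on `/\r?\n/` rather than `'\n'` also avoids a stray `\r` at the end of each line, which `.trim()` was quietly handling in the earlier code.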
If anyone can summarize encoding and mime-types, especially explaining what happened behind the scenes, that would be awesome.
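As a starting point, here is a small sketch of what was probably going on, assuming the file was decoded as UTF-8 when it was read (I believe that is the default for `read` in fs-jetpack, but treat that as an assumption). UTF-16LE stores each ASCII letter as two bytes, the letter followed by a zero byte. Read as UTF-8, those zero bytes turn into NUL characters sitting between the letters, so the text looks normal in a terminal but `indexOf` never matches. The snippet builds the bytes in memory with Node's `Buffer` instead of reading a real log.

```javascript
// 'needle' as a UTF-16LE writer would store it: two bytes per character.
const bytes = Buffer.from('needle', 'utf16le');
console.log(bytes); // each ASCII letter is followed by a 0x00 byte

// Decoding those bytes as UTF-8 keeps the zero bytes as NUL characters.
const misread = bytes.toString('utf8');
console.log(misread.length);            // 12, not 6
console.log(misread.indexOf('needle')); // -1

// The regex tried earlier cannot help: \x00 is inside the \x00-\x7F range,
// so the NUL characters are kept.
const stripped = misread.replace(/[^\x00-\x7F]/g, '');
console.log(stripped.indexOf('needle')); // -1

// Decoding with the right encoding gives the expected string.
const decoded = bytes.toString('utf16le');
console.log(decoded.indexOf('needle')); // 0
```

On the other half of the question: the MIME type (`text/plain`) only says what kind of content the file holds, while the charset (`utf-16le`) says how its bytes map to characters. It was the charset that mattered here.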