Csvs-nodejs

Nov 3, 2023

This autumn I decided to dedicate a portion of my time to an open source project. I chose to work on a NodeJS app for csvs databases because I previously participated in the project and know their stack well.
csvs is a plain text database that uses git for storage and rollbacks. The NodeJS CLI utility synchronizes csvs databases with each other and with data in other formats — JSON, filesystem directories, social media backups. I took up the task of integrating the org-mode data format and providing reports on database statistics.

Org-mode export

In early September I picked up the org-mode export feature. Internally, csvs-nodejs represents each database entry as a JSON object, so this entry

| datum     | uuid     |    actdate |    saydate | actname | sayname | files  |
| some text | 48c475bd | 1599-01-01 | 1599-01-01 | william | mary    | 7e7b8e |

| files  | uuid     | filename        | filehash  |
| 7e7b8e | b8c80611 | "firstpic.jpg"  | 8236ea5a0 |
| 7e7b8e | e92071cb | "secondpic.jpg" | 0aa477f97 |

is represented as this JSON object

{
  "_": "datum",
  "UUID": "48c475bd",
  "datum": "some text",
  "actdate": "1599-01-01",
  "saydate": "1599-01-01",
  "actname": "william",
  "sayname": "mary",
  "files": {
    "_": "files",
    "UUID": "7e7b8e",
    "items": [
      {
        "_": "file",
        "UUID": "b8c80611",
        "filename": "firstpic.jpg",
        "filehash": "8236ea5a0"
      },
      {
        "_": "file",
        "UUID": "e92071cb",
        "filename": "secondpic.jpg",
        "filehash": "0aa477f97"
      }
    ]
  }
}

Another way to represent database entries is the org-mode markup format. Org-mode is a more complex alternative to Markdown or YAML that can store multiple plain text entries, each with its own front matter metadata.

The same entry in org-mode looks like this

* .
:PROPERTIES:
:uuid: 48c475bd
:actdate: 1599-01-01
:saydate: 1599-01-01
:actname: william
:sayname: mary
:files: (:_ 'files' :UUID '7e7b8e' :items ((:_ 'file' :UUID 'b8c80611' :filename 'firstpic.jpg' :filehash '8236ea5a0')(:_ 'file' :UUID 'e92071cb' :filename 'secondpic.jpg' :filehash '0aa477f97')))
:END:
some text

According to the specification, this command has to read the database and output a valid org-mode file:
csvs -i /path/to/database -t biorg

The import of a csvs database was already handled, and the CLI returned a NodeJS stream of JSON entries. To support org-mode export I needed to add a Writable stream that would turn each JSON entry into org-mode text and output it to stdout.
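A minimal sketch of that conversion, assuming entries shaped like the JSON object above. The names `toPlist`, `entryToOrg` and `createOrgStream` are hypothetical and not taken from csvs-nodejs. The `* .` headline is an assumption (an org-mode headline starts with an asterisk), and so is lowercasing the top-level property names.

```javascript
const { Transform } = require("node:stream");

// Serialize a nested value as a Lisp-style property list.
// Objects become (:key value ...), arrays become a list of their items.
function toPlist(value) {
  if (Array.isArray(value)) {
    return "(" + value.map(toPlist).join("") + ")";
  }
  if (value !== null && typeof value === "object") {
    const pairs = Object.entries(value).map(([k, v]) => `:${k} ${toPlist(v)}`);
    return "(" + pairs.join(" ") + ")";
  }
  return `'${value}'`;
}

// Turn one JSON entry into org-mode text.
// `heading` is an assumed placeholder headline, not a confirmed convention.
function entryToOrg(entry, heading = "* .") {
  const root = entry._; // name of the root branch, e.g. "datum"
  const lines = [heading, ":PROPERTIES:"];
  for (const [key, value] of Object.entries(entry)) {
    if (key === "_" || key === root) continue; // root text goes in the body
    const text =
      value !== null && typeof value === "object" ? toPlist(value) : value;
    lines.push(`:${key.toLowerCase()}: ${text}`);
  }
  lines.push(":END:", entry[root] ?? "");
  return lines.join("\n");
}

// Accepts entry objects on the writable side and emits org-mode text.
function createOrgStream() {
  return new Transform({
    writableObjectMode: true,
    transform(entry, _encoding, callback) {
      callback(null, entryToOrg(entry) + "\n");
    },
  });
}

// Usage, with `entryStream` standing in for the CLI's stream of entries:
// entryStream.pipe(createOrgStream()).pipe(process.stdout);
```

Keeping the conversion in a pure function and wrapping it in a stream afterwards makes the formatting easy to check without touching stdout.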

Org-mode import

Now I needed to parse that org-mode file and import it back into the CLI.

The org-mode-parser library did well at converting org-mode entries into JSON, and before long the first PR was ready.

Arrays were stored as Emacs Lisp property lists, so I had to parse them separately. I learned from reading around the net that parsing involves two steps: tokenization and building an abstract syntax tree. I found functions for each step in manila/node-lisp-parser and adapted them to plists.
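The two steps can be sketched roughly as follows. This is not the code from node-lisp-parser or from the PR, only an illustration under the assumption that plists contain nothing but parentheses, `:keywords` and single-quoted strings, as in the sample entry. All function names are hypothetical.

```javascript
// Step 1: split the text into tokens: "(", ")", :keyword or 'string'.
function tokenize(text) {
  return text.match(/\(|\)|:[^\s()']+|'[^']*'/g) ?? [];
}

// Step 2: build a syntax tree of nested arrays from the token list.
// Keywords become { keyword: name }, quoted strings become plain strings.
function parseTokens(tokens) {
  const token = tokens.shift();
  if (token === undefined) throw new SyntaxError("unexpected end of input");
  if (token === "(") {
    const items = [];
    while (tokens.length > 0 && tokens[0] !== ")") {
      items.push(parseTokens(tokens));
    }
    if (tokens.shift() !== ")") throw new SyntaxError("missing )");
    return items;
  }
  if (token === ")") throw new SyntaxError("unexpected )");
  if (token.startsWith(":")) return { keyword: token.slice(1) };
  return token.slice(1, -1); // strip the surrounding quotes
}

function isKeyword(node) {
  return typeof node === "object" && node !== null && !Array.isArray(node);
}

// Convert the tree to JSON: a list that starts with a keyword is read as
// alternating key/value pairs, any other list is read as an array.
function toValue(node) {
  if (!Array.isArray(node)) return node;
  if (node.length > 0 && isKeyword(node[0])) {
    const result = {};
    for (let i = 0; i < node.length; i += 2) {
      result[node[i].keyword] = toValue(node[i + 1]);
    }
    return result;
  }
  return node.map(toValue);
}

function parsePlist(text) {
  return toValue(parseTokens(tokenize(text)));
}
```

Calling `parsePlist` on the value of the `:files:` property from the sample entry would give back the nested "files" object with its "items" array.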

I submitted the second PR and proceeded to statistics reports.

Csvs stats

In October I worked on reporting the database schema and stats. The schema for each csvs database is stored in a metadir.json file, according to fetsorn/csvs-spec.

Each key in the schema object represents a database entity, called a branch. If a branch has a parent, the parent's name is specified in the “trunk” field.
A branch without a trunk is called a root. Multiple roots are allowed. A branch that has a trunk is called a leaf.

{
  "datum": {
    "type": "string"
  },
  "saydate": {
    "trunk": "datum"
  },
  "sayname": {
    "trunk": "datum"
  },
  "files": {
    "trunk": "datum",
    "type": "array"
  },
  "file": {
    "trunk": "files",
    "type": "object"
  },
  "filename": {
    "trunk": "file",
    "type": "string"
  }
}

Each branch has a type — “string”, “number”, “object” or “array”. Values of type “array” have multiple leaves of type “object”.

I took on the task of adding a --stats flag that shows the schema in the form of a tree.

csvs -i ./database --stats
entries: 65
datum
|- actdate
|- actname
|- files
  |- file
     |- filename

First I built a nested object representation of the schema, and then printed one line per branch, indented according to its level of nesting. The PR was accepted soon after.
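Those two steps might look like the sketch below, assuming a schema object shaped like the metadir.json sample. `buildTree` and `renderTree` are hypothetical names, and the two-space indent per level is a choice made for this example, not necessarily the spacing the CLI prints.

```javascript
// Nest every branch under its trunk. Branches without a trunk are roots.
function buildTree(schema) {
  const nodes = {};
  for (const branch of Object.keys(schema)) {
    nodes[branch] = {};
  }
  const tree = {};
  for (const [branch, { trunk }] of Object.entries(schema)) {
    if (trunk === undefined) {
      tree[branch] = nodes[branch]; // root
    } else if (nodes[trunk] !== undefined) {
      nodes[trunk][branch] = nodes[branch]; // leaf attached to its trunk
    }
    // A trunk that is missing from the schema is skipped in this sketch.
  }
  return tree;
}

// Produce one line per branch, indented by depth.
function renderTree(tree, depth = 0) {
  const lines = [];
  for (const [name, children] of Object.entries(tree)) {
    const prefix = depth === 0 ? "" : "  ".repeat(depth - 1) + "|- ";
    lines.push(prefix + name, ...renderTree(children, depth + 1));
  }
  return lines;
}

// Usage, with `schema` being the parsed contents of metadir.json:
// console.log(renderTree(buildTree(schema)).join("\n"));
```

Because branches are linked by reference, the order of keys in the schema does not matter: a leaf can be listed before its trunk and still end up in the right place.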

Biorg stats

The org-mode format does not specify a database schema, but it can be inferred from the structure of the JSON objects returned by the parser. Because some entries omit metadata, I looked for the entry with the largest number of properties, printed its structure as a tree, and submitted the PR.
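A rough sketch of that inference, assuming parsed entries shaped like the JSON object shown earlier. `pickRichest` and `inferTree` are hypothetical names, and skipping the "_" and "UUID" keys is an assumption about which properties count as branches.

```javascript
// Pick the entry with the most properties. Expects at least one entry.
function pickRichest(entries) {
  return entries.reduce((best, entry) =>
    Object.keys(entry).length > Object.keys(best).length ? entry : best
  );
}

// Derive a nested tree of branch names from a single entry.
// The "_" field names the branch that the object itself represents.
function inferTree(entry) {
  const children = {};
  for (const [key, value] of Object.entries(entry)) {
    if (key === "_" || key === "UUID" || key === entry._) continue;
    if (Array.isArray(value)) {
      // For a list of items, infer from the item with the most properties.
      if (value.length > 0) {
        Object.assign(children, inferTree(pickRichest(value)));
      }
    } else if (value !== null && typeof value === "object") {
      Object.assign(children, inferTree(value));
    } else {
      children[key] = {};
    }
  }
  return { [entry._]: children };
}

// Usage, with `entries` being the array of parsed org-mode entries:
// const tree = inferTree(pickRichest(entries));
```

The result is a nested object of branch names, which can then be printed with one indented line per branch.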

csvs -i ./data.bi.org --stats
datum
|- actdate
|- actname
|- files
  |- file
     |- filename

Conclusion

While working on this project I learned to interact with the filesystem, got to know how to deal with NodeJS streams, and wrote my first parser. Thanks to my team at NobleScript and @fetsorn for supporting the initiative to develop open source and for the help with finding solutions.