Python vs. R vs. COBOL: Which is best for Data Science?

Rob Story
6 min read · Oct 27, 2015

We put the hottest languages to the test!

Sorting an Array

R: sort(x)
Python: numpy.sort(x)
COBOL:
IDENTIFICATION DIVISION.
PROGRAM-ID. SORT01.
AUTHOR. SHIBU.T.
DATA DIVISION.
WORKING-STORAGE SECTION.
01 TBL.
02 WS-TBL OCCURS 10.
05 WS-FLD PIC 99.
05 WS-FLD1 PIC X(3).
05 WS-FLD2 PIC 99.
01 WS-TAB-HLD.
05 WK-FLD PIC 99.
05 WK-FLD1 PIC X(3).
05 WK-FLD2 PIC 99.
01 WS-I PIC 99.
01 WS-J PIC 99.
01 K PIC 99.
PROCEDURE DIVISION.
MOVE '01AAA25' TO WS-TBL(1)
MOVE '01BBB20' TO WS-TBL(2)
MOVE '04CCC26' TO WS-TBL(3)
MOVE '01DDD10' TO WS-TBL(4)
MOVE '05EEE26' TO WS-TBL(5)
MOVE '04FFF30' TO WS-TBL(6)
DISPLAY '>>>>>>>>BEFORE SORT<<<<<<<<'
PERFORM VARYING WS-I FROM 1 BY 1 UNTIL WS-I > 6
DISPLAY WS-TBL(WS-I)
END-PERFORM.
DISPLAY '>>>>>>>>ASCENDING ORDER<<<'
PERFORM VARYING WS-I FROM 1 BY 1 UNTIL WS-I = 7
PERFORM VARYING WS-J FROM WS-I BY 1 UNTIL WS-J > 6
IF WS-FLD(WS-J) < WS-FLD(WS-I) THEN
MOVE WS-TBL(WS-I) TO WS-TAB-HLD
MOVE WS-TBL(WS-J) TO WS-TBL(WS-I)
MOVE WS-TAB-HLD TO WS-TBL(WS-J)
END-IF
END-PERFORM
END-PERFORM.
PERFORM VARYING WS-I FROM 1 BY 1 UNTIL WS-I > 6
DISPLAY WS-TBL(WS-I)
END-PERFORM.
DISPLAY '>>>>>>>>DESCENDING ORDER<<<<<<'
PERFORM VARYING WS-I FROM 1 BY 1 UNTIL WS-I = 7
PERFORM VARYING WS-J FROM WS-I BY 1 UNTIL WS-J > 6
IF WS-FLD(WS-J) > WS-FLD(WS-I) THEN
MOVE WS-TBL(WS-I) TO WS-TAB-HLD
MOVE WS-TBL(WS-J) TO WS-TBL(WS-I)
MOVE WS-TAB-HLD TO WS-TBL(WS-J)
END-IF
END-PERFORM
END-PERFORM.
PERFORM VARYING WS-I FROM 1 BY 1 UNTIL WS-I > 6
DISPLAY WS-TBL(WS-I)
END-PERFORM.
STOP RUN.
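
For anyone who would rather not trace the PERFORM loops by hand, here is a rough Python sketch of what the COBOL program above appears to do: an exchange sort on the two-digit leading field of each record, first ascending, then descending on the already-sorted table. The function name `exchange_sort` is made up for this illustration, and comparing the leading field as a string is an assumption that only holds because the field is always two digits.

```python
# The six 7-character records the COBOL program moves into WS-TBL.
records = ["01AAA25", "01BBB20", "04CCC26", "01DDD10", "05EEE26", "04FFF30"]


def exchange_sort(rows, descending=False):
    """Mimic the nested PERFORM loops (hypothetical helper).

    Swaps rows[i] and rows[j] whenever the leading two-character
    field (WS-FLD in the COBOL) is out of order.
    """
    rows = list(rows)  # work on a copy, unlike the in-place COBOL table
    for i in range(len(rows)):
        for j in range(i, len(rows)):
            key_i, key_j = rows[i][:2], rows[j][:2]
            out_of_order = key_j > key_i if descending else key_j < key_i
            if out_of_order:
                rows[i], rows[j] = rows[j], rows[i]
    return rows


ascending = exchange_sort(records)
# The COBOL program runs its descending pass on the already-sorted table.
descending = exchange_sort(ascending, descending=True)

for row in ascending:
    print(row)
for row in descending:
    print(row)
```

Like the original, this compares only the first field, so records that share a leading number keep whatever relative order the swaps happen to leave them in.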

Working with tables of data:

R: x[1]
Python: x.iloc[1]
COBOL:
IDENTIFICATION DIVISION.
PROGRAM-ID. HELLO.

DATA DIVISION.
WORKING-STORAGE SECTION.
01 WS-TABLE.
05 WS-A OCCURS 3 TIMES.
10 WS-B PIC A(2).
10 WS-C OCCURS 2 TIMES.
15 WS-D PIC X(3).

PROCEDURE DIVISION.
MOVE '12ABCDEF34GHIJKL56MNOPQR' TO WS-TABLE.
DISPLAY 'WS-TABLE : ' WS-TABLE.
DISPLAY 'WS-A(1) : ' WS-A(1).
DISPLAY 'WS-C(1,1) : ' WS-C(1,1).
DISPLAY 'WS-C(1,2) : ' WS-C(1,2).
DISPLAY 'WS-A(2) : ' WS-A(2).
DISPLAY 'WS-C(2,1) : ' WS-C(2,1).
DISPLAY 'WS-C(2,2) : ' WS-C(2,2).
DISPLAY 'WS-A(3) : ' WS-A(3).
DISPLAY 'WS-C(3,1) : ' WS-C(3,1).
DISPLAY 'WS-C(3,2) : ' WS-C(3,2).

STOP RUN.
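
To see what those subscripts pick out, here is a small Python sketch that slices the same 24-character string the way the WS-TABLE layout describes it: three WS-A entries of 8 characters, each holding a 2-character WS-B followed by two 3-character WS-D fields. The helpers `ws_a` and `ws_c` are invented for this illustration, and they take 1-based subscripts purely to mirror the COBOL.

```python
WS_TABLE = "12ABCDEF34GHIJKL56MNOPQR"

B_LEN = 2                        # WS-B PIC A(2)
D_LEN = 3                        # WS-D PIC X(3)
C_COUNT = 2                      # WS-C OCCURS 2 TIMES
A_LEN = B_LEN + C_COUNT * D_LEN  # each WS-A entry spans 8 characters


def ws_a(i):
    """Return the i-th WS-A entry (1-based, hypothetical helper)."""
    start = (i - 1) * A_LEN
    return WS_TABLE[start:start + A_LEN]


def ws_c(i, j):
    """Return the j-th WS-C field inside the i-th WS-A entry (1-based)."""
    start = B_LEN + (j - 1) * D_LEN
    return ws_a(i)[start:start + D_LEN]


print("WS-A(1)   :", ws_a(1))
print("WS-C(1,2) :", ws_c(1, 2))
print("WS-C(3,1) :", ws_c(3, 1))
```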

Searching for a value in a Table:

R: which(sapply(df, function(x) any(x == "January")))
Python: df[df.month == "January"]
COBOL:
IDENTIFICATION DIVISION.
PROGRAM-ID. BINSRCH1.
* The binary search program
* reads every input record and,
* after looking up the employee’s month of hire in a table
* by a binary search, writes it out to an output file.
ENVIRONMENT DIVISION.
INPUT-OUTPUT SECTION.
FILE-CONTROL.
* INPUT FILE EMP
SELECT INPUT-FILE ASSIGN EMP.
* REPORTFI: A REPORT FILE, PRINTS OUT INFORMATION ON EMPLOYEES
* WITH MONTH OF HIRE, SENT TO PRINTER
SELECT REPORT-FILE ASSIGN REPORTFI.
DATA DIVISION.
FILE SECTION.
FD INPUT-FILE
RECORDING MODE IS F
RECORD CONTAINS 80 CHARACTERS.
01 INPUT-RECORD.
* INPUT RECORD DESCRIPTION
05 FILLER PIC X(08).
05 FILLER PIC X(01).
05 ER-EMPLOYEE-NUMBER PIC X(05).
05 FILLER PIC X(01).
05 ER-EMPLOYEE-NAME PIC X(25).
05 ER-EMPLOYEE-DEPARTMENT PIC X(05).
05 FILLER PIC X(01).
05 ER-EMPLOYEE-SALARY-CODE PIC X(02).
05 FILLER PIC X(01).
* MONTH OF HIRE CAN BE DEFINED AS CHARACTER (ALPHANUMERIC)
05 ER-MONTH-OF-HIRE PIC X(02).
05 FILLER PIC X(29).
FD REPORT-FILE
RECORDING MODE IS F
RECORD CONTAINS 133 CHARACTERS.
01 REPORT-RECORD PIC X(133).
WORKING-STORAGE SECTION.
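01 FILE-AT-END PIC X VALUE 'N'.
01 SW-VALID-RECORD PIC X VALUE 'Y'.
* FOUND-SWITCH is set in LOOKUP-MONTH, so it needs a declaration
01 FOUND-SWITCH PIC X VALUE 'N'.
01 COUNTERS-AND-ACCUMULATORS.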
05 CTR-RECORDS-READ PIC 9(5)
PACKED-DECIMAL VALUE 0.
05 CTR-RECORDS-WRITTEN PIC 9(5)
PACKED-DECIMAL VALUE 0.
01 TITLE-HEADING-LINE.
05 FILLER PIC X(1) VALUE SPACES.
05 FILLER PIC X(35)
VALUE 'EMPLOYEE RECORDS WITH MONTH OF HIRE'.
05 FILLER PIC X(04) VALUE SPACES.
05 FILLER PIC X(33)
VALUE SPACES.
05 REPORT-DATE.
10 REPORT-YY PIC 99.
10 REPORT-MM PIC 99.
10 REPORT-DD PIC 99.
01 DETAIL-PRINT-LINE.
05 FILLER PIC X(1) VALUE SPACES.
05 DL-MONTH-OF-HIRE PIC X(09).
05 DL-RECORD-IMAGE PIC X(80) VALUE SPACES.
01 MONTH-TABLE-LITERALS.
* This is hard coding a table.
* Most explanations are meaningless
* until you study this carefully to see what is happening.
* These are fillers, because you
* won’t be referring to them directly, by name.
* Notice how the code (01) is written right
* beside the name (JANUARY).
* All the pictures must be the same.
* The literals inside of quotes don’t have
* to be the same length, but many will code them that way.
05 FILLER PIC X(11) VALUE '01JANUARY'.
05 FILLER PIC X(11) VALUE '02FEBRUARY'.
05 FILLER PIC X(11) VALUE '03MARCH'.
05 FILLER PIC X(11) VALUE '04APRIL'.
05 FILLER PIC X(11) VALUE '05MAY'.
05 FILLER PIC X(11) VALUE '06JUNE'.
05 FILLER PIC X(11) VALUE '07JULY'.
05 FILLER PIC X(11) VALUE '08AUGUST'.
05 FILLER PIC X(11) VALUE '09SEPTEMBER'.
05 FILLER PIC X(11) VALUE '10OCTOBER'.
05 FILLER PIC X(11) VALUE '11NOVEMBER'.
05 FILLER PIC X(11) VALUE '12DECEMBER'.
* REDEFINES means that this 01 level item
* occupies the same spot in memory as the one it redefines,
* so actually the two 01 levels are
* the same thing with different names and different pictures.
01 MONTH-TABLE REDEFINES MONTH-TABLE-LITERALS.
* The next item must occur as many times
* as there are fillers in the preceding 01.
* Its picture, or the pictures under it,
* must add up to the same number as the
* picture in the fillers above (11 in this example).
05 EACH-MONTH-INFO OCCURS 12 TIMES
* You need the INDEXED BY clause
* if you’re going to use the SEARCH verb.
* This defines and creates the index,
* so no pictures for the index, please.
* The ASCENDING (or DESCENDING) KEY clause
* is required for a binary search.
* Of course, the data must actually be in order
* or this won’t work right.
ASCENDING KEY IS EACH-MONTH-NUMBER
INDEXED BY MONTH-INDEX.
10 EACH-MONTH-NUMBER PIC XX.
10 EACH-MONTH-NAME PIC X(09).
PROCEDURE DIVISION.
PERFORM INITIALIZATION
PERFORM PROCESS-ALL UNTIL
FILE-AT-END = 'Y'
PERFORM TERMINATION
GOBACK.
INITIALIZATION.
OPEN INPUT INPUT-FILE
OUTPUT REPORT-FILE
WRITE REPORT-RECORD FROM TITLE-HEADING-LINE
PERFORM READ-PAR
* accept gets today’s date from the system
ACCEPT REPORT-DATE FROM DATE.
PROCESS-ALL.
PERFORM LOOKUP-MONTH
MOVE INPUT-RECORD TO DL-RECORD-IMAGE
WRITE REPORT-RECORD FROM DETAIL-PRINT-LINE
PERFORM READ-PAR.
TERMINATION.
CLOSE INPUT-FILE
REPORT-FILE.
READ-PAR.
READ INPUT-FILE
AT END
MOVE 'Y' TO FILE-AT-END
NOT AT END
ADD 1 TO CTR-RECORDS-READ
END-READ.
LOOKUP-MONTH.
* When doing a binary search,
* you don’t set the index to 1 before doing the search
* (but if you do, it ignores that and still works).
* You search “all” the thing that occurs.
SEARCH ALL EACH-MONTH-INFO
* AT END means not found.
AT END
MOVE 'N' TO FOUND-SWITCH
MOVE 'unknown' TO DL-MONTH-OF-HIRE
* The WHEN is a condition, like an IF.
* You must “when” the thing that occurs or an item under it,
* comparing it to something on the input record (month of hire).
WHEN EACH-MONTH-NUMBER(MONTH-INDEX) = ER-MONTH-OF-HIRE
MOVE 'Y' TO FOUND-SWITCH
* At this point you have found it and got a match,
* so here is where you do what you need to do on a match.
MOVE EACH-MONTH-NAME(MONTH-INDEX) TO DL-MONTH-OF-HIRE
END-SEARCH.
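
The comments in the COBOL program describe a binary search over a table that has to be kept sorted on its key. A minimal Python sketch of the same idea, using the standard library's `bisect` module, might look like the following. The names `MONTH_TABLE`, `MONTH_CODES`, and `lookup_month` are invented for the illustration; the two-character codes and the 'unknown' fallback follow the program above.

```python
from bisect import bisect_left

# Same code/name pairs as MONTH-TABLE-LITERALS, already sorted by code.
MONTH_TABLE = [
    ("01", "JANUARY"), ("02", "FEBRUARY"), ("03", "MARCH"),
    ("04", "APRIL"), ("05", "MAY"), ("06", "JUNE"),
    ("07", "JULY"), ("08", "AUGUST"), ("09", "SEPTEMBER"),
    ("10", "OCTOBER"), ("11", "NOVEMBER"), ("12", "DECEMBER"),
]
MONTH_CODES = [code for code, _ in MONTH_TABLE]


def lookup_month(code):
    """Binary-search the sorted codes (hypothetical helper).

    Returns the month name on a match, or 'unknown' when the code is
    not in the table, which plays the role of the AT END branch.
    """
    pos = bisect_left(MONTH_CODES, code)
    if pos < len(MONTH_CODES) and MONTH_CODES[pos] == code:
        return MONTH_TABLE[pos][1]
    return "unknown"


print(lookup_month("01"))
print(lookup_month("13"))
```

As with SEARCH ALL, the lookup is only correct if the codes really are in ascending order; `bisect_left` does not check that for you.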

The competition is really heating up!
