JAMS

A JSON Annotated Music Specification for Reproducible MIR Research.

JAMS provides:
  • A formal JSON schema for generic annotations
  • The ability to store multiple annotations per file
  • Schema definitions for a wide range of annotation types (beats, chords, segments, tags, etc.)
  • Error detection and validation
  • A translation layer to interface with mir_eval for evaluating annotations

For the most recent information, please refer to the JAMS repository on GitHub.

Getting started

Creating a JAMS data structure from scratch

First, create the top-level JAMS container:

>>> import jams
>>> jam = jams.JAMS()

A track in JAMS must have a duration (in seconds). For this example, we’ll make up a fake number, but in reality, you would compute the track duration from the source audio.

>>> jam.file_metadata.duration = 8.0

Now we can create a beat annotation:

>>> ann = jams.Annotation(namespace='beat', time=0, duration=jam.file_metadata.duration)
>>> ann.append(time=0.33, duration=0.0, confidence=1, value=1)

Then, we’ll update the annotation’s metadata by directly setting its fields:

>>> ann.annotation_metadata = jams.AnnotationMetadata(data_source='Well paid students')
>>> ann.annotation_metadata.curator = jams.Curator(name='Rincewind',
...                                                email='rincewind@unseen.edu')

Add our new annotation to the jam:

>>> jam.annotations.append(ann)

We can update the annotation at any time, and add a new observation:

>>> ann.append(time=0.66, duration=0.0, confidence=1, value=1)

Once you’ve added all your data, you can serialize the annotation to a string:

>>> jam.dumps(indent=2)
{
  "sandbox": {},
  "annotations": [
    {
      "data": [
        {
          "duration": 0.0,
          "confidence": 1.0,
          "value": 1.0,
          "time": 0.33
        },
        {
          "duration": 0.0,
          "confidence": 1.0,
          "value": 1.0,
          "time": 0.66
        }
      ],
      "annotation_metadata": {
        "annotation_tools": "",
        "curator": {
          "name": "Rincewind",
          "email": "rincewind@unseen.edu"
        },
        "annotator": {},
        "version": "",
        "corpus": "",
        "annotation_rules": "",
        "validation": "",
        "data_source": "Well paid students"
      },
      "namespace": "beat",
      "sandbox": {}
    }
  ],
  "file_metadata": {
    "jams_version": "0.2.0",
    "title": "",
    "identifiers": {},
    "release": "",
    "duration": 8.0,
    "artist": ""
  }
}

Or save to a file using the built-in save function:

>>> jam.save("these_are_still_my.jams")

Reading a JAMS file

Assuming you already have a JAMS file on disk, say at ‘these_are_also_my.jams’, you can easily read it back into memory:

>>> another_jam = jams.load('these_are_also_my.jams')
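Because a JAMS file is plain JSON, it can also be inspected with standard tools when the jams package is unavailable. A minimal stdlib-only sketch (the document contents and file path here are illustrative, not produced by jams itself):

```python
import json
import tempfile

# A minimal JAMS-like document (field values are illustrative).
doc = {
    "file_metadata": {"duration": 8.0},
    "annotations": [{"namespace": "beat", "data": []}],
    "sandbox": {},
}

# Write it out, then read it back as ordinary JSON.
with tempfile.NamedTemporaryFile('w', suffix='.jams', delete=False) as fh:
    json.dump(doc, fh)
    path = fh.name

with open(path) as fh:
    raw = json.load(fh)

print(raw['file_metadata']['duration'])              # 8.0
print([a['namespace'] for a in raw['annotations']])  # ['beat']
```

`jams.load` performs the same parse, but additionally validates the contents and wraps them in JAMS objects.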

JAMS Structure

This section describes the anatomy of JAMS objects.

JAMS

A JAMS object consists of three basic properties:
  • file_metadata, which describes the audio file to which these annotations are attached;
  • annotations, a list of Annotation objects (described below); and
  • sandbox, an unrestricted place to store any additional data.

FileMetadata

The file_metadata field contains the following properties:
  • identifiers: an unstructured sandbox-type object for storing identifier mappings, e.g., MusicBrainz ID;
  • artist, title, release : meta-data strings for the track in question;
  • duration : non-negative number describing the length (in seconds) of the track; and
  • jams_version : string describing the JAMS version for this file.

Annotation

Each annotation object contains the following properties:
  • namespace : a string describing the type of this annotation;
  • data : a list of observations, each containing:
    • time : non-negative number denoting the time of the observation (in seconds)
    • duration : non-negative number denoting the duration of the observation (in seconds)
    • value : actual annotation (e.g., chord, segment label)
    • confidence : certainty of the annotation
  • annotation_metadata : see Annotation_Metadata;
  • sandbox : additional unstructured storage space for this annotation;
  • time : optional non-negative number indicating the time at which this annotation becomes valid; and
  • duration : optional non-negative number indicating the duration of the valid portion of this annotation.

The permissible contents of the value and confidence fields are defined by the namespace.

Note

The time and duration fields of an annotation are optional. If left blank, the annotation should be assumed to be valid for the entirety of the track.

Annotation_Metadata

The meta-data associated with each annotation describes the process by which the annotation was generated. The annotation_metadata property has the following fields:

  • corpus: a string describing a corpus to which this annotation belongs;
  • version : string or number, the version of this annotation;
  • curator : a structured object containing contact information (name and email) for the curator of this data;
  • annotator : a sandbox object to describe the individual annotator — which can be a person or a program — that generated this annotation;
  • annotation_tools, annotation_rules, validation: strings to describe the process by which annotations were collected and pre-processed; and
  • data_source : string describing the type of annotator, e.g., “program”, “expert human”, “crowdsource”.

Task namespaces

In JAMS v0.2.0, the concept of task namespaces was introduced. Broadly speaking, a namespace defines the syntax (and some semantics) of a particular type of annotation.

For example, the chord namespace requires that all observed value fields are valid strings within a pre-defined grammar. Similarly, the tempo namespace requires that value fields be non-negative numbers, and the confidence fields lie within the range [0, 1].

JAMS ships with 26 pre-defined namespaces, covering a variety of common music informatics tasks. This collection should not be assumed to be complete, however, and more namespaces may be added in subsequent versions. Please refer to Namespace definitions for a comprehensive description of the existing namespaces.

Namespace specification format

In this section, we’ll demonstrate how to define a task namespace, using tempo as our running example. Namespaces are defined by JSON objects that contain partial JSON schema specifications for the value and confidence fields of the Annotation, as well as additional meta-data to describe the namespace and encoding.

tempo.json is reproduced here:

 1  {"tempo":
 2      {
 3          "value": {
 4              "type": "number",
 5              "minimum": 0
 6          },
 7          "confidence": {
 8              "type": "number",
 9              "minimum": 0,
10              "maximum": 1.0
11          },
12          "dense": false,
13          "description": "Tempo measurements, in beats per minute (BPM)"
14      }
15  }

The key “tempo” at line 1 is the string with which this namespace will be identified in JAMS objects by the annotation’s namespace field. This string must be a unique identifier.

Lines 3–6 specify the valid contents of the value field for tempo annotations. In this case, values must be numeric and non-negative. Any valid JSON schema definition can be substituted here, allowing for structured observation objects. (See pattern_jku for an example of this.)

Similarly, lines 7–11 specify valid contents of the confidence field. Most namespaces do not enforce specific constraints on confidence, so this block is optional. In the case of tempo, confidence must be a numeric value in the range [0, 1].

Line 12 (dense) is a boolean specifying whether the annotation should be densely encoded during serialization. There is no functional difference between dense and sparse encoding, but dense encoding is more space-efficient for high-frequency observations such as melody contours.

Finally, line 13 contains a brief description of the namespace and corresponding task.
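The value and confidence blocks are ordinary JSON schema fragments, so any JSON-schema validator can enforce them. A stdlib-only sketch of what the tempo constraints amount to (these checks hand-code the schema rather than using a validator library):

```python
import numbers

def valid_tempo_value(v):
    # Mirrors: "value": {"type": "number", "minimum": 0}
    # (bool is excluded: it is a Number subtype in Python but not in JSON)
    return isinstance(v, numbers.Number) and not isinstance(v, bool) and v >= 0

def valid_tempo_confidence(c):
    # Mirrors: "confidence": {"type": "number", "minimum": 0, "maximum": 1.0}
    return isinstance(c, numbers.Number) and not isinstance(c, bool) and 0 <= c <= 1

print(valid_tempo_value(129.2))      # True
print(valid_tempo_value(-5))         # False
print(valid_tempo_confidence(0.5))   # True
print(valid_tempo_confidence(1.5))   # False
```

In practice, jams performs this validation automatically against the namespace schema; the sketch only illustrates the constraints being enforced.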

Local namespaces

The JAMS namespace management architecture is modular and extensible, so it is relatively straightforward to create a new namespace schema and add it to JAMS at run-time:

>>> jams.schema.add_namespace('/path/to/my/new/namespace.json')
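A namespace file is just a JSON document in the format shown above, so it can also be generated programmatically. A sketch of writing one with the standard library (the 'my_tag' namespace and its constraints are hypothetical):

```python
import json
import tempfile

# A hypothetical custom namespace: free-text tags, sparsely encoded.
schema = {
    "my_tag": {
        "value": {"type": "string"},
        "dense": False,
        "description": "Project-local free-text tags",
    }
}

# Write the namespace definition to a standalone JSON file.
with tempfile.NamedTemporaryFile('w', suffix='.json', delete=False) as fh:
    json.dump(schema, fh, indent=2)
    path = fh.name

# The file can then be registered at run-time:
#   jams.schema.add_namespace(path)
with open(path) as fh:
    loaded = json.load(fh)
print(list(loaded))  # ['my_tag']
```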

Beginning with JAMS 0.2.1, a custom schema directory can be provided by setting the JAMS_SCHEMA_DIR environment variable prior to importing jams. This allows local customizations to be added automatically at run-time without having to manually add each schema file individually.

Example usage

Storing annotations

This section demonstrates a complete use-case of JAMS for storing estimated annotations. The example uses librosa to estimate global tempo and beat timings.

example_beat.py

The following script loads the librosa example audio clip, estimates the track duration, tempo, and beat timings, and constructs a JAMS object to store the estimations.

#!/usr/bin/env python

import librosa
import jams


def beat_track(infile, outfile):

    # Load the audio file
    y, sr = librosa.load(infile)

    # Compute the track duration
    track_duration = librosa.get_duration(y=y, sr=sr)

    # Extract tempo and beat estimates
    tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr)

    # Convert beat frames to time
    beat_times = librosa.frames_to_time(beat_frames, sr=sr)

    # Construct a new JAMS object and annotation records
    jam = jams.JAMS()

    # Store the track duration
    jam.file_metadata.duration = track_duration

    beat_a = jams.Annotation(namespace='beat')
    beat_a.annotation_metadata = jams.AnnotationMetadata(data_source='librosa beat tracker')

    # Add beat timings to the annotation record.
    # The beat namespace does not require value or confidence fields,
    # so we can leave those blank.
    for t in beat_times:
        beat_a.append(time=t, duration=0.0)

    # Store the new annotation in the jam
    jam.annotations.append(beat_a)

    # Add tempo estimation to the annotation.
    tempo_a = jams.Annotation(namespace='tempo', time=0, duration=track_duration)
    tempo_a.annotation_metadata = jams.AnnotationMetadata(data_source='librosa tempo estimator')

    # The tempo estimate is global, so it should start at time=0 and cover the full
    # track duration.
    # If we had a likelihood score on the estimation, it could be stored in
    # `confidence`.  Since we have no competing estimates, we'll set it to 1.0.
    tempo_a.append(time=0.0,
                   duration=track_duration,
                   value=tempo,
                   confidence=1.0)

    # Store the new annotation in the jam
    jam.annotations.append(tempo_a)

    # Save to disk
    jam.save(outfile)


if __name__ == '__main__':

    infile = librosa.util.example_audio_file()
    beat_track(infile, 'output.jams')

example_beat_output.jams

The above script generates the following JAMS object.

{
  "sandbox": {},
  "file_metadata": {
    "duration": 61.45886621315193,
    "title": "",
    "release": "",
    "identifiers": {},
    "artist": "",
    "jams_version": "0.2.3"
  },
  "annotations": [
    {
      "sandbox": {},
      "duration": null,
      "data": [
        {
          "value": null,
          "confidence": null,
          "time": 0.11609977324263039,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 0.5572789115646258,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 0.9984580498866213,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 1.4628571428571429,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 1.9272562358276644,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 2.391655328798186,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 2.8328344671201813,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 3.297233560090703,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 3.7616326530612243,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 4.2260317460317465,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 4.690430839002268,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 5.154829931972789,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 5.61922902494331,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 6.0836281179138325,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 6.524807256235827,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 6.989206349206349,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 7.453605442176871,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 7.918004535147392,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 8.382403628117913,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 8.870022675736962,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 9.311201814058958,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 9.775600907029478,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 10.24,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 10.704399092970522,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 11.145578231292516,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 11.609977324263038,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 12.07437641723356,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 12.538775510204081,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 13.003174603174603,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 13.467573696145125,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 13.931972789115646,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 14.396371882086168,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 14.837551020408164,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 15.27873015873016,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 15.74312925170068,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 16.207528344671204,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 16.671927437641724,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 17.11310657596372,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 17.600725623582765,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 18.04190476190476,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 18.52952380952381,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 18.970702947845805,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 19.435102040816325,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 19.89950113378685,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 20.36390022675737,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 20.805079365079365,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 21.292698412698414,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 21.73387755102041,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 22.221496598639455,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 22.66267573696145,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 23.127074829931974,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 23.591473922902495,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 24.055873015873015,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 24.49705215419501,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 24.961451247165535,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 25.425850340136055,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 25.913469387755104,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 26.354648526077096,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 26.81904761904762,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 27.28344671201814,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 27.74784580498866,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 28.189024943310656,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 28.65342403628118,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 29.1178231292517,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 29.60544217687075,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 30.06984126984127,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 30.53424036281179,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 30.975419501133786,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 31.43981859410431,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 31.880997732426305,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 32.36861678004535,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 32.833015873015874,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 33.29741496598639,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 33.73859410430839,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 34.202993197278914,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 34.66739229024943,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 35.131791383219955,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 35.57297052154195,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 36.060589569160996,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 36.52498866213152,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 36.989387755102044,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 37.430566893424036,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 37.89496598639456,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 38.35936507936508,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 38.8237641723356,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 39.2649433106576,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 39.75256235827664,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 40.216961451247165,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 40.68136054421769,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 41.12253968253968,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 41.586938775510205,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 42.05133786848072,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 42.515736961451246,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 42.956916099773245,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 43.44453514739229,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 43.885714285714286,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 44.373333333333335,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 44.83773242630385,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 45.302131519274376,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 45.7665306122449,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 46.20770975056689,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 46.672108843537416,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 47.13650793650794,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 47.600907029478456,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 48.06530612244898,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 48.529705215419504,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 48.99410430839002,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 49.458503401360545,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 49.92290249433107,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 50.387301587301586,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 50.85170068027211,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 51.2928798185941,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 51.757278911564626,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 52.22167800453515,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 52.68607709750567,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 53.15047619047619,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 53.614875283446715,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 54.05605442176871,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 54.52045351473923,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 54.98485260770975,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 55.44925170068027,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 55.913650793650795,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 56.37804988662131,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 56.842448979591836,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 57.30684807256236,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 57.77124716553288,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 58.2356462585034,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 58.6768253968254,
          "duration": 0.0
        },
        {
          "value": null,
          "confidence": null,
          "time": 59.14122448979592,
          "duration": 0.0
        }
      ],
      "namespace": "beat",
      "time": 0,
      "annotation_metadata": {
        "corpus": "",
        "validation": "",
        "annotation_tools": "",
        "version": "",
        "curator": {
          "name": "",
          "email": ""
        },
        "annotation_rules": "",
        "annotator": {},
        "data_source": "librosa beat tracker"
      }
    },
    {
      "sandbox": {},
      "duration": 61.45886621315193,
      "data": [
        {
          "value": 129.19921875,
          "confidence": 1.0,
          "time": 0.0,
          "duration": 61.45886621315193
        }
      ],
      "namespace": "tempo",
      "time": 0,
      "annotation_metadata": {
        "corpus": "",
        "validation": "",
        "annotation_tools": "",
        "version": "",
        "curator": {
          "name": "",
          "email": ""
        },
        "annotation_rules": "",
        "annotator": {},
        "data_source": "librosa tempo estimator"
      }
    }
  ]
}

Evaluating annotations

The following script illustrates how to evaluate one JAMS annotation object against another using the built-in eval submodule to wrap mir_eval.

Given two jams files, say, reference.jams and estimate.jams, the script first loads them as objects (j_ref and j_est, respectively). It then uses the JAMS.search method to locate all annotations of namespace "beat". If no matching annotations are found, an empty list is returned.

In this example, we are assuming that each JAMS file contains only a single annotation of interest, so the first result is taken by indexing the results at 0. (In general, you may want to use annotation_metadata to select a specific annotation from the JAMS object, if multiple are present.)

Finally, the two annotations are compared by calling jams.eval.beat, which returns an ordered dictionary of evaluation metrics for the annotations in question.

example_eval.py

#!/usr/bin/env python

import sys
import jams

from pprint import pprint

def compare_beats(f_ref, f_est):

    # f_ref contains the reference annotations
    j_ref = jams.load(f_ref)

    # f_est contains the estimated annotations
    j_est = jams.load(f_est)

    # Get the first reference beats
    beat_ref = j_ref.search(namespace='beat')[0]
    beat_est = j_est.search(namespace='beat')[0]

    # Get the scores
    return jams.eval.beat(beat_ref, beat_est)


if __name__ == '__main__':

    f_ref, f_est = sys.argv[1:]
    scores = compare_beats(f_ref, f_est)

    # Print them out
    pprint(dict(scores))
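As noted above, annotation_metadata can help disambiguate when a file carries several annotations in the same namespace. Since a JAMS file is plain JSON, that selection logic can be sketched with the standard library alone (the data_source values below are made up for illustration):

```python
import json

# A stripped-down JAMS document with two beat annotations from
# different sources (hypothetical data_source values).
doc = json.loads("""
{
  "annotations": [
    {"namespace": "beat", "data": [],
     "annotation_metadata": {"data_source": "crowd"}},
    {"namespace": "beat", "data": [],
     "annotation_metadata": {"data_source": "expert"}}
  ]
}
""")

def select(doc, namespace, data_source):
    # Filter by namespace, then disambiguate on the data_source
    # metadata field -- the same idea as JAMS.search, applied to raw JSON.
    return [ann for ann in doc['annotations']
            if ann['namespace'] == namespace
            and ann['annotation_metadata']['data_source'] == data_source]

beat_expert = select(doc, 'beat', 'expert')
```

In practice you would use jams.load and JAMS.search rather than raw JSON; the sketch only shows where the disambiguating field lives in the document.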

Data conversion

JAMS provides some basic functionality to help convert from flat file formats (e.g., CSV or LAB).

example_chord_import.py

#!/usr/bin/env python

import jams
import sys


def import_chord_jams(infile, outfile):

    # import_lab returns a new Annotation object
    chords = jams.util.import_lab('chord', infile)

    # Infer the track duration from the end of the last annotation
    duration = max([obs.time + obs.duration for obs in chords])

    chords.time = 0
    chords.duration = duration

    # Create a jams object
    jam = jams.JAMS()
    jam.file_metadata.duration = duration
    jam.annotations.append(chords)

    # save to disk
    jam.save(outfile)


if __name__ == '__main__':

    infile, outfile = sys.argv[1:]
    import_chord_jams(infile, outfile)
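For reference, a .lab file is just delimited text with (start, end, label) rows. The parsing performed by import_lab can be sketched without any dependencies (assuming the common three-column layout; the real importer handles more variations):

```python
def parse_lab(text):
    # Turn .lab rows of the form "<start> <end> <label>" into
    # JAMS-style observation dicts (a sketch, not the real importer).
    observations = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        start, end, label = line.split(None, 2)
        observations.append({'time': float(start),
                             'duration': float(end) - float(start),
                             'value': label,
                             'confidence': None})
    return observations

obs = parse_lab("0.000000 2.612267 N\n2.612267 11.459070 E\n")
```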

chord_output.jams

Calling the above script on 01_-_I_Saw_Her_Standing_There.lab from IsoPhonics should produce the following JAMS object:

{
  "annotations": [
    {
      "duration": 175.804082,
      "data": [
        {
          "duration": 2.6122669999999997,
          "value": "N",
          "confidence": 1.0,
          "time": 0.0
        },
        {
          "duration": 8.846803000000001,
          "value": "E",
          "confidence": 1.0,
          "time": 2.6122669999999997
        },
        {
          "duration": 1.4628569999999996,
          "value": "A",
          "confidence": 1.0,
          "time": 11.45907
        },
        {
          "duration": 4.521546999999998,
          "value": "E",
          "confidence": 1.0,
          "time": 12.921927
        },
        {
          "duration": 2.966888000000001,
          "value": "B",
          "confidence": 1.0,
          "time": 17.443474
        },
        {
          "duration": 1.497686999999999,
          "value": "E",
          "confidence": 1.0,
          "time": 20.410362
        },
        {
          "duration": 1.4628580000000007,
          "value": "E:7/3",
          "confidence": 1.0,
          "time": 21.908049
        },
        {
          "duration": 1.4860770000000016,
          "value": "A",
          "confidence": 1.0,
          "time": 23.370907
        },
        {
          "duration": 1.486076999999998,
          "value": "A:min/b3",
          "confidence": 1.0,
          "time": 24.856984
        },
        {
          "duration": 1.497686999999999,
          "value": "E",
          "confidence": 1.0,
          "time": 26.343061
        },
        {
          "duration": 1.5092970000000037,
          "value": "B",
          "confidence": 1.0,
          "time": 27.840747999999998
        },
        {
          "duration": 5.955917999999997,
          "value": "E",
          "confidence": 1.0,
          "time": 29.350045
        },
        {
          "duration": 1.497686999999999,
          "value": "A",
          "confidence": 1.0,
          "time": 35.305963
        },
        {
          "duration": 4.459452000000006,
          "value": "E",
          "confidence": 1.0,
          "time": 36.80365
        },
        {
          "duration": 2.982543999999997,
          "value": "B",
          "confidence": 1.0,
          "time": 41.263102
        },
        {
          "duration": 1.474466999999997,
          "value": "E",
          "confidence": 1.0,
          "time": 44.245646
        },
        {
          "duration": 1.4860770000000016,
          "value": "E:7/3",
          "confidence": 1.0,
          "time": 45.720113
        },
        {
          "duration": 1.4860770000000016,
          "value": "A",
          "confidence": 1.0,
          "time": 47.20619
        },
        {
          "duration": 1.4628569999999996,
          "value": "A:min/b3",
          "confidence": 1.0,
          "time": 48.692267
        },
        {
          "duration": 1.497686999999999,
          "value": "E",
          "confidence": 1.0,
          "time": 50.155124
        },
        {
          "duration": 1.4860770000000016,
          "value": "B",
          "confidence": 1.0,
          "time": 51.652811
        },
        {
          "duration": 2.9721550000000008,
          "value": "E",
          "confidence": 1.0,
          "time": 53.138888
        },
        {
          "duration": 9.020951999999987,
          "value": "A",
          "confidence": 1.0,
          "time": 56.111043
        },
        {
          "duration": 3.0185940000000073,
          "value": "B",
          "confidence": 1.0,
          "time": 65.13199499999999
        },
        {
          "duration": 3.0418140000000022,
          "value": "A",
          "confidence": 1.0,
          "time": 68.150589
        },
        {
          "duration": 3.0069840000000028,
          "value": "E",
          "confidence": 1.0,
          "time": 71.192403
        },
        {
          "duration": 1.497686999999999,
          "value": "A",
          "confidence": 1.0,
          "time": 74.199387
        },
        {
          "duration": 4.539501000000001,
          "value": "E",
          "confidence": 1.0,
          "time": 75.697074
        },
        {
          "duration": 2.9721550000000008,
          "value": "B",
          "confidence": 1.0,
          "time": 80.236575
        },
        {
          "duration": 3.012962999999999,
          "value": "E",
          "confidence": 1.0,
          "time": 83.20873
        },
        {
          "duration": 1.5149279999999976,
          "value": "A",
          "confidence": 1.0,
          "time": 86.221693
        },
        {
          "duration": 1.5209070000000082,
          "value": "A:min/b3",
          "confidence": 1.0,
          "time": 87.736621
        },
        {
          "duration": 1.4628569999999854,
          "value": "E",
          "confidence": 1.0,
          "time": 89.25752800000001
        },
        {
          "duration": 1.4370680000000107,
          "value": "B",
          "confidence": 1.0,
          "time": 90.720385
        },
        {
          "duration": 11.949235999999999,
          "value": "E",
          "confidence": 1.0,
          "time": 92.157453
        },
        {
          "duration": 3.0185940000000073,
          "value": "B",
          "confidence": 1.0,
          "time": 104.106689
        },
        {
          "duration": 3.0534239999999926,
          "value": "E",
          "confidence": 1.0,
          "time": 107.12528300000001
        },
        {
          "duration": 2.9453800000000143,
          "value": "A",
          "confidence": 1.0,
          "time": 110.178707
        },
        {
          "duration": 1.4896309999999744,
          "value": "E",
          "confidence": 1.0,
          "time": 113.12408700000002
        },
        {
          "duration": 1.4860770000000088,
          "value": "B",
          "confidence": 1.0,
          "time": 114.61371799999999
        },
        {
          "duration": 2.845166000000006,
          "value": "E",
          "confidence": 1.0,
          "time": 116.099795
        },
        {
          "duration": 9.101501000000013,
          "value": "A",
          "confidence": 1.0,
          "time": 118.944961
        },
        {
          "duration": 3.0069839999999886,
          "value": "B",
          "confidence": 1.0,
          "time": 128.04646200000002
        },
        {
          "duration": 2.983764000000008,
          "value": "A",
          "confidence": 1.0,
          "time": 131.053446
        },
        {
          "duration": 3.006984999999986,
          "value": "E",
          "confidence": 1.0,
          "time": 134.03721000000002
        },
        {
          "duration": 1.4313290000000052,
          "value": "A",
          "confidence": 1.0,
          "time": 137.044195
        },
        {
          "duration": 4.582638999999972,
          "value": "E",
          "confidence": 1.0,
          "time": 138.475524
        },
        {
          "duration": 2.9837640000000363,
          "value": "B",
          "confidence": 1.0,
          "time": 143.05816299999998
        },
        {
          "duration": 1.5092969999999752,
          "value": "E",
          "confidence": 1.0,
          "time": 146.04192700000002
        },
        {
          "duration": 1.5092970000000037,
          "value": "E:7/3",
          "confidence": 1.0,
          "time": 147.551224
        },
        {
          "duration": 1.451246999999995,
          "value": "A",
          "confidence": 1.0,
          "time": 149.060521
        },
        {
          "duration": 1.5092970000000037,
          "value": "A:min/b3",
          "confidence": 1.0,
          "time": 150.511768
        },
        {
          "duration": 1.509297000000032,
          "value": "E",
          "confidence": 1.0,
          "time": 152.021065
        },
        {
          "duration": 1.5325169999999844,
          "value": "B",
          "confidence": 1.0,
          "time": 153.53036200000003
        },
        {
          "duration": 4.469842,
          "value": "E",
          "confidence": 1.0,
          "time": 155.062879
        },
        {
          "duration": 1.5325169999999844,
          "value": "B",
          "confidence": 1.0,
          "time": 159.532721
        },
        {
          "duration": 4.516280999999992,
          "value": "E",
          "confidence": 1.0,
          "time": 161.065238
        },
        {
          "duration": 1.5325170000000128,
          "value": "B",
          "confidence": 1.0,
          "time": 165.581519
        },
        {
          "duration": 1.5325170000000128,
          "value": "A",
          "confidence": 1.0,
          "time": 167.114036
        },
        {
          "duration": 1.0908560000000023,
          "value": "E",
          "confidence": 1.0,
          "time": 168.646553
        },
        {
          "duration": 1.9497639999999876,
          "value": "E:9",
          "confidence": 1.0,
          "time": 169.737409
        },
        {
          "duration": 4.116908999999993,
          "value": "N",
          "confidence": 1.0,
          "time": 171.687173
        }
      ],
      "namespace": "chord",
      "time": 0,
      "annotation_metadata": {
        "version": "",
        "annotation_tools": "",
        "annotator": {},
        "curator": {
          "email": "",
          "name": ""
        },
        "data_source": "",
        "corpus": "",
        "annotation_rules": "",
        "validation": ""
      },
      "sandbox": {}
    }
  ],
  "file_metadata": {
    "duration": 175.804082,
    "jams_version": "0.2.3",
    "artist": "",
    "identifiers": {},
    "release": "",
    "title": ""
  },
  "sandbox": {}
}

More examples

In general, converting a dataset to JAMS format will require a bit more work to ensure that value fields conform to the specified namespace schema, but the import script above should serve as a simple starting point.

For further reference, a separate repository jams-data has been created to house conversion scripts for publicly available datasets. Note that development of converters is a work in progress, so proceed with caution!

API reference

API Reference

Core functionality

This library provides an interface for reading JAMS files into Python, or creating them programmatically.

Function reference
load(path_or_file[, validate, strict, fmt]) Load a JAMS Annotation from a file.
Object reference
JAMS([annotations, file_metadata, sandbox]) Top-level Jams Object
FileMetadata([title, artist, release, …]) Metadata for a given audio file.
AnnotationArray([annotations]) This list subclass provides serialization and search/filtering for annotation collections.
AnnotationMetadata([curator, version, …]) Data structure for metadata corresponding to a specific annotation.
Curator([name, email]) Container object for curator metadata.
Annotation(namespace[, data, …]) Annotation base class.
Observation(time, duration, value, confidence) Core observation type: (time, duration, value, confidence).
Sandbox(unconstrained) Functionally identical to JObjects, but the class hierarchy might be confusing if all objects inherit from Sandboxes.
JObject(**kwargs) Dict-like object for JSON Serialization.

Namespace management

add_namespace(filename) Add a namespace definition to our working set.
namespace(ns_key) Construct a validation schema for a given namespace.
namespace_array(ns_key) Construct a validation schema for arrays of a given namespace.
is_dense(ns_key) Determine whether a namespace has dense formatting.
values(ns_key) Return the allowed values for an enumerated namespace.
get_dtypes(ns_key) Get the dtypes associated with the value and confidence fields for a given namespace.
list_namespaces() Print out a listing of available namespaces

Display

display(annotation[, meta]) Visualize a jams annotation through mir_eval
display_multi(annotations[, fig_kw, meta]) Display multiple annotations with shared axes

Sonification

sonify(annotation[, sr, duration]) Sonify a jams annotation through mir_eval

Evaluation

beat(ref, est, **kwargs) Beat tracking evaluation
chord(ref, est, **kwargs) Chord evaluation
melody(ref, est, **kwargs) Melody extraction evaluation
onset(ref, est, **kwargs) Onset evaluation
segment(ref, est, **kwargs) Segment evaluation
tempo(ref, est, **kwargs) Tempo evaluation
pattern(ref, est, **kwargs) Pattern detection evaluation
hierarchy(ref, est, **kwargs) Multi-level segmentation evaluation
transcription(ref, est, **kwargs) Note transcription evaluation

Namespace conversion

convert(annotation, target_namespace) Convert a given annotation to the target namespace.

Utility functions

import_lab(namespace, filename[, infer_duration]) Load a .lab file as an Annotation object.
expand_filepaths(base_dir, rel_paths) Expand a list of relative paths to a given base directory.
smkdirs(dpath[, mode]) Safely make a full directory path if it doesn’t exist.
filebase(filepath) Return the extension-less basename of a file path.
find_with_extension(in_dir, ext[, depth, sort]) Naive depth-search into a directory for files with a given extension.

Namespace definitions

Beat

beat

Beat event markers with optional metrical position.

time duration value confidence
[sec] [sec] [number] or [null]

Each observation corresponds to a single beat event.

The value field can be a number (positive or negative, integer or floating point), indicating the metrical position within the bar of the observed beat.

If no metrical position is provided for the annotation, the value field will be null.

Example

time duration value confidence
0.500 0.000 1 null
1.000 0.000 2 null
1.500 0.000 3 null
2.000 0.000 4 null
2.500 0.000 1 null

Note

duration is typically zero for beat events, but this is not enforced.

confidence is an unconstrained field for beat annotations, and may contain arbitrary data.

beat_position

Beat events with time signature information.

time duration value confidence
[sec] [sec]
  • position
  • measure
  • num_beats
  • beat_units

Each observation corresponds to a single beat event.

The value field is a structure containing the following fields:

  • position : the position of the beat within the measure. Can be any number greater than or equal to 1.
  • measure : the index of the measure containing this beat. Can be any non-negative integer.
  • num_beats : the number of beats per measure : can be any strictly positive integer.
  • beat_units : the note value for beats in this measure. Must be one of: 1, 2, 4, 8, 16, 32, 64, 128, 256.

All fields are required for each observation.

Example

time duration value confidence
0.500 0.000
  • position: 1
  • measure: 0
  • num_beats: 4
  • beat_units: 4
null
1.000 0.000
  • position: 2
  • measure: 0
  • num_beats: 4
  • beat_units: 4
null
1.500 0.000
  • position: 3
  • measure: 0
  • num_beats: 4
  • beat_units: 4
null
2.000 0.000
  • position: 4
  • measure: 0
  • num_beats: 4
  • beat_units: 4
null
2.500 0.000
  • position: 1
  • measure: 1
  • num_beats: 4
  • beat_units: 4
null

Note

duration is typically zero for beat events, but this is not enforced.

confidence is an unconstrained field for beat annotations, and may contain arbitrary data.

position should lie in the range [1, num_beats], but the upper bound is not enforced at the schema level.
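The field constraints above can be collected into a small checker (a sketch that mirrors the stated schema; it is not the jams validator):

```python
def valid_beat_position(value):
    # All four fields are required for each observation.
    required = {'position', 'measure', 'num_beats', 'beat_units'}
    if not required <= set(value):
        return False
    return (value['position'] >= 1                    # any number >= 1
            and isinstance(value['measure'], int)
            and value['measure'] >= 0                 # non-negative integer
            and isinstance(value['num_beats'], int)
            and value['num_beats'] >= 1               # strictly positive integer
            and value['beat_units'] in (1, 2, 4, 8, 16, 32, 64, 128, 256))
```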

Chord

chord

Chord annotations described by an extended version of the grammar defined by Harte, et al. [1]

time duration value confidence
[sec] [sec] string
[1] Harte, Christopher, Mark B. Sandler, Samer A. Abdallah, and Emilia Gómez. “Symbolic Representation of Musical Chords: A Proposed Syntax for Text Annotations.” In ISMIR, vol. 5, pp. 66-71. 2005.

This namespace is similar to chord_harte, with the following modifications:

  • Sharps and flats may not be mixed in a note symbol. For instance, A#b# is legal in chord_harte but not in chord. A### is legal in both.
  • The following quality values have been added:
    • sus2, 1, 5
    • aug7
    • 11, maj11, min11
    • 13, maj13, min13

Example

time duration value confidence
0.000 1.000 N null
0.000 1.000 Bb:5 null
0.000 1.000 E:(*5) null
0.000 1.000 E#:min9/9 null
0.000 1.000 G##:maj6 null
0.000 1.000 D:13/6 null
0.000 1.000 A:sus2 null

Note

confidence is an unconstrained field, and may contain arbitrary data.

chord_harte

Chord annotations described according to the grammar defined by Harte, et al. [1]

time duration value confidence
[sec] [sec] string

Each observed value is a text representation of a chord annotation.

  • N specifies a no chord observation
  • Notes are annotated in the usual way: A-G followed by optional sharps (#) and flats (b)
  • Chord qualities are denoted by abbreviated strings:
    • maj, min, dim, aug
    • maj7, min7, 7, dim7, hdim7, minmaj7
    • maj6, min6
    • 9, maj9, min9
    • sus4
  • Inversions are specified by a slash (/) followed by the interval number, e.g., G/3.
  • Extensions are denoted in parentheses, e.g., G(b11,13). Suppressed notes are indicated with an asterisk, e.g., G(*3)

A complete description of the chord grammar is provided in [1], table 1.

Example

time duration value confidence
0.000 1.000 N null
0.000 1.000 Bb null
0.000 1.000 E:(*5) null
0.000 1.000 E#:min9/9 null
0.000 1.000 G#b:maj6 null
0.000 1.000 D/6 null
0.000 1.000 A:sus4 null

Note

confidence is an unconstrained field, and may contain arbitrary data.
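A simplified regular expression for the root/quality/inversion core of the grammar can illustrate the format (a sketch only: it omits parenthesized extensions and the full grammar of [1], table 1):

```python
import re

# Longer quality names must precede their prefixes in the alternation
# (e.g., maj7 before maj), so the regex prefers the longest match.
QUALITIES = ('maj7|min7|dim7|hdim7|minmaj7|maj6|min6|maj9|min9|'
             'maj|min|dim|aug|sus4|7|9')
CHORD_RE = re.compile(r'^(N|[A-G][#b]*(:(%s))?(/[#b]*\d+)?)$' % QUALITIES)

def looks_like_harte(label):
    # True for "N", or root + optional :quality + optional /inversion.
    return CHORD_RE.match(label) is not None
```

Note that this sketch accepts mixed sharps and flats in the root (legal in chord_harte) and rejects sus2, which exists only in the extended chord namespace.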

chord_roman

Chord annotations in roman numeral format, as described by [2].

time duration value confidence
[sec] [sec]
  • tonic
  • chord

The value field is a structure containing the following fields:

  • tonic : (string) the tonic note of the chord, e.g., A or Gb.

  • chord : (string) the scale degree of the chord in roman numerals (1–7), along with inversions, extensions, and qualities.

    • Scale degrees are encoded with optional leading sharps and flats, e.g., V, bV or #VII. Upper-case numerals indicate major, lower-case numerals indicate minor.

    • Qualities are encoded as one of the following symbols:

      • o : diminished (triad)
      • + : augmented (triad)
      • s : suspension
      • d : dominant (seventh)
      • h : half-diminished (seventh)
      • x : fully-diminished (seventh)
    • Inversions are encoded by arabic numerals, e.g., V6 for a first-inversion triad, V64 for second inversion.

    • Applied chords are encoded by a / followed by a roman numeral encoding of the scale degree, e.g., V7/IV.

[2] http://theory.esm.rochester.edu/rock_corpus/harmonic_analyses.html
Example
time duration value confidence
0.000 0.500
  • tonic: C
  • chord: I6
0.500 0.500
  • tonic: C
  • chord: bIV
1.000 0.500
  • tonic: C
  • chord: Vh7

Note

The grammar defined in [2] has been constrained to support only the quality symbols listed above.

confidence is an unconstrained field, and may contain arbitrary data.

Key

key_mode

Key and optional mode (major/minor or Greek modes)

time duration value confidence
[sec] [sec] string

The value field is a string matching one of the three following patterns:

  • N : no key
  • Ab, A, A#, Bb, ... G# : tonic note, upper case
  • tonic:MODE where tonic is as described above, and MODE is one of: major, minor, ionian, dorian, phrygian, lydian, mixolydian, aeolian, locrian.

Example

time duration value confidence
0.000 30.0 C:minor null
30.0 5.00 N null
35.0 15.0 C#:dorian null
50.0 10.0 Eb null
60.0 10.0 A:lydian null

Note

confidence is an unconstrained field, and may contain arbitrary data.
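These three patterns translate directly into a regular expression (a sketch of the description above, not the schema file itself):

```python
import re

MODES = ('major|minor|ionian|dorian|phrygian|lydian|'
         'mixolydian|aeolian|locrian')
# "N", or an upper-case tonic with at most one accidental,
# optionally followed by ":" and a mode name.
KEY_RE = re.compile(r'^(N|[A-G][#b]?(:(%s))?)$' % MODES)

def valid_key_mode(label):
    return KEY_RE.match(label) is not None
```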

Lyrics

lyrics

Time-aligned lyrical annotations.

time duration value confidence
[sec] [sec] string

The required value field can contain arbitrary text data, e.g., lyrics.

Example

time duration value confidence
0.500 4.000 “Row row row your boat” null
4.500 2.000 “gently down the stream” null
7.000 1.000 “merrily” null
8.000 1.000 “merrily” null
9.000 1.000 “merrily” null
10.00 1.000 “merrily” null

Note

confidence is an unconstrained field, and may contain arbitrary data.

lyrics_bow

Time-aligned bag-of-words or bag-of-ngrams.

time duration value confidence
[sec] [sec] array

The required value field is an array, where each element is an array of [term, count]. The term here may be either a string (for simple bag-of-words) or an array of strings (for bag-of-ngrams).

Example

time duration value confidence
0.000 30.00
  • [‘row’, 3]
  • [ [‘row’, ‘row’], 2]
  • [‘your’, 1]
  • [‘boat’, 1]
null

Note

confidence is an unconstrained field, and may contain arbitrary data.
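A lyrics_bow value for the simple bag-of-words case can be built with collections.Counter (a sketch; term normalization and n-gram extraction are left to the annotator):

```python
from collections import Counter

def bag_of_words(text):
    # Build a lyrics_bow-style value: a list of [term, count] pairs.
    counts = Counter(text.lower().split())
    return [[term, count] for term, count in sorted(counts.items())]

value = bag_of_words("Row row row your boat")
```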

Mood

mood_thayer

Time-varying emotion measurements as ordered pairs of (valence, arousal)

time duration value confidence
[sec] [sec] (valence, arousal)

The value field is an ordered pair of numbers measuring the valence and arousal positions in the Thayer mood model [3].

[3] Thayer, Robert E. The biopsychology of mood and arousal. Oxford University Press, 1989.

Example

time duration value confidence
0.500 0.250 (-0.5, 1.0) null
0.750 0.250 (-0.3, 0.6) null
1.000 0.750 (-0.1, 0.1) null
1.750 0.500 (0.3, -0.5) null

Note

confidence is an unconstrained field, and may contain arbitrary data.

Onset

onset

Note onset event markers.

time duration value confidence
[sec] [sec]

This namespace can be used to encode timing of arbitrary instantaneous events. Most commonly, this is applied to note onsets.

Example

time duration value confidence
0.500 0.000 null null
1.000 0.000 null null
1.500 0.000 null null
2.000 0.000 null null

Note

duration is typically zero for instantaneous events, but this is not enforced.

value and confidence fields are unconstrained, and may contain arbitrary data.

Pattern

pattern_jku

Each note of the pattern contains (pattern_id, midi_pitch, occurrence_id, morph_pitch, staff), following the format described in [4].

time duration value confidence
[sec] [sec]
  • pattern_id
  • midi_pitch
  • occurrence_id
  • morph_pitch
  • staff
[4] Collins T., Discovery of Repeated Themes & Sections, Music Information Retrieval Evaluation eXchange (MIREX), 2013 (accessed July 7th, 2015).

Each value field contains a dictionary with the following keys:

  • pattern_id: The integer that identifies the current pattern,
    starting from 1.
  • midi_pitch: The float representing the midi pitch.
  • occurrence_id: The integer that identifies the current occurrence,
    starting from 1.
  • morph_pitch: The float representing the morphological pitch.
  • staff: The integer representing the staff where the current note of the
    pattern is found, starting from 0.

Example

time duration value confidence
62.86 0.09
  • pattern_id: 1
  • midi_pitch: 50
  • occurrence_id: 1
  • morph_pitch: 54
  • staff: 1
null
62.86 0.36
  • pattern_id: 1
  • midi_pitch: 77
  • occurrence_id: 1
  • morph_pitch: 70
  • staff: 0
null
36.34 0.09
  • pattern_id: 1
  • midi_pitch: 71
  • occurrence_id: 2
  • morph_pitch: 66
  • staff: 0
null
36.43 0.36
  • pattern_id: 1
  • midi_pitch: 69
  • occurrence_id: 2
  • morph_pitch: 65
  • staff: 0
null
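Observations sharing a (pattern_id, occurrence_id) pair together form one occurrence of a pattern. Grouping them back together is straightforward (a sketch over plain value dicts):

```python
from collections import defaultdict

def group_occurrences(values):
    # Collect midi_pitch values by (pattern_id, occurrence_id).
    groups = defaultdict(list)
    for v in values:
        groups[(v['pattern_id'], v['occurrence_id'])].append(v['midi_pitch'])
    return dict(groups)

values = [{'pattern_id': 1, 'occurrence_id': 1, 'midi_pitch': 50},
          {'pattern_id': 1, 'occurrence_id': 1, 'midi_pitch': 77},
          {'pattern_id': 1, 'occurrence_id': 2, 'midi_pitch': 71}]
groups = group_occurrences(values)
```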

Pitch

pitch_contour

Pitch contours in the format (index, frequency, voicing).

time duration value confidence
[sec] [–]
  • index
  • frequency
  • voiced

Each value field is a structure containing a contour index (an integer indicating which contour the observation belongs to), a frequency value in Hz, and a boolean indicating whether the value is voiced. The confidence field is unconstrained.

Example

time duration value confidence
0.0000 0.0000
  • index: 0
  • frequency: 442.1
  • voiced: True
null
0.0058 0.0000
  • index: 0
  • frequency: 457.8
  • voiced: False
null
2.5490 0.0000
  • index: 1
  • frequency: 89.4
  • voiced: True
null
2.5548 0.0000
  • index: 1
  • frequency: 90.0
  • voiced: True
null
note_hz

Note events with (non-negative) frequencies measured in Hz.

time duration value confidence
[sec] [sec]
  • number

Each value field gives the frequency of the note in Hz.

Example

time duration value confidence
12.34 0.287 189.9 null
2.896 3.000 74.0 null
10.12 0.500 440.0 null
note_midi

Note events with pitches measured in (fractional) MIDI note numbers.

time duration value confidence
[sec] [sec]
  • number

Each value field gives the pitch of the note in MIDI note numbers.

Example

time duration value confidence
12.34 0.287 52.0 null
2.896 3.000 20.7 null
10.12 0.500 42.0 null
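note_hz and note_midi values are related by the standard frequency-to-MIDI mapping (A4 = 440 Hz = MIDI note 69):

```python
import math

def hz_to_midi(f):
    # Fractional MIDI note number for a frequency in Hz.
    return 69.0 + 12.0 * math.log2(f / 440.0)

def midi_to_hz(m):
    # Frequency in Hz for a (possibly fractional) MIDI note number.
    return 440.0 * 2.0 ** ((m - 69.0) / 12.0)
```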
pitch_class

Pitch measurements in (tonic, pitch class) format.

time duration value confidence
[sec] [sec]
  • tonic
  • pitch class

Each value field is a structure containing a tonic (a note string, e.g., "A#" or "D") and a pitch class (the pitch field), encoded as an integer number of semitones above the tonic. The confidence field is unconstrained.

Example

time duration value confidence
0.000 30.0
  • tonic: C
  • pitch: 0
null
0.000 30.0
  • tonic: C
  • pitch: 4
null
0.000 30.0
  • tonic: C
  • pitch: 7
null
30.00 35.0
  • tonic: G
  • pitch: 0
null
pitch_hz

Warning

Deprecated, use pitch_contour.

Pitch measurements in Hertz (Hz). Pitch (a subjective sensation) is represented as fundamental frequency (a physical quantity), a.k.a. “f0”.

time duration value confidence
[sec]
  • number

The time field represents the instantaneous time at which the pitch f0 was estimated. By convention, this (usually) represents the center time of the analysis frame. Note that this is different from pitch_midi and pitch_class, where time represents the onset time. As a consequence, the duration field is undefined and should be ignored.

The value field is a number representing the f0 in Hz. By convention, values equal to or less than zero represent silence (no pitch). Some algorithms (e.g., melody extraction algorithms that adhere to the MIREX convention) use negative f0 values to represent the algorithm’s pitch estimate for frames where it thinks there is no active pitch (e.g., no melody); this allows the independent evaluation of pitch activation detection (a.k.a. “voicing detection”) and pitch frequency estimation.

The confidence field is unconstrained.

Example

time duration value confidence
0.000 0.000 300.00 null
0.010 0.000 305.00 null
0.020 0.000 310.00 null
0.030 0.000 0.00 null
0.040 0.000 -280.00 null
0.050 0.000 -290.00 null
pitch_midi

Warning

Deprecated, use note_midi or pitch_contour.

Pitch measurements in (fractional) MIDI note number notation.

time duration value confidence
[sec] [sec] number

The value field is a number representing the pitch in MIDI notation. Numbers can be negative (for notes below C-1) or fractional.

Example

time duration value confidence
0.000 30.000 24 null
0.000 30.000 43.02 null
15.00 45.000 26 null

Segment

segment_open

Structural segmentation with an open vocabulary of segment labels.

time duration value confidence
[sec] [sec] string

The value field contains string descriptors for each segment, e.g., “verse” or “bridge”.

Example
time duration value confidence
0.000 20.000 intro null
20.00 30.000 verse null
30.00 50.000 refrain null
50.00 70.000 verse (alternate) null
segment_salami_function

Segment annotations with functional labels from the SALAMI guidelines.

time duration value confidence
[sec] [sec] string

The value field must be one of the allowable strings in the SALAMI function vocabulary.

Example
time duration value confidence
0.000 20.000 applause null
20.00 30.000 count-in null
30.00 50.000 introduction null
50.00 70.000 verse null
segment_salami_upper

Segment annotations with SALAMI’s upper-case (large) label format.

time duration value confidence
[sec] [sec] string

The value field must be a string of the following format:

  • “silence” or “Silence”
  • One or more upper-case letters, followed by zero or more apostrophes
Example
time duration value confidence
0.000 20.000 silence null
20.00 30.000 A null
30.00 50.000 B null
50.00 70.000 A’ null
segment_salami_lower

Segment annotations with SALAMI’s lower-case (small) label format.

time duration value confidence
[sec] [sec] string

The value field must be a string of the following format:

  • “silence” or “Silence”
  • One or more lower-case letters, followed by zero or more apostrophes
Example
time duration value confidence
0.000 20.000 silence null
20.00 30.000 a null
30.00 50.000 b null
50.00 70.000 a’ null
segment_tut

Segment annotations using the TUT vocabulary.

time duration value confidence
[sec] [sec] string

The value field is a string describing the function of the segment.

Example
time duration value confidence
0.000 20.000 Intro null
20.00 30.000 Verse null
30.00 50.000 bridge null
50.00 70.000 RefrainA null
multi_segment

Multi-level structural segmentations.

time duration value confidence
[sec] [sec]
  • label : string
  • level : int >= 0

In a multi-level segmentation, the track is partitioned many times — possibly recursively — which results in a collection of segmentations of varying degrees of specificity. In the multi_segment namespace, all of the resulting segments are collected together, and the level field is used to encode the segment’s corresponding partition.

Level values must be non-negative, and ordered by increasing specificity. For example, level==0 may correspond to a single segment spanning the entire track, and each subsequent level value corresponds to a more refined segmentation.

Example
time duration value confidence
0.000 60.000
  • label : A
  • level : 0
null
0.000 30.000
  • label : B
  • level : 1
null
30.00 60.000
  • label : C
  • level : 1
null
0.000 15.000
  • label : a
  • level : 2
null
15.00 30.000
  • label : b
  • level : 2
null
30.00 45.000
  • label : a
  • level : 2
null
45.00 60.000
  • label : c
  • level : 2
null

Tag

tag_cal10k

Tags from the CAL10K vocabulary.

time duration value confidence
[sec] [sec] string

The value is constrained to a set of 1053 terms, spanning mood, instrumentation, style, and genre.

Example

time duration value confidence
0.000 30.000 “bop influences” null
0.000 30.000 “bright beats” null
0.000 30.000 “hip hop roots” null
tag_cal500

Tags from the CAL500 vocabulary.

time duration value confidence
[sec] [sec] string

The value is constrained to a set of 174 terms, spanning mood, instrumentation, and genre.

Example

time duration value confidence
0.000 30.000 “Genre-Best-Rock” null
0.000 30.000 “Vocals-Monotone” null
0.000 30.000 “Usage-At_work” null
tag_gtzan

Genre classes from the GTZAN dataset.

time duration value confidence
[sec] [sec] string

The value field is constrained to one of ten strings:

  • blues
  • classical
  • country
  • disco
  • hip-hop
  • jazz
  • metal
  • pop
  • reggae
  • rock

By convention, only one tag is applied per track in this namespace. This is not enforced by the schema.

Example

time duration value confidence
0.000 30.000 “reggae” null
tag_msd_tagtraum_cd1

Genre classes from the msd tagtraum cd1 dataset.

time duration value confidence
[sec] [sec] string

The value field is constrained to one of 13 strings:

  • reggae
  • pop/rock
  • rnb
  • jazz
  • vocal
  • new age
  • latin
  • rap
  • country
  • international
  • blues
  • electronic
  • folk

By convention, one or two tags per track are possible in this namespace. The sum of the confidence values should equal 1.0. This is not enforced by the schema.

Example

time duration value confidence
0.000 0.000 “reggae” 1.0
tag_msd_tagtraum_cd2

Genre classes from the msd tagtraum cd2 dataset.

time duration value confidence
[sec] [sec] string

The value field is constrained to one of 15 strings:

  • reggae
  • latin
  • metal
  • rnb
  • jazz
  • punk
  • pop
  • new age
  • country
  • rap
  • rock
  • world
  • blues
  • electronic
  • folk

By convention, one or two tags per track are possible in this namespace. The sum of the confidence values should equal 1.0. This is not enforced by the schema.

Example

time duration value confidence
0.000 0.000 “reggae” 0.6666667
0.000 0.000 “rock” 0.3333333
tag_medleydb_instruments

MedleyDB instrument source annotations.

time duration value confidence
[sec] [sec] string

The value field is constrained to the set of instruments defined in MedleyDB instrument taxonomy.

Example

time duration value confidence
0.000 20.000 “darbuka” null
0.000 20.000 “flute section” null
0.000 20.000 “oud” null
tag_open

Open vocabulary tags. This namespace is appropriate for unconstrained tag data, such as tags from Last.fm or MagnaTagATune.

time duration value confidence
[sec] [sec] string

The value field is unconstrained, and can contain any string value.

Example

time duration value confidence
0.000 20.000 “rocking” null
0.000 20.000 “rockin’” null
0.000 20.000 “” null
0.000 20.000 “favez^^^” null
tag_audioset

Tags from the full AudioSet (v1) ontology.

time duration value confidence
[sec] [sec] string

The value field is constrained to the vocabulary of the AudioSet ontology.

Example

time duration value confidence
0.000 20.000 “Air brake” null
5.000 25.000 “Yodeling” null
9.000 35.000 “Steam whistle” null
tag_audioset_genre

Tags from the musical genre subset of the AudioSet (v1) ontology.

time duration value confidence
[sec] [sec] string

The value field is constrained to the 66 musical genres of the AudioSet-genre ontology.

Example

time duration value confidence
0.000 20.000 “Oldschool jungle” null
tag_audioset_instrument

Tags from the musical instrument subset of the AudioSet (v1) ontology.

time duration value confidence
[sec] [sec] string

The value field is constrained to the 91 musical instruments of the AudioSet-instruments ontology.

Example

time duration value confidence
0.000 20.000 “Ukulele” null
5.000 25.000 “Piano” null
9.000 35.000 “Tuning fork” null
tag_fma_genre

Tags from the Free Music Archive (FMA) 16-class genre taxonomy.

time duration value confidence
[sec] [sec] string

The value field is constrained to the 16 genres of the FMA data-set.

Example

time duration value confidence
0.000 20.000 “Blues” null
5.000 25.000 “Instrumental” null
9.000 35.000 “Soul-RnB” null
tag_fma_subgenre

Tags from the Free Music Archive (FMA) 163-class genre and sub-genre taxonomy.

time duration value confidence
[sec] [sec] string

The value field is constrained to the 163 genres and sub-genres of the FMA data-set.

Example

time duration value confidence
0.000 20.000 “Pop” null
5.000 25.000 “Power-Pop” null
9.000 35.000 “Nerdcore” null
tag_urbansound

Sound event classes from the UrbanSound dataset.

time duration value confidence
[sec] [sec] string

The value field is constrained to one of ten strings:

  • air_conditioner
  • car_horn
  • children_playing
  • dog_bark
  • drilling
  • engine_idling
  • gun_shot
  • jackhammer
  • siren
  • street_music

Example

time duration value confidence
0.000 30.000 “street_music” null

Tempo

tempo

Tempo measurements in beats per minute (BPM).

time duration value confidence
[sec] [sec] number number

The value field is a non-negative (floating point) number indicating the tempo measurement in BPM. The confidence field is a number in the range [0, 1], following the format used by MIREX [5].

[5]http://www.music-ir.org/mirex/wiki/2014:Audio_Tempo_Estimation

Example

time duration value confidence
0.00 60.00 180.0 0.8
0.00 60.00 90.0 0.2

Note

MIREX requires that tempo measurements come in pairs, and that the confidence values sum to 1. This is not enforced at the schema level.

Miscellaneous

Vector

Numerical vector data. This is useful for generic regression problems where the output is a vector of numbers.

time duration value confidence
[sec] [sec] [array of numbers]

Each observation value must be an array of at least one number. Different observations may have different length arrays, so it is up to the user to verify that arrays have the desired length.

Blob

Arbitrary data blobs.

time duration value confidence
[sec] [sec]

This namespace can be used to encode arbitrary data. The value and confidence fields have no schema constraints, and may contain any structured (but serializable) data. This can be useful for storing complex output data that does not fit any particular task schema, such as regression targets or geolocation data.

It is strongly advised that the AnnotationMetadata for blobs be as explicit as possible.

Scaper

Structured representation for soundscapes synthesized by the Scaper package.

time duration value confidence
[sec] [sec]
  • label
  • source_file
  • source_time
  • event_time
  • event_duration
  • snr
  • time_stretch
  • pitch_shift
  • role

Each value field contains a dictionary with the following keys:

  • label: a string indicating the label of the sound source
  • source_file: a full path to the original sound source (on disk)
  • source_time: a non-negative number indicating the time offset within source_file of the sound
  • event_time: the start time of the event in the synthesized soundscape
  • event_duration: a strictly positive number indicating the duration of the event
  • snr: the signal-to-noise ratio (in LUFS) of the sound compared to the background
  • time_stretch: (optional) a strictly positive number indicating the amount of time-stretch applied to the source
  • pitch_shift: (optional) the amount of pitch-shift applied to the source
  • role: one of background or foreground

Changelog

Changes

v0.3.2

  • Added schemata for tag_urbansound and scaper (PR #191)
  • Fixed a timing bug in Annotation.slice (PR #189)
  • Added display mapping for tag_open namespaces (PR #188)
  • Updated sortedcontainers dependency to avoid deprecations (PR #187)

v0.3.1

  • Improved documentation (PR #176)
  • Added Annotation.to_samples (PR #173)
  • Added schemata for FMA genre tags (PR #172)
  • Accelerated validation (PR #170)
  • Added schemata for AudioSet tags (PR #168)
  • Added jams.list_namespaces() (PR #166)

v0.3.0

  • Removed the JamsFrame class and replaced the underlying observation storage data structure (PR #149).
  • import_lab now returns only an Annotation and does not construct a JAMS object (PR #154)
  • Accelerated pitch contour sonification (PR #155)
  • Migrated all tests from nosetest to py.test (PR #157)
  • Improved repr() and added HTML rendering for JAMS objects in notebooks (PR #158)
  • Fixed a JSON serialization bug with numpy datatypes (PR #160)

v0.2.3

  • Deprecated the JamsFrame class (PR #153):
    • Moved JamsFrame.to_interval_values() to Annotation.to_interval_values()
    • Any code that uses pandas.DataFrame methods on Annotation.data will cease to work starting in 0.3.0.
  • Forward compatibility with 0.3.0 (PR #153):
  • added type safety check in regexp search (PR #146).
  • added support for pandas=0.20 (PR #150).

v0.2.2

  • added __contains__ method to JObject (PR #139).
  • Implemented JAMS.trim() method (PR #136).
  • Updates to the SALAMI tag namespaces (PR #134).
  • added infer_duration flag to import_lab (PR #125).
  • namespace conversion validates input (PR #123).
  • Refactored the pitch namespaces (PR #121).
  • Fancy indexing for annotation arrays (PR #120).
  • jams.schema.values function to access enumerated types (PR #119).
  • jams.display submodule (PR #115).
  • support for mir_eval >= 0.3 (PR #106).
  • Automatic conversion between namespaces (PR #105).
  • Fixed a type error in jams_to_lab (PR #94).
  • jams.sonify module for sonification (PR #91).

v0.2.1

New features
  • eval support for hierarchical segmentation via the multi_segment namespace (PR #79).
  • Local namespace management (PR #75).
  • Python 3.5 support (PR #73).
  • jams.search() now allows matching objects by equality (PR #71).
  • multi_segment namespace for multi-level structural segmentations. (PR #69).
  • vector namespace for numerical vector data (PR #64).
  • blob namespace for unstructured, time-keyed observation data (PR #63).
  • tag_msd_tagtraum_cd1 and tag_msd_tagtraum_cd2 namespaces for genre tags (PR #83).
Schema changes
  • Annotation objects now have time and duration fields which encode the interval over which the annotation is valid. (PR #67).
Bug fixes
  • Appending data to Annotation or JamsFrame objects now fails if time or duration are ill-specified. (PR #87).