JAMS¶
A JSON Annotated Music Specification for Reproducible MIR Research.
JAMS provides:
- A formal JSON schema for generic annotations
- The ability to store multiple annotations per file
- Schema definitions for a wide range of annotation types (beats, chords, segments, tags, etc.)
- Error detection and validation
- A translation layer to interface with mir_eval for evaluating annotations
For the most recent information, please refer to JAMS on GitHub.
Getting started¶
Creating a JAMS data structure from scratch¶
First, create the top-level JAMS container:
>>> import jams
>>> jam = jams.JAMS()
A track in JAMS must have a duration (in seconds). For this example, we’ll make up a fake number, but in reality, you would compute the track duration from the source audio.
>>> jam.file_metadata.duration = 8.0
Now we can create a beat annotation:
>>> ann = jams.Annotation(namespace='beat', time=0, duration=jam.file_metadata.duration)
>>> ann.append(time=0.33, duration=0.0, confidence=1, value=1)
Then, we’ll update the annotation’s metadata by directly setting its fields:
>>> ann.annotation_metadata = jams.AnnotationMetadata(data_source='Well paid students')
>>> ann.annotation_metadata.curator = jams.Curator(name='Rincewind',
... email='rincewind@unseen.edu')
Add our new annotation to the jam:
>>> jam.annotations.append(ann)
We can update the annotation at any time, and add a new observation:
>>> ann.append(time=0.66, duration=0.0, confidence=1, value=1)
Once you’ve added all your data, you can serialize the annotation to a string:
>>> jam.dumps(indent=2)
{
  "sandbox": {},
  "annotations": [
    {
      "data": [
        {
          "duration": 0.0,
          "confidence": 1.0,
          "value": 1.0,
          "time": 0.33
        },
        {
          "duration": 0.0,
          "confidence": 1.0,
          "value": 1.0,
          "time": 0.66
        }
      ],
      "annotation_metadata": {
        "annotation_tools": "",
        "curator": {
          "name": "Rincewind",
          "email": "rincewind@unseen.edu"
        },
        "annotator": {},
        "version": "",
        "corpus": "",
        "annotation_rules": "",
        "validation": "",
        "data_source": "Well paid students"
      },
      "namespace": "beat",
      "sandbox": {}
    }
  ],
  "file_metadata": {
    "jams_version": "0.2.0",
    "title": "",
    "identifiers": {},
    "release": "",
    "duration": 8.0,
    "artist": ""
  }
}
Or save to a file using the built-in save function:
>>> jam.save("these_are_still_my.jams")
Reading a JAMS file¶
Assuming you already have a JAMS file on-disk, say at ‘these_are_also_my.jams’, you can easily read it back into memory:
>>> another_jam = jams.load('these_are_also_my.jams')
JAMS Structure¶
This section describes the anatomy of JAMS objects.
JAMS¶
A JAMS object consists of three basic properties:

- file_metadata: describes the audio file to which these annotations are attached;
- annotations: a list of Annotation objects (described below); and
- sandbox: an unrestricted place to store any additional data.
FileMetadata¶
The file_metadata field contains the following properties:

- identifiers: an unstructured sandbox-type object for storing identifier mappings, e.g., MusicBrainz ID;
- artist, title, release: meta-data strings for the track in question;
- duration: non-negative number describing the length (in seconds) of the track; and
- jams_version: string describing the JAMS version for this file.
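Continuing the earlier example, these fields can be populated by direct attribute assignment; a brief sketch (the artist/title strings are illustrative, and the musicbrainz_id key is a hypothetical name — identifiers accepts arbitrary keys):

>>> jam.file_metadata.artist = 'The Beatles'
>>> jam.file_metadata.title = 'I Saw Her Standing There'
>>> jam.file_metadata.identifiers.musicbrainz_id = 'some-mbid'  # hypothetical key and value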
Annotation¶
Each annotation object contains the following properties:

- namespace: a string describing the type of this annotation;
- data: a list of observations, each containing:
  - time: non-negative number denoting the time of the observation (in seconds);
  - duration: non-negative number denoting the duration of the observation (in seconds);
  - value: the actual annotation (e.g., chord, segment label); and
  - confidence: certainty of the annotation;
- annotation_metadata: see Annotation_Metadata;
- sandbox: additional unstructured storage space for this annotation;
- time: optional non-negative number indicating the point at which this annotation becomes valid; and
- duration: optional non-negative number indicating the duration of the valid portion of this annotation.
The permissible contents of the value and confidence fields are defined by the namespace.
Note

The time and duration fields of an annotation are considered optional. If left blank, the annotation should be assumed to be valid for the entirety of the track.
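For example, a chord annotation covering the first four seconds of a track might be constructed as follows (a minimal sketch with made-up times and labels):

>>> ann = jams.Annotation(namespace='chord', time=0, duration=4.0)
>>> ann.append(time=0.0, duration=2.0, value='N', confidence=1.0)
>>> ann.append(time=2.0, duration=2.0, value='E', confidence=1.0)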
Annotation_Metadata¶
The meta-data associated with each annotation describes the process by which the annotation was generated.
The annotation_metadata property has the following fields:

- corpus: a string describing a corpus to which this annotation belongs;
- version: string or number, the version of this annotation;
- curator: a structured object containing contact information (name and email) for the curator of this annotation;
- annotator: a sandbox object describing the individual annotator (a person or a program) that generated this annotation;
- annotation_tools, annotation_rules, validation: strings to describe the process by which annotations were collected and pre-processed; and
- data_source: string describing the type of annotator, e.g., “program”, “expert human”, “crowdsource”.
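All of these fields can be supplied when constructing an AnnotationMetadata object; a sketch with made-up values:

>>> ann.annotation_metadata = jams.AnnotationMetadata(
...     corpus='My chord corpus',
...     version='1.0',
...     curator=jams.Curator(name='Rincewind', email='rincewind@unseen.edu'),
...     annotator={'name': 'Anonymous expert'},
...     data_source='expert human')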
Task namespaces¶
In JAMS v0.2.0, the concept of task namespaces was introduced. Broadly speaking, a namespace defines the syntax (and some semantics) of a particular type of annotation.
For example, the chord namespace requires that all observed value fields are valid strings within a pre-defined grammar. Similarly, the tempo namespace requires that value fields be non-negative numbers, and the confidence fields lie within the range [0, 1].
JAMS ships with 26 pre-defined namespaces, covering a variety of common music informatics tasks. This collection should not be assumed to be complete, however, and more namespaces may be added in subsequent versions. Please refer to Namespace definitions for a comprehensive description of the existing namespaces.
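The installed namespaces can be listed at run-time, and any annotation can be checked against its namespace schema; a minimal sketch:

>>> jams.schema.list_namespaces()   # print the available namespaces
>>> ann.validate()                  # raises a SchemaError if the annotation does not conform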
Namespace specification format¶
In this section, we’ll demonstrate how to define a task namespace, using tempo as our running example. Namespaces are defined by JSON objects that contain partial JSON schema specifications for the value and confidence fields of the Annotation, as well as additional meta-data to describe the namespace and encoding.
tempo.json is reproduced here:
 1  {"tempo":
 2      {
 3          "value": {
 4              "type": "number",
 5              "minimum": 0
 6          },
 7          "confidence": {
 8              "type": "number",
 9              "minimum": 0,
10              "maximum": 1.0
11          },
12          "dense": false,
13          "description": "Tempo measurements, in beats per minute (BPM)"
14      }
15  }
The key “tempo” at line 1 is the string with which this namespace will be identified in JAMS objects by the annotation’s namespace field. This string must be a unique identifier.
Lines 3–6 specify the valid contents of the value field for tempo annotations. In this case, values must be numeric and non-negative. Any valid JSON schema definition can be substituted here, allowing for structured observation objects. (See pattern_jku for an example of this.)
Similarly, lines 7–11 specify valid contents of the confidence field. Most namespaces do not enforce specific constraints on confidence, so this block is optional. In the case of tempo, confidence must be a numeric value in the range [0, 1].
Line 12 defines dense, a boolean specifying whether the annotation should be densely encoded during serialization. There is functionally no difference between dense and sparse encoding, but dense encoding is more space-efficient for high-frequency observations such as melody contours.
Finally, line 13 contains a brief description of the namespace and corresponding task.
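Once defined, a namespace's properties can be queried programmatically through the schema module; a brief sketch:

>>> jams.schema.is_dense('tempo')    # False: tempo annotations are sparsely encoded
>>> jams.schema.namespace('tempo')   # the full validation schema for tempo observations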
Local namespaces¶
The JAMS namespace management architecture is modular and extensible, so it is relatively straightforward to create a new namespace schema and add it to JAMS at run-time:
>>> jams.schema.add_namespace('/path/to/my/new/namespace.json')
Beginning with JAMS 0.2.1, a custom schema directory can be provided by setting the JAMS_SCHEMA_DIR environment variable prior to importing jams. This allows local customizations to be added automatically at run-time without having to manually add each schema file individually.
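For example, the variable can also be set from within Python, as long as it happens before the import; a minimal sketch (the directory path is hypothetical):

import os
os.environ['JAMS_SCHEMA_DIR'] = '/path/to/my/schemata'  # hypothetical directory of schema files

import jams  # local namespace definitions are now loaded automatically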
Example usage¶
Storing annotations¶
This section demonstrates a complete use-case of JAMS for storing estimated annotations. The example uses librosa to estimate global tempo and beat timings.
example_beat.py¶
The following script loads the librosa example audio clip, estimates the track duration, tempo, and beat timings, and constructs a JAMS object to store the estimations.
#!/usr/bin/env python

import librosa
import jams


def beat_track(infile, outfile):

    # Load the audio file
    y, sr = librosa.load(infile)

    # Compute the track duration
    track_duration = librosa.get_duration(y=y, sr=sr)

    # Extract tempo and beat estimates
    tempo, beat_frames = librosa.beat.beat_track(y=y, sr=sr)

    # Convert beat frames to time
    beat_times = librosa.frames_to_time(beat_frames, sr=sr)

    # Construct a new JAMS object and annotation records
    jam = jams.JAMS()

    # Store the track duration
    jam.file_metadata.duration = track_duration

    beat_a = jams.Annotation(namespace='beat')
    beat_a.annotation_metadata = jams.AnnotationMetadata(data_source='librosa beat tracker')

    # Add beat timings to the annotation record.
    # The beat namespace does not require value or confidence fields,
    # so we can leave those blank.
    for t in beat_times:
        beat_a.append(time=t, duration=0.0)

    # Store the new annotation in the jam
    jam.annotations.append(beat_a)

    # Add tempo estimation to the annotation.
    tempo_a = jams.Annotation(namespace='tempo', time=0, duration=track_duration)
    tempo_a.annotation_metadata = jams.AnnotationMetadata(data_source='librosa tempo estimator')

    # The tempo estimate is global, so it should start at time=0 and cover the full
    # track duration.
    # If we had a likelihood score on the estimation, it could be stored in
    # `confidence`.  Since we have no competing estimates, we'll set it to 1.0.
    tempo_a.append(time=0.0,
                   duration=track_duration,
                   value=tempo,
                   confidence=1.0)

    # Store the new annotation in the jam
    jam.annotations.append(tempo_a)

    # Save to disk
    jam.save(outfile)


if __name__ == '__main__':

    infile = librosa.util.example_audio_file()
    beat_track(infile, 'output.jams')
example_beat_output.jams¶
The above script generates the following JAMS object.
{
"sandbox": {},
"file_metadata": {
"duration": 61.45886621315193,
"title": "",
"release": "",
"identifiers": {},
"artist": "",
"jams_version": "0.2.3"
},
"annotations": [
{
"sandbox": {},
"duration": null,
"data": [
{
"value": null,
"confidence": null,
"time": 0.11609977324263039,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 0.5572789115646258,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 0.9984580498866213,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 1.4628571428571429,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 1.9272562358276644,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 2.391655328798186,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 2.8328344671201813,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 3.297233560090703,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 3.7616326530612243,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 4.2260317460317465,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 4.690430839002268,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 5.154829931972789,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 5.61922902494331,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 6.0836281179138325,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 6.524807256235827,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 6.989206349206349,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 7.453605442176871,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 7.918004535147392,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 8.382403628117913,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 8.870022675736962,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 9.311201814058958,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 9.775600907029478,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 10.24,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 10.704399092970522,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 11.145578231292516,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 11.609977324263038,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 12.07437641723356,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 12.538775510204081,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 13.003174603174603,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 13.467573696145125,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 13.931972789115646,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 14.396371882086168,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 14.837551020408164,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 15.27873015873016,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 15.74312925170068,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 16.207528344671204,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 16.671927437641724,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 17.11310657596372,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 17.600725623582765,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 18.04190476190476,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 18.52952380952381,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 18.970702947845805,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 19.435102040816325,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 19.89950113378685,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 20.36390022675737,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 20.805079365079365,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 21.292698412698414,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 21.73387755102041,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 22.221496598639455,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 22.66267573696145,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 23.127074829931974,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 23.591473922902495,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 24.055873015873015,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 24.49705215419501,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 24.961451247165535,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 25.425850340136055,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 25.913469387755104,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 26.354648526077096,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 26.81904761904762,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 27.28344671201814,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 27.74784580498866,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 28.189024943310656,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 28.65342403628118,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 29.1178231292517,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 29.60544217687075,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 30.06984126984127,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 30.53424036281179,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 30.975419501133786,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 31.43981859410431,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 31.880997732426305,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 32.36861678004535,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 32.833015873015874,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 33.29741496598639,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 33.73859410430839,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 34.202993197278914,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 34.66739229024943,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 35.131791383219955,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 35.57297052154195,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 36.060589569160996,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 36.52498866213152,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 36.989387755102044,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 37.430566893424036,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 37.89496598639456,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 38.35936507936508,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 38.8237641723356,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 39.2649433106576,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 39.75256235827664,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 40.216961451247165,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 40.68136054421769,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 41.12253968253968,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 41.586938775510205,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 42.05133786848072,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 42.515736961451246,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 42.956916099773245,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 43.44453514739229,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 43.885714285714286,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 44.373333333333335,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 44.83773242630385,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 45.302131519274376,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 45.7665306122449,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 46.20770975056689,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 46.672108843537416,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 47.13650793650794,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 47.600907029478456,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 48.06530612244898,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 48.529705215419504,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 48.99410430839002,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 49.458503401360545,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 49.92290249433107,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 50.387301587301586,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 50.85170068027211,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 51.2928798185941,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 51.757278911564626,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 52.22167800453515,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 52.68607709750567,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 53.15047619047619,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 53.614875283446715,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 54.05605442176871,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 54.52045351473923,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 54.98485260770975,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 55.44925170068027,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 55.913650793650795,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 56.37804988662131,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 56.842448979591836,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 57.30684807256236,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 57.77124716553288,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 58.2356462585034,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 58.6768253968254,
"duration": 0.0
},
{
"value": null,
"confidence": null,
"time": 59.14122448979592,
"duration": 0.0
}
],
"namespace": "beat",
"time": 0,
"annotation_metadata": {
"corpus": "",
"validation": "",
"annotation_tools": "",
"version": "",
"curator": {
"name": "",
"email": ""
},
"annotation_rules": "",
"annotator": {},
"data_source": "librosa beat tracker"
}
},
{
"sandbox": {},
"duration": 61.45886621315193,
"data": [
{
"value": 129.19921875,
"confidence": 1.0,
"time": 0.0,
"duration": 61.45886621315193
}
],
"namespace": "tempo",
"time": 0,
"annotation_metadata": {
"corpus": "",
"validation": "",
"annotation_tools": "",
"version": "",
"curator": {
"name": "",
"email": ""
},
"annotation_rules": "",
"annotator": {},
"data_source": "librosa tempo estimator"
}
}
]
}
Evaluating annotations¶
The following script illustrates how to evaluate one JAMS annotation object against another using the built-in eval submodule to wrap mir_eval.
Given two JAMS files, say, reference.jams and estimate.jams, the script first loads them as objects (j_ref and j_est, respectively). It then uses the JAMS.search method to locate all annotations of namespace "beat". If no matching annotations are found, an empty list is returned.
In this example, we are assuming that each JAMS file contains only a single annotation of interest, so the first result is taken by indexing the results at 0. (In general, you may want to use annotation_metadata to select a specific annotation from the JAMS object, if multiple are present.)
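Since search matches against nested fields, the metadata itself can serve as the query; a sketch (the data_source string must match whatever the annotation actually carries):

>>> beat_anns = j_est.search(namespace='beat')
>>> librosa_beats = j_est.search(data_source='librosa beat tracker')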
Finally, the two annotations are compared by calling jams.eval.beat, which returns an ordered dictionary of evaluation metrics for the annotations in question.
example_eval.py¶
#!/usr/bin/env python

import sys
import jams
from pprint import pprint


def compare_beats(f_ref, f_est):

    # f_ref contains the reference annotations
    j_ref = jams.load(f_ref)

    # f_est contains the estimated annotations
    j_est = jams.load(f_est)

    # Get the first reference beats
    beat_ref = j_ref.search(namespace='beat')[0]
    beat_est = j_est.search(namespace='beat')[0]

    # Get the scores
    return jams.eval.beat(beat_ref, beat_est)


if __name__ == '__main__':

    f_ref, f_est = sys.argv[1:]

    scores = compare_beats(f_ref, f_est)

    # Print them out
    pprint(dict(scores))
Data conversion¶
JAMS provides some basic functionality to help convert from flat file formats (e.g., CSV or LAB).
example_chord_import.py¶
#!/usr/bin/env python

import sys
import jams


def import_chord_jams(infile, outfile):

    # import_lab returns a new Annotation object
    # populated with the contents of the .lab file
    chords = jams.util.import_lab('chord', infile)

    # Infer the track duration from the end of the last annotation
    duration = max([obs.time + obs.duration for obs in chords])

    chords.time = 0
    chords.duration = duration

    # Create a jams object
    jam = jams.JAMS()
    jam.file_metadata.duration = duration
    jam.annotations.append(chords)

    # save to disk
    jam.save(outfile)


if __name__ == '__main__':

    infile, outfile = sys.argv[1:]
    import_chord_jams(infile, outfile)
chord_output.jams¶
Calling the above script on 01_-_I_Saw_Her_Standing_There.lab from IsoPhonics should produce the following JAMS object:
{
"annotations": [
{
"duration": 175.804082,
"data": [
{
"duration": 2.6122669999999997,
"value": "N",
"confidence": 1.0,
"time": 0.0
},
{
"duration": 8.846803000000001,
"value": "E",
"confidence": 1.0,
"time": 2.6122669999999997
},
{
"duration": 1.4628569999999996,
"value": "A",
"confidence": 1.0,
"time": 11.45907
},
{
"duration": 4.521546999999998,
"value": "E",
"confidence": 1.0,
"time": 12.921927
},
{
"duration": 2.966888000000001,
"value": "B",
"confidence": 1.0,
"time": 17.443474
},
{
"duration": 1.497686999999999,
"value": "E",
"confidence": 1.0,
"time": 20.410362
},
{
"duration": 1.4628580000000007,
"value": "E:7/3",
"confidence": 1.0,
"time": 21.908049
},
{
"duration": 1.4860770000000016,
"value": "A",
"confidence": 1.0,
"time": 23.370907
},
{
"duration": 1.486076999999998,
"value": "A:min/b3",
"confidence": 1.0,
"time": 24.856984
},
{
"duration": 1.497686999999999,
"value": "E",
"confidence": 1.0,
"time": 26.343061
},
{
"duration": 1.5092970000000037,
"value": "B",
"confidence": 1.0,
"time": 27.840747999999998
},
{
"duration": 5.955917999999997,
"value": "E",
"confidence": 1.0,
"time": 29.350045
},
{
"duration": 1.497686999999999,
"value": "A",
"confidence": 1.0,
"time": 35.305963
},
{
"duration": 4.459452000000006,
"value": "E",
"confidence": 1.0,
"time": 36.80365
},
{
"duration": 2.982543999999997,
"value": "B",
"confidence": 1.0,
"time": 41.263102
},
{
"duration": 1.474466999999997,
"value": "E",
"confidence": 1.0,
"time": 44.245646
},
{
"duration": 1.4860770000000016,
"value": "E:7/3",
"confidence": 1.0,
"time": 45.720113
},
{
"duration": 1.4860770000000016,
"value": "A",
"confidence": 1.0,
"time": 47.20619
},
{
"duration": 1.4628569999999996,
"value": "A:min/b3",
"confidence": 1.0,
"time": 48.692267
},
{
"duration": 1.497686999999999,
"value": "E",
"confidence": 1.0,
"time": 50.155124
},
{
"duration": 1.4860770000000016,
"value": "B",
"confidence": 1.0,
"time": 51.652811
},
{
"duration": 2.9721550000000008,
"value": "E",
"confidence": 1.0,
"time": 53.138888
},
{
"duration": 9.020951999999987,
"value": "A",
"confidence": 1.0,
"time": 56.111043
},
{
"duration": 3.0185940000000073,
"value": "B",
"confidence": 1.0,
"time": 65.13199499999999
},
{
"duration": 3.0418140000000022,
"value": "A",
"confidence": 1.0,
"time": 68.150589
},
{
"duration": 3.0069840000000028,
"value": "E",
"confidence": 1.0,
"time": 71.192403
},
{
"duration": 1.497686999999999,
"value": "A",
"confidence": 1.0,
"time": 74.199387
},
{
"duration": 4.539501000000001,
"value": "E",
"confidence": 1.0,
"time": 75.697074
},
{
"duration": 2.9721550000000008,
"value": "B",
"confidence": 1.0,
"time": 80.236575
},
{
"duration": 3.012962999999999,
"value": "E",
"confidence": 1.0,
"time": 83.20873
},
{
"duration": 1.5149279999999976,
"value": "A",
"confidence": 1.0,
"time": 86.221693
},
{
"duration": 1.5209070000000082,
"value": "A:min/b3",
"confidence": 1.0,
"time": 87.736621
},
{
"duration": 1.4628569999999854,
"value": "E",
"confidence": 1.0,
"time": 89.25752800000001
},
{
"duration": 1.4370680000000107,
"value": "B",
"confidence": 1.0,
"time": 90.720385
},
{
"duration": 11.949235999999999,
"value": "E",
"confidence": 1.0,
"time": 92.157453
},
{
"duration": 3.0185940000000073,
"value": "B",
"confidence": 1.0,
"time": 104.106689
},
{
"duration": 3.0534239999999926,
"value": "E",
"confidence": 1.0,
"time": 107.12528300000001
},
{
"duration": 2.9453800000000143,
"value": "A",
"confidence": 1.0,
"time": 110.178707
},
{
"duration": 1.4896309999999744,
"value": "E",
"confidence": 1.0,
"time": 113.12408700000002
},
{
"duration": 1.4860770000000088,
"value": "B",
"confidence": 1.0,
"time": 114.61371799999999
},
{
"duration": 2.845166000000006,
"value": "E",
"confidence": 1.0,
"time": 116.099795
},
{
"duration": 9.101501000000013,
"value": "A",
"confidence": 1.0,
"time": 118.944961
},
{
"duration": 3.0069839999999886,
"value": "B",
"confidence": 1.0,
"time": 128.04646200000002
},
{
"duration": 2.983764000000008,
"value": "A",
"confidence": 1.0,
"time": 131.053446
},
{
"duration": 3.006984999999986,
"value": "E",
"confidence": 1.0,
"time": 134.03721000000002
},
{
"duration": 1.4313290000000052,
"value": "A",
"confidence": 1.0,
"time": 137.044195
},
{
"duration": 4.582638999999972,
"value": "E",
"confidence": 1.0,
"time": 138.475524
},
{
"duration": 2.9837640000000363,
"value": "B",
"confidence": 1.0,
"time": 143.05816299999998
},
{
"duration": 1.5092969999999752,
"value": "E",
"confidence": 1.0,
"time": 146.04192700000002
},
{
"duration": 1.5092970000000037,
"value": "E:7/3",
"confidence": 1.0,
"time": 147.551224
},
{
"duration": 1.451246999999995,
"value": "A",
"confidence": 1.0,
"time": 149.060521
},
{
"duration": 1.5092970000000037,
"value": "A:min/b3",
"confidence": 1.0,
"time": 150.511768
},
{
"duration": 1.509297000000032,
"value": "E",
"confidence": 1.0,
"time": 152.021065
},
{
"duration": 1.5325169999999844,
"value": "B",
"confidence": 1.0,
"time": 153.53036200000003
},
{
"duration": 4.469842,
"value": "E",
"confidence": 1.0,
"time": 155.062879
},
{
"duration": 1.5325169999999844,
"value": "B",
"confidence": 1.0,
"time": 159.532721
},
{
"duration": 4.516280999999992,
"value": "E",
"confidence": 1.0,
"time": 161.065238
},
{
"duration": 1.5325170000000128,
"value": "B",
"confidence": 1.0,
"time": 165.581519
},
{
"duration": 1.5325170000000128,
"value": "A",
"confidence": 1.0,
"time": 167.114036
},
{
"duration": 1.0908560000000023,
"value": "E",
"confidence": 1.0,
"time": 168.646553
},
{
"duration": 1.9497639999999876,
"value": "E:9",
"confidence": 1.0,
"time": 169.737409
},
{
"duration": 4.116908999999993,
"value": "N",
"confidence": 1.0,
"time": 171.687173
}
],
"namespace": "chord",
"time": 0,
"annotation_metadata": {
"version": "",
"annotation_tools": "",
"annotator": {},
"curator": {
"email": "",
"name": ""
},
"data_source": "",
"corpus": "",
"annotation_rules": "",
"validation": ""
},
"sandbox": {}
}
],
"file_metadata": {
"duration": 175.804082,
"jams_version": "0.2.3",
"artist": "",
"identifiers": {},
"release": "",
"title": ""
},
"sandbox": {}
}
More examples¶
In general, converting a dataset to JAMS format will require a bit more work to ensure that value fields conform to the specified namespace schema, but the import script above should serve as a simple starting point.
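One way to catch conformance problems early is to validate each imported annotation before saving; a minimal sketch (the input path is hypothetical):

import jams

chords = jams.util.import_lab('chord', 'my_chords.lab')  # hypothetical input file
try:
    chords.validate()
except jams.SchemaError as exc:
    print('Annotation does not conform to the chord namespace:', exc)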
For further reference, a separate repository jams-data has been created to house conversion scripts for publicly available datasets. Note that development of converters is a work in progress, so proceed with caution!
API reference¶
API Reference¶
Core functionality¶
This library provides an interface for reading JAMS files into Python, or creating them programmatically.
Object reference¶
- JAMS([annotations, file_metadata, sandbox]): Top-level JAMS object.
- FileMetadata([title, artist, release, …]): Metadata for a given audio file.
- AnnotationArray([annotations]): A list subclass providing serialization and search/filtering for annotation collections.
- AnnotationMetadata([curator, version, …]): Data structure for metadata corresponding to a specific annotation.
- Curator([name, email]): Container object for curator metadata.
- Annotation(namespace[, data, …]): Annotation base class.
- Observation(time, duration, value, confidence): Core observation type: (time, duration, value, confidence).
- Sandbox(unconstrained): Functionally identical to JObject, but the class hierarchy might be confusing if all objects inherited from Sandbox.
- JObject(**kwargs): Dict-like object for JSON serialization.
Namespace management¶
- add_namespace(filename): Add a namespace definition to our working set.
- namespace(ns_key): Construct a validation schema for a given namespace.
- namespace_array(ns_key): Construct a validation schema for arrays of a given namespace.
- is_dense(ns_key): Determine whether a namespace has dense formatting.
- values(ns_key): Return the allowed values for an enumerated namespace.
- get_dtypes(ns_key): Get the dtypes associated with the value and confidence fields for a given namespace.
- list_namespaces(): Print out a listing of available namespaces.
Display¶
- display(annotation[, meta]): Visualize a jams annotation through mir_eval.
- display_multi(annotations[, fig_kw, meta]): Display multiple annotations with shared axes.
Evaluation¶
- beat(ref, est, **kwargs): Beat tracking evaluation.
- chord(ref, est, **kwargs): Chord evaluation.
- melody(ref, est, **kwargs): Melody extraction evaluation.
- onset(ref, est, **kwargs): Onset evaluation.
- segment(ref, est, **kwargs): Segment evaluation.
- tempo(ref, est, **kwargs): Tempo evaluation.
- pattern(ref, est, **kwargs): Pattern detection evaluation.
- hierarchy(ref, est, **kwargs): Multi-level segmentation evaluation.
- transcription(ref, est, **kwargs): Note transcription evaluation.
Namespace conversion¶
- convert(annotation, target_namespace): Convert a given annotation to the target namespace.
Utility functions¶
- import_lab(namespace, filename[, infer_duration]): Load a .lab file as an Annotation object.
- expand_filepaths(base_dir, rel_paths): Expand a list of relative paths to a given base directory.
- smkdirs(dpath[, mode]): Safely make a full directory path if it doesn’t exist.
- filebase(filepath): Return the extension-less basename of a file path.
- find_with_extension(in_dir, ext[, depth, sort]): Naive depth-search into a directory for files with a given extension.
Namespace definitions¶
Beat¶
beat¶
Beat event markers with optional metrical position.
time    duration    value                 confidence
[sec]   [sec]       [number] or [null]    –
Each observation corresponds to a single beat event.
The value field can be a number (positive or negative, integer or floating point), indicating the metrical position within the bar of the observed beat. If no metrical position is provided for the annotation, the value field will be null.
Example
time    duration    value    confidence
0.500   0.000       1        null
1.000   0.000       2        null
1.500   0.000       3        null
2.000   0.000       4        null
2.500   0.000       1        null
Note

duration is typically zero for beat events, but this is not enforced.

confidence is an unconstrained field for beat annotations, and may contain arbitrary data.
beat_position¶
Beat events with time signature information.
time    duration    value                                        confidence
[sec]   [sec]       {position, measure, num_beats, beat_units}   –
Each observation corresponds to a single beat event.
The value field is a structure containing the following fields:

- position: the position of the beat within the measure; can be any number greater than or equal to 1.
- measure: the index of the measure containing this beat; can be any non-negative integer.
- num_beats: the number of beats per measure; can be any strictly positive integer.
- beat_units: the note value for beats in this measure; must be one of 1, 2, 4, 8, 16, 32, 64, 128, 256.

All fields are required for each observation.
Example
time    duration    value                                                 confidence
0.500   0.000       position: 1, measure: 0, num_beats: 4, beat_units: 4  null
1.000   0.000       position: 2, measure: 0, num_beats: 4, beat_units: 4  null
1.500   0.000       position: 3, measure: 0, num_beats: 4, beat_units: 4  null
2.000   0.000       position: 4, measure: 0, num_beats: 4, beat_units: 4  null
2.500   0.000       position: 1, measure: 1, num_beats: 4, beat_units: 4  null
Note

duration is typically zero for beat events, but this is not enforced.

confidence is an unconstrained field for beat annotations, and may contain arbitrary data.

position should lie in the range [1, num_beats], but the upper bound is not enforced at the schema level.
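When building beat_position annotations in Python, the structured value is supplied as a dictionary; a minimal sketch:

>>> ann = jams.Annotation(namespace='beat_position')
>>> ann.append(time=0.5, duration=0.0,
...            value=dict(position=1, measure=0, num_beats=4, beat_units=4))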
Chord¶
chord¶
Chord annotations described by an extended version of the grammar defined by Harte, et al. [1]
time    duration    value     confidence
[sec]   [sec]       string    –
[1] Harte, Christopher, Mark B. Sandler, Samer A. Abdallah, and Emilia Gómez. “Symbolic Representation of Musical Chords: A Proposed Syntax for Text Annotations.” In ISMIR, vol. 5, pp. 66–71. 2005.
This namespace is similar to chord_harte, with the following modifications:
- Sharps and flats may not be mixed in a note symbol. For instance, A#b# is legal in chord_harte but not in chord. A### is legal in both.
- The following quality values have been added:
- sus2, 1, 5
- aug7
- 11, maj11, min11
- 13, maj13, min13
Example
time    duration    value       confidence
0.000   1.000       N           null
0.000   1.000       Bb:5        null
0.000   1.000       E:(*5)      null
0.000   1.000       E#:min9/9   null
0.000   1.000       G##:maj6    null
0.000   1.000       D:13/6      null
0.000   1.000       A:sus2      null
Note

confidence is an unconstrained field, and may contain arbitrary data.
chord_harte¶
Chord annotations described according to the grammar defined by Harte, et al. [1]
time    duration    value     confidence
[sec]   [sec]       string    –
Each observed value is a text representation of a chord annotation.

- N specifies a no-chord observation.
- Notes are annotated in the usual way: A-G, followed by optional sharps (#) and flats (b).
- Chord qualities are denoted by abbreviated strings:
  - maj, min, dim, aug
  - maj7, min7, 7, dim7, hdim7, minmaj7
  - maj6, min6
  - 9, maj9, min9
  - sus4
- Inversions are specified by a slash (/) followed by the interval number, e.g., G/3.
- Extensions are denoted in parentheses, e.g., G(b11,13). Suppressed notes are indicated with an asterisk, e.g., G(*3).

A complete description of the chord grammar is provided in [1], table 1.
Example
time    duration    value       confidence
0.000   1.000       N           null
0.000   1.000       Bb          null
0.000   1.000       E:(*5)      null
0.000   1.000       E#:min9/9   null
0.000   1.000       G#b:maj6    null
0.000   1.000       D/6         null
0.000   1.000       A:sus4      null
Note

confidence is an unconstrained field, and may contain arbitrary data.
chord_roman¶
Chord annotations in roman numeral format, as described by [2].
time    duration    value            confidence
[sec]   [sec]       {tonic, chord}   –
The value field is a structure containing the following fields:

- tonic: (string) the tonic note of the chord, e.g., A or Gb.
- chord: (string) the scale degree of the chord in roman numerals (1–7), along with inversions, extensions, and qualities.

Scale degrees are encoded with optional leading sharps and flats, e.g., V, bV, or #VII. Upper-case numerals indicate major; lower-case numerals indicate minor.

Qualities are encoded as one of the following symbols:

- o: diminished (triad)
- +: augmented (triad)
- s: suspension
- d: dominant (seventh)
- h: half-diminished (seventh)
- x: fully-diminished (seventh)

Inversions are encoded by arabic numerals, e.g., V6 for a first-inversion triad, V64 for second inversion.

Applied chords are encoded by a / followed by a roman numeral encoding of the scale degree, e.g., V7/IV.
[2] http://theory.esm.rochester.edu/rock_corpus/harmonic_analyses.html
Example

time    duration    value                  confidence
0.000   0.500       tonic: C, chord: I6    –
0.500   0.500       tonic: C, chord: bIV   –
1.000   0.500       tonic: C, chord: Vh7   –
Note

The grammar defined in [2] has been constrained to support only the quality symbols listed above.

confidence is an unconstrained field, and may contain arbitrary data.
Key¶
key_mode¶
Key and optional mode (major/minor or Greek modes)
time    duration    value     confidence
[sec]   [sec]       string    –
The value field is a string matching one of the three following patterns:

- N: no key
- Ab, A, A#, Bb, ..., G#: tonic note, upper case
- tonic:MODE, where tonic is as described above, and MODE is one of: major, minor, ionian, dorian, phrygian, lydian, mixolydian, aeolian, locrian.
Example
time    duration    value       confidence
0.000   30.0        C:minor     null
30.0    5.00        N           null
35.0    15.0        C#:dorian   null
50.0    10.0        Eb          null
60.0    10.0        A:lydian    null
Note

confidence is an unconstrained field, and may contain arbitrary data.
Lyrics¶
lyrics¶
Time-aligned lyrical annotations.
time    duration    value     confidence
[sec]   [sec]       string    –
The required value field can contain arbitrary text data, e.g., lyrics.
Example
time    duration    value                      confidence
0.500   4.000       “Row row row your boat”    null
4.500   2.000       “gently down the stream”   null
7.000   1.000       “merrily”                  null
8.000   1.000       “merrily”                  null
9.000   1.000       “merrily”                  null
10.00   1.000       “merrily”                  null
Note

confidence is an unconstrained field, and may contain arbitrary data.
lyrics_bow¶
Time-aligned bag-of-words or bag-of-ngrams.
time    duration    value    confidence
[sec]   [sec]       array    –
The required value field is an array, where each element is an array of [term, count]. The term here may be either a string (for simple bag-of-words) or an array of strings (for bag-of-ngrams).
Example
time    duration    value                                                         confidence
0.000   30.00       [[‘row’, 3], [[‘row’, ‘row’], 2], [‘your’, 1], [‘boat’, 1]]   null
Note

confidence is an unconstrained field, and may contain arbitrary data.
Mood¶
mood_thayer¶
Time-varying emotion measurements as ordered pairs of (valence, arousal)
time    duration    value                confidence
[sec]   [sec]       (valence, arousal)   –
The value field is an ordered pair of numbers measuring the valence and arousal positions in the Thayer mood model [3].
[3] Thayer, Robert E. The Biopsychology of Mood and Arousal. Oxford University Press, 1989.
Example
time    duration    value         confidence
0.500   0.250       (-0.5, 1.0)   null
0.750   0.250       (-0.3, 0.6)   null
1.000   0.750       (-0.1, 0.1)   null
1.750   0.500       (0.3, -0.5)   null
Note

confidence is an unconstrained field, and may contain arbitrary data.
Onset¶
onset¶
Note onset event markers.
time    duration    value    confidence
[sec]   [sec]       –        –
This namespace can be used to encode timing of arbitrary instantaneous events. Most commonly, this is applied to note onsets.
Example
time    duration    value    confidence
0.500   0.000       null     null
1.000   0.000       null     null
1.500   0.000       null     null
2.000   0.000       null     null
Note

duration is typically zero for instantaneous events, but this is not enforced.

The value and confidence fields are unconstrained, and may contain arbitrary data.
Pattern¶
pattern_jku¶
Each note of the pattern contains (pattern_id, midi_pitch, occurrence_id, morph_pitch, staff), following the format described in [4].
time    duration    value                                                          confidence
[sec]   [sec]       {pattern_id, midi_pitch, occurrence_id, morph_pitch, staff}    –
[4] Collins, T. Discovery of Repeated Themes & Sections, Music Information Retrieval Evaluation eXchange (MIReX), 2013 (accessed July 7, 2015).
Each value field contains a dictionary with the following keys:

- pattern_id: the integer that identifies the current pattern, starting from 1.
- midi_pitch: the float representing the MIDI pitch.
- occurrence_id: the integer that identifies the current occurrence, starting from 1.
- morph_pitch: the float representing the morphological pitch.
- staff: the integer representing the staff where the current note of the pattern is found, starting from 0.
Example
time    duration    value                                                                        confidence
62.86   0.09        pattern_id: 1, midi_pitch: 50, occurrence_id: 1, morph_pitch: 54, staff: 1   null
62.86   0.36        pattern_id: 1, midi_pitch: 77, occurrence_id: 1, morph_pitch: 70, staff: 0   null
36.34   0.09        pattern_id: 1, midi_pitch: 71, occurrence_id: 2, morph_pitch: 66, staff: 0   null
36.43   0.36        pattern_id: 1, midi_pitch: 69, occurrence_id: 2, morph_pitch: 65, staff: 0   null
Pitch¶
pitch_contour¶
Pitch contours in the format (index, frequency, voicing).

time    duration    value                        confidence
[sec]   [–]         {index, frequency, voiced}   –
Each value field is a structure containing a contour index (an integer indicating which contour the observation belongs to), a frequency value in Hz, and a boolean voiced flag indicating whether the value is voiced. The confidence field is unconstrained.
Example
time     duration    value                                       confidence
0.0000   0.0000      index: 0, frequency: 442.1, voiced: True    null
0.0058   0.0000      index: 0, frequency: 457.8, voiced: False   null
2.5490   0.0000      index: 1, frequency: 89.4, voiced: True     null
2.5548   0.0000      index: 1, frequency: 90.0, voiced: True     null
note_hz¶
Note events with (non-negative) frequencies measured in Hz.
time    duration    value     confidence
[sec]   [sec]       number    –

Each value field gives the frequency of the note in Hz.

Example

time    duration    value    confidence
12.34   0.287       189.9    null
2.896   3.000       74.0     null
10.12   0.500       440.0    null
note_midi¶
Note events with pitches measured in (fractional) MIDI note numbers.
time    duration    value     confidence
[sec]   [sec]       number    –

Each value field gives the pitch of the note in MIDI note numbers.

Example

time    duration    value    confidence
12.34   0.287       52.0     null
2.896   3.000       20.7     null
10.12   0.500       42.0     null
pitch_class¶
Pitch measurements in (tonic, pitch class) format.

time    duration    value                  confidence
[sec]   [sec]       {tonic, pitch class}   –
Each value field is a structure containing a tonic (a note string, e.g., “A#” or “D”) and a pitch class pitch as an integer scale degree. The confidence field is unconstrained.
Example
time    duration    value                confidence
0.000   30.0        tonic: C, pitch: 0   null
0.000   30.0        tonic: C, pitch: 4   null
0.000   30.0        tonic: C, pitch: 7   null
30.00   35.0        tonic: G, pitch: 0   null
pitch_hz¶
Warning

Deprecated; use pitch_contour.
Pitch measurements in Hertz (Hz). Pitch (a subjective sensation) is represented as fundamental frequency (a physical quantity), a.k.a. “f0”.
time    duration    value     confidence
[sec]   –           number    –
The time field represents the instantaneous time at which the pitch f0 was estimated. By convention, this (usually) represents the center time of the analysis frame. Note that this is different from pitch_midi and pitch_class, where time represents the onset time. As a consequence, the duration field is undefined and should be ignored.

The value field is a number representing the f0 in Hz. By convention, values that are equal to or less than zero are used to represent silence (no pitch). Some algorithms (e.g., melody extraction algorithms that adhere to the MIREX convention) use negative f0 values to represent the algorithm’s pitch estimate for frames where it thinks there is no active pitch (e.g., no melody), to allow the independent evaluation of pitch activation detection (a.k.a. “voicing detection”) and pitch frequency estimation. The confidence field is unconstrained.
Example
time    duration    value     confidence
0.000   0.000       300.00    null
0.010   0.000       305.00    null
0.020   0.000       310.00    null
0.030   0.000       0.00      null
0.040   0.000       -280.00   null
0.050   0.000       -290.00   null
pitch_midi¶
Warning

Deprecated; use note_midi or pitch_contour.
Pitch measurements in (fractional) MIDI note number notation.
time    duration    value     confidence
[sec]   [sec]       number    –

The value field is a number representing the pitch in MIDI notation. Numbers can be negative (for notes below C-1) or fractional.
Example
time    duration    value    confidence
0.000   30.000      24       null
0.000   30.000      43.02    null
15.00   45.000      26       null
Segment¶
segment_open¶
Structural segmentation with an open vocabulary of segment labels.
time    duration    value     confidence
[sec]   [sec]       string    –

The value field contains string descriptors for each segment, e.g., “verse” or “bridge”.
Example

time    duration    value               confidence
0.000   20.000      intro               null
20.00   30.000      verse               null
30.00   50.000      refrain             null
50.00   70.000      verse (alternate)   null
segment_salami_function¶
Segment annotations with functional labels from the SALAMI guidelines.
time    duration    value     confidence
[sec]   [sec]       string    –

The value field must be one of the allowable strings in the SALAMI function vocabulary.
Example

time    duration    value          confidence
0.000   20.000      applause       null
20.00   30.000      count-in       null
30.00   50.000      introduction   null
50.00   70.000      verse          null
segment_salami_upper¶
Segment annotations with SALAMI’s upper-case (large) label format.
time    duration    value     confidence
[sec]   [sec]       string    –

The value field must be a string of the following format:

- “silence” or “Silence”
- One or more upper-case letters, followed by zero or more apostrophes
Example

time    duration    value     confidence
0.000   20.000      silence   null
20.00   30.000      A         null
30.00   50.000      B         null
50.00   70.000      A’        null
segment_salami_lower¶
Segment annotations with SALAMI’s lower-case (small) label format.
time    duration    value     confidence
[sec]   [sec]       string    –

The value field must be a string of the following format:

- “silence” or “Silence”
- One or more lower-case letters, followed by zero or more apostrophes
Example

time    duration    value     confidence
0.000   20.000      silence   null
20.00   30.000      a         null
30.00   50.000      b         null
50.00   70.000      a’        null
segment_tut¶
Segment annotations using the TUT vocabulary.
time    duration    value     confidence
[sec]   [sec]       string    –

The value field is a string describing the function of the segment.
Example

time    duration    value      confidence
0.000   20.000      Intro      null
20.00   30.000      Verse      null
30.00   50.000      bridge     null
50.00   70.000      RefrainA   null
multi_segment¶
Multi-level structural segmentations.
time    duration    value                                  confidence
[sec]   [sec]       {label : string, level : int >= 0}     –

In a multi-level segmentation, the track is partitioned many times (possibly recursively), which results in a collection of segmentations of varying degrees of specificity. In the multi_segment namespace, all of the resulting segments are collected together, and the level field is used to encode the segment’s corresponding partition.

Level values must be non-negative, and ordered by increasing specificity. For example, level==0 may correspond to a single segment spanning the entire track, and each subsequent level value corresponds to a more refined segmentation.
Example

time    duration    value                confidence
0.000   60.000      label: A, level: 0   null
0.000   30.000      label: B, level: 1   null
30.00   60.000      label: C, level: 1   null
0.000   15.000      label: a, level: 2   null
15.00   30.000      label: b, level: 2   null
30.00   45.000      label: a, level: 2   null
45.00   60.000      label: c, level: 2   null
Tag¶
tag_cal10k¶
Tags from the CAL10K vocabulary.
time    duration    value     confidence
[sec]   [sec]       string    –

The value field is constrained to a set of 1053 terms, spanning mood, instrumentation, style, and genre.

Example

time    duration    value              confidence
0.000   30.000      “bop influences”   null
0.000   30.000      “bright beats”     null
0.000   30.000      “hip hop roots”    null
tag_cal500¶
Tags from the CAL500 vocabulary.
time    duration    value     confidence
[sec]   [sec]       string    –

The value field is constrained to a set of 174 terms, spanning mood, instrumentation, and genre.

Example

time    duration    value               confidence
0.000   30.000      “Genre-Best-Rock”   null
0.000   30.000      “Vocals-Monotone”   null
0.000   30.000      “Usage-At_work”     null
tag_gtzan¶
Genre classes from the GTZAN dataset.
time    duration    value     confidence
[sec]   [sec]       string    –

The value field is constrained to one of ten strings:
blues
classical
country
disco
hip-hop
jazz
metal
pop
reggae
rock
By convention, only one tag is applied per track in this namespace. This is not enforced by the schema.
Example
time    duration    value      confidence
0.000   30.000      “reggae”   null
tag_msd_tagtraum_cd1¶
Genre classes from the msd tagtraum cd1 dataset.
time    duration    value     confidence
[sec]   [sec]       string    –

The value field is constrained to one of 13 strings:
reggae
pop/rock
rnb
jazz
vocal
new age
latin
rap
country
international
blues
electronic
folk
By convention, one or two tags per track are possible in this namespace, and the sum of the confidence values should equal 1.0. This is not enforced by the schema.
Example
time    duration    value      confidence
0.000   0.000       “reggae”   1.0
tag_msd_tagtraum_cd2¶
Genre classes from the msd tagtraum cd2 dataset.
time    duration    value     confidence
[sec]   [sec]       string    –

The value field is constrained to one of 15 strings:
reggae
latin
metal
rnb
jazz
punk
pop
new age
country
rap
rock
world
blues
electronic
folk
By convention, one or two tags per track are possible in this namespace, and the sum of the confidence values should equal 1.0. This is not enforced by the schema.
Example
time    duration    value      confidence
0.000   0.000       “reggae”   0.6666667
0.000   0.000       “rock”     0.3333333
tag_medleydb_instruments¶
MedleyDB instrument source annotations.
time    duration    value     confidence
[sec]   [sec]       string    –

The value field is constrained to the set of instruments defined in the MedleyDB instrument taxonomy.

Example

time    duration    value             confidence
0.000   20.000      “darbuka”         null
0.000   20.000      “flute section”   null
0.000   20.000      “oud”             null
tag_open¶
Open vocabulary tags. This namespace is appropriate for unconstrained tag data, such as tags from Last.FM or Magnatagatune.
time   duration  value   confidence
[sec]  [sec]     string  –
The value field is unconstrained, and can contain any string value.
Example
time   duration  value       confidence
0.000  20.000    "rocking"   null
0.000  20.000    "rockin'"   null
0.000  20.000    ""          null
0.000  20.000    "favez^^^"  null
tag_audioset¶
Tags from the full AudioSet (v1) ontology.
time   duration  value   confidence
[sec]  [sec]     string  –
The value field is constrained to the vocabulary of the AudioSet ontology.
Example
time   duration  value            confidence
0.000  20.000    "Air brake"      null
5.000  25.000    "Yodeling"       null
9.000  35.000    "Steam whistle"  null
tag_audioset_genre¶
Tags from the musical genre subset of the AudioSet (v1) ontology.
time   duration  value   confidence
[sec]  [sec]     string  –
The value field is constrained to the 66 musical genres of the AudioSet-genre ontology.
Example
time   duration  value               confidence
0.000  20.000    "Oldschool jungle"  null
tag_audioset_instrument¶
Tags from the musical instrument subset of the AudioSet (v1) ontology.
time   duration  value   confidence
[sec]  [sec]     string  –
The value field is constrained to the 91 musical instruments of the AudioSet-instruments ontology.
Example
time   duration  value          confidence
0.000  20.000    "Ukulele"      null
5.000  25.000    "Piano"        null
9.000  35.000    "Tuning fork"  null
tag_fma_genre¶
Tags from the Free Music Archive (FMA) 16-class genre taxonomy.
time   duration  value   confidence
[sec]  [sec]     string  –
The value field is constrained to the 16 genres of the FMA dataset.
Example
time   duration  value           confidence
0.000  20.000    "Blues"         null
5.000  25.000    "Instrumental"  null
9.000  35.000    "Soul-RnB"      null
tag_fma_subgenre¶
Tags from the Free Music Archive (FMA) 163-class genre and sub-genre taxonomy.
time   duration  value   confidence
[sec]  [sec]     string  –
The value field is constrained to the 163 genres and sub-genres of the FMA dataset.
Example
time   duration  value        confidence
0.000  20.000    "Pop"        null
5.000  25.000    "Power-Pop"  null
9.000  35.000    "Nerdcore"   null
tag_urbansound¶
Sound classes from the UrbanSound dataset.
time   duration  value   confidence
[sec]  [sec]     string  –
The value field is constrained to one of ten strings:
- air_conditioner
- car_horn
- children_playing
- dog_bark
- drilling
- engine_idling
- gun_shot
- jackhammer
- siren
- street_music
Example
time   duration  value           confidence
0.000  30.000    "street_music"  null
Tempo¶
tempo¶
Tempo measurements in beats per minute (BPM).
time   duration  value   confidence
[sec]  [sec]     number  number
The value field is a non-negative (floating point) number indicating the tempo measurement in BPM.
The confidence field is a number in the range [0, 1], following the format used by MIREX [5].
[5] http://www.music-ir.org/mirex/wiki/2014:Audio_Tempo_Estimation
Example
time  duration  value  confidence
0.00  60.00     180.0  0.8
0.00  60.00     90.0   0.2
Note
MIREX requires that tempo measurements come in pairs, and that the confidence values sum to 1. This is not enforced at the schema level.
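A minimal sketch of the paired-measurement convention, using the values from the example above:
>>> import jams
>>> ann = jams.Annotation(namespace='tempo', time=0, duration=60.0)
>>> # Primary and secondary tempo estimates; confidences sum to 1.0
>>> ann.append(time=0.0, duration=60.0, value=180.0, confidence=0.8)
>>> ann.append(time=0.0, duration=60.0, value=90.0, confidence=0.2)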
Miscellaneous¶
Vector¶
Numerical vector data. This is useful for generic regression problems where the output is a vector of numbers.
time   duration  value               confidence
[sec]  [sec]     [array of numbers]  –
Each observation value must be an array of at least one number. Different observations may have arrays of different lengths, so it is up to the user to verify that arrays have the desired length.
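A minimal sketch, with made-up numbers standing in for a real feature or regression target:
>>> import jams
>>> ann = jams.Annotation(namespace='vector', time=0, duration=30.0)
>>> # Each value is an array of numbers; lengths may vary between observations
>>> ann.append(time=0.0, duration=30.0, value=[0.0, 1.5, -2.3])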
Blob¶
Arbitrary data blobs.
time   duration  value  confidence
[sec]  [sec]     –      –
This namespace can be used to encode arbitrary data. The value and confidence fields have no schema constraints, and may contain any structured (but serializable) data. This can be useful for storing complex output data that does not fit any particular task schema, such as regression targets or geolocation data.
It is strongly advised that the AnnotationMetadata for blobs be as explicit as possible.
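A minimal sketch, storing a hypothetical geolocation payload (the keys here are invented for illustration; any JSON-serializable structure is allowed):
>>> import jams
>>> ann = jams.Annotation(namespace='blob', time=0, duration=10.0)
>>> # The value is unconstrained by the schema, but must serialize to JSON
>>> ann.append(time=0.0, duration=10.0,
...            value={'latitude': 40.73, 'longitude': -73.99})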
Scaper¶
Structured representation for soundscapes synthesized by the Scaper package.
time   duration  value              confidence
[sec]  [sec]     dict (keys below)  –
Each value field contains a dictionary with the following keys (a sketch of one such observation follows the list):
- label: a string indicating the label of the sound source
- source_file: a full path to the original sound source (on disk)
- source_time: a non-negative number indicating the time offset within source_file of the sound
- event_time: the start time of the event in the synthesized soundscape
- event_duration: a strictly positive number indicating the duration of the event
- snr: the signal-to-noise ratio (in LUFS) of the sound compared to the background
- time_stretch: (optional) a strictly positive number indicating the amount of time-stretch applied to the source
- pitch_shift: (optional) the amount of pitch-shift applied to the source
- role: one of background or foreground
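A minimal sketch of a single foreground event; the file path and parameter values are invented for illustration:
>>> import jams
>>> ann = jams.Annotation(namespace='scaper', time=0, duration=10.0)
>>> # One synthesized event; optional keys may be omitted
>>> ann.append(time=2.0, duration=3.0,
...            value={'label': 'siren',
...                   'source_file': '/audio/siren.wav',
...                   'source_time': 0.0,
...                   'event_time': 2.0,
...                   'event_duration': 3.0,
...                   'snr': 6.0,
...                   'role': 'foreground'})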
Changelog¶
v0.3.4¶
v0.3.2¶
v0.3.1¶
v0.3.0¶
- Removed the JamsFrame class and replaced the underlying observation storage data structure (PR #149).
- import_lab now returns only an Annotation and does not construct a JAMS object (PR #154).
- Accelerated pitch contour sonification (PR #155).
- Migrated all tests from nosetest to py.test (PR #157).
- Improved repr() and added HTML rendering for JAMS objects in notebooks (PR #158).
- Fixed a JSON serialization bug with numpy datatypes (PR #160).
v0.2.3¶
- Deprecated the JamsFrame class (PR #153):
  - Moved JamsFrame.to_interval_values() to Annotation.to_interval_values()
  - Any code that uses pandas.DataFrame methods on Annotation.data will cease to work starting in 0.3.0.
- Forward compatibility with 0.3.0 (PR #153):
  - Added the jams.Observation type
  - Added iteration support to Annotation objects
- Added a type safety check in regexp search (PR #146).
- Added support for pandas=0.20 (PR #150).
v0.2.2¶
- Added the __contains__ method to JObject (PR #139).
- Implemented the JAMS.trim() method (PR #136).
- Updates to the SALAMI tag namespaces (PR #134).
- Added an infer_duration flag to import_lab (PR #125).
- Namespace conversion validates input (PR #123).
- Refactored the pitch namespaces (PR #121).
- Fancy indexing for annotation arrays (PR #120).
- Added the jams.schema.values function to access enumerated types (PR #119).
- Added the jams.display submodule (PR #115).
- Support for mir_eval >= 0.3 (PR #106).
- Automatic conversion between namespaces (PR #105).
- Fixed a type error in jams_to_lab (PR #94).
- Added the jams.sonify module for sonification (PR #91).
v0.2.1¶
- New features
  - eval support for hierarchical segmentation via the multi_segment namespace (PR #79).
  - Local namespace management (PR #75).
  - Python 3.5 support (PR #73).
  - jams.search() now allows matching objects by equality (PR #71).
  - multi_segment namespace for multi-level structural segmentations (PR #69).
  - vector namespace for numerical vector data (PR #64).
  - blob namespace for unstructured, time-keyed observation data (PR #63).
  - tag_msd_tagtraum_cd1 and tag_msd_tagtraum_cd2 namespaces for genre tags (PR #83).
- Schema changes
  - Annotation objects now have time and duration fields which encode the interval over which the annotation is valid (PR #67).
- Bug fixes
  - Appending data to Annotation or JamsFrame objects now fails if time or duration are ill-specified (PR #87).