yt Enhancement Proposals¶
This is a repository of “yt Enhancement Proposals” (YTEPs). Because yt is relied upon for production-level science, large-scale design decisions are to be described and discussed before being acted upon. YTEP documents are set up to provide minimal overhead to discussion, while still allowing a clear and evolving specification.
YTEPs go through several stages in their lifetime:
- Initial writing of the YTEP
- Announcement and/or discussion on the yt-dev mailing list (may include implementation)
- Implementation and integration, or possibly declination (this includes pull requests)
YTEPs are useful for any code development that affects how others build tools or infrastructure, and necessary for breaking backwards compatibility. They are not necessary for most new developments or any bug fixes.
YTEP-0000: Project Governance¶
Abstract¶
Created: August 24, 2014 Author: Britton Smith Modified: August 09, 2019 Author: Madicken Munk
This document describes the high-level structure, policies, procedures, and processes of the yt project.
Status¶
Completed
Project Management Links¶
- Apache Software Foundation
- Initial governance discussion on yt-dev: here.
- Secondary discussion about need to update governance from yt-exec: here.
Detailed Description¶
Motivation¶
The yt project consists of a number of repositories within the yt project organization. The organization itself has a number of people interacting with and contributing to these associated repositories. Here we propose a broad overview of the governance model of the project, which will be detailed in a specific governance repository within the yt project, at: https://github.com/yt-project/governance. Major changes to the governance model will be iterated upon here, and details about the model will happen in the governance repository. This will allow small changes within the governance documentation to move quickly and not need to go through a major vote to update the YTEP.
Structure¶
The governance document will do the following:
- Define where the governance documents apply, and how to override them, if relevant
- Provide guidelines on project licensing
- Link to the Code of Conduct, state what happens if a violation occurs, and what avenues are available for reporting violations
- State how conflicts of interest are handled among voting members of the community
- State who the voting members of the community are
- Outline and define the roles within the project, including: contributors, developers, reviewers, and maintainers
- Define how to become a project member, what expectations exist for project members, how to become an emeritus member of the project, and how to revoke project membership
- Define how to become a steering committee member, what expectations exist for steering committee members, how members are voted onto the steering committee, and how long membership on the steering committee lasts
- Create a project leadership structure that facilitates project sustainability, inclusive onboarding practices, and mentorship to learn and understand packages/subpackages within the yt project.
- Make clear guidelines on how voting occurs for changes in the project, including:
- minor documentation changes
- code changes and major documentation changes
- changes to API principles and changes to dependencies or supported versions
- changes to the project governance
- project membership.
- State when project meetings happen, at what frequency that they occur, how they are announced to the community, and where they are documented.
Backwards Compatibility¶
Sic semper inordinatio.
Alternatives¶
The alternative is to continue with no official guidelines and somehow manage, or to continue with an older version of the governance model.
YTEP-0001: IO Chunking¶
Abstract¶
Created: November 26, 2012 Author: Matthew Turk
IO in yt 2.x has always been based on batching IO based on grids. This YTEP describes a new method, which allows for a selection of keywords (‘spatial’, ‘all’, ‘io’) to describe methods of IO that are then left to the frontend or geometry handler to implement. This way, the frontend is able to decide how to access data without any prescriptions on how it should be accessed.
Project Management Links¶
Detailed Description¶
Background¶
“Chunking” in this section refers to the loading of data off disk in bulk. For traditional frontends in yt, this has been in the form of grids: either singly or in bulk, grids have been loaded off disk. When Derived Quantities want to handle individual grids, one at a time, they “preload” the data from whatever grids the ParallelAnalysisInterface thinks they deserve. These grids are iterated over and handled individually, then the result is combined at the end. Profiles do something similar. However, both of these approaches are de facto rather than designed: they rely on calls to semi-private functions on data objects, manual masking of data, and so on.
An explicit method of data chunking that relies on the characteristics of the desired chunks, rather than the means of the chunking, is needed to bypass this reliance on the grid mediation of IO. In this method, data objects will request that the geometry handler supply a set of chunks. Chunks are of the form (IO_unit, size), where IO_unit is only ever managed or handled by _read_selection. This allows the information about all types of IO and collections of data to live internal to the individual implementations of GeometryHandler objects. This way, Grids can still batch based on Grid information, but this abstraction is not needed for Octree IO.
Note that YTEP-0017 redefines GeometryHandler to Index – this reflects the fact that the process of data selection and IO is better thought of as a process of indexing, and that any subsequent operations should be conducted at a higher level.
Main Changes¶
- Data objects no longer have a _grids attribute.
- Parallelism is restructured to iterate over chunks (decided on by the geometry handler) rather than grids.
- Grids do not exist outside of the grid geometry handler.
- To specifically break backwards compatibility, a blocks property has been added which will iterate and yield block-like data (i.e., grids) and the mask of the blocks. This should encompass the use case of both iterating over the _grids attribute, obtaining the mask that selects points inside a region, and having a 3D dataset. This should be used exceedingly rarely, but it will be implemented for all types of data. All routines that call upon _grids directly must be updated to use blocks.
Implementation¶
The chunking system is implemented in a geometry handler through several functions. The GeometryHandler class needs to have the following routines implemented:
- _identify_base_chunk(self, dobj): this routine must set the _current_chunk attribute on dobj to be equal to a chunk that represents the full selection of data for that data object. This is the “base” chunk from which other chunks will be subselected.
- _count_selection(self, dobj, sub_objects): this must count and return the count of cells within a given data object.
- _chunk_io(self, dobj): this function should yield a series of YTDataChunk objects that have been ordered and created to consolidate IO.
- _chunk_spatial(self, dobj, ngz, sort = None, preload_fields = None): this should yield a series of YTDataChunk objects which have been created to allow for spatial access of the data. For grids, this means 3D objects, and for Octs the behavior is undefined but should be 3D or possibly a string of 3D objects. This is where ghost zone generation will occur, although that has not yet been implemented. Optionally, the chunk request can also provide a “hint” to the chunking system of which fields will be necessary. This is discussed below.
- _chunk_all(self, dobj): this should yield a single chunk that contains the entire data object.
The only place that YTDataChunk objects will ever be directly queried is inside the _read_fluid_selection and _read_particle_selection routines, which are implemented by the geometry handler itself. This means that the chunks can be completely opaque external to the geometry handlers.
To start the chunks shuffling over the output, the code calls data_source.chunks(fields, chunking_style). Right now only “spatial”, “io” and “all” are supported for chunking styles. This corresponds to spatially-oriented division, IO-conserving, and all-at-once (not usually relevant). The chunks function looks like this:
def chunks(self, fields, chunking_style, **kwargs):
    for chunk in self.hierarchy._chunk(self, chunking_style, **kwargs):
        with self._chunked_read(chunk):
            self.get_data(fields)
            yield self
Note what it does here – it actually yields itself. However, inside the chunked_read function, what happens is that the attributes corresponding to the size, the current data source, and so on, are set by the geometry handler (still called a hierarchy here.) So, for instance, execution might look like this:
for ds in my_obj.chunks(["Density"], "spatial"):
    print ds is my_obj
    print ds["Density"].size
The first line will actually print True, but the results from the second one will be the size of (for instance) the grid it’s currently iterating over. In this way, it becomes much easier to stride over subsets of data. Derived quantities now look like this:
chunks = self._data_source.chunks([], chunking_style="io")
for ds in parallel_objects(chunks, -1):
    rv = self.func(ds, *args, **kwargs)
It chunks data off disk, evaluates and then stores intermediate results.
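The same pattern can be used at the user level to accumulate a reduction one chunk at a time, so that only a single chunk's worth of data is ever resident in memory. The sketch below is illustrative only (the field names are simply yt 2.x-era names used elsewhere in this document), not a prescribed API beyond the chunks call already shown:
total_mass = 0.0
weighted_temp = 0.0
for ds in my_obj.chunks(["CellMass", "Temperature"], "io"):
    # Each pass through the loop only holds one IO chunk's data.
    m = ds["CellMass"]
    total_mass += m.sum()
    weighted_temp += (m * ds["Temperature"]).sum()
print weighted_temp / total_mass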
This is not meant to replace spatial decomposition in parallel jobs, but it is designed to enable much easier and mesh-neutral division of labor for parallelism and for IO. If we were to call chunk on an octree, it no longer has to make things look like grids; it just makes them look like flattened arrays (unless you chunk over spatial, which I haven’t gotten into yet.)
Essentially, by making the method of subsetting and striding over subsetted data more compartmentalized, the code becomes more clear and more maintainable.
Field Preloading¶
A common problem with the current chunking system is the problem of preloading for data access for spatial fields. For instance, inside the field generation system, this construction is used:
for io_chunk in self.chunks([], "io"):
    for i, chunk in enumerate(self.chunks(field, "spatial", ngz = 0)):
At this point in the system, a single field is being generated and all of the dependencies for that field can be calculated using _identify_field_dependencies, but this is not done. The chunking will first break into IO chunks, and then iterate over each of those chunks spatially. This results in IO not being conducted on the IO chunks, but instead on each individual spatial chunk. For octree datasets, this is not typically that bad, as a spatial chunk there can consist of many items. However, for patch-based datasets (particularly Enzo and the current FLASH implementation) this results in far more fine-grained IO access than we want. As an example, this would not allow any batching of IO inside HDF5 files, even though the access to the spatial data is already ordered appropriately. When depositing particles in Enzo, for instance, this results in a single access to every single grid for each particle deposition operation.
For non-spatial fields, IO chunking is typically quite effective and appropriate for patch datasets.
To remedy this, we need to construct a language for preloading within an IO chunk. This would necessitate the creation of a _field_cache attribute on DataContainer, which would be populated inside the _chunk_io loop, if hinting is available. _read_fluid_fields and _read_particle_fields would then inspect the chunk they are passed, and for any fields that are requested, if they are inside the _field_cache dict (or dict subclass) those values would be returned. This is managed by the _activate_cache method.
This would change the loop above to look something like this:
field_deps = self._identify_dependencies(field)
for io_chunk in self.chunks([], "io"):
    for i, chunk in enumerate(self.chunks(field, "spatial", ngz = 0,
                                          preload_fields = field_deps)):
This should result in much more efficient IO, as reads for spatial fields can be consolidated. As they are currently implemented, Octrees would likely not need this improvement, and so it will not be implemented for them initially. However, all frontends may ultimately benefit from this, as it could trivially be extended to keep all data resident in memory for situations where many passes over a small amount of data are necessary.
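A minimal sketch of how the hinting might flow through _chunk_io is given below. This is illustrative only: _io_ordered_subsets and _read_preloaded are hypothetical helper names, and the YTDataChunk constructor arguments are assumptions rather than the actual signature.
def _chunk_io(self, dobj, preload_fields=None):
    # Walk the object's constituent subsets in an IO-friendly order
    # (hypothetical helper standing in for the real grouping logic).
    for obj_set in self._io_ordered_subsets(dobj):
        chunk = YTDataChunk(dobj, "io", obj_set)
        if preload_fields:
            # Read all hinted fields once for the whole IO chunk and stash
            # them; _read_fluid_fields would consult this cache before
            # touching disk again for the spatial sub-chunks.
            chunk._field_cache = self._read_preloaded(obj_set, preload_fields)
        yield chunk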
Backwards Compatibility¶
This system changes how data objects access data, and so this may ultimately result in differences in results (due to floating point error). Additionally, any code that relies on access of the _grids attribute on data objects will be broken.
All Octree code will need to be updated for 3.0. All frontends for grids will need to be updated, as this requires somewhat different IO systems to be in place. Updating the grid patch handling will require minimal code change.
Ghost zones have been implemented, but will require further study to ensure that the results are correctly being calculated. Ghost zone-requiring fields are progressing.
To accommodate situations where data objects or processing routines (not derived fields) require information about the shape, connectivity and masking of data, a blocks attribute has been implemented. This attribute will yield masks of data and 3D-shaped data containers, enabling most old grids-using routines to work. By focusing on blocks of data rather than grids, we emphasize that these may be of any size, and may also be generated rather than code-inherent data.
Alternatives¶
The main alternative for this would be to grid all data, as is done in 2.x. I believe this is not sustainable.
YTEP-0002: Profile Plotter¶
Abstract¶
Created: December 5, 2012 Author: Matthew Turk
This YTEP describes a profile plotting solution, in the style of the PlotWindow, to replace the functionality of the methods on the PlotCollection that create profiles. It should have sane defaults, a restricted set of functionality, and should make accessing the underlying matplotlib axes object very easy.
The method proposed needs to meet several competing needs. It should accept objects, it should “do the right thing” for auto-creating profiles, and it should provide access to the underlying matplotlib objects.
Status¶
Completed
The code can be seen in yt/visualization/profile_plotter.py, specifically the objects:
- PhasePlotter
- ProfilePlotter
Project Management Links¶
Detailed Description¶
Motivation¶
The PlotCollection provides an easy way to quickly make profiles and phase plots, both from spheres that it will create and from objects. However, it suffers from a number of deficiencies:
- Accessing the axes objects from matplotlib is non-trivial
- Plotting multiple profiles on the same axes object is non-trivial
- The profile plotter tries to do too many things, and in doing so does not do anything particularly well
- Profile plots can’t be pickled independently of the data
As such, by including this new approach, we will match the functionality from these routines:
add_profile_sphere
add_profile_object
add_phase_sphere
add_phase_object
in a way that will enable access to the underlying plots, pickling of plots, and also easier overplotting and multi-plotting.
Implementation¶
The implementation will need to:
- Accept a data object and set sane defaults for calculating a profile
- Not add more than one field to the resulting BinnedProfile object; only one will be added at any time, and additional fields will be left up to the user
- Include a standalone Plot-type object that contains all of the necessary data to create a representation of the data in Matplotlib
- Deposit the plot into an existing Axes or Figure object
- Provide simple methods of translating the initial Profile data into the matplotlib object, and allow the user to modify the visualization as she sees fit
The current implementation contains this class structure:
ProfilePlotter¶
- profile: property, the BinnedProfile1D
- scale: “log” or “linear”, the scale of the profile’s y-axis (x-axis is set by the bins in the profile)
- _current_field: property, the name of the field currently selected
- _setup_plot: called to create a new plot specification (not a Matplotlib plot) and usually called in the background
PhasePlotter¶
- profile: property, the BinnedProfile2D
- scale: “log” or “linear”, the scale of the profile’s y-axis (x-axis and y-axis are set by the bins in the profile)
- _current_field: property, the name of the field currently selected
- _setup_plot: called to create a new plot specification (not a Matplotlib plot) and usually called in the background
AxisSpec¶
This specifies the description (title, bounds, scale, ticks) of an Axis. It is typically used without specifying ticks, which are used only by the ExtJS widgets.
ColorbarSpec¶
- cmap: property, string
- display: property, true or false
PlotContainer¶
Used for profile plots.
- x_spec: property, AxisSpec
- y_spec: property, AxisSpec
- x_values: property, numpy array
- y_values: property, numpy array
- to_mpl: routine that accepts either (Axes), (Figure), or None. Returns the figure and axes after plotting.
ImagePlotContainer¶
Used for phase plots.
- x_spec: property, AxisSpec
- y_spec: property, AxisSpec
- image: property, numpy array (2D)
- cbar: property, ColorbarSpec instance
- to_mpl: routine that accepts either (Axes), (Figure), or None. Returns the figure and axes after plotting.
The indirection enables the user to pickle the plot, without storing the data. But it comes at the price of clarity.
This is currently implemented, and the plots returned, for the most part, do not encourage or allow a substantial amount of modification or fiddling. I think this is okay, as it will very easily allow users to plot multiple lines into the same axes, for instance. However, it does less hand-holding.
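For instance, a hedged usage sketch based on the class structure above (the constructor signature, the pf handle, and the name of the attribute holding the PlotContainer are assumptions for illustration, not the final API):
import matplotlib.pyplot as plt

sp = pf.h.sphere("max", (100.0, "kpc"))
p1 = ProfilePlotter(sp, "Density", "Temperature")   # assumed signature

fig, ax = plt.subplots()
p1.plot.to_mpl(ax)   # deposit into an existing Axes ("plot" attribute assumed)

# Because to_mpl accepts an existing Axes, a second profile can be drawn on
# the same axes for easy comparison.
p2 = ProfilePlotter(pf.h.sphere("c", (50.0, "kpc")), "Density", "Temperature")
p2.plot.to_mpl(ax)
fig.savefig("profiles.png")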
Open Questions¶
- Should the PhasePlotter and ProfilePlotter objects be refactored to explicitly favor matplotlib? The PlotWindow does this, and it is successful.
- Should we allow users to dynamically switch fields on the fly?
- Should the desire for pickling of objects be given up in favor of a cleaner and simpler class structure (i.e., getting rid of the Specification objects)?
Backwards Compatibility¶
We will not remove the existing functionality from PlotCollection, so there are no backwards compatibility issues.
Alternatives¶
Alternately, we could provide nothing, and encourage users to create their own BinnedProfile1D plots. Or, we could provide a much more specific-to-MPL alternative.
YTEP-0003: Standardizing field names¶
Abstract¶
Created: December 11, 2012
Author: Casey Stark, Nathan Goldbaum, Matt Turk
Let’s clean up field names in yt, e.g. “SoundSpeed” -> “sound_speed”. The proposed work will serve to remove Enzo-isms and encourage more consistent field names across frontends.
Status¶
Completed
Project Management Links¶
yt 3.0 goals doc https://docs.google.com/document/d/17Q-rbmTj9PyaTgtN1h6C8vqoWeIZjw_OjFbQp8L3Tkg/edit
Universal field names doc https://docs.google.com/document/d/1Qbt6z27S8VWh8h0kOx–ZYih4BDFAgGlH9pE55I51So/edit
Detailed Description¶
Background¶
The universal field names are PascalCased, while lowercase_underscored names are more standard in Python.
Solution¶
The field names must be updated to lowercase_underscored format. This means all of the common field names, frontend-specific field names, and all references to field names elsewhere in the code base need to be updated.
Additionally, we will take the opportunity to reorganize the field definitions.
Currently all of the field definitions live in the monolithic universal_fields.py file, which in yt 2.6 is just shy of 1700 lines long. In the reference implementation, we have created a new top-level namespace, yt.fields. The fields module is a container for a number of submodules that contain field definitions or macros to create classes of field definitions. Currently, the following submodules are attributes of yt.fields:
- angular_momentum: Gas and particle angular momentum fields and field creation macros.
- fluid_fields: Useful fields that are simple functions of the primitive variables in a hydrodynamic simulation. cell_mass, sound_speed, pressure, entropy, and kinetic_energy all live here.
- geometric_fields: Fields that are functions only of the geometry of the data. This includes radius, x, y, z, curvilinear coordinate fields, zeros, ones, and grid_level.
- magnetic_fields: Derived fields useful for analyzing simulations that include magnetic fields.
- particle_fields: Particle derived fields, including all non-local particle deposition fields and local fields that are functions of primitive particle properties.
- species_fields: Fields for chemical species. This is currently a stub.
- vector_operations: Fields that are the gradient, divergence or curl of another field, as well as macros for setting up these fields.
- universal_fields: Fields that have so far not been assigned to one of the other field submodules. Eventually this module will be removed once the remaining fields have been reorganized.
Additionally, the fields module now contains the logic for field detection as well as the DerivedField class.
Testing¶
Testing will rely on the existing unit testing framework as well as an updated set of field detection tests. If there are additional fixes required in untested parts of the code base, we will update them as we find them.
Backwards Compatibility¶
We will provide a compatibility layer, allowing yt to map from the old field names to the new ones. This compatibility layer will not be enabled by default.
Right now, one can turn it on via the _setup_deprecated_fields function:
import yt
ds = yt.load('IsolatedGalaxy/galaxy0030/galaxy0030')
ds._setup_deprecated_fields()
In the future we might also enable this via a config parameter.
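As a short illustration of the renaming in practice (the field pair is taken from the table below; the old-style name resolves only after _setup_deprecated_fields() has been called):
ad = ds.all_data()
new_style = ad["gas", "sound_speed"]   # lowercase_underscored, preferred
old_style = ad["SoundSpeed"]           # deprecated alias for the same field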
Field Names¶
- Naming rules:
  - must be lowercase underscored.
  - should be as verbose as possible.
  - should not include units.
  - vector fields are separated into 4 scalar fields with endings _x, _y, _z, and _magnitude.
This is a listing of all field names currently defined in the yt-3.0 universal_fields module and the proposed replacement names.
Current name | Proposal |
---|---|
GridLevel | grid_level |
GridIndices | grid_indices |
OnesOverDx | ones_over_dx |
Ones | ones |
CellsPerBin | cells_per_bin |
SoundSpeed | sound_speed |
RadialMachNumber | radial_mach_number |
MachNumber | mach_number |
CourantTimeStep | courant_time_step |
ParticleVelocityMagnitude | particle_velocity_magnitude |
VelocityMagnitude | velocity_magnitude |
TangentialOverVelocityMagnitude | tangential_over_velocity_magnitude |
Pressure | pressure |
Entropy | entropy |
sph_r | spherical_r |
sph_theta | spherical_theta |
sph_phi | spherical_phi |
cyl_R | cylindrical_r |
cyl_RCode | remove (unit duplicate) |
cyl_z | cylindrical_z |
cyl_theta | cylindrical_theta |
DiskAngle | remove (replaced by theta) |
Height | remove (replaced by z) |
HeightAU | remove (unit duplicate) |
cyl_RadialVelocity | cylindrical_radial_velocity |
cyl_RadialVelocityABS | cylindrical_radial_velocity_absolute |
cyl_RadialVelocityKMS | remove (unit duplicate) |
cyl_RadialVelocityKMSABS | remove (unit duplicate) |
cyl_TangentialVelocity | cylindrical_tangential_velocity |
cyl_TangentialVelocityABS | cylindrical_tangential_velocity_absolute |
cyl_TangentialVelocityKMS | remove (unit duplicate) |
cyl_TangentialVelocityKMSABS | remove (unit duplicate) |
DynamicalTime | dynamical_time |
JeansMassMsun | jeans_mass (possible unit change) |
CellMass | cell_mass |
CellMassMsun | remove (unit duplicate) |
CellMassCode | remove (unit duplicate) |
TotalMass | total_mass |
TotalMassMsun | remove (unit duplicate) |
StarMassMsun | star_mass (possible unit change) |
Matter_Density | matter_density |
ComovingDensity | comoving_density |
Overdensity | overdensity |
DensityPerturbation | density_perturbation |
Baryon_Overdensity | baryon_overdensity |
WeakLensingConvergence | weak_lensing_convergence |
CellVolumeCode | remove (unit duplicate) |
CellVolumeMpc | remove (unit duplicate) |
CellVolume | cell_volume |
ChandraEmissivity | chandra_emissivity |
XRayEmissivity | xray_emissivity |
SZKinetic | sz_kinetic |
SZY | szy |
AveragedDensity | averaged_density |
DivV | div_v |
AbsDivV | div_v_absolute |
Contours | contours |
tempContours | temp_contours |
SpecificAngularMomentumX | specific_angular_momentum_x |
SpecificAngularMomentumY | specific_angular_momentum_y |
SpecificAngularMomentumZ | specific_angular_momentum_z |
AngularMomentumX | angular_momentum_x |
AngularMomentumY | angular_momentum_y |
AngularMomentumZ | angular_momentum_z |
ParticleSpecificAngularMomentumX | particle_specific_angular_momentum_x |
ParticleSpecificAngularMomentumY | particle_specific_angular_momentum_y |
ParticleSpecificAngularMomentumZ | particle_specific_angular_momentum_z |
ParticleSpecificAngularMomentumXKMSMPC | remove (unit duplicate) |
ParticleSpecificAngularMomentumYKMSMPC | remove (unit duplicate) |
ParticleSpecificAngularMomentumZKMSMPC | remove (unit duplicate) |
ParticleAngularMomentumX | particle_angular_momentum_x |
ParticleAngularMomentumY | particle_angular_momentum_y |
ParticleAngularMomentumZ | particle_angular_momentum_z |
ParticleRadius | particle_radius |
Radius | radius |
RadiusMpc | remove (unit duplicate) |
ParticleRadiusMpc | remove (unit duplicate) |
ParticleRadiuskpc | remove (unit duplicate) |
Radiuskpc | remove (unit duplicate) |
ParticleRadiuskpch | remove (unit duplicate) |
Radiuskpch | remove (unit duplicate) |
ParticleRadiuspc | remove (unit duplicate) |
Radiuspc | remove (unit duplicate) |
ParticleRadiusAU | remove (unit duplicate) |
RadiusAU | remove (unit duplicate) |
ParticleRadiusCode | remove (unit duplicate) |
RadiusCode | remove (unit duplicate) |
RadialVelocity | radial_velocity |
RadialVelocityABS | radial_velocity_absolute |
RadialVelocityKMS | remove (unit duplicate) |
RadialVelocityKMSABS | remove (unit duplicate) |
TangentialVelocity | tangential_velocity |
CuttingPlaneVelocityX | cutting_plane_velocity_x |
CuttingPlaneVelocityY | cutting_plane_velocity_y |
CuttingPlaneBX | cutting_plane_bx |
CuttingPlaneBy | cutting_plane_by |
MeanMolecularWeight | mean_molecular_weight |
JeansMassMsun | remove (duplicate) |
particle_density | particle_density |
MagneticEnergy | magnetic_energy |
BMagnitude | b_magnitude |
PlasmaBeta | plasma_beta |
MagneticPressure | magnetic_pressure |
BPoloidal | b_poloidal |
BToroidal | b_toroidal |
BRadial | b_radial |
VorticitySquared | vorticity_squared |
gradPressureX | grad_pressure_x |
gradPressureY | grad_pressure_y |
gradPressureZ | grad_pressure_z |
gradPressureMagnitude | grad_pressure_magnitude |
gradDensityX | grad_density_x |
gradDensityY | grad_density_y |
gradDensityZ | grad_density_z |
gradDensityMagnitude | grad_density_magnitude |
BaroclinicVorticityX | baroclinic_vorticity_x |
BaroclinicVorticityY | baroclinic_vorticity_y |
BaroclinicVorticityZ | baroclinic_vorticity_z |
BaroclinicVorticityMagnitude | baroclinic_vorticity_magnitude |
VorticityX | vorticity_x |
VorticityY | vorticity_y |
VorticityZ | vorticity_z |
VorticityMagnitude | vorticity_magnitude |
VorticityStretchingX | vorticity_stretching_x |
VorticityStretchingY | vorticity_stretching_y |
VorticityStretchingZ | vorticity_stretching_z |
VorticityStretchingMagnitude | vorticity_stretching_magnitude |
VorticityGrowthX | vorticity_growth_x |
VorticityGrowthY | vorticity_growth_y |
VorticityGrowthZ | vorticity_growth_z |
VorticityGrowthMagnitude | vorticity_growth_magnitude |
VorticityGrowthMagnitudeABS | vorticity_growth_magnitude_absolute |
VorticityGrowthTimescale | vorticity_growth_timescale |
VorticityRadPressureX | vorticity_radiation_pressure_x |
VorticityRadPressureY | vorticity_radiation_pressure_y |
VorticityRadPressureZ | vorticity_radiation_pressure_z |
VorticityRadPressureMagnitude | vorticity_radiation_pressure_magnitude |
VorticityRPGrowthX | vorticity_radiation_pressure_growth_x |
VorticityRPGrowthY | vorticity_radiation_pressure_growth_y |
VorticityRPGrowthZ | vorticity_radiation_pressure_growth_z |
VorticityRPGrowthMagnitude | vorticity_radiation_pressure_growth_magnitude |
VorticityRPGrowthTimescale | vorticity_radiation_pressure_growth_timescale |
x-velocity | velocity_x |
y-velocity | velocity_y |
z-velocity | velocity_z |
Molecular and Atomic Species Names¶
Particular care must be taken to name molecular and atomic species in a way that is unambiguous as well as terse. We need to be able to resolve the following types of species:
- CO (Carbon monoxide)
- Co (Cobalt)
- OVI (Oxygen ionized five times)
- H2+ (Molecular Hydrogen ionized once)
- H- (Hydrogen atom with an additional electron)
The naming scheme we have decided upon is of the form MM[_[mp][NN]]. MM is the molecule, defined as a concatenation of atomic symbols and numbers, with no spaces or underscores. The second sequence is only required if the ionization state is not neutral, and is of the form p and m to indicate “plus” or “minus” respectively, followed by the number. Our examples above would be CO, Co, O_p5, H2_p1, and H_m1. Note that we are not using an exclusively-lowercase convention here, as we did for the other field names. The name El will be reserved for electron fields, as it is unambiguous and will not be utilized elsewhere. Additionally, the isotope of 2H will be included as D.
Neutral ionic species (e.g. H I, O I) are represented as MM_p0.
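A small, hedged sketch of how this convention can be parsed; the regular expression below is illustrative and is not the parser yt itself uses.
import re

SPECIES_RE = re.compile(r"^(?P<molecule>[A-Za-z0-9]+?)(?:_(?P<sign>[pm])(?P<charge>\d+))?$")

for name in ["CO", "Co", "O_p5", "H2_p1", "H_m1", "H_p0"]:
    m = SPECIES_RE.match(name)
    charge = int(m.group("charge") or 0)
    if m.group("sign") == "m":
        charge = -charge
    # Prints, e.g., "O_p5 -> molecule O, net charge +5"
    print("%s -> molecule %s, net charge %+d" % (name, m.group("molecule"), charge))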
For backwards compatibility, neutral species will be mirrored to MM_ fields, but this practice is deprecated as it is somewhat ambiguous if we are referring to just the neutral species or all atoms of that type. As an example, the neutral hydrogen density field will now be called H_p0_density, but it will be mirrored to the H_density field for backwards compatibility.
Finally, in those frontends which are single-fluid, we will define these fields for each species:
- _fraction
- _number_density
- _density
- _mass
This means that if a frontend only has color fields and species fields (as is the most common case), it will have ("gas", "H2_fraction") for instance. Otherwise, for multi-fluid calculations (where gas is joined by other fields) the other fields will have their own mass and density and so on.
This will require some parsing, and initially we will only support those fields we expect to find. Additionally, because different frontends define these fields in different ways, we will detect which one is present in the output and define the rest from that. For example, if the frontend finds a _density field, the rest will be computed as derived fields from that.
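As a hedged sketch of the kind of derived fields implied here (the bare field names, the 2.016 amu molecular weight, and the omission of the usual add_field registration boilerplate are all illustrative assumptions):
from yt.utilities.physical_constants import mh

def _H2_fraction(field, data):
    # Mass fraction follows directly from the on-disk species density.
    return data["H2_density"] / data["density"]

def _H2_number_density(field, data):
    # Number density = mass density / (molecular weight * hydrogen mass).
    return data["H2_density"] / (2.016 * mh)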
As a point of clarification, the El_density currently defined for Enzo is scaled by the ratio of the electron to proton mass. This means that dividing it by m_h will result in the number density. Moving forward, this will not be the case. We will instead give correct results for mass density when the alias is queried. The original name, Electron_Density, will still be defined the way it currently is, to preserve access to the original, on-disk fields. (It will be able to be converted to CGS, as well, and will not be scaled in doing so.)
To refer to the number density of the entirety of a single atom or molecule (regardless of its ionization state), please use the MM_nuclei_density fields, as opposed to MM_number_density fields.
YTEP-0005: Octrees for Fluids and Particles¶
Abstract¶
Created: December 24, 2012 Author: Matthew Turk
In the yt 2.x series, octree AMR codes have largely been supported by re-gridding data to create larger grid patches consisting of both high-resolution data and coarse data. This has the overhead of requiring that each time a dataset (as in from RAMSES or ART) is loaded, the data has to be placed into these grids. This is an expensive process and requires a considerable amount of RAM. This YTEP describes the mechanism in yt 3.0 that directly accesses Octree data, avoiding the costly regridding step and enabling higher-fidelity data access. Additionally, it describes how the Octree data structure will be used for particle data access from datasets such as N-body or SPH simulation output.
Status¶
This YTEP is in progress. Most aspects have been implemented in yt 3.0. A major deficiency (described below) is the lack of a distributed memory octree. Discussion of distributed memory Octrees is reserved for a future YTEP.
Detailed Description¶
Background¶
In the 2.x branch of yt, RAMSES and (NMSU) ART data are read and processed in a way that mocks up patch-based AMR data. This is sub-par for several reasons:
- A costly re-gridding step is required, where octs are deposited into grid patches that are split with some efficiency measure.
- To conduct IO, coarse grid cells are deposited multiple times into grid patches at finer levels. This results in extremely inefficient IO, as it means that if multiple fine grids overlap with a single cell at coarse resolution, that coarse cell will be read multiple times. It’s also very slow.
- The end result is data that is not exactly what is in the file, reducing the ability of individuals to examine data in a detailed way.
- The regridding code is difficult to parse and understand, and even harder to extend.
In addition to this, particle codes are simply not available in yt 2.x. All attempts to include them have involved a regridding method similar to that for Octree AMR codes, which is not efficient or high-fidelity. Finally, the RAMSES code is broken in 2.x.
Why is it this way?¶
The 2.x branch of yt is relatively inflexible in how data is accessed. There are a number of locations where the attributes grids or _grids are accessed, which are implicitly assumed to be grid patches with a relatively sizable extent. This is used in things like projections, data masking, and the like. For patch-AMR codes where the grids are actually somewhat larger, this is efficient; however, the overhead of python objects and iteration dominates if the grids are smaller than some minimum size or extent. The first implementation of support for RAMSES implemented Octs as grid patches by themselves; this was found to be unbearably slow.
To get around this, a regridding step was applied. This regridding step was based on refinement algorithms, where octs were deposited into grid patches that covered some fraction of the domain. These grid patches were then split to attempt to achieve some minimum efficiency ratio of “refined” (i.e., fine) versus “unrefined” (i.e., coarse) data. When IO was conducted, these were filled in a non-interpolating inverse cascade, where grid patches were filled with fine data, then coarser data. This could be very slow. Additional improvements such as restricting grid patches not to cross “domains” (the RAMSES term for individual files or domains of a specific processor) were eventually added. The NMSU ART data was also loaded in a similar way.
All of this is because yt 2.x relies on “grids” as the fundamental object. As described in YTEP-0001: IO Chunking, in yt 3.0 we no longer rely on grids as the object by which all IO is mediated. Data can now be streamed from disk to memory, and coordinates and resolution information can be seen as independent of that data. This allows octrees to exist without a regridding step.
Octree Implementation¶
The octree implementation is designed around having a full Octree which contains subsets of that octree that are distributed amongst different “domains.” The term “domain” comes from RAMSES, and it is best thought of as whatever the natural, IO-oriented subdivision of the data is. For instance, RAMSES divides into multiple files, each of which is called a domain. For purposes of consolidating IO costs, reading on a per-domain basis makes some sense. NMSU ART does not have the concept of multiple domains, and so we can choose to divide data into domains however we like.
Octrees can then be walked to identify which Octs, and then which cells, contribute to a given geometric selector. This can default back to selecting based on the point-by-point location of the Octs, but it can also be queried much more efficiently by early-terminating an octree traversal if a coarse node is not included inside a geometric selector.
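Conceptually, the traversal looks something like the pure-Python sketch below; the real implementation is in Cython and operates on Oct structs, so the node layout and selector interface here are assumptions made for illustration.
def select_cells(node, selector, selected):
    # Early termination: if the selector cannot intersect this node's
    # bounding box, the entire subtree is skipped.
    if not selector.overlaps(node.left_edge, node.right_edge):
        return
    if node.children is None:
        # Leaf oct: fall back to point-by-point selection of its cells.
        for cell in node.cells():
            if selector.contains(cell.center):
                selected.append(cell)
    else:
        for child in node.children:
            select_cells(child, selector, selected)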
This leads nicely to a future where subsets of the octree are not present on every processor; instead, portions can be passed around at will or pinned to specific processors. This is not yet in place, but the Octree has been designed to be forward compatible with this.
For RAMSES data (where the number of Octs is known before any are added to the system), the octree is composed of a set of OctAllocationContainers, one for each domain, which are pre-allocated and include all of the Octs themselves. Additionally, there is a base class OctreeContainer and a subclass RAMSESOctreeContainer. The base class handles and exposes the majority of methods for traversing the octree and querying the octree. The subclass specifies how Octs get added to the octree.
Octs are defined to have the following attributes:
- (np.int64_t) ind – index into the local OctAllocationContainer.
- (np.int64_t) local_ind – index into the global Octree container.
- (np.int64_t) domain – the domain to which an Oct belongs.
- (np.int64_t) pos[3] – the integer index, based on the local level’s refinement (i.e., the center divided by the local dx).
- (np.int8_t) level – the level of refinement of the Oct.
- (ParticleArrays) *sd – this is optional, and a pointer to particle arrays. This is typically only used for N-body data and will otherwise be null.
- (Oct) *children[2][2][2] – pointers to child nodes. Typically, if any are null, all are null and the Oct is not refined. However, in ART simulations, the root mesh is defined in cells, rather than octs. This is mocked up in yt as a false mesh of Octs, and so the children values can be either NULL (for a refined cell) or not, but may not be homogeneously refined.
- (Oct) *parent – an upward pointer, for easier traversal of the Octree.
Particle and N-body data, which does not typically know the organization and structure of the resultant Octree in advance, uses the additional ParticleArrays class for storing particle data that will help govern refinement. ParticleArrays have enough data to decide where all of the particles will go during a refinement. This has the downside of mandating that the positions (but no other fields) of all particles in a simulation must, at present, be held in memory. This is a key motivating factor in moving to a distributed octree.
Particle arrays have the following attributes:
- (Oct) *oct – the Oct to which this particle array belongs.
- (ParticleArrays) *next – the next particle array in sequence.
- (np.float64_t) **pos – the array of positions for this particle array.
- (np.int64_t) *domain_id – the domain ID (multiple domains mandate refinement in N-body data, as we do not want to span two domains in a single oct).
- (np.int64_t) np – the number of particles here.
As noted above there are a number of downsides. Many of these will be simple to fix: for instance, IO right now is characterized by reading in large portions of octrees simultaneously. Furthermore, masks are passed around, although masks are likely an artifact that is no longer necessary (and larger than they need be).
To add support for a new Octree code, a subclass of OctreeContainer must be made (or of RAMSESOctreeContainer, if you would like to re-use the OctAllocationContainer logic) that implements the following routines (a skeleton sketch follows this list):
- add – to add new octs to the octree
- count – for counting based on a selector
- icoords, ires, fcoords, and IO routines
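A hedged skeleton of what such a subclass might look like, using the routine names above; the argument lists are assumptions, since the real containers are implemented in Cython inside yt's geometry handlers.
class MyCodeOctreeContainer(OctreeContainer):
    def add(self, domain_id, level, positions):
        """Insert the octs for one domain at the given positions and level."""

    def count(self, selector):
        """Return the number of octs (or cells) selected by the selector."""

    def icoords(self, domain_id, mask, cell_count):
        """Integer cell coordinates for the selected cells."""

    def ires(self, domain_id, mask, cell_count):
        """Refinement level for each selected cell."""

    def fcoords(self, domain_id, mask, cell_count):
        """Floating-point cell-center coordinates for the selected cells."""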
Additionally, right now the domain subset code is general but not set into base classes. This is also necessary.
Future Work¶
- Generalize the multi-domain support to allow routines such as icoords to be applied generally rather than specifically only for each system of allocation.
- Allow domains to be pinned to processors (distributed memory) and reduce the overhead for individual processors of storing the entire Octree mesh.
- Convert FLASH to use the Octree code.
- Generalize Octree support structures beyond RAMSES.
- Ensure that children can be independently refined.
Stumbling Blocks¶
- Spatial data and ghost zones are currently not implemented, and implementation may pose challenges. Part of the reason the implementation for patch-based codes is straightforward is that the arrays come back as 3D arrays, to which (for instance) stencils can be applied. However, for Octree data, we may need to move to returning 4D data to reduce the overhead of processing 10^3 arrays. This means (X,Y,Z,N) where the final dimension is all of the Octs. Retaining compatibility between the two forms may be difficult. We also do not want to read outside the domain if not necessary; for instance, RAMSES includes ghost zones in the domain file, even if they are active on a different processor. We should utilize this.
- Implementation requires a good deal of understanding of how other Octree codes are set up. We should improve readability and make this easier to use.
- Applying density estimators to particle codes is not yet implemented, and still somewhat unclear. The first implementation will use Voro++ and regions that have some fixed spatial growth affiliated with them. This will likely not be efficient.
Particle Codes¶
Particle codes are currently supported for reading and creating octree structures. This means that particles can be read in and Octree selection applied to them, where the Octree is refined after either reaching a critical particle count threshold in a given Oct or where an Oct spans multiple domains.
Backwards Compatibility¶
Volume rendering no longer works with Octree codes, and will require spatial data support to do so. Additionally, it may be the case that we need to move to a different method for spatial data analysis (X,Y,Z,N) which will require rewriting old scripts.
Alternatives¶
I do not believe there are currently credible alternatives to directly understanding Octree data structures in yt. I believe that while we may be able to improve the implemented system, other options such as grid patch conversions are not worthwhile. The particle code support, relying on Octrees for fast selection, could also be implemented using a kD-tree, which may speed the density estimation.
YTEP-0006: Periodicity¶
Periodicity needs to be dealt with in an explicit, rather than implicit, fashion.
Abstract¶
- Created: January 10, 2013
- Modified: January 29, 2013
- Author: Matthew Turk, Nathan Goldbaum
Periodicity in yt has been handled poorly in the past. Some objects and fields have been set to be periodic by default, but not all. This YTEP aims to define a mechanism by which fields and objects can query periodicity information and use it correctly.
Status¶
In progress. See pull request #410: http://bitbucket.org/yt_analysis/yt/pull-request/410
Project Management Links¶
- Mailing list discussion: http://lists.spacepope.org/pipermail/yt-dev-spacepope.org/2012-December/002739.html
- Mailing list discussion: http://lists.spacepope.org/pipermail/yt-users-spacepope.org/2012-December/003194.html
- Issue: https://bitbucket.org/yt_analysis/yt/issue/484/fields-dont-know-about-periodic-boundary
- Pull request 410: http://bitbucket.org/yt_analysis/yt/pull-request/410
Detailed Description¶
Periodicity is a tricky business. By volume, the majority of simulations analyzed with yt are cosmology simulations, which are exclusively periodic. So objects such as ellipsoids, fields such as radius, and so on have all evolved to select data or regard data as wrapping around the edges of simulation boundaries.
However, all of these should only be periodic when it makes sense. This means that we need to have a method of marking a simulation as periodic and a method for applying this periodicity. For those situations where periodicity makes sense, it should either always be applied if the simulation is periodic, or never applied if it is not. I believe we should allow simulations to be periodic in any one of the three axes, but not necessarily all simultaneously; this may be overly complex. We explicitly do not support any type of domain-wrapping or boundary conditions more complex than simply wrapping around.
Affected Regions¶
Region of the Code | Type of Periodicity | Change? | Status? |
---|---|---|---|
Light cone | Implicit | No | N/A |
Halo finding | Implicit | No | N/A |
Light ray | Implicit | No | N/A |
EnzoFOF | Implicit | No | N/A |
FOF | Implicit | No | N/A |
Halo objects | Implicit | No | N/A |
Fixed Res Buffers | Explicit | Yes | Not done yet |
Multi-halo profiler | Implicit | No | N/A |
Radial column density | Implicit | Yes | See PR #410 |
Periodic regions | Explicit | Yes | Not done yet |
Spheres | Implicit | Yes | Not done yet |
Ellipsoids | Implicit | Yes | Not done yet |
Two-point functions | Implicit | Yes | Not done yet |
Clumps | Implicit | Yes | Not done yet |
Boolean regions | Implicit | Yes | Not done yet |
AMR kD Tree | Explicit | Yes | Not done yet |
Domain decomp | Implicit | Yes | Not done yet |
Radius | Implicit | Yes | Finished: PR 410 |
ParticleRadius | Implicit | Yes | Finished: PR 410 |
Covering grids | Implicit | Yes | Not done yet |
New Method¶
Two types of changes will be made. The first is to remove implicit periodicity and replace it with a check on the periodicity of the simulation. The second is to remove multiple definitions of objects or functions that operate in either periodic or non-periodic modes, and instead provide a single definition that distinguishes between the two itself. Some operations, such as anything that operates on cosmological simulations (which I reluctantly consider halo finding to do), can assume periodicity.
We need to take account of the following types of checks:
- Distance between two points
- Shortest path between two points (uncommon, can be special cased)
- Object inclusion/collision
- Selection of points
We will make the following changes:
- Create a periodicity property on all StaticOutput objects. This will be a tuple of three booleans, indicating whether or not the simulations are periodic (and if they are, they must be periodic by one domain width). This has been implemented for most of the frontends in PR #410.
- Remove all locally-defined periodicity functions in favor of the function periodic_dist in yt/utilities/math_utils.py and checking the periodicity attribute. For situations where a purely euclidean distance is required, we also supply euclidean_dist, which calculates the distance between two points without considering the domain boundaries. These two functions were finalized in PR #410 and currently live in math_utils.py. (A minimal sketch of the per-axis logic appears after this list.)
- Anything that applies periodic shifts to data for checks of inclusion should apply them exclusively through the periodicity attribute. For data selectors, we will have a two-step process: the data object will need to implement a check_periodicity function.
- Everything that relies on periodic_region should instead rely on region, which will include an option (default = True) to check periodicity.
- The periodic_region data object will, in 2.X, become a wrapper around the basic region object.
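A minimal sketch of the per-axis logic behind periodic_dist, assuming periodicity by exactly one domain width as stated above; the real function lives in yt/utilities/math_utils.py and also handles array-valued inputs.
import numpy as np

def periodic_dist_sketch(a, b, domain_width, periodicity):
    a, b = np.asarray(a, dtype="float64"), np.asarray(b, dtype="float64")
    d = np.abs(a - b)
    for i, periodic in enumerate(periodicity):
        if periodic:
            # Wrap around the domain: take the shorter of the two paths.
            d[i] = min(d[i], domain_width[i] - d[i])
    return np.sqrt((d ** 2).sum())

# Example on a unit, fully periodic domain; the wrapped distance is ~0.1.
print(periodic_dist_sketch([0.05, 0.5, 0.5], [0.95, 0.5, 0.5],
                           [1.0, 1.0, 1.0], (True, True, True)))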
Backwards Compatibility¶
All operations that relied on implicit periodicity for datasets that cannot be identified as periodic will have different results.
Old results for non-periodic datasets that were incorrect will become correct.
YTEP-0007: Automatic Pull Requests’ validation¶
Abstract¶
Created: February 21, 2013 Updated: January 25, 2015 Author: Kacper Kowalik
This YTEP describes the framework used to automatically run both unit and answer testing for incoming pull requests to the main yt repository.
Methods proposed here need to be agnostic with respect to the chosen continuous integration system.
Status¶
Completed
The CI server is running at http://tests.yt-project.org. Scripts used in the validation process are stored at http://bitbucket.org/xarthisius/yt-validation.
Project Management Links¶
Detailed Description¶
Background¶
When a new pull request is issued there is no way to automatically test the validity of the proposed changes. Every author is forced to run the tests manually, which requires knowledge of how to perform the tests, downloading the required data, and possessing significant free computing power. This is troublesome and results in tests being run less frequently than they are supposed to be. With the implementation of this YTEP, the responsibility of running the test suite will be shifted from the PR’s author to designated, automatic infrastructure.
Required Features¶
The CI infrastructure will need to:
- Constantly poll for incoming changes to the main repository (or react to a POST message if the required API becomes available)
- Run both unit and answer testsuite.
- Notify the author should the tests fail.
- Notify the author should the PR fail to merge cleanly.
- Inform people responsible for accepting PR that all tests have passed by sending mail to yt-svn@lists.spacepope.org or broadcasting on #yt irc channel.
- Results of tests should be publicly available.
Implementation Details¶
Each PR will be pulled and merged with the current tip in a prepared docker container that contains yt’s dependencies: yt_analysis/devenv. The resulting container will be tagged as yt_analysis/yt-PR#:commit_hash and will be available for download from the public docker registry for developers who wish to test changes locally. Subsequently, yt_analysis/yt-PR#:commit_hash will be used to run both unit and answer tests on the CI server. If a PR contains commits that modify or add files to the doc/ subdirectory in the main yt tree, a full documentation build will be performed and its result will be stored at http://tests.yt-project.org.
The base container yt_analysis/devenv will be created using install_script.sh and updated every time a change to the aforementioned script is merged into the yt branch. Additional containers with development environments based on bleeding edge Linux distributions, such as Gentoo, Debian Sid, and Fedora Rawhide, will be available and denoted by an appropriate tag: :gentoo, :sid, :rawhide respectively. The test suite will be run on those containers periodically in order to detect incompatibilities with newer versions of yt’s dependencies.
Dockerfiles for all containers will be a part of yt_analysis/yt repository.
Stumbling Blocks¶
The CI described in this YTEP does not cover integration tests that should be performed on OSX or Windows, nor alternative installations using e.g. wheels or conda. Since the CI server is a part of NCSA’s infrastructure, access allowing modification of existing tests and creation of new ones will be restricted to people present in NCSA’s LDAP.
Backwards Compatibility¶
There are no backwards compatibility issues.
YTEP-0008: Release Schedule¶
Abstract¶
Created: February 21, 2013 Author: Matthew Turk Updated: November 3, 2021 Author: Clément Robert
The yt release schedule is somewhat dysfunctional in several ways. Release dates can be difficult to stick to, and merges to stable occur only after long periods. This results in bug fixes not propagating and increases the pressure on developers for a given “release,” as each release is seen as monumental rather than incremental. This YTEP describes a new mechanism for increasing the cadence of point releases as well as merging from the development branch into the stable branch.
Status¶
Proposed
Project Management Links¶
The Mercurial time based release plan, which has partially inspired this discussion, is available here: http://mercurial.selenic.com/wiki/TimeBasedReleasePlan .
Detailed Description¶
The yt release schedule is irregular. Here’s a table of the releases over time, along with the number of days since the most recent major (i.e., non-point) release. Depending on when this document was last updated, this may include both planned and historical releases.
Version | Release Date | Days Since Last |
---|---|---|
1.0.1 | 2008-10-25 | N/A |
1.5 | 2009-11-04 | 375 |
1.6 | 2010-01-22 | 79 |
1.6.1 | 2010-02-11 | N/A |
1.7 | 2010-06-27 | 156 |
2.0 | 2011-01-17 | 204 |
2.0.1 | 2011-01-20 | N/A |
2.1 | 2011-04-06 | 79 |
2.2 | 2011-09-02 | 149 |
2.3 | 2011-12-15 | 104 |
2.4 | 2012-08-02 | 231 |
2.5 | 2013-03-01 | 211 |
2.5.1 | 2013-03-31 | 30 |
2.5.2 | 2013-05-01 | 30 |
2.5.3 | 2013-06-03 | 33 |
2.5.4 | 2013-07-02 | 29 |
2.5.5 | 2013-08-23 | 52 |
2.6 | 2013-11-23 | 92 |
2.6.1 | 2013-12-03 | 10 |
2.6.2 | 2014-02-28 | 87 |
2.6.3 | 2014-07-23 | 145 |
3.0 | 2014-08-04 | N/A |
3.0.1 | 2014-09-01 | 28 |
3.0.2 | 2014-10-03 | 60 |
3.0.3 | 2014-11-03 | 89 |
3.1 | 2015-01-14 | N/A |
3.2 | 2015-07-24 | N/A |
3.2.1 | 2015-09-09 | 47 |
3.2.2 | 2015-11-13 | 65 |
3.2.3 | 2016-02-04 | 83 |
3.3.0 | N/A | |
3.3.1 | 2016-07-24 | 171 |
3.3.2 | 2016-10-26 | 94 |
3.3.3 | 2016-12-12 | 47 |
3.3.4 | 2017-02-13 | 63 |
3.3.5 | 2017-03-08 | 23 |
3.4 | 2017-08-11 | 156 |
3.4.1 | 2018-02-16 | 189 |
3.5 | 2018-10-16 | 242 |
3.5.1 | 2019-02-26 | 133 |
3.6 | 2020-04-11 | 410 |
3.6.1 | 2020-11-13 | 216 |
4.0 | 2021-07-06 | 235 |
4.0.1 | 2021-07-21 | 15 |
4.0.2 | 2022-02-01 | 195 |
In principle, a long release schedule is not a problem. However, what this results in is a reluctance to merge to the stable branch. This has two major side effects: it leads to many people working off of the development branch and it leads to a long time between bug fixes for individuals working off of the stable branch. The development branch, despite its name, is quite stable – however, this also means that when instabilities (or API changes) are introduced in the development branch, it can be much more disruptive.
What Constitutes a Release¶
The majority of development in the primary yt repository is stable. Seldom are backwards-incompatible changes introduced, nor functionality broken. This is helped by continuous integration and detailed code review. As such, for the most part, yt is in a constant state of “release.”
For the purposes of this document, a “release” constitutes the following:
- A new build of the documentation with API and cookbook is placed in a long-term container.
- The development branch (yt) is merged to the stable branch (stable).
- A new tag in the version control history.
- An upload of the source code to PyPI ( https://pypi.org/ ).
- A new entry on conda-forge ( https://anaconda.org/conda-forge/yt ).
- An announcement email (to yt-users for minor releases and more broadly for major releases).
- For “bugfix”-level releases, changes should be backported to a dedicated branch.
Release Managers¶
The release manager for minor releases will be Matthew Turk, as they will only be announced to yt-users. For major releases, a new release manager will be selected by consensus in the yt-dev community. Merging, tagging and uploading will be handled by Matthew Turk, but the release manager will act as “whip” to ensure the necessary documentation building is done. Additionally, this release manager will write the release notes and send the email to various mailing lists.
Version | Release Manager |
---|---|
2.5 | John ZuHone |
2.5.1 | Matthew Turk |
2.5.2 | Matthew Turk |
2.5.3 | Matthew Turk |
2.6 | Kacper Kowalik |
2.6.1 | Matthew Turk |
2.6.2 | Matthew Turk |
2.6.3 | Matthew Turk |
3.0 | Matthew Turk |
3.1 | John ZuHone |
3.2 | Britton Smith |
3.2.1 | Nathan Goldbaum |
3.2.2 | Nathan Goldbaum |
3.2.3 | Nathan Goldbaum |
3.3.0 | Nathan Goldbaum |
3.3.1 | Nathan Goldbaum |
3.3.2 | Nathan Goldbaum |
3.3.3 | Nathan Goldbaum |
3.3.4 | Nathan Goldbaum |
3.3.5 | Nathan Goldbaum |
3.4.0 | Nathan Goldbaum |
3.4.1 | Nathan Goldbaum |
3.5.0 | Nathan Goldbaum |
3.5.1 | Nathan Goldbaum |
3.6.0 | Madicken Munk |
3.6.1 | Madicken Munk |
4.0.0 | Madicken Munk |
4.0.1 | Madicken Munk |
4.0.2 | Matthew Turk |
Backwards Compatibility¶
This should have no backwards-incompatible changes.
Alternatives¶
One alternative would be to forego release numbers and move to completely continuous integration. Another would be to continue on our current path.
YTEP-0009: AMRKDTree for Data Sources¶
Abstract¶
Created: February 28, 2013 Author: Sam Skillman
This proposal outlines the changes (functional and API) necessary for the ability to render volumes using arbitrary data sources. This will still operate with the idea of grids and masks. However, it should serve as a stepping stone to non-grid-based rendering.
Status¶
- Implemented in yt-3.0 PR77
Project Management Links¶
Currently the development of this capability is in the camera-refactor bookmark at: https://bitbucket.org/samskillman/yt/commits/all/tip/..bookmark%28%22camera-refactor%22%29
There is a Camera refactor YTEP-0010 that is closely related to the AMRKDTree functional changes suggested in this YTEP.
This has been implemented as of: https://bitbucket.org/yt_analysis/yt-3.0/pull-request/77/add-legitimate-source-rendering-using-a/diff
Detailed Description¶
Functional Background:
The volume rendering has long operated on either the entire domain or (at best) a sub-rectangular-prism using left and right edges. This is fairly limiting in that a user may not need (or want) to render the entire volume, and may want to restrict the volume shown by something other than a single box. A majority of the reasoning behind the recent AMRKDTree was to allow for more generic adding of grids/data to the homogenized volume.
The primary problem with attempting to do this is that the volume rendering acts on a rectangular brick of data, and the traversal of this brick is fairly far removed from an AMR3DData object. Therefore, we either need to modify the traversal, or somehow mask out the data being handed to the traversal.
The latter approach can be accomplished using the following code:
# Field that is 1.0 inside the data source and 0.0 outside; here self.source
# is the data source and self.pf the parameter file on the rendering object.
def _source_mask(field, data):
    return 1.0 * self.source._get_cut_mask(data)
self.pf.field_info.add_field('source_mask', function=_source_mask, take_log=False)
then when creating the vertex centered data:
# Build a vertex-centered mask; vertices not fully inside the source are set
# to inf so that the corresponding cells are effectively masked out.
mask = grid.get_vertex_centered_data('source_mask', smoothed=False,
                                     no_ghost=self.no_ghost).astype('float64')
mask = np.clip(mask, 0.0, 1.0)
mask[mask < 1.0] = np.inf
for i, field in enumerate(self.fields):
    vcd = grid.get_vertex_centered_data(field, smoothed=True,
                                        no_ghost=self.no_ghost).astype('float64')
    vcd = vcd * mask
However, this approach is full of quirks, since what we really want is to mask out an entire cell, and not some set of vertices. Therefore, we propose to modify the PartitionedGrid object to include an integer mask that is used during traversal to mask out the individual cells.
API Background:
A method for specifying a data source in a volume rendering has not yet been settled on. Currently, the rectangular volume that is used for rendering uses a le and re pair of keywords to specify the left and right edges. When moving to a data source, we should simplify the volume selection by simply supplying a data_source= keyword.
Changes to the Camera interfaces suggested in YTEP-0010 will handle the move to using a data_source.
In my working solution, I have set the AMRKDTree __init__ function to have the following form:
class AMRKDTree(ParallelAnalysisInterface):
    def __init__(self, pf, min_level=None, max_level=None, data_source=None):
        ...
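For illustration only, usage under this proposed interface might look like the following sketch; the AMRKDTree import location is an assumption, and the sphere is just an example data source.
from yt.mods import *
from yt.utilities.amr_kdtree.api import AMRKDTree  # assumed import location

pf = load("Enzo_64/DD0043/data0043")
sp = pf.h.sphere([0.5, 0.5, 0.5], 100.0 / pf['kpc'])  # arbitrary data source

# Homogenize only the region selected by the sphere for later rendering.
kd = AMRKDTree(pf, data_source=sp)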
Backwards Compatibility¶
This YTEP breaks the following backwards compatibility:
- AMRKDTree API
It will additionally break internal uses of the API for the Camera, other cameras inheriting the __init__ of Camera, and the AMRKDTree.
Alternatives¶
- Do nothing
- Add more keyword arguments to everything
- Wait until rendering is ready in yt-3.0, which will also likely demand a breakage of API.
After discussion, it was found to be easiest to implement this only in yt-3.0, as it is increasingly difficult to manage the two versions and how they handle grids. Since yt-3.0 provides explicit mask objects in the pf.h.blocks generator, it was significantly easier than expected to implement the mask on the PartitionedGrid object. I'm also hopeful that this simplification is along the same lines as the idea of a simplified volume rendering scene object, as outlined in YTEP-0010.
YTEP-0010: Refactoring for Volume Rendering and Movie Generation¶
Abstract¶
Created: March 3, 2013 Author: Cameron Hummels
This YTEP describes significant modifications of the camera infrastructure to enable more focus on scenes, camera paths, and movies, while still retaining functionality for individual images.
Status¶
Open to changes through pull requests.
Project Management Links¶
This integrates directly with YTEP-0009 currently in pull request at: https://bitbucket.org/yt_analysis/ytep/pull-request/11/data-source-rendering-camera-refactor
Detailed Description¶
Background¶
Visualization of data is one of the primary reasons why people use yt. yt’s visualization capabilities are quite advanced, particularly in generating single images of a simulated volume. However, the tools for using the camera objects are complicated and difficult to use to generate movies of anything beyond simple camera paths around single simulation outputs. This is understandable based on the way the camera object code built up organically over the last few years, but a refactor of this code could dramatically simplify the steps for generating complex movies.
Here is a rough algorithm of how to create a rendering under the current system:
- The user chooses a method for breaking up the region to be rendered, either a Homogenized Volume, or a kd-tree. Homogenized volumes can be re-used from rendering to rendering, thereby saving time in re-rendering the same volume from different perspectives, as well as allowing the user to define an arbitrary geometric object to act as the rendering volume. On the other hand, the kd-tree is generally faster for individual renderings, but cannot currently be re-used from rendering to rendering, and does not allow the user to specify a subregion to render beyond an AMRRegion object (i.e. box).
- The user must explicitly choose whether she wants to do a volume rendering or an off-axis projection, as each of these two options is a different camera class. There can be no mixing between these classes; once this is chosen, the user is locked in.
- The user must explicitly define the central focus point of the image to be rendered, along with the 'width' of the image (thereby defining the extent to be rendered), the normal vector from which the camera will render, and the resulting resolution of the final image.
- The user must explicitly define a transfer function to be used in the case of the volume rendering, and it is generally non-intuitive how to get this correct a priori.
Here is a sample script for generating a volume rendering under the current system taken from the docs. Note how much has to be done prior to actually rendering an output image.
>>> from yt.mods import *
>>> pf = load("Enzo_64/DD0043/data0043")
>>> dd = pf.h.all_data()
>>> mi, ma = dd.quantities["Extrema"]("Density")[0]
>>> tf = ColorTransferFunction((np.log10(mi)+1, np.log10(ma)))
>>> tf.add_layers(5, w=0.02, colormap="spectral")
>>> c = [0.5, 0.5, 0.5]
>>> L = [0.5, 0.2, 0.7]
>>> W = 1.0
>>> Npixels = 512
>>> cam = pf.h.camera(c, L, W, Npixels, tf)
>>> cam.snapshot("%s_volume_rendered.png" % pf, clip_ratio=8.0)
Problem¶
Here we note some of the shortcomings of the current camera implementation:
- Too much overhead in generating a simple rendering for the end user. Needs helper functions to use sensible defaults so the user must only call one or two commands before generating a rendering.
- Too much complexity in the camera object constructor due to a large number of parameters and keyword options.
- Volume renderings and off-axis projections are too distinct from each other.
- Not enough focus on time series or persistence of a camera from one rendering to another for generating movies.
- Cameras are defined in a somewhat counterintuitive manner, rather than focusing on the camera as being in a physical location in a volume and moving it around that volume.
- Not enough integration of particle data in camera renderings.
- Minimal ability to move a camera in a complex path through a volume beyond simple rotations, pans, and zooms.
Proposed Solution¶
We propose to break up the camera infrastructure into a few different classes to enable more transparency and usability of this important functionality.
- Make cameras just cameras. They should be very lightweight, should be situated in a scene, and should not contain references to the volumes.
- Add a “scene” object which then contains components like data sources (i.e. volumes, streamlines, particles), cameras, transfer functions, etc. The scene remains a structure for modifying the underlying components in that scene throughout the duration of the scene.
- Make a camera a reusable object for a given movie which can be modified in virtually any way (location, transfer function, underlying pf) through a series of callback functions, or by modifying the scene object directly.
- Remove the homogenized volume method for generating volume renderings and make the kd-tree method handle all functionality that homogenized volumes provided (e.g. reusability, usability on an arbitrary geometric object; see YTEP-0009).
- Integrate all current camera classes into a single camera class, so we don’t have separate classes for volume renderings, projections, stereoscopic renderings, HEALpix renderings, etc.
- Make the scene understand how to traverse from point A to point B in a complex way by designating keyframes where you constrain the exact rendered image (position/orientation of camera, state of transfer function, data source for rendering, etc.) and having the scene figure out a smooth transition between these keyframes.
- Remove a ridiculous amount of complexity from the Camera and Volume objects by stripping out a large number of variables from the constructors.
- KDTrees should be built for the volume active at any time for easy reusability in future frames (e.g. by moving the camera or changing the transfer function). If the underlying data source changes, then the old kdtree is purged and a new one for that new data source is constructed. This will dramatically reduce overhead on rendering the same volume from different perspectives.
- By default, when one defines a Scene object from a single datadump, it sets the Timeline object to 1 output frame, whereas if one defines a Scene object from a TimeSeries, it adds keyframes for each pf in that TimeSeries uniformly across the Timeline object.
In short, we propose that by reducing the complexity of individual objects and splitting them into multiple objects, we can reduce the complexity of individual operations by adding in a slightly larger set of objects that are more flexible.
- New classes:
- Scene
- Meant to be the main class for dealing with volumetric visualization. It is constructed using a static output instance, which it uses to set up a default camera based on domain extents. It also instantiates a list of objects to be rendered, which include RenderSource instances for volume rendering and streamlines.
- RenderSource
- Base class for rendering types. This can be (minimally) volumetric data for volume rendering, path data for streamlines, point data for particles, and other yet to be determined data types.
- Camera
- A lightweight camera representing the location and orientation of the camera. This can be specified in a number of ways, but to uniquely define it, we need the position of the camera, a pointing vector, and an optional north vector (which is used to determine the image "up" direction).
- Timeline
- The timeline object represents how the scene changes with time. It is valid from t=0 to t=1, but this can be mapped on to any number of output frames during the render. One can modify the Timeline object by setting events such as keyframes to change the underlying scene components at any point in the timeline.
- CameraPath
- In dealing with movies, one can set key frames of where and in what orientation one wants the camera to be at certain times. A smoothing function (like a spline) can connect up these keyframes into a smooth camera path for application on the timeline.
In each of the following derived classes, the returned object from the __init__ function is an instance of the Scene class, capable of adding additional sources. These are meant to provide shortcuts.
- Derived Classes:
- VolumeScene
- Inherits from Scene, sets up a scene with a volume rendering data source
- StreamlineScene
- Inherits from Scene, sets up a scene with streamlines data source
- ParticleScene
- Inherits from Scene, sets up a scene with particles data source
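To summarize how these classes are meant to fit together, here is a minimal, purely illustrative skeleton; every name, signature, and default below is an assumption based on the descriptions above rather than a final API.
class Camera(object):
    """Lightweight: just a position, a focus, and an optional north vector."""
    def __init__(self, position, focus, north_vector=None):
        self.position = position
        self.focus = focus
        self.north_vector = north_vector


class RenderSource(object):
    """Base class for renderable things: volumes, streamlines, particles, ..."""


class Scene(object):
    """Owns the camera, the timeline, and a list of RenderSource instances."""
    def __init__(self, pf):
        self.pf = pf
        # Default camera: outside the domain, looking at its center.
        self.camera = Camera(position=1.5 * pf.domain_right_edge,
                             focus=pf.domain_center)
        self.sources = []

    def add_volume_rendering(self, data_source):
        # In practice this would build a VolumeRenderSource around data_source.
        source = RenderSource()
        self.sources.append(source)
        return source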
Sample Scripts for Proposed Infrastructure¶
Under the proposed changes, one could simply get a simple volume rendering by running this short script:
>>> from yt.mods import *
>>> pf = load("Enzo_64/DD0043/data0043")
>>> sc = VolumeScene(pf, 'Density')
>>> im = sc.render()
where the scene constructor uses helper functions to set up all of the default objects (volume, camera, timeline, transfer function): it uses the entire volume, places a camera at 1.5*domain_right_edge pointing at domain_center with north vector (nx,ny,nz)=(0,0,1), sets the timeline object to number_of_frames=1, and sets the transfer function to use the min/max of the volume with 4 isodensity contours added.
The previous image can be grabbed using:
>>> im = sc.current_image
If one wanted to modify this scene prior to rendering, the end user could change things through a series of callbacks:
>>> from yt.mods import *
>>> pf = load("Enzo_64/DD0043/data0043")
>>> sp = pf.h.sphere([0.5,0.5,0.5],100/pf['kpccm'])
>>> sc = VolumeScene(sp, 'Density')
### Change the camera position and orientation
>>> sc.camera.move(pos=[0,(100,'kpccm'),0], focus=[0,0,0], north=[0,0,1])
>>> sc.render()
In order to create a short movie making a rotation around the center from one side at 100 kpc out to the other side 100 kpc out while the simulation is evolving, one might run a script such as the following. It would automatically set the timeline to match the timeseries data with a framerate of 12 frames/sec.
>>> from yt.mods import *
>>> ts = TimeSeriesData.from_filenames("Enzo_64/DD????/data????")
>>> sc = scene(ts)
>>> keyframe_start = camera(pos = [0,1,0], point = [0,0,0], north = [0,0,1])
>>> keyframe_mid = camera(pos = [1,0,0], point = [0,0,0], north = [0,0,1])
>>> keyframe_end = camera(pos = [0,-1,0], point = [0,0,0], north = [0,0,1])
>>> sc.set_keyframe(time=0, camera = keyframe_start)
>>> sc.set_keyframe(time=0.5, camera = keyframe_mid)
>>> sc.set_keyframe(time=1, camera = keyframe_end)
>>> sc.timeline.set_num_frames(50)
>>> sc.render()
While all the prior examples are focused on handling a single data source at a time, a major goal of the refactor is to allow for the combination of data sources and data types, such as streamlines, particles, opaque planes, and annotations. We want to allow for the composition of a full scene containing many different sources.
For example,
>>> from yt.mods import *
>>> pf = load("Enzo_64/DD0043/data0043")
>>> sc = Scene(pf)
### Change the rendered volume to be a sphere of radius 100 kpc
>>> sp = pf.h.sphere([0.5,0.5,0.5],100/pf['kpccm'])
>>> vr_handle = sc.add_volume_rendering(sp)
### Here vr_handle is an instance of a VolumeRenderSource(RenderSource)
>>> vr_handle.transfer_function.clear()
>>> vr_handle.transfer_function.map_to_colormap(mi, ma, cmap='RdBu')
>>> streamlines = Streamlines(pf,...) # Create streamlines
>>> stream_handle = sc.add_streamlines(streamlines)
>>> stream_handle.set_opacity(0.1)
>>> stream_handle.set_radius((0.1,'kpc'))
>>> sc.add_particles(sp)
>>> particle_handler = sc.get_particle_handle()
>>> particle_handler.transfer_function.set_color_field('density')
>>> particle_handler.transfer_function.set_alpha(0.1)
>>> sc.render()
### Remove piece of the scene
>>> sc.toggle(vr_handle)
... # Type Tag Status
... VolumeRenderSource(density): vr_1 off
... Streamlines(velocity): sl_1 on
... Particles(density): pt_1 on
>>> sc.render()
>>> sc
... # Type Tag Status
... VolumeRenderSource(density): vr_1 off
... Streamlines(velocity): sl_1 on
... Particles(density): pt_1 on
>>> sc.toggle('vr_1')
... # Type Tag Status
... VolumeRenderSource(density): vr_1 on
... Streamlines(velocity): sl_1 on
... Particles(density): pt_1 on
What Needs to be Decided¶
- What should the syntax be for annotations (lines, boxes, orientation vectors)?
- How do we manipulate the Scene positions (positions of all non-spatial annotations)? For example, put the transfer function display over here.
- Probably many more things.
- How to handle the API of running in different parallel regimes (Image plane vs domain vs time-series vs …)
Backwards Compatibility¶
This will break all backwards compatibility with the pf.h.camera interface. We will attempt to keep the useful modifications (pitch, roll, yaw, etc.) as similar as possible to ease the pain.
YTEP-0011: Symbol units in yt¶
Abstract¶
This YTEP describes adding symbolic units to yt using the sympy package. The main benefit is to make sure units are carried through calculations in a transparent and intuitive manner. The new components are:
- a Unit class, which describes the dimensionality (powers of mass, length, time, temperature) and conversion factor of any unit.
- a UnitRegistry class, which stores valid atomic unit symbols (ex: "g" for gram).
- a YTArray class, a subclass of NumPy ndarray that attaches a Unit object.
- a YTQuantity class, a subclass of YTArray which is restricted to a single element (for handling scalars).
Status¶
Completed
Project Management Links¶
The code can be found in the unit refactor pull request:
https://bitbucket.org/yt_analysis/yt/pull-request/662/unit-refactor/diff#comment-897255
The latest work is in Matt Turk's fork of yt, in the unitrefactor bookmark:
https://bitbucket.org/MatthewTurk/yt/commits/branch/unitrefactor
The work is based on Casey's dimensionful library.
Detailed Description¶
Background¶
The current system for units is functional but not terribly flexible.
All data are treated as scalars and it is up to the user to convert data from
CGS, which yt
uses internally, to their chosen unit system. A sample workflow
might look like this:
from yt.mods import *
from yt.utilities.physical_constants import mass_sun_cgs
pf = load('IsolatedGalaxy/galaxy0030/galaxy0030')
dd = pf.h.all_data()
mass = dd['CellMass']
print "Mass in CGS: ", mass
print "Mass in Solar Masses: ", dd['CellMass']/mass_sun_cgs
print "Mass in code units: ", dd['CellMass']/pf['Mass']
This model works well if a user always uses CGS units. If the user needs a quantity in a different unit system, they run into trouble. This is illustrated above in the example that converts to 'solar mass' units: since this isn't a proper unit, the conversion isn't stored inside the pf dict, so a user will either need to import the unit definition from yt or add their own definition to their script. The situation is a little bit better for length conversions:
dx = dd['dx']
print "Cell dx in code units: ", dx
print "Cell dx in centimeters: ", dx*pf['Length']
print "Cell dx in megaparsecs: ", dx*pf['mpc']
This works pretty nicely, since all of the various length units are stored in the pf dictionary. However, this example illustrates another problem; here dx is returned in code units, while most quantities are returned in CGS. If we wanted to enforce that all quantities be returned in CGS, we would need to painstakingly go through the codebase, tweaking the field definitions and places where fields are used so that units are handled properly. Clearly, a better solution is needed.
Cosmological units are also handled in a somewhat ad hoc way. Each of the code frontends needs to detect that a simulation was performed using comoving units and define new scaled, comoving, and scaled comoving units (i.e. 'kpccm', 'kpchcm', and 'kpch'). This encourages duplication of code in each of the frontends and makes it likely that different frontends will ignore some of the cosmological units that are defined in the Enzo frontend. In addition, this is not documented in the frontend docs, making it easier for newly supported codes to miss this. Cosmological units are also not labeled correctly in plots.
To ensure units display nicely on plots, the unit definition is currently encoded as a raw string in LaTeX format:
add_field("MagneticEnergy",function=_MagneticEnergy,
units=r"\rm{ergs}\/\rm{cm}^{-3}",
display_name=r"\rm{Magnetic}\/\rm{Energy}")
This is harmful for readability and has the effect that user-defined or automatically generated fields are not assigned units.
Proposed Solution¶
We propose to handle units in a more automatic fashion, leveraging the symbolic math library sympy. Instead of returning a NumPy ndarray when users query for fields, the __getitem__ selector on data objects will return a YTArray, a subclass of ndarray. This preserves ndarray's array operations, including deep and shallow copies, broadcasting, and views.
Additionally, YTArray has a Unit object attached to it that tracks the units associated with each value in the array. This is encoded in the __repr__ method of YTArray:
>>> dd['density']
YTArray([ 4.92775113e-31, 4.94005233e-31, 4.93824694e-31, ...,
1.12879234e-25, 1.59561490e-25, 1.09824903e-24]) g/cm**3
YTArray defines several user-visible member functions:
- convert_to_units: Converts an array to a valid unit, specified by a string argument. Valid units possess the same dimension expression as the current unit.
- convert_to_cgs: Converts the array to CGS units.
- in_units: Returns a copy of the array in a valid unit, specified by a string argument. Valid units possess the same dimension expression as the current unit.
- in_cgs: Returns a copy of the array in CGS units.
It's important to remember that convert_to_cgs and convert_to_units do in-place conversion of an existing array, while in_units and in_cgs return a copy of the original array in the new unit. This can get complicated if one isn't careful about the distinction between creating copies and references, as illustrated in the following example:
>>> dens = dd['density']
>>> print dens
[ 4.92775113e-31 4.94005233e-31 4.93824694e-31 ..., 1.12879234e-25
1.59561490e-25 1.09824903e-24] g/cm**3
>>> dens.convert_to_units('Msun/pc**3')
>>> print dens
[ 7.27920765e-09 7.29737882e-09 7.29471191e-09 ..., 1.66743685e-03
2.35702085e-03 1.62231868e-02] Msun/pc**3
>>> dd['density'].in_units('Msun/pc**3')
YTArray([ 7.27920765e-09, 7.29737882e-09, 7.29471191e-09, ...,
1.66743685e-03, 2.35702085e-03, 1.62231868e-02]) Msun/pc**3
In the example above, if a user tries to query dd['density'] again, they will find that it has been converted to solar masses per cubic parsec, since a shallow copy, dens, underwent an in-place unit conversion. In practice this is not a big concern: since the unit metadata is preserved and the array values are still correct in the new unit system, all numerical operations will still be correct.
One of the nicest aspects of this new unit system is that the symbolic algebra for unitful operations is performed automatically by sympy:
>>> print dd['cell_mass']/dd['cell_volume']
[ 4.92775113e-31 4.94005233e-31 4.93824694e-31 ..., 1.12879234e-25
1.59561490e-25 1.09824903e-24] g/cm**3
>>> print dd['density']
[ 4.92775113e-31 4.94005233e-31 4.93824694e-31 ..., 1.12879234e-25
1.59561490e-25 1.09824903e-24] g/cm**3
YTArray is primarily useful for attaching units to NumPy ndarray instances. For scalar data, we have created the new YTQuantity class. In the proposed implementation, YTQuantity is a subclass of YTArray with the requirement that it is limited to one element. YTQuantity is primarily useful for physical constants and ensures that the units are propagated correctly when composing quantities from arrays, physical constants, and unitless scalars:
>>> from yt.utilities.physical_constants import boltzmann_constant
>>> print dd['temperature']*boltzmann_constant
[ 1.28901607e-12 1.29145540e-12 1.29077208e-12 ..., 1.63255263e-12
1.59992074e-12 1.40453862e-12] erg
With this new capability, we will have no need for fields defined only to handle different units (e.g. RadiusCode, Radiuspc, etc.). Instead, there will only be one definition and if a user needs the field in a different unit system, they can quickly convert using convert_to_units or in_units.
When a StaticOutput object is instantiated, it will itself instantiate and set up a UnitRegistry that contains a full set of units defined for the simulation. This is particularly useful for cosmological simulations, since it makes it easy to ensure cosmological units are defined automatically.
The new unit system lets us encode the simulation coordinate system and the scaling to physical coordinates directly into the unit system. We do this via "code units".
Every StaticOutput object will have length_unit, time_unit, mass_unit, and velocity_unit attributes that the user can quickly and easily query to discover the base units of the simulation. For example:
>>> from yt.mods import *
>>> ds = load("Enzo_64/DD0043/data0043")
>>> print ds.length_unit
128 Mpccm/h
Additionally, we will allow conversions to coordinates in the simulation coordinate system defined by the on-disk data. Data in code units will be available by converting to code_length, code_mass, code_time, code_velocity, or any combination of those units. Code units will preserve dimensionality: an array or quantity that has units of cm will be convertible to code_length, but not to code_mass.
On-disk data will also be available to the user, presented in unconverted code units. To obtain on-disk data, a user need only query a data object using an on-disk field name:
>>> from yt.mods import *
>>> ds = load("Enzo_64"/DD0043/data0043")
>>> dd = ds.h.all_data()
>>> print dd['Density']
[ 6.74992726e-02 6.12111635e-02 8.92988636e-02 ..., 9.09875931e+01
5.66932465e+01 4.27780263e+01] code_mass/code_length**3
>>> print dd['density']
[ 1.92588950e-31 1.74647714e-31 2.54787551e-31 ..., 2.59605835e-28
1.61757192e-28 1.22054281e-28] g/cm**3
Here, the first data object query is returned in code units, while the second is returned in CGS. This is because Density is an on-disk field, while density is a 'standard' yt field. See YTEP-0003: Standardizing field names.
Unit labels for plots will be programmatically generated. This will leverage the sympy LaTeX output module. Even though the field definitions will have their units encoded in plain text, we will be able to automatically generate LaTeX to supply to matplotlib's mathtext parser.
Implementation¶
Our unit system has 6 base dimensions: mass, length, time, temperature, metallicity, and angle. The dimensionless (unitless) dimension, which we use to represent scalars, is also technically a base dimension, although a trivial one.
For each dimension, we choose a base unit. Our system’s base units are grams, centimeters, seconds, Kelvin, metal mass fraction, and radian. All units can be described as combinations of these base dimensions along with a conversion factor to equivalent base units.
The choice of CGS as the base unit system is somewhat arbitrary. Most unit
systems choose SI as the reference unit system. We use CGS to stay consistent
with the rest of the yt
codebase and to reflect the standard practice in
astrophysics. In any case, using a physical coordinate system makes it
possible to compare quantities and arrays produced by different datasets,
possibly with different conversion factors to CGS and to code units. We go into
more detail on this point below.
We provide sympy Symbol objects for the base dimensions. The dimensionality of all other units should be sympy Expr objects made up of the base dimension objects and the sympy operation objects Mul and Pow.
Let's use some common units as examples: gram (g), erg (erg), and solar mass per cubic megaparsec (Msun / Mpc**3). g is an atomic, CGS base unit, erg is an atomic unit in CGS but is not a base unit, and Msun/Mpc**3 is a combination of atomic units which are not in CGS, and one of them even has a prefix. The dimensions of g are mass and the cgs factor is 1. The dimensions of erg are mass * length**2 * time**-2 and the cgs factor is 1. The dimensions of Msun/Mpc**3 are mass / length**3 and the cgs factor is about 6.8e-41.
We use the UnitRegistry class to define all valid atomic units. All unit registries contain a unit symbol lookup table (dict) containing the valid units' dimensionality and cgs conversion factor. Here is what it would look like with the above units:
{ "g": (mass, 1.0),
"erg": (mass * length**2 * time**-2, 1.0),
"Msun": (mass, 1.98892e+33),
"pc": (length, 3.08568e18), }
Note that we only define atomic units here. There should be no operations in the registry symbol strings. When we parse non-atomic units like Msun/Mpc**3, we use the registry to look up the symbols. The unit system in yt knows how to handle units like Mpc by looking up unit symbols with and without prefixes and modifying the conversion factor appropriately.
We construct a Unit object by providing a string containing atomic unit symbols, combined with operations in Python syntax, and the registry those atomic unit symbols are defined in. We use sympy's string parsing features to create the unit expression from the user-provided string. Here's how this works on Msun/Mpc**3:
>>> from sympy.parsing.sympy_parser import parse_expr
>>> unit_expr = parse_expr("Msun/Mpc**3")
>>> from sympy.printing import print_tree
>>> print_tree(unit_expr)
Mul: Msun/Mpc**3
+-Symbol: Msun
| comparable: False
+-Pow: Mpc**(-3)
+-Symbol: Mpc
| comparable: False
+-Integer: -3
real: True
...
When presented with a new unit specification string, a new Unit is created by first decomposing the unit specification into atomic unit symbols. This may require considering SI prefixes, which we allow for a whitelisted subset of atomic unit symbols, listed in the table of unit symbols below. The Unit instance is then created by combining a sympy expression for the unit and the appropriate CGS factors, found by combining the CGS factors of the base unit and optional SI prefixes.
Unit objects are associated with four instance members: a unit Expression object, a dimensionality Expression object, a UnitRegistry instance, and a scalar conversion factor to CGS units. These data are available for a Unit object by accessing the expr, dimensions, registry, and cgs_value attributes, respectively.
Unit provides the methods same_dimensions_as, which returns True if passed a Unit object that has equivalent dimensions, get_cgs_equivalent, which returns the equivalent cgs base units of the Unit, and the is_code_unit property, which is True if the unit is composed purely of code units and False otherwise. Unit also defines the mul, div, pow, and eq operations with other unit objects, making it easy to compose compound units algebraically.
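As a concrete sketch of this algebra (using the import path from the package layout described below; the printed forms are abbreviated and may not match the final __repr__):
>>> from yt.units.unit_object import Unit
>>> u1 = Unit("Msun/Mpc**3")
>>> u2 = Unit("g/cm**3")
>>> u1.same_dimensions_as(u2)    # both are mass / length**3
True
>>> u1.get_cgs_equivalent()      # the cgs base equivalent of u1
g/cm**3
>>> (u1 / u2).dimensions         # dimensions cancel entirely
1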
The UnitRegistry class provides the add, remove, and modify methods, which allow users to add, remove, and modify atomic unit definitions present in UnitRegistry objects. A dictionary lookup table is also attached to the UnitRegistry object, providing an interface to look up unit symbols. In general, unit registries should only be adjusted inside of a code frontend, since otherwise quantities and arrays might be created with inconsistent unit metadata. Once a unit object is created, it will not receive updates if the original unit registry is modified.
We also provide a singleton default_unit_registry instance that frontend developers can copy and modify to build a simulation-specific unit symbol registry.
The YTArray class works by tacking a Unit object onto an ndarray instance. Besides the conversion methods already listed, most of the implementation of YTArray depends on defining all possible ndarray operations on YTArray instances. We want to preserve normal ndarray operations, while getting the correct units on the resulting YTArray (be it in-place or a copy). The proper way to handle operations on ndarray subclasses is explained in the NumPy docs page, Subclassing ndarray. We follow this approach and describe the desired behavior in the next section below.
The code for these new classes will live in a new top-level yt.units package. This package will contain five submodules:
- unit_lookup_table: Contains all static unit metadata used to generate the sympy unit system.
- unit_object: Contains the Unit class.
- unit_registry: Contains the UnitRegistry class.
- yt_array: Contains the YTArray and YTQuantity classes.
- unit_symbols: Contains a host of predefined unit quantities, useful for applying units to raw scalar data.
Creating YTArray and YTQuantity instances¶
In the current implementation, there are two ways to create new array and quantity objects: via the class constructor, or by multiplying scalar data by a unit quantity.
Class Constructor¶
The primary internal interface for creating new arrays and quantities is through the class constructor for YTArray. The constructor takes three arguments. The first argument is the input scalar data, which can be an integer, float, list, or array. The second argument, input_units, is a unit specification, which must be a string or Unit instance. Last, users may optionally supply a UnitRegistry instance, which will be attached to the array. If no UnitRegistry is supplied, the default_unit_registry is used instead.
Unit specification strings must be algebraic combinations of unit symbol names, using standard Python mathematical syntax (i.e. ** for the power function, not ^).
Here is a simple example of YTArray creation:
>>> from yt.units.yt_array import YTArray, YTQuantity
>>> YTArray([1, 2, 3], 'cm')
YTArray([1, 2, 3]) cm
>>> YTQuantity(3, 'J')
3 J
In addition to the class constructor, we have also defined two convenience functions, quan and arr, for quantity and array creation that are attached to the StaticOutput base class. These were added to syntactically simplify the creation of arrays with the UnitRegistry instance associated with a dataset. These functions work exactly like the YTArray and YTQuantity constructors, but pass the UnitRegistry instance attached to the dataset to the underlying constructor call. For example:
>>> from yt.mods import *
>>> ds = load("Enzo_64/DD0043/data0043")
>>> ds.arr([1, 2, 3], 'code_length').in_cgs()
YTArray([ 5.55517285e+26, 1.11103457e+27, 1.66655186e+27]) cm
This example illustrates that the array is being created using ds.unit_registry, rather than the default_unit_registry, for which code_length is equivalent to cm.
Multiplication¶
New YTArray and YTQuantity instances can also be created by multiplying YTArray or YTQuantity instances by float or ndarray instances. To make it easier to create arrays using this mechanism, we have populated the yt.units namespace with predefined YTQuantity instances that correspond to common unit symbol names. For example:
>>> from yt.units import meter, gram, kilogram, second, joule
>>> kilogram*meter**2/second**2 == joule
True
>>> from yt.units import m, kg, s, W
>>> kg*m**2/s**3 == W
True
>>> from yt.units import kilometer
>>> three_kilometers = 3*kilometer
>>> print three_kilometers
3.0 km
>>> from yt.units import gram, kilogram
>>> print gram+kilogram
1001.0 g
>>> print kilogram+gram
1.001 kg
>>> print kilogram/gram
1000.0 dimensionless
Handling code units¶
If users want to work in code units, they can now ask for data in code units, just like any other unit system. For example:
>>> dd["density"].in_units("code_mass/code_length**3")
will return the density field in code units.
Code units are tightly coupled to on-disk parameters. To handle this fact of life, the yt unit system can modify, add, and remove unit symbols via the UnitRegistry.
Associating arrays with a coordinate system¶
To create quantities and arrays in units defined by a simulation coordinate system, we associate a UnitRegistry instance with StaticOutput instances. This unit registry contains the metadata necessary to convert the array to CGS from some other known unit system and is available via the unit_registry attribute that is attached to all StaticOutput instances.
To avoid repetitive references to the unit_registry, we also define two new member functions in the StaticOutput base class, quan and arr. These functions simply pass the appropriate unit_registry object to the YTQuantity and YTArray constructors, returning the resulting quantity or array.
We have modified the definition of set_code_units in the StaticOutput base class. In this new implementation, the predefined code_mass, code_length, code_time, and code_velocity symbols are adjusted to the appropriate values and length_unit, time_unit, mass_unit, and velocity_unit attributes are attached to the StaticOutput instance. If there are frontend-specific code units, like MHD units, they should also be defined in subclasses by extending this function.
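As an illustration of this extension point, a frontend subclass might look roughly like the following. This is a sketch only: the import path for StaticOutput and the exact registry call are assumptions, and the magnetic unit value is a placeholder.
from yt.data_objects.static_output import StaticOutput  # assumed import path


class MyCodeStaticOutput(StaticOutput):
    def set_code_units(self):
        # The base class adjusts code_mass, code_length, code_time, and
        # code_velocity and attaches length_unit, time_unit, mass_unit, and
        # velocity_unit to the instance.
        super(MyCodeStaticOutput, self).set_code_units()
        # Frontend-specific code unit, e.g. for MHD fields (placeholder value;
        # the exact modify() signature is an assumption).
        self.magnetic_unit = self.quan(1.0, "gauss")
        self.unit_registry.modify("code_magnetic", self.magnetic_unit)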
Mixing modified unit registries¶
It becomes necessary to consider mixing unit registries whenever data needs to be compared between disparate datasets. The most straightforward example where this comes up is a cosmological simulation time series, where the code units evolve with time. The problem is quite general – we want to be able to compare any two datasets, even if they are unrelated.
We have designed the unit system to refer to a physical coordinate system based on CGS conversion factors. This means that operations on quantities with different unit registries will always agree since the final calculation is always performed in CGS.
The examples below illustrate the consistency of this choice:
>>> from yt.mods import *
>>> pf1 = load('Enzo_64/DD0002/data0002')
>>> pf2 = load('Enzo_64/DD0043/data0043')
>>> print pf1.length_unit, pf2.length_unit
128 Mpccm/h, 128 Mpccm/h
>>> print pf1.length_unit.in_cgs(), pf2.length_unit.in_cgs()
6.26145538088e+25 cm 5.55517285026e+26 cm
>>> print pf1.length_unit*pf2.length_unit
145359.100149 Mpccm**2/h**2
>>> print pf2.length_unit*pf1.length_unit
1846.7055432 Mpccm**2/h**2
Note that in both cases, the answer is not the seemingly trivial 128**2 = 16384 Mpccm**2/h**2. This is because the new quantity returned by the multiplication operation inherits the unit registry from the left object in binary operations. This convention is enforced for all binary operations on two YTArray objects. In any case, results are always consistent in CGS:
>>> print (pf1.length_unit*pf2.length_unit).in_cgs()
3.4783466935e+52 cm**2
>>> print pf1.length_unit.in_cgs()*pf2.length_unit.in_cgs()
3.4783466935e+52 cm**2
Handling cosmological units¶
We also want to handle comoving length units and the Hubble little "h" unit. In StaticOutput.set_units, we implement this by checking if the simulation is cosmological, and if so, adding comoving units to the dataset's unit registry. Comoving length unit symbols are still named following the pattern "(length symbol)cm", i.e. "pccm".
The little "h" symbol is treated as a base unit, h, which defaults to unity. StaticOutput.set_units should update the h symbol to the correct value when loading a cosmological simulation.
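For example, once a cosmological dataset has been loaded, the comoving symbols and the little "h" defined by set_units can be used like any other unit (a sketch; output omitted):
>>> from yt.mods import *
>>> ds = load("Enzo_64/DD0043/data0043")
>>> dd = ds.h.all_data()
>>> dd['dx'].in_units('kpccm/h')   # comoving kpc per little h, using this dataset's h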
LaTeX printing¶
We will make use of sympy's LaTeX pretty-printing functionality to generate axis and colorbar labels automatically for unit symbols. The LaTeX strings used for atomic units are encoded in the latex_symbol_lut. This is necessary because, for the purposes of LaTeX representation, sympy interprets symbol names as if they were algebraic variables, and so they get displayed using an italic font. Since our symbols represent units, we want to display them in a roman font, and need to wrap them in \rm{}. New units do not need to be explicitly added to the look-up table; by default the LaTeX symbol will simply be the string name of the unit, wrapped using \rm{}.
Using these LaTeX representations of atomic unit symbols, we then use sympy to generate labels, composing the LaTeX expressions for compound units according to the algebraic relationships between the atomic unit symbols.
YTArray operations¶
When working interactively, it is important to make sure quick workflows are possible. To this end, we want to make it possible to use our new dimensionful operations while still leveraging the syntactic simplicity NumPy offers. We want to avoid mandating that all user-defined data be a YTArray or YTQuantity.
To this end, we define operations between native Python objects like float, NumPy float, NumPy ndarray, and YTArray. In the table below, we have enumerated all combinations of YTArray, scalar (native Python float or np.float64), and ndarray for binary operations. In most cases, unitful operations are well defined; however, in cases where the unitful operations are not well defined, we raise a new exception, YTInvalidUnitOperation.
Since NumPy defines in-place, left, and right versions of all mathematical operations (i.e. add, iadd, ladd, radd), we only list the 'basic' version of each operation, with the expectation that the implementation accounts for all four variants, which all have the same behavior with respect to passing units.
Operation | Combination | Result (pseudocode) |
---|---|---|
mul, div, truediv, floordiv | scalar, YTArray; ndarray, YTArray | YTArray, units = input_units (op) 1 |
mul, div, truediv, floordiv | YTArray, YTArray | YTArray, units = left_units (op) right_units |
add, sub | scalar, YTArray; ndarray, YTArray | if YTArray is dimensionless: return YTArray |
add, sub | YTArray, YTArray | if left_units same dimensions as right_units: return YTArray in left_units; else: raise YTInvalidUnitOperation |
pow | scalar, YTArray; ndarray, YTArray | if YTArray is dimensionless: return scalar**YTArray; else: raise YTInvalidUnitOperation |
pow | YTArray, scalar | return YTArray**scalar (note units change) |
pow | YTArray, ndarray | see [1] |
pow | YTArray, YTArray | see [1] |
le, lt, ge, gt, eq | scalar, YTArray; ndarray, YTArray | if YTArray is dimensionless: return YTArray; else: raise YTInvalidUnitOperation |
le, lt, ge, gt, eq | YTArray, YTArray | if left_units same dimensions as right_units: return left (op) (right in left_units); else: raise YTInvalidUnitOperation |
[1] This one is a little tricky, since pow is defined for ndarrays. Technically, it is a well-defined unitful operation if the ndarray is the exponent. Unfortunately, this would make all the elements of the resulting array have different units, so we don't allow it in practice.
Now we list the behavior of unary operations on YTArray objects.
Operation | Result (pseudocode) |
---|---|
abs, sqrt, neg | YTArray |
exp | if YTArray is dimensionless: return exp(YTArray); else: raise YTInvalidUnitOperation |
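To make the tables concrete, here is a brief sketch of the intended behavior (import path per the package layout above; outputs omitted and the exact exception message unspecified):
>>> from yt.units.yt_array import YTArray
>>> a = YTArray([1., 2., 3.], 'g')
>>> b = YTArray([4., 5., 6.], 'cm')
>>> a * b                            # always allowed; units compose to g*cm
>>> a + YTArray([1., 1., 1.], 'kg')  # same dimensions; result carries the left units (g)
>>> a + b                            # mismatched dimensions; raises YTInvalidUnitOperation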
Unit symbol names¶
In the table below we provide a listing of all units that are in the current implementation. We also list the dimensions of the unit, whether the unit is in the whitelist to be prefixable with SI abbreviations, and the adopted CGS conversion factor.
Unit | Symbol name | Dimensions | SI Prefixable? | CGS Conversion factor |
---|---|---|---|---|
Base units | ||||
Gram | g | mass | yes | 1.0 |
Meter | m | length | yes | 100.0 |
Second | s | time | yes | 1.0 |
Kelvin | K | temperature | yes | 1.0 |
Radian | radian | angle | no | 1.0 |
Code units | ||||
Code mass units | code_mass | mass | no | ? |
Code length units | code_length | length | no | ? |
Code time units | code_time | time | no | ? |
Code velocity units | code_velocity | velocity | no | ? |
Code magnetic field units | code_magnetic | magnetic_field | no | ? |
Code temperature units | code_temperature | temperature | no | ? |
Code metallicity units | code_metallicity | metallicity | no | ? |
Normalized domain units | unitary | length | no | Domain width |
Misc CGS | ||||
Dyne | dyne | force | yes | 1.0 |
Erg | erg | energy | yes | 1.0 |
Electrostatic unit | esu | (energy*length)**0.5 | yes | 1.0 |
Gauss | gauss | magnetic_field | yes | 1.0 |
Misc SI | ||||
Joule | J | energy | yes | 1.0e7 |
Watt | W | power | yes | 1.0e7 |
Hertz | Hz | rate | yes | 1.0 |
Imperial units | ||||
Foot | ft | length | no | 30.48 |
Mile | mile | length | no | 160934 |
Cosmological “units” | ||||
Little h | h | dimensionless | no | ? |
Time units | ||||
Minute | min | time | no | 60 |
Hour | hr | time | no | 3600 |
Day | day | time | no | 86400 |
Year | yr | time | yes | 31557600 |
Solar units | ||||
Solar mass | Msun | mass | no | 1.98841586e33 |
Solar radius | Rsun | length | no | 6.9550e10 |
Solar luminosity | Lsun | power | no | 3.8270e33 |
Solar temperature | Tsun | temperature | no | 5870.0 |
Solar metallicity | Zsun | metallicity | no | 0.02041 |
Astronomical distances | ||||
Astronomical unit | AU | length | no | 1.49597871e13 |
Light year | ly | length | no | 9.4605284e17 |
Parsec | pc | length | yes | 3.0856776e18 |
Angles | ||||
Degree | degree | angle | no | pi/180 |
Arcminute | arcmin | angle | no | pi/10800 |
Arcsecond | arcsec | angle | no | pi/648000 |
Milliarcsecond | mas | angle | no | pi/648000000 |
Physical units | ||||
Electron volt | eV | energy | no | 1.602176562e-12 |
Atomic mass unit | amu | mass | no | 1.660538921e-24 |
Electron mass | me | mass | no | 9.10938291e-28 |
Testing¶
We have written a set of unit tests that check to make sure all valid and invalid unit operations succeed or fail as appropriate. We will also need to verify that the extant unit and answer tests pass before this can be accepted.
Backwards Compatibility¶
This is a serious break in backwards compatibility. Once this is accepted, units will no longer be stored in the StaticOutput dict. This means that all scripts which use the pf[unit] construction will no longer be valid. We will also need to eliminate instances of this construction within the yt codebase. We will need to check to make sure the analysis modules and external tools that operate on yt data can either work appropriately with YTArray or figure out a way to degrade to ndarray gracefully.
YTEP-0012: Halo Redesign¶
Abstract¶
Created: March 7, 2013
Author: Britton Smith, Cameron Hummels, Chris Moody, Mark Richardson, Yu Lu
In yt 3.0, operations relating to the analysis of halos (halo finding, merger tree creation, and individual halo analysis) will be brought together into a single framework. This will enable the analysis of individual halos through time without the need to bridge the gap between halo finding, merger tree creation, and halo profiling on one’s own.
Status¶
Completed
Project Management Links¶
Detailed Description¶
Halo Analysis in yt 2.x¶
Currently, analyzing halos from a cosmological simulation works in the following way.
First, a halo finder is run, which produces a halo catalog in the form of an ascii file. Each of the halo finders implemented in yt produces halo catalogs with slightly different formats, including various quantities that also differ. To perform any additional analysis on the halos that have been found, one then uses the HaloProfiler. The HaloProfiler reads the various halo catalogs from their files and performs a limited set of specific functionality, namely one-dimensional radial profiles and projections. There is also a function that accepts a callback function for performing custom analysis. The analysis products for each of these are stored in separate locations. Any figures to be made from these analyses require the user to write their own scripts that are responsible for all file i/o, sorting, and plotting.
Analysis of a halo as it evolves over time first requires the creation of a merger tree. For this to work, the particles that belong to each halo need to have been written out during the halo finding step. Most of the merger trees work by manually specifying a list of dataset filenames over which the merger tree is to be calculated. A separate database file is created that contains the entire merger tree, and helper functions exist to track a given halo's lineage over time. There is currently no comprehensive framework for performing halo time series analysis. With the functionality existing currently, halo time series analysis can be managed in one of two ways. The first, and more expensive by far, is to run the HaloProfiler on all halos in all datasets and then use the merger tree database file to correlate halos from multiple times in the user's hand-built plotting script. The second is to use the merger tree information to specify a single halo index to be analyzed with the HaloProfiler. This is accomplished by creating a filter for a specific halo index, and cannot account for halos coming from multiple parents or having multiple children. There are numerous other ways in which these approaches are very limiting.
Proposed Halo Analysis in yt 3.0¶
All of the functionality described above will be managed by a series of new objects working in a hierarchy. These will be HaloCatalogTimeSeries, HaloCatalog, and Halo, described in further detail below. The files created by the operations described here will allow for the full state of the HaloCatalogTimeSeries object to be restored by running the same commands that were used to create them. This will allow the user to create a single script that will be run first on a supercomputer to perform all of the dataset-requiring analysis and then later on a local machine to load the halo data into memory for further analysis and plotting.
HaloCatalogTimeSeries¶
This object will accept a TimeSeriesData object containing all the datasets to be used. Its two primary functions will be to perform halo finding on all datasets and to create the merger tree. Each of these two steps will be performed with separate function calls, where the arguments given will be a callable halo finding or merger tree function and a dictionary containing additional configuration parameters specific to the provided callables. The data structure contained in memory will be a time-sorted list of HaloCatalog objects, one for each dataset. It will also contain a dictionary of dictionaries showing the merger tree information for each halo. The on-disk format for HaloCatalogTimeSeries objects will likely need to be refined, but will ideally preserve the system of pointers connecting Halo objects to their past and future counterparts (described further in the Halo section). The data stored here will potentially be far too large for a single file. Instead, a system of multiple files that is capable of quickly reconstructing the Halo pointer structure may be better.
The other primary function will be to create halo tables that are flattened numpy structured arrays of various halo quantities from all or a selection of all halos (e.g. by timestep) from all halo catalogs. This will enable easy slicing and plotting of properties from multiple halos.
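To make the intended workflow concrete, a hypothetical usage sketch follows. None of the method names (find_halos, make_merger_tree, halo_table) are fixed by this YTEP; they simply mirror the description above, and the finder and merger tree callables are placeholders.
from yt.mods import *


def my_halo_finder(pf, **kwargs):
    """Placeholder: a real halo finder callable would go here."""


def my_merger_tree(catalogs, **kwargs):
    """Placeholder: a real merger tree generator would go here."""


ts = TimeSeriesData.from_filenames("Enzo_64/DD????/data????")
hcts = HaloCatalogTimeSeries(ts)   # hypothetical class described above

# Each step takes a callable plus a dictionary of configuration parameters.
hcts.find_halos(my_halo_finder, {})
hcts.make_merger_tree(my_merger_tree, {})

# Flattened structured array of selected quantities from all halo catalogs.
table = hcts.halo_table(fields=["particle_mass", "virial_radius"])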
HaloCatalog¶
This will be a light-weight container for Halo objects from a single dataset. It will be responsible for writing Halo objects to and restoring them from disk. It will be a numpy structured array. The manner in which HaloCatalog objects will be written to disk will be specified similarly to how halo finders and merger tree generators are given to HaloCatalogTimeSeries objects, i.e., by providing a callable writer function. This will allow users to write to any number of standardized formats, such as the IRATE format.
Halo¶
The Halo object will contain all quantities associated with a given halo, calculated either during the halo finding step or by analysis performed later. By default, particle information will be saved to disk after halo finding has been performed, since it is required for merger tree generation. However, particle information will not remain attached to the Halo object, since it takes a great deal of memory to store this information.
Instead, there will be several halo quantities calculated at instantiation using these particles, including center-of-mass phase-space coordinates, densest point, shape of halo, and merger tree information (matching against previous and later timesteps to determine the lineage of a halo). However, the option will exist for reloading particle information for later analysis. This technique of frontloading analysis that requires particle information is how some halo finders, such as Rockstar, currently operate.
The Halo object will also have pointers to the Halo objects that are their past and future instances, essentially the most massive pro/postgenitors with the largest overlap of particle inventories. This will allow one to perform any additional analysis on a single halo lineage simply by traversing this linked list. Further analysis on Halo objects will be facilitated by associating quantities with callable functions. If the user attempts to access a halo quantity whose value is not currently stored within the Halo object, it will run the associated callable to create it. At any time, the HaloCatalogTimeSeries object can be directed to update the files on disk with any new quantities that have been calculated.
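Continuing the hypothetical sketch above (hcts is the HaloCatalogTimeSeries from the earlier example), walking a halo's lineage might look like the following; the catalog and halo accessors and the past_self attribute name are assumptions consistent with the description above and the current-progress notes below.
# Hypothetical accessors: grab the latest catalog from a HaloCatalogTimeSeries
# and walk back through a halo's most massive progenitors.
final_catalog = hcts.catalogs[-1]
halo = final_catalog.halos[0]
masses = []
while halo is not None:
    masses.append(halo.quantities["particle_mass"])
    halo = halo.past_self   # None once the lineage ends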
Backwards Compatibility¶
This will be a completely new framework for performing this type of analysis. Other than working with existing halo finders and potentially reading in the output they produce now, there will be no compatibility with pre-existing machinery. For these reasons, this development will be confined to yt 3.0.
Current Progress¶
Development of the new halo analysis is taking place in this repository under the ytep0012 bookmark. This work is being done alongside the unit refactor and thus includes all changes in the unitrefactor bookmark here. The majority of the work is taking place within yt/analysis_modules/halo_analysis. Everything detailed below, except where explicitly noted, has been implemented.
HaloCatalogTimeSeries¶
Not Implemented. This is currently awaiting development of a new merger tree framework.
HaloCatalog¶
This relies on the recently added ability to load a Rockstar halo catalog as a yt dataset, referred to from here on as a halo finder dataset for clarity. A HaloCatalog object is created by providing it with a simulation dataset, a halo finder dataset, or both.
dpf = load("DD0064/DD0064")
hpf = load("rockstar_halos/halos_64.0.bin")
hc = HaloCatalog(halos_pf=hpf, data_pf=dpf,
                 output_dir="halo_catalogs/catalog_0064")
If the halos_pf is not given, halo finding will be done using the method provided with the finder_method keyword (not implemented). A data container can also be given, associated with either dataset, to control the spatial region in which halo analysis will be performed.
Analysis is done by adding actions to the HaloCatalog. Each action is represented by a callback function that will be run on each halo. There are three types of actions.
1. Quantities - a callback that returns a value or values. The return values are stored within the halo object in a dictionary called "quantities." At the end of the analysis, all of these quantities will be written to disk as the final form of the generated "halo catalog."
# definition of the center of mass quantity
def center_of_mass(halo):
    if halo.particles is None:
        raise RuntimeError("Center of mass requires halo to have particle data.")
    return (halo.particles['particle_mass'] *
            np.array([halo.particles['particle_position_x'],
                      halo.particles['particle_position_y'],
                      halo.particles['particle_position_z']])).sum(axis=1) / \
           halo.particles['particle_mass'].sum()

# add to a registry of available quantities
add_quantity('center_of_mass', center_of_mass)

# in the actual halo analysis script
hc.add_quantity("center_of_mass")
The above example is better suited as a parallel-safe derived quantity, but the use is the same.
Instead of being generated from a callback, quantities can also be values pulled directly from the halo finder dataset.
# get the field value ("halos", "particle_mass") for this halo from the halo dataset
hc.add_quantity("particle_mass", field_type="halos")
2. Callbacks - the callback is actually the super class for quantities and filters and is a general purpose function that does something, anything, to a Halo object. This can include hanging new attributes off the Halo object, performing analysis and writing to disk, etc. A callback does not return anything.
In the example below, we create a sphere for a halo with a radius that is twice the saved
“virial_radius” (in the quantities dict), recenter it on the location of maximum density,
then do some profiling, compute virial quantities based on those profiles (storing them in the
quantities dict), and then write the profiles to disk.
hc.add_callback("sphere", radius_field="virial_radius", factor=2.0)
hc.add_callback("sphere_field_max_recenter", ("gas", "density"))
hc.add_callback("profile", "radius", [("gas", "matter_mass"),
("index", "cell_volume")],
weight_field=None, accumulation=True,
output_dir="profiles", storage="profiles")
hc.add_callback("virial_quantities", ["radius", ("gas", "matter_mass")])
hc.add_callback("save_profiles", storage="profiles")
Currently existing stock callbacks:
- sphere creation
- sphere recenter
- sphere bulk velocity
- 1D profiling
- virial quantity calculation based on 1D profiles
- writing profile data
- reloading saved profile data
- removing Halo attributes
- PhasePlot
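A custom callback can be written and registered in the same way as the quantity example above; here the add_callback registration function is assumed to mirror add_quantity, and the printed quantity is just an example.
# a callback acts on the halo and returns nothing
def print_virial_radius(halo):
    print halo.quantities["virial_radius"]

# register it, mirroring the add_quantity pattern shown above
add_callback("print_virial_radius", print_virial_radius)

# queue it in the halo analysis script
hc.add_callback("print_virial_radius")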
3. Filters - a filter is a callback function that returns True or False. If the return value is True, any further queued analysis will proceed and the halo in question will be added to the final catalog. If the return value is False, further analysis will not be performed and the halo will not be included in the final catalog.
hc.add_filter("quantity_value", "matter_mass_200", ">", 1e13, "Msun")
Currently existing stock filters:
- quantity filter (shown above): uses an eval statement with a value stored in the quantities dict
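As a sketch of how a user-defined filter might look, mirroring the add_quantity registration pattern shown earlier (the add_filter registration function and the quantity name used here are assumptions, not part of this YTEP's text):

# hypothetical user-defined filter; names are illustrative
def above_mass_cut(halo, mass_min):
    # True keeps the halo and continues analysis; False skips it
    return halo.quantities["particle_mass"] > mass_min

# register it, analogous to add_quantity
add_filter("above_mass_cut", above_mass_cut)

# in the halo analysis script
hc.add_filter("above_mass_cut", 1e13)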
Running and Order of Operations¶
After all callbacks, quantities, and filters have been added, the analysis begins with a call to
HaloCatalog.create
.
hc.create(save_halos=False, njobs=-1, dynamic=True)
The save_halos
keyword determines whether the actual Halo
objects are saved after analysis on
them has completed or whether just the contents of their quantities dicts will be retained for creating
the final catalog. The looping over halos uses a call to parallel_objects
allowing the user to
control how many processors work on each halo. The final catalog is written to disk in the output
directory given when the HaloCatalog
object was created. The final halo catalog can then
be loaded in as a yt dataset just in the manner of a halo finder dataset.
All callbacks, quantities, and filters are stored in an “actions” list, meaning that they are executed in the same order in which they were added. This enables the use of simple, reusable, single action callbacks that depend on each other. This also prevents unnecessary computation by allowing the user to add filters at multiple stages to skip remaining analysis if it is not warranted.
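For example, a minimal pipeline in which an early filter skips the more expensive callbacks for halos below a mass cut might look like the following (quantity and field names are illustrative, and the sphere callback assumes a “virial_radius” quantity has already been populated):

# actions run in the order they are added
hc.add_quantity("particle_mass", field_type="halos")
hc.add_filter("quantity_value", "particle_mass", ">", 1e12, "Msun")
# halos failing the filter never reach the callbacks below
hc.add_callback("sphere", radius_field="virial_radius", factor=2.0)
hc.add_callback("profile", "radius", [("gas", "matter_mass")],
                weight_field=None, accumulation=True)
hc.create()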
Reloading a Halo Catalog¶
A HaloCatalog
saved to disk can be reloaded as a yt dataset with the standard call to load
. Any
side data, such as profiles, can be reloaded with a load_profiles
callback and a call to
HaloCatalog.load
.
from yt.mods import *
from yt.analysis_modules.halo_analysis.api import *
hpf = load("halo_catalogs/catalog_0046/catalog_0046.0.h5")
hc = HaloCatalog(halos_pf=hpf,
output_dir="halo_catalogs/catalog_0046")
hc.add_callback("load_profiles", output_dir="profiles",
filename="virial_profiles")
hc.load()
The load
and create
functions are wrappers around a _run
function responsible for looping over
all the halos and applying callbacks. The only difference between the two is that their default keyword
arguments are specifically tailored for creating (do not retain Halo
objects, do write catalog) or
rereading catalogs (do retain Halo
objects, do not write catalog).
Halo
¶
Halo
objects are created by the HaloCatalog
during the call to HaloCatalog.create or HaloCatalog.load. They are
mainly meant to be temporary objects for retaining information so that it can be passed between callbacks.
They can be kept by specifying save_halos=True. This might be useful when working with a time series of
halo catalogs where future_self and past_self attributes may act as pointers to Halo
objects within
HaloCatalogs
that are time-adjacent.
Remaining Work¶
See the Trello board.
All are welcome to get involved with development, testing, etc!
YTEP-0013: Deposited Particle Fields¶
Abstract¶
Created: April 25, 2013
Author: Chris Moody, Matthew Turk, Britton Smith, Doug Rudd, Sam Leitner
The majority of the yt codebase is currently built around Eulerian, grid or cell-like quantities. In order to use particle quantities, we typically have to deposit particles and essentially make them look like fluid quantities. This YTEP details the suggested deposition process, how to implement it, how to extend and subclass it, and suggested syntax.
This should improve particle support for Octrees and SPH codes dramatically, and extend particle deposition syntax for grid-patch codes.
Note that while this covers the initial implementation of particle deposition, that deposition does not include smoothing kernels that utilize extended regions outside of the target region. SPH smoothing kernel implementations will be defined in a subsequent YTEP.
Furthermore, we note that this fundamentally describes an interface between the particles and an index. For Eulerian codes, the index corresponds to the fluids; however, for SPH and N-body systems, this is not the case.
Status¶
Completed
Detailed Description¶
Particle Deposition in yt 2.x¶
Currently, particle deposition for grid-patch codes works by querying particle fields and supplying them to a routine like CIC_Deposit3. It is non-trivial to extend this CIC function to octree codes but essential to making SPH codes interoperable with the yt codebase.
Proposed Syntax¶
The names of deposited fields can be user-defined, and thus are not explicitly
restricted. However, as deposited fields become more common in the yt
framework and libraries begin to expect and depend on particular names,
we suggest that field names be written as ("deposit", "pname_poperation")
where pname
is the name of the particle type and poperation
is some
semantically-meaningful description of the operation. deposit
is defined
as a fluid type in all frontends. This indicates that the returned array is
shaped like a fluid field and not particle-shaped. This is distinct from
gas
as we may have conflicting or overlapping field definitions.
Example Deposited Field¶
Below is an example particle deposition field defined in Python:
@derived_field(name = ("deposit", "particle_count"),
validators=[ValidateSpatial()])
def particle_count(field, data):
pos = np.column_stack([data["particle_position_%s" % ax]
for ax in 'xyz'])
return data.deposit(pos, method = "count")
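Once defined, such a field would be accessed like any other fluid field; a hypothetical usage (the dataset path is illustrative) might be:

pf = load("DD0064/DD0064")
ad = pf.h.all_data()
# returns a mesh-shaped (not particle-shaped) array of counts per cell
counts = ad[("deposit", "particle_count")]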
Changes to Frontend Code¶
We exploit the fact that the octree frontends share a common
base class OctreeSubset(YTSelectionContainer)
to create a common
function deposit()
. The patch-based codes have an analogous
AMRGridPatch(YTSelectionContainer)
. The deposition is passed
the particle positions, the particle fields required, and the
deposition method: deposit(positions, field = None, method = None)
.
The deposit function uses method
to lookup a Cython
ParticleDepositOperation
class in particle_deposit.pyx
. This class
defines the deposition procedure in three steps, which deposit
calls
sequentially. The first ParticleDepositOperation
member function
is initialize
which allocates the memory
required to hold temporary arrays for the deposition of particles into
grids or octs. Extra temporary arrays are useful when a reduction of data
must occur after we have looped through all particles. The next step
is either process_octree
or process_grid
where we loop over all
particles, find the oct or cell in an octree or grid (respectively). Once
found, we call process(dims, left_edge, dds, 0, pos, field_vals)
which
relates a single particle, its associated cell, and the incremental deposited
value. The last step finalize
reduces the data from the temporary arrays
and returns an oct-shaped or grid-shaped array.
This organization allows us to separate the particle lookup in a grid or
octree from the deposition operation we would like to perform.
Example Cython Code¶
Below we include an example of the base particle deposit class with most of the Cython type definitions removed for legibility:
cdef class ParticleDepositOperation:
    # NOTE: type declarations and per-particle temporaries (pos, oi, dims,
    # field_vals, ...) are elided here for legibility.
    def initialize(self, *args): raise NotImplementedError
    def finalize(self, *args): raise NotImplementedError
    def process_octree(self, octree, dom_ind, positions, fields = None):
        # Locate the oct containing each particle, then hand the particle
        # and its cell information to process().
        for i in range(positions.shape[0]):
            oct = octree.get(pos, &oi)
            offset = dom_ind[oct.ind]
            self.process(dims, oi.left_edge, oi.dds,
                         offset, pos, field_vals)
    def process_grid(self, gobj, positions, fields = None):
        # Same pattern as process_octree, but for a single grid patch.
        for i in range(positions.shape[0]):
            for j in range(3):
                pos[j] = positions[i, j]
            self.process(dims, left_edge, dds, 0, pos, field_vals)
    def process(self, *args): raise NotImplementedError
Below we subclass the template above to deposit a particle count,
taking care to override initialize
, process
and finalize
but leaving grid traversal in process_octree/grid
alone, ensuring
that this will work with grid and octree codes:
cdef class CountParticles(ParticleDepositOperation):
    def initialize(self):
        # One accumulation array, shaped like the target octs/cells.
        self.ocount = np.zeros(self.nvals, dtype="float64")
        cdef np.ndarray arr = self.ocount
        self.count = <np.float64_t*> arr.data
    @cython.cdivision(True)
    cdef void process(self, int dim[3], left_edge[3], dds[3], offset,
                      ppos[3], *fields):
        # Convert the particle position into a cell index and increment it.
        cdef int ii[3], i
        for i in range(3):
            ii[i] = <int>((ppos[i] - left_edge[i])/dds[i])
        self.count[gind(ii[0], ii[1], ii[2], dim) + offset] += 1
    def finalize(self):
        return self.ocount
Using the templates and organizational scheme proposed here, one can define fields with arbitrary particle selections (e.g. young stars), perform arbitrary accumulations (e.g. count, sum, or std), loops over all of the particles multiple times, and switch between cloud-in-cell, SPH smoothing kernel, or simple direct deposition.
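To make the count/sum distinction concrete, here is a rough pure-NumPy analogue of the initialize/process/finalize pattern for a single grid; this only illustrates the logic and is not the Cython implementation (boundary clipping and periodicity are omitted):

import numpy as np

def deposit_count_and_sum(positions, masses, left_edge, dds, dims):
    # initialize: temporary accumulation arrays, one value per cell
    count = np.zeros(dims, dtype="float64")
    msum = np.zeros(dims, dtype="float64")
    # process: map each particle position to a cell index and accumulate
    idx = ((positions - left_edge) / dds).astype("int64")
    np.add.at(count, tuple(idx.T), 1.0)
    np.add.at(msum, tuple(idx.T), masses)
    # finalize: return the mesh-shaped results
    return count, msum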
Future SPH Kernel¶
A process very similar to this will be utilized in the future to conduct smoothing kernel operations. This will require two operations:
- Iteration over the Octs, rather than the particles, and selection of particles based on proximity to an Oct
- An octree selector that has lee-way in its selection of particles; i.e., particles can be fed in as having a
dx
that allows them to be selected by octs within which they do not directly reside.
We may find that this specific operation is too slow for applying the smoothing kernel, in which case other options will be explored.
An initial implementation of this operation is contained in
yt/geometry/particle_smooth.pyx
.
Backwards Compatibility¶
This has no backwards incompatible changes.
Alternatives¶
We were unable to identify any.
YTEP-0014: Field Filters¶
Abstract¶
Created: July 2nd, 2013 Author: Matthew Turk
This YTEP outlines a method for defining generic, evaluated filters to apply to particles used in derived fields. Currently it does not extend to fluid quantities, as that will require rethinking the method of presenting and handling Eulerian quantities.
Status¶
Proposed. Target is 3.0a3.
Detailed Description¶
Currently, filtering particles is done ad-hoc by derived fields. Typically something like this is done:
def finest_particles(field, data):
    filter = data["ParticleMassMsun"] <= 340000
    pos = data["Coordinates"][filter, :]
    d = data.deposit(pos, [data["all", "ParticleMass"][filter]],
                     method = _method)
    d /= data["CellVolume"]
    return d
This is not ideal, as it requires new fields to be defined for every single particle filtering and field combination. Every derived field that is to be filtered must be defined individually, including derived fields that are used as dependencies in another field. This is not workable, and a new mechanism for defining filtered particle types is needed. However, rather than declaring a completely new domain-specific language for defining particles to select inside a given field specification, this YTEP defines a method for declaring filters that can be applied to particles inside a contextmanager. This means that all particles used inside the context manager will be pre-filtered.
However, to avoid over-complication, the filtering step will be defined inside functions similar to derived fields and will not be auto-detected. Instead, all filters defined will be allowed to be applied and in the case of a filtering-needed field not being found, an exception will be allowed to be raised. However, field dependencies noted in the creation of a filter will be taken into account when filters are added to a given dataset.
These filters will be added and viewed as a new particle type. For instance, if a dataset has only “all” particles, a new filter could be added that filtered out particles that should be regarded as “star” particles and that filter will then be presented as a new particle type “star”.
Filters are only meant to act on homogeneous groups of particles. For instance, a given filter could not select sets of particles with heterogeneous attributes and combine them.
Here is an example filter that would accomplish the filtering task shown above:
def finest_particles(pfilter, data):
    filter = data["all", "ParticleMassMsun"] <= 340000
    return filter

add_particle_filter("finest_particles",
                    function = finest_particles,
                    filtered_type = "all",
                    requires=["ParticleMassMsun"])

ds = load("DD0040/DD0040")
sp = ds.h.sphere("max", (1.0, 'mpc'))
sp.quantities["TotalQuantity"]([("finest_particles", "ParticleMassMsun")])
However, more complex filters could be defined as well, relying on additional fields. Furthermore, a side-effect of future particle field definitions being generated for specific particle types (described in issue 598) would be that any particle filters defined would also have any particle derived fields added on a per-particle-type basis added automatically. The addition of field definitions will occur during the creation of derived fields.
Note that we do explicitly specify field dependencies in these particle types. This may cause issues, as derived fields will first need to be identified, then particle filters, then any derived fields for those new ad-hoc particle types will be added. This will require some refactoring of field detection methods, which will overall serve to improve the reliability of the code base and field detection mechanisms.
Derived fields based on filtered particles are not currently available; only derived fields that work on the filtered particle type will be available.
Adding this filtering mechanism will also considerably simplify the Enzo
frontend, as currently the Enzo frontend defines several different methods for
identifying star particles. (Other frontends, where attributes of particles
separate them into different classes, will also benefit.) As an example, for
Enzo 2.X-class simulations, the definition of a star is through either a
creation_time
attribute or a particle_type
attribute. This will enable the
definition of both filters, and only the one that is applicable to the specific
dataset will be added and applied.
def star_creation_time(pfilter, data):
    filter = data["all", "creation_time"] > 0
    return filter

add_particle_filter("star",
                    function = star_creation_time,
                    filtered_type = "all",
                    requires = ["creation_time"])

def star_particle_type(pfilter, data):
    filter = data["all", "particle_type"] == 2
    return filter

add_particle_filter("star",
                    function = star_particle_type,
                    filtered_type = "all",
                    requires = ["particle_type"])
The correct filter will be identified and added to a dataset. Filters are distinct from types in the sense that types have a fast-path that can be passed down to IO functions; for instance, this may be because particles are stored in a separate location or IO routines are able to quickly identify those particles that are able to be loaded. Filters are better thought of as a set of generic validation and selection routines, where those filters are difficult or impossible to pass into the IO routines in a general way.
Since this is a multi-map to filter names, we will not be able to store filters in a dict-like object, or we will at the very least have to return a list of possible filters when accessing via dict. This will likely not serve as a large barrier, as the set of filters will not be user-exposed.
In addition to this, we will define a similar system for filters as is done for
fields, in that a hierarchy of filtering databases will be available. The base
or universal filters will be available across codes (suitable, for instance, in
direct cross-code comparison) and then frontend-specific filters can be
created. This will enable degeneracies of field names and so on to be
eliminated. The first implementation will require manual calling of
add_particle_filter
on StaticOutput
subclasses before instantiation
of a hierarchy.
However, unlike derived fields, because these filters define actual new particle types, they will not by-default be applied universally, but instead universal filters will need to be activated by the user. Frontends can decide on a frontend-by-frontend basis whether or not new frontend-specific filters will be added by default.
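One possible activation pattern (the exact method name on the dataset is an assumption for this YTEP) would be to turn a universal filter on for a particular dataset explicitly:

ds = load("DD0040/DD0040")
# explicitly activate the universal "star" filter for this dataset
ds.add_particle_filter("star")
ad = ds.h.all_data()
star_mass = ad[("star", "ParticleMassMsun")]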
Backwards Compatibility¶
This should not break any backwards compatibility by itself. However, should functions in yt begin to rely on these filters, those functions will no longer be backwards compatible.
Alternatives¶
I have not presently identified any alternatives, other than construction of a domain-specific language for describing filters that would then be embedded in the particle type. I believe that will raise complexity considerably.
YTEP-0015: Transfer Function Refactor¶
Abstract¶
Created: August 13, 2013 Author: Sam Skillman
This YTEP proposes a fundamental change in the way transfer functions are constructed, modified, and implemented. The overall goal is to decrease the overhead and difficulty in constructing high-quality transfer functions and displaying their current state to the user.
Status¶
In Progress
Project Management Links¶
PR Under Development: https://bitbucket.org/yt_analysis/yt/pull-request/538/transfer-function-helper/diff
Detailed Description¶
- Transfer functions are currently:
- Fragile – to log/linear, ranges, field swapping
- Complex – must have prior knowledge of data to construct valid TF
- Difficult to Design – User must guess where interesting features will be.
The aim of this refactoring is to alleviate these three problems. To do so, we will implement several helper functions that automate many of the actions commonly used during the design process of the transfer function. Several operations may be costly, and thus will not be done automatically but rather upon request by the user.
This splits up the TransferFunction into two pieces – the TransferFunction and TransferFunctionData. The former encompasses all the user-facing API in terms of designing and modifying a transfer function, and the latter contains the data needed by the volume renderer.
Suggested TF Structure:
class TransferFunctionData(object):
    """
    Contains the data used by the Camera to actually do the volume
    rendering. Not accessed by the user in most circumstances. This
    contains most of what the TransferFunction used to be.
    """

class TransferFunction(object):
    def __init__(self, data_source=None):
        self.data_source = data_source
        self.pf = self.data_source.pf
        self.rgb_field = None
        self.bounds = None
        self.alpha_field = None
        self._valid = False
        self.transfer_function_data = None

    def smart_build(self):
        """
        Automatically set up best guess bounds, and run initial 1D
        profiling of given field. We could make this as automatic
        or not as we want.
        """
        pass

    def set_field(self, field):
        """
        Sets the rgb channel to be linked to a given field, invalidating
        the current profiles/ranges if different than current field.
        """
        if field == self.rgb_field:
            return
        self._valid = False
        assert (field in self.pf.h.field_list)
        self.rgb_field = field

    def _get_field_bounds(self):
        return self.data_source.quantities['Extrema'](self.field)[0]

    def set_bounds(self, bounds=None):
        if bounds is None:
            bounds = self._get_field_bounds()
        # Do error checking on log/linear state of rendering.
        self.bounds = bounds

    def _get_1D_field_histogram(self):
        """
        Calculates 1D profile (in mass/volume/count) of current field to
        aid in placement of transfer function features.
        """
        pass

    def plot/show/display(self):
        """
        plots, shows, or displays current TF based on how the user is
        interacting with yt. This could save an image to tf_tmp.png,
        display in an interactive matplotlib backend, display in an IPython
        notebook, or directly interact with the user's visual cortex.
        """

    # add in all the transfer function modifiers here (gaussians, layers,
    # ramps, map_to_colormap, etc.)

    def set_log(self, log=True):
        self.log = log

    def clear(self):
        """Clears out the channel values, but leaves the bounds intact"""
        pass

    def _get_tf_data(self):
        """
        This is what the Cameras call to get the TF information. This does
        error checking to make sure the transfer function is valid."""
        if not self._valid:
            # Rebuild TransferFunctionData
            pass
        return self.transfer_function_data
After this is implemented, the usage pattern I would see would be something like:
tf = TransferFunction(pf.h.all_data())
tf.set_field('Density')
tf.smart_build() #<--- maybe another name like: auto_build or auto_awesome
tf.display() #<--- Should we make this automatically display if possible?
cam.set_transfer_function(tf) #<---- links a camera to this tf.
# Alternatively we could have done tf = cam.transfer_function and modified
# the camera's tf directly.
tf.set_log(True) # <--- invalidates the TF
tf.do_whatever_modifications(...)
cam.snapshot()
Backwards Compatibility¶
This change will break backwards compatibility with how old TransferFunctions were constructed.
YTEP-0016: Volume Traversal¶
Abstract¶
Created: September 10, 2013 Author: Matthew Turk
yt should consider volume traversal, accumulation of data, and flexible definitions of paths to be first-class operations as well as implementable by individuals. Essentially, we need a method for describing “derived values for volumes”.
Status¶
In progress
Project Management Links¶
This is being done in the bookmark cylindrical_rendering
in
http://bitbucket.org/MatthewTurk/yt-3.0 .
Detailed Description¶
Currently, the only mechanisms for studying or understanding data in yt are contained in the following procedures:
- Derived quantities (generating scalars from fields in volumes)
- Derived fields (generating fields from other fields)
- Contour identification
- Ray casting (on or off axis; i.e., projections or volume rendering)
- Streamlines
- Surface extraction
Several of these items utilize brick decomposition, but not all. This is the process by which overlapping grids are broken apart until a full tessellation of the domain (or data source) is created with no overlaps. This is done by the kD-tree decomposition.
What this YTEP proposes is to make handling tiles of data first class, as well as provide easy mechanisms for creating volume traversal mechanisms. There are two components to this: handling tiles of data, and creating fast methods for passing through the data and moving between tiles.
Note that this YTEP does not (yet) address situations where the mesh of the simulation is too large to fit into memory on a single node. Because the kD-tree is able to build in parallel, this essentially will amount to distributing regions of the dataset (where the mesh may not be known) to individual processors, allowing build on those processors, and enabling individuals to describe reduction steps in their operators.
Brick Iteration¶
Currently, in yt-3.0, all data objects expose a “block” iterator that returns data containers as well as masks of data. A similar iterator should exist for iterating over the “tiles” that compose a given data object. How this should behave is somewhat open for discussion, as the kD-tree itself has a notion of a ‘viewpoint traversal’ which may be important. Furthermore, it is not necessarily true that the traversal will be easily defined. As an example of this, tiles may need to traversed according to extrema in some fluid.
Traversal Orders¶
The order that tiles are returned to the individual should be flexible and extensible. A few predefined orders should be implemented:
- Depth-first traversal
- View-point traversal
- Breadth-first traversal
Additionally, these should allow for front-to-back or back-to-front yielding. An example API for this would be:
data_source = ds.h.sphere(c, (10.0, 'mpc'))
for brick in data_source.tiles.depth_traverse():
    operation(brick)
By default, depth_traverse()
would also be exposed for simply iterating
over the tiles
object. Additional traversals could be extensibly defined.
Many of these traversals already exist for the AMR kD-tree. A traversal
should be able to be defined and executed from the following set of
information:
- kD-tree object
- Arguments defining the traversal itself
- Data-centering (optional) and fields (optional)
It should yield PartitionedGrid
objects.
Return Values¶
Most importantly, the notion of what is returned by this system needs to be defined: what a brick is and what it possesses. There are several options:
- An empty CoveringGrid that knows how to read data.
- A filled (i.e., data pre-specified) PartitionedGrid, where vertex or cell-centered data must be specified.
- A slice object and a grid object.
- A new object, designed for this system, which acts as a superset of PartitionedGrid. This object would include connectivity information as well, as it would not be independent of the tree itself. The PartitionedGrid could be modified to fit this.
Regardless of which object is returned, at a minimum a kD-tree (or other partitioning) must be created when requested, at the call to tiles, potentially cached, and then objects iterated over. Each of these tiles is guaranteed to be the finest data available within the region they cover, and they are guaranteed not to overlap with any others.
For the purposes of this YTEP, we will assume the fourth option. If the
PartitionedGrid
object were to be extended, I believe it would likely be best
to extend it as follows. Note that for many of these operations we implicitly
assume that it is operating on a grid patch; for octree codes, the creation of
this object will be considerably simpler, and for particle codes we simply
define these as the leaf nodes from the octree index itself. Because we need
to handle particle codes, we must also ensure that these objects can query
particles.
- Cache a slice of the grid or data object that it operates on. (For situations where it fully encompasses the parent region, it need not have a slice.)
- Create a mechanism for filtering particles from the data object it operates on.
- Enable the object to query new fields from its source object. This means that at instantiation time we may not regard the object as having a given field, but that this field can be added at a later time by querying.
- Provide a mechanism for identifying neighbor objects from a given face index. This is the connectivity relationship described above; given any one cell that resides on the boundary of a brick, return the brick (which may or may not be a leaf node) that is adjacent. This would enable identifying the leaf node at a given location within that boundary cell, which may reside at a higher level of refinement and could thus correspond to multiple tiles. This degeneracy results from the fact that we cannot guarantee that neighboring tiles differ by only a single level of refinement. However, because this will be defined at the Python level, rather than specifically for well-defined traversal operations, this is acceptable as we should leave open to the individual how to select the appropriate cell or what to do with it.
- Provide mechanisms for generating vertex-centered data or cell-centered data quickly.
At the present time, a simple first-pass at implementation could occur with the following:
- Implement a tiles routine that mandates supplying fields to cache or load, the vertex or cell centering of data, and a viewpoint traversal scheme.
- Cache a kD-tree based on these tiles.
- Iteratively yield tiles from this tree based on the traversal specified above.
The interface for these tiles, at a minimum, must expose that of the
PartitionedGrid
with one modification: fields should be accessible by
__getitem__
, so that any possible changes in the future that would expose
this would be backwards compatible with usages now.
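In practice, a traversal loop could then look roughly like the following (the field name is illustrative):

for brick in data_source.tiles.depth_traverse():
    # fields are accessible on each tile via __getitem__
    rho = brick["Density"]
    operation(rho)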
Volume Traversal¶
The second aspect of this YTEP is to define a mechanism for integrating paths through tiles. Currently we do this through strict vectors that cannot be re-entrant into a grid; these vectors cannot change path along the way, and the number of them is fixed at the time of the first grid traversal.
As currently implemented, flexibility in volume traversal is defined in terms
of the mechanism by which values are accumulated. This includes the definition
of these objects, all inside grid_traversal.pyx
:
- ImageContainer
- ImageAccumulator
- sampler_function
- Accumulator data (i.e., VolumeRenderAccumulator)
Essentially, for a given image type, a sampler can be defined. This sampler
receives the following arguments: VolumeContainer
(a C-interface to a
partitioned grid with nogil
), v_pos
(vector position), v_dir
(vector direction), enter_t
(cell-entrance in terms of the parameter),
exit_t
(exit time based on the vector at time of entrance), index
(index into the data) and data
, a pointer to an accumulator (i.e.,
ImageAccumulator
) object.
These sampler functions are called for every index that a vector traverses.
The volumes themselves are traversed inside walk_volume
(and, in the
nascent cylindrical volume rendering bookmark, walk_cylindrical_volume
).
This assumes cartesian coordinates and simply calls the sampler_function
for every zone that is crossed. This enables volume rendering, projecting and
so on to be conducted simply by swapping out the sampler function and correctly
interpreting an image object returned.
However, this is not sufficient for arbitrary traversals or arbitrary data collection. We need flexibility to define the following things:
- The mechanism of traversing blocks of data (covered at a higher level by the kD-tree itself, and not necessarily a part of this YTEP)
- Bootstrapping traversal of a volume by a given ray object. This would include identifying the zones that a ray first encounters and setting its initial time of intersection.
- Defining a mechanism for updating the indices in the volume that a ray will intersect next
- Defining a method for determining when a ray has left an object
- Defining a method for selecting the next brick to traverse or connect to
- Updating the value of a ray’s direction
Many of the problems can be seen simply by considering cylindrical volume rendering itself. If the view point is somewhere outside the cylinder looking toward it, rays from an orthonormal image plane will each construct a chord through the cylindrical shells. These chords will each span up to pi along the theta direction, and can have the following properties in their traversal:
- dtheta/dl can switch signs
- grids can be periodic with themselves
- dr/dl can switch signs
- a ray can exit a grid off the r boundary and then re-enter it later in the computation
- Both dtheta/dl and dr/dl change with each update of the ray’s position, and are not even constant over a single zone.
While this demonstrates some of the complexity, we also want to be able to support translating streamlines, clump finding and even gravitational lensing into this new mechanism for traversing volumes.
Therefore, we need a new mechanism that abstracts (independently) both the collecting or accumulating of data as well as the mechanism by which a given ray traverses a patch of data, whether that patch is one or several cells large. In this manner we will remain neutral to the nature of the data container, which may be an octree, a kD-tree, or a single grid.
Flow Control¶
- At the outermost level, tiles will be traversed in Python, and a collection of rays (either in an ImagePlane or some other object) will be handed each brick as it comes.
- Each ray will be "bootstrapped" onto a brick. This will result either in a traversal or an immediate return. (At a later time we will consider fast evaluation of which rays to consider.)
- Each cell traversed by the ray will be “sampled” in some way.
- The ray state (location, index, direction, etc) will be updated.
- Rays will traverse until they leave a brick.
- The next brick will be identified, either from ray positions or from the traversal at the python level.
Note that this does not yet enable a ray to request the next brick at a given position, which will be necessary. However, for the purposes of this iteration of the YTEP, we take it as given that such communication will be defined at a later time, or will be handled on a ray-by-ray basis, where the iteration is managed for each ray individually.
Objects to Manage¶
To accommodate the flow control outlined above, the following classes will need to be implemented, with the following specifications. These will be in Cython. A base class (listed below) will form the basis for each type of traversal.
struct ray_state:
    np.float64_t v_dir[3]
    np.float64_t v_pos[3]
    np.float64_t tmax[3]
    int ind[3]
    int step[3]
    np.float64_t enter_t
    np.float64_t exit_t
    void *sdata

class GeometryTraversal:
    # set values like domain size or whatever is necessary here
    def __init__(self, parameter_file)
    # Return whether the ray hits the vc or not
    cdef int initialize_ray(self, ray_state *ray, VolumeContainer *vc) nogil
    cdef int increment_ray(self, ray_state *ray, VolumeContainer *vc) nogil
    cdef np.float64_t intersection(self,
              np.float64_t val, int axis, np.float64_t v_dir[3],
              np.float64_t v_pos[3]) nogil
    cdef int walk_volume(self, VolumeContainer *vc, sampler_function *sampler,
              ray_state *ray, np.float64_t *return_t = ?) nogil
The ray_state
object will be independent of the geometry, and will always
refer to the cartesian state of the ray. A given geometry traversal will set
up the ray state (i.e., where it intersects with a volume container) and how to
increment the ray state as zones are crossed. The initialize_ray
function
will determine the state of the ray as it first touches a brick, and will
return 0 or 1 if the ray is inside that brick. The increment_ray
function
will receive a ray and determine the crossing time in the parameter t
that
the ray uses as it passes through a cell. The return value is 0 for the ray
having left the object and 1 for the ray being within the object and the
sampler function needing to be called. intersection
will get the position
at which a ray intersects a given value, and walk_volume
will typically be
described in the base class and not overridden elsewhere. Part of the level of
abstraction is to enable walk_volume
to largely be the same for each
geometry, but enabling it to be overridden means we can use the same traversal
for other operations such as clump finding and so on.
As a first implementation, the following classes will need to be implemented:
CartesianTraversal
PolarTraversal
CylindricalTraversal
At a later time, the SphericalTraversal
object can be implemented.
Backwards Compatibility¶
This should retain all backwards compatibility for cartesian systems.
Alternatives¶
I’m not sure of any alternatives currently.
YTEP-0017: Domain-Specific Output Types¶
Abstract¶
Created: September 18, 2013 Author: Matthew Turk and Anthony Scopatz
This YTEP is designed to begin the process of generalizing astrophysics-specific components of yt toward applications in other domains.
Project Management Links¶
The first phase pull request, which is contingent on this being accepted, is here:
Detailed Description¶
Currently, yt is extremely strongly focused on astrophysical data. This leads
to the inclusion of attributes such as cosmological_simulation
,
current_redshift
and so on, as well as some other fundamental attributes. Even within
astrophysical simulations, these can be irrelevant or unnecessary.
Furthermore, there may be attributes relevant to other domains (that transcend
a single subclass of StaticOutput
) that may be relevant or necessary.
This concept of branding things extends even to the level of the commonly-used
variable name pf
, which originated within the original Enzo usage as
shorthand for “parameter file,” and the name StaticOutput
as in contrast to
the “streaming” movie format within Enzo. In order to effectively move beyond
both astro- and Enzo-centrism, the terminology, attributes, and extensibility
of datasets should be emphasized and defined.
Problematic Areas¶
Attributes on StaticOutput¶
The following attributes are defined on every StaticOutput
regardless of
whether the dataset is astrophysics, cosmology, or even rectilinear cartesian
mesh.
current_time
(note: this also is not correctly implemented for Enzo)domain_dimensions
domain_left_edge
domain_right_edge
cosmological_simulation
current_redshift
omega_lambda
omega_matter
hubble_constant
Even if cosmological_simulation
is set to off, the cosmology-related
parameters will be defined. Additionally, the default “field type” is gas
,
which is globally set and not necessarily trivial to modify. Changing the
units to be less astro-specific (which may not be necessary for length units)
is part of a larger units-related discussion, rather than part of this YTEP.
Additionally, StaticOutput
is tied extremely strongly to a file on disk.
Because that is largely internally-facing, changing that may not be subject to
a YTEP, but rather a simple refactoring.
Finally, not all simulation types have a concept of domain_dimensions
, even
if the indexing system does. This is currently outside the scope of this YTEP.
The domain left and right edges also do not always matter for particle
simulations (except in non-outflow boundary conditions) but are still always
relevant to the indexing system.
Below are a few suggested mechanisms for retaining this information as “first class” attributes of a given data set when appropriate, but to remove it from those datasets where it is not appropriate.
Naming and Branding¶
Objects will be renamed:
- StaticOutput will be renamed to Dataset.
- TimeSeriesData will be renamed to DatasetSeries and will no longer exclusively refer to a time-related set of data, but instead include arbitrary collections of datasets.
- Instead of pf as shorthand, we will use ds.
- GeometryHandler will be renamed to Index.
Currently, all datasets expose a .hierarchy
attribute, shortened to .h
.
This naming is a holdover from the time when Enzo and other patch-based AMR
datasets were the primary data examined with yt. However, this makes
considerably less sense when seen in light of support of particle datasets,
semi-structured datasets, unigrid datasets, and eventually unstructured mesh
datasets. What we really mean when we say .hierarchy
or .h
is index
or geometry. Currently, the StaticOutput
object also possesses a
.geometry
attribute, although this is a string scalar.
I do not think we should replace the .h
attribute wholesale, and I do not
necessarily think that data objects should necessarily directly hang off of the
StaticOutput
(or whatever it is renamed) object. However, I do think that
we should eliminate hierarchy
in favor of something more generic that is
more descriptive, and we should consider alternates for creating data objects.
Regardless of what we decide on, the .h
attribute should remain for the
time being, and we should also not instantiate our indexing method until
requested.
The resolution decided upon during discussion has been:
- Eliminate the hierarchy object as a name. geometry seems to be the most popular choice for what the GeometryHandler object does.
- Retain the h attribute as an alias (for now, possibly forever).
- Each dataset will have an index property which will be a GridIndex, OctIndex, etc. This is essentially the same as the Hierarchy attribute.
- Move data objects up to the top level of Dataset (see the brief sketch below).
Domain-Specific Datasets¶
Because some domains will have fundamental parameters that put into context the data they represent, this YTEP proposes a plugin system wherein domain-specific “contexts” register themselves and specific frontends identify which plugins are applicable to that specific frontend. This dual-ended handshaking helps ensure that plugins ensure they are applicable to a frontend, and that frontends identify potential plugins that work for them.
A domain plugin (called DomainContext
) will operate on a dataset
object, adding new attributes, but not new methods. This violates common
object-oriented philosophy and practice, but from an implementation perspective
it seems to be the cleanest approach and the one that avoids the most meta-programming.
On instantiation, a static output normally goes through these steps:
_parse_parameter_file
_setup_coordinate_handler
_set_units
_set_derived_attrs
print_key_parameters
create_field_info
This YTEP would propose changing this order to:
_parse_parameter_file
_setup_coordinate_handler
_set_units
_set_derived_attrs
_apply_domain_contexts
create_field_info
print_key_parameters
_apply_domain_contexts
would iterate through the intersecting set of
globally and frontend-specific registered domain-specific plugins, and for each
one would call the class method: is_appropriate
supplying the dataset
object (self
) as the only argument. If the plugin is appropriate, it would then return
True and an instance of it would be appended to the dataset property
domain_contexts
(or some other name, as this collides with domain_*
referring to simulation spatial information.) Alternately, we could mandate an
_adapt_*
method (seen below) and in the absence of such a method assume the
plugin is blacklisted.
These plugins would then, in sequence, have their apply
method called with
the dataset as the only argument. They can then add additional attributes to
the dataset, as well as additional key parameters to print out. The runtime
overhead should be negligible.
This extends further to the compartmentalization of field definitions. We
leave that somewhat unspecified here, but domain contexts should enable the
application of specific field objects based on runtime parameters. This could
mean, for instance, conversion of face-centered to cell-centered quantities,
magnetic field analysis, nuclear decay times, and so on. One mechanism for
doing this would be to add field objects to the already-created field_info
object. (This is why that step must be raised in the list.)
One concern with this is that frontend-specific parameters (i.e.,
cosmological_simulation
) are not universal, so an adapter between the
frontend and the plugin needs to be created. We propose that this be required
for each frontend by enabling plugins to call methods on the dataset. These
methods will be named _adapt_*
where the suffix is the context's shortname.
These will return dictionaries of parameters which will be rigorously checked
for contents (i.e., preventing incorrect or incomplete information from being
passed back.) Contexts must define these methods.
As an example, here is pseudocode for a cosmological simulation context:
class CosmologyContext(DomainContext):
    domain = 'cosmology'

    def __init__(self):
        pass

    @classmethod
    def is_appropriate(cls, pf):
        if not hasattr(pf, '_adapt_cosmology'): return None
        rv = pf._adapt_cosmology()
        if rv['cosmological_simulation'] == 1:
            c = cls()
            return c
        return None

    def apply(self, pf):
        params = pf._adapt_cosmology()
        pf.cosmological_simulation = params['cosmological_simulation']
        pf.cosmology = Cosmology()
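A matching sketch of the frontend side, here a hypothetical Enzo _adapt_cosmology method (the frontend parameter names used below are assumptions for illustration):

# On the Enzo dataset class (sketch):
def _adapt_cosmology(self):
    # Translate frontend-specific parameters into the keys the
    # CosmologyContext expects.
    return {
        "cosmological_simulation": self.parameters.get("ComovingCoordinates", 0),
        "omega_matter": self.parameters.get("CosmologyOmegaMatterNow", 0.0),
        "omega_lambda": self.parameters.get("CosmologyOmegaLambdaNow", 0.0),
        "hubble_constant": self.parameters.get("CosmologyHubbleConstantNow", 0.0),
    }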
This design mechanism is somewhat open for discussion; the problems of adapting varying parameters and matching both the generality of the domain context and the frontend dataset provide challenges. An alternative is to provide a default class method for each context that is used by the base dataset object to obtain a false value.
As noted during discussion, context can and should subclass each other. How this interfaces with which plugin in the order of resolution is not yet clear, as (for instance) the base class should not necessarily modify an attribute when the subclass would then override.
Runtime Extensibility¶
These domain context will be extensible at runtime by specifying an additional list of plugins to check, by adding additional plugins to the global (and frontend-specific) registry, and by adding to the plugin list for each dataset type.
Implementation¶
Much of the implementation has been described above. However, these domain
plugins should reside in a subdirectory of data_objects
, specifically named
yt/data_objects/domain_contexts/
and should be limited to one class per
file.
Backwards Compatibility¶
- The backwards compatibility of renaming is likely quite small, except for those cases where names would be changed.
- The backwards compatibility of checking for
cosmological_simulation
would probably require additional field validation (or instead, fields that are added specifically by the cosmology context).- Changing
TimeSeriesData
to a new name may need to be gradually introduced, retaining backwards compatibility for a while.- Fixing Enzo’s
current_time
will cause challenges for anyone who is not using internal time conversion factors. I think this number is likely small.
Alternatives¶
We could continue with the status quo.
YTEP-0018: Changing dict-like access to Static Output¶
Abstract¶
Created: September 18, 2013 Author: Matthew Turk
Currently, accessing a StaticOutput like a dictionary will check the parameters, units, time_units and conversion_factors dictionaries. This YTEP proposes changing it such that no dictionaries will be queried.
Status¶
Proposed
Project Management Links¶
There are no easily-identified project management links. However, it should be noted that over the years, confusion has arisen numerous times as to what things like pf[“Time”] refer to.
Detailed Description¶
The conflation of parameters, conversion factors, units and so on causes an enormous amount of confusion. The most common uses of this are:
- Length conversions such as: 1.0/pf[‘cm’]
- Accessing parameters
- Occasional unit conversions (typically this causes more problems than it solves)
However, the degeneracy that often arises between unit conversions and
parameter access is typically quite problematic. This proposes that we
simplify the entire procedure to disable all dict-like access, and ensure that
individuals access .parameters
explicitly. This may be unintuitive and
will cause large changes to user-facing code, so we may consider re-enabling
it.
The difficulty in ensuring that conversions can be conducted in a separate manner arises from the variable conversion factors even within a given frontend; particularly for cosmology simulations, these conversion factors (length, time, etc) change over time.
To implement this, we will ensure that:
- All places that require a length unit accept a tuple. This is nearly if not completely implemented.
- For a specified time (until 3.1 is released), dict-like access to the StaticOutput object will raise a deprecation warning if the key is not found in parameters. This can be elevated to an exception upon request by the user.
- Ensure the test-suite passes.
- Update all documentation and examples.
By stepping into this gradually, we will follow the example set forth by the field refactor and enable individuals to see that the behavior is changing without mandating an immediate switch.
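A minimal sketch of the proposed dict-like access shim (purely illustrative; the configuration option name shown is hypothetical):

import warnings
from yt.config import ytcfg

class StaticOutput(object):
    # ... existing attributes: parameters, conversion_factors, ...
    def __getitem__(self, key):
        if key in self.parameters:
            return self.parameters[key]
        # Not a parameter: warn (or raise, if the user opts in) before
        # falling back to the old unit/conversion lookup.
        if ytcfg.getboolean("yt", "strict_item_access"):  # hypothetical option
            raise KeyError(key)
        warnings.warn("Dict-like unit access for %r is deprecated; pass "
                      "explicit (value, unit) tuples instead." % key,
                      DeprecationWarning)
        return self.conversion_factors[key]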
Backwards Compatibility¶
In 3.0, this will not break scripts; deprecation warnings will be issued. In 3.1, this will break a considerable number of scripts that rely on unit conversions mediated by the StaticOutput object. This is very worrisome and will require the graduated change to disabling dict-like access.
Alternatives¶
We can continue allowing this behavior, but it will continue to cause confusion and impede progress toward a cleaner API.
YTEP-0019: Reduce items in main import¶
Abstract¶
Created: October 2, 2013 Author: Matthew Turk
Currently, yt.mods
includes a huge number of items, polluting the namespace
considerably. Many of these are not necessary, as they are seldom used.
Status¶
Proposed
Project Management Links¶
Detailed Description¶
Currently, the number of items in the yt.mods
namespace is enourmous.
There are 276 items (including module builtins.) By providing a better
set of namespaces, we can make all of these items accessible without polluting
the namespace itself.
As an example, we should expose the load
command primarily, and encourage
directly importing frontend-specific code if that code needs to be interacted
with.
This YTEP concerns two changes in functionality.
- Reduce the number of items in yt.mods.
- Make yt.mods a superset of the functionality of yt/__init__.py. This means startup_tasks.py (which includes argument parsing, parallel initialization, configuration system reading, and so on) will only be run when yt.mods is imported.
This will change some aspects of behavior, as it will make operations that
currently require the ytcfg
variable to be modified before any other
startup_tasks
code is executed no longer possible. This number is
extremely small, and the primary one is loglevel setting, which is easily
exposed in mylog.setLevel
.
Primarily, we want to make yt
exist better as a module as well as an
environment.
Current Imports¶
These are the items currently imported, and their proposed status.
We can implement this either through not importing into the main namespace
items we do not want to include, or by explicitly enumerating the items to
include in an __all__
attribute.
A few comments on specific items:
- TransferFunction classes have been removed and we should investigate other ways of exposing them. One option would be importing the module transfer_functions and accessing attributes.
- HaloFinder has been removed, so a new method for setting up this process easily and transparently (perhaps with amods) is needed.
- add_grad should probably have a new name.
Object | Include |
---|---|
ARTFieldInfo | No |
ARTIOFieldInfo | No |
ARTIOStaticOutput | No |
ARTStaticOutput | No |
AnalysisTask | No |
ArrowCallback | No |
AthenaFieldInfo | No |
AthenaStaticOutput | No |
BinnedProfile1D | Yes |
BinnedProfile2D | Yes |
BinnedProfile3D | Yes |
Camera | No |
CastroFieldInfo | No |
CastroStaticOutput | No |
ChomboFieldInfo | No |
ChomboStaticOutput | No |
ClumpContourCallback | No |
ColorTransferFunction | No |
ContourCallback | No |
CoordAxesCallback | No |
CuttingQuiverCallback | No |
DummyProgressBar | No |
EnzoFieldInfo | No |
EnzoSimulation | No |
EnzoStaticOutput | No |
EnzoStaticOutputInMemory | No |
EnzoTestOutputFileNonExistent | No |
FLASHFieldInfo | No |
FLASHStaticOutput | No |
FieldInfo | No |
FixedResolutionBuffer | Yes |
FlashRayDataCallback | No |
GDFFieldInfo | No |
GDFStaticOutput | No |
GUIProgressBar | No |
GadgetFieldInfo | No |
GadgetHDF5StaticOutput | No |
GadgetStaticOutput | No |
GridBoundaryCallback | No |
HaloFinder | No |
HomogenizedVolume | No |
HopCircleCallback | No |
HopParticleCallback | No |
ImageArray | Yes |
ImageLineCallback | No |
InvalidSimulationTimeSeries | No |
LabelCallback | No |
LinePlotCallback | No |
LooseVersion | No |
MagFieldCallback | No |
MarkerAnnotateCallback | No |
MaterialBoundaryCallback | No |
MissingParameter | No |
MoabFieldInfo | No |
MoabHex8StaticOutput | No |
MosaicFisheyeCamera | No |
NoCUDAException | No |
NoStoppingCondition | No |
NyxFieldInfo | No |
NyxStaticOutput | No |
OWLSFieldInfo | No |
OWLSStaticOutput | No |
ObliqueFixedResolutionBuffer | Yes |
OffAxisProjectionPlot | Yes |
OffAxisSlicePlot | Yes |
OrionFieldInfo | No |
OrionStaticOutput | No |
ParallelProgressBar | No |
ParticleCallback | No |
ParticleTrajectoryCollection | Yes |
PlanckTransferFunction | No |
PlotCallback | No |
PlotCollection | Yes |
PlotCollectionInteractive | Yes |
PlutoFieldInfo | No |
PlutoStaticOutput | No |
PointAnnotateCallback | No |
ProjectionPlot | Yes |
ProjectionTransferFunction | No |
PyneMoabHex8StaticOutput | No |
QuiverCallback | No |
RAMSESFieldInfo | No |
RAMSESStaticOutput | No |
SlicePlot | Yes |
SphereCallback | No |
StreamFieldInfo | No |
StreamHandler | No |
StreamStaticOutput | No |
StreamlineCallback | No |
TextLabelCallback | No |
TigerFieldInfo | No |
TigerStaticOutput | No |
TimeSeriesData | Yes |
TimestampCallback | No |
TipsyFieldInfo | No |
TipsyStaticOutput | No |
TitleCallback | No |
UnitBoundaryCallback | No |
ValidateDataField | Yes |
ValidateGridType | Yes |
ValidateParameter | Yes |
ValidateProperty | Yes |
ValidateSpatial | Yes |
VelocityCallback | Yes |
YTAxesNotOrthogonalError | No |
YTCannotParseFieldDisplayName | No |
YTCannotParseUnitDisplayName | No |
YTCloudError | No |
YTCoordinateNotImplemented | No |
YTCouldNotGenerateField | No |
YTDataSelectorNotImplemented | No |
YTDomainOverflow | No |
YTEllipsoidOrdering | No |
YTEmptyClass | No |
YTException | No |
YTFieldNotFound | No |
YTFieldNotParseable | No |
YTFieldTypeNotFound | No |
YTGeometryNotSupported | No |
YTHubRegisterError | No |
YTIllDefinedBounds | No |
YTIllDefinedFilter | No |
YTInvalidWidthError | No |
YTNoAPIKey | No |
YTNoDataInObjectError | No |
YTNoFilenamesMatchPattern | No |
YTNoOldAnswer | No |
YTNotDeclaredInsideNotebook | No |
YTNotInsideNotebook | No |
YTObjectNotImplemented | No |
YTOutputNotIdentified | No |
YTParticleDepositionNotImplemented | No |
YTRockstarMultiMassNotSupported | No |
YTSimulationNotIdentified | No |
YTSphereTooSmall | No |
YTTooManyVertices | No |
YTUnitNotRecognized | No |
__builtins__ | No |
__doc__ | No |
__file__ | No |
__level | No |
__name__ | No |
__package__ | No |
__startup_tasks | No |
_fn | No |
absolute_import | No |
add_art_field | No |
add_artio_field | No |
add_athena_field | No |
add_castro_field | No |
add_chombo_field | No |
add_enzo_1d_field | No |
add_enzo_2d_field | No |
add_enzo_field | No |
add_field | Yes |
add_flash_field | No |
add_gadget_field | No |
add_gdf_field | No |
add_grad | Yes |
add_moab_field | No |
add_nyx_field | No |
add_orion_field | No |
add_owls_field | No |
add_pluto_field | No |
add_quantity | No |
add_ramses_field | No |
add_stream_field | No |
add_tiger_field | No |
add_tipsy_field | No |
amods | Yes |
analysis_task | No |
annotate_image | Yes |
apply_colormap | Yes |
available_analysis_modules | Yes |
axis_names | No |
bb_apicall | No |
cPickle | No |
callback_registry | No |
ceil | No |
cls | No |
contextlib | No |
data_object_registry | No |
defaultdict | No |
deprecate | No |
derived_field | Yes |
ensure_dir_exists | No |
ensure_list | No |
ensure_numpy_array | No |
ensure_tuple | No |
fix_axis | No |
fix_length | No |
floor | No |
get_available_modules | No |
get_hg_version | No |
get_image_suffix | No |
get_ipython_api_version | No |
get_memory_usage | Yes |
get_multi_plot | Yes |
get_num_threads | No |
get_pbar | Yes |
get_script_contents | No |
get_version_stack | Yes |
get_yt_supp | Yes |
get_yt_version | Yes |
glob | No |
humanize_time | No |
imgur_upload | No |
insert_ipython | Yes |
inspect | No |
inv_axis_names | No |
is_root | Yes |
iterable | Yes |
just_one | No |
load | Yes |
load_amr_grids | Yes |
load_hexahedral_mesh | Yes |
load_particles | Yes |
load_uniform_grid | Yes |
my_plugin_name | No |
mylog | Yes |
na | No |
name | No |
np | Yes |
numpy | No |
off_axis_projection | Yes |
only_on_root | Yes |
ortho_find | Yes |
os | No |
parallel_objects | Yes |
parallel_profile | Yes |
particle_filter | Yes |
paste_traceback | No |
paste_traceback_detailed | No |
pb | No |
pdb | No |
pdb_run | No |
periodic_position | Yes |
physical_constants | Yes |
print_tb | Yes |
projload | No |
quantity_info | No |
quartiles | Yes |
read_struct | No |
resource | No |
rootloginfo | No |
rootonly | Yes |
rpdb | No |
scale_image | Yes |
show_colormaps | Yes |
signal | No |
signal_ipython | No |
signal_print_traceback | No |
signal_problem | No |
simulation | Yes |
struct | No |
subprocess | No |
sys | No |
time | No |
time_execution | No |
time_function | No |
traceback | No |
traceback_writer_hook | No |
types | No |
unparsed_args | Yes |
update_hg | No |
warnings | No |
wraps | No |
write_bitmap | Yes |
write_fits | Yes |
write_image | Yes |
write_projection | Yes |
x_dict | No |
y_dict | No |
yt_counters | No |
ytcfg | Yes |
ytcfgDefaults | No |
Changing yt/__init__.py to Import¶
The second aspect of this YTEP is to change the yt
module to include
everything that is in yt.mods
, but without the side effects that come from
yt.startup_tasks
. Because importing submodules necessarily will then
import __init__.py
, this means submodules cannot be imported without the
whole of yt
that is exposed in yt.__init__.py
being imported.
This primarily will affect configuration options, which are largely no longer
necessary to modify directly at runtime. Additionally, the old behavior can
still be preserved by yt.mods
.
Backwards Compatibility¶
This may break compatibility, although nearly all of the items removed are items that are not typically used in scripts. This list can be modified.
Note that importing frontends into a namespace will still enable them to be
used in load
.
Importing yt.mods
will still act as before, with option parsing and the
like. Importing yt.config
will result in the config file being parsed
once; this means runtime options will need to be modified differently.
Alternatives¶
We could identify additional means of reducing the namespace pollution, but this is the main one that I see.
We could also not put anything into yt/__init__.py
.
YTEP-0020: Removing PlotCollection¶
Abstract¶
Created: March 18, 2014 Author: Matthew Turk
Status¶
Completed
Project Management Links¶
Detailed Description¶
The PlotCollection
object was designed to work with the package HippoDraw,
as a means of controlling multiple plots from a single command. This was also
focused very strongly on the idea of viewing a single object from multiple
angles, and was mostly useful for my research. All other users largely used it
as the only mechanism for creating plots – not because of any particular
functionality it had.
With yt 3.0, I propose that we remove the PlotCollection entirely, as its functionality is 100% replaced by the various other objects such as SlicePlot, ProjectionPlot, ProfilePlot and PhasePlot, all of which are more modern and provide greater access to the underlying matplotlib objects.
This change will occur in yt 3.0. Nearly all users have migrated to using SlicePlot and so on, and we are seeing much greater uptake of ProfilePlot and PhasePlot. Because we also anticipate growing our community with this release, and because it will be the time when we can break backwards compatibility, this is the most natural time to remove it.
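For reference, a hedged sketch of the migration path (dataset path and field names are examples only; the old calls are shown as comments):
import yt
ds = yt.load("IsolatedGalaxy/galaxy0030/galaxy0030")
# Old (yt-2.x) style, removed in yt 3.0:
#   pc = PlotCollection(ds, center=[0.5, 0.5, 0.5])
#   pc.add_slice("density", 0)
#   pc.save()
# Modern replacements:
yt.SlicePlot(ds, "x", "density").save()
yt.ProjectionPlot(ds, "x", "density").save()
yt.ProfilePlot(ds.all_data(), "density", "temperature").save()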
Backwards Compatibility¶
Existing scripts that utilize PlotCollection will break, but there will be other yt-3.0 changes that they may suffer from anyway.
Alternatives¶
I do not think we have the resources to pursue alternate options such as supporting PlotCollection in perpetuity.
YTEP-0021: Particle-Only Plots¶
Abstract¶
Created: August 29, 2014 Author: Andrew Myers
This YTEP describes a mechanism for creating scatter plots of particle fields in yt. It was prompted by a question posted to the yt-users list by Jeremy Ritter, linked below.
Essentially, it proposes creating a user-facing function called ParticlePlot (analogous to SlicePlot or ProfilePlot) that facilitates plotting arbitrary particle fields against one another.
Status¶
Completed
Project Management Links¶
Detailed Description¶
Currently, to make plots like those in the linked notebooks, you would have to grab the particle data from the data source and feed them to something like pyplot.plot().
Instead of units, labels, and log scales getting grabbed from the FieldInfoContainer, they would need to be set up manually. Furthermore, the standardized interface for modifying yt plots that exists in PlotWindow would not be available.
Instead, we could create a ParticlePlot class that would act like the currently existing yt plotting classes. The constructor would take:
- data_source: an AMR3DData object
- x_fields: str or list, the field(s) to put on the x-axis
- y_fields: str or list, same but for the y-axis
- color: either a color string, or another particle field to be mapped to a color scale
If x_fields and y_fields are strings, this would add a single scatter plot to the ParticlePlot. If they are lists of field names, then a series of plots will be added, in the style of (for example) PhasePlot. The standard methods for modifying these plots (e.g. set_log, set_units, set_cmap, etc.) should all work as expected, and the plots should be able to be saved or sent to the notebook as normal, as in the sketch below.
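A hedged sketch of how the proposed interface might look, using the constructor arguments listed above (the final implementation may differ; the dataset and field names are examples):
import yt
ds = yt.load("IsolatedGalaxy/galaxy0030/galaxy0030")
ad = ds.all_data()
# Single scatter plot: particle mass against x-position, colored by a third field.
p = ParticlePlot(ad, "particle_position_x", "particle_mass",
                 color="particle_velocity_magnitude")
p.set_log("particle_mass", True)   # standard plot-modification methods
p.save()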
Implementation Details¶
My current implementation wraps pyplot.plot(), but because pyplot.plot can be slow when the number of points is large, ParticlePlot should instead use Sam Skillman's particle splatting code to create something like a FixedResolutionBuffer. This could then be displayed with pyplot.imshow(). This would also make it easy for users to access the raw image and pass it to another plotting routine, if they prefer.
Inheritance Structure¶
ParticlePlot shares a lot of its functionality with PhasePlot, so the implementation should be similar. In particular, ParticlePlot should inherit from ImagePlotContainer and the individual plots in it should be ParticlePlotMPL objects that inherit from ImagePlotMPL.
Open issues¶
Plots that have spatial variables on both axes are logically different from those that don't in a few ways. For instance, it could be misleading if the aspect ratios are different for two spatial axes, as in the second linked notebook. Also, things like "pan" and "zoom" make sense for spatial data, but not when plotting, say, velocity versus position. Should we handle spatial plots differently, as Nathan suggested in the yt-dev discussion above? Spatial plots could then inherit from PlotWindow to take advantage of all the methods and callbacks in there.
Backwards Compatibility¶
There are no backwards compatibility concerns; all existing code should work exactly as before.
Alternatives¶
Alternatively, there could be no mechanism for making particle scatter plots inside of yt, and users could call pyplot.plot() or whatever directly.
YTEP-0022: Benchmarks¶
Abstract¶
Created: January 19, 2015 Author: Matthew Turk
This document proposes a mechanism for tracking performance improvements and regressions, based on the airspeed velocity project for automated benchmarking.
Status¶
Proposed
Project Management Links¶
- Pull request: https://bitbucket.org/yt_analysis/yt/pull-request/1415/add-airspeed-velocity-config-file/
- ASV homepage: http://spacetelescope.github.io/asv/
- Intro talk: http://youtube.com/watch?v=OsxJ5O6h8s0
Detailed Description¶
Background¶
Particularly during the transition from yt 2 to yt 3, there has been a large degradation in overall performance. The underlying implementation of data selection has become considerably more general and the array operations have all been overloaded; both changes are large improvements in functionality and usability, but both have performance impacts. In order to address the sad and unacceptable performance degradations, this YTEP has been proposed as the first step toward tracking performance and attempting to mitigate regressions now and in the future.
Proposed Actions¶
Currently, as described in YTEP-0007: Automatic Pull Requests’ validation, we conduct automatic validation of pull requests. A mechanism for examining and validating change in performance should also be implemented.
The “airspeed velocity” project has been designed to track the changes in performance as denoted by specific benchmarks over time in a given project. What we will do is implement a number of characteristic benchmarks, potentially relying on larger datasets than we do for simple testing, and track how changes to the code make these benchmarks run faster or slower. As part of the CI process, these results will be posted to a website (ASV has the ability to generate such websites, along with detailed drilldowns into specific routines, automatically.) Because of how asv works, these updates will likely need to be a part of the post-acceptance process, rather than pre-acceptance.
A sample website can be seen here: http://mdboom.github.io/astropy-benchmark/
There are three axes of benchmarks, along with the specific benchmark being run.
- Changeset hash being benchmarked
- Version of external dependencies
- Machine on which benchmarks are run
This YTEP proposes:
- Adding running the benchmarks to the CI suite
- Adding publishing of the benchmarking results to the CI suite (perhaps at speed.yt-project.org)
- Encouraging writing new benchmarks by project contributors
Once a benchmark has been written, it can be used forevermore, and even evaluated against past changeset hashes in the yt repository. Writing benchmarks is easy, too – easier even than tests.
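As an illustration, a minimal asv-style benchmark might look like the following sketch (the file location, e.g. benchmarks/benchmarks.py, and the dataset choice are assumptions; asv picks up methods named time_* on suite classes):
import numpy as np
import yt

class SliceSuite:
    def setup(self):
        # Build a small in-memory dataset so the benchmark needs no external data;
        # real benchmarks would likely load larger sample datasets.
        data = {"density": np.random.random((64, 64, 64))}
        self.ds = yt.load_uniform_grid(data, data["density"].shape)

    def time_slice_read(self):
        # Time reading a field from an axis-aligned slice.
        self.ds.slice("z", 0.5)["density"]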
Proposed Repository Layout¶
Keeping benchmarks next to the code is important to avoid fragmentation. However, the repository containing the benchmark results will balloon in size, as results and outputs will be committed with mildly reckless abandon. So this YTEP proposes two repositories:
- Existing yt repository, which will contain the asv configuration file and the benchmarks directory
- A yt-benchmarks repository, which will contain the results (committed, included in the repo, and probably gigantic) along with a bootstrap script that makes symlinks to the asv configuration file and the benchmarks directory.
This way, we can very easily run in this directory, commit, and auto-publish. It will mean that all the benchmarks are in the main repository, for ease of contribution and modification, but all the results are stored elsewhere. It also will make it easier for folks who want to run and store results on other machines to do so.
Backwards Compatibility¶
No backwards compatibility problems exist.
Alternatives¶
We could implement our own framework, but this one exists and is pretty nice.
YTEP-0023: yt Community Code of Conduct¶
Abstract¶
Created: July 11, 2015
Author: Britton Smith
This document contains the code of conduct for the yt community. It is a near exact copy of the Astropy Community Code of Conduct, except that we will employ our own confidential email address for community members to report violations.
Status¶
Completed
Project Management Links¶
Detailed Description¶
The code of conduct, whose language is below, will be displayed in the following places:
- Developer Documentation
- yt Project About page
- yt Project Community page
- yt Project Development page
Emails sent to the confidential address will be seen by Hilary Egan, Britton Smith, and John Zuhone.
yt Community Code of Conduct¶
The community of participants in open source Scientific projects is made up of members from around the globe with a diverse set of skills, personalities, and experiences. It is through these differences that our community experiences success and continued growth. We expect everyone in our community to follow these guidelines when interacting with others both inside and outside of our community. Our goal is to keep ours a positive, inclusive, successful, and growing community.
As members of the community,
- We pledge to treat all people with respect and provide a harassment- and bullying-free environment, regardless of sex, sexual orientation and/or gender identity, disability, physical appearance, body size, race, nationality, ethnicity, and religion. In particular, sexual language and imagery, sexist, racist, or otherwise exclusionary jokes are not appropriate.
- We pledge to respect the work of others by recognizing acknowledgment/citation requests of original authors. As authors, we pledge to be explicit about how we want our own work to be cited or acknowledged.
- We pledge to welcome those interested in joining the community, and realize that including people with a variety of opinions and backgrounds will only serve to enrich our community. In particular, discussions relating to pros/cons of various technologies, programming languages, and so on are welcome, but these should be done with respect, taking proactive measure to ensure that all participants are heard and feel confident that they can freely express their opinions.
- We pledge to welcome questions and answer them respectfully, paying particular attention to those new to the community. We pledge to provide respectful criticisms and feedback in forums, especially in discussion threads resulting from code contributions.
- We pledge to be conscientious of the perceptions of the wider community and to respond to criticism respectfully. We will strive to model behaviors that encourage productive debate and disagreement, both within our community and where we are criticized. We will treat those outside our community with the same respect as people within our community.
- We pledge to help the entire community follow the code of conduct, and to not remain silent when we see violations of the code of conduct. We will take action when members of our community violate this code such as contacting confidential@yt-project.org (all emails sent to this address will be treated with the strictest confidence) or talking privately with the person.
This code of conduct applies to all community situations online and offline, including mailing lists, forums, social media, conferences, meetings, associated social events, and one-to-one interactions.
The yt Community Code of Conduct was adapted from the Astropy Community Code of Conduct, which was partially inspired by the PSF code of conduct.
Alternatives¶
None.
YTEP-0024: Alternative Smoothing Kernels¶
Abstract¶
Created: August 1, 2015 Author: Bili Dong
This YTEP proposes to add alternative smoothing kernels besides the current standard cubic spline one, make them available to the smoothing operations, and define a convenient interface for users to choose among them.
Status¶
Completed
Project Management Links¶
Detailed Description¶
Currently in yt, the standard cubic spline kernel is exclusively used for smoothing operations. It is a desired feature for yt to have support for varied smoothing kernels.
The implementations of the kernel functions themselves are straightforward. [DA2012] is referenced for the function forms and the kernel names.
The bigger challenge is the design of a convenient user interface for future users and a convenient application programming interface for future developers. Details of those designs are explained in the following sections.
User Interface¶
Future users can specify which kernel to use in the smoothing operations by passing a keyword argument kernel_name
to the relevant functions. Currently, functions with potential access to kernels include Dataset.add_deposited_particle_field
and Dataset.add_smoothed_particle_field
[1]. The naming scheme for fields added through these functions is an extension of the scheme described in SPH Fields, which is proposed by Nathan Goldbaum in this yt-dev thread. So a particle field ("particletype", "fieldname")
smoothed by a certain kernel can be accessed by ("deposit", "particletype_kernelname_smoothed_fieldname")
, except for the cubic spline kernel, whose smoothed field remains to be ("deposit", "particletype_smoothed_fieldname")
(the same as before).
For example, as demonstrated by the following code, the particle field ("PartType0", "Density")
is smoothed using a quintic kernel and the resultant smoothed field could be accessed by ("deposit", "PartType0_quintic_smoothed_Density")
. [2]
import yt
ds = yt.load("GadgetDiskGalaxy/snapshot_200.hdf5")
ds.add_smoothed_particle_field(("PartType0", "Density"), kernel_name="quintic")
yt.ProjectionPlot(ds, "z", ("deposit", "PartType0_quintic_smoothed_Density"))
Below is a table of available kernel_name values and the corresponding names in [DA2012]. All kernels are in 3D.
kernel_name | Name in [DA2012] |
cubic | Cubic spline |
quartic | Quartic spline |
quintic | Quintic spline |
wendland2 | Wendland C2 |
wendland4 | Wendland C4 |
wendland6 | Wendland C6 |
[1] | Dataset.add_smoothed_particle_field is a wrapper of add_volume_weighted_smoothed_field . It is more convenient to use. So, for simplicity, add_volume_weighted_smoothed_field will be omitted in the following discussions. |
[2] | The dataset can be downloaded from here. |
Application Programming Interface¶
When a kernel function is needed, get_kernel_func is used to retrieve it. Given the string of the kernel name, a kernel function of the type kernel_func is returned. Both get_kernel_func and kernel_func are defined in geometry/particle_deposit.pxd. The following snippet demonstrates their usage, assuming the source file is in the same directory as particle_deposit.pxd.
from .particle_deposit cimport kernel_func, get_kernel_func
cdef class DemoParticleSmoothOperation:
cdef kernel_func sph_kernel
def __init__(self, kernel_name):
self.sph_kernel = get_kernel_func(kernel_name)
Once the kernel function is retrieved, self.sph_kernel could be utilized to do the smoothing.
The rest of the changes to the API are merely the passing of the keyword argument kernel_name. Below is a table demonstrating the potential passing routes:
# | Method | Pass to [3] | File Path |
1 | SimpleSmooth (aliased as deposit_simple_smooth) [4] | | yt/geometry/particle_deposit.pyx |
2 | VolumeWeightedSmooth (aliased as volume_weighted_smooth) [4] | | yt/geometry/particle_smooth.pyx |
3 | SmoothedDensityEstimate (aliased as density_smooth) [4] | | yt/geometry/particle_smooth.pyx |
4 | ARTIORootMeshSubset.deposit | 1 | yt/frontends/artio/data_structures.py |
5 | YTCoveringGridBase.deposit | 1 | yt/data_objects/construction_data_containers.py |
6 | AMRGridPatch.deposit | 1 | yt/data_objects/grid_patch.py |
7 | UnstructuredMesh.deposit | 1 | yt/data_objects/unstructured_mesh.py |
8 | OctreeSubset.deposit | 1 | yt/data_objects/octree_subset.py |
9 | OctreeSubset.smooth | 2, 3 | yt/data_objects/octree_subset.py |
10 | OctreeSubset.particle_operation | 2, 3 | yt/data_objects/octree_subset.py |
11 | Dataset.add_deposited_particle_field | 4 - 8 | yt/data_objects/static_output.py |
12 | Dataset.add_smoothed_particle_field | 9 | yt/data_objects/static_output.py |
[3] | This column indicates the possibility that kernel_name could be passed to 'Pass to', which also depends on another parameter method . |
[4] | (1, 2, 3) When a class is given, its __init__ method is meant. |
To demonstrate how 4 - 10 utilize 1 - 3, the main structure of the smooth method is shown below (irrelevant parts are ignored; deposit and particle_operation are similar).
def smooth(self, method = None, kernel_name = "cubic", ...):
cls = getattr(particle_smooth, "%s_smooth" % method, None)
op = cls(..., kernel_name)
op is used for the actual smoothing operations thereafter.
For 11 and 12, they simply call the dataset's deposit or smooth method to get the smoothing operations done.
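For instance, a hedged sketch of how a user would select a kernel through route 11 above (the dataset is the same example used earlier; method and kernel names follow the tables above):
import yt
ds = yt.load("GadgetDiskGalaxy/snapshot_200.hdf5")
# kernel_name is forwarded from add_deposited_particle_field down to the
# deposit machinery (route 11 -> routes 4-8 -> route 1 in the table above).
ds.add_deposited_particle_field(("PartType0", "Density"), "simple_smooth",
                                kernel_name="wendland2")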
Reference¶
[DA2012] | (1, 2, 3) Dehnen W., Aly H., 2012, MNRAS, 425, 1068 |
Backwards Compatibility¶
New functionality is accessed by the keyword argument kernel_name, with default value kernel_name = "cubic", so the behavior of existing code won't change.
Alternatives¶
None.
YTEP-0025: The ytdata Frontend¶
Abstract¶
Created: August 31, 2015 Author: Britton Smith
This YTEP proposes to make data products created by yt into loadable datasets. Primarily, this will provide the following features:
- exporting geometric data containers to datasets that can be reloaded for further geometric selection and analysis.
- exporting plot-type data (projections, slices, profiles) so that they can be moved, reloaded, and manipulated to make new images.
Status¶
Completed
Project Management Links¶
- yt PR #1718: the accepted pull request containing the full implementation
Detailed Description¶
Currently, yt’s main data products (data containers, projections, slices, profiles) can only be used with their full functionality with the original dataset loaded. This is cumbersome when the datasets are so large that they can only be hosted at remote facilities. Creating publication-quality images from such data either requires a cycle of tweaking, transferring, viewing, and cursing or creating custom intermediate data products and plotting codes.
This YTEP proposes to create functionality that will allow for the above data products to be exported to a format that can be reloaded as a full-fledged dataset.
The proposed functionality consists of two main components: functionality to save objects to disk and a frontend responsible for reloading the saved objects.
Exporting¶
A general function for saving array data associated with an open dataset
will be responsible for writing data to disk. Data will be written to a
single hdf5 file. Metadata associated with the dataset (i.e., current_time,
current_redshift, cosmological parameters, domain dimensions) will be saved as
attributes of the root file group. By default, data will be saved to a “grid”
group with “units” attributes saved for each dataset. This function is
implemented as save_as_dataset
in yt/frontends/ytdata/utilities.py
and imported in the main yt import.
The above function will be called by the user-facing functions,
YTDataContainer.save_as_dataset
, ProfileND.save_as_dataset
, and
FixedResolutionBuffer.save_as_dataset
, which will optionally take a
filename and a list of fields. If no field list is given, then the fields
that have been queried and cached will be saved. This function will also
make sure that fields necessary for geometric selection (grid position/cell
size, particle position) are also saved. Mesh data will be saved to the
“grid” group and particle data will be saved to groups named after the
specific particle type.
ytdata Frontend¶
This frontend will be responsible for reloading all data saved with the above
method. As this data is of multiple types, this will actually be multiple
frontends living under the general “ytdata” heading. All dataset types will
inherit from the YTDataset
class. See ytdata Dataset Types for a
description of each class. For each loaded dataset, ds
, in the ytdata
frontend, ds.data
will provide direct access to the field data.
Geometrical Data Containers¶
Fields queried from data containers are returned as unordered, one-dimensional arrays and, thus, most closely resemble particle datasets. All geometric data containers are reloaded as type YTDataContainerDataset, which is a particle dataset type. Mesh data is stored with the corresponding dx, dy, and dz fields such that derived fields like cell_volume can be created. All mesh data is aliased to the "gas" field type. The data attribute associated with the loaded dataset will be a data container configured identically to the original data container. In the case of ray data containers, this is not possible, as a ray is defined by the cells it intersects and not by the cells/particles enclosed within. In this case, data will be an instance of ds.all_data(). Field access through conventional data containers is also possible.
ds = yt.load("enzo_tiny_cosmology/DD0046/DD0046")
sphere = ds.sphere([0.5]*3, (10, "Mpc"))
sphere.save_as_dataset(fields=["density", "particle_mass"])
sds = yt.load("DD0046_sphere.h5")
# sphere with the same center and radius
print (sds.data)
print (sds.data["grid", "density"])
print (sds.data["gas", "density"])
print (sds.data["all", "particle_mass"])
print (sds.data["all", "particle_position_x"])
# create a data container
ad = sds.all_data()
print (ad["grid", "density"])
print (ad["all", "particle_mass"])
Grid Data Containers¶
Covering grids, smoothed covering grids, and arbitrary grids return 3D arrays
and so can be treated as uniform grid datasets. After being saved with
save_as_dataset
, these are reloaded as type YTGridDataset
, which is a uniform
grid that also supports particles. FixedResolutionBuffer
objects saved
with save_as_dataset
will be reloaded as this type as well, only 2D.
In this case, ds.data
will give access to the multi-dimensional field arrays.
ds = yt.load("enzo_tiny_cosmology/DD0046/DD0046")
cg = ds.covering_grid(level=0, left_edge=[0.25]*3, dims=[16]*3)
cg.save_as_dataset("cg.h5", ["density", "particle_mass"])
cg_ds = yt.load("cg.h5")
# this has the dimensions of the original covering grid
print (cg_ds.data["gas", "density"].shape)
# access via geometric selection
ad = cg_ds.all_data()
print (ad["gas", "density"])
print (ad["all", "particle_mass"])
ray = cg_ds.ray(cg_ds.domain_left_edge, cg_ds.domain_right_edge)
print (ray["gas", "density"])
# FRBs
proj = ds.proj("density", "x", weight_field="density")
frb = proj.to_frb(1.0, (800, 800))
frb.save_as_dataset(fields=["density"])
fds = yt.load("DD0046_proj_frb.h5")
print (fds.data["density"])
Projections and Slices¶
Projections and slices are like two-dimensional particle datasets where the x and y fields are "px" and "py". They are reloaded as type YTProjectionDataset, which is a subclass of YTDataContainerDataset. Reloaded projection or slice data can be selected geometrically or fed into a ProjectionPlot or SlicePlot. In these cases, ds.data is an instance of ds.all_data().
ds = yt.load("enzo_tiny_cosmology/DD0046/DD0046")
proj = ds.proj("density", "x", weight_field="density")
proj.save_as_dataset("proj.h5")
gds = yt.load("proj.h5")
print (gds.data["gas", "density"])
p = yt.ProjectionPlot(gds, "x", "density", weight_field="density")
p.save()
The above would enable someone to make projections or slices of large datasets
remotely, then download the exported dataset, and perfect the final image on a
local machine. On and off-axis slices are implemented. Off-axis projections
are not implemented at this time as they use totally different machinery. In
this case, the best strategy would be to create an FRB and call
save_as_dataset
on that.
General Array Data¶
Array data written with the base save_as_dataset
function can be reloaded
as a non-spatial dataset. Geometric selection is not possible, but the data
can be accessed through the YTNonspatialGrid
object, ds.data
. This object
will only grab data from the hdf5 file and do further selection on it.
from yt.frontends.ytdata.api import save_as_dataset
ds = yt.load("enzo_tiny_cosmology/DD0046/DD0046")
region = ds.box([0.25]*3, [0.75]*3)
sphere = ds.sphere(ds.domain_center, (10, "Mpc"))
my_data = {}
my_data["region_density"] = region["density"]
my_data["sphere_density"] = sphere["density"]
save_as_dataset(ds, "test_data.h5", my_data)
ads = yt.load("test_data.h5")
print (ads.data["region_density"])
print (ads.data["sphere_density"])
Profiles¶
1, 2, and 3D profiles are like 1, 2, and 3D uniform grid datasets where dx, dy,
and dz are different and have different dimensions. YTProfileDataset
objects inherit from the YTNonspatialDataset
class. Similarly, the data
can be accessed from ds.data
. The x and y bins will be saved as 1D fields
and fields named after the x and y bin field names will be saved with the same
shape as the actual profile data. This will allow for easy array slicing of the
profile based on the bin fields.
ds = yt.load("enzo_tiny_cosmology/DD0046/DD0046")
profile = yt.create_profile(ds.all_data(), ["density", "temperature"],
"cell_mass", weight_field=None)
profile.save_as_dataset()
pds = yt.load("DD0046_profile.h5")
# print the profile data
print (pds.data["cell_mass"])
# print the x and y bins
print (pds.data["x"], pds.data["y"])
# bin data shaped like the profile
print (pds.data["density"])
print (pds.data["temperature"])
ytdata Dataset Types¶
Name | Inheritance | Purpose | Dataset Type | Geometric Selection |
---|---|---|---|---|
YTDataset | Dataset | common functionality for other dataset types | n/a | n/a |
YTDataContainerDataset | YTDataset | geometric data containers (sphere, region, ray, disk) | particle | yes |
YTSpatialPlotDataset | YTDataContainerDataset | projections, slices, cutting planes | particle | yes |
YTGridDataset | YTDataset | covering grids, arbitrary grids, fixed resolution buffers | grid w/particles | yes |
YTNonspatialDataset | YTGridDataset | general array data | grid | no |
YTProfileDataset | YTNonspatialDataset | 1, 2, and 3D profiles | grid | no |
Backwards Compatibility¶
Currently, the only API breakage is in the AbsorptionSpectrum. Previously, it accepted a generic hdf5 file created by the LightRay. As per the open PR, the LightRay now writes out a yt-loadable dataset that is loaded by the AbsorptionSpectrum.
Other than the above, this is all new functionality and so has no backward incompatibility. One general change made to the yt codebase is that places that refer to index fields (x, y, z, dx, etc.) now refer to (<fluid_type>, "dx") instead of ("index", "dx"). This is to allow fields like cell_volume to be created from the ("grid", "dx") field that, for the ytdata frontend, lives on disk instead of the version being generated by the geometry handler. For actual grid datasets, we simply create an alias from (<fluid_type>, "dx") to ("index", "dx") upon loading. This should be completely transparent to the user.
Alternatives¶
We could create custom binary files for every type of plot and data container. We could also revive the concept of saving pickled objects that was used somewhat in yt-2.
YTEP-0026: NumPy-like Operations¶
Abstract¶
Created: September 21, 2015
Author: Matthew Turk
This YTEP describes implementing some NumPy-like and potentially some Pandas-like operations on data container objects.
Status¶
This YTEP is proposed, but proof-of-concept code has been developed and issued in a PR: https://bitbucket.org/yt_analysis/yt/pull-requests/1763
Once the YTEP has been accepted, documentation will be added to the codebase as part of the PR.
Project Management Links¶
Any external links to:
- PR with first work-in-progress: https://bitbucket.org/yt_analysis/yt/pull-requests/1763
Detailed Description¶
Background¶
Data objects in yt are lazy-loaded; only when data is accessed is it read from disk. However, the way they behave is similar to “data frames” or numpy named dtypes – they act as though they are dicts-of-arrays, with some operations being defined that operate in parallel-aware ways.
However, this is something of a leaky abstraction; in order to compute relatively simple operations, the .quantities object has to be accessed, the correct “quantity” to use determined, and then called.
But, many of these quantities map relatively simply to NumPy operations.
This YTEP doesn’t (yet) address adding other, Pandas-like operations (such as select or group) even though they also map to yt operations; that may come in the future.
What Can Be Done¶
I think we should map numpy array operations to quantities and other things!
And while we’re at it, let’s add on very simple “plot” operations.
Furthermore, to make the connection more explicit, slicing will be implemented as well, to generate data objects and selections. The dataset object will have a .r attribute, aliased to the much more descriptive .region_expression, which enables directly slicing it; slicing will return either a region, a slice, or an arbitrary_grid, depending on how many dimensions are used and whether an imaginary step is supplied (like np.mgrid). This will accept unitful slices.
Implementation¶
This will be implemented very simply as a set of aliases that look at the input arguments and then generate results from them.
NumPy arrays have several operations that return scalars, which is what we want to map to within these operations:
all
any
argmax
argmin
max
mean
min
prod
ptp
std
sum
var
For the purposes of this YTEP, we will concern ourselves with argmax, argmin, max, mean, min, std, ptp, sum, and also the non-NumPy operations hist and integrate, which normally do not return a single scalar but a set that does not correspond to the number of elements in the array.
We break these up based on the axis argument, and other optional arguments. Below is the enumerated behavior. Note that for those items that can be computed in a single pass (i.e., statistical information about the fields as a whole) we will likely implement a system that computes them in a single pass and caches them, so that min and max and std will cache in-between calls and only require a single pass over the array.
argmax
The mandatory argument is the field over which the maximum is to be computed; the default return value is the index, but the axis optional parameter can specify one or more fields that will be returned. (For instance, one could supply ('x', 'y', 'z') and be handed back the spatial locations.)
argmin
The mandatory argument is the field over which the minimum is to be computed; the default return value is the index, but the axis optional parameter can specify one or more fields that will be returned. (For instance, one could supply ('x', 'y', 'z') and be handed back the spatial locations.)
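A hedged sketch of the proposed behavior of both operations (the dataset and field names are examples, and the exact return convention may change as the YTEP evolves):
import yt
ds = yt.load("IsolatedGalaxy/galaxy0030/galaxy0030")
ad = ds.all_data()
# Default: the index of the extremum.
imax = ad.argmax("density")
# With axis given as one or more fields: the values of those fields at the
# extremum, e.g. the spatial location of the densest / least dense cell.
x, y, z = ad.argmax("density", axis=("x", "y", "z"))
x, y, z = ad.argmin("density", axis=("x", "y", "z"))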
max
The mandatory argument is the field of which the maximum is to be computed. This can be a list of fields.
This accepts the optional argument axis. If axis is a spatial axis (as defined by coordinates.axis_names, and thus including 0, 1, 2, and the axis names) it will generate a maximum intensity projection along that axis of the specified field.
mean
The mandatory argument is the field to average. This will return either a projection if the axis is spatial, or a quantity result.
The optional axis argument can either be the spatial axis along which the weighted projection can be computed (defaults to weighted by ones, which is usually not desired for astro data, but may be for other data) or None. Non-spatial axes are not supported.
The optional weight argument (which defaults to ones) describes how to weight this average. If axis is None and weight is None, it will compute the sum; if axis is None and weight is not None, it will compute the weighted_average_quantity.
min
The mandatory argument is the field of which the minimum is to be computed. This can be a list of fields.
Because we do not have “minimum intensity projections,” spatial axes are not supported.
std
The mandatory argument is the field of which the standard deviation is to be computed. This can be a list of fields.
The optional argument weight
will describe the weight for computing
standard deviation.
ptp
The mandatory argument is the field of which the peak-to-peak is computed.
sum
The mandatory argument is the field to sum.
The axis argument, if spatial, will be the axis along which the projection will be taken. This must either be None or a spatial axis. The weighting will be None, and thus it will be the line integral. (Note that this will not include a dl term, as it will be using the sum method.)
integrate
The mandatory field
argument is the field to integrate; if axis
is
one of the coordinate axes, the return value will be a projection. This will
be using the standard projection method, which includes dl
.
If the axis
argument is not a spatial dimension, maybe it could return a
profile of some type? I’m not sure.
hist
This should return a profile. Determining the most natural way to map how we profile (i.e., the fields along the axes, and the weighting) is an open question. But, it seems to me that we want to do something like:
- Mandatory argument: field or fields to take the average of, or the sum of. If bins is not specified, the returned profile will compute the sum of this field in bins along the x axis; this is somewhat of a weird conditional, but seems to match the closest.
- Optional weight argument: the field to use as the weight; if not specified, this will just be a sum.
- Optional bins argument: the x and optionally y field to use as bins
__getitem__
The slice operation on a shadow .r quantity should return regions or slices.
If one axis is fully-specified, it will be the slice along that axis. If all three are left as start/stop tuples with no step, it will be a region. These can be either float values, unitful objects, or tuples of (val, unit_name).
If a step is supplied, it will need to be supplied for all three dimensions, will need to be imaginary (i.e., 64j), and it will be interpreted as input to an arbitrary_grid object. The start/stop will provide the left and right edges, and the step will provide the number of cells along each dimension.
plot
The plot operation will only be implemented on things that have obvious plotting candidates – slices, projections, profiles. This will default to creating the necessary PlotWindow or related class, and will try to choose sane defaults for it. For instance, this could wrap to_pw. In contrast to to_pw, this will also default to native plot coordinates, as we want this to match more closely the behavior that would be done by simply plotting the field.
Examples¶
At the present to get a projection plot of a data object, one would do:
obj = ds.sphere((100, 'cm'), 'c')
p = yt.ProjectionPlot(ds, 'x', 'density', data_source = obj)
p.show()
or:
obj = ds.sphere((100, 'cm'), 'c')
proj = ds.proj("density", "x", data_source=obj)
p = proj.to_pw()
p.show()
The alternate here would be:
obj = ds.sphere((100, 'cm'), 'c')
p = obj.sum("density", axis="x")
p.plot()
The histogram could be computed:
obj = ds.sphere((100, 'cm'), 'c')
p = obj.hist("density", bins="temperature", weight="cell_mass")
p.plot()
The slicing would look like:
ds = yt.load("galaxy0030")
my_obj = ds.r[(100,'kpc'):(200,'kpc'), :, (100,'kpc'):(200,'kpc')]
The way to construct this at present, which is a bit cumbersome, would be the following (there are other ways to do this, too, but this is the one that is the clearest):
ds = yt.load("galaxy0030")
left_edge = ds.domain_left_edge.in_units("kpc").copy()
left_edge[0] = 100
left_edge[2] = 100
right_edge = ds.domain_right_edge.in_units("kpc").copy()
right_edge[0] = 200
right_edge[2] = 200
center = (left_edge + right_edge)/2.0
my_obj = ds.region(center, left_edge, right_edge)
Or for a slice:
ds = yt.load("galaxy0030")
my_obj = ds.r[(100,'kpc'):(200,'kpc'), (250,'kpc'), (100,'kpc'):(200,'kpc')]
my_obj.plot()
At present, we would have to:
ds = yt.load("galaxy0030")
left_edge = ds.domain_left_edge.in_units("kpc").copy()
left_edge[0] = 100
left_edge[2] = 100
right_edge = ds.domain_right_edge.in_units("kpc").copy()
right_edge[0] = 200
right_edge[2] = 200
center = (left_edge + right_edge)/2.0
reg = ds.region(center, left_edge, right_edge)
my_obj = ds.slice(1, (250,'kpc'))
my_obj.to_pw("density")
Another example is how to make very terse computations, which still demonstrate reasonably clearly what they do:
ds = yt.load("IsolatedGalaxy/galaxy0030/galaxy0030")
dd = ds.r[:,:,:]
print (dd.mean(["velocity_%s" % ax for ax in 'xyz'], weight="cell_mass"))
This returns:
[37021.0582639 cm/s, 35794.630883 cm/s, 82204.2708063 cm/s]
Note that we can also do:
print (ds.r[:,:,:].mean(["velocity_%s" % ax for ax in 'xyz'], weight="cell_mass"))
With the step functionality, this is also possible:
g = ds.r[::128j,::128j,::128j]
g["density"]
which will be an arbitrary_grid
object with 128 cells in each dimension.
We may at some point want to add pandas-like selection and indexing functions (http://pandas.pydata.org/pandas-docs/stable/indexing.html), but right now the use case is less clear. Maybe having select() be an alias for cut_region, or adding a groupby method (maybe; not sure that's useful unless it were by binning), would be interesting, but it is not immediately clear to me.
This work, if completed, will include an overhaul of the documentation to reflect this, as I think it is considerably terser and more expressive.
Backwards Compatibility¶
There are no backwards-compatible issues.
Alternatives¶
I do not know if there are alternatives to consider; in many ways, this will
open us up to more straightforward utilization of tools like xray
and
dask
.
YTEP-0027: Non-Spatial Data¶
Abstract¶
Created: December 1, 2015 Author: Matthew Turk, Nathan Goldbaum, John ZuHone
This YTEP outlines a plan to implement support for native non-spatial data representations in yt.
Status¶
In Progress
Project Management Links¶
- This pull request is the first attempt at implementing this: https://bitbucket.org/yt_analysis/yt/pull-requests/1891/wip-supporting-non-spatial-coordinate
- This Trello card discusses it a bit: https://trello.com/c/7d5PCUym/7-index-arrays as does this one: https://trello.com/c/MXF1sWam/6-non-spatial-data
Detailed Description¶
Background¶
Currently, most of yt assumes that its data structures (particularly for purposes of selection and units) are related to spatial coordinates. This leads to issues such as spherical and cylindrical coordinates believing their angular coordinates are in code_length, having to pretend that pressure coordinates are code_length, and so on.
An additional complication is that at present, index operations (particularly in selection operations) cannot know in advance that their input arrays are in "index space." This leads to costly operations that check the units (which are assumed to be code_length) and convert if need be. It is often very difficult to create a situation where the arrays are not in those units, though.
Fortunately, there are very few places where the arrays used to index the dataset are utilized directly; for the most part, they are manually stripped of units and then re-applied with the correct units in classes such as the spherical coordinates handler.
This YTEP concerns itself with a few things:
- Allowing datasets to be loaded that are indexed in non-spatial dimensions (for instance, lat, lon, pressure)
- Developing unitful coordinate systems for these non-spatial datasets
- Implementing a custom coordinate handler
Why is this hard?¶
There are assumptions made in a number of places that data is spatial. Often this shows up in one of these ways:
- Calls to ensure_code or conversions explicitly to code_length.
- Assumptions that a set of units can be represented as a form of length, for instance during integration.
- Inhomogeneous units in a single YTArray are not supported in the current development tip of yt. Some behavior can be mocked up using object arrays, but this is incredibly unreliable.
Implementation of Index Arrays¶
To address this issue, an implementation of an object explicitly for indexing data has been created, currently called an IndexArray. This object subclasses from YTArray, but differs in some crucial ways.
- Multiple units may be specified. These units must be of the same length as the final axis of the array.
- The units in an array are immutable. To change units, the array must be copied. Practically, this means that convert_to_units will raise an exception, but it brings with it the benefit that it is difficult to find oneself in a situation where something like domain_left_edge is not in the native units of the indexing system.
- Fancy-indexing is not possible; only slicing can be conducted.
These arrays are almost always assumed to be created internally within yt. Some situations, such as specifying a "center" to an object, can accept IndexArray objects.
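A small, hedged illustration of the limitation that motivates IndexArray (the IndexArray lines are only comments because this YTEP does not pin down its constructor; YTArray here is the existing class):
import numpy as np
from yt.units.yt_array import YTArray

# Today a coordinate triple must share one unit, even if the axes are
# (length, mass, time) in a custom geometry like the one described below:
point = YTArray(np.array([5.0, 12.0, 0.7]), "code_length")
print(point, point.units)

# The proposed object would carry one (immutable) unit per component of the
# final axis, e.g. (hypothetical constructor):
#   point = IndexArray(np.array([5.0, 12.0, 0.7]), units=("m", "g", "s"))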
Implementation of Coordinates¶
For inhomogeneous units to be useful, there must be a mechanism for specifying the units to a coordinate handler. The implementation of a CustomCoordinateHandler manages this task. This coordinate handler assumes that the coordinate space is functionally Cartesian, but where the axes correspond to non-spatial information. For instance, you might have the first axis be mass, the second time, the third distance.
Warning
At present, distance metrics are assumed to be scaled identically amongst the three axes. This means that distance is computed in a Euclidean fashion!
To specify this, the CustomCoordinateHandler accepts an axis unit specification. This extends the existing axis ordering argument to include axis units. From the perspective of the user, this would look like this:
ds = yt.load_uniform_grid(data, [30, 30, 30],
bbox=np.array([[0.0, 10.0], [0.0, 30.0], [0.4, 0.9]]),
geometry = ('custom', (('length', 'm'), ('mass', 'g'), ('time', 's')))
)
In this function call, note that the geometry argument has been extended to include both the axis ordering and the units that each axis takes. The first axis is called length with units of m, the second is called mass with units of g, and the third is time with units of s.
Note that these could all be length units, but with different names – this would also be a custom coordinate system where the naming scheme can be modified.
All coordinate handlers now have an axes_unit dict, which maps the axis names to units.
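Continuing from the load_uniform_grid call above, a hedged sketch of inspecting that mapping (the axes_unit attribute name follows this proposal and is otherwise an assumption):
# The coordinate handler maps each axis name to its unit.
print(ds.coordinates.axes_unit)
# expected form, per this proposal: {'length': 'm', 'mass': 'g', 'time': 's'}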
Future developments may include allowing for specification of non-Euclidean distance functions.
Impact on Plotting¶
PlotWindow
as a whole is designed to be used for plotting spatial datasets.
Integrating non-spatial datasets presents us with two options:
- Modify PlotWindow such that it is generic with respect to units and aspect ratios and usable for non-spatial data.
- Utilize something like PhasePlot or ParticlePlot for plotting image data from non-spatial datasets.
At present, extremely basic plotting functionality has been put into
PlotWindow
to deal with non-spatial datasets, but this has also caused some
minor impedance mismatches.
The current long-term strategy is to refactor the two plotting interfaces to
share a common base class (also likely with ParticlePlot
), and then have
these choose the appropriate subclass for plotting non-spatial data and “do the
right thing.”
Future: More than Three Dimensions¶
Utilizing IndexArray
is the first step toward enabling additional
dimensions of data access. However, this set of functionality alone is by far
insufficient. In order to enable access to greater dimensionality of data,
there must be concerted effort to eliminate assumptions of 3 dimensions and
generalize data structures. While this is now feasible, it is still quite the
undertaking.
Backwards Compatibility¶
The biggest potential source of problems with backwards compatibility arises from the utilization of YTArray objects where IndexArray objects are required. This is most likely to happen in places like centers specified to objects. However, in updating the tests, it seems that these are minimally invasive and should have only very minor impact on user-facing scripts and APIs.
Work is in progress to ensure that an IndexArray with homogeneous units behaves the same as a YTArray with those same units. This should minimize impact.
YTEP-0028: Alternative Unit Systems¶
Abstract¶
Created: December 8, 2015 Author: John ZuHone, Nathan Goldbaum, Matthew Turk
This YTEP outlines a plan to support alternative unit systems for yt.
Status¶
In Progress
Project Management Links¶
PR: https://bitbucket.org/yt_analysis/yt/pull-requests/1904/wip-switching-between-different-base-units/
Detailed Description¶
Background¶
Currently, yt works with a "cgs"-based unit system. That is, all units are expressible in a set of "base" units, which are reducible to the "centimeter-gram-second" system of units. There is one exception to this rule, that of SI current units, which are not reducible to anything within the cgs system and have a base unit of Amperes.
In the current state of the code, there is minimal support for other unit systems. The extent of this support is the set of methods for converting unitful quantities (YTArrays, YTQuantities) and units themselves to the SI or "MKS" (meter-kilogram-second) system. These are:
- in_mks: Takes a YTArray or YTQuantity and returns a new one in equivalent MKS base units.
- convert_to_mks: Converts the units of a YTArray or YTQuantity into the equivalent MKS base units.
- get_mks_equivalent: Takes a Unit object and returns the equivalent MKS units.
These methods are useful, but they require the user to convert to MKS “by hand” from the default “cgs” unit system used by yt, within which all calculations are carried out. Some users would prefer to work within the MKS system (or another alternative unit system) which is more appropriate for their datasets and calculations.
This YTEP outlines a proposal for allowing different unit systems to be used in yt. The core of the proposal is to allow this functionality on a per-object basis: namely, changing the unit system at the level of individual datasets, units, and unitful quantities, instead of on a global scale. The advantages of this approach are that it is relatively simple, is easily extendable, and makes only a fairly small number of changes to the fundamental code base.
The UnitSystem Object¶
Managing different unit systems requires the creation of a new UnitSystem class. A given UnitSystem object will provide dict-like access for setting and getting default units, with the keys corresponding to dimensions, whether strings (e.g., "velocity") or SymPy Symbol objects registered in yt.units.dimensions (e.g., yt.units.dimensions.current_mks). Initialization of a UnitSystem object requires setting the name of the system, as well as a set of base units:
cgs_unit_system = UnitSystem("cgs", "cm", "g", "s")
This will initialize the UnitSystem along with a set of base units. The required arguments are, in order:
- name: The shorthand name for the UnitSystem.
- length_unit: The base length unit for this system.
- mass_unit: The base mass unit for this system.
- time_unit: The base time unit for this system.
The optional arguments are:
- temperature_unit: The base temperature unit for this system. Defaults to "K" (Kelvin).
- angle_unit: The base angular unit for this system. Defaults to "rad" (radians).
- current_mks_unit: The base current unit for an MKS-like system. Defaults to None.
If need be, the base units for temperature, angle, and MKS current can be supplied:
mks_unit_system = UnitSystem("mks", "m", "kg", "s",
temperature_unit="K",
angle_unit="radian",
current_mks_unit="A")
The initialization of the UnitSystem
will also add it to a unit_system_registry
dictionary which
may be queried for a given system by its name:
from yt import unit_system_registry
imperial_unit_system = unit_system_registry["imperial"]
Once the UnitSystem exists, new unit definitions for specific dimensions may be added in two ways. The first is to explicitly set a unit for a specific dimension:
from yt.units import dimensions
mks_unit_system["pressure"] = "Pa"
mks_unit_system[dimensions.energy] = "J"
So, whenever yt asks for the unit corresponding to a given dimensionality (such as in a field definition), the unit specified here will be returned. The second way to add new units to the system is simply by querying for the units for a particular dimension, without having set them previously. The effect of this is to set the units for that specific dimension by deriving them from the base units:
print(mks_unit_system["angular_momentum"]) # We haven't set a unit for this yet!
which will return kg*m**2/s
because it will be derived from the base units of m
, kg
, and s
.
Several unit systems will already be supplied for use with yt. They will be:
"cgs"
: Centimeters-grams-seconds unit system, with base of(cm, g, s, K, radian)
. Uses the Gaussian normalization for electromagnetic units."mks"
: Meters-kilograms-seconds unit system, with base of(m, kg, s, K, radian, A)
."imperial"
: Imperial unit system, with base of(mile, lbm, s, R, radian)
."galactic"
: “Galactic” unit system, with base of(kpc, Msun, Myr, K, radian)
."solar"
: “Solar” unit system, with base of(AU, Mearth, yr, K, radian)
."planck"
: Planck natural units (), with base of
(l_pl, m_pl, t_pl, T_pl, radian)
."geometrized"
: Geometrized natural units (), with base of
(l_geom, m_geom, t_geom, K, radian)
.
Users may create new UnitSystem
objects on the fly, which will be added to the unit_system_registry
automatically as they are created. Both of these will be accessible from the top-level yt
module.
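A hedged sketch of that workflow (the system name and the unit override are invented for illustration; per this proposal, UnitSystem and unit_system_registry are importable from the top-level yt module):
from yt import UnitSystem, unit_system_registry

# Creating a system registers it automatically.
my_system = UnitSystem("galactic_like", "kpc", "Msun", "Myr")

# Units for other dimensions are derived from the base units on first query...
print(my_system["velocity"])      # kpc/Myr
# ...unless explicitly overridden.
my_system["velocity"] = "km/s"

# The registry hands the system back by name.
assert unit_system_registry["galactic_like"] is my_system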
"code" UnitSystems
¶
When a dataset is instantiated, a UnitSystem
object corresponding to the code units for
that dataset will be created and added to the unit_system_registry
, where the name will be
the string representation of the Dataset
object:
from yt import unit_system_registry, load
ds = load("GasSloshing/sloshing_nomag2_hdf5_plt_cnt_0100")
sloshing_unit_system = unit_system_registry[str(ds)]
Unit Systems and Dataset objects¶
The main user-facing interface to the different unit systems will be through the load
function. load
will take a new keyword argument, unit_system
, which will be a
string that corresponds to the name identifier for the desired unit system, with a default
value of "cgs"
. The main effect of changing the unit system will be to return all aliased
fields and derived fields in the units of the chosen system. For example, to change the units
to MKS in a FLASH dataset:
ds = yt.load("GasSloshing/sloshing_nomag2_hdf5_plt_cnt_0100", unit_system="mks")
sp = ds.sphere("c", (100.,"kpc"))
print(sp["density"])
print(sp["flash","dens"])
print(sp["kinetic_energy"])
print(sp["angular_momentum_x"])
[ 1.30865584e-23 1.28922012e-23 1.30364287e-23 ..., 1.61943869e-23
1.61525279e-23 1.59566008e-23] kg/m**3
[ 1.30865584e-26 1.28922012e-26 1.30364287e-26 ..., 1.61943869e-26
1.61525279e-26 1.59566008e-26] code_mass/code_length**3
[ 6.37117204e-13 6.12785535e-13 6.20621019e-13 ..., 3.12205509e-13
3.01537806e-13 3.39879277e-13] Pa
[ -3.97578436e+63 -3.92971077e+63 -3.95375204e+63 ..., 2.39040654e+63
2.39880417e+63 2.44245756e+63] kg*m**2/s
Note that in this example, "density"
is an alias to the FLASH field ("flash","dens")
, and it
has had its units converted to MKS, but the original FLASH field remains in its default code units.
"kinetic_energy"
and "angular_momentum_x"
are derived fields which have also had their units
converted.
Another option is to express everything in terms of code units, which may be achieved by setting
unit_system="code"
:
ds = yt.load("IsolatedGalaxy/galaxy0030/galaxy0030", unit_system="code")
sp = ds.sphere("c", (30., "kpc"))
print(sp["density"])
print(sp["kinetic_energy"])
print(sp["angular_momentum_x"])
[ 744.93731689 717.57232666 682.97546387 ..., 40881.68359375
57788.68359375 397754.90625 ] code_mass/code_length**3
[ 97150.95501972 91893.64007627 85923.44662925 ...,
11686694.21560157 16358988.90006877 79837013.8427877 ] code_mass/(code_length*code_time**2)
[ -1.17917130e-10 -1.05648103e-10 -9.26664470e-11 ..., 2.05149702e-09
2.03607319e-09 6.72304619e-09] code_length**2*code_mass/code_time
Currently, the plan is to have all frontends allow the user to set unit_system
in the call to
load
, but this should be evaluated on a per-frontend basis. For some frontends, it may be more
appropriate to set the unit system explicitly, whether to "cgs"
or some other system.
Using UnitSystems in Field Definitions¶
In order for derived fields to take advantage of the different unit systems, it will be necessary to change the units in the field definitions, so that the derived fields may be returned in the units of the system specified when the dataset was loaded.
For example, in setting up the specific angular momentum fields in yt.fields.specific_angular_momentum
,
we would change the units thus:
def setup_angular_momentum(registry, ftype = "gas", slice_info = None):
unit_system = registry.ds.unit_system
def _specific_angular_momentum_x(field, data):
xv, yv, zv = obtain_velocities(data, ftype)
rv = obtain_rvec(data)
rv = np.rollaxis(rv, 0, len(rv.shape))
rv = data.ds.arr(rv, input_units = data["index", "x"].units)
return yv * rv[...,2] - zv * rv[...,1]
...
registry.add_field((ftype, "specific_angular_momentum_x"),
function=_specific_angular_momentum_x,
units=unit_system["specific_angular_momentum"],
validators=[ValidateParameter("center")])
Notice that the field definition code itself has not been altered at all except that the units
keyword argument to registry.add_field
has been changed from cm**2/s
to
unit_system["specific_angular_momentum"]
, which will set the units for the field to whatever is
appropriate for the unit system associated with the dataset. The unit_system
object may be queried
with either SymPy symbol
objects in yt.units.dimensions
or strings corresponding to the variable
names of those objects.
This will not be appropriate for all fields–some fields naturally belong in certain units regardless
of the underlying system used. In the context of galaxy clusters, "entropy"
is an example, which
naturally belongs in units of keV*cm**2
. Whether or not to change units should be evaluated on a
per-field basis.
For users adding their own derived fields, there will be two ways to take advantage of the new unit
systems functionality. If derived fields are being created from a dataset using ds.add_field
, they
can set up the units in a similar way as above:
def _density_squared(field, data):
return data["density"]*data["density"]
ds.add_field(("gas","density_squared"), function=_density_squared, units=ds.unit_system["density"]**2)
If using yt.add_field
, however, it will be necessary to set units="auto"
in the call to add_field
.
To provide an extra layer of error handling for this case, a dimensions
keyword argument will be added
to the DerivedField
initialization, which will only be used if units="auto"
, and will be used to
check that the dimensions supplied to add_field
and the dimensions of the YTArray
in the field definition
are the same:
from yt.units.dimensions import temperature
inverse_temp = 1/temperature
def _inv_temperature(field, data):
return 1.0/data["temperature"]
yt.add_field(("gas","inv_temperature"), function=_inv_temperature, units="auto",
dimensions=inverse_temp)
If one does not supply a dimensions
argument when units="auto"
, or if the dimensions are incompatible,
errors will be thrown.
Special Handling for Magnetic Fields¶
Making magnetic fields compatible with different unit systems requires special handling. The reason for this is that the units for the magnetic field in the cgs and MKS systems are not reducible to one another. Superficially, it would appear that they are, since the units of the magnetic field in the cgs and MKS systems are gauss (G) and tesla (T), respectively, and numerically 1 T = 10**4 G. However, if we examine the base units, we find that they have different dimensions:
[B] in cgs: sqrt(mass)/(sqrt(length)*time)
[B] in MKS: mass/(current_mks*time**2)
It is easier to see the difference between the dimensionality of the magnetic field in the two systems in terms of the definition of the magnetic pressure:
p_B = B**2/(8*pi) (cgs/Gaussian)
p_B = B**2/(2*mu_0) (MKS)
where mu_0 is the vacuum permeability. Therefore, in order to handle the different cases of the magnetic field units for the two different systems, it is necessary to have field definitions which can take the different dimensionalities into account.
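A brief, hedged illustration of this mismatch using yt's unit symbols (the printed dimension expressions are indicative; exact SymPy formatting may differ):
from yt import units as u

B_cgs = 1.0 * u.gauss
B_mks = 1.0 * u.tesla

# Same physical quantity, different dimensionality in the two systems:
print(B_cgs.units.dimensions)   # ~ sqrt(mass)/(sqrt(length)*time)
print(B_mks.units.dimensions)   # ~ mass/(current_mks*time**2)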
The most fundamental change which is required will be to change the way aliases are handled for the magnetic field vector fields. Normally, the dataset field and the aliased field will have the same dimensions. For example, in the case of a FLASH dataset, ("flash","magx") and its alias ("gas","magnetic_field_x") will have the same dimensions of magnetic_field_cgs, which are sqrt((mass))/(sqrt((length))*(time)). This is handled by specifying the alias in the known_other_fields attribute of the FieldInfoContainer like this:
class FLASHFieldInfo(FieldInfoContainer):
    known_other_fields = (
        ...
        ("magx", (b_units, ["magnetic_field_x"], "B_x")),
        ("magy", (b_units, ["magnetic_field_y"], "B_y")),
        ("magz", (b_units, ["magnetic_field_z"], "B_z")),
        ...
    )
Here the alias is the second item in the 3-element tuple after the field name. However, we may want to convert from a cgs unit system to an MKS unit system, which would require changing the dimensions of the alias "magnetic_field_x" (while leaving the units and dimensions of the dataset field "magx" intact). The solution is to remove the alias from known_other_fields and supply a helper function which creates the aliases, taking into account the specified unit system:
class FLASHFieldInfo(FieldInfoContainer):
    known_other_fields = (
        ...
        ("magx", (b_units, [], "B_x")),  # Note the alias has been removed
        ("magy", (b_units, [], "B_y")),
        ("magz", (b_units, [], "B_z")),
        ...
    )

    def setup_fluid_fields(self):
        from yt.fields.magnetic_field import \
            setup_magnetic_field_aliases
        ...
        setup_magnetic_field_aliases(self, "flash", ["mag%s" % ax for ax in "xyz"])
Again, what is most appropriate for the handling of the magnetic field units will have to be evaluated on a per-frontend basis. The definitions for other magnetic-related fields such as "magnetic_pressure" and "alfven_speed" will also be modified to ensure that the units are handled properly for the different systems.
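To make the per-system handling concrete, a hedged sketch of what a unit-system-aware magnetic pressure definition could look like is shown below. The branch condition, the construction of the vacuum permeability via ds.quan, and the field lookups are illustrative assumptions, not the actual yt implementation:

import numpy as np
from yt.units.dimensions import current_mks

def _magnetic_pressure(field, data):
    # Sum of squares of the magnetic field components.
    B2 = (data["gas", "magnetic_field_x"]**2 +
          data["gas", "magnetic_field_y"]**2 +
          data["gas", "magnetic_field_z"]**2)
    if current_mks in B2.units.dimensions.free_symbols:
        # MKS-like dimensions: p_B = B**2/(2*mu_0)
        mu_0 = data.ds.quan(4.0e-7*np.pi, "kg*m/(A**2*s**2)")  # vacuum permeability
        return B2/(2.0*mu_0)
    # cgs-like dimensions: p_B = B**2/(8*pi)
    return B2/(8.0*np.pi)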
Other Ways to Use the Unit Systems¶
There will be other ways in which unit-aware objects in yt may be converted to a different unit system. They are:
in_base, convert_to_base, and get_base_equivalent methods
These three methods, which currently convert unitful quantities and units to the yt base units of cgs (plus Ampere if the dimensionality includes current_mks), will be modified to include a unit_system keyword argument, which will be set to "cgs" by default. The purpose of this keyword argument is to allow switching between different unit systems for YTArrays, YTQuantities, and Unit objects. This keyword argument may be set to a string corresponding to the name of the desired unit system. Some examples:
a = YTArray([1.0, 2.0, 3.0], "km/hr")
print(a.in_base("imperial"))
[ 0.91134442, 1.82268883, 2.73403325] ft/s
b = YTQuantity(12., "g/cm**3")
b.convert_to_base("galactic")
print(b)
1.7730691071344677e+32 Msun/kpc**3
c = YTQuantity(100., "mile/hr")
print(c.units.get_base_equivalent("mks"))
m/s
Alternatively, a Dataset object may be passed as the unit_system argument, which will convert to the base code units of that dataset:
ds = yt.load("IsolatedGalaxy/galaxy0030/galaxy0030")
sp = ds.sphere("c", (30., "kpc"))
print(sp["density"].in_base(ds))
[ 744.93731689 717.57232666 682.97546387 ..., 40881.68359375
57788.68359375 397754.90625 ] code_mass/code_length**3
Note that this will only work if the YTArray, YTQuantity, or Unit in question "knows" about those code units, e.g., it is from a data container from that Dataset or was initialized using ds.arr.
A call to in_base or convert_to_base without specifying a unit system will convert to the default "cgs" unit system:
a = YTArray([1.0, 2.0, 3.0], "km/hr")
print(a.in_base())
[ 27.77777778, 55.55555556, 83.33333333] cm/s
which is the current behavior of the code, ensuring backwards compatibility. The behavior of the cgs- and MKS-specific methods (e.g., in_cgs, in_mks, etc.) will not be modified.
Cosmology object
Currently, the Cosmology object returns all quantities in cgs units. The proposed changes will add a new keyword argument, unit_system, which will be a string that corresponds to the name identifier for the desired unit system, with a default value of "cgs".
cosmo = Cosmology(unit_system="galactic")
Alternatively, unit_system may be set to a Dataset object to use the code units of that dataset:
ds = yt.load("IsolatedGalaxy/galaxy0030/galaxy0030")
cosmo = Cosmology(unit_system=ds)
Physical Constants in the Different Unit Systems¶
Each UnitSystem object will have a constants attribute which can be used to obtain any physical constant in yt.utilities.physical_constants in the base units of that system. For example:
import yt
galactic_units_system = yt.unit_system_registry["galactic"]
G = galactic_units_system.constants.G
clight = galactic_units_system.constants.clight
mp = galactic_units_system.constants.mp
print(G)
print(clight)
print(mp)
4.498205661165364e-12 kpc**3/(Msun*Myr**2)
306.6013938381177 kpc/Myr
8.417430465502256e-58 Msun
Notifying the Community¶
The community will be notified about this feature enhancement via the mailing list and appropriate social media accounts. Appropriate documentation of this feature will be added.
Backwards Compatibility¶
Since the base unit system for all yt units will remain cgs, and the unit_system
keyword will always default to "cgs"
for loading datasets, setting up
Cosmology
objects, and unit conversions of arrays, the changes as proposed
are fully backwards-compatible.
Alternatives¶
The only alternative discussed up to this point was to set the unit system globally for a given yt session using the configuration system. The system proposed here allows for more fine-grained control at the level of individual objects, e.g. Dataset, YTArray, and Cosmology objects, which should be sufficient for most (if not all) purposes. Another option is to make the default base units themselves configurable. This is disfavored since it does not appear to add additional functionality beyond the currently proposed scheme, and would result in more widespread changes to the code base.
YTEP-0029: Extension Packages¶
Abstract¶
Created: January 25, 2016
Author: Matthew Turk
The yt project is not the same as the yt codebase. However, the bundling of analysis modules and maintaining a monolithic repository tends to blur the distinction between the two. This YTEP is an attempt to identify how we can make the codebase more friendly to extension modules, and to encourage an ecosystem of independently developed packages that utilize yt.
Status¶
Completed
Project Management Links¶
There has been discussion on yt-dev about this, but due to Dreamhost's constant problems with archiving mail, those threads may be gone forever, about which I will refrain from editorializing here.
Detailed Description¶
This YTEP proposes a few courses of action designed to promote the idea of yt as a dependency, rather than a destination, for analysis modules and external projects.
As it stands, many (the author included) have viewed yt as the “place” to put things, whether or not they contribute to its core mission. There are several good reasons for this:
- Distribution: yt is largely a monolithic codebase, and if you have downloaded and installed yt, you have access to all of the analysis modules. This makes discoverability and distribution much easier.
- Infrastructure: yt has testing infrastructure which gets run on all pull requests and new revisions. As such, if the analysis module is in the primary codebase, it too will be tested. If functionality in yt changes, the analysis module will track this. Additionally, we provide the ability to run tests on managed infrastructure.
- Hosting: We will allow these projects to host data on hub.yt as well as have their websites either as subrepositories in the main yt-project/website repository or as subdirectories. This will allow them to share our hosting infrastructure without developing their own, if they so wish.
- Prestige: (maybe?) Having something be a part of yt adds to the sense that it’s part of the bigger project, and that it’s somehow graduated to a stable component. (I am not sure this is real, but it’s been mentioned to me.)
- Social infrastructure: There’s a mailing list for yt, so folks know where to go.
But, there are several – sometimes quite large – downsides to being part of the mainline yt codebase.
- Project Standards: The methods right now for developing in yt require several iterations of code review as well as conforming to standards of development practices. For instance, this could include style or where the code is developed (i.e., bitbucket/github/kallithea/etc).
- Credit: While the situation is certainly improving, and there is nothing to stop a contributor from writing a paper describing a new analysis module (in fact this is encouraged) it is still a part of a “project” codebase rather than a standalone entity. This can diminish the perception of individual contributions, particularly by people not directly affiliated with the mainline yt project.
- Timeline: The acceptance of a change into yt can often be on a weeks or longer timescale, depending largely on the time between issuing a pull request and the next PR triage. For modules that are developed with much higher frequency, this can be very cumbersome and costly in developer effort.
- Dependencies: yt attempts to keep the number of dependencies at a minimum; if a module wants to use additional dependencies, it often has to either import them on-demand or vendor them with the source. We want to encourage packages to utilize new technologies and experiment, but at present we cannot allow that into mainline.
A solution to this problem would be to encourage an ecosystem of discoverable packages that build on yt as a dependency. Three such packages exist at time of writing – powderday, trident, and astroblend.
What this YTEP proposes is to restructure our discussion of yt as a project to emphasize these types of packages and projects as citizens in the community, and provide to them a mechanism for people to find them.
This will take a few forms:
- Re-designing the website to emphasize the burgeoning ecosystem of projects, perhaps similar to the way astropy.org is set up.
- Offering, but not mandating, that projects direct questions to the yt-users mailing list.
- Splitting out some particularly isolated projects from analysis modules. This may include photon simulator and the sz generator.
- Provide entry points in the code for people to find out about available analysis modules. (Discussed below.)
- Utilize intersphinx documentation links for analysis modules, or offer to host documentation on the main yt homepage.
- Determine mechanisms for recognizing projects as extensions, independent, related, etc.
- Adding testing support can be difficult and labor intensive; however, we should explore adding testing support for extensions under some criteria. (Such as, does this depend on public APIs, do test failures constitute upstream breakages, etc.)
- Accept either subrepos or subdirectories in yt-project/website.
At present, people can find analysis modules by doing something like:
>>> dir(yt.amods)
This returns a list of analysis modules. This could be extended to also parse a small list of known external projects which could be installed. Importing one that was not installed would report that it wasn’t installed and describe a method for installing it.
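A minimal sketch of what such a discovery hook could look like is below; the registry contents and the function name are purely illustrative and do not correspond to an existing yt API:

import importlib

# Hypothetical registry mapping known external projects to install hints.
_KNOWN_EXTERNAL_PROJECTS = {
    "trident": "pip install trident",
    "powderday": "see the powderday documentation for installation instructions",
}

def load_external_project(name):
    """Import a known external project, or explain how to install it."""
    try:
        return importlib.import_module(name)
    except ImportError:
        hint = _KNOWN_EXTERNAL_PROJECTS.get(name, "pip install %s" % name)
        raise ImportError("'%s' is not installed. To install it: %s" % (name, hint))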
Depending on the type of project, we may want to distinguish between yt extensions and projects that share the yt ecosystem. For the former, we could evaluate using yt.extension as the namespace, if yt.analysis_module is not appropriate. This would then provide (similar to how Flask manages extension modules) a naming scheme (say, ytext_projectname) that would be imported into yt.extension. This is not necessary, but is a possibility within the scope of this YTEP.
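As a rough illustration of the Flask-style scheme (the names yt.extension and ytext_projectname are hypothetical here, as in the paragraph above), an extension loader might look like:

import importlib
import sys

def load_yt_extension(name):
    """Import ytext_<name> and expose it as yt.extension.<name> (sketch only)."""
    module = importlib.import_module("ytext_%s" % name)
    sys.modules["yt.extension.%s" % name] = module
    return module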
Backwards Compatibility¶
If we split out existing modules, they would need to be API compatible. We would also want to make sure that they are well-supported by our infrastructure.
Alternatives¶
One possible alternative would be to make the yt.analysis_modules namespace a free-for-all of modifications, with much shorter timescales and essentially autonomous operation by developers.
YTEP-0031: Unstructured Mesh¶
Abstract¶
Created: December 18, 2014
Author: Matthew Turk
Supporting data generated by unstructured mesh codes can be implemented using a
combination of existing data selection routines, vector fields, index fields,
and pixelization routines. This YTEP also touches on decoupling the step of
constructing an image buffer from data and the data representation itself,
which applies to data types such as SPH as well as mixing different mesh types
in a single Dataset
object.
Status¶
Proposed
Project Management Links¶
Any external links to:
- WIP pull request for hexahedral mesh: https://bitbucket.org/yt_analysis/yt/pull-request/1370/wip-generic-hexahedral-mesh-pixelizer/diff
Some small discussion has occurred on-list as well, but not yet of too much detail.
Detailed Description¶
Background¶
Data in yt has until now been assumed to be either discrete, or on a regular mesh that is defined by cells. This has been the implementation for smoothing kernels as well as particles and particle depositions. However, to support broader codes (and to support more flexible visualizations of particle datasets, such as direct smoothing kernel application onto image buffers) we should develop methods for supporting unstructured meshes. This will require uncoupling the pixelization step not only from coordinates (as has been done) but also from the underlying data model. This YTEP proposes a strategy for implementing support for unstructured meshes. We will not restrict ourselves to specific element types; however, where particular geometries are required, we will first focus on hexahedral, tetrahedral, wedge, pyramid and Voronoi tessellation cells. The YTEP as a whole is designed to be generic with respect to element geometry.
We will not spend much time discussing Voronoi tessellation cells here and will instead reserve discussion of them for a future YTEP, as they are similar in some ways but quite different in others. Additionally, we restrict ourselves initially to convex elements.
In conjunction with these different cell types, data may be stored at vertices, centers, edges, faces, and so on. Developing the appropriate intra-cell interpolation methods will remove the need to explicitly identify these different data “centering” methods, but see below for discussion of how these centering methods will be denoted.
Typically within an element, the value at a given position is a function of a kernel that uses as input the quantities defined in the output. So given the kernel and the data points, the value at any internal position in the cell can be determined. Supporting these interpolation kernels is part of a longer-term roadmap. Initially, we will likely perform nearest-neighbor or inverse-distance-weighting (IDW) interpolation.
The difficulty in handling unstructured mesh data is ensuring both high-fidelity visualization and high-fidelity data representation. The latter is considerably more straightforward: if the system is viewed as a sequence of discrete points ("particles"), and the question of intra-element interpolation is ignored except when explicitly requested, the existing selection routines can be used.
We assume for the purposes of this document that the datasets are defined by connectivity and coordinates (i.e., an array of indices that define each element’s vertices and an array of common coordinates into which we index) and that the ordering of the vertices is such that we can identify which vectors span faces of the elements.
Irregular Mesh Points¶
In YTEP-0001: IO Chunking a system for chunking was described, wherein attributes such as fcoords, icoords and so on are exposed. We propose to augment these, which work nicely for regular mesh data, with coordinates for vertices and edges. However, this will only be accessible for chunks which are homogeneous in element type. (Otherwise, the vector would need to be “ragged” with differing component length.)
What this YTEP proposes to do is to create a new set of fields and chunk attributes. The fields, much like fields such as x, y and z, will reflect the position of vertices and/or faces of an element and will be vector fields. The new chunk attributes will be the underlying data from which these fields are generated. The new chunk system will add these attributes to YTDataChunk:
- fcoords_vertex - vertex positions, of shape (N, Nvertex, 3)
- fcoords_face - barycenter of the faces, of shape (N, Nface, 3)
- fcoords_edge - middle of edge, of shape (N, Nedge, 3)
We anticipate that for the existing data index types, these attributes can be created on the fly, but will not often be used.
The new fields will follow the naming convention of element_type, type_coordinateaxis. A few examples:
- ("hex", "vertex_x") - vector of shape (N, Nvertex)
- ("hex", "face_x") - vector of shape (N, Nface)
- ("tetra", "face_phi") - vector of shape (N, Nface)
- ("wedge", "edge_z") - vector of shape (N, Nedge)
- ("vertex", "x") - array of shape (N)
- ("edge", "x") - array of shape (N)
We will still retain existing index fields, which will return centroids for coordinates and will simply return invalid coordinate requests for the path element and cell width fields, similar to when fields like dx are requested in non-Cartesian coordinates. This should work, although it is not entirely clear that it is the best bet. Namespacing could potentially fix this.
What this will provide:
- Uniform access to coordinates of all raw, unprocessed data points.
- Re-use of existing routines that query based on field type and field name.
- Avoid confusion based on re-use of names from other systems of coordinates.
Likely this will also need a method for transforming values between definition systems; for instance, a method for converting vertex-centered values to cell-centered. This would be akin to the method used for depositing particles on a mesh and would mandate access to the mesh object via ValidateSpatial.
Decoupling Pixelization from Mesh Values¶
The pixelization step is the point at which mesh values are transformed into an image. These mesh values are variable resolution, and so the operation essentially deposits (through NN interpolation with anti-aliasing) these variable mesh values into an image buffer.
In cases where the mesh values are accessible through the fields used currently (such as px and the like), the standard pixelization routines will be called.
For datasets that do not, or cannot, create px fields and the like, separate pixelization routines will be called. An example of this can be found in the (at time of writing) WIP PR for hexahedral mesh datasets. This will be implemented in the coordinate handler.
The generic pixelization routine will accept a set of vertices, an interpolation kernel (nearest-neighbor for starters) and the field (initially only support for fields defined at centroids will be added for simplicity, but with edge and face added later). The ordering of vertices that provides face values will be specified at pixelization time, and will draw from one of a set of orders.
The pixelization routine will first apply coarse bounding box checks to the image plane and all supplied elements. Each pixel that passes the bounding box check for a given element will move on to the second step of selection. In this step, the sign of the dot product of the centroid with each normal vector defining each face will be stored (this prevents the need for knowing the CW / CCW ordering of the vertices) and, for each pixel in the image plane, the signs of the same dot product will be examined. If all the signs match, the point is internal to our (convex) element. The appropriate kernel will then be evaluated and the resulting value deposited in the image plane.
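A schematic NumPy version of that sign test is shown below; the real routine is implemented in Cython, and the array shapes and function name here are assumptions:

import numpy as np

def point_in_convex_element(point, face_points, face_normals, centroid):
    # face_points: (Nface, 3), one point on each face of the element
    # face_normals: (Nface, 3), a normal for each face (orientation need not be known)
    # The centroid is always interior, so its dot-product signs give the "inside" reference.
    ref_signs = np.sign(np.einsum("ij,ij->i", centroid - face_points, face_normals))
    signs = np.sign(np.einsum("ij,ij->i", point - face_points, face_normals))
    return np.all(signs == ref_signs)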
Because of the requirement of a single mesh type per pixelization call, the pixelization routines will iterate over each mesh type and deposit the fields in sequence. This will enable the interoperation of fields between mesh types without requiring that they be made uniform in size.
Note also that separating out based on the type of field and data represented means that we may now be able to implement slices of particle fields directly.
Multiple Meshes for Multiple Mesh Types¶
Each mesh type (hex, tet, wedge, etc.) will be isolated to a different mesh object.
For a given data object, much like particles and mesh objects cannot interact without the mediation of a deposition step, each must be queried separately if the vertices are to be examined. If the field values are the only items of concern, they can be queried in concatenated form. For situations where fields persist across mesh types, we will be unable to supply vertex information and can only then supply x fields and the like.
At present, there is a semi-structured mesh object, and for datasets that expose that, it lives within the .meshes attribute of the index. Each mesh type will be in a separate element in that list.
Example Use Cases¶
These example use cases should just work in a successful implementation. The dataset imagined in them contains tetrahedra (N_t), hexahedra (N_h), and wedges (N_w). The field field1 is defined at vertices and field2 is defined at the element centroids.
Querying all of the values of field1:
dd = ds.all_data()
print dd["vertex", "x"].shape
print dd["index", "x"].shape
print dd["field1"].shape
The first and third print statements will return the same shape, but the middle will return the total number of elements (centroids). Ultimately, much like with particle fields, the user will need to have some knowledge of the mesh (which yt can provide hints about) to know how to combine fields.
This should also work:
prof1d = yt.create_profile(dd, ("vertex", "x"), "field1")
Because our selection operators will operate on the field values as though they were discrete points, this must also work:
sp = ds.sphere([0.5, 1.0, 30.1], (1.0, "km"))
sp["field1"]
sp["field2"]
These fields will not be the same size, but will select from all different mesh types. Querying the "x" field will return the centroids that pass the selector, which will be of a different size than "field1" but will be the same size as "field2". This also means that it will be impossible to bin "field1" against "x" without explicitly namespacing it as ("vertex", "x").
Volume Rendering¶
Initial support for volume rendering will use Embree, a fast ray-tracing code from Intel, to do the ray traversal. A set of Python bindings for Embree already exists. Later on, this may be replaced by our own ray-tracing code to remove the external dependency.
To use Embree, we must write code that generates a triangular polygon mesh from the unstructured mesh data yt reads in. This may involve breaking up faces into multiple triangles. Currently, this is implemented for hexahedral and tetrahedral mesh elements, and adding support for other mesh types should not be difficult. One then uses the functions Embree provides to cast rays at the resulting mesh.
There will be two basic "plot types" for volume renderings of unstructured mesh data. The first will be "surface plots", where the value of the field at the intersection point with each ray will be calculated using hit data computed by Embree. The second will be more like the traditional yt volume renderings, in which values along each ray will be accumulated for every element the rays intersect. For example, one could compute the maximum intensity along each ray instead of the value on the surface. Both of these types of renderings will need implementations of various intra-element interpolation functions to support meshes of various types and orders.
All of this will be integrated in with the volume rendering refactor, so that we retain the flexibility provided there for creating movies and camera paths. This will involve (at least) defining a new type of RenderSource object for polygon meshes. This object will know how to create the Embree polygon mesh from the data_source that gets passed in, and how to do the appropriate ray tracing calls. Once this source has been created, the Camera will be able to be changed at will, as defined in the YTEP for the scene refactor. Because multiple RenderSource objects can exist in the same scene, there is no reason why different meshes with different plot types can't exist in the same scene.
Some examples of what the volume renderings will look like are here: https://www.dropbox.com/s/xx2it8p0ivk7s69/surface_render_0.png?dl=0 https://www.dropbox.com/s/m0b9wdp6uh6h4nm/surface_render_1.png?dl=0
Explicitly Not Implemented Functionality¶
These pieces of functionality will need considerable reworking before they will be suitable for use with unstructured mesh data, and they are outside of the scope of this document:
- “Spatial” fields, as connectivity between elements is not well-defined in general (although it may be for specific element types)
- Block and tile iterators, as they are not immediately relevant to unstructured meshes
These are difficult, and we will be holding off on implementing them until this YTEP and its implementation have shaken out.
Backwards Compatibility¶
This should have absolutely no backwards incompatible changes; any backwards-incompatible changes will be considered bugs and will result in a redesign.
Alternatives¶
A few alternatives exist. For instance, instead of augmenting fcoords and so on with new definitions, we could either define new fields and leave fcoords to refer to centroids (or delete it for those objects), or we could define vector fields for these that are of shape (N, Ncell, 3) and refer to the vertices of the data.
Additionally, we could be more explicit about what refers to what; we could have different namespaces for vertices.
Another alternate idea would be to mimic the particle method for namespacing and positions; this would result in things like ("field_type", "hex_vertex_x") and so on. Or, we could do ("hex_vertex", "x") and similar.
Open Questions¶
- Should we get rid of particle_type and replace it with a classification such as centroids, discrete, vertex and so on?
- How should we handle namespacing for fields that may be defined at multiple places (face and vertex, for instance)?
YTEP-0032: Removing the global octree mesh for particle data¶
Abstract¶
The global particle octree index used by yt presents a barrier for improving the performance and scalability of visualizing and analyzing particle datasets. This YTEP proposes removing the global octree index, replacing it with a combination of a new IO system and changes to the high-level yt API to focus on returning particle-centric data. The particle I/O refactor makes use of an indexing scheme based on compressed Morton bitmaps which dramatically improves memory usage and scaling for large particle datasets by eliminating the need for a global octree index.
Rather than constructing a global index to maintain backward compatibility at the cost of scaling and performance, we instead propose a reworking of the yt user interface for particle and SPH data to be more “particle-centric”. This means that data object selections for fields that are now defined on the global octree mesh will instead return field data at particle locations. For SPH data, visualizations of slices and projections are done in the image plane, making use of the “scatter” approach by smoothing SPH data directly onto images, employing either a volumetric or projected SPH smoothing kernel. Fully local derived fields are calculated using yt’s existing field definitions but passing in data defined at particle locations. Fields that need spatial derivatives are implemented using the SPH formalism and are also evaluated at the particle locations.
Altogether these changes allow for improved performance and scaling, and allow users to access, analyze, and visualize particle field data for SPH simulations in a more straightforward fashion. While we do not propose substantial API changes for mesh or octree codes, these changes to yt’s field system for particle data imply substantial changes to the meaning of yt’s data selection system for particle data. We discuss the implications of these backward incompatible changes and how we intend to document and manage them in a way that is minimally disruptive to users.
Status¶
In Progress. The implementation is mostly finished, although there are a few features that still need to be implemented.
Project Management Links¶
The code can be found in pull request 2382:
https://bitbucket.org/yt_analysis/yt/pull-requests/2382
The C++ compressed bitmap implementation we intend to vendor into yt:
Detailed Description¶
Background¶
Currently most user-facing operations on SPH data are produced by interpolating
SPH data onto a volume-filling octree mesh. When support for SPH data was added
to yt in the run-up to the yt-3.0 release, this approach allowed yt to support
SPH data in a way that could reuse the existing infrastructure in yt for octree
data and preserve core assumptions in yt that gas
fields must correspond to
a volume-filling AMR structure. While this did make initial support for SPH data
much easier, it also had some downsides. In particular, because the octree was a
single, global object, the memory and CPU overhead of smoothing SPH data onto
the octree can be prohibitive on particle datasets produced by large
simulations. Constructing the octree during the initial indexing phase also
required each particle (albeit, in a 64-bit integer) to be present in memory
simultaneously for a sorting operation, which was memory
prohibitive. Visualizations of slices and projections produced by yt using the
default n_ref
and over_refine_factor
are somewhat blocky since by
default we use a relatively coarse octree to preserve memory. In addition, since
we construct the global octree based on the positions of all particles,
visualizations that include only one particle type sometimes include “holes” in
regions under-sampled by that particle type.
These cosmetic and semantic issues are jarring to users of SPH codes, who tend to think of data defined at the particle locations rather than sampled onto an adaptive mesh. Making our high-level API focus more on particle-centric data will help to ease the cognitive dissonance felt by users of SPH codes when they work with yt.
Over the past two years, Meagan Lang and Matt Turk implemented a new approach for handling I/O of particle data, based on storing compressed bitmaps containing Morton indices instead of an in-memory octree. This new capability means that the global octree index is now no longer necessary to enable I/O chunking and spatial indexing of particle data in yt.
In this document we describe the approach we take for replacing the global octree index with Morton bitmasks. First, we describe the implementation of the Morton bitmask index, changes to the low-level selector API needed to support the Morton bitmask work, and the testing strategy used for the Morton bitmap indexing scheme. Next we discuss high-level changes to how yt handles particle data, changes to the field system for SPH data, the implementation of the SPH pixelizer system for visualizations, a discussion of deposit fields, and a description of the strategy used to test the new approach for SPH data. We close with a discussion of questions that still need to be tackled before this work can be merged.
Low-Level Implementation¶
Morton Bitmap Index¶
The generated index serves to map how the domain is populated by particles in datasets split across multiple files. This way, spatial queries can skip files that do not contain particles in the selected part of the domain. The files are mapped by storing two nested Morton indices for each particle in a dataset. Rather than storing the indices in plaintext, we make use of an EWAH compressed boolean array bitmap. Given a domain with known boundaries in each dimension, a 3-dimensional position can be described by a single integer Morton index by the following steps (a short code sketch follows this list):
- Dividing the domain into 2^index_order1 cells in each dimension with widths ddx = domain_width_x/(2^index_order1).
- Determining the 3 integers specifying the cell that contains the 3D position (e.g. x/ddx).
- Combining the 3 integers into a single integer by alternating bits from each dimension.
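The following is a minimal sketch of the coarse index computation described above; the function name, signature, and default index_order1 value are illustrative only:

import numpy as np

def coarse_morton_index(pos, left_edge, right_edge, index_order1=7):
    ncells = 1 << index_order1                        # 2**index_order1 cells per dimension
    dds = (right_edge - left_edge) / ncells           # cell width in each dimension
    ijk = ((pos - left_edge) / dds).astype(np.int64)  # integer cell coordinates
    ijk = np.clip(ijk, 0, ncells - 1)
    mi = 0
    for bit in range(index_order1):
        for dim in range(3):                          # interleave bits from x, y, z
            mi |= int((ijk[dim] >> bit) & 1) << (3*bit + dim)
    return mi

mi = coarse_morton_index(np.array([0.3, 0.5, 0.7]), np.zeros(3), np.ones(3))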
These indices can be stored as either integers or in boolean masks. In the case of the mask, an array of zeroed bits is created with a length equal to the maximum possible index for the chosen value of index_order1. Then the bits for the indices present are set to one. To save space, boolean masks in the form of bitmaps can then be compressed further using the Enhanced Word-Aligned Hybrid (EWAH) bitmap compression algorithm. In practice, we make use of a vendored version of EWAHBoolArray, a C++ EWAH bitmask implementation available under the Apache v2 license.
One bitmap is created for each file. If an index is present in more than one file’s bitmap, this represents a collision and decreases the likelihood that the bitmap can be used to exclude files during spatial queries. This is unlikely if the particles are well partitioned between the files according to a domain decomposition scheme at the chosen order, but this is not generally true of particle datasets produced by astrophysical simulations. In these cases, it is better to create a more refined index.
Using a larger index_order1 increases the refinement of the index, but also increases both the memory required to store the indices and the time required to query them for EWAH bitmaps. To combat this, we include a second refined index within those cells that have indices in multiple files' bitmaps for the coarse index. For each particle with a coarse index that collides with another file, a second refined Morton index is created by following the same procedure as for the coarse index, but exchanging the domain boundaries for the boundaries of the coarse index cell. The refined index for each file is then stored in an EWAH bitmap for each coarse cell with a collision.
The coarse and refined indices are generated in two separate I/O passes over the entire dataset. To generate the coarse index, the coordinates of all particles, as well as the softening lengths for SPH particles, are read in from each file. For each particle we then compute the Morton index corresponding to the particle's position within the domain. This index, mi, is then used to set the mi-th element in a boolean mask for the file to 1. If the particle is an SPH particle, neighboring indices with cells that overlap a sphere with a radius equal to the particle's softening length and centered on the particle are also set to 1.
Once a coarse boolean mask is obtained for each file, the masks are stored in a set of EWAH compressed bitmaps (implemented in the ewah_bool_array Cython extension classes). Using logical boolean operations, we then identify those indices that are set to 1 in more than one file's mask (the collisions). The EWAH format is particularly good at logical operations, as it does not necessarily require decompression to determine whether or not particular bits are set.
During a second I/O pass over the entire dataset, refined indices are created for those particles with colliding coarse indices. Both the coarse and refined indices are stored in an array for each file. Once a file has been completely read in, those indices are sorted and used to create a map from coarse indices to EWAH compressed bitmaps. This is done because entries in EWAH compressed bitmaps must be set in order.
The Morton bitmap index is created for each particle dataset upon its first ingestion into yt and saved to a sidecar file. At all future ingestions of the dataset into yt, the index will be loaded from the sidecar file. Indexes are managed through the Cython extension class ParticleBitmap (defined in yt/geometry/particle_oct_container.pyx), which is exposed to the user-visible yt API via the regions attribute of the ParticleIndex class (e.g. ds.index.regions). The ParticleBitmap class generates EWAH bitmaps via the BoolArrayCollection Cython extension object (defined in yt/utilities/lib/ewah_bool_wrap.pyx), which wraps the underlying EWAHBoolArray C++ library.
In the current implementation users can control the creation of the bitmask index via the index_order and index_filename keyword arguments accepted by SPHDataset instances. These keyword arguments replace the deprecated n_ref, over_refine_factor and index_ptype keyword arguments. The index_order is a two-element tuple corresponding to the maximum Morton order for the coarse and refined index. Using a tuple for the index_order instead of two keyword arguments is not only more terse, but it will allow us to produce bitmask indexes in the future with multiple refined indices while maintaining the same public API. Currently the default index_order is (7, 5). If a user specifies index_order as an integer, the integer is taken as the order of the coarse index and the order of the refined index is set to 1, producing a trivial refined index. For example:
import yt

ds = yt.load('snapshot_033/snap_033.0.hdf5',
             index_order=(5, 3), index_filename='my_index')
ds.index
Running this script will produce the following output:
yt : [INFO ] 2017-02-14 11:50:20,815 Allocating for 4.194e+06 particles
Initializing coarse index at order 5: 100%|██████| 12/12 [00:00<00:00, 14.60it/s]
Initializing refined index at order 3: 100%|█████| 12/12 [00:01<00:00, 8.80it/s]
And produce a file named my_index in the same folder as snapshot_033/snap_033.0.hdf5. The second and all later times the script is run, we only need to load the index from disk, so it produces the following output:
yt : [INFO ] 2017-02-14 11:56:07,977 Allocating for 4.194e+06 particles
Loading particle index: 100%|███████████████████| 12/12 [00:00<00:00, 636.33it/s]
Note that there are 12 iterations for each loop. Each of these iterations corresponds to a single IO chunk. If a file has fewer than 262144 particles, the entire file is used as an IO chunk. If a file has more than 262144 particles, the file is logically split into several subfiles, each containing up to 262144 particles. Currently the chunk size of 262144 particles is hard-coded for all SPH frontends.
Changes to the Selector API¶
The Morton bitmaps needed for individual data objects are constructed using the existing low-level Cython selection API. To determine whether a given Morton index is "contained" in the geometric primitive defined by the selector we make use of the select_bbox selection API call, since each index corresponds to a single cell in an octree. If the selector fully encloses the bounding box for the cell defined by a given Morton index, the existing select_bbox function is sufficient. However, given that the goal of the Morton bitmap index is to reduce the number of files we need to read from for a given selection operation, more care must be taken near the "edges" of a selector. For this reason, we have added a new function to the selector API, select_bbox_edge. This function is identical to select_bbox in the case when a bounding box is fully contained inside of the geometric primitive associated with a selector, simply returning 1 in these cases. However, if the bounding box is only partially contained in the geometric primitive, select_bbox_edge returns 2, indicating partial overlap. This is used in the bitmap index code to indicate that the coarse Morton index does not have sufficient resolution in this region, triggering the generation of refined Morton indices in this region. These smaller bounding boxes will have a higher probability of being either fully contained or fully excluded from a data object, decreasing the probability of a file collision. The select_bbox_edge function has been implemented for all selectors and, if this YTEP is accepted, will be a required part of the API for new selectors in the future.
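To illustrate the intended return-value semantics, here is a pure-Python sketch of an axis-aligned region check; the real implementation lives in the Cython selector classes, and the argument names here are assumptions:

def select_bbox_edge(left_edge, right_edge, sel_left, sel_right):
    """Return 0 for no overlap, 1 if the bbox is fully inside, 2 for partial overlap."""
    for le, re, sl, sr in zip(left_edge, right_edge, sel_left, sel_right):
        if re <= sl or le >= sr:
            return 0          # no overlap along this axis
    if all(le >= sl and re <= sr
           for le, re, sl, sr in zip(left_edge, right_edge, sel_left, sel_right)):
        return 1              # fully enclosed by the selector
    return 2                  # partial overlap: trigger the refined index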
In addition to the above change, a more minor change was necessary to the portion of the selector API used to count and select particles contained in a given selector. Currently, all particles are assumed to be pointlike, which will lead to incorrect selections for particles that actually have finite volumes, like SPH particles. To account for this, the signatures of the count_points and select_points functions were changed so that instead of accepting only a single scalar radius for all particles, they can accept an array of possibly variable radii as well. If non-zero radii are passed in, particle selection operates via the select_sphere method instead of the select_point method that is currently used. Since some selectors did not yet have implementations of select_sphere, we have added new implementations where necessary.
Testing¶
Testing is provided both for the low-level routines controlling access to the bitmap and for the higher-level routines that control bitmap generation. Low-level tests are located in yt.utilities.lib.tests.test_geometry_utils. These include tests of the routines for generating Morton indices from Cartesian coordinates, extracting single-bit coordinates, and locating neighboring Morton indices at the coarse or refined index level, both with and without periodic boundary conditions.
Higher level tests are located in yt.geometry.tests.test_particle_octree and include the framework to create test datasets with arbitrary domain decomposition schemes across a specified number of files. Tests are included for creating bitmap indices for datasets with no collisions and all collisions that check the number of coarse and refined cells against known answers. In addition we also provided tests for saving/loading bitmaps and identifying input files for rectangular selections on known domain decomposition schemes.
Removing the Global Octree Mesh¶
Currently, all I/O operations are mediated via the global octree index. Particles are read in from the output file as needed based on their position in the octree. With the arrival of the compressed bitmap index scheme described above, we no longer need to use the global octree to manage I/O chunking. Making the global octree redundant in this way raises the question about whether the octree is really needed at all.
Currently yt makes a distinction between particle fields and mesh fields. All SPH-smoothed fields (e.g. ('gas', 'density')) are smoothed onto the global octree mesh. To make a concrete example, let's try loading an SPH zoom-in simulation of a galaxy and ask for the ('gas', 'density') field:
import yt
ds = yt.load('GadgetDiskGalaxy/snapshot_200.hdf5')
ad = ds.all_data()
density = ad['gas', 'density']
print(density.shape)
print(ds.particle_type_counts)
Running this script on the latest development version of yt at time of writing (abf5a8eff1b2) produces the following output:
(5661944,)
{'PartType0': 4334546,
'PartType1': 4786616,
'PartType2': 2333848,
'PartType3': 0,
'PartType4': 450921,
'PartType5': 1149}
On my laptop, this script also takes about 116 seconds to run, with 105 s spent performing the SPH smoothing operation onto the global octree. Note also how the number of leaf octs in the octree (5661944) does not match the number of SPH particles (PartType0). This discrepancy is a common source of initial confusion for users of SPH codes when they first try to use yt to analyze their data.
We can ask ourselves whether it makes sense to always smooth data onto the global octree. It makes intuitive sense for users of AMR codes for yt to return data defined on a volume-filling mesh, since the volume filling mesh is the “real” data. However, for SPH data, the global octree mesh is not representative of the “native” data. By making the return value of most yt operations for SPH fields be defined on the octree mesh, yt is not being “true” to the data and also makes it harder than it needs to be to access the particle data as such.
In this YTEP, we propose changing the data object API for SPH data by ensuring that all SPH smoothed fields return data defined at the locations of SPH particles. This means that rather than relying on smoothing data onto the global octree, we will instead always return data defined at the particle locations. This means that running the script included above would produce the following output:
(4334546,)
{'PartType0': 4334546,
'PartType1': 4786616,
'PartType2': 2333848,
'PartType3': 0,
'PartType4': 450921,
'PartType5': 1149}
And that the ('gas', 'density') field would merely be an alias to the ('PartType0', 'Density') field available on-disk. Since we no longer need to smooth data onto the in-memory global octree, this substantially reduces the memory needed to work with SPH data while simultaneously substantially improving performance. Just as an example, in the version of yt that implements this YTEP, the script at the top of this section requires only 3.3 seconds to run.
The details of how this backward incompatible change to the yt user experience for SPH data will be implemented are given below. This includes all design decisions that have been made in the prototype version of yt that implements this YTEP. In addition, there are still several design decisions about how to implement this YTEP that have not yet been decided on. For more details about these issues, see the "Open Questions" section at the bottom of this document.
Identifying the SPH Particle¶
All of the proposals in this YTEP require that there be special handling for fields that correspond to the SPH particle type. Currently yt does not have a way of identifying whether a given particle type in a particle dataset is an SPH particle. To ameliorate this, we propose adding a new private attribute of SPHDataset instances, _sph_ptype. This attribute should resolve to the string name of the SPH particle type for the given output type. For example, for Gadget HDF5 data, the _sph_ptype is 'PartType0'. Having this attribute available makes it much easier to write code that does special handling for SPH data.
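As a hedged usage sketch (the attribute is private and only proposed here, so its exact spelling may change; the dataset is the one used in the example above):

import yt

ds = yt.load('GadgetDiskGalaxy/snapshot_200.hdf5')
# For Gadget HDF5 data this is expected to resolve to 'PartType0'.
print(ds._sph_ptype)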
SPH Fields¶
Here we discuss changes to the yt field system for SPH particle data that will enable removing the global octree mesh.
Local Fields¶
Currently yt assumes that fields with a 'gas' field type are defined on a volume-filling mesh. This YTEP proposes relaxing that assumption for SPH data so that 'gas' fields correspond to particle fields. Since we would like to reuse the existing field definitions in yt as much as possible, we need to explore how to adjust the field system to allow reuse of existing fields when the field data might represent local particle data, SPH smoothed quantities, or mesh fields, depending on the type of data being loaded.
As a reminder, sampling_type is a newly introduced keyword argument that can be passed to the initializer for yt DerivedField objects and that will be released publicly as part of yt 3.4. It replaces the particle_type keyword argument, allowing more flexibility to define new types of fields that are sampled in novel ways without needing to expose additional keyword arguments like particle_type. Currently, the default value of sampling_type is 'cell', preserving the old default behavior (e.g. particle_type=False).
We propose changing the default value of the sampling_type used for yt derived fields from 'cell' to a new value: 'local'. Derived fields with sampling_type='local' are fully local functions of other derived fields (which themselves do not have to be fully local). It turns out that nearly all of the fields that are currently defined inside yt with sampling_type='cell' are actually fully local, and the field functions they encode can be readily reused with particle data. In the version of yt that implements this YTEP, all fully local derived fields defined inside yt have had their field definitions altered such that sampling_type='local'.
With this accomplished, making all fully local derived fields work simply requires setting up SPH particle fields with aliases to yt "universal" field names. To make that concrete, this means that a Gadget HDF5 output needs an alias from ('PartType0', 'Density') to ('gas', 'density'). With this alias defined, all fully local derived fields that depend only on ('gas', 'density') will automatically work. In addition, any particle derived fields defined for PartType0 with field names that begin with 'particle_' will be aliased to 'gas' fields without the 'particle_' prefix. For example, the ('PartType0', 'particle_angular_momentum_x') field is aliased to ('gas', 'angular_momentum_x'). This means that any 'gas' derived fields that depend on ('gas', 'angular_momentum_x') being defined will function as expected. In other words, we use the existing system of particle fields to bootstrap the needed "input" fields for the bulk of the 'gas' derived fields. The aliasing described here is implemented in the setup_smoothed_fields member function of the FieldInfoContainer class.
One side effect of this approach is that there are some "odd" 'gas' derived fields (particularly if one is coming from an AMR code). For example, ('gas', 'position') is defined as an alias to ('PartType0', 'particle_position'). It may not be a good idea in the end to alias all particle fields for the SPH particle type to 'gas' fields, and it may be necessary to add a blacklist of fields that should not be aliased, or that should be aliased with explicit particle field names (e.g. maybe it would be most helpful to define ('gas', 'particle_position')).
Non-local Fields¶
Unfortunately, not all fields are fully local. We would optimally like to support fields that require some sort of difference operation, in particular physically meaningful fields like the gas vorticity or divergence. Currently these fields are not supported for particle data (since ghost zones have not yet been implemented for octrees), so if this effort makes it easier to add support for these fields, that will be a substantial improvement.
It turns out that within the SPH formalism there is a straightforward way to compute fields that depend on spatial derivatives. These formulae are used internally in SPH codes to estimate various terms in the equations of fluid dynamics. Thankfully, we can make use of these formulae for visualization and analysis purposes. There is a very nice paper by Dan Price [PricePaper] that works through this formalism, from which we can derive several formulae for partial derivatives and vector derivatives. For some quantity A that is a function of position, the partial derivative of A with respect to x at the position of particle i can be evaluated via:

    dA/dx |_i = (1/phi_i) * sum_j (m_j/rho_j) * phi_j * (A_j - A_i) * dW_ij/dx

Here m_j and rho_j are the mass of and gas density associated with the j'th particle, phi is an arbitrary function of position (common choices are phi = 1 and phi = rho), and W_ij is the SPH smoothing kernel at the position of particle i. The derivative inside the sum in the above expression is evaluated at the position of particle i.
Similarly for the gradient, divergence, and curl:

    grad A |_i = (1/phi_i) * sum_j (m_j/rho_j) * phi_j * (A_j - A_i) * grad_i W_ij
    div A  |_i = (1/phi_i) * sum_j (m_j/rho_j) * phi_j * (A_j - A_i) . grad_i W_ij
    curl A |_i = (1/phi_i) * sum_j (m_j/rho_j) * phi_j * (A_j - A_i) x grad_i W_ij

These symmetrized formulae (i.e. they all include a term that looks like (A_j - A_i)) have the advantage that the derivative of a constant field is zero by construction.
To actually use these formulae, we will need to calculate, on a particle-by-particle basis, the list of nearest neighbors for each particle and then evaluate the formulae at the locations of each particle. This has not yet been implemented in the version of yt that implements this YTEP, but we expect it to be straightforward using the existing functionality in yt to generate nearest neighbor lists.
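A schematic (and unoptimized) Python version of the symmetrized gradient above is sketched below, taking phi = 1 and assuming a precomputed neighbor list and a kernel-gradient callable; none of these names correspond to existing yt functions:

import numpy as np

def sph_gradient(A, pos, mass, rho, neighbors, grad_W):
    """Evaluate grad A at each particle via the symmetrized SPH estimate (phi = 1)."""
    grad = np.zeros((len(A), 3))
    for i, nbrs in enumerate(neighbors):
        for j in nbrs:
            # (m_j / rho_j) * (A_j - A_i) * grad_i W_ij
            grad[i] += mass[j]/rho[j] * (A[j] - A[i]) * grad_W(pos[i], pos[j])
    return grad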
Non-local fields that do not depend on an explicit derivative operation (e.g. ('gas', 'averaged_density')) will not be implemented for SPH data.
[PricePaper] http://adsabs.harvard.edu/abs/2012JCoPh.231..759P
Data Selection for SPH Fields¶
Currently data selection for particle fields models all particles, including SPH particles, as infinitesimal points. This means that 2D data objects do not select particles without exact floating point intersection between the data object and the particle.
This YTEP proposes modifying the selection semantics for SPH particles. Instead of modeling SPH particles as infinitesimal points, we will select SPH particles if the smoothing volume intersects with the data container. This means that particles with positions outside of 3D data containers will be selected, since if the smoothing volume overlaps, these particles still contribute to estimates of fluid quantities inside of the data object. See Testing for more discussion of the testing strategy used to validate the yt data object and data selection system for SPH particles. To allow for use cases where it is convenient to treat SPH particles as infinitesimal, we will add the ability to dynamically turn on and off this behavior with a configuration option.
We have implemented and added unit tests for all of the following data objects:
- Point
- Slice
- Off-axis Slice
- Region
- Disk
- Ray
In addition, we have implemented the following data object features that depend on hooks in the C selector API:
- Chained selection (e.g. reg = ds.region(..., data_source=sphere))
- Boolean negation
- Boolean addition
- Boolean AND
- Boolean XOR
- ds.intersection
- ds.union
Visualization of Slices and Projection¶
Currently slices and projections of SPH data are generated by slicing data that has been SPH smoothed onto the global octree. If there is no more global octree, an alternative strategy for generating pixelized representations of SPH data needs to be implemented. This YTEP proposes replacing the pixelizer operations for slices, off-axis slices, and axis-aligned projections to make use of an SPH-centric pixelization operation. For inspiration, we look to SPLASH, an open-source SPH visualization tool written in Fortran. The algorithms used in SPLASH are detailed in the SPLASH method paper [SPLASHPaper]. We note that we have not consulted the SPLASH source code in support of this implementation.
The key to the pixelization algorithm used in SPLASH is to compute the SPH smoothing operation via the “scatter” approach. Rather than looping over pixels in the image, determining which particles contribute to the SPH smoothing operation at the location of that pixel, and then compute a field value using the SPH smoothing formula, we instead loop over particles, finding the set of pixels whose smoothing volumes overlap with the pixel location and deposit a contribution for that particle to all of the pixels the smoothing volume overlaps. As we loop over all of the particles that contribute to the image, we fill in the image by summing the contributions of each particle. This approach is attractive because it does not require any sort of nearest-neighbor operation and is also trivially parallelizable using e.g. OpenMP threads.
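A schematic pure-Python version of the scatter deposition for a slice is sketched below; the real code is the Cython function pixelize_sph_kernel_slice discussed next, and the argument names and normalization here are simplified assumptions:

import numpy as np

def scatter_deposit_slice(image, px, py, hsml, quantity, kernel, bounds):
    # image: (ny, nx) buffer; bounds: (x0, x1, y0, y1) of the image plane
    ny, nx = image.shape
    x0, x1, y0, y1 = bounds
    dx, dy = (x1 - x0)/nx, (y1 - y0)/ny
    for x, y, h, q in zip(px, py, hsml, quantity):
        # Pixels whose centers can fall inside this particle's smoothing circle.
        i0, i1 = max(int((x - h - x0)/dx), 0), min(int((x + h - x0)/dx) + 1, nx)
        j0, j1 = max(int((y - h - y0)/dy), 0), min(int((y + h - y0)/dy) + 1, ny)
        for i in range(i0, i1):
            for j in range(j0, j1):
                cx, cy = x0 + (i + 0.5)*dx, y0 + (j + 0.5)*dy
                u = np.hypot(cx - x, cy - y)/h
                if u < 1.0:
                    image[j, i] += q*kernel(u)   # accumulate this particle's contribution
    return image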
For slices we estimate the contributions of a particle to a single pixel using the standard SPH smoothing formula. For projections we make use of a projected version of the smoothing formula, taking advantage of the spherical symmetry of the problem. The smoothing operation is implemented in two Cython functions, pixelize_sph_kernel_slice and pixelize_sph_kernel_projection, which are defined in yt.utilities.lib.pixelization_routines.
To make the above discussion a bit more concrete, consider the following script:
import yt
ds = yt.load('snapshot_033/snap_033.0.hdf5')
plot = yt.SlicePlot(ds, 2, ('gas', 'density'))
plot.set_zlim(('gas', 'density'), 1e-32, 1e-27)
plot.save()
plot = yt.ProjectionPlot(ds, 2, ('gas', 'density'))
plot.set_zlim(('gas', 'density'), 8e-6, 8e-3)
plot.save()
Running the latest development version of yt at time of writing (25651334863b) requires 43 seconds and produces the following images:
Slice: [image]
Projection: [image]
Running the same script on the version of yt that implements this YTEP requires 20 seconds and produces the following images:
Slice: [image]
Projection: [image]
Note also that the performance improvement here becomes more stark for larger datasets as well as for zoom-in simulations which have deeper octrees.
The images produced using the octree are quite "blocky", since the resolution of the image in any given location is limited by the octree. This could be ameliorated somewhat using over_refine_factor, but that comes with higher memory and runtime costs for smoothing onto the octree. In general the images produced by the new pixelizers are truer to the actual structure of the data. Rather than generating an image from a sampled representation of the real data, it is our opinion that it makes more sense to instead sample directly from the particle data.
[SPLASHPaper] http://adsabs.harvard.edu/doi/10.1071/AS07022
Deposition operations¶
Regular Grids¶
While we do want to make it easier to access particle-centric data, we need to make sure it is still possible to locally deposit and SPH smooth data onto grids. Not only is that a useful operation for users of SPH codes, but it is also functionality that yt currently provides, so we need to ensure that currently supported operations on covering_grid and arbitrary_grid data objects continue to work and produce sensible results. We will add tests to verify that this is the case.
Octrees¶
We should not abandon the ability to smooth SPH data and deposit particle data onto a volume-filling octree. Because users currently rely on these data for their own analyses, we need to provide a migration path so that users can reproduce prior work done with yt using the global octree.
We propose adding a new data object to yt that represents an octree with a given bounding box (which need not overlap with the domain bounding box) and maximum refinement level. One can think of this as something of an adaptive arbitrary_grid data object. Initially we will only allow refinement in terms of particle quantities (e.g. particle mass or particle count per octree leaf node), but it should be possible to add support for data defined on octree or patch AMR meshes eventually.
We still need to decide on an appropriate API for this. Ideally we would be able to reuse some of the existing code for the global octree.
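Purely as an illustration of the kind of API we might converge on (nothing here is settled; the method name and keyword arguments below are hypothetical):
import yt

ds = yt.load('snapshot_033/snap_033.0.hdf5')
# hypothetical adaptive octree data object; names and defaults are placeholders
octree = ds.octree(
    left_edge=ds.domain_left_edge,
    right_edge=ds.domain_right_edge,
    n_ref=64,  # maximum number of particles per leaf before refining
)
density = octree['gas', 'density']  # one value per octree leaf node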
Testing¶
The testing strategy for this work follows two basic approaches so far. First, we make sure that all derived fields associated with a number of real-world SPH datasets from http://yt-project.org/data can be calculated without generating any errors. This ensures both that the derived field system is functioning and that the I/O routines in the various SPH frontends are functioning correctly. These tests are present in yt.fields.tests.test_sph_fields.
In addition, we have added support in the stream frontend for loading SPH data. This allows us to create fake in-memory SPH datasets that we can construct in a way that makes them easier to reason about for testing than a real-world SPH dataset. The primary route for generating these datasets is a new function in the yt.testing namespace, fake_sph_orientation_ds. This function has the following very straightforward definition:
import numpy as np


def fake_sph_orientation_ds():
    """Returns an in-memory SPH dataset useful for testing

    This dataset should have one particle at the origin, one more particle
    along the x axis, two along y, and three along z. All particles will
    have non-overlapping smoothing regions with a radius of 0.25, masses of 1,
    densities of 1, and zero velocity.
    """
    from yt import load_particles

    npart = 7

    # one particle at the origin, one particle along x-axis, two along y,
    # three along z
    data = {
        'particle_position_x': (
            np.array([0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]), 'cm'),
        'particle_position_y': (
            np.array([0.0, 0.0, 1.0, 2.0, 0.0, 0.0, 0.0]), 'cm'),
        'particle_position_z': (
            np.array([0.0, 0.0, 0.0, 0.0, 1.0, 2.0, 3.0]), 'cm'),
        'particle_mass': (np.ones(npart), 'g'),
        'particle_velocity_x': (np.zeros(npart), 'cm/s'),
        'particle_velocity_y': (np.zeros(npart), 'cm/s'),
        'particle_velocity_z': (np.zeros(npart), 'cm/s'),
        'smoothing_length': (0.25*np.ones(npart), 'cm'),
        'density': (np.ones(npart), 'g/cm**3'),
    }

    bbox = np.array([[-4, 4], [-4, 4], [-4, 4]])

    return load_particles(data=data, length_unit=1.0, bbox=bbox)
This example also demonstrates how load_particles can be used to load SPH data written in formats that aren’t yet supported by a frontend. This testing dataset has one particle at the origin, another particle along the x axis, two more along the y axis, and three along z. All particles have the same smoothing length, such that the smoothing volumes of the particles in the dataset do not overlap. This means that we can construct various data objects and reason about which particles we should be selecting given the geometry of the particles in the dataset and the boundaries of the data object. In addition, we take care to make sure that the boundaries of the data objects do not necessarily directly overlap with the position of a particle near the boundary. This ensures that particles are selected when their smoothing volume overlaps with a data object, not necessarily based on the particle positions. These tests are present in yt.data_objects.tests.test_sph_data_objects. Currently all of the data objects supported by yt are explicitly tested here. As an example, here is the test that verifies the slice data object is working correctly:
from yt.testing import assert_equal, fake_sph_orientation_ds

# The number of particles along each slice axis at that coordinate
SLICE_ANSWERS = {
    ('x', 0): 6,
    ('x', 0.5): 0,
    ('x', 1): 1,
    ('y', 0): 5,
    ('y', 1): 1,
    ('y', 2): 1,
    ('z', 0): 4,
    ('z', 1): 1,
    ('z', 2): 1,
    ('z', 3): 1,
}


def test_slice():
    ds = fake_sph_orientation_ds()
    for (ax, coord), answer in SLICE_ANSWERS.items():
        # test that we can still select particles even if we offset the slice
        # within each particle's smoothing volume
        for i in range(-1, 2):
            sl = ds.slice(ax, coord + i*0.1)
            assert_equal(sl['gas', 'density'].shape[0], answer)
Open Questions¶
There are a number of design decisions that still need to be made if this YTEP is going to be fully implemented and accepted. Comments and suggestions on these points are very welcome.
The Projection Data Object¶
Currently the projection data object is completely broken for particle data for all frontends:
In [1]: import yt
In [2]: ds = yt.load('IsolatedGalaxy/galaxy0030/galaxy0030')
yt : [INFO ] 2017-03-01 09:54:22,491 Parameters: current_time = 0.0060000200028298
yt : [INFO ] 2017-03-01 09:54:22,491 Parameters: domain_dimensions = [32 32 32]
yt : [INFO ] 2017-03-01 09:54:22,492 Parameters: domain_left_edge = [ 0. 0. 0.]
yt : [INFO ] 2017-03-01 09:54:22,492 Parameters: domain_right_edge = [ 1. 1. 1.]
yt : [INFO ] 2017-03-01 09:54:22,492 Parameters: cosmological_simulation = 0.0
In [3]: proj = ds.proj(('gas', 'density'), 0)
Parsing Hierarchy : 100%|████████████████████| 173/173 [00:00<00:00, 3535.74it/s]
yt : [INFO ] 2017-03-01 09:54:27,650 Gathering a field list (this may take a moment.)
yt : [INFO ] 2017-03-01 09:54:29,653 Projection completed
In [4]: proj['all', 'particle_mass']
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-4-1a26d598985b> in <module>()
----> 1 proj['all', 'particle_mass']
/Users/goldbaum/Documents/yt-hg/yt/data_objects/data_containers.py in __getitem__(self, key)
281 return self.field_data[f]
282 else:
--> 283 self.get_data(f)
284 # fi.units is the unit expression string. We depend on the registry
285 # hanging off the dataset to define this unit object.
/Users/goldbaum/Documents/yt-hg/yt/data_objects/construction_data_containers.py in get_data(self, fields)
339 self._initialize_projected_units(fields, chunk)
340 _units_initialized = True
--> 341 self._handle_chunk(chunk, fields, tree)
342 # Note that this will briefly double RAM usage
343 if self.method == "mip":
/Users/goldbaum/Documents/yt-hg/yt/data_objects/construction_data_containers.py in _handle_chunk(self, chunk, fields, tree)
440 v = np.empty((chunk.ires.size, len(fields)), dtype="float64")
441 for i, field in enumerate(fields):
--> 442 d = chunk[field] * dl
443 v[:,i] = d
444 if self.weight_field is not None:
/Users/goldbaum/Documents/yt-hg/yt/units/yt_array.py in __mul__(self, right_object)
955 """
956 ro = sanitize_units_mul(self, right_object)
--> 957 return super(YTArray, self).__mul__(ro)
958
959 def __rmul__(self, left_object):
ValueError: operands could not be broadcast together with shapes (3976,) (37432,)
By making SPH data return most data as particle fields we are making this problem much more visible. We should decide what a sensible return value for the projection operation on a particle field should be. Note that in practice we do not need to solve this issue to create a ProjectionPlot, since we can short-circuit the data selection operation when we create the pixelized projection.
Some ideas:
- Write a new ParticleQuadTree class that adaptively deposits particle data onto a quadtree mesh.
- Since most people really want a pixelized representation of the particle data (e.g. via a FixedResolutionBuffer), we could simply make it so the projection data object returns a regular resolution image.
Cut Regions¶
We have not yet implemented the Cut Region data object since it’s not clear how it should work for particle data. Similar to the projection data object, the cut region data object does not currently work when it is defined in terms of a threshold on a particle field. It may be possible to define an internal particle filter to implement cuts on Lagrangian data.
Volume Rendering¶
Currently we don’t support volume rendering particle data. In principle, writing at least a basic volume renderer specifically for particle data is a straightforward project. Making it scale well to arbitrarily large datasets would be a bigger undertaking, but we think we should attempt to write a volume rendering engine that accepts particle data. Ideally this will plug into the existing volume rendering infrastructure at the same level as the AMRKDTree. Attempting this will also make it easier to add support for volume rendering octree AMR data with an octree volume rendering engine.
The API and design for this component have not yet been settled.
Community engagement¶
We are early enough in the process of implementing this YTEP that the major design points have not yet calcified. To encourage wide adoption of these changes with a minimum amount of breakage for existing users, we will reach out to existing users of yt who regularly work with SPH data to ensure that their existing code continues to work as much as possible. If there are widespread breakages, this will inform where we should focus on building backward compatibility shims and helpers.
Before this work can be merged into the main development branch we will need to update the documentation for particle data. This should include coverage of the following topics:
- Loading in-memory SPH data using load_particles
- A high-level description of the Morton bitmap index system and how to tune it for performance by adjusting the maximum coarse and refined Morton index level.
- A high-level description of the data selection semantics for particle and SPH data.
- A transition guide explaining all of the changes and how to port scripts, particularly those making direct use of the global octree via a deposition operation.
Finally, we will attempt to publicize this document as much as possible to attract feedback from current and prospective users at an early stage.
yt 4.0?¶
Since these are substantial backward incompatible changes, we think the next version of yt released after this work is merged should be yt 4.0. This opens the possibility of adding other backward incompatible changes as well as removing deprecated features. We should be sure to signal to our users that there will only be major changes for those who work with SPH data - support for AMR and unstructured mesh data should remain the same.
YTEP-0033: Dropping Python2 Support¶
Abstract¶
Created: November 28, 2017 Author: Nathan Goldbaum Revision: Matt Turk
We will be dropping support for Python 2.7 in all “major” releases after 3.5, which will include both 3.6 and 4.0.
Status¶
Completed
Project Management Links¶
This has come up in the past on the mailing list:
http://lists.spacepope.org/pipermail/yt-dev-spacepope.org/2017-April/006851.html
This document formalizes much of the discussion in that mailing list thread.
Detailed Description¶
Background¶
Python 2.7, the last major release in the Python 2 series, is currently in maintenance mode, only receiving critical bugfixes. Even this minimal level of support will end when Python 2.7 goes end of life in 2020. At that point, Python 2.7 will no longer receive even security fixes. It is thus incumbent on us not to encourage our users, by maintaining support for Python 2.7 indefinitely, to use an insecure, unsupported version of Python that may not even build on future versions of operating systems.
Beyond that, for purely selfish reasons, we as a community have taken on costs to support both Python 2 and Python 3 in the same codebase. All contributions must happen in the dialect of Python that simultaneously functions under both major versions. We make heavy use of six, requiring contributors to learn about six to make even trivial contributions that happen to touch on Python 2/3 incompatibilities. We must run tests on both versions to avoid version-specific bugs and regressions. We are unable to use new features that have been added in Python 3 since these features are unavailable in Python 2.
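As a small illustration of the kind of compatibility code this entails, compare a string type check written with six against the plain Python 3 spelling that becomes possible once Python 2.7 support is dropped:
# Python 2/3 compatible spelling using six
from six import string_types

def is_name(obj):
    return isinstance(obj, string_types)

# Python 3-only spelling
def is_name_py3(obj):
    return isinstance(obj, str)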
For this reason, many projects in the scientific python ecosystem have either already ended support or proposed ending support for Python 2.7 in the near future. These efforts are summarized in the Python 3 statement, including a timeline that depicts the end of Python 2.7 support for projects that have signed the statement.
Proposed Solution¶
This YTEP proposes that yt 3.5 be the last major release of yt that will support Python 2.7. Subsequent releases, including 3.6 and 4.0 (but not including 3.5.1, 3.5.2, etc.) will not support Python 2.7. In the past, we had suggested that yt 3.6 would not exist, or would not drop Python 2.7 support, but the delay in the yt 4.0 release has changed this timeline.
The development bandwidth for yt is somewhat restricted. In the past, we have had to ensure installation and utilization on many reasonably slow-moving environments (in particular HPC), but in the last five years or so the change in environments on HPC resources has largely eliminated this as a concern; we do not need to ensure extremely long-term compatibility with Python 2.7. Some projects are treating their last release that supports Python 2.7 as a “long term support” (LTS) release. Given our development resources, we generally do not have the bandwidth available to support both yt 4.0 and the python2-enabled yt 3.5 at the same level.
We are thus proposing that we make “best effort” compatibility with python2 in the 3.6 series (i.e., “soft” backcompat) and we will continue to accept patches for 3.5, although we anticipate these will taper with time, and we will cease to accept them one calendar year after support for python2.7 has ended. yt 4.0 will not have any backcompat built in.
To summarize:
- yt 3.5: This will be the last release series that explicitly supports python 2. No further development is anticipated once yt 3.6 has been released.
- yt 3.6: No hard python2 compatibility requirements; best-effort compatibility may not be retained over the development lifetime. Testing of python2 will be dropped prior to release.
- yt 4.0: No back-compat with python 2.
These timelines are roughly comparable with projects that we directly depend on. In particular:
- SymPy: dropping support in 2019
- NumPy: dropping support on January 1, 2019
- Matplotlib: dropping support in 2018
In principle we don’t need to be constrained by the timelines of projects we depend on, since we need only use the last version that supports Python 2.7, but that would add yet another maintenance burden, since we would not be able to use the latest and greatest version of a downstream project for development.
Since we will be backporting bugfixes to the yt 3.6 release branch while yt 4.0 is under development, we will keep the uses of six and other Python 2/3 compatibility code in an effort to make backporting easier. Once yt 4.0 is released we will be able to begin removing compatibility shims and __future__ imports. We will be able to immediately drop the tests for Python 2.7 support on the development branch.
Community Outreach¶
I am planning to make both yt-dev and yt-users aware of this proposal, with the hope of soliciting feedback from interested users who do not normally participate in development discussions. Additionally, if this proposal is accepted, we will announce the future timeline for releases, along with the planned timeline for dropping Python 2.7 support, in all release announcements. This should give more than a year of warning for community members to either say their piece about Python 2.7 support or to make preparations to migrate user scripts to use Python 3.
Backwards Compatibility¶
Users who must remain on Python 2.7 for whatever reason will no longer be able to run the latest version of yt. Existing versions will continue to function, however. Users who currently run yt under python 3 (this is the default option in the install script since April) will see no change.
Alternatives¶
- Maintain support for Python 2.7 indefinitely.
- Maintain support for Python 2.7 for the yt 4.x series, dropping support in yt 5.0 or in a subsequent version. This would likely mean maintaining support beyond 2020.
YTEP-0034: yt FITS Image Standard¶
Abstract¶
Created: September 9, 2018 Author: John ZuHone
This YTEP defines the standard for FITSImageData objects written from slices, projections, and covering grids, so that these objects can be read back into yt as datasets by the FITS frontend using the dataset class YTFITSDataset.
Status¶
In Progress.
Project Management Links¶
The relevant code has been written and is in a PR which is under review:
Detailed Description¶
From its beginning, yt has been capable of producing projected images of
simulations as representations of the quantities which would be observed on
the sky plane. This enables comparisons of simulation predictions to real
data. Nearly all observational data in astronomy is in the Flexible Image
Transport System (FITS) format.
Therefore, yt also has a method to write slices, projections, and regularly gridded data derived from datasets to FITS files using the FITSImageData class. Documentation on how to use this class and its subclasses may be found here.
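For reference, a typical usage pattern looks roughly like the following sketch (the dataset path is one of the sample datasets from http://yt-project.org/data, and FITSSlice is one of the FITSImageData subclasses; the import path reflects recent yt versions):
import yt
from yt.visualization.fits_image import FITSSlice

ds = yt.load("GasSloshing/sloshing_nomag2_hdf5_plt_cnt_0150")
fits_slc = FITSSlice(ds, "z", ("gas", "density"), width=(500.0, "kpc"))
fits_slc.writeto("slice_density.fits")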
FITS files consist of a list of “header data units” (hereafter HDUs), each of which contains data in image form (an n-dimensional array) or table form, associated with a header which typically contains information about the coordinate system and other metadata. The header provides an opportunity to standardize FITS files written by yt so that the data is as self-describing as possible with respect to coordinates, units, and fields.
yt has also long had the capability to read FITS image files as datasets using the FITS frontend and the FITSDataset class. In general, each FITS HDU is classified as a yt field, and the metadata in its header is used to define the properties of the dataset. However, there is no universal standard for FITS files, and therefore in most cases a number of properties of these datasets may be undefined (e.g., units, coordinates, etc.).
At the very least, the FITS files produced by yt should be standardized. This requires ensuring that both the FITSImageData class and a new subclass of FITSDataset, YTFITSDataset, adhere to this standard. This YTEP serves to define the “yt FITS standard” for FITS images produced from 2/3D datasets using the FITSImageData class in yt and its subclasses.
Overall File Structure¶
yt FITS images shall be a single FITS file with one or more image HDUs, each one containing a 2- or 3-dimensional array which will correspond to a “field” in standard yt parlance. The dimensions of each array shall be consistent with the others for the entire file. The first or “primary” HDU will also contain an image array.
Header Information¶
Each FITS header associated with a field in the FITS file shall be entirely self-describing with respect to the properties of the field, the current time of the dataset, the coordinate system of the dataset, and the unit system of the entire file. The file shall be distinguished as a yt FITS file by setting the WCSNAME property equal to yt.
Field Properties¶
The name of each field shall be stored in both the EXTNAME and BTYPE properties of the header. The units of the field shall be stored in the BUNIT property of the header.
Unit Information¶
Each header will be entirely self-describing as to the unit system of the dataset, including the dimensions of length, time, mass, velocity, temperature, and magnetic field units. In most cases, these units will be derived from the underlying dataset used by FITSImageData to produce the file, but it will be possible for the user to specify their own code unit definitions in the instantiation of the FITSImageData object. For images created by subclasses of FITSImageData such as FITSSlice, FITSProjection, etc., the length_unit of the file will be given by the units specified in the width keyword argument or be chosen automatically based on the size of the image.
Coordinate System¶
Each header shall have the coordinate system of the dataset stored in the WCS keywords. These will set up a linear coordinate system with an origin in pixel space at the center of the image. The relevant keywords are:
- CTYPE[123]: The coordinate system type, which shall be "LINEAR" for all axes.
- CUNIT[123]: The units of the coordinate axes, in dimensions of length and in the specified length_unit of the FITS image. The units should be the same for all axes.
- CRPIX[123]: The reference pixel coordinate of the image, which shall always be the center of the image: 0.5*(n[xyz]+1), where n[xyz] is the number of pixels in each dimension.
- CDELT[123]: The width of each pixel along each axis in the units specified by CUNIT[123].
- CRVAL[123]: The reference physical coordinate of the image, which corresponds to the same location as CRPIX[123].
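To make the relationship between these keywords concrete, here is a small sketch of how they could be computed for a 2D image; the numbers are chosen to match the example header shown below:
# 512 x 512 image spanning 500 kpc on a side, centered on the origin
nx, ny = 512, 512
width = 500.0                # total extent along each axis, in kpc
center = (0.0, 0.0)          # physical coordinates of the image center, in kpc

CRPIX1, CRPIX2 = 0.5 * (nx + 1), 0.5 * (ny + 1)   # 256.5, 256.5
CDELT1, CDELT2 = width / nx, width / ny           # 0.9765625 kpc per pixel
CRVAL1, CRVAL2 = center                           # 0.0, 0.0
CUNIT1 = CUNIT2 = "kpc"
CTYPE1 = CTYPE2 = "LINEAR"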
Other Metadata¶
Each header shall have the current time of the dataset stored in the header keyword "TIME", where the units shall be the code time units of the dataset.
Future iterations of this standard may allow for other optional metadata such as the redshift, etc., which can be checked upon instantiation of the YTFITSDataset instance.
Example Header¶
The following is an example header for a density field created from a slice of a FLASH dataset, as printed out by the AstroPy command-line tool fitsheader:
SIMPLE = T / conforms to FITS standard
BITPIX = -64 / array data type
NAXIS = 2 / number of array dimensions
NAXIS1 = 512
NAXIS2 = 512
EXTEND = T
EXTNAME = 'DENSITY ' / extension name
BTYPE = 'density '
BUNIT = 'g/cm**3 '
LUNIT = 1.0 / [kpc]
TUNIT = 1.0 / [s]
MUNIT = 1.0 / [g]
VUNIT = 1.0 / [cm/s]
BFUNIT = 3.544907701811032 / [gauss]
TIME = 1.18350909938232E+17
WCSAXES = 2
CRPIX1 = 256.5
CRPIX2 = 256.5
CDELT1 = 0.9765625
CDELT2 = 0.9765625
CUNIT1 = 'kpc '
CUNIT2 = 'kpc '
CTYPE1 = 'LINEAR '
CTYPE2 = 'LINEAR '
CRVAL1 = 0.0
CRVAL2 = 0.0
LATPOLE = 90.0
WCSNAME = 'yt '
Many of the items in the header are automatically filled, but the rest are defined by yt.
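A file following this standard can then be loaded back into yt like any other dataset. A minimal sketch (the file name follows the FITSSlice example above; the exact field type reported may differ):
import yt

ds = yt.load("slice_density.fits")   # recognized as a yt-written file via WCSNAME = 'yt'
print(ds.field_list)                 # e.g. [('fits', 'density')]
ad = ds.all_data()
print(ad["density"].units)           # g/cm**3, restored from the BUNIT keyword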
Backwards Compatibility¶
FITS files generated using FITSImageData prior to these changes will still be readable, and may be recognizable as YTFITSDataset objects if they have the "WCSNAME" keyword set to "yt" in the FITS header. If not, they will still be readable as generic FITSDataset objects as before. Since previous FITS files made with FITSImageData did not include unit information in their headers, units for these files will fall back to default cgs values if they are recognized as YTFITSDataset instances by yt.
Alternatives¶
Leaving things the way they are, which means that we will have support for writing FITSImageData objects to FITS files which can be read in and mostly understood by yt with the currently available metadata, but unit support will be incomplete and some corner cases may be missed.
YTEP-0036: Converting from Nose to Pytest¶
Abstract¶
Created: September 30, 2019 Author: Jared Coughlin
This YTEP proposes two major changes to yt’s answer testing:
- Switch from nose to pytest
- Store array hashes rather than full arrays
Status¶
In progress
Detailed Description¶
Background¶
Currently, testing in yt makes use of the nose framework. Issues with nose include:
- Being in a self-described maintenance mode for the last several years
- Lacking modularity
- Using lots of boilerplate code
The first proposal of this YTEP is to switch yt’s testing framework from nose to pytest. Pytest offers many of the same benefits as nose:
- Automatic test discovery
- Ability to selectively run tests
- A large number of external plugins
- Fine-tuning via configuration files
- Compatibility with python’s standard library testing framework unittest.
In addition to these benefits, pytest is also:
- Actively maintained and developed
- Compatible with nose
- Equipped with a fully-featured fixture system
In fact, this fixture system is arguably the best reason to use pytest. Benefits include:
- Greatly increases modularity
- Reduces boilerplate
- Makes writing tests easier
- Allows for smarter resource use when collecting tests
The second proposal of this YTEP is a change to the way answer test results are saved. Currently, many answer tests in yt generate large arrays of data that need to be saved in order to facilitate comparison with future test runs. The size of these arrays:
- Slows down answer comparison
- Necessitates that they be stored separately from the main yt code base, which serves to complicate answer comparison
- Requires synchronizing pull-request merging with two repositories instead of one, which slows down development itself and creates technical debt
In an effort to combat these issues, this YTEP proposes saving the hashes of the answer arrays. Since these hashes are short, simple strings, they:
- Can be stored in human-readable yaml files
- Take up much less disk space
- Facilitate more efficient comparisons
- Can be packaged with the code itself
Converting to Pytest¶
There are two steps for converting from nose to pytest:
- Rewrite each nose test class as a function
- Rewrite each answer test to employ pytest fixtures
Rewrite Nose Test Classes As Functions¶
Currently, the abstract answer tests are implemented as classes that use yield statements (e.g., FieldValuesTest). Pytest does not support yield tests due to conflicts with the fixture system.
As such, each nose test class’ run() method is now a function named after the test class (e.g., FieldValuesTest becomes field_values_test and contains the code from the former’s run() method). These abstract answer test functions are now contained in the following file: yt/utilities/answer_testing/answer_tests.py.
Rewrite Each Answer Test To Employ Pytest Fixtures¶
The answer tests (e.g., those contained in yt/frontends/enzo/tests/test_outputs.py) are now, where applicable, parameterized using the @pytest.mark.parametrize decorator, which removes the need to loop over various parameter combinations and makes logging the results of individual parameter combinations easier.
Conftest Files¶
These are configuration files that are used by pytest in order to define custom fixtures for processes such as setup, teardown, parameterizing, and using temporary directories and files.
The primary conftest.py file resides in the root of the yt repository. It:
- Defines the command-line options
- Defines the fixtures used across each of the answer tests
Testing the Tests¶
The pytest ecosystem contains a swath of useful tools that can be employed in order to aid the testing process. Several such tools are listed here:
- pytest-randomly is a plugin for causing the tests to be collected in a random order each time they are run. This helps guard against nefarious bugs that may result from calling tests in a specific order
- pytest-cov is a plugin that generates test coverage reports. It also plays well with other useful pytest plugins such as pytest-xdist, which allows for tests to be run in parallel
- coverage-badge is a plugin for generating a test coverage badge that can be added to the README file
Doctest Integration¶
In addition to being able to run both the unit and answer tests for yt, pytest can also run doctests embedded in the documentation as well as in source code docstrings via the --doctest-glob="*.rst" command-line option, which is described here, and the doctest_namespace fixture, which is described at https://docs.pytest.org/en/stable/doctest.html#doctest-namespace-fixture.
Saving Answer Test Results As Hashes¶
This is handled by the hashing fixture defined in the central conftest.py file. This fixture is then applied to every test that needs to save a result. The fixture applies the md5 function from the hashlib library to get the hex digest of the arrays produced by the tests. Once completed, the hashes and test parameters are written to yaml files with the following format:
calling_function_name:
  test_name: hash
  test_parameter1: value
  test_parameter2: value
This produces human-readable text files that can be easily packaged with the main code base, which facilitates easier test management.
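For illustration, the hashing step amounts to something like the following; the array here is a stand-in for a real test result:
import hashlib

import numpy as np

result = np.linspace(0.0, 1.0, 1000)               # array produced by an answer test
digest = hashlib.md5(result.tobytes()).hexdigest()
print(digest)  # a short 32-character hex string, which is what gets stored and compared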
Running the Tests¶
The unit and answer tests are mutually exclusive, being run with two separate commands.
Similar to how the unit tests were run with nose, they can be run with
$ pytest
from the root yt repository directory.
To run a specific test or group of tests, one can either pass in the path to the module containing the tests
$ pytest /path/to/tests/test_module.py
or use pytest’s -k
flag, which enables test selection by name. For example, to run all of the tests contained in a single class, one would do:
$ pytest -k "TestClass"
To run only a specific method within a given class, one would do:
$ pytest -k "TestClass and test_method"
See this link for more on pytest’s selection capabilities and options.
To run the answer tests, the first step is to tell yt where the test data is located:
$ yt config set yt test_data_dir /path/to/yt-data
To run the answer tests for a specific frontend (e.g., tipsy)
$ pytest --with-answer-testing --answer-store -k "TestTipsy"
By default, the answers are stored in the location specified in pytest_answer.ini. This can be overridden from the command line:
$ pytest --with-answer-testing --answer-store --local-dir=/path/to/save -k "TestTipsy"
Should one desire to save the actual arrays produced by the answer tests, this can be done with the following command line options
$ pytest --with-answer-testing --answer-raw-arrays --raw-answer-store
If the --raw-answer-store option is left off, then pytest will attempt to load in a set of previously generated arrays and perform a comparison to those generated during the current run.
Writing New Tests¶
Within the file containing the answer tests, one should define a new class that is marked by pytest as being an answer test. If the tests need to save data, they should utilize the hashing fixture. Additionally, if possible, the arguments passed to the test function should be parameterized. For example:
import pytest

dsList = [some_dataset, other_dataset]
param1List = [value1, value2]
param2List = [value1, value2]


@pytest.mark.answer_test
class TestNewFrontend:
    answer_file = None
    saved_hashes = None

    @pytest.mark.usefixtures("hashing")
    @pytest.mark.parametrize("ds", dsList, indirect=True)
    @pytest.mark.parametrize("param1", param1List, indirect=True)
    @pytest.mark.parametrize("param2", param2List, indirect=True)
    def test_method1(self, ds, param1, param2):
        test_result = some_answer_test(ds, param1, param2)
        self.hashes.update({"some_answer_test": test_result})
If desired, test parameterization can be handled in a conftest.py file that lives in the new frontend’s tests directory. See the pytest documentation for more.
Community¶
The primary method of reaching out to the community about these changes is through the yt-dev mailing list.
These solutions will be tested by making sure that all of the current answer tests produce results that match those currently produced by nose.
Backwards Compatibility¶
This YTEP breaks backward compatibility of testing because the tests will no longer be runnable with nose.
YTEP-0037: Code styling¶
Abstract¶
Created: May 18, 2020 Author: Clément Robert
This YTEP proposes the enforcement of code styling guidelines with auto-formatting tools.
Status¶
Completed
Detailed Description¶
Code styling guidelines are already presented in the project’s documentation, though enforcing them is not explicitly made part of the reviewing process.
We already use flake8 and integrate it into our CI to catch a subset of infractions to PEP 8. From flake8’s PyPI page:
Flake8 is a wrapper around these tools:
- PyFlakes
- pycodestyle
- Ned Batchelder’s McCabe script
From black’s documentation:
The rules for horizontal whitespace can be summarized as: do whatever makes pycodestyle happy. (…) The coding style used by Black can be viewed as a strict subset of PEP 8.
so it is expected that black plays nicely with flake8 by construction.
black applies an opinionated style and offers very few configuration options by design. Only the target line length can be changed. This makes line length a critical point, requiring discussion if this YTEP is approved.
Maximal line length¶
Note
After discussion, a maximum line length of 88, which is black’s default setting, was adopted.
The guidelines state that
Line widths should not be more than 80 characters
Despite this being respected in most of the code base, there remains a large number of outliers that would be time-consuming to go through by hand. Taking the example of the yt-4.0 branch at the time of writing, there are 2158 lines exceeding 80 characters (~1.5% of the whole code base), or, visually:
[image]
Note that when long strings are present, black will not attempt to split them to shorten the individual lines. This is most important in the case of docstrings, and I explore the tools available to validate them hereafter (see additional rules).
There is a range of possible values we might give preference to. Python’s standard library caps line length at 79, pandas does so at 88. By default, black will target 88, as its authors claim it reduces the total number of lines by some 10% (as compared to enforcing 80).
In first drafting the PR linked above, I chose a line length of 100, so as to minimize the amount of manual tweaking left to me after a black pass. I estimated that imposing a strict limit of 80 characters would leave 545 lines to be manually updated, while capping at 88 leaves a mere 135 (75% less work). As a reference, dask uses black’s default settings, and allows flexibility for docstrings up to 120 characters through flake8.
Sorting imports¶
PEP 8 recommends sorting import statements. Needless to say, the task is daunting and definitely not worthy of anyone’s time if we had to go back and apply those rules manually to the code base.
Luckily, isort is able to check for and auto-apply those rules, so it can easily be added to the CI linting process.
Additional rules & flake8 plugins¶
Since the oldest Python version supported (as of yt 4.0dev) is 3.6, we can start using f-strings instead of str.format() and % formatting. #2605 demonstrates how a transition can be performed using flynt.
–
flake8-bugbear is a flake8 plugin that goes beyond code style and detects some additional anti-patterns, most of which very likely correspond to design flaws in otherwise syntactically valid statements. For instance, it will catch mutable default values such as in
def spam(a={}, b=[]):
    # ...
which, in most contexts, should be rewritten as
def spam(a=None, b=None):
    if a is None:
        a = {}
    if b is None:
        b = []
This is a well known “gotcha”, as documented for instance here. In short, this plugin detects bugs that went under the radar up to now, so it’s probably worth adding it to our linting CI.
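To see why this is a bug magnet, note that the default list is created once and shared between calls (illustrative example):
def append_to(item, bucket=[]):
    bucket.append(item)
    return bucket

print(append_to(1))   # [1]
print(append_to(2))   # [1, 2] -- the list from the first call is reused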
–
Another plugin can be added to enforce docstring formatting (flake8-docstrings), and it has a straightforward configuration option to validate that docstrings are numpy-styled. However, there is currently a very large debt in errors caught by this tool, and no way to automatically solve them. It could still be added to our linting CI if we check for new errors only, such as
git diff upstream/master -u -- "*.py" | flake8 --diff
(snippet borrowed from pandas’ contributing guide)
Side effects¶
Although some default options in isort conflict with black’s opinionated standard, it can be configured so that the tools play nicely with each other. This is demonstrated in #2596, where both checks pass on Travis.
On another note, black only recognizes pyproject.toml as a configuration file (and is explicitly not planning to support other files such as setup.cfg).
An undesirable effect of using pyproject.toml solely as a configuration file for black is that pip will detect it and change its behaviour when it is present. The correct way to introduce this file is by specifying yt’s build requirements within it. A proof of concept for this is #2598, where CI builds run correctly across all tested Python versions (3.6, 3.7, 3.8).
A serious counter-argument to applying black is that it implies messing with git blame by making a single contributor the de facto last author of a large number of lines they have not even necessarily read. Most recent versions of git can be configured to ignore specific commits in git blame. However, black’s own README currently points out that GitHub’s UI for git-blame does not support this feature (yet?).
It should be noted that black does not have a parser for Cython files, but interestingly flake8 and isort do. Thus it is possible to add style checks for Cython extensions to the CI pipeline.
Additionally, black will not force line-length limits in docstrings. flake8 will still be able to catch violations there, but solving them requires manual tweaking. However, the number of existing docstrings going over 88 characters is fairly small (a few dozen), so this is by no means a blocking condition.
Outreach and transition¶
Enforcing these changes throughout future contributions can be done by
- updating the Developer Guide (done in part in #2592)
- offering a pre-commit hook configuration file to help contributors automate the linting stage locally (precommit_hook.yaml); such a configuration file is proposed in #2600
It is expected that transitioning to the “blackened” version of the code will add a bit of overhead in merging pre-existing PRs. Specifically, a simple git merge <pr-branch> master will almost certainly raise git conflicts. One possible solution to this is to sanitize the pr-branch (on the author's side) with:
pip install -r lint_requirements.txt
black yt/
isort .
git merge --strategy ours master
git push
I tested this strategy locally by simulating blackening at an arbitrary point in the past and merging the current state of the code base back in, producing a net zero diff with a direct blackening of the current state. In practice I advise caution, and sanitized code should be reviewed before merging. Another, arguably cleaner way to resolve conflicts is to rebase the branch onto master and solve conflicts along the way. This is the preferred method, though I would not recommend it to contributors who are not used to rebasing, since it is easy to make mistakes in the process.
The shorter the transition, the easier, so I think that most of the PRs could be merged in a very narrow time window (a day or two), provided the appropriate conditions. However, because we want to ensure that each step passes the tests, which typically takes at least an hour or two per step, I propose that prep steps be done separately, and that the big one (blackening) happen during a meeting.
Roadmap¶
To ensure cohesion in getting the features included in this PR into the codebase, we will have a dedicated maintainer/triage meeting. This YTEP’s PR, the yt Slack, and the yt-dev mailing list will include the meeting details for interested parties to attend. Some items require completion before the triage meeting, and some can be done afterwards; they have been categorized below.
Pre meeting¶
The following questions should be resolved
- settle on a maximal line length (final: 88 characters)
- decide where the unyt import statements land (for isort sorting): in the third-party section, or in a custom intermediate section between first and third parties? (final: third party)
Backwards Compatibility¶
Yes; these changes are purely stylistic and do not affect runtime behavior.
Alternatives¶
- Enforcing styling guidelines through peer review for each PR. Obviously this is a lot more work. Additionally, this methodology is prone to error and may cause delay in the PR approval process in case the authors disagree with the reviewers on the application of styling rules.
- Leaving code style decisions up to authors, and embracing the style diversity.
YTEP-0039: Rich Terminal User Interface¶
Abstract¶
Created: March 3, 2021 Author: Clément Robert
Use rich to prettify our TUI (Terminal User Interface), most notably logs and progress bars (as a replacement for tqdm).
Status¶
Withdrawn
Project Management Links¶
- rich logging #3106
- rich progress bars #3114
- upstream, add support for non-tqdm based progress bars to pooch (unreleased as of May 8, 2021)
Detailed Description¶
rich is a library to build colorful and styled terminal user interfaces.
Logging¶
In particular it offers a rich.logging.RichHandler class that can be used to replace standard logging.Handler instances, such as the one currently used by yt.
It supersedes our custom code to turn on colors in log entries and overall produces much prettier (as well as more useful) logs at a marginal maintenance cost, arguably cheaper than our existing facility.
Let’s illustrate our existing logger outputs and what rich turns them into, using the following script and a minimalist configuration:
import yt
yt.set_log_level(10)
yt.mylog.debug("2 + 2 = 5")
yt.mylog.info("Oh, people can come up with statistics to prove anything, Kent. 14% of people know that.")
yt.mylog.warning("Don't eat that yellow snow.")
yt.mylog.error("I am Bender, please insert girder.")
yt.mylog.critical("You have 24h left to live.")
data = {
    "Nonetype": None,
    "a number": 1.657,
    "a boolean": True,
    "a list": [1, 2, 3, "spam", "bacon"],
}
yt.mylog.info("Logging some data:\n %s", data)
[images: logger output with the legacy handler and with rich]
Note that rich adds a clickable path to the source file where each entry was emitted from. Only the filename + line number are displayed, but those are actually absolute links. Advanced terminal apps like iTerm support link integration to make the best out of rich logging.
rich is flexible, supports a handful of color systems, and adapts to the system it runs against, which makes it more robust than our existing on/off switch for colored logs.
Currently, colored logs are turned off by default and activated from yt’s config file as
[yt]
colored_logs = true
With rich, colored logs could be turned on by default at no cost, and with no risk of crashing a shell lacking color support. Because rich offers a lot of configuration options, we could choose to expose some of them in yt’s config file within a new, dedicated section, which I’m drafting here with proposed default values. This should be aligned with the current state of the documentation in #3106:
[logging]
# replaces yt.log_level
# logging level can be specified as a case-insensitive string
# and passed down to yt.utilities.logger.set_log_level
level = "INFO"
# replaces yt.colored_logs
use_color = false
# replaces yt.stdout_stream_logging as well as yt.suppress_stream_logging
# accepted values are "stderr", "stdout" and "none" (completely disable logging)
# this is case insensitive to avoid breakage if a user was to write e.g., "None"
stream = "stderr"
# this is passed to a logging.Formatter instance
format = "%(message)s"
# this is arguably a more sensible default than the legacy format (unspecified)
# where milliseconds are displayed.
# This default value mimics rich's, but exposing it makes it more obvious how it
# can be customized.
date_format = "[%m/%d/%Y %H:%M:%S]"
# the other option would be "legacy", see Backwards Compatibility section below
handler = "rich"
# the following options are silently ignored when `handler = "legacy"`
# width <= 0 leaves rich's default Console width (auto-adjusted if the window changes size)
# otherwise must be >0 and is the total size (in columns) of a log entry
width = -1
# path to a custom rich config file, either absolute or relative to the cwd
# this parameter should be a (non-empty) string when it's used.
custom_theme = ""
Note that it is pretty hard to come up with a satisfying and intuitive solution to interpret a relative path for the custom_theme option. It could be interpreted by humans as relative to any of the following:
- the global config file
- the local config file
- the current config file
- the current working directory
- the python script being run
- yt’s install dir (less likely)
For this reason, I am not convinced it’s worth supporting relative paths at all, or exposing this option, but I’m willingly leaving it in this state as the most experimental part of the (logging) project. Feedback will be collected to decide how it should or shouldn’t work according to early adopters, if any.
Progress bars and status¶
The rich.progress module offers progress bars that are arguably much cooler-looking than those of the leading competitor (and current yt dependency) tqdm. More importantly, they are also much more flexible in a multi-tasking context (threading). Notably, rich can display more than one progress bar at once without interrupting the logging stream. For a demo of this, run
python -m rich.progress
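For a self-contained flavor of the API, a minimal progress bar can be written as follows; the task itself is a placeholder:
import time

from rich.progress import track

for _ in track(range(100), description="Processing..."):
    time.sleep(0.01)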
Coexisting progress bars open the possibility of multi-tasking with long-running tasks in yt without sacrificing expressivity in logs and other outputs.
Note that rich also borrows so-called “spinners” from cli-spinners, which offer a nice alternative to progress bars to express ongoing progress, in particular for tasks where completion time may be difficult to estimate. Try them for yourself with
python -m rich.spinner
Known caveats
- Progress bars + Jupyter lab bug: https://github.com/willmcgugan/rich/issues/830
- Progress bars would be de facto heterogeneous with pooch (used for yt.load_sample) because it only knows tqdm. Replacing tqdm, or more realistically adding support for rich.progress or even arbitrary progress bar classes in pooch, clearly requires a change upstream and is not a high priority, but eventually this looks feasible.
yt CLI¶
A marginal side effect is that interactive command line applications could be written in simpler ways than with vanilla Python using rich.
For instance, let’s look at a snippet that was proposed for inclusion in our config migration script yt config migrate (see #3044):
prompt = "Perform the migration now [yn]? "
user_input = input(prompt).lower()
while user_input not in ("y", "yes", "n", "no"):
    print(f"Did not understand your input '{user_input}'. Please enter 'y' or 'n'.")
    user_input = input(prompt).lower()
if user_input in ("y", "yes"):
    migrate_config()
else:
    raise SystemExit("Migration not performed: exiting.")
This can be expressed much more efficiently using rich.prompt
from rich.prompt import Confirm

if not Confirm.ask("Perform the migration now ?"):
    raise SystemExit("Migration not performed: exiting.")
migrate_config()
At the time of writing however there is no clear spot where this functionality would shine in yt.
Testing¶
YTEP-0035 (pytest) is making progress and nearing its final implementation. pytest has builtin fixtures to capture logs (caplog) and standard outputs (capsys) and inspect them, which makes testing of the logging format much easier.
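As a sketch of what such a test could look like (the test body and log message are illustrative, not actual yt tests):
import logging

def test_info_message_is_logged(caplog):
    logger = logging.getLogger("yt")
    with caplog.at_level(logging.INFO, logger="yt"):
        logger.info("Gathering a field list (this may take a moment.)")
    assert "field list" in caplog.text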
I have started a branch to test and fix existing and new problems with the migration CLI yt config migrate in #3112, which relies on pytest.
Outreaching¶
- Release notes.
- config file migration/conversion facility: produce warnings when deprecated log-related parameters are found and offer to auto-convert them in place. In case a conflict is detected at runtime between old and new parameters, use the new ones, but advise the user to manually remove old ones (list them).
Backwards Compatibility¶
Downstream projects may rely on yt’s existing logging format. Some users may also simply prefer this style over what rich offers. Even if we make rich’s logging handler the default, we could offer an option to restore the “legacy” logging format in yt’s configuration file.
[yt.logging]
handler = "legacy"
use_color = false
Note that by construction, switching back to the legacy format would be an opt-in,
which should be ok as long as it is properly documented in release notes.
Keeping support for old-style progress bars would be relatively straightforward, but it would create friction on the side of dependency specifications: if we support both styles at any point, then we have no way to formally specify that tqdm OR rich is required but not both. Considering this, I suggest simply dropping tqdm and making rich a hard dependency immediately (#3114).
The functions yt.utilities.logging.colorize_logging and yt.utilities.logging.uncolorize_logging won’t be necessary anymore except if we want to maintain full backwards compatibility (legacy handler + color). What should be done with them is up for discussion, but here’s my personal opinion. They live in a pretty nested part of the yt namespace, but they may be used downstream since they are not explicitly (or implicitly) documented as private. I think it’s unlikely that anyone would want to use them at runtime instead of configuring yt, so this backwards compatibility breakage is likely acceptable. I also can’t think of a reasonable workflow for which users would care about pretty logs and wish to keep the legacy format at the same time. I propose to mark them as deprecated until the next release (following the acceptance of this YTEP), then remove them.
Cost¶
rich would be added to yt’s requirements.
Alternatives¶
Keep simple logs + tqdm.
YTEP-0040: a yt-baked colormap package¶
Abstract¶
Created: 20 April, 2021 (happy 40th birthday Matt !) Author: Clément Robert
I propose to extract native colormaps (“cmaps” hereafter) from yt into a separate, lightweight package.
Status¶
Completed
Project Management Links¶
- PRs following the release of Matplotlib 3.4
- GH issue with the initial discussion: #3165
- The new package’s home: https://github.com/yt-project/cmyt
Detailed Description¶
An undocumented behaviour change in Matplotlib 3.4.0 made naming collisions in cmaps fatal. Registering a new cmap with a previously registered name will not work any more, and causes the program to crash. Up to yt 3.6, we’ve been automatically registering cmaps from various sources, most notably:
- yt itself
- IDL
- cmocean
All of the cmaps in question are (or used to be) registered with bare names, i.e. without a dedicated prefix. This means that if any of our cmaps’ names is used in a future release of Matplotlib, importing yt will fail with a ValueError.
Good practice is in fact to register cmaps under a custom “namespace” (prefix) that is guaranteed to never collide with Matplotlib's native colormaps. This is what is currently done in cmocean itself (prefixing with "cmo.") as well as, for instance, cmasher (prefixing with "cmr.").
Since migrating away from the fragile current registration method requires a structural change anyway, I propose we seize the opportunity to make yt native cmaps easier to use from outside the framework (footnote: this is actually already doable, but it requires loading the entire package) and export them into a small, lightweight, easy to maintain and install package.
Such a package would be a hard dependency of yt itself.
In case this proposal is accepted, the exact prefix used is up for discussion. I can propose
1) cmy. (pros: perfectly fits the dominant convention; cons: doesn’t look like yt)
2) yt. (pros: dead simple, best at “brand” reinforcement; cons: doesn’t align with the dominant convention)
3) cmyt. (pros: has “yt” and “cm” in the name; cons: one char more than the dominant convention)
My personal preference goes to cmyt., and I’m going with it as the proposed package name in the following.
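For illustration, usage from outside yt could look like the following sketch, assuming the package ends up named cmyt and registers its prefixed cmaps with Matplotlib on import (the exact prefix is precisely what is being discussed above):
import matplotlib.pyplot as plt
import numpy as np

import cmyt  # noqa: F401  -- importing registers the "cmyt.*" colormaps

data = np.random.random((64, 64))
plt.imshow(data, cmap="cmyt.arbre")  # "arbre" is one of yt's native colormaps
plt.colorbar()
plt.savefig("cmyt_demo.png")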
NOTE: the fate of the IDL-originated colormaps currently registered and vendored by yt is not clear. They could be ported as yet another package (and maybe made a hard dep of yt too?), or be included in the first package (not my favourite option).
Backwards Compatibility¶
Forcing people to add prefixes everywhere is not (yet?) necessary, but the practice should be encouraged nevertheless. One way to bridge the gap between the current “malpractice” and a more stable usage in yt would be to perform re-registrations of cmyt’s cmaps without a prefix (similarly to what was previously done with cmocean, though in an error-safe way).
Maybe this could be done silently in yt 4.0 (to save our users as much of a migration burden as possible), then raise visible deprecation warnings starting from yt 4.1, and finally be removed entirely in yt 4.2 or yt 4.3.
Alternatives¶
- Perform structural changes in how yt registers its own cmaps (necessary) without making the result a new package (not necessary, but offers some additional benefits).
- Perform the necessary changes on yt side, export the cmaps into a separate package but keep them in the main package too (code duplication, not my cup of tea).
YTEP-1000: GitHub Migration¶
Abstract¶
Created: March 25, 2017 Author: Lots of folks
The primary source code and project management for yt should be moved from bitbucket to github, and as a result, from mercurial to git. This document outlines a timeline, conversion and import mechanism, and how to manage this transition.
Status¶
Completed
The yt steering committee has evaluated the broad strokes, and it was presented to the yt community at the beginning of March. There were no objections, and this document is to be iterated on to decide on the migration strategy and timeline.
Project Management Links¶
- Mailing list message describing the situation: http://lists.spacepope.org/pipermail/yt-dev-spacepope.org/2017-March/006792.html
Detailed Description¶
As discussed in the email to yt-dev, we should move from BitBucket to GitHub as a result of the pervasive network effect of GitHub and its impact on the community of yt developers.
The process of migration needs to be planned to minimize disruption to developer and user workflow. It will proceed in roughly three stages.
Stage 1¶
A clone is created on GitHub, and this is evaluated by the developers. The specifics of the conversion procedures (for example, any branch filtering, tag conversions, embedding of hashes and so on) will be examined. Items of particular interest are the changeset hashes in commit messages and issues. Any commit message filtering or file size reduction will be considered. This clone will not be final, and will not be open for pull requests, changes, etc. It will be a one-way and one-time sync.
Whatever methods are used to do this conversion will be written in this YTEP.
An example using fast-export can be found at http://github.com/yt-project/yt_fast_export. This includes git notes for each changeset hash, but they must be pulled specifically using git fetch origin "refs/notes/*:refs/notes/*".
Possible timeline: Starting April 10, 2017
To Do items during this stage:
- Create initial clone (this can be done manually)
- Test migration of issues to GitHub (1:1 mapping of numbers)
- Set up Jenkins jobs using git and GitHub plugins
- Rewrite yt update
- Update documentation
- Update slack bot
- Migrate supplemental repositories to GitHub
Stage 2¶
This is a brief stage, during which we wind down PRs from the Bitbucket repository. All PRs will be either accepted or declined. All declined PRs that are still “in progress” will need to be converted to GitHub pull requests, the process for which will be documented. The simplest mechanism will be through hg export and git import, which will squash patches.
Possible timeline: Starting May 1, 2017. This stage is contingent on the Stage 1 To-Do’s being completed, but should be roughly three weeks after Stage 1 is entered.
To Do items during this stage:
- Accept or decline all PRs on BitBucket. Those PRs that are not accepted by the conclusion of this stage will need to be moved to GitHub
- Switch infrastructure over to GitHub
- Open up PRs on GitHub and begin code review there
- Migrate users (ask for their GH handle) and their access levels
Stage 3¶
This is the final stage. At this point, the switch will be flipped, and no more PRs or code review will be accepted on BitBucket.
Possible timeline: Starting May 15, 2017. This stage is contingent on the Stage 2 To-Do’s being completed, but should be roughly two weeks after Stage 2 is entered.
To Do items during this stage:
- None that I can think of.
Once Stage 3 has been completed, the bitbucket repository will be marked read-only.
Progress and Notes¶
During the course of the stages, we will update this YTEP with notes on conversion processes, etc.
Backwards Compatibility¶
This will almost certainly not break internal APIs other than those identified in the “todo” section above, which are all related to project maintenance like updating and so on.
The developer workflow will break, but we are attempting to mitigate that through this YTEP.
Finally, the documentation (identified as needing to be updated) will be updated to reflect the new normal.
YTEP-3000: Let’s all start using yt 3.0!¶
Abstract¶
Created: October 30, 2013 Author: Matthew Turk
This is a YTEP suggesting we all start using yt 3.0 for development, and where the blockers to adoption are enumerated.
Status¶
Completed
Project Management Links¶
Essentially all of the project management links up to this point have concerned this effort.
Detailed Description¶
This YTEP outlines the items necessary to be implemented before yt 3.0 can be released, and before we can attempt to move development and day-to-day usage for developers to the 3.0 codebase.
There are essentially three categories of work items: release blockers, necessary features to migrate usage and development, and feature parity requirements.
Several developers have expressed that a major blocker is concluding work they have begun on yt 2.6 and the 2.x branch; this document is meant to supplement that, rather than replace it.
Release Blockers¶
These are components that need implementing before yt 3.0 can be released. This is not the same as reaching a “complete” implementation; the important work is to ensure that subsequent API breakages are minimal. We are tracking these on the yt-3.0 Trello Board.
- Merging unitrefactor; waiting only on documentation and rebranding merge at this time. (YTEP-0011 and YTEP-0017.)
- De-astroification of yt and renaming of generic objects, which has been mostly accomplished in the rebranding bookmark.
- Removing dict-like access to static output (YTEP-0018), not yet completed in the rebranding bookmark.
- A considerable amount of documentation, which is being worked on.
I do not believe there are any other blockers to yt 3.0.
Necessary Features¶
These are items that are necessary for developers to migrate from using and developing yt 2.6 to yt 3.0.
This is intentionally left mostly empty, as items from “feature parity” will be migrated up.
field_cuts (which is related to cut_region, which has been mostly implemented).
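For context, here is a brief illustration of the cut_region machinery already available in yt 3.0; the sample dataset and the temperature threshold are arbitrary choices, not part of this YTEP:

    import yt

    ds = yt.load("IsolatedGalaxy/galaxy0030/galaxy0030")
    ad = ds.all_data()
    # Keep only cells hotter than 1e6 K; field_cuts in yt 2.x served a
    # similar purpose.
    hot = ad.cut_region(["obj['temperature'] > 1e6"])
    print(hot["density"].max())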
Feature Parity¶
These are items that existed in yt 2.6 that do not exist in yt 3.0 yet.
- A handful of hierarchy attributes have not yet been implemented.
- A few frontends still need polishing during the port, including Chombo, Pluto, NMSU-ART, and GDF. These are small items but will need assistance from individual frontend maintainers.
- The sidecar storage has not been ported.
- Boolean regions have not been implemented. They can likely be implemented in the same manner as cut_region has been.
Backwards Compatibility¶
This does not add any new backwards-incompatible items; it is merely a call to action.
Alternatives¶
Call a mulligan, start over?
YTEP-9999: YTEP Template¶
To write a YTEP, copy this template to the next numerical number, add it to the repository, and issue a pull request. Discussion of the YTEP will occur either on the mailing list (for large-scale changes) or in the PR itself (small items, such as formatting).
This document has been patterned after the Matplotlib Enhancement Proposal Template (found here.)
Abstract¶
Created: November 25, 2012 Author: Your Name
This section should contain one or two sentences describing the proposed change. It should not contain detailed design information, but it can contain background information.
This should contain a date.
Status¶
Status should be one of the following:
- Proposed
- Completed
- In Progress
- Declined
YTEPs do not need to pass through every stage.
Project Management Links¶
Any external links to:
- Pull requests
- Related issues in the bug tracker
- Previous implementations
- Mailing list discussions or Google Docs
Detailed Description¶
Here is where you should write a detailed description of what the YTEP proposes. This needs to include:
- Background
- Nature of the problem
- Nature of the solution
- How will the solution be implemented?
- Brief outline of the code needed to implement this
- Code examples of using the solution, where appropriate
- How will the solution be tested?
- What are the potential stumbling points?
- What is the proposed method for reaching out to the community about this?
Backwards Compatibility¶
This section should outline backwards compatibility issues. In particular, it should focus on those issues that will appear to the main scripting API: will this break old scripts? Will it break internal uses of the API?
Alternatives¶
This section is optional.
What other means are there of accomplishing the goals of this YTEP, and why is this the best option?