# Internals

## Data Consistency
When creating a `LASDataset` or writing a tabular point cloud out to a file, we need to make sure that the header information we provide is consistent with the point cloud and any VLRs and user-defined bytes. Internally, this is done using the function `make_consistent_header!`, which compares a `LasHeader` against some LAS data and makes sure the header has the appropriate data offsets, flags and other metadata. This will, for example, make sure that the numbers of points, VLRs and EVLRs are consistent with the data we've provided, so your `LASDataset` is guaranteed to be consistent.
#### `LASDatasets.make_consistent_header!` — Function

```julia
make_consistent_header!(
    header::LasHeader,
    pointcloud::AbstractVector{<:NamedTuple},
    vlrs::Vector{<:LasVariableLengthRecord},
    evlrs::Vector{<:LasVariableLengthRecord},
    user_defined_bytes::Vector{UInt8}
)
```

Ensure that a LAS `header` is consistent with given `pointcloud` data coupled with sets of `vlrs`, `evlrs` and `user_defined_bytes`.
#### `LASDatasets.make_consistent_header` — Function

```julia
make_consistent_header(
    pointcloud::AbstractVector{<:NamedTuple},
    point_format::Type{TPoint<:LasPoint},
    vlrs::Vector{<:LasVariableLengthRecord},
    evlrs::Vector{<:LasVariableLengthRecord},
    user_defined_bytes::Vector{UInt8},
    scale::Union{Real, AxisInfo, StaticArraysCore.SVector{3, <:Real}}
) -> LasHeader
```

Construct a LAS header that is consistent with given `pointcloud` data in a specific LAS `point_format`, coupled with sets of `vlrs`, `evlrs` and `user_defined_bytes`.
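As an illustration of what this consistency pass does, here is a simplified sketch in plain Julia. The `ToyHeader` type, its fields and `toy_make_consistent!` are invented for this example only; the real `LasHeader` carries many more fields (data offsets, bounding box, flags, ...):

```julia
# Toy stand-in for LasHeader, invented for illustration only.
mutable struct ToyHeader
    record_count::UInt64
    n_vlrs::UInt32
    n_evlrs::UInt32
end

# Mimics the spirit of make_consistent_header!: mutate the header so its
# counts agree with the data that will actually be written.
function toy_make_consistent!(header::ToyHeader, points::AbstractVector,
                              vlrs::AbstractVector, evlrs::AbstractVector)
    header.record_count = length(points)
    header.n_vlrs = length(vlrs)
    header.n_evlrs = length(evlrs)
    return header
end

header = ToyHeader(0, 0, 0)
points = [(x = 1.0, y = 2.0, z = 3.0), (x = 4.0, y = 5.0, z = 6.0)]
toy_make_consistent!(header, points, [], [])
# header.record_count is now 2, matching the data
```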
## Third Party Packages

This package relies heavily upon PackedReadWrite.jl to speed up the reading and writing of `LasPoint`s and some of our VLRs. We also use BufferedStreams.jl to drastically reduce I/O overhead.
## Point Records

As outlined in the User Fields section, in order to offer full support of "extra point data" in our LAS files, we treat LAS point records as having a point, extra user fields and a set of undocumented bytes. Internally, however, this is broken up into four separate types, each implementing the abstract type `LasRecord`. These correspond to each combination of a point with/without user fields and with/without undocumented bytes.
#### `LASDatasets.LasRecord` — Type

```julia
abstract type LasRecord
```

An abstract form of a LAS record. These are points with some additional information possibly included.
#### `LASDatasets.PointRecord` — Type

```julia
struct PointRecord{TPoint} <: LASDatasets.LasRecord
```

A LAS record that only has a point.

- `point::Any`: The LAS point stored in this record
#### `LASDatasets.ExtendedPointRecord` — Type

```julia
struct ExtendedPointRecord{TPoint, Names, Types} <: LASDatasets.LasRecord
```

A LAS record that has a LAS point and extra user-defined point fields. Note that these must be documented as `ExtraBytes` VLRs in the LAS file.

- `point::Any`: The LAS point stored in this record
- `user_fields::LASDatasets.UserFields`: Extra user fields associated with this point
#### `LASDatasets.UndocPointRecord` — Type

```julia
struct UndocPointRecord{TPoint, N} <: LASDatasets.LasRecord
```

A LAS record that has a point as well as additional undocumented bytes (i.e. bytes that don't have an associated `ExtraBytes` VLR).

- `point::Any`: The LAS point stored in this record
- `undoc_bytes::StaticArraysCore.SVector{N, UInt8} where N`: Array of extra bytes after the point that haven't been documented in the VLRs
#### `LASDatasets.FullRecord` — Type

```julia
struct FullRecord{TPoint, Names, Types, N} <: LASDatasets.LasRecord
```

A LAS record that has a LAS point, extra user-defined fields and additional undocumented extra bytes.

- `point::Any`: The LAS point stored in this record
- `user_fields::LASDatasets.UserFields`: Extra user fields associated with this point
- `undoc_bytes::StaticArraysCore.SVector{N, UInt8} where N`: Array of extra bytes after the point that haven't been documented in the VLRs
This was done largely to increase the performance of reading point records: a single type for all point records would require extra conditional checks to determine whether certain extra fields need to be read from the file, which congests the read process. Instead, we use Julia's multiple dispatch and define `Base.read` and `Base.write` methods for each record type, avoiding these checks and also decreasing type-inference time when reading records into a vector.
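The dispatch-based approach can be sketched with simplified stand-in types (`ToyPointRecord` and `ToyUndocRecord` are invented for this example; the real record types carry more type parameters and use fixed-size static byte arrays):

```julia
abstract type ToyRecord end

struct ToyPointRecord{TPoint} <: ToyRecord
    point::TPoint
end

struct ToyUndocRecord{TPoint} <: ToyRecord
    point::TPoint
    undoc_bytes::Vector{UInt8}
end

# One write method per record type: no per-record branching on whether
# extra bytes are present, since dispatch has already decided that.
Base.write(io::IO, r::ToyPointRecord) = write(io, r.point)
Base.write(io::IO, r::ToyUndocRecord) = write(io, r.point) + write(io, r.undoc_bytes)

io = IOBuffer()
write(io, ToyPointRecord(UInt32(7)))               # writes 4 bytes
write(io, ToyUndocRecord(UInt32(8), UInt8[1, 2]))  # writes 4 + 2 bytes
nbytes = length(take!(io))  # 10 bytes total
```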
## Writing Optimisations

Typically, Julia is slower at performing many consecutive small writes to an IO channel than one much larger write. For this reason, when writing point records to a LAS file, we first construct a vector of bytes from the records and then write that whole vector to the file. This is possible since for each point record we know:

- how many bytes the point format occupies,
- how many user fields the record has and their sizes in bytes, and
- how many undocumented bytes there are.
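Because all three sizes are known up front, the whole byte buffer can be preallocated and written in one go. The following is a pure-Julia illustration of the idea (the sizes and marker bytes are invented; this is not the package's actual code):

```julia
# Each toy record is: a fixed-size point (8 bytes here), 4 bytes of user
# fields, and 2 undocumented bytes -> 14 bytes per record, known up front.
point_size, user_size, undoc_size = 8, 4, 2
record_size = point_size + user_size + undoc_size

n_records = 1000
buffer = Vector{UInt8}(undef, n_records * record_size)

for i in 1:n_records
    offset = (i - 1) * record_size
    # In the real package each field is copied into its slot in the buffer;
    # here we just fill the three regions with marker bytes.
    buffer[offset + 1 : offset + point_size] .= 0x01
    buffer[offset + point_size + 1 : offset + point_size + user_size] .= 0x02
    buffer[offset + point_size + user_size + 1 : offset + record_size] .= 0x03
end

io = IOBuffer()
write(io, buffer)  # one large write instead of 3 × n_records small ones
total = length(take!(io))
```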
This is done using `LASDatasets.get_record_bytes`, which takes a collection of LAS records and writes each LAS field, user field and extra-bytes collection into its correct location in the final byte vector. In order to do this, we need to frequently access each field in a (potentially huge) list of records, which is normally slow. We instead first convert our records into a `StructVector` using StructArrays.jl, which vastly increases the speed at which we can access these fields and broadcast over them.
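The benefit of the `StructVector` layout can be illustrated without the package itself: instead of a vector of record structs (array-of-structs), each field is stored as its own contiguous column (struct-of-arrays), so extracting one field across all records is a cheap column access. The `ToyRec` type and the hand-built named tuple below are stand-ins for what StructArrays.jl constructs automatically:

```julia
struct ToyRec
    point::UInt32
    undoc::UInt8
end

recs = [ToyRec(UInt32(i), UInt8(i)) for i in 1:5]

# Array-of-structs access: one getfield call per record.
points_aos = [r.point for r in recs]

# Struct-of-arrays layout (what StructVector builds automatically):
# each field lives in its own contiguous vector.
soa = (point = UInt32.(1:5), undoc = UInt8.(1:5))
points_soa = soa.point  # a single column access, no per-record work

same = points_aos == points_soa  # both layouts expose the same data
```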
#### `LASDatasets.get_record_bytes` — Function

```julia
get_record_bytes(
    records::StructArrays.StructArray{TRecord<:LASDatasets.LasRecord, 1},
    vlrs::Vector{LasVariableLengthRecord}
) -> Any
```

Construct an array of bytes that correctly encodes the information stored in a set of LAS `records` according to the spec.
## Automatic Support for User Fields

In order for the system to automatically handle a user supplying their own custom fields in a point cloud table, we make some checks on field types and have processes in place that ensure each column has an `ExtraBytes` VLR associated with it.
Firstly, the LAS 1.4 spec officially supports the following data types directly: `UInt8`, `Int8`, `UInt16`, `Int16`, `UInt32`, `Int32`, `UInt64`, `Int64`, `Float32` and `Float64`. This means that every `ExtraBytes` VLR must have a data type among these (note that vectors are not directly supported). LASDatasets.jl supports static vectors (static sizing is essential) as user fields as well by internally separating out the vector components and adding an `ExtraBytes` VLR for each component, following the naming convention in the spec. That is, for a user field `col` with `N` entries, the individual component names documented in the VLRs are "col [0]", "col [1]", ..., "col [N - 1]".
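The naming convention can be sketched as a small helper (`component_names` is a hypothetical function name for illustration; the real package does this internally when it splits a static vector column):

```julia
# For a vector-valued user field named `name` with `n` components, produce
# the per-component names documented in the ExtraBytes VLRs.
component_names(name::AbstractString, n::Integer) = ["$name [$i]" for i in 0:n-1]

component_names("col", 3)  # ["col [0]", "col [1]", "col [2]"]
```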
When a user passes a custom field to the system, it will first check that the data type for this field is either one of the above types or an `SVector` of one. If it is a vector, it will construct a list of the component field names as above. Then, it will extract all existing `ExtraBytes` VLRs, check whether any of them have matching names, and, if so, update them so their data type matches the newly supplied type. If these are new fields, a new `ExtraBytes` VLR will be added per field name. Finally, the header is updated to reflect the new number of VLRs, the new data offsets and the new point record lengths.
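The scalar part of this type check can be sketched as follows (a simplified illustration: the real package also accepts `SVector`s of these types and handles the VLR and header bookkeeping described above; `is_supported_user_field` is an invented name):

```julia
# Data types the LAS 1.4 spec supports directly for ExtraBytes fields.
const SUPPORTED_EXTRA_BYTES_TYPES = (
    UInt8, Int8, UInt16, Int16, UInt32, Int32, UInt64, Int64, Float32, Float64,
)

# Check whether a scalar column element type can be written directly as an
# ExtraBytes VLR data type.
is_supported_user_field(::Type{T}) where {T} = T in SUPPORTED_EXTRA_BYTES_TYPES

is_supported_user_field(Float32)  # true
is_supported_user_field(String)   # false
```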