Datasets

Developing algorithms requires data in the right quantities and containing the right features for the problem being solved. Collecting significant quantities of data and applying accurate labels suitable for machine intelligence is typically a labor-intensive activity. SRC has embraced the use of synthetically generated data alongside collected data to support the development of advanced machine intelligence algorithms that operate in highly congested environments.

The following datasets were generated using SRC's data simulation tools: Synthetic Data Incubator for Cognitive and Tactical ELINT (SynDICaTE) and "Data Management and Bootstrapping Processing For Machine Learning and Classification Development" (U.S. Patent No. 11,651,290). They demonstrate some of the software's capabilities for generating different environments containing configurable quantities and types of signals along with background interference. These settings are easily tailored, giving the algorithm engineer or data scientist a great deal of flexibility in the data they synthesize.

Three datasets have been created, each containing a number of file pairs. Each file pair consists of one in-phase/quadrature (I/Q) data file written in binary format and one label file written in JSON format. The I/Q data file is formatted as interleaved 32-bit float I and Q samples. Each label file consists of an array of rectangular bounding boxes, one per pulse, along with a set of labels describing the pulse and the emitter it originated from. Example code for parsing and plotting the data is also available for download.
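As an illustration of the file layout described above, here is a minimal Python sketch for reading one file pair. It is not SRC's parsing utility. The function names `load_iq` and `load_labels` are hypothetical, the little-endian byte order is an assumption (the page only states 32-bit float), and the label file is returned as generic parsed JSON because its field names are not documented here.

```python
import json

import numpy as np


def load_iq(path):
    """Read interleaved 32-bit float I/Q samples into a complex64 array.

    Assumption: samples are stored little-endian as I0, Q0, I1, Q1, ...
    """
    raw = np.fromfile(path, dtype="<f4")
    if raw.size % 2 != 0:
        raise ValueError("odd number of floats; expected complete I/Q pairs")
    iq = np.empty(raw.size // 2, dtype=np.complex64)
    iq.real = raw[0::2]  # even positions hold the in-phase component
    iq.imag = raw[1::2]  # odd positions hold the quadrature component
    return iq


def load_labels(path):
    """Parse a JSON label file and return its contents unchanged.

    The schema (bounding-box and emitter field names) is not assumed here.
    """
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)
```

A call such as `iq = load_iq("example.bin")` followed by `labels = load_labels("example.json")` would load one pair; both file names are placeholders, not names from the actual download.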

Download

Description

Parameters

Parsing Utilities

Python script and Jupyter notebook that provides an example of how to parse, process, and inspect the data.

Sample File

A sample of the data you will find in this collection. It is a small 5 MB chunk of data pulled from a file in Dataset 2. It includes the raw I/Q data, along with two spectrogram images (one with label bounding boxes and one without).

Dataset 1

30 unique file pairs that include signal pulses with instantaneous bandwidths of 5 and 20 MHz and a non-rotating antenna.

Sample Bandwidth: 40 MHz
Duration: 1.5 s
Number of signals per file: 2-3
Signal types: Radar

Dataset 2

30 unique files that include signal pulses with instantaneous bandwidths of 5 and 20 MHz, unmodulated pulses, and a stationary (non-rotating) antenna.

Sample Bandwidth: 50 MHz
Duration: 1.5 s
Number of signals per file: 3-5
Signal types: Radar

Dataset 3

15 unique files that include pulses with instantaneous bandwidths of 5 and 20 MHz and unmodulated pulses, with both fixed and rotating antennas. These files also contain communication signals, along with moderate levels of background interference. The individual signals also span a wider range of SNR values.

Sample Bandwidth: 100 MHz
Duration: 0.5 s
Number of signals per file: 10-15
Signal types: Radar, Comms
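
The parameters listed for each dataset allow a rough estimate of the size of one I/Q file. The sketch below assumes that the complex sample rate equals the stated sample bandwidth, which this page does not confirm, and that each complex sample occupies 8 bytes (two 32-bit floats). The helper name `estimated_iq_bytes` is hypothetical, and actual file sizes may differ.

```python
BYTES_PER_COMPLEX_SAMPLE = 8  # one 32-bit float for I plus one for Q


def estimated_iq_bytes(sample_rate_hz, duration_s):
    """Estimate I/Q file size in bytes.

    Assumption: the complex sample rate equals the listed sample bandwidth.
    """
    n_samples = int(round(sample_rate_hz * duration_s))
    return n_samples * BYTES_PER_COMPLEX_SAMPLE


# (sample bandwidth in Hz, duration in seconds) as listed for each dataset
datasets = {
    "Dataset 1": (40e6, 1.5),
    "Dataset 2": (50e6, 1.5),
    "Dataset 3": (100e6, 0.5),
}

for name, (rate_hz, duration_s) in datasets.items():
    size_mb = estimated_iq_bytes(rate_hz, duration_s) / 1e6
    print(f"{name}: about {size_mb:.0f} MB per I/Q file")
```

Under these assumptions the estimates are about 480 MB, 600 MB, and 400 MB per file for Datasets 1, 2, and 3 respectively.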
