cheonan dataset construction & train baseline (deprecated 02-20)

Dataset Construction

homography estimation
1. Vanishing Point based auto-calibration
  1. if the vp is tilted because of low cctv quality, the coordinate error of far objects diverges.
2. Robust Automatic Monocular Vehicle Speed Estimation for Traffic Surveillance (ICCV 2021)
  1. detection + tracking (Mask R-CNN, ByteTrack)
  2. we have to cherry-pick the best time segment of the cctv footage to get a correct homography estimate
  3. mask the invalid area (in vp)
  4. compute the homography (using the Jacobian, which is an output of the model)
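
The sensitivity of far objects noted in item 1.1 can be illustrated with a minimal sketch. It projects image pixels to the ground plane (Z = 0) with a 3x3 homography. The matrix values and the function name `project_to_ground` are made up for illustration and are not the actual calibration.

```python
def project_to_ground(H, u, v):
    """Map image pixel (u, v) to ground-plane (x, y) with a 3x3 homography H (Z = 0)."""
    x = H[0][0] * u + H[0][1] * v + H[0][2]
    y = H[1][0] * u + H[1][1] * v + H[1][2]
    w = H[2][0] * u + H[2][1] * v + H[2][2]
    if abs(w) < 1e-9:
        # w == 0 means the pixel lies on the horizon (vanishing line).
        raise ValueError("pixel is on the horizon; no finite ground point")
    return x / w, y / w


# Hypothetical homography (assumption): the horizon sits at image row v = 100.
H = [
    [1.0, 0.0, -320.0],
    [0.0, 0.0, 500.0],
    [0.0, 1.0, -100.0],
]

near = project_to_ground(H, 320.0, 400.0)         # pixel far below the horizon
far = project_to_ground(H, 320.0, 105.0)          # 5 rows below the horizon, y = 100.0
far_shifted = project_to_ground(H, 320.0, 104.0)  # same point with 1 row of error, y = 125.0
```

With these made-up numbers, a one-row error near the horizon moves the ground point by 25 units, while the same error at row 400 moves it by well under 0.01.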
construct dataset
1. compute bev map
2. with the Jacobian (J), compute the orientation (yaw)
  
  $$ \theta = \operatorname{arctan2}(dy, dx) $$
3. ✅ we assume Z = 0 (the road is a plane). when we draw a line along the road boundary in the fv, the same line diverges to infinite coordinates in the bev view. this is a violation of the planar assumption.
4. run semantic segmentation on the fv image → transform only the pixels with class == road to the bev view
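
A minimal sketch of the yaw computation in step 2. It assumes that J is the 2x2 Jacobian of the image-to-ground mapping at the vehicle's pixel and that (du, dv) is the vehicle's heading direction in the image. Both the interpretation and the name `yaw_from_jacobian` are assumptions for illustration; the notes only state that yaw comes from J via arctan2.

```python
import math


def yaw_from_jacobian(J, du, dv):
    """Push an image-space direction (du, dv) through J and return the ground-plane yaw.

    J is assumed to be [[dx/du, dx/dv], [dy/du, dy/dv]] at the vehicle position.
    """
    dx = J[0][0] * du + J[0][1] * dv
    dy = J[1][0] * du + J[1][1] * dv
    return math.atan2(dy, dx)  # theta = arctan2(dy, dx), in radians


# Hypothetical Jacobian: 0.05 ground units per pixel, image v axis pointing down.
J = [
    [0.05, 0.0],
    [0.0, -0.05],
]
theta = yaw_from_jacobian(J, du=1.0, dv=-1.0)  # moving right and up in the image
```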

data construction guidance

convert_cctv.py

sliding window
1. window_size: 100 frames (50 historical + 50 future)
2. stride: 50 frames (50% overlap between consecutive scenarios)
3. creates multiple scenarios from one continuous recording
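
The windowing above can be sketched as follows. This is a minimal illustration, not the code in convert_cctv.py. It assumes frames are indexed by consecutive integers and that a trailing partial window is dropped; the name `sliding_windows` is hypothetical.

```python
def sliding_windows(num_frames, window_size=100, stride=50):
    """Return (start, end) frame-index pairs (end exclusive) for every full window."""
    last_start = num_frames - window_size
    return [(s, s + window_size) for s in range(0, last_start + 1, stride)]


# A 250-frame recording yields four overlapping 100-frame scenarios.
windows = sliding_windows(250)
# windows == [(0, 100), (50, 150), (100, 200), (150, 250)]
```

Within each window, the first 50 frames would serve as history and the last 50 as the future to predict.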

target_agent selection

select the vehicle that appears in the most frames of the window

min_target_frames = 80
target_id = frame_counts.idxmax()  # vehicle with the most frames
target_frame_count = frame_counts.max()
if target_frame_count < min_target_frames:
    continue  # skip this window: the target has too few frames

format conversion

just convert format

# input (raw)
timestamp,id,type,sub_type,x,y,theta,vx,vy
0.00,553,vehicle,car,-48.06,179.87,-1.88,0.04,0.25

# output (v2x-seq-tfd format)
city,timestamp,id,type,sub_type,tag,x,y,z,length,width,height,theta,v_x,v_y,intersect_id
cheonan,75.0,555,VEHICLE,CAR,TARGET_AGENT,-72.35,185.70,0.0,4.5,1.8,1.5,-1.33,-0.01,0.0,CCTV#CCTV051

changes made
1. add tag column: “TARGET_AGENT” or “OTHERS”
2. add dimensions: z=0, length=4.5 m, width=1.8 m, height=1.5 m
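
A minimal sketch of the row conversion, using only the columns and constants listed above. The function name `convert_row`, the pass-through of `timestamp`, and the default `intersect_id` (copied from the sample output row) are assumptions, not the actual convert_cctv.py logic.

```python
import csv
import io

OUT_COLUMNS = [
    "city", "timestamp", "id", "type", "sub_type", "tag", "x", "y", "z",
    "length", "width", "height", "theta", "v_x", "v_y", "intersect_id",
]


def convert_row(row, target_id, city="cheonan", intersect_id="CCTV#CCTV051"):
    """Convert one raw row (dict of strings) to the v2x-seq-tfd column layout."""
    return {
        "city": city,
        "timestamp": row["timestamp"],  # assumption: passed through unchanged
        "id": row["id"],
        "type": row["type"].upper(),
        "sub_type": row["sub_type"].upper(),
        "tag": "TARGET_AGENT" if row["id"] == target_id else "OTHERS",
        "x": row["x"], "y": row["y"], "z": "0.0",
        "length": "4.5", "width": "1.8", "height": "1.5",  # constant dimensions
        "theta": row["theta"],
        "v_x": row["vx"], "v_y": row["vy"],
        "intersect_id": intersect_id,
    }


raw = (
    "timestamp,id,type,sub_type,x,y,theta,vx,vy\n"
    "0.00,553,vehicle,car,-48.06,179.87,-1.88,0.04,0.25\n"
)
rows = [convert_row(r, target_id="553") for r in csv.DictReader(io.StringIO(raw))]

out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=OUT_COLUMNS, lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
converted = out.getvalue()
```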
train/val split
1. shuffle all scenarios
2. split 80% train / 20% validation
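
A minimal sketch of the shuffle-and-split step. The fixed seed and the name `split_scenarios` are assumptions added for reproducibility; the notes do not say whether the original script seeds its shuffle.

```python
import random


def split_scenarios(scenarios, train_ratio=0.8, seed=0):
    """Shuffle scenario identifiers and split them into (train, val) lists."""
    shuffled = list(scenarios)             # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # seeded for a repeatable split
    n_train = int(len(shuffled) * train_ratio)
    return shuffled[:n_train], shuffled[n_train:]


scenario_ids = [f"scenario_{i}" for i in range(10)]  # hypothetical names
train, val = split_scenarios(scenario_ids)
```

Because consecutive windows overlap by 50%, a scenario-level shuffle like this can place overlapping frames from the same recording in both splits.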

preprocess.py

ensure 100 timestamps
identify target_agent
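
A minimal sketch of the two steps above, read here as validation checks on one converted scenario. The actual preprocess.py may instead pad or interpolate missing timestamps; the name `check_scenario` and the dict-per-row input are assumptions.

```python
def check_scenario(rows, expected_timestamps=100):
    """Verify a scenario has the expected number of timestamps and return its target id."""
    timestamps = {r["timestamp"] for r in rows}
    if len(timestamps) != expected_timestamps:
        raise ValueError(
            f"expected {expected_timestamps} timestamps, found {len(timestamps)}"
        )
    target_ids = {r["id"] for r in rows if r["tag"] == "TARGET_AGENT"}
    if len(target_ids) != 1:
        raise ValueError(f"expected exactly one TARGET_AGENT, found {len(target_ids)}")
    return target_ids.pop()


# Hypothetical scenario: the target is present in all 100 frames, one other vehicle in 60.
rows = [{"timestamp": t, "id": "555", "tag": "TARGET_AGENT"} for t in range(100)]
rows += [{"timestamp": t, "id": "553", "tag": "OTHERS"} for t in range(60)]
target_id = check_scenario(rows)
```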