Releases: davidpagnon/Sports2D
Compensation for depth effects + calibration loading and generation
Perspective effects due to depth and proximity to frame borders are now compensated. They make the farther limb look smaller, which represents a 1-2% coordinate error at 10 m, and more if the camera is closer.
The pixel-to-meter conversion was improved; it now takes into account:
- the pixel to meter scale
- the camera horizon angle
- the floor height
- the perspective effects with an additional depth parameter <-- this is new
Added a flexible configuration argument for the user to choose which depth information to use. Either:
- Distance from camera to the lane: distance_m
- Focal length in pixels: distance_m = f_px * H/h
- Field of view in degrees or radians: f_px = max(W,H)/2 / tan(fov/2), then distance_m = f_px * H/h
- Calibration file: distance_m = K[0,0] * H/h
In the same way, the camera horizon angle and the floor height can be specified with:
- a manual input
- a calibration file
- automatically from gait
[px_to_meters_conversion] # Config.toml
# Compensate for perspective effects, which make the further limb look smaller. 1-2% coordinate error at 10 m, less if the camera is further away
perspective_value = 10 # Either camera-to-person distance (m), or focal length (px), or field-of-view (degrees or radians), or '' if perspective_unit=='from_calib'
perspective_unit = 'distance_m' # 'distance_m', 'f_px', 'fov_deg', 'fov_rad', or 'from_calib'
# Compensate for camera horizon
floor_angle = 'auto' # float, 'from_kinematics', 'from_calib', or 'auto' # 'auto' is equivalent to 'from_kinematics', ie angle calculated from foot contacts. 'from_calib' calculates it from a toml calibration file. Use float to manually specify it in degrees
xy_origin = ['auto'] # [px_x,px_y], or ['from kinematics'], ['from_calib'], or ['auto']. # BETWEEN BRACKETS! # ['auto'] is equivalent to ['from_kinematics'], ie origin estimated at first foot contact, direction is direction of motion. ['from_calib'] calculates it from a calibration file. Use [px_x,px_y] to manually specify it in pixels (px_y points downwards)
# Optional calibration file
calib_file = '' # Calibration file in the Pose2Sim toml format, or '' if not available

Note: If the user does not want perspective effects to be taken into account, they can set distance_m to a very large value, such as 10000 m.
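For reference, here is a minimal sketch (not the actual Sports2D code) of how each perspective_unit choice reduces to a camera-to-person distance, following the formulas above; the function name and arguments are assumptions for illustration:

```python
import numpy as np

def to_distance_m(value, unit, H_m, h_px, img_w, img_h, calib_K=None):
    """Reduce the perspective_value/perspective_unit pair to a
    camera-to-person distance in meters (sketch, not the actual API)."""
    if unit == 'distance_m':            # distance given directly
        return value
    if unit == 'f_px':                  # distance_m = f_px * H/h
        return value * H_m / h_px
    if unit in ('fov_deg', 'fov_rad'):  # fov -> f_px -> distance_m
        fov_rad = np.radians(value) if unit == 'fov_deg' else value
        f_px = max(img_w, img_h) / 2 / np.tan(fov_rad / 2)
        return f_px * H_m / h_px
    if unit == 'from_calib':            # K[0,0] is the focal length in px
        return calib_K[0, 0] * H_m / h_px
    raise ValueError(f'Unknown perspective_unit: {unit}')
```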
Full Changelog: v0.8.24...v0.8.25
More about pixel-to-meter conversion
Pixel-to-meter scale
Let’s start with the pinhole camera model.
The intercept theorem tells us:
distance_m / f_px = Y / y (1)
With:
- distance_m: the distance between the camera origin and the athlete in meters
- f_px: the focal length (the distance between the camera origin and the sensor), converted from mm to pixels
- Y: The coordinate of a point in the scene in meters
- y: The coordinate of a point on the camera sensor in pixels
A particular case of it is the coordinates of the athlete:
distance_m / f_px = H / h (2)
With:
- H: the height of the athlete in meters
- h: the height of the athlete on the camera sensor in pixels
Now, the image coordinates are generally not taken from the center of the image / sensor, but from its top left corner (see image), which means that:
x = u - cu (3)
y = - (v - cv)
With:
- u,v: the image coordinates
- cu, cv: the coordinates of the principal point of the sensor, approximated as the image center
So we end up with all these relations:
distance_m / f_px = H / h = X/(u-cu) = -Y/(v-cv) (4)
And the simplest case is resolved:
X = H / h * (u-cu) (5)
Y = - H / h * (v-cv)
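As a quick numeric check of Equation (5), with illustrative values (not from the source):

```python
# Worked example for Equation (5), with illustrative values
H, h = 1.80, 400        # athlete height in meters and in pixels
u_cu = 50               # point 50 px to the right of the principal point
v_cv = -120             # point 120 px above the principal point (v points down)
X = H / h * u_cu        # = 0.225 m
Y = -H / h * v_cv       # = 0.54 m
```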
| Person height calculation: |
|---|
| I calculate the height in pixels from the following distances: height = (rfoot+lfoot)/2 + (rshank+lshank)/2 + (rfemur+lfemur)/2 + (rback+lback)/2 + head |
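A minimal sketch of such a height computation from detected keypoints; the keypoint names (HALPE_26-style) and the pairing of segments to keypoints are assumptions:

```python
import numpy as np

def person_height_px(kpts):
    """Height in pixels as the sum of averaged left/right segment lengths
    (sketch; keypoint names and segment definitions are assumed)."""
    d = lambda a, b: np.linalg.norm(np.asarray(kpts[a]) - np.asarray(kpts[b]))
    foot  = (d('RBigToe', 'RAnkle') + d('LBigToe', 'LAnkle')) / 2
    shank = (d('RAnkle', 'RKnee')   + d('LAnkle', 'LKnee'))   / 2
    femur = (d('RKnee', 'RHip')     + d('LKnee', 'LHip'))     / 2
    back  = (d('RHip', 'Neck')      + d('LHip', 'Neck'))      / 2
    head  = d('Neck', 'Head')
    return foot + shank + femur + back + head
```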
Compensation for the camera horizon
The camera is not always set perfectly horizontally, and we may want to compensate for it. After evaluating the angle from gait kinematics or from a calibration file, we can change the coordinate system:
xang = x*cos(ang) + y*sin(ang) (6)
yang = y*cos(ang) - x*sin(ang)
With ang the camera horizon angle.
Reinjecting this in the previous formula gives:
X = H / h * ((u-cu)*cos(ang) + (v-cv)*sin(ang)) (7)
Y = - H / h * ((v-cv)*cos(ang) - (u-cu)*sin(ang))
Moreover, we want the floor to be situated at Y = 0 so that feet are in contact with the floor. Instead of considering that the pixel origin is at the center of the image, we determine (cx,cy) as the intersection between the left border of the image and the floor line, determined from kinematics. We can simply replace (cu,cv) by (cx,cy) in the previous formula:
X = H / h * ((u-cx)*cos(ang) + (v-cy)*sin(ang)) (8)
Y = - H / h * ((v-cy)*cos(ang) - (u-cx)*sin(ang))
Compensation for perspective effects
The person’s left and right limbs are not situated at the same depth. Due to perspective, the further limb can look smaller, especially if the camera is close to the athlete. We can compensate for this effect.
We can extract from Equation (4):
distance_m / f_px = -Y/(v-cv) (9)
Adding in the depth offset, we get:
(distance_m + depth_offset) / f_px = -Y/(v-cv) (10)
With depth_offset the offset of the joint with regard to the body midline, in meters.
This equation can be reorganized to separate the coordinates at the body midline from their offsets due to depth:
Y = - [ distance_m / f_px * (v-cv) + depth_offset / f_px * (v-cv) ] (11)
Now, there is a catch: we want the floor to be situated at Y=0, so (v-cv) should be replaced by (v-cy) in the first part of the equation. On the other hand, we want the depth offset to be null at the center of the image (and larger when getting further), so the second part of the equation should not be changed (see image). So we obtain:
Y = - [ distance_m / f_px * (v-cy) + depth_offset / f_px * (v-cv) ] (12)
Equation (2) tells us that distance_m / f_px = H / h, and therefore depth_offset / f_px = depth_offset / distance_m * H / h, so we can rearrange it:
Y = - H/h * [ (v-cy) + depth_offset / distance_m * (v-cv) ] (13)
and likewise for the horizontal coordinate: X = H/h * [ (u-cx) + depth_offset / distance_m * (u-cu) ]
Finally, when taking the angles into consideration, the final formula becomes longer, although not more complex:
| Final formula: |
|---|
| The floor line (origin and angle) is estimated from the line that fits foot ground contacts. Final formula (14): <br> X = H/h * [ ( (u-cx) + depth_offset / distance_m * (u-cu) ) * cos(ang) + ( (v-cy) + depth_offset / distance_m * (v-cv) ) * sin(ang) ] <br> Y = - H/h * [ ( (v-cy) + depth_offset / distance_m * (v-cv) ) * cos(ang) - ( (u-cx) + depth_offset / distance_m * (u-cu) ) * sin(ang) ] |
With:
- X: the horizontal coordinate of a point in meters
- Y: the vertical coordinate of a point in meters, pointing upwards
- u, v: the image coordinates of the point in pixels
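For completeness, Equation (14) translated into code as a minimal sketch (variable names follow the definitions above; per-joint depth_offset values are an assumed input):

```python
import numpy as np

def px_to_m(u, v, H, h, cx, cy, cu, cv, ang, depth_offset, distance_m):
    """Equation (14): pixel coordinates (u, v) to meter coordinates (X, Y),
    compensating for camera horizon and per-joint depth offset (sketch)."""
    du = (u - cx) + depth_offset / distance_m * (u - cu)
    dv = (v - cy) + depth_offset / distance_m * (v - cv)
    X =  H / h * (du * np.cos(ang) + dv * np.sin(ang))
    Y = -H / h * (dv * np.cos(ang) - du * np.sin(ang))
    return X, Y
```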
Minor edits for conda-forge acceptance
- Changed the presentation video
- Removed most dependencies from pyproject.toml since they are already included in Pose2Sim
Full Changelog: v0.8.23...v0.8.24
Load and/or create a calibration file - v1
Instead of scaling with the height of the chosen person, you can now use a calibration file, which gives more accurate results, and more easily so if the participants' heights have not been measured. The floor angle and the XY origin are recomputed.
Regardless of the chosen method, a calibration file is saved.
- The rotation and translation matrices are calculated from the estimated floor angle, estimated XY origin, and an arbitrary camera-to-subject distance (10 m by default)
- The intrinsic matrix is computed from the resolution of the video, the estimated floor angle, the height of the person in meters and in pixels, as well as the arbitrary camera-to-subject distance
- Distortion is assumed to be nonexistent
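A minimal sketch of how such a calibration could be assembled from these quantities; the matrix layout and conventions are assumptions, and the exact Pose2Sim format may differ:

```python
import numpy as np

def build_calibration(img_w, img_h, floor_angle, H_m, h_px, distance_m=10):
    """Assemble approximate camera parameters (sketch; the exact Pose2Sim
    conventions may differ)."""
    f_px = distance_m * h_px / H_m        # from distance_m / f_px = H / h
    K = np.array([[f_px, 0.,   img_w / 2],
                  [0.,   f_px, img_h / 2],
                  [0.,   0.,   1.]])      # principal point at image center
    ca, sa = np.cos(floor_angle), np.sin(floor_angle)
    R = np.array([[ca, -sa, 0.],
                  [sa,  ca, 0.],
                  [0.,  0., 1.]])         # roll by the estimated floor angle
    t = np.array([0., 0., distance_m])    # arbitrary camera-to-subject distance
    dist = np.zeros(4)                    # distortion assumed nonexistent
    return K, R, t, dist
```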
This also allows for a better visualization, with the OpenSim skeleton overlaid on the video.
Sports2D_demo_ft.mp4
Next goals:
- fix perspective effects
- let the user optionally specify
- camera-to-subject distance
- focal_distance
- field of view
- recalculate_extrinsics
- potentially also compensate for distortion and for the camera not being parallel to the plane of motion
Full Changelog: v0.8.22...v0.8.23
Better sorting algorithm, better compute_floor_line, and other fixes
Miscellaneous
- Rewrote the compute_floor_line function (see below)
- Rewrote the sorting algorithm (see below)
- Excluded ghost persons more reliably, based on min_chunk_size instead of a threshold of 10 valid frames
- Made the person selection UI more computationally efficient
- Last frame is the end of the video if not specified by time_range
- Handled edge case with save_plots and show_plots
- Ignored numpy "Mean of empty slice" warning
Rewrote the compute_floor_line function
This function is used to compute the angle and the level of the floor. This will also be useful to generate and/or import a calibration file. Here is how it works:
- Trim the trial around the frames where the person is actually in the camera view
- Remove the frames with low confidence
- Compute the speed of the big toe points, and select the frames where it is below 1 m/s. Assume that this is when the foot is touching the floor. --> Unchanged
Previously, instead of the first two steps, I only removed all the NaNs from the trial, which could lead to wrong speed estimates when frames were skipped.
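In rough pseudocode, the updated logic looks like this (a sketch, not the actual function; the argument names, thresholds, and line fit are assumptions):

```python
import numpy as np

def compute_floor_line(toe_xy, conf, fps, conf_thresh=0.5, speed_thresh=1.0):
    """Estimate floor angle and level from big-toe trajectories (sketch).
    toe_xy: (n_frames, 2) big-toe coordinates; conf: (n_frames,) confidences."""
    # 1. Trim the trial to the frames where the person is actually detected
    valid = np.where(conf > 0)[0]
    toe_xy = toe_xy[valid[0]:valid[-1] + 1].astype(float)
    conf = conf[valid[0]:valid[-1] + 1]
    # 2. Remove the frames with low confidence
    toe_xy[conf < conf_thresh] = np.nan
    # 3. Keep frames where toe speed is below 1 m/s: assumed foot contacts
    #    (coordinates are assumed already scaled from pixels to meters)
    speed = np.linalg.norm(np.diff(toe_xy, axis=0), axis=1) * fps
    contacts = toe_xy[1:][speed < speed_thresh]
    contacts = contacts[~np.isnan(contacts).any(axis=1)]
    # Fit a line to the contacts: slope -> floor angle, intercept -> floor level
    slope, intercept = np.polyfit(contacts[:, 0], contacts[:, 1], 1)
    return np.arctan(slope), intercept
```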
Rewrote the sorting algorithm to better handle some swap issues
- Added a distance constraint, so that if the best association between a frame and the next one is too far, it creates a new person instead
- Switched to frame-by-frame mean keypoint distance (instead of median), in order not to ignore outliers
- Ran non-maximum suppression (NMS) for bounding boxes recomputed from keypoints instead of straight from the person detector, which is more accurate and prevents having 2 boxes for the same person
- Added a likelihood threshold in the keypoints used to recompute the bounding boxes to ignore points that were probably wrongly estimated
- This made me rewrite the algorithm from scratch, but with the same logic. Among other edits, I used the Hungarian algorithm from scipy.optimize.linear_sum_assignment, so my custom greedy min_with_single_indices function is not required anymore. This is very slightly slower with 2-3 people in the scene, but faster in crowded scenes.
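Conceptually, the association step now looks like this; a minimal sketch under simplifying assumptions (single-frame memory, pixel distance threshold), not the actual implementation:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate_persons(prev_kpts, curr_kpts, max_dist=100):
    """Match persons across consecutive frames by mean keypoint distance,
    creating a new ID when the best match is too far (sketch).
    prev_kpts, curr_kpts: lists of (n_kpts, 2) arrays; max_dist is assumed."""
    # Cost matrix: mean keypoint distance between every prev/curr pair
    cost = np.array([[np.nanmean(np.linalg.norm(p - c, axis=1))
                      for c in curr_kpts] for p in prev_kpts])
    rows, cols = linear_sum_assignment(np.nan_to_num(cost, nan=1e6))
    ids, next_id = [None] * len(curr_kpts), len(prev_kpts)
    for r, c in zip(rows, cols):
        if cost[r, c] < max_dist:   # distance constraint
            ids[c] = r              # keep the previous person's ID
    for c in range(len(curr_kpts)):
        if ids[c] is None:          # unmatched or too far: create a new person
            ids[c], next_id = next_id, next_id + 1
    return ids
```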
Full Changelog: v0.8.21...v0.8.22
More about the sorting algorithm
I've been further investigating the ID sorting issues, which can currently be classified into two categories:
- ID jumps, when an ID jumps from a person to a ghost detection (specifically, after the person has exited the scene)
- ID swaps, when a person’s ID swaps with another one’s (specifically, in crowded scenes or with full-size posters in the background)
(Note that I'm not talking about leg swaps, which will be the next project)
1. ID jumps
My sorting algorithm ensures that each person in frame N+1 is associated with the right ID from frame N, by computing the mean frame-by-frame distance between keypoints for each detection.

I wanted it to be robust to people in the background, to persons close to each other, to them exiting and reentering the scene, to new persons appearing, etc. It also needed to be fast enough for its computing overhead to be negligible, which excludes all deep learning methods. Finally, I wanted it to work in the multi-person case, and in the 3D case when we need to attribute the right ID to each triangulated person.
It worked fast and rather well, but it used to be fooled by ghost detections after the athlete had exited the frame: in this case, any ghost detection becomes the best association, regardless of when this detection happened. This led to artificially high average speeds.
To make it robust to this case, I added a distance constraint, so that if the best association is too far, it creates a new person instead. In particular, once the athlete has exited the frame, they cannot be associated with a glitch detection in the middle of the image in the next frame. This made me rewrite most of the sorting algorithm, but it is now as fast, more readable, and with similar efficiency.
There is another reason for ID swaps, which took me forever to figure out: I used the median frame-by-frame distance instead of the mean in order to get rid of outliers; but in our case, taking only the median distance value loses a lot of information (see the following image). I switched to the mean distance instead of the median, and it handles a few more cases!

2. ID swaps
I had a further look at the failed poster videos. Keep in mind that when we analyze the motion of athletes wearing dark clothes in front of a poster of people wearing the same clothes and of the same size, we enter very specific and tricky territory. Look at how messy this is: where even is the runner??

But it can still be improved by implementing Non-Maximum Suppression (NMS): when bounding boxes overlap beyond a threshold, the one with the lowest likelihood is removed.

The first parameter to tune is the overlap threshold. Increasing it means we keep more bounding boxes and are less likely to lose any of them: good. But sometimes two bounding boxes represent the same person, and then we have to choose between them and will occasionally be wrong: bad. That's an ID swap, which unfortunately happens with posters when the likelihood of the person in the back is higher than the athlete's. So, not much luck in this first attempt.
A few weeks ago, I put out a change that made a big difference: I recomputed bounding boxes from pose estimation (keypoints). Our pose estimation model is top-down: there is a first model that produces person bounding boxes (the person detector), and a second one that detects keypoints inside of each of the boxes (the pose estimator). The bounding boxes recomputed from keypoints are much more refined than the ones coming out of the person detection model.
But we still have issues. For example, sometimes an outlier point stretches out a box and prevents it from overlapping when it should, which results in 2 bounding boxes for 1 person and a 50% chance of an ID swap. The next change I'm proposing is to set a likelihood threshold on the keypoints used to calculate the bounding boxes. Keypoints detected very far from the athlete likely have a low likelihood, so they are removed and do not stretch out the bounding box. This way, the NMS algorithm works properly, detects the overlap, and gets rid of the unnecessary bounding box.
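The whole chain (likelihood-thresholded keypoints -> recomputed boxes -> IoU-based NMS) can be sketched as follows; the function names and thresholds are illustrative, not the actual Sports2D code:

```python
import numpy as np

def bbox_from_keypoints(kpts, scores, score_thresh=0.3):
    """Bounding box from keypoints above a likelihood threshold (sketch).
    Assumes at least one keypoint passes the threshold."""
    good = kpts[scores > score_thresh]     # drop likely-wrong points
    return np.concatenate([good.min(axis=0), good.max(axis=0)])  # [x1, y1, x2, y2]

def iou(a, b):
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    iw = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def nms(boxes, likelihoods, overlap_thresh=0.45):
    """Among heavily overlapping boxes, keep only the most likely one (sketch)."""
    order = np.argsort(likelihoods)[::-1]
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= overlap_thresh for j in keep):
            keep.append(i)
    return keep
```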
Finally, there were some remaining ID swaps that really did not want to be fixed. Similarly to the overlapping threshold, manipulating the distance threshold may create knock-on issues in other cases, which is risky. See the next images for edge cases.

What next?
I could still further improve the sorting algorithm by:
- Setting a maximum number of missed frames before a person is forgotten, to prevent ghost detections from accumulating and slowing down the process. Won’t make a big difference until we decide to do long or continuous captures.
- Using color histograms to take appearance into account rather than only pose. In the case of our posters of people wearing the same clothing as the runner, it would likely not make any difference, and would represent more overhead
- Using a Kalman filter to take advantage of speed information. This is probably the best approach, but it is more challenging, and would slightly increase overhead.
Right now, the algorithm as it stands seems to work in all cases aside from the ones with the poster background. For example, real people in the background are generally not an issue. So I declare it good enough till the next catastrophic hurdle.

More flexible config dictionary when running from python
Specifying a config dictionary only updates the new keys, instead of replacing the full dictionary:
from Sports2D import Sports2D
config_dict = {
    'base': {
        'nb_persons_to_detect': 1,
        'person_ordering_method': 'greatest_displacement'
    },
    'pose': {
        'mode': 'lightweight',
        'det_frequency': 50
    }
}
Sports2D.process(config_dict)

Full Changelog: v0.8.20...v0.8.21
Fixed ID swap and save_graphs
- Optionally save the coordinate and angle plots (with filtered data overlaying the unfiltered ones) with --save_graphs True.
- Fixed ID swaps:
  Non-maximum suppression (NMS) consists in ignoring all bounding boxes that overlap by more than a 0.45 threshold, except the one with the highest confidence. RTMlib runs it natively.
  However, this is done at the person detection level, which was not always satisfactory. Pose estimation does not look exclusively inside the detected bounding box, and sometimes finds points outside of it. In practice, 2 bounding boxes that do not overlap much (large border) can lead to the same detected skeleton.
  I recalculated bounding boxes at the pose estimation level, i.e., based on skeleton detection (thin border), and ran NMS from there. This fixes ID swaps in most cases.

N.B.: Requires the latest Pose2Sim version. Reinstall Sports2D if you experience difficulties:
pip install Sports2D -U
Full Changelog: v0.8.19...v0.8.20
Much more efficient ram-wise for long videos
- Videos can now be as long as needed: Sports2D has become much more RAM-efficient. RAM usage no longer increases linearly with the number of frames; in fact, it is not affected by the number of processed frames at all. Frames are written to disk as they are processed instead of being retained in memory; only the information to be overlaid on the video (angle and point values, coordinates of the overlaid skeleton, points, and bounding boxes) is kept (see the sketch after this list).
- The 'on_click' method used to select the number and order of persons is also more efficient now.
- The webcam capture has also been improved.
- The py-c3d library on which Sports2D depends used to not support numpy>=2.0, which had started to cause problems for some users. I worked on the py-c3d library to make it compatible with numpy>=2 and made a pull request that was accepted, so this problem is now solved: EmbodiedCognition/py-c3d#54
- Python 3.12 is fully supported (but the opencv-python version had to be capped, since newer versions force numpy>=2.0, which is incompatible with some OpenSim versions).
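The write-as-you-go pattern behind the RAM improvement looks conceptually like this (a sketch with OpenCV, not the actual Sports2D code; the overlay step is elided):

```python
import cv2

def process_video(in_path, out_path):
    """Process and write frames one at a time so RAM usage stays flat (sketch)."""
    cap = cv2.VideoCapture(in_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    out = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*'mp4v'), fps, (w, h))
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # ... run pose estimation and draw the overlay on `frame` here ...
        out.write(frame)   # written immediately, never accumulated in memory
    cap.release()
    out.release()
```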
Full Changelog: v0.8.18...v0.8.19
Fixed GCV filtering + Expired OpenMMLab certificate
- Fixed GCV filtering (careful if the series is too short: noise can be considered as signal -> no filtering). See this conversation: https://stackoverflow.com/a/79740481/12196632, and this issue: scipy/scipy#23472
- Temporarily ignore SSL certificate verification to handle OpenMMLab's expired certificate
Full Changelog: v0.8.17...v0.8.18
More filtering options (based on Pose2Sim)
Sports2D now uses the Pose2Sim filtering code. This is better: when new features are added, they will only need to be implemented in one place (in the same way, the Sports2D scaling and IK code already depends on Pose2Sim).
- Added an optional Hampel filter for outlier rejection, to be run before further filtering methods. It rejects outliers that are outside of a 95% confidence interval from the median in a sliding window of size 7 (see the sketch after this list).
- Added a GCV spline filter. It automatically determines optimal parameters for each point, which is good when some move faster than others (e.g., fingers vs. hips). The user can bias it towards more smoothing (>1) or more fidelity to the data (<1) by adjusting the smoothing_factor. It might severely under- or over-smooth due to numerical precision issues, so use it with caution. I could not figure out a way to make it more robust, even after normalizing time and/or the observed y values. Feel free to intervene! It acts as a Butterworth filter if cut_off_frequency is set to an integer instead of 'auto'.
- Fixed the Kalman filter. Simplified version: the user specifies how much more they trust the triangulation results (measurements) than the assumption of constant acceleration (process).
- The available filters are now: Butterworth, Kalman, GCV-spline, LOESS, Median, Gaussian, and Butterworth on speed.
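For illustration, a minimal Hampel filter along those lines (window of 7, ~95% interval via 1.96 robust standard deviations; a sketch, not the Pose2Sim implementation):

```python
import numpy as np
import pandas as pd

def hampel(values, window=7, n_sigmas=1.96):
    """Replace points further than n_sigmas robust standard deviations from
    the rolling median with that median (sketch; 1.4826 * MAD ~ std dev)."""
    s = pd.Series(values, dtype=float)
    med = s.rolling(window, center=True, min_periods=1).median()
    mad = (s - med).abs().rolling(window, center=True, min_periods=1).median()
    outliers = (s - med).abs() > n_sigmas * 1.4826 * mad
    return s.where(~outliers, med).to_numpy()
```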
Full Changelog: v0.8.16...v0.8.17
Trim video around valid frames
Trimmed the video around the valid frames.
