This module provides the feature extractors (SIFT, CNN, SimCLR) and the functions that organize their output into visual codebooks, bag-of-visual-words histograms, and TF-IDF vectors for videos.

class Extractor[source]

Extractor(extractor) :: ABC

Helper class that provides a standard way to create an ABC using inheritance.
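The extractor classes below share this base class. The following is a minimal sketch of that pattern, not the module's actual implementation: the names BaseExtractor and MeanColorExtractor are hypothetical, and the only details taken from this page are the single extractor constructor argument and the extract method used in the SimCLR example.

```python
from abc import ABC, abstractmethod

import numpy as np


class BaseExtractor(ABC):
    """Hypothetical stand-in for Extractor: wraps a backend and exposes extract()."""

    def __init__(self, extractor):
        # The wrapped backend (e.g. a SIFT object or a neural network).
        self.extractor = extractor

    @abstractmethod
    def extract(self, img):
        """Return an array of shape (n_features, feature_dim) for one image."""


class MeanColorExtractor(BaseExtractor):
    """Toy subclass (assumption): one 3-dimensional descriptor per image."""

    def extract(self, img):
        # Delegate to the backend and force a 2-D (1, dim) result.
        return self.extractor(img).reshape(1, -1)


# The backend here is a plain function that averages the RGB channels.
toy = MeanColorExtractor(lambda img: img.reshape(-1, 3).mean(axis=0))
features = toy.extract(np.ones((4, 4, 3)))
```

Because extract is abstract, instantiating the base class directly raises a TypeError, which forces every concrete extractor to define how features are produced.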

class SIFTExtractor[source]

SIFTExtractor(extractor) :: Extractor

Exposed SIFTExtractor class used for retrieving features.

class CNNExtractor[source]

CNNExtractor(extractor) :: Extractor

Exposed CNNExtractor class used for retrieving features.

imagenet_normalize_transform[source]

imagenet_normalize_transform()

get_transforms[source]

get_transforms(size=224)

class SimCLRExtractor[source]

SimCLRExtractor(extractor) :: Extractor

Exposed SimCLRExtractor class used for retrieving features.

model = SimCLRModel.load_from_checkpoint(checkpoint_path='/tf/data/models/simclr/checkpointepoch=98.ckpt').eval()
simclr_extractor = SimCLRExtractor(model)
features_0 = simclr_extractor.extract(sampled_imgs[7]).detach().numpy()
features_1 = simclr_extractor.extract(sampled_imgs[9]).detach().numpy()
imgs = get_rand_imgs(video_paths[0], 30_000, n = 100)

gen_vcodebook[source]

gen_vcodebook(path, img_paths, model_name, extractor, vwords)

Constructs a visual codebook from the given images. vwords sets the number of visual words, i.e. the size of the codebook's vocabulary; increase it for a finer-grained vocabulary.
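A visual codebook is commonly built by clustering feature descriptors, with each cluster centre acting as one visual word. The sketch below assumes k-means via scikit-learn's KMeans; the predict call in the examples on this page is consistent with that, but the page does not confirm which clustering method gen_vcodebook uses. build_codebook and the random descriptors are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans


def build_codebook(features, vwords, seed=0):
    """Cluster descriptors of shape (n_features, dim) into `vwords` visual words.

    Hypothetical helper; assumes k-means is an acceptable clustering choice.
    """
    return KMeans(n_clusters=vwords, n_init=10, random_state=seed).fit(features)


# Random descriptors stand in for real extractor output.
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 8))

codebook = build_codebook(features, vwords=5)
# Each descriptor is mapped to the index of its nearest visual word.
labels = codebook.predict(features)
```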

import pickle
fname = '/tf/data/models/features/M00/15000n_features.pkl'
with open(fname, 'rb') as f:
    features_list = pickle.load(f)
len(features_list)
1504405

gen_codebooks[source]

gen_codebooks(path, models, vwords, samples=15000)

vwords = 1_000
codebook = gen_vcodebook(sampled_imgs, extractor, vwords)
labels = codebook.predict(features)
hist = np.histogram(labels, bins = range(vwords + 1))
plt.bar(range(vwords), hist[0])
<BarContainer object of 20 artists>

get_df[source]

get_df(imgs, extractor, codebook, vwords)

Generates the document frequency of each visual word across the given images.

df = get_df(imgs, extractor, codebook, vwords)
plt.bar(range(vwords), df)
plt.show()
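In the usual bag-of-words sense, the document frequency of a visual word is the number of images in which it occurs at least once. The sketch below illustrates that definition on precomputed word labels; document_frequency and label_sets are hypothetical names, and get_df itself may compute or normalize the counts differently.

```python
import numpy as np


def document_frequency(label_sets, vwords):
    """Count, per visual word, how many images contain it at least once."""
    df = np.zeros(vwords, dtype=int)
    for labels in label_sets:
        # np.unique ensures a word is counted once per image,
        # no matter how often it appears in that image.
        df[np.unique(labels)] += 1
    return df


# Three toy "images", each given as the visual-word labels of its descriptors.
label_sets = [np.array([0, 0, 2]), np.array([2, 3]), np.array([2])]
df = document_frequency(label_sets, vwords=4)
```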

get_bovw[source]

get_bovw(vid_path, extractor, codebook, vwords, n=None)

Generates the bag of visual words (bovw) for an entire video.
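Conceptually, a bag of visual words assigns each descriptor to its nearest visual word and counts how often each word occurs. The sketch below shows that idea with plain NumPy, assuming the codebook is available as an array of centroids; assign_words and bovw_histogram are hypothetical helpers, not functions of this module.

```python
import numpy as np


def assign_words(features, centroids):
    """Return the index of the nearest centroid for every descriptor."""
    # Squared Euclidean distances, shape (n_features, n_words).
    dists = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)


def bovw_histogram(labels, vwords):
    """Count occurrences of each visual word; absent words get a zero."""
    return np.bincount(labels, minlength=vwords)


centroids = np.array([[0.0, 0.0], [10.0, 10.0]])
features = np.array([[1.0, 0.0], [9.0, 9.0], [0.0, 1.0]])

labels = assign_words(features, centroids)
hist = bovw_histogram(labels, vwords=2)
```

Passing minlength keeps the histogram at a fixed length of vwords, so histograms from different videos can be compared element by element.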

extract_features[source]

extract_features(vid, extractor, fps=30, frames_to_keep=1)

new_get_bovw[source]

new_get_bovw(features, codebook, vwords)

vid = Video(video_paths[3])
features = extract_features(vid, extractor, fps = 30, frames_to_keep = 1)
bovw = new_get_bovw(features, codebook, vwords)
plt.bar(range(vwords), hist)
plt.show()
hist
array([ 1,  0,  1,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0, 10,  0,  1])

calc_tf_idf[source]

calc_tf_idf(tfs, dfs)

tf_idf_pix2 = calc_tf_idf(hist, df)

hist2, bovw2 = get_bovw(video_paths[2], extractor, codebook, vwords)
tf_idf_pix1 = calc_tf_idf(hist2, df)
np.dot(tf_idf_pix1, tf_idf_pix2) / (np.linalg.norm(tf_idf_pix1) * np.linalg.norm(tf_idf_pix2))
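The last expression above is the cosine similarity between two TF-IDF vectors. The sketch below puts both steps together under stated assumptions: the page does not give the formula used by calc_tf_idf, so this uses a smoothed TF-IDF variant, and the n_docs parameter as well as the names tf_idf and cosine_similarity are hypothetical.

```python
import numpy as np


def tf_idf(tfs, dfs, n_docs):
    """Smoothed TF-IDF (assumed variant, not necessarily calc_tf_idf's formula)."""
    tfs = np.asarray(tfs, dtype=float)
    dfs = np.asarray(dfs, dtype=float)
    # Term frequency: word counts normalized by the total count.
    tf = tfs / max(tfs.sum(), 1.0)
    # Smoothed inverse document frequency; avoids division by zero.
    idf = np.log((1 + n_docs) / (1 + dfs)) + 1
    return tf * idf


def cosine_similarity(a, b):
    """Cosine of the angle between a and b; 0.0 if either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / denom) if denom > 0 else 0.0


dfs = [1, 2, 3]
vec_a = tf_idf([1, 0, 3], dfs, n_docs=3)
vec_b = tf_idf([0, 2, 0], dfs, n_docs=3)

same = cosine_similarity(vec_a, vec_a)
different = cosine_similarity(vec_a, vec_b)
```

Guarding against a zero denominator matters in practice, since a video whose frames yield no features produces an all-zero histogram.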