Assembly
As seen in the Tutorial, Clique provides the high-level assemble() function to support automatically assembling items into relevant collections based on a common changing numerical component:
>>> import clique
>>> collections, remainder = clique.assemble([
... 'file.0001.jpg', 'file.0002.jpg', 'file.0003.jpg',
... 'file.0001.dpx', 'file.0002.dpx', 'file.0003.dpx'
... ])
>>> print(collections)
[<Collection "file.%04d.dpx [1-3]">, <Collection "file.%04d.jpg [1-3]">]
Note
Any items that are not members of a returned collection can be found in the remainder list.
However, as mentioned in the Introduction, Clique has no understanding of what a numerical component represents. Therefore, it takes a conservative approach and considers all collections with a common changing numerical component as valid. This can lead to surprising results at first:
>>> collections, remainder = clique.assemble([
... 'file_v1.0001.jpg', 'file_v1.0002.jpg', 'file_v1.0003.jpg',
... 'file_v2.0001.jpg', 'file_v2.0002.jpg', 'file_v2.0003.jpg'
... ])
>>> print(collections)
[<Collection "file_v1.%04d.jpg [1-3]">,
<Collection "file_v2.%04d.jpg [1-3]">,
<Collection "file_v%d.0001.jpg [1-2]">,
<Collection "file_v%d.0002.jpg [1-2]">,
<Collection "file_v%d.0003.jpg [1-2]">]
Here, Clique returned more collections than might have been expected, but they are all valid: two vary the four-digit number and three vary the number after "v". This is an important feature of Clique: it doesn’t attempt to guess. Instead, it is designed to be wrapped easily with domain-specific logic to get the desired results.
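To see why five collections come back, the following is a minimal standard-library sketch of the idea, not Clique's actual implementation. It treats every run of digits in an item as a candidate index and groups items that share the same text before it (head), text after it (tail) and padding. The names candidate_groups and DIGITS are hypothetical, the expression is assumed to match the one shown for DIGITS_PATTERN in the Patterns section, and the padding rule is a simplification.

```python
import re
from collections import defaultdict

# Assumption: the same expression the Patterns section shows for DIGITS_PATTERN.
DIGITS = re.compile(r'(?P<index>(?P<padding>0*)\d+)')


def candidate_groups(items, minimum_items=2):
    """Group items by (head, tail, padding) for every number they contain."""
    groups = defaultdict(set)
    for item in items:
        for match in DIGITS.finditer(item):
            index = match.group('index')
            head = item[:match.start('index')]
            tail = item[match.end('index'):]
            # Simplification: a leading zero implies a fixed width,
            # anything else is treated as unpadded.
            padding = len(index) if match.group('padding') else 0
            groups[(head, tail, padding)].add(int(index))
    # Drop groups that contain fewer distinct indexes than the threshold.
    return {
        key: sorted(indexes)
        for key, indexes in groups.items()
        if len(indexes) >= minimum_items
    }


items = [
    'file_v1.0001.jpg', 'file_v1.0002.jpg', 'file_v1.0003.jpg',
    'file_v2.0001.jpg', 'file_v2.0002.jpg', 'file_v2.0003.jpg',
]
for (head, tail, padding), indexes in sorted(candidate_groups(items).items()):
    print(head, tail, padding, indexes)
```

Each item contributes to two groups, one per number it contains, which is why the version digit produces three extra groups alongside the two frame sequences.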
There are a couple of ways to influence the result returned by the assemble() function:
- Pass a minimum_items argument.
- Pass custom patterns.
Minimum Items
By default, Clique filters out of the result of assemble() any collection that has fewer than two items. This threshold can be customised per assemble() call by passing minimum_items as a keyword argument:
>>> print(clique.assemble(['file.0001.jpg'])[0])
[]
>>> print(clique.assemble(['file.0001.jpg'], minimum_items=1)[0])
[<Collection "file.%04d.jpg [1]">]
Patterns
By default, Clique finds all groups of numbers in each item and creates collections that have common head, tail and padding values.
Custom patterns can be used to tailor the process. Pass them as a list of regular expressions (either strings or re.RegexObject instances):
>>> items = [
... 'file_v1.0001.jpg', 'file_v1.0002.jpg', 'file_v1.0003.jpg',
... 'file_v2.0001.jpg', 'file_v2.0002.jpg', 'file_v2.0003.jpg'
... ]
>>> print(clique.assemble(items, patterns=[
... r'\.(?P<index>(?P<padding>0*)\d+)\.\D+\d?$'
... ])[0])
[<Collection "file_v1.%04d.jpg [1-3]">,
<Collection "file_v2.%04d.jpg [1-3]">]
Note
Each custom expression must contain the expression from DIGITS_PATTERN exactly once. An easy way to do this is using Python’s string formatting.
So, instead of:
r'\.(?P<index>(?P<padding>0*)\d+)\.\D+\d?$'
use:
r'\.{0}\.\D+\d?$'.format(clique.DIGITS_PATTERN)
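The equivalence of the two forms can be checked with the standard library alone. In this sketch the value of DIGITS_PATTERN is an assumption inferred by comparing the two expressions above, not read from the library, and the sample file name is only illustrative.

```python
import re

# Assumption: inferred from the difference between the two expressions above.
DIGITS_PATTERN = r'(?P<index>(?P<padding>0*)\d+)'

written_out = r'\.(?P<index>(?P<padding>0*)\d+)\.\D+\d?$'
formatted = r'\.{0}\.\D+\d?$'.format(DIGITS_PATTERN)

# Both spellings produce exactly the same regular expression.
print(formatted == written_out)

# The expression anchors on the dots around the number, so it picks up
# the frame number and ignores the version number after "v".
match = re.search(formatted, 'file_v1.0001.jpg')
print(match.group('index'), repr(match.group('padding')))
```

Because the number must sit between two dots and be followed only by an extension, the "1" in "v1" is never treated as an index.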
Some common expressions are predefined in the PATTERNS dictionary (contributions welcome!):
>>> print(clique.assemble(items, patterns=[clique.PATTERNS['frames']])[0])
[<Collection "file_v1.%04d.jpg [1-3]">,
<Collection "file_v2.%04d.jpg [1-3]">]
Case Sensitivity
When assembling collections, it is sometimes useful to specify whether the case of the items matters. For example, “file.0001.jpg” and “FILE.0002.jpg” could be treated as members of the same collection or not.
By default, assembly is case sensitive, but this can be controlled by setting the case_sensitive argument:
>>> items = ['file.0001.jpg', 'FILE.0002.jpg', 'file.0003.jpg']
>>> print(clique.assemble(items, case_sensitive=False))
([<Collection "file.%04d.jpg [1-3]">], [])
>>> print(clique.assemble(items, case_sensitive=True))
([<Collection "file.%04d.jpg [1, 3]">], ['FILE.0002.jpg'])
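One way to picture the effect of case_sensitive is as a normalisation of the grouping key. The sketch below is an illustration under that assumption and does not claim to reproduce Clique's internals. The names group_frames and FRAME are hypothetical, and the expression is the frame pattern from the Patterns section.

```python
import re
from collections import defaultdict

# The frame expression from the Patterns section, written out in full.
FRAME = re.compile(r'\.(?P<index>(?P<padding>0*)\d+)\.\D+\d?$')


def group_frames(items, case_sensitive=True):
    """Group frame numbers by the text surrounding them."""
    groups = defaultdict(list)
    for item in items:
        match = FRAME.search(item)
        if match is None:
            continue  # Not a frame-style name; ignored in this sketch.
        head = item[:match.start('index')]
        tail = item[match.end('index'):]
        if not case_sensitive:
            # Assumption: ignoring case amounts to comparing lowercased text.
            head, tail = head.lower(), tail.lower()
        groups[(head, tail)].append(int(match.group('index')))
    return dict(groups)


items = ['file.0001.jpg', 'FILE.0002.jpg', 'file.0003.jpg']
print(group_frames(items, case_sensitive=True))
print(group_frames(items, case_sensitive=False))
```

With case taken into account, "FILE.0002.jpg" lands in a group of its own, which is too small to form a collection under the default minimum of two items. With case ignored, all three items share one key.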
A common use case is to ignore case on Windows or macOS, where file systems are typically case-insensitive:
>>> import sys
>>> clique.assemble(
... items, case_sensitive=sys.platform not in ('win32', 'darwin')
... )