Closed

Add filtering for copyfile

description

When copying a directory, it is sometimes necessary to filter the files and subfolders to be copied. For instance, if files are picked up from an SVN folder, the ".svn" folders do not need to be copied.
The most suitable solution would be a pattern/regex to include or exclude entries. Alternatively, filtering could be based on file/folder attributes such as hidden and read-only.
Closed Jun 2, 2011 at 1:10 PM by
Only the exclusion pattern was implemented, and the attribute was renamed from "exclusionpattern" to "exclude".
The implementation goes through the complete folder structure and skips a file/folder if it matches the regular expression.
Change Set #78582.
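
For illustration only, the skip-on-match traversal described above can be sketched in Python. This is not the code from the change set: the in-memory tree (folders as dicts, files as None), the function name copy_with_exclude, the sample folder names, and the choice to test the regular expression against the full path with a substring search are all assumptions.

```python
import re

def copy_with_exclude(tree, exclude, path, log):
    """Return a filtered copy of `tree` (folders are dicts, files are None).

    Hypothetical sketch: an entry whose full path matches `exclude` is
    skipped, and a skipped folder is not descended into.
    """
    pattern = re.compile(exclude)
    copied = {}
    for name, child in tree.items():
        full = f"{path}\\{name}"
        if pattern.search(full):
            # One log line per pruned entry; its children are never visited.
            log.append(f"Skip '{full}'")
            continue
        if isinstance(child, dict):
            copied[name] = copy_with_exclude(child, exclude, full, log)
        else:
            copied[name] = None
    return copied

# Made-up working copy containing ".svn" folders at two levels.
tree = {"project": {".svn": {"entries": None},
                    "src": {".svn": {"entries": None}, "main.c": None}}}
log = []
result = copy_with_exclude(tree, r"\\\.svn$", "c:", log)
```

With this sample tree, both ".svn" folders are dropped and each one produces a single "Skip" line, because the traversal never enters a folder that matched.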

comments

icnocop wrote Apr 7, 2011 at 12:15 AM

romanws uploaded patch #9007 to support "inclusionpattern" and "exclusionpattern" attributes to the "copyfile" element. Both accept a regular expression to include/exclude a file or folder, based on its path.

romanws wrote May 14, 2011 at 5:46 PM

The names "inclusionpattern" and "exclusionpattern" have been replaced by "include" and "exclude".

I realized that my patch had a bug in the "include" attribute.

Actually, the desired behavior for "include" is not as simple as I thought at first.

With "exclude" it is easy to cut the folder tree when a folder matches the exclusion pattern. Suppose we have this folder structure:
c:\ant\bin
c:\ant\doc
c:\ant\lib

If we do this:


The result will be:
Skip 'c:\ant\doc'

With "include", the difference is that when a folder doesn't match the pattern, it is still possible for one of its children to match the same pattern. For instance, think about this folder structure:
c:\ant\bin
c:\ant\doc\abc
c:\ant\doc\bin
c:\ant\doc\xyz
c:\ant\lib

If we do this:

The result I would expect is that folders 1, 3, and 5 are copied, and 2 and 4 are skipped.
This implies that the implementation must always explore the complete folder/file tree, test whether each leaf in the tree matches the pattern, and copy it if it does. If we implement this (currently, I have done it this way in my environment) we will have 2 consequences that I don't like:
a) The log will be concise when using "exclude" (Skip 'c:\ant\doc') but will be verbose when using "include" (Skip 'c:\ant\doc\abc\aaa.txt', and the same for all files under abc).
b) The folders will always be created, and if none of the files within them match the pattern, the folder will be left empty. This is because of how the algorithm is currently implemented: it creates the folder before evaluating its children.
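
To make consequences a) and b) concrete, here is a rough Python sketch of the "test every leaf" approach. It is not the patch itself: the in-memory tree (folders as dicts, files as None), the function name copy_with_include, the file names other than aaa.txt, and the pattern are hypothetical, chosen only so that the bin and lib folders are the ones kept.

```python
import re

def copy_with_include(tree, include, path, log):
    """Copy only files whose full path matches `include`.

    Hypothetical sketch: every folder is created before its children
    are evaluated, so a folder with no matching file ends up empty.
    """
    pattern = re.compile(include)
    copied = {}
    for name, child in tree.items():
        full = f"{path}\\{name}"
        if isinstance(child, dict):
            # Folder created unconditionally, then explored.
            copied[name] = copy_with_include(child, include, full, log)
        elif pattern.search(full):
            copied[name] = None
        else:
            # One log line per skipped file, hence the verbose log.
            log.append(f"Skip '{full}'")
    return copied

tree = {"ant": {"bin": {"ant.bat": None},
                "doc": {"abc": {"aaa.txt": None, "bbb.txt": None},
                        "bin": {"tool.exe": None},
                        "xyz": {"zzz.txt": None}},
                "lib": {"ant.jar": None}}}
log = []
# Assumed pattern: any path with a "bin" or "lib" folder component.
result = copy_with_include(tree, r"\\(bin|lib)(\\|$)", "c:", log)
```

In this sketch the log gets one line per file under abc and xyz, and both of those folders still appear in the result as empty dicts.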

I would like to have an opinion on how to handle this. I see 3 possible ways to go:
1- Not to add support for "include". Not for technical reasons, but because its behavior could be unintuitive and error-prone - e.g.: if I want to copy the ant\bin and ant\lib folders, I probably will not even consider that ant\doc can contain a child (file or folder) whose name matches "bin" or "lib".
2- Implement "include" as I described above - with verbose log and empty folders.
3- Implement it with a more sophisticated algorithm. When a folder that doesn't match the pattern is found, look ahead to see if it has a child matching the pattern. If yes, proceed to create the folder and process its children; if not, cut the branch and write a single log line. This approach will have some impact on performance, because subtrees can be scanned more than once. We should document it, saying that the "exclude" attribute is always recommended.
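
As a rough idea of what option 3 could look like, here is a Python sketch of the look-ahead. Nothing here comes from the patch: the in-memory tree (folders as dicts, files as None), the names has_match and copy_with_lookahead, the file names, and the pattern are all assumptions made for the example.

```python
import re

def has_match(tree, pattern, path):
    """Look ahead: does any descendant of this folder match the pattern?"""
    for name, child in tree.items():
        full = f"{path}\\{name}"
        if pattern.search(full):
            return True
        if isinstance(child, dict) and has_match(child, pattern, full):
            return True
    return False

def copy_with_lookahead(tree, include, path, log):
    """Hypothetical sketch of option 3: prune branches with no match."""
    pattern = re.compile(include)
    copied = {}
    for name, child in tree.items():
        full = f"{path}\\{name}"
        if isinstance(child, dict):
            if pattern.search(full) or has_match(child, pattern, full):
                # Something below matches: create the folder and descend.
                copied[name] = copy_with_lookahead(child, include, full, log)
            else:
                # Nothing below matches: cut the branch, single log line.
                log.append(f"Skip '{full}'")
        elif pattern.search(full):
            copied[name] = None
        else:
            log.append(f"Skip '{full}'")
    return copied

tree = {"ant": {"bin": {"ant.bat": None},
                "doc": {"abc": {"aaa.txt": None, "bbb.txt": None},
                        "bin": {"tool.exe": None},
                        "xyz": {"zzz.txt": None}},
                "lib": {"ant.jar": None}}}
log = []
# Assumed pattern: any path with a "bin" or "lib" folder component.
result = copy_with_lookahead(tree, r"\\(bin|lib)(\\|$)", "c:", log)
```

Here abc and xyz are neither created nor listed file by file; each gets one "Skip" line. The cost is that has_match may walk a subtree that copy_with_lookahead then walks again, which is the performance impact mentioned in option 3.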

dblock wrote May 22, 2011 at 6:21 PM

Roman, sorry for the late reply, I missed those patches. Do you want me to apply them or do you want me to wait till you fix this issue?