===== Python 3.4 Code Example - Scripts to Import Image Files and Remove Duplicates =====
**Python 3.4**
==== Importing Images ====
With the [[https://pypi.python.org/pypi/ExifRead|exifread]] library, the [[https://de.wikipedia.org/wiki/Exchangeable_Image_File_Format|Exif data]] of an image file can be read very easily.
The goal is to list all images in a source directory and import them on the target system into one folder per day, according to each image's date.
If no Exif data is available, the timestamp of the file's last modification is used instead.
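The date fallback described above can be sketched on its own, without the exifread dependency. The helper name ''day_folder_name'' and its ''exif_date'' parameter are hypothetical and not part of the script below; the sketch assumes the Exif value has already been read as a string in the usual ''YYYY:MM:DD HH:MM:SS'' form.

```python
import datetime
import os

# Format used by Exif date strings such as "2010:08:22 14:13:42"
EXIF_DATE_FORMAT = "%Y:%m:%d %H:%M:%S"


def day_folder_name(file_path, exif_date=None):
    """Return the folder name YYYYMMDD for a file (hypothetical helper).

    The Exif date string is used when it is given and parsable;
    otherwise the file's last-modification time is used.
    """
    if exif_date:
        try:
            taken = datetime.datetime.strptime(exif_date, EXIF_DATE_FORMAT)
            return "{0:%Y%m%d}".format(taken)
        except ValueError:
            pass  # unparsable Exif value: fall back to the mtime below
    modified = datetime.datetime.fromtimestamp(os.path.getmtime(file_path))
    return "{0:%Y%m%d}".format(modified)
```

Calling ''day_folder_name(path, "2015:04:15 10:20:30")'' yields ''20150415'', which matches the folder naming seen in the sample output.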
The script is called with the source directory, the destination directory, and the subfolder depth within the source directory.
Example:
D:\Python34\python.exe .\importImg.py -h
usage: importImg.py -s -d -r
# for example
PS D:\entwicklung\work\python\ImageImp> D:\Python34\python.exe .\importImg.py -s "F:\DCIM" -d D:\data\bilder -r 1
--========================================
-- Info :: Read all files from F:\DCIM\*\*.m*
-- Info :: Copy files to D:\data\bilder
...
-- Info :: Copy Image P1070277.JPG to directory D:\data\bilder\20150415
...
-- Info :: Directory still exits :: D:\data\bilder\20150524
-- Info :: File P1080259.JPG exits with the same content
--========================================
-- Finish with :: 1549 files in 0 new directories
-- Found duplicate files :: 1541
-- The run needs :: 618.5560 seconds
-- Copy size :: 142.031 MB
--========================================
=== How It Works ===
The important steps in the script:
* Parse the parameters ( opts, args = getopt.getopt(argv, "hs:d:r:", ["src=", "dest=", "rec="]))
* Read all files in the source into a list ( fileList = glob.glob(path_name) )
* Filter unwanted elements out of the list ( if file.endswith(thumbsDBFile): fileList.remove(file))
* Iterate over the list of files (for file in fileList:)
* Extract only the name of the file ( imgFilename = ntpath.basename(file))
* Open the file read-only ( imgFile = open(file, 'rb'))
* Read the Exif tag "EXIF DateTimeDigitized" (tags = exifread.process_file(imgFile, stop_tag='EXIF DateTimeDigitized'))
* Convert the Exif string into a date ( createDate = datetime.datetime.strptime(str(tags['EXIF DateTimeDigitized']), '%Y:%m:%d %H:%M:%S'))
* Check whether the directory was already created in the current run; otherwise create it ( os.makedirs(dirPath))
* Check whether the file already exists in the destination ( compare = filecmp.cmp(imgFile.name, newFileName) )
* If it does not exist, copy the file ( shutil.copy2(imgFile.name, dirPath) )
* If it exists with identical content, skip it; if it exists with different content, build a new name (recursive function createNewName) and copy the file under that name, unless it is already present in the directory under such a numbered name
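The renaming step can also be written as a loop instead of a recursion. The following is only a minimal sketch under that assumption: ''numbered_name'' and ''first_free_name'' are hypothetical helpers, they check for name collisions only, and they leave out the content comparison that createNewName performs.

```python
import os


def numbered_name(filename, number):
    """Insert '-<number>' in front of the extension (hypothetical helper)."""
    # splitext splits at the last dot only, so "a.b.jpg" keeps its stem "a.b"
    stem, extension = os.path.splitext(filename)
    return "{0}-{1}{2}".format(stem, number, extension)


def first_free_name(directory, filename):
    """Return filename, or its first numbered variant not yet present in directory."""
    number = 0
    candidate = filename
    while os.path.exists(os.path.join(directory, candidate)):
        number += 1
        candidate = numbered_name(filename, number)
    return candidate
```

Using ''os.path.splitext'' rather than splitting at the first dot keeps file names with several dots intact.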
=== The Code ===
__author__ = 'gpipperr'
import datetime, time
import glob, filecmp, ntpath, shutil
import os, errno, sys, getopt
# Library to read the exif information
# load with .\python -m pip install exifread --upgrade
# from https://pypi.python.org/pypi/ExifRead
import exifread
# Get the change date of a file
def modification_date(filename):
    t = os.path.getmtime(filename)
    print("-- Info :: EXIF data not available - use mTime {0} of the file".format(datetime.datetime.fromtimestamp(t)))
    return datetime.datetime.fromtimestamp(t)
# check for a unique filename in the import directory
# if the new unique name already exists, check if this file is the same as the original file
# if yes do not copy, otherwise search the next unique file name until a free name is found
def createNewName(filename, importDir, fileNo, origFile):
    # splitext also handles file names with more than one dot or without extension
    firstPart, extension = os.path.splitext(filename)
    newFName = firstPart + "-" + str(fileNo) + extension
    newPath = importDir + os.path.sep + newFName
    # If exists
    if os.path.isfile(newPath):
        # Check if this new name is already the original file
        # if shallow is true, files with identical os.stat() signatures are taken to be equal. Otherwise, the contents of the files are compared.
        fcompare = filecmp.cmp(origFile, newPath, shallow=False)
        if fcompare:
            print("-- Info :: File {0} still exits with same content as original file {1}".format(newFName, origFile))
            return newFName
        else:
            fileNo += 1
            # call again to find the next possible name
            return createNewName(newFName, importDir, fileNo, origFile)
    else:
        # Copy the new file and return the name of the new file
        shutil.copy2(origFile, newPath)
        setStatisticTotalSize(os.path.getsize(newPath))
        print("-- Info :: File {0} exits but with other content, create new File {1}".format(origFile, newPath))
        return newFName
# Remember the global size of all copied files
def setStatisticTotalSize(size):
    global totalFileSize
    totalFileSize += size
# global for the total filesize
totalFileSize = 0
# Main Script part
def main(argv):
    # Parameter 1 - Import Directory
    # Parameter 2 - Image Main Folder
    # Parameter 3 - Subfolder Level
    path_name = '-'
    dest_name = '-'
    recursiveLevel = 0
    try:
        opts, args = getopt.getopt(argv, "hs:d:r:", ["src=", "dest=", "rec="])
    except getopt.GetoptError:
        print("usage: importImg.py -s -d -r ")
        sys.exit(2)
    for opt, arg in opts:
        if opt == '-h':
            print("usage: importImg.py -s -d -r ")
            sys.exit()
        elif opt in ("-s", "--src"):
            path_name = arg
        elif opt in ("-d", "--dest"):
            dest_name = arg
        elif opt in ("-r", "--rec"):
            recursiveLevel = int(arg)
    # check if Directory exists and if the * is necessary
    # Source
    if os.path.isdir(path_name):
        if path_name.endswith(os.path.sep):
            path_name += ("*" + os.path.sep) * recursiveLevel
            path_name += "*.*"
        else:
            path_name += os.path.sep
            path_name += ("*" + os.path.sep) * recursiveLevel
            path_name += "*.*"
    else:
        print("-- Error :: 05 Source Directory (-s) {0} not found".format(path_name))
        print("usage: importImg.py -s -d ")
        sys.exit(2)
    # Destination
    # check and strip last / if necessary
    if not os.path.isdir(dest_name):
        print("-- Error :: 04 Destination Directory (-d) {0} not found".format(dest_name))
        print("usage: importImg.py -s -d ")
        sys.exit(2)
    else:
        if dest_name.endswith(os.path.sep):
            dest_name = dest_name[:-1]
    # Remember the start time of the program
    # Note: time.clock() works in Python 3.4 but was removed in Python 3.8
    start_time = time.clock()
    print("--" + 40 * "=")
    print("-- Info :: Read all files from {0}".format(path_name))
    print("-- Info :: Copy files to {0}".format(dest_name))
    print("--" + 40 * "=")
    fileCount = 0
    fileExistsCount = 0
    dirCount = 0
    dirPathList = []
    # Get the list of all Files
    fileList = glob.glob(path_name)
    # remove Thumbs.db from the list if it exists
    # Internal Windows file, no need to copy it
    thumbsDBFile = "Thumbs.db"
    # iterate over a copy, removing from a list while iterating over it skips elements
    for file in list(fileList):
        if file.endswith(thumbsDBFile):
            fileList.remove(file)
    # Loop over the files read from the import directory
    for file in fileList:
        fileCount += 1
        createDate = datetime.datetime.now()
        imgFilename = '-'
        newFileName = '-'
        try:
            # get only the filename without the path
            imgFilename = ntpath.basename(file)
            # Open image file for reading (binary mode)
            imgFile = open(file, 'rb')
            # Read the image tags, if not possible read last change date
            try:
                tags = exifread.process_file(imgFile, stop_tag='EXIF DateTimeDigitized')
                # Transform to a real date
                # https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior
                # 2010:08:22 14:13:42 %Y:%m:%d %H:%M:%S
                createDate = datetime.datetime.strptime(str(tags['EXIF DateTimeDigitized']), '%Y:%m:%d %H:%M:%S')
            except Exception:
                # if no exif tag use the last modification date
                # print("file with not exif information ::{0}".format(imgfile.name))
                createDate = modification_date(file)
            # Create the import directory if it does not exist
            # Remember the directory after the first create
            # to avoid exceptions with already existing directories
            dirPath = dest_name + os.path.sep + "{0:%Y%m%d}".format(createDate)
            try:
                if dirPath not in dirPathList:
                    dirPathList.append(dirPath)
                    os.makedirs(dirPath)
                    dirCount += 1
                    print("-- Info :: Create Directory :: {0}".format(dirPath))
            except OSError as exception:
                if exception.errno != errno.EEXIST:
                    print(
                        "-- Error :: 03 Directory {0} creation error :: see error {1}".format(dirPath,
                                                                                             sys.exc_info()[0]))
                else:
                    print("-- Info :: Directory still exits :: {0}".format(dirPath))
                    pass
            # Copy the file to the new directory
            newFileName = dirPath + os.path.sep + imgFilename
            try:
                # Check if the same filename already exists
                if os.path.isfile(newFileName):
                    # if shallow is true, files with identical os.stat() signatures are taken to be equal. Otherwise, the contents of the files are compared.
                    compare = filecmp.cmp(imgFile.name, newFileName, shallow=False)
                    if compare:
                        print("-- Info :: File {0} exits with the same content".format(imgFilename))
                        fileExistsCount += 1
                    else:
                        newUniqueFileName = createNewName(imgFilename, dirPath, 0, imgFile.name)
                else:
                    # copy2 preserves the original modification and access info (mtime and atime) in the file metadata.
                    shutil.copy2(imgFile.name, dirPath)
                    setStatisticTotalSize(os.path.getsize(newFileName))
                    print("-- Info :: Copy Image {0:50} to directory {1}".format(imgFilename, dirPath))
            except OSError as exception:
                print("-- Error :: 02 File {0} in directory {1} :: error {2}".format(imgFile.name, dirPath,
                                                                                     sys.exc_info()[0]))
            if not imgFile.closed:
                imgFile.close()
        except:
            # use "file" here, imgFile is not defined if open() itself failed
            print("-- Error :: 01 Error with {0} in {1} :: error {2}".format(file, path_name, sys.exc_info()))
            pass
    # print statistics
    print("--" + 40 * "=")
    print("-- Finish with :: {0} files in {1} new directories".format(fileCount, dirCount))
    print("-- Found duplicate files :: {0}".format(fileExistsCount))
    print("-- The run needs :: {0:5.4f} seconds".format(time.clock() - start_time))
    print("-- Copy size :: {0:5.3f} MB".format(totalFileSize / 1024 / 1024))
    print("--" + 40 * "=")
if __name__ == "__main__":
    main(sys.argv[1:])
==== Finding Duplicates ====
In the first step, duplicates are searched for within a single directory by comparing the files with each other.
D:\Python34\python.exe .\removeDuplicateFiles.py -h
usage: removeDuplicateFiles.py -s -t
D:\Python34\python.exe .\removeDuplicateFiles.py -s D:\data\bilder\20110915 -t D:\temp\saveimg
D:\Python34\python.exe .\removeDuplicateFiles.py -s D:\temp\20140818\ -t D:\temp\saveimg
--========================================
-- Info :: Read all files in :: D:\temp\20140818\*.*
-- Info :: Copy duplicates to :: D:\temp\saveimg
--========================================
-- Info :: Check File "D:\temp\20140818\DSCN0025.TIF"
-- Info :: File "D:\temp\20140818\DSCN9301 - Copy.TIF " exits with the same content as file D:\temp\20140818\DSCN9301.TIF
...
--========================================
...
-- Info :: Move Duplicate File "D:\temp\20140818\DSCN9301 - Copy.TIF " to "D:\temp\saveimg\DSCN9301 - Copy.TIF"
--========================================
--========================================
-- Finish with :: 17 files in directorie D:\temp\20140818\*.*
-- Found duplicate files :: 2
-- The run needs :: 0.2044 seconds
--========================================
The next solution reads a complete file tree, computes the hashes of all files, and then sorts out the duplicates; see also [[python:python_hash_image_files|Hashing files in Python]].
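The hash-based idea can be sketched roughly as follows. This is not the code from the linked page; the function names and the choice of SHA-256 are assumptions made for illustration.

```python
import collections
import hashlib
import os


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root_dir):
    """Walk root_dir and return lists of paths that share the same digest."""
    by_digest = collections.defaultdict(list)
    for current_dir, _subdirs, filenames in os.walk(root_dir):
        for name in filenames:
            path = os.path.join(current_dir, name)
            by_digest[file_digest(path)].append(path)
    # only digests that occur more than once describe duplicates
    return [sorted(paths) for paths in by_digest.values() if len(paths) > 1]
```

Each file is read exactly once, so the effort grows with the number of files rather than with the number of file pairs.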
How it works:
* Parse the parameters ( opts, args = getopt.getopt(argv, "hs:t:", ["src=", "tmp="]))
* Read all files in the source into a list ( masterFileList = glob.glob(path_name))
* Iterate over the list of files (for masterfile in masterFileList:)
* For each file, iterate over the files in the directory (for cfile in slaveFileList:)
* Compare the files (compare = filecmp.cmp(masterfile, cfile, shallow=False))
* If a duplicate is found, move it to the temp directory; moveDuplicateFile checks whether the file already exists at the destination
=== Code ===
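The pairwise comparison from the steps above can be condensed with ''itertools.combinations''. The sketch below is a simplified variant, not the script itself: the function name ''duplicate_pairs'' is hypothetical, and the size check before the content comparison is an added shortcut.

```python
import filecmp
import itertools
import os


def duplicate_pairs(paths):
    """Return all pairs of paths whose file contents are identical."""
    pairs = []
    for first, second in itertools.combinations(paths, 2):
        # files of different size cannot be identical, skip the full comparison
        if os.path.getsize(first) != os.path.getsize(second):
            continue
        # shallow=False compares the contents, not only the os.stat() signature
        if filecmp.cmp(first, second, shallow=False):
            pairs.append((first, second))
    return pairs
```

Every pair is visited only once, so no file is compared with itself and no list has to be modified during the iteration.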
__author__ = 'gpipperr'
import datetime, time
import glob, filecmp, ntpath, shutil
import os, errno, sys, getopt
# check for a unique filename in the temp directory and move the file to the temp directory
# if the new unique name already exists, check if this file is the same as the original file
# if yes delete the original, otherwise search the next unique file name until a free name is found
def moveDuplicateFile(filename, tempDir, fileNo, origFile):
    # splitext also handles file names with more than one dot or without extension
    firstPart, extension = os.path.splitext(filename)
    newFName = firstPart + ("-" + str(fileNo) if fileNo > 0 else "") + extension
    newPath = tempDir + os.path.sep + newFName
    # If exists
    if os.path.isfile(newPath):
        # Check if this new name is already the original file
        fcompare = filecmp.cmp(origFile, newPath, shallow=False)
        if fcompare:
            # an identical copy is already saved in the temp directory - delete the original one
            os.remove(origFile)
            print("-- Info :: Delete File \"{0:40}\" - still exits with the same content in \"{1}\"".format(origFile,
                                                                                                   newFName))
            return newFName
        else:
            fileNo += 1
            # call again to find the next possible name
            return moveDuplicateFile(newFName, tempDir, fileNo, origFile)
    else:
        # Move the file and return the name of the new file
        shutil.move(origFile, newPath)
        print("-- Info :: Move Duplicate File \"{0:40}\" to \"{1}\"".format(origFile, newPath))
        return newFName
# Main Script part
def main(argv):
    # Parameter 1 - Image Main Folder
    path_name = '-'
    temp_path = 'd:\\temp'
    recursiveLevel = 0
    usageString = "usage: removeDuplicateFiles.py -s -t "
    try:
        opts, args = getopt.getopt(argv, "hs:t:", ["src=", "tmp="])
    except getopt.GetoptError:
        print(usageString)
        sys.exit(2)
    for opt, arg in opts:
        if opt == '-h':
            print(usageString)
            sys.exit()
        elif opt in ("-s", "--src"):
            path_name = arg
        elif opt in ("-t", "--tmp"):
            temp_path = arg
    # check if Directory exists and if the * is necessary
    # Source
    if os.path.isdir(path_name):
        if path_name.endswith(os.path.sep):
            path_name += ("*" + os.path.sep) * recursiveLevel
            path_name += "*.*"
        else:
            path_name += os.path.sep
            path_name += ("*" + os.path.sep) * recursiveLevel
            path_name += "*.*"
    else:
        print("-- Error :: 03 Source Directory (-s) {0} not found".format(path_name))
        print(usageString)
        sys.exit(2)
    # Temp Destination
    # check and strip last / if necessary
    if not os.path.isdir(temp_path):
        print("-- Error :: 02 temp Directory (-t) {0} not found".format(temp_path))
        print(usageString)
        sys.exit(2)
    else:
        if temp_path.endswith(os.path.sep):
            # strip the trailing separator from temp_path itself
            temp_path = temp_path[:-1]
    # Remember the start time of the program
    # Note: time.clock() works in Python 3.4 but was removed in Python 3.8
    start_time = time.clock()
    print("--" + 40 * "=")
    print("-- Info :: Read all files in :: {0}".format(path_name))
    print("-- Info :: Copy duplicates to :: {0}".format(temp_path))
    print("--" + 40 * "=")
    fileCount = 0
    fileExistsCount = 0
    # Get the list of all Files
    masterFileList = glob.glob(path_name)
    slaveFileList = glob.glob(path_name)
    candiateFile = []
    # Loop over the files read from the import directory
    for masterfile in masterFileList:
        fileCount += 1
        createDate = datetime.datetime.now()
        # Loop again over all files
        # compare the files, if a match is found remove it from the list
        print("-- Info :: Check File \"{0}\"".format(masterfile))
        for cfile in slaveFileList:
            # only if not the same file
            if masterfile != cfile:
                # if shallow is true, files with identical os.stat() signatures are taken to be equal. Otherwise, the contents of the files are compared.
                compare = filecmp.cmp(masterfile, cfile, shallow=False)
                if compare:
                    print("-- Info :: File \"{0:40}\" exits with the same content as file {1}".format(masterfile, cfile))
                    # remove only if it is still in the list
                    # if more than one file is identical
                    # you need more than one run
                    if masterfile in slaveFileList:
                        slaveFileList.remove(masterfile)
                    # Add the file with the longest name to the duplicate file list
                    longestFileName = masterfile if len(masterfile) > len(cfile) else cfile
                    # Avoid duplicate entries
                    if longestFileName not in candiateFile:
                        candiateFile.append(longestFileName)
                        fileExistsCount += 1
    # Do something with the duplicates
    print("--" + 40 * "=")
    for file in candiateFile:
        # move the files to temp
        try:
            imgFilename = ntpath.basename(file)
            moveDuplicateFile(filename=imgFilename, tempDir=temp_path, fileNo=0, origFile=file)
        except:
            print("-- Error :: 01 - Move File {0} :: error {1}:".format(file, sys.exc_info()))
            pass
    if fileExistsCount < 1:
        print("-- Found no duplicate files in directory {0}".format(path_name))
    print("--" + 40 * "=")
    # print statistics
    print("--" + 40 * "=")
    print("-- Finish with :: {0} files in directorie {1}".format(fileCount, path_name))
    print("-- Found duplicate files :: {0}".format(fileExistsCount))
    print("-- The run needs :: {0:5.4f} seconds".format(time.clock() - start_time))
    print("--" + 40 * "=")
if __name__ == "__main__":
    main(sys.argv[1:])
==== Converting to an Exe File ====
For Python 2.x see http://www.py2exe.org/; for later Python versions see http://cx-freeze.sourceforge.net/
Installing cx_Freeze:
.\python -m pip install cx_Freeze --upgrade
Creating an exe file:
D:\Python34\python.exe D:\Python34\Scripts\cxfreeze .\importImg.py --target-dir dist
The "dist" subdirectory now contains everything that is needed to start the script as an EXE even without an installed Python environment.
==== Accessing an Android Phone ====
Next, the image data should also be imported from an Android phone.
The problem, however, is that Windows does not assign a drive letter to the phone's storage.
=== Example Solution for PowerShell ===
see [[https://gist.github.com/cveld/8fa339306f8504095815|Crawl your Android device attached via usb with PowerShell]]
=== Python ===
But how can this be solved in Python as well? In the end there must be some kind of device pointer under Windows that can be addressed directly, since the device is also visible in Explorer.
Ideas:
* http://stackoverflow.com/questions/827371/is-there-a-way-to-list-all-the-available-drive-letters-in-python
* http://timgolden.me.uk/python/wmi/cookbook.html
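As a starting point for the first idea, the available drive roots can be listed with the standard library alone. This is only a sketch: the function name ''list_drive_roots'' is hypothetical, and since Windows assigns no drive letter to the phone's storage, the phone is not expected to show up in this list. The sketch merely shows which devices can be reached by letter at all.

```python
import os
import string


def list_drive_roots():
    """Return the existing drive roots, one entry per drive letter in use."""
    # build the candidates A:\ to Z:\ and keep the ones that exist
    candidates = [letter + ":\\" for letter in string.ascii_uppercase]
    return [root for root in candidates if os.path.exists(root)]


if __name__ == "__main__":
    print(list_drive_roots())
```

On a system without drive letters the function simply returns an empty list.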