How do I list all files of a directory?


How can I list all files of a directory in Python and add them to a list?


30 solutions

#1


2627  

os.listdir() will get you everything that's in a directory - files and directories.


If you want just files, you could either filter this down using os.path:


from os import listdir
from os.path import isfile, join
onlyfiles = [f for f in listdir(mypath) if isfile(join(mypath, f))]

Or you could use os.walk(), which yields two lists for each directory it visits, splitting the entries into files and directories for you. If you only want the top directory, you can break the first time it yields:

from os import walk

f = []
for (dirpath, dirnames, filenames) in walk(mypath):
    f.extend(filenames)
    break

Lastly, as that example shows, to add one list to another you can use either .extend() or the + operator:

>>> q = [1, 2, 3]
>>> w = [4, 5, 6]
>>> q = q + w
>>> q
[1, 2, 3, 4, 5, 6]

Personally, I prefer .extend()

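As a small illustration of the difference between the two options (a sketch; the variable names are made up for this example and are not from the answer): .extend() changes the existing list in place, while + builds a new list and leaves the original one untouched.

```python
# Illustrative names only; they do not appear in the original answer.
files = ["a.txt", "b.txt"]
alias = files                # a second reference to the same list object

files.extend(["c.txt"])      # in place: alias sees the new item as well

files = files + ["d.txt"]    # builds a new list; alias still refers to the old one

print(alias)
print(files)
```

This matters mostly when another part of the program holds a reference to the list you are growing.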

#2


1061  

I prefer using the glob module, as it does pattern matching and expansion.


import glob
print(glob.glob("/home/adam/*.txt"))

It returns a list of the matching files:

['/home/adam/file1.txt', '/home/adam/file2.txt', .... ]
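One thing worth keeping in mind is that glob matches directories as well as files, and by default a pattern such as "*" skips names that start with a dot. A minimal sketch of a files-only wrapper, assuming a helper name (glob_files) that is made up for this example:

```python
import glob
import os


def glob_files(directory, pattern="*"):
    """Return the sorted paths in `directory` that match `pattern`, files only."""
    matches = glob.glob(os.path.join(directory, pattern))
    # glob also returns directories whose names match, so keep regular files only.
    return sorted(p for p in matches if os.path.isfile(p))
```

Calling glob_files("/home/adam", "*.txt") would then skip a folder that happened to be named something.txt.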

#3


473  

import os
os.listdir("somedirectory")

will return a list of all files and directories in "somedirectory".


#4


252  

Get a list with the files


os.listdir(): get files in current dir (Python 3)

The simplest way to get the files in the current directory in Python 3 is to use the listdir() function of the os module. It returns the files in that directory, along with any folders it contains, but not the files inside subdirectories; for those you can use walk, which I will talk about later.

>>> import os
>>> arr = os.listdir()
>>> arr
['$RECYCLE.BIN', 'work.txt', '3ebooks.txt', 'documents']

Getting the full path name

As you noticed, you don't have the full path of the file in the code above. If you need the absolute path, you can use the abspath function of the os.path module, passing it each name that you get from os.listdir(). There are other ways to get the full path, as we will see later (as suggested by mexmex, I replaced _getfullpathname with abspath).

>>> import os
>>> files_path = [os.path.abspath(x) for x in os.listdir()]
>>> files_path
['F:\\documenti\\applications.txt', 'F:\\documenti\\collections.txt']

Get the full path of files of a given type in all subdirectories with walk

I find this very useful for finding things across many directories; it helped me find a file whose name I couldn't remember:

import os

thisdir = os.getcwd()
for r, d, f in os.walk(thisdir):
    for file in f:
        if ".docx" in file:
            print(os.path.join(r, file))
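Note that the `in` test above also matches names such as "notes.docx.bak", and it is case-sensitive. A sketch of a stricter variant, assuming a helper name (find_by_extension) invented for this example, compares the real extension instead:

```python
import os


def find_by_extension(top, extension=".docx"):
    """Return full paths under `top` whose extension equals `extension` (case-insensitive)."""
    wanted = extension.lower()
    found = []
    for dirpath, dirnames, filenames in os.walk(top):
        for name in filenames:
            # splitext splits "report.DOCX" into ("report", ".DOCX").
            if os.path.splitext(name)[1].lower() == wanted:
                found.append(os.path.join(dirpath, name))
    return found
```

Returning a list rather than printing makes the result easy to reuse, for example find_by_extension(os.getcwd()).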

os.listdir(): get files in current dir (Python 2)


>>> import os
>>> arr = os.listdir('.')
>>> arr
['$RECYCLE.BIN', 'work.txt', '3ebooks.txt', 'documents']

To list the parent directory or the root directory

>>> # method 1: the parent of the current directory
>>> x = os.listdir('..')

>>> # method 2: the root directory
>>> x = os.listdir('/')

get files: os.listdir() in a particular directory (Python 2 and 3)


>>> import os
>>> arr = os.listdir('F:\\python')
>>> arr
['$RECYCLE.BIN', 'work.txt', '3ebooks.txt', 'documents']

Get files of a particular subdirectory with os.listdir()


import os

x = os.listdir("./content")

os.walk('.') - current directory


>>> import os
>>> arr = next(os.walk('.'))[2]
>>> arr
['5bs_Turismo1.pdf', '5bs_Turismo1.pptx', 'esperienza.txt']

glob module - all files


import glob
print(glob.glob("*"))

out:['content', 'start.py']

next(os.walk('.')) and os.path.join('dir','file')

>>> import os
>>> arr = []
>>> r, d, f = next(os.walk("F:\\_python"))  # first (top-level) result only
>>> for file in f:
...     arr.append(os.path.join(r, file))
...
>>> for f in arr:
...     print(f)
...
F:\_python\dict_class.py
F:\_python\programmi.txt

next(os.walk('F:\\')) - get the full path - list comprehension

>>> r, d, f = next(os.walk("F:\\_python"))
>>> [os.path.join(r, file) for file in f]
['F:\\_python\\dict_class.py', 'F:\\_python\\programmi.txt']

os.walk - get full path - all files in sub dirs

>>> x = [os.path.join(r, file) for r, d, f in os.walk("F:\\_python") for file in f]
>>> x
['F:\\_python\\dict.py', 'F:\\_python\\progr.txt', 'F:\\_python\\readl.py']

os.listdir() - get only txt files


>>> arr_txt = [x for x in os.listdir() if x.endswith(".txt")]
>>> print(arr_txt)
['work.txt', '3ebooks.txt']

glob - get only txt files


>>> import glob
>>> x = glob.glob("*.txt")
>>> x
['ale.txt', 'alunni2015.txt', 'assenze.text.txt', 'text2.txt', 'untitled.txt']

Using glob to get the full path of the files

If I need the absolute path of the files:

>>> import os
>>> from glob import glob
>>> x = [os.path.abspath(f) for f in glob("F:\\*.txt")]
>>> for f in x:
...     print(f)
...
F:\acquistionline.txt
F:\acquisti_2018.txt
F:\bootstrap_jquery_ecc.txt

Other uses of glob

If I want all the files in the directory:

>>> x = glob.glob("*")

Using os.path.isfile to avoid directories in the list

import os.path
listOfFiles = [f for f in os.listdir() if os.path.isfile(f)]
print(listOfFiles)

> output

['a simple game.py', 'data.txt', 'decorator.py']

Using pathlib (Python 3.4+)

>>> import pathlib

>>> flist = []
>>> for p in pathlib.Path('.').iterdir():
...  if p.is_file():
...   print(p)
...   flist.append(p)
...
error.PNG
exemaker.bat
guiprova.mp3
setup.py
speak_gui2.py
thumb.PNG

If you want to use list comprehension


>>> flist = [p for p in pathlib.Path('.').iterdir() if p.is_file()]

Get only the files (from every subdirectory) with os.walk

import os
x = [i[2] for i in os.walk('.')]
y=[]
for t in x:
    for f in t:
        y.append(f)

>>> y
['append_to_list.py', 'data.txt', 'data1.txt', 'data2.txt', 'data_180617', 'os_walk.py', 'READ2.py', 'read_data.py', 'somma_defaltdic.py', 'substitute_words.py', 'sum_data.py', 'data.txt', 'data1.txt', 'data_180617']

Get only files with next and walk in a directory


>>> import os
>>> x = next(os.walk('F://python'))[2]
>>> x
['calculator.bat','calculator.py']

Get only directories with next and walk in a directory


>>> import os
>>> next(os.walk('F://python'))[1] # for the current dir use ('.')
['python3','others']

Get all the subdir names with walk

>>> for r,d,f in os.walk("F:\\_python"):
...  for dirs in d:
...   print(dirs)
...
.vscode
pyexcel
pyschool.py
subtitles
_metaprogramming
.ipynb_checkpoints

os.scandir() (Python 3.5 and later)

>>> import os
>>> x = [f.name for f in os.scandir() if f.is_file()]
>>> x
['calculator.bat','calculator.py']

# Another example with scandir (a little variation from docs.python.org)
# This one is more efficient than os.listdir. 
# In this case, it shows the files only in the current directory 
# where the script is executed.

>>> import os
>>> with os.scandir() as i:
...  for entry in i:
...   if entry.is_file():
...    print(entry.name)
...
ebookmaker.py
error.PNG
exemaker.bat
guiprova.mp3
setup.py
speakgui4.py
speak_gui2.py
speak_gui3.py
thumb.PNG
>>>

Ex. 1: How many files are there in the subdirectories?

In this example, we count the files in a directory and all of its subdirectories.

import os

def count(dir, counter=0):
    "Returns the number of files in dir and its subdirectories."
    for pack in os.walk(dir):
        counter += len(pack[2])  # pack[2] is the list of file names
    return dir + " : " + str(counter) + " files"


print(count("F:\\python"))

> output

F:\python : 12057 files
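If a number is more convenient than a formatted string, the same walk can be condensed. A minimal sketch, assuming a function name (count_files) chosen for this example:

```python
import os


def count_files(top):
    """Return the number of files under `top`, subdirectories included."""
    # os.walk yields (dirpath, dirnames, filenames) once for every directory visited.
    return sum(len(filenames) for _, _, filenames in os.walk(top))
```

The caller can then format the result as needed, for example "{} files".format(count_files("F:\\python")).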

Ex. 2: How to copy all files from one dir to another?

A script to tidy up your computer by finding all files of a given type (default: pptx) and copying them into a new folder.

import os
import shutil

destination = "F:\\file_copied"
# os.makedirs(destination)


def copyfile(dir, filetype='pptx', counter=0):
    "Searches for pptx (or other - pptx is the default) files and copies them"
    for pack in os.walk(dir):
        for f in pack[2]:
            if f.endswith(filetype):
                fullpath = os.path.join(pack[0], f)
                print(fullpath)
                shutil.copy(fullpath, destination)
                counter += 1
    if counter > 0:
        print("------------------------")
        print("\t==> Found in: `" + dir + "` : " + str(counter) + " files\n")


# search the folders whose names start with `_`
for dir in os.listdir():
    if dir[0] == '_':
        # copyfile(dir, filetype='pdf')
        copyfile(dir, filetype='txt')


> Output

_compiti18\Compito Contabilità 1\conti.txt
_compiti18\Compito Contabilità 1\modula4.txt
_compiti18\Compito Contabilità 1\moduloa4.txt
------------------------
==> Found in: `_compiti18` : 3 files

Ex. 3: How to write all the file names to a txt file

In case you want to create a txt file with all the file names:

import os
mylist = ""
with open("filelist.txt", "w", encoding="utf-8") as file:
    for eachfile in os.listdir():
        mylist += eachfile + "\n"
    file.write(mylist)

#5


139  

A one-line solution to get a list of only the files (no subdirectories):

filenames = next(os.walk(path))[2]

or full pathnames (absolute only if path itself is absolute):

paths = [os.path.join(path,fn) for fn in next(os.walk(path))[2]]

#6


107  

Getting Full File Paths From a Directory and All Its Subdirectories


import os

def get_filepaths(directory):
    """
    Return the full paths of all files in a directory tree.
    os.walk visits each directory in the tree rooted at `directory`
    (including `directory` itself) and yields a 3-tuple
    (dirpath, dirnames, filenames) for it.
    """
    file_paths = []  # List which will store all of the full filepaths.

    # Walk the tree.
    for root, directories, files in os.walk(directory):
        for filename in files:
            # Join the two strings in order to form the full filepath.
            filepath = os.path.join(root, filename)
            file_paths.append(filepath)  # Add it to the list.

    return file_paths  # Self-explanatory.

# Run the above function and store its results in a variable.   
full_file_paths = get_filepaths("/Users/johnny/Desktop/TEST")

  • The path I provided in the above function contained 3 files: two of them in the root directory, and another in a subfolder called "SUBFOLDER". You can now do things like:
  • print(full_file_paths), which will print the list:

    ['/Users/johnny/Desktop/TEST/file1.txt', '/Users/johnny/Desktop/TEST/file2.txt', '/Users/johnny/Desktop/TEST/SUBFOLDER/file3.dat']

If you'd like, you can open and read the contents, or focus only on files with the extension ".dat", as in the code below:

for f in full_file_paths:
    if f.endswith(".dat"):
        print(f)

/Users/johnny/Desktop/TEST/SUBFOLDER/file3.dat

#7


54  

Since version 3.4 there are built-in iterators for this, which can be a lot more efficient than os.listdir():

pathlib: New in version 3.4.

>>> import pathlib
>>> [p for p in pathlib.Path('.').iterdir() if p.is_file()]

According to PEP 428, the aim of the pathlib library is to provide a simple hierarchy of classes to handle filesystem paths and the common operations users do over them.


os.scandir(): New in version 3.5.


>>> import os
>>> [entry for entry in os.scandir('.') if entry.is_file()]

Note that os.walk() uses os.scandir() instead of os.listdir() from version 3.5 onward, and according to PEP 471 its speed increased by 2-20 times.
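Keep in mind that the scandir comprehension above produces os.DirEntry objects rather than strings; use their name or path attributes when plain strings are needed. A minimal sketch, assuming a function name (files_in) made up for this example and Python 3.6 or later, where os.scandir() can be used as a context manager:

```python
import os


def files_in(directory="."):
    """Return the sorted names of the regular files directly inside `directory`."""
    # The with-statement closes the scandir iterator as soon as we are done with it.
    with os.scandir(directory) as entries:
        return sorted(entry.name for entry in entries if entry.is_file())
```

Sorting is optional; it is only there because scandir yields entries in arbitrary order.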

Let me also recommend reading ShadowRanger's comment below.


#8


45  

I really liked adamk's answer, suggesting that you use glob(), from the module of the same name. This allows you to have pattern matching with *s.


But as other people pointed out in the comments, glob() can get tripped up over inconsistent slash directions. To help with that, I suggest you use the join() and expanduser() functions in the os.path module, and perhaps the getcwd() function in the os module, as well.


For example:

from glob import glob

# Return every entry called wlp found one level below C:\Users\admin.
glob(r'C:\Users\admin\*\wlp')

The above is terrible: the path is hardcoded, and because of the drive name and the backslashes written into it, it will only ever work on Windows.

from glob    import glob
from os.path import join

# Return every entry called wlp found one level below Users/admin.
glob(join('Users', 'admin', '*', 'wlp'))

The above works better, but it relies on the folder name Users, which is common on Windows and less common on other OSs. It also relies on the user having a specific name, admin.

from glob    import glob
from os.path import expanduser, join

# Return every entry called wlp found one level below the user's home directory.
glob(join(expanduser('~'), '*', 'wlp'))

This works perfectly across all platforms.


Another great example that works perfectly across platforms and does something a bit different:


from glob    import glob
from os      import getcwd
from os.path import join

# Return every entry called wlp found one level below the current directory.
glob(join(getcwd(), '*', 'wlp'))

Hope these examples help you see the power of a few of the functions you can find in the standard Python library modules.

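On Python 3.5 and later, pathlib offers another way to write the same cross-platform lookup. This is only a sketch under that assumption; the function name find_one_level_down is invented for this example and is not part of the answer above.

```python
from pathlib import Path


def find_one_level_down(base, name="wlp"):
    """Return the entries called `name` located one directory level below `base`."""
    # Path.glob takes "/" in the pattern on every platform, so no backslashes are hardcoded.
    return sorted(Path(base).glob("*/" + name))


# Hypothetical usage, equivalent in spirit to the expanduser('~') example:
#     find_one_level_down(Path.home())
```

The results are Path objects; wrap them in str() if plain strings are required.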

#9


33  

import os

def list_files(path):
    # returns a list of names (with extension, without full path) of all files 
    # in folder path
    files = []
    for name in os.listdir(path):
        if os.path.isfile(os.path.join(path, name)):
            files.append(name)
    return files 

#10


27  

You should use the os module to list directory contents. os.listdir(".") returns all the contents of the directory. We iterate over the result and append each entry to the list.

import os

content_list = []

for content in os.listdir("."): # "." means current directory
    content_list.append(content)

print(content_list)

#11


22  

import os
lst=os.listdir(path)

os.listdir returns a list containing the names of the entries in the directory given by path.


#12


20  

If you are looking for a Python implementation of find, this is a recipe I use rather frequently:


from findtools.find_files import (find_files, Match)

# Recursively find all *.sh files in **/usr/bin**
sh_files_pattern = Match(filetype='f', name='*.sh')
found_files = find_files(path='/usr/bin', match=sh_files_pattern)

for found_file in found_files:
    print(found_file)

So I made a PyPI package out of it, and there is also a GitHub repository. I hope that someone finds it useful.

#13


20  

Part One

2018 / 02 / 18: Trying to assemble a comprehensive answer...


Preliminary notes

  • Although there's a clear differentiation between file and directory terms in the question text, some may argue that directories are actually special files
  • The statement: "all files of a directory" can be interpreted in 2 ways:
    1. All direct (or level 1) descendants only
    2. All descendants in the whole directory tree (including the ones in sub-directories)
  • When the question was asked, I imagine that Python 2 was the LTS version; however, the code samples will be run by Python 3(.5) (I'll keep them as Python 2 compliant as possible; also, any code belonging to Python that I'm going to post is from v3.5.4, unless otherwise specified). That has consequences related to another keyword in the question: "add them into a list":

    • In pre Python 2.2 versions, sequences (iterables) were mostly represented by lists (tuples, sets, ...)
    • In Python 2.2, the concept of generator ([Python]: Generators), courtesy of [Python]: The yield statement, was introduced. As time passed, generator counterparts started to appear for functions that returned/worked with lists
    • In Python 3, returning a generator (or another lazy iterator) is the default behavior for many of those functions
    • Now, I don't know if returning a list is still mandatory (or whether a generator would do as well), but passing a generator to the list constructor will create a list out of it (and also consume it). The example below illustrates the differences on [Python]: map(function, iterable, ...)
    Python 2.7.10 (default, Mar  8 2016, 15:02:46) [MSC v.1600 64 bit (AMD64)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> m = map(lambda x: x, [1, 2, 3])  # Just a dummy lambda func
    >>> m, type(m)
    ([1, 2, 3], <type 'list'>)
    >>> len(m)
    3
    


    Python 3.5.4 (v3.5.4:3f56838, Aug  8 2017, 02:17:05) [MSC v.1900 64 bit (AMD64)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> m = map(lambda x: x, [1, 2, 3])
    >>> m, type(m)
    (<map object at 0x000001B4257342B0>, <class 'map'>)
    >>> len(m)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: object of type 'map' has no len()
    >>> lm0 = list(m)  # Construct a list out of the generator
    >>> lm0, type(lm0)
    ([1, 2, 3], <class 'list'>)
    >>>
    >>> lm1 = list(m)  # Construct a list out of the same generator
    >>> lm1, type(lm1)  # Empty list this time - generator already consumed
    ([], <class 'list'>)
    
  • The examples will be based on a directory called root_dir with the following structure (this example is for Win, but I have duplicated the folder tree for Ux(Lnx) as well):


    E:\Work\Dev\StackOverflow\q003207219>tree /f "root_dir"
    Folder PATH listing for volume Work
    Volume serial number is 00000029 3655:6FED
    E:\WORK\DEV\STACKOVERFLOW\Q003207219\ROOT_DIR
    │   file0
    │   file1
    │
    ├───dir0
    │   ├───dir00
    │   │   │   file000
    │   │   │
    │   │   └───dir000
    │   │           file0000
    │   │
    │   ├───dir01
    │   │       file010
    │   │       file011
    │   │
    │   └───dir02
    │       └───dir020
    │           └───dir0200
    ├───dir1
    │       file10
    │       file11
    │       file12
    │
    ├───dir2
    │   │   file20
    │   │
    │   └───dir20
    │           file200
    │
    └───dir3
    


Solutions

Programmatic approaches:

  1. [Python]: os.listdir(path='.')

    Return a list containing the names of the entries in the directory given by path. The list is in arbitrary order, and does not include the special entries '.' and '..' ...


    >>> import os
    >>> root_dir = "root_dir"  # Path relative to current dir (os.getcwd())
    >>>
    >>> os.listdir(root_dir)  # List all the items in root_dir
    ['dir0', 'dir1', 'dir2', 'dir3', 'file0', 'file1']
    >>>
    >>> [item for item in os.listdir(root_dir) if os.path.isfile(os.path.join(root_dir, item))]  # Filter the items and only keep files (strip out directories)
    ['file0', 'file1']
    

    Here's a more elaborate example (code_os_listdir.py):


    import os
    from pprint import pformat
    
    
    def _get_dir_content(path, include_folders, recursive):
        entries = os.listdir(path)
        for entry in entries:
            entry_with_path = os.path.join(path, entry)
            if os.path.isdir(entry_with_path):
                if include_folders:
                    yield entry_with_path
                if recursive:
                    for sub_entry in _get_dir_content(entry_with_path, include_folders, recursive):
                        yield sub_entry
            else:
                yield entry_with_path
    
    
    def get_dir_content(path, include_folders=True, recursive=True, prepend_folder_name=True):
        path_len = len(path) + len(os.path.sep)
        for item in _get_dir_content(path, include_folders, recursive):
            yield item if prepend_folder_name else item[path_len:]
    
    
    def _get_dir_content_old(path, include_folders, recursive):
        entries = os.listdir(path)
        ret = list()
        for entry in entries:
            entry_with_path = os.path.join(path, entry)
            if os.path.isdir(entry_with_path):
                if include_folders:
                    ret.append(entry_with_path)
                if recursive:
                    ret.extend(_get_dir_content_old(entry_with_path, include_folders, recursive))
            else:
                ret.append(entry_with_path)
        return ret
    
    
    def get_dir_content_old(path, include_folders=True, recursive=True, prepend_folder_name=True):
        path_len = len(path) + len(os.path.sep)
        return [item if prepend_folder_name else item[path_len:] for item in _get_dir_content_old(path, include_folders, recursive)]
    
    
    def main():
        root_dir = "root_dir"
        ret0 = get_dir_content(root_dir, include_folders=True, recursive=True, prepend_folder_name=True)
        lret0 = list(ret0)
        print(ret0, len(lret0), pformat(lret0))
        ret1 = get_dir_content_old(root_dir, include_folders=False, recursive=True, prepend_folder_name=False)
        print(len(ret1), pformat(ret1))
    
    
    if __name__ == "__main__":
        main()
    

    Notes:

    • There are 2 implementations:
      • One that uses generators (of course in this example it seems useless, since I convert the result to a list immediately)
      • The classic one (function names ending in _old)
    • Recursion is used (to get into subdirs)
    • For each implementation there are 2 functions:
      • One that starts with an underscore (_): "private" (should not be called directly), which does all the work
      • The public one (a wrapper over the previous): it just strips off the initial path (if required) from the returned entries. It's an ugly implementation, but it's the only idea that I could come up with at this point
    • In terms of performance, generators are generally a little bit faster (considering both creation and iteration times), but I didn't test them in recursive functions, and I am also iterating inside the function over inner generators, so I don't know how performance friendly that is
    • Play with the arguments to get different results
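If Python 2 compatibility is not required, the inner loop over the sub-generator can be replaced with `yield from` (available since Python 3.3). The following is only a sketch under that assumption; the name iter_dir_content is made up here, and the sorted() call is an addition to make the order predictable.

```python
import os


def iter_dir_content(path, include_folders=True, recursive=True):
    """Yield the paths found under `path`, optionally descending into subfolders."""
    for entry in sorted(os.listdir(path)):
        entry_with_path = os.path.join(path, entry)
        if os.path.isdir(entry_with_path):
            if include_folders:
                yield entry_with_path
            if recursive:
                # Delegate to the sub-generator instead of looping over it by hand.
                yield from iter_dir_content(entry_with_path, include_folders, recursive)
        else:
            yield entry_with_path
```

As with the generator version in the script, wrap the call in list() when an actual list is needed.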


    Output:


    (py35x64_test) E:\Work\Dev\StackOverflow\q003207219>"e:\Work\Dev\VEnvs\py35x64_test\Scripts\python.exe" "code_os_listdir.py"
    <generator object get_dir_content at 0x000001BDDBB3DF10> 22 ['root_dir\\dir0',
     'root_dir\\dir0\\dir00',
     'root_dir\\dir0\\dir00\\dir000',
     'root_dir\\dir0\\dir00\\dir000\\file0000',
     'root_dir\\dir0\\dir00\\file000',
     'root_dir\\dir0\\dir01',
     'root_dir\\dir0\\dir01\\file010',
     'root_dir\\dir0\\dir01\\file011',
     'root_dir\\dir0\\dir02',
     'root_dir\\dir0\\dir02\\dir020',
     'root_dir\\dir0\\dir02\\dir020\\dir0200',
     'root_dir\\dir1',
     'root_dir\\dir1\\file10',
     'root_dir\\dir1\\file11',
     'root_dir\\dir1\\file12',
     'root_dir\\dir2',
     'root_dir\\dir2\\dir20',
     'root_dir\\dir2\\dir20\\file200',
     'root_dir\\dir2\\file20',
     'root_dir\\dir3',
     'root_dir\\file0',
     'root_dir\\file1']
    11 ['dir0\\dir00\\dir000\\file0000',
     'dir0\\dir00\\file000',
     'dir0\\dir01\\file010',
     'dir0\\dir01\\file011',
     'dir1\\file10',
     'dir1\\file11',
     'dir1\\file12',
     'dir2\\dir20\\file200',
     'dir2\\file20',
     'file0',
     'file1']
    


  2. [Python]: os.scandir(path='.') (Python 3.5+ only, although I think that for earlier versions it was a separate module, also ported to Python 2)

    Return an iterator of os.DirEntry objects corresponding to the entries in the directory given by path. The entries are yielded in arbitrary order, and the special entries '.' and '..' are not included.

    Using scandir() instead of listdir() can significantly increase the performance of code that also needs file type or file attribute information, because os.DirEntry objects expose this information if the operating system provides it when scanning a directory. All os.DirEntry methods may perform a system call, but is_dir() and is_file() usually only require a system call for symbolic links; os.DirEntry.stat() always requires a system call on Unix but only requires one for symbolic links on Windows.


    >>> import os
    >>> root_dir = os.path.join(".", "root_dir")  # Explicitly prepending current directory
    >>> root_dir
    '.\\root_dir'
    >>>
    >>> scandir_iterator = os.scandir(root_dir)
    >>> scandir_iterator 
    <nt.ScandirIterator object at 0x00000268CF4BC140>
    >>> [item.path for item in scandir_iterator]
    ['.\\root_dir\\dir0', '.\\root_dir\\dir1', '.\\root_dir\\dir2', '.\\root_dir\\dir3', '.\\root_dir\\file0', '.\\root_dir\\file1']
    >>>
    >>> [item.path for item in scandir_iterator]  # Will yield an empty list as it was consumed by previous iteration (automatically performed by the list comprehension)
    []
    >>>
    >>> scandir_iterator = os.scandir(root_dir)  # Reinitialize the generator
    >>> for item in scandir_iterator :
    ...     if os.path.isfile(item.path):
    ...             print(item.name)
    ...
    file0
    file1
    

    Notes:

    • It's similar to os.listdir
    • But it's also more flexible (and offers more functionality), more Pythonic (and in some cases, faster)
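To illustrate the extra functionality mentioned in the notes, here is a minimal sketch that reads the file type and size straight from the os.DirEntry objects. The function name file_sizes is an assumption for this example, and the with-statement requires Python 3.6 or later.

```python
import os


def file_sizes(directory="."):
    """Map each regular file directly inside `directory` to its size in bytes."""
    sizes = {}
    with os.scandir(directory) as entries:
        for entry in entries:
            # is_file() and stat() are DirEntry methods, so no separate os.path call is needed.
            if entry.is_file():
                sizes[entry.name] = entry.stat().st_size
    return sizes
```

With os.listdir the same result would need an os.path.isfile and an os.path.getsize call per entry.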


  3. [Python]: os.walk(top, topdown=True, onerror=None, followlinks=False)

    Generate the file names in a directory tree by walking the tree either top-down or bottom-up. For each directory in the tree rooted at directory top (including top itself), it yields a 3-tuple (dirpath, dirnames, filenames).


    >>> import os
    >>> root_dir = os.path.join(os.getcwd(), "root_dir")  # Specify the full path
    >>> root_dir
    'E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir'
    >>>
    >>> walk_generator = os.walk(root_dir)
    >>> root_dir_entry = next(walk_generator)  # First entry corresponds to the root dir (that was passed as an argument)
    >>> root_dir_entry
    ('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir', ['dir0', 'dir1', 'dir2', 'dir3'], ['file0', 'file1'])
    >>>
    >>> root_dir_entry[1] + root_dir_entry[2]  # Display the dirs and the files (that are direct descendants) in a single list
    ['dir0', 'dir1', 'dir2', 'dir3', 'file0', 'file1']
    >>>
    >>> [os.path.join(root_dir_entry[0], item) for item in root_dir_entry[1] + root_dir_entry[2]]  # Display all the entries in the previous list by their full path
    ['E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir0', 'E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir1', 'E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir2', 'E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir3', 'E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\file0', 'E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\file1']
    >>>
    >>> for entry in walk_generator:  # Display the rest of the elements (corresponding to every subdir)
    ...     print(entry)
    ...
    ('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir0', ['dir00', 'dir01', 'dir02'], [])
    ('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir0\\dir00', ['dir000'], ['file000'])
    ('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir0\\dir00\\dir000', [], ['file0000'])
    ('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir0\\dir01', [], ['file010', 'file011'])
    ('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir0\\dir02', ['dir020'], [])
    ('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir0\\dir02\\dir020', ['dir0200'], [])
    ('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir0\\dir02\\dir020\\dir0200', [], [])
    ('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir1', [], ['file10', 'file11', 'file12'])
    ('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir2', ['dir20'], ['file20'])
    ('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir2\\dir20', [], ['file200'])
    ('E:\\Work\\Dev\\StackOverflow\\q003207219\\root_dir\\dir3', [], [])
    

    Notes:

    • Under the hood, it uses os.listdir (os.scandir where available)
    • It does the heavy lifting by recursing into subfolders
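As a side note on the recursion that os.walk performs: with the default topdown=True, the caller may edit the list of subdirectory names in place, and os.walk will then skip the removed subtrees. A minimal sketch follows; the helper name walk_files and its skip_dirs parameter are illustrative assumptions, not part of the answer above.

```python
import os


def walk_files(root, skip_dirs=()):
    """Collect full paths of all files under root, skipping named subdirs.

    walk_files and skip_dirs are illustrative names, not from the answer.
    """
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # In-place edit: os.walk will not descend into removed entries
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in filenames:
            found.append(os.path.join(dirpath, name))
    return found
```

For example, walk_files("root_dir", skip_dirs=("dir0",)) would leave out everything below any directory named dir0.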


  1. [Python]: glob.glob(pathname, *, recursive=False) ([Python]: glob.iglob(pathname, *, recursive=False))

    Return a possibly-empty list of path names that match pathname, which must be a string containing a path specification. pathname can be either absolute (like /usr/src/Python-1.5/Makefile) or relative (like ../../Tools/*/*.gif), and can contain shell-style wildcards. Broken symlinks are included in the results (as in the shell).
    ...
    Changed in version 3.5: Support for recursive globs using “**”.


    >>> import glob, os
    >>> wildcard_pattern = "*"
    >>> root_dir = os.path.join("root_dir", wildcard_pattern)  # Match every file/dir name
    >>> root_dir
    'root_dir\\*'
    >>>
    >>> glob_list = glob.glob(root_dir)
    >>> glob_list
    ['root_dir\\dir0', 'root_dir\\dir1', 'root_dir\\dir2', 'root_dir\\dir3', 'root_dir\\file0', 'root_dir\\file1']
    >>>
    >>> [item.replace("root_dir" + os.path.sep, "") for item in glob_list]  # Strip the dir name and the path separator from begining
    ['dir0', 'dir1', 'dir2', 'dir3', 'file0', 'file1']
    >>>
    >>> for entry in glob.iglob(root_dir + "*", recursive=True):
    ...     print(entry)
    ...
    root_dir\
    root_dir\dir0
    root_dir\dir0\dir00
    root_dir\dir0\dir00\dir000
    root_dir\dir0\dir00\dir000\file0000
    root_dir\dir0\dir00\file000
    root_dir\dir0\dir01
    root_dir\dir0\dir01\file010
    root_dir\dir0\dir01\file011
    root_dir\dir0\dir02
    root_dir\dir0\dir02\dir020
    root_dir\dir0\dir02\dir020\dir0200
    root_dir\dir1
    root_dir\dir1\file10
    root_dir\dir1\file11
    root_dir\dir1\file12
    root_dir\dir2
    root_dir\dir2\dir20
    root_dir\dir2\dir20\file200
    root_dir\dir2\file20
    root_dir\dir3
    root_dir\file0
    root_dir\file1
    

    Notes:

    • Uses os.listdir
    • For large trees (especially if recursive is on), iglob is preferred
    • Allows advanced filtering based on name (due to the wildcard)
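The name-based filtering and the lazy iglob variant mentioned in these notes can be combined. The sketch below is only an illustration: find_by_pattern is a made-up helper name, and the recursive "**" wildcard requires Python 3.5 or newer.

```python
import glob
import os


def find_by_pattern(root, pattern="*"):
    """Return files under root (recursively) whose name matches pattern.

    find_by_pattern is an illustrative helper; "**" needs Python 3.5+.
    """
    # glob.escape protects wildcard characters that may occur in root itself
    search = os.path.join(glob.escape(root), "**", pattern)
    # iglob yields matches lazily instead of building the whole list first
    return [p for p in glob.iglob(search, recursive=True) if os.path.isfile(p)]
```

Calling find_by_pattern("root_dir", "*.txt") would return only the .txt files, at any depth. Note that glob skips names starting with a dot unless the pattern itself starts with one.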


  1. [Python]: class pathlib.Path(*pathsegments) (!!! Python3+ !!! don't know if backported)

    >>> import pathlib
    >>> root_dir = "root_dir"
    >>> root_dir_instance = pathlib.Path(root_dir)
    >>> root_dir_instance
    WindowsPath('root_dir')
    >>> root_dir_instance.name
    'root_dir'
    >>> root_dir_instance.is_dir()
    True
    >>>
    >>> [item.name for item in root_dir_instance.glob("*")]  # Wildcard searching for all direct descendants
    ['dir0', 'dir1', 'dir2', 'dir3', 'file0', 'file1']
    >>>
    >>> [os.path.join(item.parent.name, item.name) for item in root_dir_instance.glob("*") if not item.is_dir()]  # Display paths (including parent) for files only
    ['root_dir\\file0', 'root_dir\\file1']
    

    Notes:

    • This is one way of achieving our goal
    • It's the OOP style of handling paths
    • Offers lots of functionality
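To illustrate the OOP style a bit further, here is a small sketch that answers the original question (files of a directory, collected in a list) with pathlib only. The function name list_files and its recursive flag are assumptions made for this example.

```python
from pathlib import Path


def list_files(folder, recursive=False):
    """Return sorted names (relative to folder) of the regular files in it."""
    base = Path(folder)
    # rglob("*") walks the whole tree; iterdir() stays at the top level
    entries = base.rglob("*") if recursive else base.iterdir()
    return sorted(str(p.relative_to(base)) for p in entries if p.is_file())
```

Each item produced by iterdir() or rglob() is itself a Path object, so checks such as is_file() and conversions such as relative_to() are available without calling into os.path.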


  1. [Python]: dircache.listdir(path) (!!! removed in Python3 !!!)

    • But, according to ${PYTHON_SRC_DIR}/Lib/dircache.py: ~#20+ (from v2.7.14), it's just a (thin) wrapper over os.listdir


    def listdir(path):
        """List directory contents, using cache."""
        try:
            cached_mtime, list = cache[path]
            del cache[path]
        except KeyError:
            cached_mtime, list = -1, []
        mtime = os.stat(path).st_mtime
        if mtime != cached_mtime:
            list = os.listdir(path)
            list.sort()
        cache[path] = mtime, list
        return list
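
Since dircache is gone in Python 3, the same idea (re-read the directory only when its modification time changes) could be sketched as below. This is not a drop-in replacement for the removed module: cached_listdir and _cache are hypothetical names, and the approach assumes the file system updates the directory's mtime with enough resolution to notice changes.

```python
import os

_cache = {}  # path -> (mtime in nanoseconds, sorted listing)


def cached_listdir(path):
    """List directory contents, re-reading only when the dir's mtime changes."""
    mtime = os.stat(path).st_mtime_ns
    cached = _cache.get(path)
    if cached is None or cached[0] != mtime:
        # First call, or the directory changed since the last call
        cached = (mtime, sorted(os.listdir(path)))
        _cache[path] = cached
    return cached[1]
```

As with the original, callers get the cached list object itself, so they should not modify it.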
    


  1. [man]: OPENDIR(3) / [man]: READDIR(3) / [man]: CLOSEDIR(3) via [Python]: ctypes — A foreign function library for Python (!!! Ux specific !!!)

    ctypes is a foreign function library for Python. It provides C compatible data types, and allows calling functions in DLLs or shared libraries. It can be used to wrap these libraries in pure Python.

    code_ctypes.py:

    #!/usr/bin/env python3
    
    import sys
    from ctypes import Structure, \
        c_ulonglong, c_longlong, c_ushort, c_ubyte, c_char, c_int, \
        CDLL, POINTER, \
        create_string_buffer, get_errno, set_errno, cast, sizeof
    
    
    DT_DIR = 4
    DT_REG = 8
    
    char256 = c_char * 256
    
    class LinuxDirent64(Structure):
        _fields_ = [
            ("d_ino", c_ulonglong),
            ("d_off", c_longlong),
            ("d_reclen", c_ushort),
            ("d_type", c_ubyte),
            ("d_name", char256),
        ]
    
    LinuxDirent64Ptr = POINTER(LinuxDirent64)
    
    libc_dll = CDLL(None)
    opendir = libc_dll.opendir
    readdir = libc_dll.readdir
    closedir = libc_dll.closedir
    libc_dll.__errno_location.restype = POINTER(c_int)
    errno_loc_func = libc_dll.__errno_location
    
    
    def _get_errno():
        return "errno: {:d}({:d})".format(get_errno(), errno_loc_func().contents.value)
    
    
    def get_dir_content(path):
        ret = [path, list(), list()]
        dir_stream = opendir(create_string_buffer(path.encode()))
        if (dir_stream == 0):
            print("opendir returned NULL ({:s})".format(_get_errno()))
            return ret
        set_errno(0)
        dirent_addr = readdir(dir_stream)
        while dirent_addr:
            dirent_ptr = cast(dirent_addr, LinuxDirent64Ptr)
            dirent = dirent_ptr.contents
            name = dirent.d_name.decode()
            if dirent.d_type == DT_DIR:  # d_type is an enumeration, not a bit mask, so compare for equality
                if name not in (".", ".."):
                    ret[1].append(name)
            elif dirent.d_type == DT_REG:
                ret[2].append(name)
            dirent_addr = readdir(dir_stream)
        if get_errno() or errno_loc_func().contents.value:
            print("readdir returned NULL ({:s})".format(_get_errno()))
        closedir(dir_stream)
        return ret
    
    
    def main():
        print("{:s} on {:s}\n".format(sys.version, sys.platform))
        root_dir = "root_dir"
        entries = get_dir_content(root_dir)
        print(entries)
    
    
    if __name__ == "__main__":
        main()
    

    Notes:

    • It loads the 3 funcs from libc (loaded in the current process) and calls them (for more details check [SO]: How do I check whether a file exists using Python? (@CristiFati's answer) - last notes from item #4.). That would place this approach very close to the Python / C edge
    • LinuxDirent64 is the ctypes representation of struct dirent64 from dirent.h (so are the DT_* constants) from my machine: Ubuntu 16 x64 (4.10.0-40-generic and libc6-dev:amd64). On other flavors/versions, the struct definition might differ, and if so, the ctypes alias should be updated, otherwise it will yield Undefined Behavior
    • errno_loc_func (and everything related to it) is there because the funcs set errno in case of error, and I need to check its value. Apparently, get_errno doesn't work (with an invalid name, opendir returns NULL, but get_errno still returns 0), or I haven't figured it out yet
    • It returns data in os.walk's format. I didn't bother to make it recursive, but starting from the existing code, that would be a fairly trivial task
    • Everything is doable on Win as well; only the data (libraries, functions, structs, constants, ...) differ


    Output:

    cfati@testserver:~/work/stackoverflow/q003207219$ ./code_ctypes.py
    3.5.2 (default, Nov 23 2017, 16:37:01)
    [GCC 5.4.0 20160609] on linux
    
    ['root_dir', ['dir3', 'dir2', 'dir0', 'dir1'], ['file0', 'file1']]
    


  1. [ActiveState]: win32file.FindFilesW (!!! Win specific !!!)

    Retrieves a list of matching filenames, using the Windows Unicode API. An interface to the API FindFirstFileW/FindNextFileW/Find close functions.


    >>> import os, win32file, win32con
    >>> root_dir = "root_dir"
    >>> wildcard = "*"
    >>> root_dir_wildcard = os.path.join(root_dir, wildcard)
    >>> entry_list = win32file.FindFilesW(root_dir_wildcard)
    >>> len(entry_list)  # Don't display the whole content as it's too long
    8
    >>> [entry[-2] for entry in entry_list]  # Only display the entry names
    ['.', '..', 'dir0', 'dir1', 'dir2', 'dir3', 'file0', 'file1']
    >>>
    >>> [entry[-2] for entry in entry_list if entry[0] & win32con.FILE_ATTRIBUTE_DIRECTORY and entry[-2] not in (".", "..")]  # Filter entries and only display dir names (except self and parent)
    ['dir0', 'dir1', 'dir2', 'dir3']
    >>>
    >>> [os.path.join(root_dir, entry[-2]) for entry in entry_list if entry[0] & (win32con.FILE_ATTRIBUTE_NORMAL | win32con.FILE_ATTRIBUTE_ARCHIVE)]  # Only display file "full" names
    ['root_dir\\file0', 'root_dir\\file1']
    

    Notes:

    • win32file.FindFilesW is part of [GitHub]: Python for Windows (pywin32) Extensions, which is a Python wrapper over WINAPIs
    • The documentation link is from https://www.activestate.com, as I didn't find any official pywin32 doc


  1. Install some (other) third-party package that does the trick
    • Most likely, it will rely on one (or more) of the above (maybe with slight customizations)


Notes (about the stuff above):

  • Code is meant to be portable (except places that target a specific area - which are marked) or cross:
    • platform (Ux, Win, )
    • Python version (2, 3, )
  • Multiple path styles (absolute, relative) were used across the above variants, to illustrate the fact that the "tools" used are flexible in this direction
  • os.listdir and os.scandir use opendir / readdir / closedir ([MSDN]: FindFirstFile function / [MSDN]: FindNextFile function / [MSDN]: FindClose function) (via "${PYTHON_SRC_DIR}/Modules/posixmodule.c")
  • win32file.FindFilesW uses those (Win specific) functions as well (via "${PYWIN32_SRC_DIR}/win32/src/win32file.i")
  • get_dir_content (from point #1.) can be implemented using any of these approaches (some will require more work and some less)
    • Some advanced filtering (instead of just file vs. dir) could be done: e.g. the include_folders argument could be replaced by another one (e.g. filter_func), which would be a function that takes a path as an argument: filter_func=lambda x: True (this doesn't strip out anything), and inside get_dir_content something like: if not filter_func(entry_with_path): continue (if the function fails for one entry, it will be skipped). Keep in mind that the more complex the code becomes, the longer it will take to execute
  • Nota bene! Since recursion is used, I must mention that I did some tests on my laptop (Win 10 x64), totally unrelated to this problem, and when the recursion level was reaching values somewhere in the (990 .. 1000) range (recursionlimit - 1000 (default)), I got StackOverflow :). If the directory tree exceeds that limit (I am not an FS expert, so I don't know if that is even possible), that could be a problem.
    I must also mention that I didn't try to increase recursionlimit because I have no experience in the area (how much can I increase it before having to also increase the stack at OS level), but in theory there will always be the possibility of failure if the dir depth is larger than the highest possible recursionlimit (on that machine)
  • The code samples are for demonstrative purposes only. That means that I didn't take error handling into account (I don't think there's any try / except / else / finally block), so the code is not robust (the reason is: to keep it as simple and short as possible). For production, error handling should be added as well
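
The filter_func idea and the recursion-depth concern from the notes above can be sketched together. The get_dir_content function from point #1 is not shown in this part, so the function below is a hypothetical stand-in (named iter_dir_content here), not the author's implementation. It assumes filter_func receives the full path of each entry, and it uses an explicit stack instead of recursion so that the traversal depth does not depend on the recursion limit.

```python
import os


def iter_dir_content(path, filter_func=lambda entry_with_path: True):
    """Yield full paths of entries under path that filter_func accepts.

    Hypothetical stand-in for get_dir_content: an explicit stack replaces
    recursion, so depth is not bound by sys.getrecursionlimit().
    """
    pending = [path]
    while pending:
        current = pending.pop()
        for name in os.listdir(current):
            entry_with_path = os.path.join(current, name)
            # Descend into real directories only (symlinked dirs are not followed)
            if os.path.isdir(entry_with_path) and not os.path.islink(entry_with_path):
                pending.append(entry_with_path)
            if not filter_func(entry_with_path):
                continue
            yield entry_with_path
```

For instance, list(iter_dir_content("root_dir", filter_func=os.path.isfile)) would keep only the files, while the default filter keeps every entry. Like the other samples, it has no error handling.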

End of Part One


Due to the fact that SO's post (question / answer) limit is 30000 chars ([Meta.SE]: Knowing Your Limits: What is the maximum length of a question title, post, image and links used?),
this answer is "To be continued..." at
[SO]: How do I list all files of a directory? (@CristiFati's answer - Part Two)

#14


14  

Returning a list of absolute filepaths, does not recurse into subdirectories

import os

L = [os.path.join(os.getcwd(), f) for f in os.listdir('.') if os.path.isfile(os.path.join(os.getcwd(), f))]

#15


14  

Python 3.5 introduced a new, faster method for walking through a directory: os.scandir().

Example:

import os

for file in os.scandir('/usr/bin'):
    line = ''
    if file.is_file():
        line += 'f'
    elif file.is_dir():
        line += 'd'
    elif file.is_symlink():
        line += 'l'
    line += '\t'
    print("{}{}".format(line, file.name))
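
To tie this back to the question (collecting the files in a list), here is a short sketch built on os.scandir. The function name scandir_files is an assumption for illustration, and the with-statement form requires Python 3.6 or newer.

```python
import os


def scandir_files(path="."):
    """Return the names of the regular files directly inside path."""
    # The with-statement (Python 3.6+) closes the iterator promptly
    with os.scandir(path) as entries:
        return sorted(e.name for e in entries if e.is_file())
```

By default is_file() follows symbolic links, so a link pointing to a regular file is counted as a file; pass follow_symlinks=False to is_file() if that is not wanted.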

#16


12  

List all files in a directory:

import os
from os import path

files = [x for x in os.listdir(directory_path) if path.isfile(directory_path+os.sep+x)]

This gives you a list of all the files in the directory.

#17


8  

# -*- coding: utf-8 -*-
from __future__ import print_function  # keeps the script runnable on Python 2 and 3

import os
import traceback

print('\n\n')

def start():
    address = "/home/ubuntu/Desktop"
    try:
        Folders = []
        # Top-level entries get TopId 0
        FolderToList(address, 1, 0, Folders)
        return Folders
    except Exception:
        print("___________________________ ERROR ___________________________\n" + traceback.format_exc())

def FolderToList(address, Id, TopId, Folders):
    """Append one dict per entry under address; return the next unused Id."""
    for item in os.listdir(address):
        endaddress = os.path.join(address, item)
        Folders.append({'Id': Id, 'TopId': TopId, 'Name': item, 'Address': endaddress})
        Id += 1

        # Recurse only into directories: os.listdir() raises OSError on a file
        if os.path.isdir(endaddress):
            Id = FolderToList(endaddress, Id, Id - 1, Folders)
    return Id

print(start())

#18


6  

Using generators

import os

def get_files(search_path):
    for (dirpath, _, filenames) in os.walk(search_path):
        for filename in filenames:
            yield os.path.join(dirpath, filename)

list_files = get_files('.')
for filename in list_files:
    print(filename)

#19


5  

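import dircache  # Python 2 only: the module was removed in Python 3

# pathname must be set to the directory to inspect before running this
entries = dircache.listdir(pathname)
i = 0
check = len(entries[0])
temp = []
count = len(entries)
while count != 0:
    if len(entries[i]) != check:
        # The name length changed: record the previous entry
        temp.append(entries[i - 1])
        check = len(entries[i])
    else:
        i = i + 1
        count = count - 1

print(temp)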

#20


5  

If you care about performance, try scandir. On Python 2.x you may need to install it manually. Examples:

# python 2.x
import scandir
import sys

# scandir() returns an iterator, so entries are fetched one at a time
for d in scandir.scandir(sys.argv[1]):
    print(d.path)

This saves a lot of time when you need to scan a huge directory: you do not need to buffer a huge list, you just fetch the entries one by one. You can also do it recursively:

def scan_path(path):
    for e in scandir.scandir(path):
        if e.is_dir():
            scan_path(e.path)
        else:
            print(e.path)
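
On Python 3.5 and newer, where scandir is available as os.scandir, the recursive scan can also be written as a generator, so the caller decides what to do with each path instead of having it printed. This is a sketch under that assumption; scan_tree is an illustrative name.

```python
import os


def scan_tree(path):
    """Yield the path of every non-directory entry under path, lazily."""
    for entry in os.scandir(path):
        # follow_symlinks=False avoids descending through symlinked directories
        if entry.is_dir(follow_symlinks=False):
            yield from scan_tree(entry.path)
        else:
            yield entry.path
```

Wrapping the call as list(scan_tree(some_dir)) gives a plain list, while iterating over it directly keeps memory use low for huge trees.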

#21


5  

Use this function if you want to select a different file type or get full paths.

import os
def createList(foldername, fulldir = True, suffix=".jpg"):
    file_list_tmp = os.listdir(foldername)
    #print len(file_list_tmp)
    file_list = []
    if fulldir:
        for item in file_list_tmp:
            if item.endswith(suffix):
                file_list.append(os.path.join(foldername, item))
    else:
        for item in file_list_tmp:
            if item.endswith(suffix):
                file_list.append(item)
    return file_list

#22


4  

Using the os library:

import os
for root, dirs,files in os.walk("your dir path", topdown=True):
    for name in files:
        print(os.path.join(root, name))

#23


3  

import os 
os.listdir(path)

This returns a list of all files and directories in path.

filenames = next(os.walk(path))[2]

This returns only the files, not the subdirectories.
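
One caveat with next(os.walk(path)): os.walk ignores listing errors by default, so for a missing or unreadable path the generator yields nothing and a bare next() raises StopIteration. A small sketch of a guarded variant follows; top_level_files is an illustrative name.

```python
import os


def top_level_files(path):
    """Return the file names directly inside path, or [] if it can't be read."""
    # os.walk swallows listing errors by default, so a bad path yields nothing
    # and a bare next() would raise StopIteration; supply a default instead.
    _, _, filenames = next(os.walk(path), (path, [], []))
    return filenames
```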

#24


2  

Execute findfiles() with a directory as a parameter and it will return a list of all files in it.

import os

def findfiles(directory):
    objects = os.listdir(directory)  # find all objects in a dir

    files = []
    for i in objects:  # check if every object in the folder ...
        # join with the proper separator; plain directory + i breaks without a trailing slash
        if isFile(os.path.join(directory, i)):  # ... is a file.
            files.append(i)  # if yes, append it.
    return files

def isFile(object):
    try:
        os.listdir(object)  # tries to get the objects inside of this object
        return False  # if it worked, it's a folder
    except OSError:  # if not, treat it as a file
        return True

#25


1  

Referring to the answer by @adamk, here is my os detection method in response to the slash inconsistency comment by @Anti Earth

import sys
import os
from pathlib import Path
from glob import glob
platformtype = sys.platform
if platformtype == 'win32':
    slash = "\\"
else:
    # darwin, linux and other POSIX platforms use the forward slash
    slash = "/"

# TODO: How can I list all files of a directory in Python and add them to a list?

# Step 1 - List all files of a directory

# Method 1: Find only pre-defined filetypes (.txt) and no subfiles, answer provided by @adamk
dir1 = "%sfoo%sbar%s*.txt" % (slash, slash, slash)
_files = glob(dir1)

# Method 2: Find all files and no subfiles
dir2 = "%sfoo%sbar%s" % (slash, slash, slash)
_files = (x for x in Path(dir2).iterdir() if x.is_file())

# Method 3: Find all files and all subfiles
dir3 = "%sfoo%sbar" % (slash, slash)
_files = (x for x in Path(dir3).glob('**/*') if x.is_file())


# Step 2 - Add them to a list

files_list = []
for eachfiles in _files:
    files_basename = os.path.basename(eachfiles)
    files_list.append(files_basename)

print(files_list)
['file1.txt', 'file2.txt', .... ]

I'm assuming that you want just the basenames in the list.

Refer to this post for pre-defining multiple file formats for Method 1.

#26


1  

Due to the fact that SO's post (question / answer) limit is 30000 chars ([Meta.SE]: Knowing Your Limits: What is the maximum length of a question title, post, image and links used?),
this answer is a continuation of
[SO]: How do I list all files of a directory? (@CristiFati's answer - Part One)


Part Two

Solutions (continued)

Other approaches:

  1. Use Python only as a wrapper

    • Everything is done using another technology
    • That technology is invoked from Python
    • The most famous flavor that I know is what I call the sysadmin approach:

      • Use Python (or any programming language for that matter) in order to execute shell commands (and parse their outputs - in general this approach is to be avoided, since if some command output format slightly differs between OS versions/flavors, the parsing code should be adapted as well; not to mention non-EN locales)
      • Some consider this a neat hack
      • I consider it more like a lame workaround (gainarie), as the action per se is performed from the shell (cmd in this case), and thus doesn't have anything to do with Python
      • Filtering (grep / findstr) or output formatting could be done on both sides, but I'm not going to insist on it. Also, I deliberately used os.system instead of subprocess.Popen
      (py35x64_test) E:\Work\Dev\StackOverflow\q003207219>"e:\Work\Dev\VEnvs\py35x64_test\Scripts\python.exe" -c "import os;os.system(\"dir /b root_dir\")"
      dir0
      dir1
      dir2
      dir3
      file0
      file1
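
For completeness, the same sysadmin approach with the output captured into a Python list might look like the sketch below. It uses subprocess.run (Python 3.5+) rather than os.system, since os.system only returns the exit status. The helper name list_via_shell is an assumption, and all the caveats above about parsing command output still apply (for example, names containing newlines would be split).

```python
import os
import subprocess


def list_via_shell(directory):
    """Return entry names in directory by parsing a shell command's output."""
    # "dir /b" on Windows, "ls -1" elsewhere: bare names, one per line
    command = "dir /b" if os.name == "nt" else "ls -1"
    result = subprocess.run(command, shell=True, cwd=directory,
                            stdout=subprocess.PIPE, universal_newlines=True,
                            check=True)
    return result.stdout.splitlines()
```

With check=True, a command that reports failure raises subprocess.CalledProcessError instead of silently returning an empty list.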
      


Final note(s):

  • I will try to keep it up to date; any suggestions are welcome, and I will incorporate anything useful that comes up into the answer(s)

#27


1  

Another very readable variant for Python 3.4+ is using pathlib.Path.glob:

from pathlib import Path
folder = '/foo'
[f for f in Path(folder).glob('*') if f.is_file()]

It is simple to make this more specific, e.g. to look only for Python source files that are not symbolic links, including those in all subdirectories:

[f for f in Path(folder).glob('**/*.py') if not f.is_symlink()]

#28


0  

Here is a simple example:

import os
root, dirs, files = next(os.walk('.'))
for file in files:
    print(file) # In Python 3 use: file.encode('utf-8') in case of error.

Note: Change . to your path value or variable.

Here is an example that returns the directory entries as a list of absolute paths:

import os
path = '.' # Change this as you need.
abspaths = []
for fn in os.listdir(path):
    abspaths.append(os.path.abspath(os.path.join(path, fn)))
print("\n".join(abspaths))

Documentation: os and os.path for Python 2, os and os.path for Python 3.

#29


0  

Here's my general-purpose function for this. It returns a list of file paths rather than filenames since I found that to be more useful. It has a few optional arguments that make it versatile. For instance, I often use it with arguments like pattern='*.txt' or subfolders=True.

import os
import fnmatch

def list_paths(folder='.', pattern='*', case_sensitive=False, subfolders=False):
    """Return a list of the file paths matching the pattern in the specified 
    folder, optionally including files inside subfolders.
    """
    match = fnmatch.fnmatchcase if case_sensitive else fnmatch.fnmatch
    walked = os.walk(folder) if subfolders else [next(os.walk(folder))]
    return [os.path.join(root, f)
            for root, dirnames, filenames in walked
            for f in filenames if match(f, pattern)]

#30


-1  

Here is a sample one-liner where the source path and file type can be provided as input. The code returns a list of filenames with the csv extension. Use . in case all files need to be returned. This also recursively scans the subdirectories.

[y for x in os.walk(sourcePath) for y in glob(os.path.join(x[0], '*.csv'))]

Modify file extensions and source path as needed.
